Raffay Hamid

HOME     RESEARCH     PUBLICATIONS     RESUME


My research is about building computational systems that can perceive, learn, and predict what is happening around them. Over the last few years, I have been focusing on approximate (particularly randomized) algorithms for large-scale optimization problems in Vision and Learning.

Following are some of the specific topics I have explored so far.

 


Heterogeneous Domain Adaptation for Satellite Images

The growing availability of high-resolution (< 1 m/pixel) satellite and aerial imagery has opened up unprecedented opportunities to monitor and analyze the evolution of land-cover and land-use across the world. To do so, multi-sensor (heterogeneous) images of the same geographical areas must be efficiently parsed to update maps and detect land-cover changes. However, a naive transfer of ground truth labels from one location in the source image to the corresponding location in the target image is not feasible, as these images are generally only loosely registered (with up to ±50 m of non-uniform errors). Furthermore, land-cover changes in an area over time must be taken into account. To tackle these challenges, we propose a mid-level sensor-invariant representation that encodes image regions in terms of the spatial distribution of their spectral neighbors. We incorporate this representation in a Markov Random Field to simultaneously account for unevenly spaced mis-registrations and to enforce locality priors to find matches between multi-sensor images. We show how our approach can be used to assist in several domain adaptation problems involving land-cover segmentation as well as change detection.

[PDF]

   
Hardware Compliant Approximate Image Codes

In recent years, several feature encoding schemes for the bags-of-visual-words model have been proposed. While most of these schemes produce impressive results, they all share an important limitation: their high computational complexity makes it challenging to use them for large-scale problems. In this work, we propose an approximate locality-constrained encoding scheme that offers significantly better computational efficiency (~40×) than its exact counterpart, with comparable classification accuracy. Using the perturbation analysis of least-squares problems, we present a formal approximation error analysis of our approach, which helps distill the intuition behind the robustness of our method. We present a thorough set of empirical analyses on multiple standard data-sets to assess the representational and discriminative accuracy of our encoding scheme.

[PDF]
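
For readers unfamiliar with locality-constrained encoding, here is a minimal sketch of one common exact formulation: each descriptor is reconstructed from its k nearest codewords by solving a small regularized least-squares problem with a sum-to-one constraint. This is background only and is not the approximate scheme proposed in the paper; the function name `llc_encode` and the parameters `k` and `reg` are assumptions made for illustration.

```python
import numpy as np

def llc_encode(x, codebook, k=5, reg=1e-4):
    """Encode descriptor x (shape (d,)) over its k nearest codewords.

    codebook has shape (n_codewords, d). Returns a sparse code of length
    n_codewords whose non-zero weights sum to one.
    """
    # Locality: keep only the k codewords closest to x.
    dists = np.sum((codebook - x) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]

    # Least squares with a sum-to-one constraint, solved in closed form.
    shifted = codebook[nearest] - x              # (k, d)
    cov = shifted @ shifted.T                    # (k, k)
    # Regularize so the system is well conditioned (assumed scaling).
    cov += (reg * np.trace(cov) + 1e-12) * np.eye(k)
    weights = np.linalg.solve(cov, np.ones(k))
    weights /= weights.sum()

    code = np.zeros(codebook.shape[0])
    code[nearest] = weights
    return code
```

The cost of this exact form is dominated by the nearest-neighbor search and the per-descriptor linear solve, which is the kind of expense an approximate encoder aims to reduce.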

   
   
Large-Scale Damage Detection Using Satellite Imagery

We present a semi-supervised learning framework for large-scale damage detection in satellite imagery. We present a comparative evaluation of our framework using ~88 million images collected from 4,665 km² across 12 different locations around the world. To enable accurate and efficient damage detection, we introduce a novel use of hierarchical shape features in the bags-of-visual-words setting. We analyze how practical factors such as sun, sensor-resolution, satellite-angle, and registration differences impact the effectiveness of our proposed representation, and compare it to five alternative features in multiple learning settings. We demonstrate through a user-study that our semi-supervised framework results in a ten-fold reduction in human annotation time at a minimal loss in detection accuracy.

[PDF]

   
   

Background Subtraction for Cellphone Videos

We identify a novel instance of the background subtraction problem that focuses on extracting near-field foreground objects captured using handheld cameras. Given two user-generated videos of a scene, one with and the other without the foreground object(s), our goal is to efficiently generate an output video with only the foreground object(s) present in it. We cast this challenge as a spatio-temporal frame matching problem, and propose an efficient solution for it that exploits the temporal smoothness of the video sequences. We present theoretical analyses for the error bounds of our approach, and validate our findings using a detailed set of simulation experiments. Finally, we present the results of our approach tested on multiple real videos captured using handheld cameras, and compare them to several alternative foreground extraction approaches.

[PDF] [YouTube]

   
   
Compact Random Feature Maps

Previous approaches for polynomial kernel approximation create maps that can be rank deficient, and therefore do not utilize the capacity of the projected feature space effectively. To address this challenge, we propose compact random feature maps (CRAFTMaps) to approximate polynomial kernels more concisely and accurately. We prove error bounds for CRAFTMaps, demonstrating their superior reconstruction performance compared to previous approximation schemes. We show how structured random matrices can be used to efficiently generate CRAFTMaps, and present a single-pass algorithm using CRAFTMaps to learn non-linear multi-class classifiers. We present experiments on multiple standard data-sets with performance competitive with state-of-the-art results.

[PDF]
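
To make the idea of a random feature map for a polynomial kernel concrete, here is a minimal sketch. It assumes a two-stage reading of "compact": first up-project to many random features whose inner products estimate the homogeneous kernel (x · y)^degree, then compress them with a second, dense Gaussian projection. The function names, the Rademacher construction, and the Gaussian down-projection are illustrative assumptions, not the paper's exact procedure (which uses structured random matrices).

```python
import numpy as np

def random_poly_features(X, degree, n_features, rng):
    """Random features Z such that Z @ Z.T estimates (X @ X.T) ** degree.

    Each feature is a product of `degree` independent random sign
    projections of the input (a Rademacher-based construction).
    """
    n_samples, dim = X.shape
    Z = np.ones((n_samples, n_features))
    for _ in range(degree):
        W = rng.choice([-1.0, 1.0], size=(dim, n_features))
        Z *= X @ W
    return Z / np.sqrt(n_features)

def compact_features(Z, n_compact, rng):
    """Compress features with a dense Gaussian random projection.

    Inner products are preserved in expectation (assumed down-projection).
    """
    G = rng.standard_normal((Z.shape[1], n_compact)) / np.sqrt(n_compact)
    return Z @ G
```

With unit-norm inputs, `Z @ Z.T` and the compressed `C @ C.T` should both land close to the exact kernel matrix, with the error shrinking as `n_features` and `n_compact` grow.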

 
   
What Makes an Image Popular?

Millions of photographs are uploaded online every minute. While some of these images get lots of views, others are completely ignored. This raises the question: What makes a photograph popular? Can we predict the number of views a photograph will receive even before it is uploaded? In this work we investigate two key components that affect image popularity, namely its content and its social context. Using a dataset of ~2.3 million images from Flickr, we demonstrate that we can reliably predict the normalized view count of images with a rank correlation of 0.81 using both image content and social cues. We show the importance of image cues such as color, gradients, deep learning features and the set of objects present, as well as the importance of various social cues, such as number of friends or number of photos uploaded, that lead to high or low popularity of images.

[PDF]
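
As a reminder of what a rank correlation measures, here is a minimal sketch that computes Spearman's coefficient between predicted and actual view counts. The abstract does not say which rank correlation was used, so Spearman is an assumption here; the helper names are hypothetical and ties are not handled.

```python
import numpy as np

def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = np.argsort(values)
    result = np.empty(len(values))
    result[order] = np.arange(1, len(values) + 1)
    return result

def spearman(predicted, actual):
    """Spearman rank correlation: Pearson correlation of the two rankings."""
    return float(np.corrcoef(ranks(predicted), ranks(actual))[0, 1])
```

A score of 1 means the predicted ordering of images matches the true ordering exactly, regardless of how far off the predicted view counts themselves are.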

   
   
   
Large Scale Video Summarization using Image Priors

Given the enormous growth in user-generated videos, it is becoming increasingly important to be able to navigate them efficiently. As these videos are generally of poor quality, summarization methods designed for well-produced videos do not generalize to them. To address this challenge, we propose to use web-images as a prior to facilitate summarization of user-generated videos. Our main intuition is that people tend to take pictures of objects to capture them in a maximally informative way. Such images could therefore be used as prior information to summarize videos containing a similar set of objects.

[PDF]

[Project Page]

   
   
   
Non-Rigid Point Matching using Random Projections

We present a robust and efficient technique for matching dense sets of points undergoing non-rigid spatial transformations. Our main intuition is that the subset of points that can be matched with high confidence should be used to guide the matching procedure for the rest. We propose a novel algorithm that incorporates these high-confidence matches as a spatial prior to learn a discriminative subspace that simultaneously encodes both the similarity of the features and their spatial arrangement. Furthermore, we propose the use of random projections for approximate subspace learning, which can provide significant time improvements over conventional approaches.

[PDF]

   
   
   
Palette Power: Enabling Visual Search through Colors

With the explosion of mobile devices with cameras, online search has moved beyond text to other modalities like images, voice, and writing. For many applications like fashion, image-based search offers a compelling interface compared to text forms by better capturing visual attributes. In this paper we present a simple and fast search algorithm that uses color as the main feature for building visual search. We show that low-level cues such as color can be used to quantify image similarity and also to discriminate among products with different visual appearance. We demonstrate the effectiveness of our approach through a mobile shopping application. Our approach outperforms several other state-of-the-art image retrieval algorithms for large-scale image data.

[PDF]

NOTE: This work is used commercially in the eBay Fashion App.

   
   
   
Multi-Camera Player Tracking for Sports Visualization

Visualizing multi-player sports has grown into a multi-million dollar industry. However, inferring the state of a multi-player game is still an open challenge. This is especially true when the context of the game changes in a dynamic and continuous manner. Examples of such sports include soccer, field hockey, and basketball. Our work is geared towards automatic visualization of this particular subset of sports.

[PDF]

[Project Page]

   
   
   
Feature Sharing to Recognize Actions at a Distance

We present a boosting-based algorithm for sharing features among different human actions to efficiently learn their discriminative models. We propose a novel feature sharing mechanism that maintains a lower bound on the number of features shared by each class, and only considers classes that do not meet this criterion. We test our algorithm on the problems of monocular and multi-view action recognition.

[Project Page]

   
   
   
Learning Everyday Activity Structure Using Event Statistics

Models of activity structure for unconstrained environments are generally not available a priori. Recent representational approaches to this end are limited by their computational complexity and by their ability to capture activity structure only up to some fixed temporal scale. In this work, we propose the use of Suffix Trees as an activity representation to efficiently extract the structure of activities by analyzing their constituent event-subsequences over multiple temporal scales.

[PDF]

   
   
   
Unsupervised Activity Analysis Using Event Motifs

For an active environment, how can one transform semantically agnostic low-level perceptual inputs into mid-level abstractions that sufficiently encode the activity structure? How can one represent such activity structure over a continuum of temporal resolutions? Finally, how can one automatically detect event subsequences that are locally atypical in a structural sense? In this work, we investigate these questions in the context of understanding everyday activities.

[PDF]

   
   
   
Activity Discovery & Characterization From Event Streams

A key step towards understanding what is happening in an active setting is to discover the various kinds of frequently occurring similar activities in that domain. Equally important is the question of finding efficient characterizations for these different kinds of activities. In this work we tackle the question of activity class discovery and characterization against the backdrop of analyzing everyday activities.

[PDF]

   
   
   
Activity Representation Using Event n-grams

Anomalies are sets of rare events which, for any reasonably unconstrained situation, are hard to completely define a priori. Because anomalies are rare and have large within-class variation, techniques that try to model them, either statistically or through a set of rules, often prove to be brittle and over-fitted. We formulate the problem of anomalous activity explanation using a novel representation of activities as bags of n-grams of discrete events.

[PDF]
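
The bag-of-n-grams representation can be sketched in a few lines: an activity is a sequence of discrete event labels, and its representation is the multiset of all contiguous runs of n events. The overlap score below is an assumed, simplified similarity measure added for illustration; it is not the measure used in the paper, and the function names are hypothetical.

```python
from collections import Counter

def event_ngrams(events, n):
    """Count every contiguous run of n events in the sequence."""
    return Counter(
        tuple(events[i:i + n]) for i in range(len(events) - n + 1)
    )

def ngram_overlap(events_a, events_b, n):
    """Fraction of n-grams shared by two activities (assumed measure).

    Returns 1.0 for identical bags and 0.0 when nothing is shared.
    """
    bag_a = event_ngrams(events_a, n)
    bag_b = event_ngrams(events_b, n)
    shared = sum((bag_a & bag_b).values())   # multiset intersection
    total = max(sum(bag_a.values()), sum(bag_b.values()))
    return shared / total if total else 0.0
```

Under this sketch, an activity whose n-grams overlap little with those of previously seen activities would be a candidate anomaly, and the unmatched n-grams point to where it differs.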

   
   
   
Graphical Models for Human Activity Recognition

A novel framework for recognizing complex multi-agent activities using probabilistic graphical models is presented. We employ a statistical-feature-based particle filter to robustly track multiple objects in cluttered environments. Spatio-temporal features extracted from tracking are then used to learn graphical models of these activities.

[PDF]

   
   
   
Classifier Adaptation For Person Detection

Due to the large variation in the physical attributes of different environments, a generic classifier trained on extensive data-sets may still perform sub-optimally in a new test environment. In this work we present a general framework for classifier adaptation that allows an already trained generic classifier to perform better in new test environments. The work was done at Microsoft Research.

[PDF]

NOTE: This research was used commercially in Microsoft RoundTable.

   
   
   
Ensemble Boosting For Activity Recognition

The weighted Ensemble Boosting method combines a Bayesian averaging strategy with the Boosting framework, finding useful conjunctive feature combinations and achieving lower error rates than the traditional Boosting algorithm. The method demonstrates a comparable level of stability with respect to the classifier selection pool. We compare its performance with different classifier combination methods, including Approximate Bayesian Combination, Boosting, Feature Stacking and the more traditional Sum and Product rules. The work was done at Mitsubishi Electric Research Labs.

[PDF]
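
The Sum and Product rules mentioned as baselines are the simplest ways to combine per-classifier class probabilities. The sketch below shows both under the assumption that every classifier outputs a probability vector over the same classes; the function names and the small epsilon guarding the logarithm are illustrative choices.

```python
import numpy as np

def sum_rule(probs):
    """Pick the class with the highest average probability.

    probs has shape (n_classifiers, n_classes).
    """
    probs = np.asarray(probs, dtype=float)
    return int(np.argmax(probs.mean(axis=0)))

def product_rule(probs):
    """Pick the class with the highest product of probabilities.

    Computed in log space to avoid underflow with many classifiers.
    """
    probs = np.asarray(probs, dtype=float)
    return int(np.argmax(np.log(probs + 1e-12).sum(axis=0)))
```

The two rules can disagree: with votes `[[0.9, 0.1], [0.9, 0.1], [0.01, 0.99]]`, the sum rule follows the two-classifier majority (class 0), while the product rule lets the single very confident classifier veto it (class 1).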

   
   
   
Context Aware Applications by Activity Demonstration

Context-aware applications take their context of use into account by adapting to changes in a user's activities and environments. No one has more intimate knowledge about these activities and environments than end-users themselves. Currently there is no support for end-users to build context-aware applications for these dynamic settings. To address this issue, we present a programming-by-demonstration prototyping environment for context-aware applications. The work was done for Intel Research Lab Berkeley.

[PDF]

   
   
   
A Variational Approach to Audio-Visual Flow Estimation

The flow field of a moving sound source not only has an optical component, but also an audio component; something we call audio-visual flow. We present a common structure-tensor based variational framework for dense audio-visual flow-field estimation.

[PDF]

   
   
   
Automatic Automobile Occupancy Detection

This project developed decision-tree-based object classifiers for an automatic automobile detection system. It was a collaborative effort between General Motors and Techlogix Inc., and resulted in a US patent and a publication.

[PDF]

   
   
   
Mobile ADVICE: Design of an Accessible Mobile Device

The visually impaired have limited access to the world of mobile devices. Our goal was to design a handheld mobile device to overcome limitations such as reliance on visual display and lack of audio and tactile feedback. We built a prototype handheld device using tactile feedback and auditory display information.

[PDF]

   
   
   
   

Copyright © 2010 Raffay Hamid. All rights reserved.