 
CVPR 2010 Paper: Player Localization Using Multiple Static Cameras for Sports Visualization Authors: Raffay Hamid, Ram Krishan Kumar, Matthias Grundmann, Kihwan Kim, Irfan Essa, Jessica Hodgins
NEWS!: We recently (Aug. 2010) collected a new soccer data set at ESPN Wide World of Sports. We used scissor lifts with adjustable heights to mount 3 synchronized 720p HD cameras covering one half of the field. We collected two games with a camera height of 60 feet; one of these games was recorded at night under floodlights. We captured a third game with a camera height of about 25 feet. We tested our framework on 22,000 frames of this data (in addition to the results on 60,000 frames of PIXAR soccer data given in our CVPR paper). Results on the new data can be found in our journal paper (in preparation).
1. Introduction:
Visualizing multi-player sports has grown into a multi-million dollar industry. One of the main technical challenges for sports visualization systems is to infer accurate player positions in the face of occlusion and visual clutter. One solution to this end is to use multiple overlapping cameras, provided the observations from these cameras can be fused reliably. Our work explores this question of efficient and robust fusion of visual data observed from multiple synchronized cameras, and applies this information to generate sports visualizations. These include displaying a virtual offside line in soccer games, highlighting players in passive offside positions, and showing players' motion patterns.
Our key contribution is modeling and analyzing the problem of fusing corresponding players' positional information as finding minimum weight K-length cycles in complete K-partite graphs. The algorithm class we propose to this end uses a dynamic programming based approach that can trade off the efficiency and optimality of the search.
2. Framework Overview: Following are the main steps of our sports visualization framework:
2a. Background Subtraction We begin by adaptively learning per-pixel Gaussian mixture models for the scene background. These models are used for foreground extraction by thresholding the appearance likelihoods of scene pixels. The input and output of this step are shown in the following figure. Note that while this step extracts player pixels quite successfully, it also extracts shadow pixels as a part of the foreground. Such shadow pixels can be problematic for player tracking, and therefore need to be removed.
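The per-pixel background model can be sketched as follows. This is a minimal single-Gaussian simplification (the paper uses Gaussian mixtures), with hypothetical function names and an assumed z-score threshold:

```python
import numpy as np

def foreground_mask(frame, mean, var, tau=3.0):
    """Flag pixels whose intensity deviates from the per-pixel Gaussian
    background model by more than tau standard deviations.
    frame, mean, var: float arrays of identical shape (H, W)."""
    z = np.abs(frame - mean) / np.sqrt(var)
    return z > tau

def update_background(frame, mean, var, alpha=0.05):
    """Adaptive (running-average) update of the per-pixel Gaussian
    parameters, so the model tracks slow illumination changes."""
    mean_new = (1 - alpha) * mean + alpha * frame
    var_new = (1 - alpha) * var + alpha * (frame - mean_new) ** 2
    return mean_new, np.maximum(var_new, 1e-6)
```

A full mixture model would keep several (mean, var, weight) triples per pixel and match each incoming pixel to its nearest component before thresholding.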
2b. Shadow Removal While there are numerous appearance based methods for shadow removal [21], they mostly work best for relatively soft shadows. In soccer games, however, shadows can be quite strong. We therefore rely on the geometric constraints of our multi-camera setup for robust shadow removal. Consider the following figure: because shadows lie on the ground plane, only the shadow pixels of a player are view independent. This enables us to remove shadows by warping the extracted foreground in one view onto another, and filtering out the overlapping pixels. We begin by finding 3×3 planar homographies between each pair of views, such that for any ground-plane point in one view, we know a distinct mapping for it in the second view.
In cases where a player is partially occluded by a shadow, simply relying on these geometric constraints might result in losing image regions belonging to the occluded parts of players. To avoid this, we apply chromatic similarity constraints between original and projected pixels before classifying them as shadow versus non-shadow. The intuition here is that the appearance similarity of shadow pixels across multiple views is higher than that of player pixels.
The input and output of the shadow removal step are shown in the following figure. Notice that some parts of the player are also removed along with the shadows; by and large, however, this method of shadow removal performs quite well.
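The geometric part of the shadow test can be sketched as follows. This is a simplified illustration with hypothetical names, and it omits the chromatic similarity check described above; `H_ab` is the 3×3 ground-plane homography from view A to view B:

```python
import numpy as np

def warp_point(H, p):
    """Map pixel p = (x, y) through a 3x3 planar homography H."""
    x, y, w = H @ np.array([p[0], p[1], 1.0])
    return x / w, y / w

def shadow_candidates(mask_a, mask_b, H_ab):
    """Foreground pixels in view A whose ground-plane projection is also
    foreground in view B are view-independent, i.e. shadow candidates;
    player pixels, being off the ground plane, generally do not overlap."""
    shadow = np.zeros_like(mask_a, dtype=bool)
    h, w = mask_b.shape
    ys, xs = np.nonzero(mask_a)
    for x, y in zip(xs, ys):
        u, v = warp_point(H_ab, (x, y))
        ui, vi = int(round(u)), int(round(v))
        if 0 <= vi < h and 0 <= ui < w and mask_b[vi, ui]:
            shadow[y, x] = True
    return shadow
```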
2c. Player Tracking We track the player blobs using a particle filter based framework. We represent the state of each player using a multi-modal distribution, which is sampled by a set of particles. To propagate the previous particle set to the next, we perform the three-step procedure of Selection, Prediction, and Measurement. Here, Selection is the importance sampling of a set of particles from the previous step based on how well they fit the measurement from the last frame. Prediction is the application of a dynamic model to the selected particles. Finally, Measurement ranks the particles in terms of how well they match the measurement from the current frame. These three steps are repeated for each frame in the video. This entire process is illustrated in the following figure.
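One Selection-Prediction-Measurement iteration can be sketched as below. The function names and the generic `dynamics`/`likelihood` callables are hypothetical placeholders for the paper's dynamic and measurement models:

```python
import random

def particle_filter_step(particles, weights, dynamics, likelihood):
    """One Selection-Prediction-Measurement iteration.
    particles: list of states; weights: likelihoods from the previous
    frame; dynamics(s): propagates a state one frame forward;
    likelihood(s): scores a state against the current measurement."""
    # Selection: importance-resample proportionally to previous weights.
    selected = random.choices(particles, weights=weights, k=len(particles))
    # Prediction: apply the dynamic model to each selected particle.
    predicted = [dynamics(s) for s in selected]
    # Measurement: re-weight by agreement with the current observation.
    new_weights = [likelihood(s) for s in predicted]
    total = sum(new_weights) or 1.0
    return predicted, [w / total for w in new_weights]
```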
2d. View Dependent Blob Classification
We classify the tracked blobs on a per-frame and per-view basis. We pre-compute the hue and saturation histograms of a few (~5) player templates of both teams as observed from each view. During testing, we compute the hue and saturation histograms for the detected blobs, and find their Bhattacharyya distances from the player templates of the corresponding view. We classify each blob into the offense or defense team based on the label of its nearest neighbor templates. The pipeline of blob classification for one of the views is illustrated in the following figure.
The output of tracking and player classification on an example frame is shown in the following figure.
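The nearest-template rule can be sketched as follows, using the Bhattacharyya distance between normalized histograms; the function names and the 1-nearest-neighbor simplification are ours:

```python
import math

def bhattacharyya(p, q):
    """Bhattacharyya distance between two normalized histograms."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return -math.log(max(bc, 1e-12))

def classify_blob(blob_hist, templates):
    """Label a blob by its nearest template histogram.
    templates: list of (label, histogram) pairs, e.g. a few per team."""
    return min(templates, key=lambda t: bhattacharyya(blob_hist, t[1]))[0]
```

In practice one histogram per (team, view) pairing is kept, so illumination differences between cameras do not corrupt the distances.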
2e. Data Fusion for Player Classification
To transform players' locations observed from multiple cameras into a shared space, we project the base-point of all blobs observed from each camera into the real-world coordinates of the field. We pose fusing the location evidence of players observed from multiple cameras as iteratively finding minimum weight K-length cycles in a complete K-partite graph. Nodes in each partite set of this graph represent blobs of players detected in different cameras. The edge weights in this graph are a function of the pairwise similarity of the projected player locations.
Specifically, we can state our problem as: given a complete K-partite graph G with K tiers, find the minimum weight cycle c in G, such that c passes through each tier once and only once. A complete K-partite graph and a node cycle are shown in the following left-most and right-most figures respectively. We iteratively find and remove K-length minimum weight cycles from G until there remain no more cycles in the graph.
Note that as our problem is cyclic in nature, the path we find must start and end at the same node. While using traditional dynamic programming, there is no guarantee that the shortest path returned by the algorithm would necessarily end at the same node as the source node. We therefore need to modify our graph representation such that we can satisfy the cyclic constraint of our problem while still using a dynamic programming based search.
Assume the size of the set of all nodes V in G is n. For each node v in V, we can construct a subgraph G_v with K + 1 tiers, such that the only node in the 1st and the (K + 1)st tier of G_v is v. Besides the 1st and the (K + 1)st tiers of G_v, its topology is the same as that of G. This is illustrated in the 2nd figure above. Note that the shortest cycle in G involving node v is equivalent to the shortest path in G_v that has v as both its source and destination. Our problem can now be restated as: given G, construct G_v for all v in V; find the shortest K-length paths P = {p_v in G_v for all v in V} that span each tier of G_v once and only once; then find the shortest cycle in G by searching for the shortest path in P. There is an inherent trade-off between the efficiency and optimality of this search problem, which is analyzed in detail in the paper.
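The G_v construction above can be sketched with a small dynamic program. This is a simplified illustration that assumes a fixed tier (camera) order, which is one point on the efficiency/optimality trade-off the paper analyzes; names are ours:

```python
def min_weight_cycle(tiers, w):
    """Minimum-weight cycle visiting one node per tier, with the tier
    order fixed. tiers: list of K lists of node ids; w(u, v): edge
    weight. Returns (cost, cycle as a list of K nodes)."""
    best_cost, best_cycle = float("inf"), None
    for v in tiers[0]:
        # G_v: v is duplicated as source (tier 1) and sink (tier K+1).
        # dp maps each node in the current tier to the cheapest path
        # from v, together with that path.
        dp = {v: (0.0, [v])}
        for tier in tiers[1:]:
            dp = {
                u: min(((c + w(p, u), pth + [u]) for p, (c, pth) in dp.items()),
                       key=lambda t: t[0])
                for u in tier
            }
        # Close the cycle by returning to the sink copy of v.
        cost, cycle = min(((c + w(p, v), pth) for p, (c, pth) in dp.items()),
                          key=lambda t: t[0])
        if cost < best_cost:
            best_cost, best_cycle = cost, cycle
    return best_cost, best_cycle
```

Iterative fusion would call this, remove the matched nodes from every tier, and repeat until a tier is empty.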
3. Multi-Player Sports Visualization We use our framework to generate various automatic sports visualizations, three of which are listed below.
3a. Offside Line Visualization An important foul in soccer is the offside call, where an offense player receives the ball while being behind the second-last defense player (SLD). We want to detect the SLD player, and to draw an offside line underneath him/her. To test the robustness of our proposed system, we ran it on approximately 60,000 frames of soccer footage captured across 5 sets with different illumination conditions, play types, and teams' attire.
We compared the performance of our proposed system with that of finding the SLD player in each camera individually, and with naively fusing this information by taking their average. Our proposed fusion mechanism outperforms the rest with an average accuracy of 92.6%. The naive fusion produces an average accuracy of 75.7%. The average accuracy across all 3 individual cameras over all 5 sets is 82.7%. To the best of our knowledge, this is the most thorough test of automatic offside-line visualization for soccer games available.
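Given the fused field coordinates and team labels, picking the SLD reduces to a sort along the field's length axis. A minimal sketch, assuming the defending team's goal line sits at a known x coordinate (names are ours):

```python
def second_last_defender(players, defending_goal_x=0.0):
    """players: list of (label, (x, y)) in field coordinates.
    Returns the defense player second-closest to their own goal line
    (the SLD); the x position anchors the virtual offside line."""
    defenders = [p for p in players if p[0] == "defense"]
    defenders.sort(key=lambda p: abs(p[1][0] - defending_goal_x))
    return defenders[1] if len(defenders) > 1 else None
```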
3b. Passive Offside Visualization
Offense players can be in an offside state either actively (getting directly involved in the play while being behind the SLD), or passively (being present behind the SLD without getting directly involved in the play). Fig. 10 shows an example of an offense player in a passive offside state automatically highlighted using our proposed framework. Visualizations such as these can be used to assist viewers in following the game.
3c. Motion Pattern Visualization
Visual broadcast of soccer games only shows an instantaneous representation of the sport; no visual record of what happened over some preceding time is usually maintained. There are two important challenges in producing a lapsed representation of a game. Firstly, automatic detection of players' actions is hard. Secondly, summarizing these actions in an informative manner is non-obvious. To this end, we consider players' movement as a basic representation of the state of a game, and use our framework to visualize players' motion patterns over the preceding play.
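One simple lapsed representation is an occupancy grid accumulated from the fused player positions; this sketch and its names are ours, not the paper's exact visualization:

```python
def motion_heatmap(tracks, field_w, field_h, bins=10):
    """Accumulate fused (x, y) field positions over a time window into a
    coarse bins x bins grid; larger counts mark frequently visited areas."""
    grid = [[0] * bins for _ in range(bins)]
    for x, y in tracks:
        i = min(int(y / field_h * bins), bins - 1)  # row from y
        j = min(int(x / field_w * bins), bins - 1)  # column from x
        grid[i][j] += 1
    return grid
```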
4. Conclusions and Future Work
We have presented a novel modeling and search method for fusing evidence from multiple information sources by iteratively finding minimum weight K-length cycles in complete K-partite graphs. As an application of the proposed algorithm class, we have presented a framework for soccer player localization using multiple synchronized static cameras. We have used this fused information to generate various sports visualizations.
In the future, we want to apply our algorithm class to a wider set of correspondence finding problems, including matching for depth estimation, trajectory matching using multiple cameras, and motion capture reconstruction. Furthermore, we want to use our visualization framework for a variety of sports, including rugby, hockey, and baseball.


Copyright © 2010 Raffay Hamid. All rights reserved. 