Ball trajectory tracking in sports broadcast videos using Amazon Web Services machine learning

by Wei Teh and Mecit Gungor

In the past few years, professional sports have applied machine learning (ML) in ways that revolutionize game analysis, accelerate innovation, and improve fan experiences. For instance, NFL Next Gen Stats uses ML and data analytics to deliver real-time accuracy, speed, and insights that enhance the viewing experience. NHL Face-off Probability uses ML-driven data to predict which team will win an upcoming face-off based on the players on the ice, the location of the face-off, and the current game situation.

When it comes to ball sports, ball trajectory analysis is one of the most useful techniques for evaluating player performance and enhancing the analysis of game strategies. However, identifying ball positions is challenging, especially for tiny balls moving at high speed in sports like tennis, badminton, and baseball, where ball images are relatively small and blurry and sometimes leave afterimage tracks or are even invisible. Deep learning applied to computer vision has shown promising results in tracking tiny objects in sports broadcast video. A solution inspired by a deep learning architecture called TrackNet achieved highly accurate predictions of badminton ball positions in a broadcast badminton match.

In this blog post, we describe an end-to-end, deep learning-based solution that enables ball tracking for broadcast sports video without expensive camera equipment. We focus on using managed services from Amazon Web Services (AWS) to develop the solution. The solution includes an ML workflow that begins with labeling the broadcast video, followed by data transformation of the annotated labels, model training on the resulting features, and deployment of the model to an endpoint that allows an application to create ball trajectory video from provided inputs.

We provide a GitHub repository that demonstrates the solution so you can try it in your own AWS account, or adapt it for your own use case.

Architecture

Figure 1: End-to-End Architecture Diagram

The solution architecture depicts the key components involved in the ball trajectory tracking ML workflow, orchestrated in the order shown in the preceding diagram. The steps can be broken down as follows:

  1. Upload sports broadcast video files to an Amazon S3 bucket (a minimal upload sketch follows this list)
  2. Perform labeling tasks on the video frames
  3. Perform data transformation to process the annotated video frames
  4. Use the features from the previous step to train a deep learning model
  5. Deploy the trained model to a real-time HTTPS endpoint
  6. Produce ball trajectory video through the deployed endpoint
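For step 1, a minimal sketch of the upload with boto3; the bucket name and object key are placeholders for your own values:

```python
import boto3

s3 = boto3.client("s3")

# Upload a broadcast match video to the bucket the workflow reads from.
# Bucket and key here are placeholders, not values from the repository.
s3.upload_file(
    Filename="match_001.mp4",
    Bucket="my-ball-tracking-bucket",
    Key="raw-videos/match_001.mp4",
)
```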

Labeling video frames

TrackNet is a supervised ML model that applies deep learning to computer vision. The model requires the position of the ball on a 2D plane as labels in order to learn the ball trajectory accurately. To achieve a high-quality training dataset, labels are applied at the frame level, which can mean a large volume of data to label depending on the duration and frame rate of the video. To simplify the labeling process, we created an Amazon SageMaker Ground Truth video labeling job that allows a private workforce to annotate ball positions within the video. If you have additional labeling requirements, Amazon SageMaker Ground Truth Plus provides an expert workforce that is trained on ML tasks and can help meet your data security, privacy, and compliance requirements. Following is a screenshot of the user interface provided by the SageMaker Ground Truth video labeling job, and after it a sketch of how such a job can be created programmatically.

Figure 2: SageMaker Ground Truth Labeling Portal
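
As a rough illustration, the following sketch creates a video labeling job with boto3. All ARNs, bucket names, and the workteam are placeholders specific to your account and Region; look up the Region-specific task UI and Lambda ARNs in the Ground Truth documentation before running anything like this.

```python
import boto3

sm = boto3.client("sagemaker")

sm.create_labeling_job(
    LabelingJobName="badminton-ball-labeling",
    # Video frame labeling jobs require the label attribute to end in "-ref"
    LabelAttributeName="ball-position-ref",
    InputConfig={
        "DataSource": {
            "S3DataSource": {
                "ManifestS3Uri": "s3://my-ball-tracking-bucket/manifests/input.manifest"
            }
        }
    },
    OutputConfig={"S3OutputPath": "s3://my-ball-tracking-bucket/labels/"},
    RoleArn="arn:aws:iam::111122223333:role/MySageMakerExecutionRole",
    HumanTaskConfig={
        "WorkteamArn": "arn:aws:sagemaker:us-east-1:111122223333:workteam/private-crowd/my-team",
        # AWS-managed, Region-specific ARNs -- see the Ground Truth docs
        "UiConfig": {"HumanTaskUiArn": "<video-object-tracking-task-ui-arn>"},
        "PreHumanTaskLambdaArn": "<pre-human-task-lambda-arn>",
        "AnnotationConsolidationConfig": {
            "AnnotationConsolidationLambdaArn": "<annotation-consolidation-lambda-arn>"
        },
        "TaskTitle": "Annotate badminton ball positions",
        "TaskDescription": "Mark the ball position in each video frame",
        "NumberOfHumanWorkersPerDataObject": 1,
        "TaskTimeLimitInSeconds": 3600,
    },
)
```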

Feature engineering

As described in the TrackNet research paper, the model is designed to train on consecutive video frames to achieve better object detection accuracy by learning the trajectory pattern. The number of input frames is a network parameter that can be tuned based on the speed of the ball in the video. The ground truth label for training is an amplified 2D heatmap centered at the ball position. We used an Amazon SageMaker Processing job to process the SageMaker Ground Truth labeled dataset in Amazon S3, combining consecutive frames into a single input and generating a corresponding 2D Gaussian heatmap as the ground truth label as part of the feature engineering step. The following diagram depicts the feature engineering workflow in more detail, followed by a sketch of the label generation.

Figure 3: Processing Job Flow
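
To make the label generation concrete, here is a minimal sketch of the 2D Gaussian heatmap and the frame stacking, assuming 1280x720 frames and a 3-frame input window; the function and variable names are ours, not the repository's.

```python
import numpy as np

def gaussian_heatmap(width, height, cx, cy, sigma=5.0):
    """2D Gaussian heatmap peaking at the labeled ball center (cx, cy)."""
    xx, yy = np.meshgrid(np.arange(width), np.arange(height))
    return np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2.0 * sigma ** 2))

# Label for a 1280x720 frame with the ball annotated at pixel (640, 200)
label = gaussian_heatmap(1280, 720, cx=640, cy=200)  # shape (720, 1280)

# Consecutive RGB frames are stacked along the channel axis to form one input,
# e.g. three (720, 1280, 3) frames become a single (720, 1280, 9) tensor.
f1 = f2 = f3 = np.zeros((720, 1280, 3), dtype=np.float32)  # placeholder frames
model_input = np.concatenate([f1, f2, f3], axis=-1)
```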

Once the SageMaker Processing job completes, both the feature and label datasets are uploaded to the specified Amazon S3 locations, ready for training. The following sketch shows how such a job might be launched.
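
This sketch assumes the feature engineering code lives in a script named feature_engineering.py and runs in a container image you supply; the image URI, role, and S3 paths are placeholders.

```python
from sagemaker.processing import ProcessingInput, ProcessingOutput, ScriptProcessor

processor = ScriptProcessor(
    image_uri="<processing-container-image-uri>",
    command=["python3"],
    role="<sagemaker-execution-role-arn>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

processor.run(
    code="feature_engineering.py",  # hypothetical script name
    inputs=[
        ProcessingInput(
            source="s3://my-ball-tracking-bucket/labels/",
            destination="/opt/ml/processing/input",
        )
    ],
    outputs=[
        ProcessingOutput(
            source="/opt/ml/processing/output",
            destination="s3://my-ball-tracking-bucket/features/",
        )
    ],
)
```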

Model training

At a high level, TrackNet combines two deep neural networks into one model architecture. The first network (VGG-16) learns compressed features from the consecutive input frames and feeds them to the second network (DeconvNet), which up-samples them to reconstruct the final image. The following diagram illustrates the network architecture in greater detail. Because the model is implemented in TensorFlow 2, we could train a TrackNet model using a SageMaker Training job without building any custom containers. The SageMaker documentation provides guidance on the supported pre-built deep learning Docker images, including TensorFlow, PyTorch, Hugging Face, and others. To train the model, we simply passed a training script as one of the job parameters via the SageMaker Python SDK; SageMaker takes care of executing the script in the training environment. A sketch of the training job follows the diagram.

Figure 4: TrackNet Architecture
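
A minimal sketch of the training job using a SageMaker pre-built TensorFlow container; the script name, framework version, instance type, and hyperparameters are illustrative choices, not the repository's exact values.

```python
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="train.py",                # hypothetical training script
    role="<sagemaker-execution-role-arn>",
    framework_version="2.8",               # any supported TensorFlow 2.x version
    py_version="py39",
    instance_count=1,
    instance_type="ml.p3.2xlarge",         # GPU instance for faster training
    hyperparameters={"epochs": 50, "input_frames": 3},
)

# Train against the features produced by the processing job
estimator.fit({"training": "s3://my-ball-tracking-bucket/features/"})
```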

Deploying the ball tracking model

The trained model artifact is stored in the specified Amazon S3 location after the Amazon SageMaker Training job completes. At this point, there are a few approaches to integrating your application with the trained model. First, you could host an HTTPS endpoint that serves inferences identifying ball positions from the input frames. Alternatively, you could download the model artifact from the Amazon S3 location and build an application that bundles the required deep learning libraries and dependencies to process the input frames in an offline environment. The first pattern is the preferred option because it decouples the application code from the machine learning model lifecycle and allows inference requests to scale independently. SageMaker inference supports a range of use cases, from low latency (a few milliseconds) and high throughput (hundreds of thousands of requests per second) to long-running inference. For our use case, we deploy a real-time SageMaker endpoint with GPU support that allows an application to create ball trajectory video. Deep learning applications generally benefit from GPUs because they enable many simultaneous computations; we observed 5x lower latency with a GPU endpoint than with a CPU endpoint for the same task.
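
Continuing the training sketch above, deploying the trained model to a GPU-backed real-time endpoint is a single call; ml.g4dn.xlarge is one reasonable GPU choice, not the only one.

```python
# Deploy the trained model behind a real-time HTTPS endpoint on a GPU instance
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)
```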

Putting it all together

With a real-time inference endpoint available, we can use the TrackNet model to begin ball trajectory tracking for a given video. A working sample is provided in the GitHub repository to illustrate endpoint integration with the input video. But before we run anything, let’s review the components involved with the model endpoint, and how it interprets the prediction to derive an accurate ball position. As mentioned in the previous section, TrackNet takes three consecutive images from the video frames as input to predict a pixel representation of an image. Similarly, the application combines consecutive video frames into an input to invoke the endpoint. Predicted images that do not meet the probability threshold (0.5) are excluded from further processing. For images where a ball is detected, a circle is drawn around the center of the ball and overlaid onto the original video frame to mark the ball position. This process repeats until all the frames in the video are analyzed, and a new video file is created in the last step of the application. Following is an example video produced by the application for a single badminton match video, along with a sketch of the post-processing loop.

GIF of ball tracking output
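
Here is a minimal sketch of that loop with OpenCV, assuming the endpoint returns a heatmap at the frame's resolution for each stacked 3-frame input; `predictor` is the object returned by estimator.deploy above, and the serialization details of predictor.predict depend on how the endpoint is configured.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("match_001.mp4")
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("match_001_tracked.mp4",
                      cv2.VideoWriter_fourcc(*"mp4v"), 30.0, (width, height))

window = []  # sliding window of the 3 most recent frames
while True:
    ok, frame = cap.read()
    if not ok:
        break
    window.append(frame)
    if len(window) == 3:
        # Invoke the endpoint with the stacked frames; we assume the response
        # is a (height, width) heatmap of ball-position probabilities
        heatmap = np.asarray(predictor.predict(np.concatenate(window, axis=-1)))
        if heatmap.max() >= 0.5:  # probability threshold from the post
            cy, cx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
            cv2.circle(frame, (int(cx), int(cy)), 8, (0, 0, 255), 2)  # mark the ball
        window.pop(0)  # slide the window forward by one frame
    out.write(frame)

cap.release()
out.release()
```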

Conclusion

The sports industry is using machine learning to predict play outcomes, inform game strategy decisions, and gain competitive advantage. In this blog post, we showed an end-to-end machine learning workflow that provides ball trajectory tracking in a ball sport using Amazon SageMaker features. The solution uses a deep learning model to achieve highly accurate ball tracking in broadcast sports video without expensive camera equipment. We started with a video labeling workflow using an Amazon SageMaker Ground Truth labeling job, then used a SageMaker Processing job for feature engineering. To build the deep learning model, we described how to launch a SageMaker Training job to train at scale. Finally, we described an application that integrates with an inference endpoint to produce ball tracking video through the Amazon SageMaker Inference hosting service.