Ball trajectory tracking in sports broadcast videos using Amazon Web Services machine learning
In the past few years, professional sports have applied machine learning (ML) in ways that revolutionize game analysis, accelerate innovation, and improve fan experiences.
When it comes to ball sports, ball trajectory analysis is one of the most useful techniques for evaluating player performance and enhancing analysis of game strategies. However, it is challenging to identify ball positions, especially for tiny balls moving at high speed in sports like tennis, badminton, and baseball, where ball images are relatively small, blurry, sometimes trailed by afterimages, or even invisible. Deep learning applied to computer vision has shown promising results in tracking tiny objects in sports broadcast video. Our solution is inspired by a deep learning architecture called TrackNet, which was designed to track small, fast-moving balls in broadcast video.
In this blog post, we describe an end-to-end, deep-learning-based solution that enables ball tracking for broadcast sports video without using expensive camera equipment. We focus on using managed services from Amazon Web Services to develop the solution. The machine learning workflow begins with labeling broadcast video frames, transforms the annotated labels into training data, trains a deep learning model on that data, and deploys the model to an endpoint that an application can call to create ball trajectory video from provided inputs.
We provide a working code sample that walks through each step of the workflow.
Architecture
The solution architecture depicts the key components involved in a ball trajectory tracking ML workflow, orchestrated in the order shown in the preceding diagram. The steps can be broken down as follows:
- User uploads sports broadcast video files to an Amazon S3 bucket (a minimal upload sketch follows this list)
- Perform labeling tasks for the video frames
- Perform data transformation to process annotated video frames
- Use the features from the previous step to train a deep learning model
- Deploy the trained model to a real-time HTTPS endpoint
- Produce ball trajectory video through the deployed endpoint
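As a concrete starting point for the first step, here is a minimal sketch of uploading a broadcast video to Amazon S3 with boto3; the bucket name, object key, and file name are placeholders you would replace with your own values.

```python
import boto3

# Hypothetical bucket and key; replace with your own values.
BUCKET = "my-sports-video-bucket"
KEY = "raw-videos/match-01.mp4"

s3 = boto3.client("s3")
# Upload the local broadcast video into the bucket that feeds the
# labeling and processing steps of the workflow.
s3.upload_file("match-01.mp4", BUCKET, KEY)
```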
Labeling video frames
TrackNet is a supervised machine learning model that uses deep learning in computer vision. The model requires the position of the ball in a 2D plane as labels in order to learn the ball trajectory accurately. To achieve a high-quality training dataset, labels are applied at the frame level. Depending on the duration and frame rate of the video, there could be a large volume of data to label. To simplify the labeling process, we created an Amazon SageMaker Ground Truth labeling job that presents each frame to human annotators and records the labeled ball position.
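Because labels are applied per frame, the video first needs to be split into individual frame images. The following is a minimal sketch using OpenCV, assuming frames are extracted locally and then uploaded to Amazon S3 as input for the labeling job; paths and file names are illustrative.

```python
import os
import cv2  # OpenCV, assumed available in the environment

def extract_frames(video_path: str, out_dir: str) -> int:
    """Split a broadcast video into per-frame JPEG images for labeling."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    count = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream
            break
        cv2.imwrite(os.path.join(out_dir, f"frame_{count:06d}.jpg"), frame)
        count += 1
    cap.release()
    return count

# Hypothetical paths; the extracted frames would be uploaded to S3
# as the data source for the Ground Truth labeling job.
n = extract_frames("match-01.mp4", "./frames")
print(f"extracted {n} frames")
```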
Feature engineering
As described in the TrackNet paper, the model consumes several consecutive frames as input and predicts a heatmap, a 2D Gaussian distribution centered on the ball position, rather than raw coordinates. Feature engineering therefore transforms each annotated (x, y) label into a heatmap image and resizes the frames to the model's input dimensions. We run this transformation as an Amazon SageMaker Processing job so it scales with the volume of labeled frames.
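To make the label transformation concrete, here is a minimal sketch of generating a Gaussian heatmap label from an annotated ball position; the frame dimensions and sigma value are illustrative and should match your model's configuration.

```python
import numpy as np

def ball_heatmap(h: int, w: int, cx: int, cy: int, sigma: float = 5.0) -> np.ndarray:
    """Generate a 2D Gaussian heatmap centered on the labeled ball position.

    The heatmap serves as the training label: values approach 1.0 near
    the ball and fall off smoothly with distance from it.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2)).astype(np.float32)

# Example: label for a 360x640 frame with the ball annotated at (250, 140).
label = ball_heatmap(360, 640, cx=250, cy=140)
```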
Once the SageMaker Processing job is complete, both feature and label datasets are uploaded to the specified Amazon S3 locations, ready for training.
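For reference, here is a hedged sketch of how such a Processing job could be launched with the SageMaker Python SDK; the container image, role, script name, and S3 paths are all placeholders.

```python
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

# Hypothetical image, role, and S3 locations.
processor = ScriptProcessor(
    image_uri="<your-processing-image-uri>",
    command=["python3"],
    role="<your-sagemaker-execution-role>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

processor.run(
    code="transform_labels.py",  # assumed script: labeled frames -> features/heatmaps
    inputs=[ProcessingInput(
        source="s3://<bucket>/labeled-frames/",
        destination="/opt/ml/processing/input",
    )],
    outputs=[ProcessingOutput(
        source="/opt/ml/processing/output",
        destination="s3://<bucket>/training-data/",
    )],
)
```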
Model training
At a high level, TrackNet combines two deep neural networks to form a model architecture. The goal of the design is for the first network (VGG-16) to learn compressed features from the consecutive input frames, then feed those to the second network (DeconvNet) for up-sampling to reconstruct a final image. The following diagram illustrates the network architecture in greater detail. Because the model is implemented in TensorFlow 2, we could train a TrackNet model using a SageMaker Training job without building any custom containers.
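As a hedged sketch of launching such a training job with the SageMaker Python SDK's built-in TensorFlow estimator: the entry point script, hyperparameters, and framework versions are assumptions that should match your implementation.

```python
from sagemaker.tensorflow import TensorFlow

# Hypothetical entry point and hyperparameters; the framework and
# Python versions should match the TensorFlow 2 implementation used.
estimator = TensorFlow(
    entry_point="train_tracknet.py",       # assumed training script
    role="<your-sagemaker-execution-role>",
    instance_count=1,
    instance_type="ml.p3.2xlarge",         # GPU instance for deep learning
    framework_version="2.4.1",
    py_version="py37",
    hyperparameters={"epochs": 50, "batch_size": 2},
)

# Train on the feature and label datasets produced by the Processing job.
estimator.fit({"training": "s3://<bucket>/training-data/"})
```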
Deploying the ball tracking model
The trained model artifact is stored in the specified Amazon S3 location after the SageMaker Training job completes. At this point, there are a few ways to integrate your application with the trained model. First, you could host an HTTPS endpoint that returns inferences identifying ball positions from the input frames. Alternatively, you could download the model artifact from Amazon S3 and build an application that bundles the deep learning libraries and their dependencies to process input frames in an offline environment. The first pattern is preferred because it decouples the application code from the machine learning model lifecycle and allows inference requests to scale independently. SageMaker inference supports a range of scenarios, from low-latency (a few milliseconds), high-throughput (hundreds of thousands of requests per second) serving to long-running inference. For our use case, we deploy a real-time SageMaker endpoint with GPU support that allows an application to create ball trajectory video. Deep learning applications generally benefit from GPUs, which enable many simultaneous computations; we observed 5x lower latency with a GPU endpoint than with a CPU endpoint for the same task.
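Continuing the training sketch above, deploying the trained estimator to a real-time endpoint is a single call; the GPU instance type here is illustrative, not a recommendation, and should be chosen to match your latency and cost requirements.

```python
# Deploy the trained model to a real-time HTTPS endpoint backed by a
# GPU instance (instance type is an assumption, not a recommendation).
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)
```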
Putting it all together
With a real-time inference endpoint available, we can use the TrackNet model to track ball trajectories in a given video. A working sample is provided in the accompanying sample code.
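The following is a minimal sketch of the inference loop, assuming the `predictor` from the deployment step: it stacks consecutive frames, requests a heatmap from the endpoint, and draws the predicted ball position onto the output video. The payload format, heatmap shape, and number of stacked frames are assumptions that must match how the model was trained.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("match-01.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("trajectory.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

window = []  # the most recent consecutive frames
while True:
    ok, frame = cap.read()
    if not ok:
        break
    window.append(frame)
    if len(window) > 3:            # assume the model consumes 3 frames
        window.pop(0)
    if len(window) == 3:
        batch = np.stack(window)[np.newaxis, ...]   # shape (1, 3, H, W, 3)
        # TensorFlow Serving endpoints return {"predictions": [...]}.
        result = predictor.predict(batch.tolist())
        heatmap = np.array(result["predictions"])[0]  # assumed shape (H, W)
        cy, cx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
        # Mark the detected ball position on the current frame.
        cv2.circle(frame, (int(cx), int(cy)), 6, (0, 255, 255), -1)
    out.write(frame)

cap.release()
out.release()
```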
Conclusion
The sports industry is using machine learning to predict play outcomes, inform game strategy decisions, and gain a competitive advantage. In this blog post, we showed an end-to-end machine learning workflow that provides ball trajectory tracking for a ball sport using Amazon SageMaker features. The solution uses a deep learning model to achieve accurate ball tracking in broadcast sports video without expensive camera equipment. We started with a video labeling workflow using an Amazon SageMaker Ground Truth labeling job, then used a SageMaker Processing job for feature engineering. To build the deep learning model, we described how to launch a SageMaker Training job to train at scale. Finally, we described an application that integrates with an inference endpoint hosted on Amazon SageMaker to produce ball tracking video.