Implement a multi-object tracking solution on a custom dataset with Amazon SageMaker

by Gordon Wang, Yanwei Cui, Melanie Li, and Guang Yang | on

The demand for multi-object tracking (MOT) in video analysis has increased significantly in many industries, such as live sports, manufacturing, and traffic monitoring. For example, in live sports, MOT can track soccer players in real time to analyze physical performance such as real-time speed and moving distance.

Since its introduction in 2021, ByteTrack remains to be one of best performing methods on various benchmark datasets, among the latest model developments in MOT application. In ByteTrack, the author proposed a simple, effective, and generic data association method (referred to as BYTE) for detection box and tracklet matching. Rather than only keep the high score detection boxes, it also keeps the low score detection boxes, which can help recover unmatched tracklets with these low score detection boxes when occlusion, motion blur, or size changing occurs. The BYTE association strategy can also be used in other Re-ID based trackers, such as FairMOT . The experiments showed improvements compared to the vanilla tracker algorithms. For example, FairMOT achieved an improvement of 1.3% on MOTA ( FairMOT: On the Fairness of Detection and Re-Identification in Multiple Object Tracking ), which is one of the main metrics in the MOT task when applying BYTE in data association.

In the post Train and deploy a FairMOT model with Amazon SageMaker , we demonstrated how to train and deploy a FairMOT model with Amazon SageMaker on the MOT challenge datasets . When applying a MOT solution in real-world cases, you need to train or fine-tune a MOT model on a custom dataset. With Amazon SageMaker Ground Truth , you can effectively create labels on your own video dataset.

Following on the previous post, we have added the following contributions and modifications:

  • Generate labels for a custom video dataset using Ground Truth
  • Preprocess the Ground Truth generated label to be compatible with ByteTrack and other MOT solutions
  • Train the ByteTrack algorithm with a SageMaker training job (with the option to extend a pre-built container )
  • Deploy the trained model with various deployment options, including asynchronous inference

We also provide the code sample on GitHub , which uses SageMaker for labeling, building, training, and inference.

SageMaker is a fully managed service that provides every developer and data scientist with the ability to prepare, build, train, and deploy machine learning (ML) models quickly. SageMaker provides several built-in algorithms and container images that you can use to accelerate training and deployment of ML models. Additionally, custom algorithms such as ByteTrack can also be supported via custom-built Docker container images. For more information about deciding on the right level of engagement with containers, refer to Using Docker containers with SageMaker .

SageMaker provides plenty of options for model deployment, such as real-time inference, serverless inference, and asynchronous inference. In this post, we show how to deploy a tracking model with different deployment options, so that you can choose the suitable deployment method in your own use case.

Overview of solution

Our solution consists of the following high-level steps:

  1. Label the dataset for tracking, with a bounding box on each object (for example, pedestrian, car, and so on). Set up the resources for ML code development and execution.
  2. Train a ByteTrack model and tune hyperparameters on a custom dataset.
  3. Deploy the trained ByteTrack model with different deployment options depending on your use case: real-time processing, asynchronous, or batch prediction.

The following diagram illustrates the architecture in each step.


Before getting started, complete the following prerequisites:

  1. Create an Amazon Web Services account or use an existing Amazon Web Services account.
  2. We recommend running the source code in the us-east-1 Region.
  3. Make sure that you have a minimum of one GPU instance (for example, ml.p3.2xlarge for single GPU training, or ml.p3.16xlarge) for the distributed training job. Other types of GPU instances are also supported, with various performance differences.
  4. Make sure that you have a minimum of one GPU instance (for example, ml.p3.2xlarge) for inference endpoint.
  5. Make sure that you have a minimum of one GPU instance (for example, ml.p3.2xlarge) for running batch prediction with processing jobs.

If this is your first time running SageMaker services on the aforementioned instance types, you may have to request a quota increase for the required instances.

Set up your resources

After you complete all the prerequisites, you’re ready to deploy the solution.

  1. Create a SageMaker notebook instance . For this task, we recommend using the ml.t3.medium instance type. While running the code, we use docker build to extend the SageMaker training image with the ByteTrack code (the docker build command will be run locally within the notebook instance environment). Therefore, we recommend increasing the volume size to 100 GB (default volume size to 5 GB) from the advanced configuration options. For your Amazon Web Services Identity and Access Management (IAM) role, choose an existing role or create a new role, and attach the AmazonS3FullAccess, AmazonSNSFullAccess, AmazonSageMakerFullAccess, and AmazonElasticContainerRegistryPublicFullAccess policies to the role.
  2. Clone the GitHub repo to the /home/ec2-user/SageMaker folder on the notebook instance you created.
  3. Create a new Amazon Simple Storage Service (Amazon S3) bucket or use an existing bucket.

Label the dataset

In the data-preparation.ipynb notebook, we download an MOT16 test video file and split the video file into small video files with 200 frames. Then we upload those video files to the S3 bucket as the data source for labeling.

To label the dataset for the MOT task, refer to Getting started . When the labeling job is complete, we can access the following annotation directory at the job output location in the S3 bucket.

The manifests directory should contain an output folder if we finished labeling all the files. We can see the file output.manifest in the output folder. This manifest file contains information about the video and video tracking labels that you can use later to train and test a model.

Train a ByteTrack model and tune hyperparameters on the custom dataset

To train your ByteTrack model, we use the bytetrack-training.ipynb notebook. The notebook consists of the following steps:

  1. Initialize the SageMaker setting.
  2. Perform data preprocessing.
  3. Build and push the container image.
  4. Define a training job.
  5. Launch the training job.
  6. Tune hyperparameters.

Especially in data preprocessing, we need to convert the labeled dataset with the Ground Truth output format to the MOT17 format dataset, and convert the MOT17 format dataset to a MSCOCO format dataset (as shown in the following figure) so that we can train a YOLOX model on the custom dataset. Because we keep both the MOT format dataset and MSCOCO format dataset, you can train other MOT algorithms without separating detection and tracking on the MOT format dataset. You can easily change the detector to other algorithms such as YOLO7 to use your existing object detection algorithm.

Deploy the trained ByteTrack model

After we train the YOLOX model, we deploy the trained model for inference. SageMaker provides several options for model deployment, such as real-time inference, asynchronous inference, serverless inference, and batch inference. In our post, we use the sample code for real-time inference, asynchronous inference, and batch inference. You can choose the suitable code from these options based on your own business requirements.

Because SageMaker batch transform requires the data to be partitioned and stored on Amazon S3 as input and the invocations are sent to the inference endpoints concurrently, it doesn’t meet the requirements in object tracking tasks where the targets need to be sent in a sequential manner. Therefore, we don’t use the SageMaker batch transform jobs to run the batch inference. In this example, we use SageMaker processing jobs to do batch inference.

The following table summarizes the configuration for our inference jobs.

Inference Type Payload Processing Time Auto Scaling
Real-time Up to 6 MB Up to 1 minute Minimum instance count is 1 or higher
Asynchronous Up to 1 GB Up to 15 minutes Minimum instance count can be zero
Batch (with processing job) No limit No limit Not supported

Deploy a real-time inference endpoint

To deploy a real-time inference endpoint, we can run the bytetrack-inference-yolox.ipynb notebook. We separate ByteTrack inference into object detection and tracking. In the inference endpoint, we only run the YOLOX model for object detection. In the notebook, we create a tracking object, receive the result of object detection from the inference endpoint, and update trackers.

We use SageMaker PyTorchModel SDK to create and deploy a ByteTrack model as follows:

from sagemaker.pytorch.model import PyTorchModel
pytorch_model = PyTorchModel(
endpoint_name =<endpint name>

After we deploy the model to an endpoint successfully, we can invoke the inference endpoint with the following code snippet:

with open(f"datasets/frame_{frame_id}.png", "rb") as f:
    payload =

response = sm_runtime.invoke_endpoint(
    EndpointName=endpoint_name, ContentType="application/x-image", Body=payload
outputs = json.loads(response["Body"].read().decode())

We run the tracking task on the client side after accepting the detection result from the endpoint (see the following code). By drawing the tracking results in each frame and saving as a tracking video, you can confirm the tracking result on the tracking video.

aspect_ratio_thresh = 1.6
min_box_area = 10
tracker = BYTETracker(

online_targets = tracker.update(torch.as_tensor(outputs[0]), [height, width], (800, 1440))
online_tlwhs = []
online_ids = []
online_scores = []
for t in online_targets:
    tlwh = t.tlwh
    tid = t.track_id
    vertical = tlwh[2] / tlwh[3] > aspect_ratio_thresh
    if tlwh[2] * tlwh[3] > min_box_area and not vertical:
online_im = plot_tracking(
    frame, online_tlwhs, online_ids, frame_id=frame_id + 1, fps=1. / timer.average_time

Deploy an asynchronous inference endpoint

SageMaker asynchronous inference is the ideal option for requests with large payload sizes (up to 1 GB), long processing times (up to 1 hour), and near-real-time latency requirements. For MOT tasks, it’s common that a video file is beyond 6 MB, which is the payload limit of a real-time endpoint. Therefore, we deploy an asynchronous inference endpoint. Refer to Asynchronous inference for more details of how to deploy an asynchronous endpoint. We can reuse the model created for the real-time endpoint; for this post, we put a tracking process into the inference script so that we can get the final tracking result directly for the input video.

To use scripts related to ByteTrack on the endpoint, we need to put the tracking script and model into the same folder and compress the folder as the model.tar.gz file, and then upload it to the S3 bucket for model creation. The following diagram shows the structure of model.tar.gz.

We need to explicitly set the request size, response size, and response timeout as the environment variables, as shown in the following code. The name of the environment variable varies depending on the framework. For more details, refer to Create an Asynchronous Inference Endpoint .

pytorch_model = PyTorchModel(
        'TS_MAX_REQUEST_SIZE': '1000000000', #default max request size is 6 Mb for torchserve, need to update it to support the 1GB input payload
        'TS_MAX_RESPONSE_SIZE': '1000000000',
        'TS_DEFAULT_RESPONSE_TIMEOUT': '900' # max timeout is 15mins (900 seconds)


When invoking the asynchronous endpoint, instead of sending the payload in the request, we send the Amazon S3 URL of the input video. When the model inference finishes processing the video, the results will be saved on the S3 output path. We can configure Amazon Simple Notification Service (Amazon SNS) topics so that when the results are ready, we can receive an SNS message as a notification.

Run batch inference with SageMaker processing

For video files bigger than 1 GB, we use a SageMaker processing job to do batch inference. We define a custom Docker container to run a SageMaker processing job (see the following code). We draw the tracking result on the input video. You can find the result video in the S3 bucket defined by s3_output.

from sagemaker.processing import ProcessingInput, ProcessingOutput
        ProcessingInput(source=s3_input, destination="/opt/ml/processing/input"),
        ProcessingInput(source=s3_model_uri, destination="/opt/ml/processing/model"),
        ProcessingOutput(source='/opt/ml/processing/output', destination=s3_output),

Clean up

To avoid unnecessary costs, delete the resources you created as part of this solution, including the inference endpoint.


This post demonstrated how to implement a multi-object tracking solution on a custom dataset using one of the state-of-the-art algorithms on SageMaker. We also demonstrated three deployment options on SageMaker so that you can choose the optimal option for your own business scenario. If the use case requires low latency and needs a model to be deployed on an edge device, you can deploy the MOT solution at the edge with Amazon Web Services Panorama .

For more information, refer to Multi Object Tracking using YOLOX + BYTE-TRACK and data analysis .

About the Authors

Gordon Wang, is a Senior AI/ML Specialist TAM at Amazon Web Services. He supports strategic customers with AI/ML best practices cross many industries. He is passionate about computer vision, NLP, Generative AI and MLOps. In his spare time, he loves running and hiking.

Yanwei Cui, PhD, is a Senior Machine Learning Specialist Solutions Architect at Amazon Web Services. He started machine learning research at IRISA (Research Institute of Computer Science and Random Systems), and has several years of experience building artificial intelligence powered industrial applications in computer vision, natural language processing and online user behavior prediction. At Amazon Web Services, he shares the domain expertise and helps customers to unlock business potentials, and to drive actionable outcomes with machine learning at scale. Outside of work, he enjoys reading and traveling.

Melanie Li, PhD, is a Senior AI/ML Specialist TAM at Amazon Web Services based in Sydney, Australia. She helps enterprise customers to build solutions leveraging the state-of-the-art AI/ML tools on Amazon Web Services and provides guidance on architecting and implementing machine learning solutions with best practices. In her spare time, she loves to explore nature outdoors and spend time with family and friends.

Guang Yang, is a Senior applied scientist at the Amazon ML Solutions Lab where he works with customers across various verticals and applies creative problem solving to generate value for customers with state-of-the-art ML/AI solutions.