How Exberry built a cloud-native matching engine on Amazon Web Services that can process 1 million trades per sec, with 20 microseconds latency

by Ronen Nachmias, Yaniv Barak, Mike Perna, and Alex Mirarchi

Who is Exberry 

Exberry offers a cloud-native matching engine platform that enables traditional and alternative exchanges to launch ultra-low latency markets quickly and run them cost-effectively.

We launched Exberry to meet the rising demand from exchange operators for a flexible, high-performance matching engine platform that could support multiple asset classes and trading strategies and was also scalable, flexible, and accessible to a broad range of market participants.

Exberry’s Products

Exberry’s exchange technology platform includes a highly scalable and flexible Core Trading Engine that features a central order book, circuit breaker functionality, and market data support. It supports ultra-low latency execution and can power regulated and unregulated markets across Amazon Web Services’ 99 Availability Zones within 31 geographic regions worldwide.
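To make the idea of a central order book concrete, here is a minimal sketch of a limit order book for a single instrument. It assumes price-time priority and integer tick prices, neither of which is stated above, and the `OrderBook` class and its `submit` method are hypothetical names used only for illustration. It is not Exberry's implementation.

```python
import heapq
from itertools import count


class OrderBook:
    """Minimal limit order book for one instrument (illustrative only).

    Assumptions: price-time priority, and prices expressed as positive
    integer ticks so that abs() recovers a bid price from its heap key.
    """

    def __init__(self):
        self._bids = []          # heap of [-price, arrival, quantity]
        self._asks = []          # heap of [price, arrival, quantity]
        self._arrival = count()  # tie-breaker: earlier orders match first

    def submit(self, side, price, quantity):
        """Match an incoming limit order, then rest any remainder.

        Returns a list of (price, quantity) fills at the resting prices.
        """
        fills = []
        if side == "buy":
            own, opposite = self._bids, self._asks
        else:
            own, opposite = self._asks, self._bids

        while quantity > 0 and opposite:
            best = opposite[0]
            best_price = abs(best[0])
            if side == "buy":
                crosses = best_price <= price
            else:
                crosses = best_price >= price
            if not crosses:
                break
            traded = min(quantity, best[2])
            fills.append((best_price, traded))
            quantity -= traded
            best[2] -= traded  # quantity is not part of the ordering key
            if best[2] == 0:
                heapq.heappop(opposite)

        if quantity > 0:
            key = -price if side == "buy" else price
            heapq.heappush(own, [key, next(self._arrival), quantity])
        return fills
```

A buy order that crosses several resting sell orders is filled from the lowest ask upward, and any unfilled quantity rests on the bid side of the book.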

The Broker System is a comprehensive solution that includes order management, smart order routing, trading UI, asset management, and tokenization capabilities. It enables brokers to manage their orders and assets efficiently and tokenize traditional assets, such as stocks, bonds, and commodities, for digital asset exchanges.

Exberry Nebula is a turn-key Managed Service that integrates the Exberry Exchange System, Broker System, and settlement solution, offering an all-in-one white-label solution for market hosting.

Why Build on the Cloud

As a company that was born in the cloud, all of our products and services run natively in Amazon Web Services. To us, building our trading platform in the cloud was straightforward because of the unrivaled speed to market, flexibility, and security that the cloud offers compared to on-premises. By leveraging cloud infrastructure and multi-tenant architecture, we were able to build a highly-scalable, resilient, and customizable ultra-low latency matching engine solution that we can deliver to customers at much more economical prices.

We chose to build our platform on Amazon Web Services because of Amazon Web Services’ reach within financial services and extensive range of cloud services. Amazon Web Services provides a highly reliable, scalable, and integrated platform with a broad range of services that we utilize to improve the performance, scalability, and availability of our system.

Design Principles

Given that Exberry is the foundational system that powers high-performance, ultra-low latency markets, we needed to build a platform with a focus on high-performance computing, optimized to process large volumes of data in real time. We wanted our platform to be modular and to use a distributed, brokerless architecture based on reliable UDP unicast and multicast for cloud deployments. It also needed fault tolerance mechanisms at the data layer, with highly durable storage and automatic replication, to ensure no data loss and fast recovery.

Specifically, we wanted to develop our technology with two principles in mind: mechanical sympathy, which enables efficient interaction between software and hardware, and a Reactive Microservices architecture, which allows for scalability by designing applications as a collection of independent services that communicate using lightweight protocols.

Reliable UDP unicast and multicast are essential for high-speed trading systems, providing low-latency, real-time data delivery, scalability, resilience, and reliability. Multicast support enables efficient data distribution to multiple recipients simultaneously, making it useful for a brokerless architecture. Furthermore, cloud deployment is well-suited for UDP unicast and multicast due to their low overhead and scalability, providing the necessary infrastructure for high-speed and reliable trading.
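UDP itself does not guarantee delivery, so "reliable UDP" protocols usually add sequence numbers and let receivers ask for anything they missed. The sketch below shows only that receiver-side gap detection, assuming a simple per-stream sequence number that starts at zero. `GapTracker` and its methods are hypothetical names, and the wire protocol Exberry actually uses is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class GapTracker:
    """Tracks sequence numbers seen on one stream and reports gaps.

    Illustrative helper only; the names and the numbering scheme are
    assumptions, not part of any specific protocol.
    """

    next_expected: int = 0
    missing: set = field(default_factory=set)

    def on_datagram(self, seq: int) -> list:
        """Record an arriving sequence number.

        Returns the sequence numbers newly found to be missing, which a
        receiver would request again with a negative acknowledgement (NAK).
        """
        if seq < self.next_expected:
            # Late or retransmitted datagram: it fills a gap or is a duplicate.
            self.missing.discard(seq)
            return []
        # Every number between the expected one and this one was skipped.
        newly_missing = list(range(self.next_expected, seq))
        self.missing.update(newly_missing)
        self.next_expected = seq + 1
        return newly_missing

    def first_unreceived(self) -> int:
        """Lowest sequence number that has not been received yet."""
        return min(self.missing) if self.missing else self.next_expected
```

A consumer would only process messages below `first_unreceived()`, so a lost datagram delays delivery until its retransmission arrives but never reorders the stream.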

We also wanted to build a Distributed System architecture to enable the physical separation of components while communicating over a network, which is critical for achieving scalability and fault tolerance. The Location Transparency feature of the platform allows it to run in any environment without modification, improving portability and simplifying deployment.

Back-pressure is another important consideration for high-performance systems, such as financial trading platforms, because it helps prevent data loss and maintain low latency. It does this by providing a mechanism for the system to handle incoming data at a sustainable rate, preventing bottlenecks, improving stability, and minimizing delays in data processing.

Back-pressure can also improve fairness in high-performance systems. By regulating the flow of data, it prevents any one component from becoming overloaded and slowing down the entire system, so each component gets an equal opportunity to process data and contribute to overall performance. It also supports the equitable distribution of system resources, so that no component is starved while others are over-provisioned, which improves the system’s overall fairness and efficiency.
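One common way to apply back-pressure is to give each stage a bounded inbox and make producers react when it is full, instead of buffering without limit. The following is a minimal sketch of that idea using Python's standard `queue` module. `BoundedStage`, `offer`, and `drain` are hypothetical names, and this is not a description of how Exberry implements flow control.

```python
import queue


class BoundedStage:
    """A processing stage with a fixed-capacity inbox (illustrative only)."""

    def __init__(self, capacity: int):
        self.inbox = queue.Queue(maxsize=capacity)
        self.rejected = 0  # how many offers were pushed back

    def offer(self, message) -> bool:
        """Try to enqueue without blocking.

        Returns False when the inbox is full, which is the back-pressure
        signal: the producer should slow down or retry later.
        """
        try:
            self.inbox.put_nowait(message)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def drain(self, limit: int) -> list:
        """Take up to `limit` messages, freeing capacity for producers."""
        taken = []
        while len(taken) < limit:
            try:
                taken.append(self.inbox.get_nowait())
            except queue.Empty:
                break
        return taken
```

With a capacity of 2, a third `offer` returns `False` until the consumer calls `drain`, so the producer's rate is tied to what the consumer can sustain.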

High-Level System Architecture

[Figure: high-level system architecture diagram]

How We Built Our Platform 

With these design principles in mind, we set out to build our matching engine. Starting with compute, we use Graviton-based Amazon EC2 instances, which provide our customers with a highly scalable, flexible, and secure computing environment. By running on Amazon EC2, our platform can dynamically allocate resources on demand, quickly scale up or down based on our customers’ needs, and achieve low latency and high performance. Additionally, EC2 provides a range of security features, such as firewalls and encrypted data storage, that help ensure the confidentiality and integrity of customer data.

We decided to use Graviton2 processors because of their performance, cost-efficiency, and environmental benefits. Graviton2 processors are custom-built by Amazon Web Services on the 64-bit Arm architecture, which can provide higher performance at lower cost than comparable x86-based instances. Graviton-based instances consume up to 60% less energy than comparable EC2 instances for the same performance, which greatly reduces their carbon footprint and makes them a more energy-efficient and environmentally friendly compute option. This helps Exberry’s customers reduce their carbon footprint and enables Exberry to contribute to the overall goal of reducing greenhouse gas emissions.

We decided to use Amazon Elastic Kubernetes Service (Amazon EKS) to build our containerized architecture for several reasons:

  • Scalability: Amazon EKS automatically scales the number of worker nodes, making it easier to handle the increased demand for the platform’s services.
  • High Availability: Amazon EKS provides automatic failover, ensuring that the services remain available even in case of node failure.
  • Manageability: Amazon EKS provides a centralized control plane, making it easier to manage the entire cluster and its services.
  • Integration with Amazon Web Services: Amazon EKS integrates with a variety of Amazon Web Services offerings, such as Elastic Load Balancing, Amazon RDS, and Amazon S3, making it easier to build and deploy services on Amazon Web Services.
  • Kubernetes Support: Amazon EKS is built on Kubernetes, the industry-leading container orchestration platform, providing a robust and feature-rich foundation for the services.
  • Ease of Use: By using Amazon EKS, Exberry can take advantage of the benefits of Amazon Web Services and Kubernetes, and provide a highly available, scalable, and manageable trading platform for its customers.

At the networking layer, our platform uses both Amazon Web Services Transit Gateway and Amazon Web Services Direct Connect. We use Transit Gateway to connect our customers’ Amazon Web Services accounts and networks, providing secure and scalable communication between them. This gives Exberry customers a single, centralized management point for their network connections while maintaining full visibility and control over the network traffic. By using Transit Gateway, Exberry ensures that our customers have a seamless, fast, and secure connection between their VPCs, on-premises data centers, and Amazon Web Services. This enhances the overall performance and reliability of Exberry’s services, while also reducing operational complexity and costs for our customers.

Exberry uses Amazon Web Services Direct Connect because it provides a dedicated network connection from the customer’s infrastructure to Amazon Web Services, which offers low latency and high throughput network connectivity. Amazon Web Services Direct Connect also offers more secure and reliable connections compared to using the public internet, which is important for the sensitive financial data involved in trading operations.

Lastly, for disaster recovery, we again use Amazon EC2 instances. Amazon EC2 provides a flexible, scalable, and secure solution for disaster recovery. Exberry’s disaster recovery plan leverages the globally distributed Amazon Web Services infrastructure to provide redundancy and resiliency in case of a disaster. With Amazon EC2, Exberry can quickly launch new instances in a different geographic region to keep its operations running, ensuring business continuity. Amazon EC2 also provides built-in backup and recovery options, such as Amazon Elastic Block Store (EBS) snapshots, to ensure that data is secure and recoverable. Put simply, Amazon EC2 helps us maintain a cost-effective, reliable, and scalable disaster recovery solution that keeps our business running smoothly even in the face of unforeseen circumstances.

Built on top of Amazon Web Services, Exberry is a distributed system that uses a Raft replicated log to provide fault-tolerant, ultra-low-latency, and high-performance state management for clustered services with inter-service sequenced messaging. With Exberry, multiple client connections can be sequenced into a single replicated log that ensures a consistent state across all nodes. This allows for efficient fault tolerance with multiple nodes, ensuring the system can continue operating even if one or more nodes fail, as long as a majority of nodes remains available.
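The core idea of sequencing many clients into one replicated log can be sketched in a few lines. The example below shows a leader-side view in which an entry counts as committed once a majority of nodes has stored it. It is a deliberately simplified illustration: `ReplicatedLog` and its methods are hypothetical names, and leader election, terms, and persistence, all of which real Raft requires, are left out.

```python
class ReplicatedLog:
    """Leader-side view of a replicated log (illustrative, not full Raft).

    Assumption: `node_ids` lists every node in the cluster, including the
    leader, and each node acknowledges the entries it has stored.
    """

    def __init__(self, node_ids):
        self.entries = []                            # the single sequenced log
        self.match_index = {n: 0 for n in node_ids}  # entries stored per node
        self.commit_index = 0                        # entries safe to apply

    def append(self, client_id, payload) -> int:
        """Sequence a client message and return its 1-based log position."""
        self.entries.append((client_id, payload))
        return len(self.entries)

    def acknowledge(self, node_id, index: int) -> int:
        """Record that `node_id` has stored entries up to `index`.

        Returns the commit index: the highest log position that is stored
        on a majority of nodes.
        """
        self.match_index[node_id] = max(self.match_index[node_id], index)
        ranked = sorted(self.match_index.values(), reverse=True)
        majority = len(ranked) // 2 + 1
        self.commit_index = max(self.commit_index, ranked[majority - 1])
        return self.commit_index
```

In a three-node cluster the commit index advances as soon as any two nodes have stored an entry, which is why a single slow or failed node does not stall the log.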

The combination of Amazon Web Services services and Exberry engineering provides a reliable and efficient solution for applications that require distributed and fault-tolerant ultra-low-latency state management.

“Graviton truly impressed us. With it, we witnessed a remarkable improvement in tail latency jitter, enabling us to deliver more reliable and predictable performance. Our round-trip latency was reduced to less than 20 microseconds while achieving a throughput of over 1 million messages per second.” Ronen Nachmias, Exberry, Co-founder and CTO 

The results

Exberry’s exchange infrastructure boasts reliable and predictable transport with less than 20-microsecond round-trip latency and can support over 1 million messages per second, making it one of the fastest and most efficient exchange technology providers in the industry running on Amazon Web Services*.

Additionally, Exberry’s Software as a Service (SaaS) model offers cost savings, scalability, flexibility, up-to-date technology, and ease of use for its customers. By using SaaS, customers can focus on their core business operations, quickly launch new markets without having to worry about data center contracts or procuring hardware, and leave the management and maintenance of software to Exberry. This combination of performance, speed to market, and elasticity at economical prices has resonated with our customers and is what customers like Abaxx Exchange look for as they set out to launch a new marketplace.

Based in Singapore, Abaxx Exchange introduces centrally-cleared, physically-delivered futures contracts and derivatives, providing price discovery and risk management tools in a market ecosystem with efficient correlation to physical markets.

Abaxx Exchange needed a powerful, ultra-low latency platform that offers brokers and Futures Commission Merchants (FCMs) seamless market access. Exberry provided Abaxx Exchange with an off-the-shelf matching engine and central order book solution built on the cloud, allowing for flexible and fast deployment across multiple environments on demand. Working in an agile manner with continuous delivery, Exberry customized its platform to meet Abaxx Exchange’s needs, ensuring that additional features were delivered and Abaxx Exchange’s feedback was incorporated in real time. As a result, Exberry built a fully functional system to power a robust and transparent platform, supporting Abaxx Exchange in its mission to build SmarterMarkets™.

We look forward to sharing more about Abaxx Exchange’s journey and helping more exchange operators launch and scale cutting-edge, ultra-low latency markets on Amazon Web Services.

“Solving fundamental problems in today’s global commodity and energy markets requires a technology partner capable of producing cloud-based solutions with precision at an expeditious pace; we found that in Exberry.” Dan McElduff, Abaxx Exchange, President

*Exberry has multiple different deployment models and technical architectures for their matching engine software to meet the unique needs of each of their customers. These performance statistics reflect a deployment optimized for ultra-low latency performance. Other technology referenced in this blog may be reflective of other distributed system deployment models.