How Goldman Sachs leverages Amazon Web Services PrivateLink for Amazon S3
As a multinational investment bank and financial services company,
Today, Goldman Sachs owns and operates several thousand accounts for production and non-production workloads and deployments. A company-wide Core Engineering team is responsible for enabling cloud adoption based on best practices and reusable patterns. To provide secure access to Amazon Web Services services, Goldman Sachs uses VPC endpoints for both hybrid and cloud-native (VPC based) workloads. Hybrid workloads with traffic flows from on-premises environments usually interact with cloud native services over PrivateLink. This
In this blog, we walk through Goldman Sachs’ S3 VPC endpoint adoption journey and architectural evolution, outlining advantages and disadvantages to each approach, and sharing key learnings for success at scale. We start with architectures based on
Evolution of Amazon S3 access at Goldman Sachs
Throughout this section we will take you on the journey of Goldman Sachs’ solutions for hybrid connectivity to S3. We will progress from the initial
Initial approach and challenges: Amazon Elastic Compute Cloud (EC2) proxy fleet
Goldman Sachs’ first version of hybrid access to Amazon S3 via Amazon Web Services Direct Connect was deployed using gateway endpoints for S3, before the launch of the PrivateLink feature. Since gateway VPC endpoints are only accessible from VPC hosted entities, hybrid requests to S3 have to be routed via proxies. Goldman Sachs originally achieved this using a fleet of EC2 instances running ‘Squid’ HTTP proxy software, managed by the Core Engineering team and hosted within a Direct Connect attached VPC.
Figure 1: Amazon EC2 proxy fleet solution
Aligned to Figure 1:
- Traffic from on-premises systems routes via firm-wide Direct Connect connections.
- To a core-infrastructure owned VPC.
- Which hosts a fleet of core-infrastructure managed EC2 instances operating in an Auto Scaling group running Squid proxy software.
- These proxies initiate connections onwards via a gateway VPC endpoint.
- To S3 buckets which may be owned by core or line of business accounts.
To enforce controls on cloud native endpoints such as S3, Goldman Sachs uses VPC endpoint policies to restrict access to specific consumers (
This approach had a few drawbacks that caused issues when operating at significant scale:
- Reduced availability and performance: It required a fleet of proxies dedicated to routing requests to Amazon S3. This proxy fleet was subject to unpredictable workload spikes, which could not be scaled for in time, and their management carried operational overhead for the Core Engineering team. The additional components in the path effectively reduced the overall availability of connectivity to S3 for hybrid users. In some cases, jobs running on compute farms would exhaust the resources of the proxy fleet, significantly increasing observed service latency for all users.
- Increased complexity: To mitigate these occurrences, teams with high-bandwidth requirements were encouraged to manage their own proxy deployment, reducing blast radius but increasing cost and complexity for business units.
- Scalability challenges: In 2018, Goldman Sachs had only a handful of Amazon Web Services accounts. By January 2021, this had grown significantly, and it continues to increase. As the list of permitted accounts within the VPC endpoint policy grew, Goldman Sachs reached the 20-KB VPC endpoint policy size limit. Although the aws:PrincipalOrgID IAM condition key can be used in many cases to simplify this, it is sometimes necessary to restrict access for specific buckets or endpoints to specific accounts. GS was initially successful in tuning S3 bucket policies and handling cross-team bucket allow-listing management challenges, but eventually determined that this approach was simply not manageable at the required scale.
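To make the policy size constraint concrete, the following Python sketch compares the serialized size of an endpoint policy that allow-lists accounts one by one against one that relies on aws:PrincipalOrgID. It is an illustration under stated assumptions, not Goldman Sachs' actual policy: the policy shape, the helper names, the generated account IDs, the organization ID, and the reading of the 20-KB limit as 20,480 characters are all assumptions made for the example.

```python
import json

# Assumption: the 20-KB limit is treated here as 20,480 characters.
POLICY_CHAR_LIMIT = 20_480


def endpoint_policy(condition_key, value):
    """Build a hypothetical single-statement endpoint policy."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": "*",
                "Condition": {"StringEquals": {condition_key: value}},
            }
        ],
    }


def policy_size(policy):
    """Length of the policy when serialized without optional whitespace."""
    return len(json.dumps(policy, separators=(",", ":")))


def fits(policy):
    return policy_size(policy) <= POLICY_CHAR_LIMIT


def fake_account_ids(count):
    """Generate placeholder 12-digit account IDs (not real accounts)."""
    return [f"{n:012d}" for n in range(1, count + 1)]


# One condition value per account: size grows linearly with the account count.
per_account = endpoint_policy("aws:PrincipalAccount", fake_account_ids(2000))
# One condition value for the whole organization: size stays constant.
org_wide = endpoint_policy("aws:PrincipalOrgID", "o-exampleorgid")

print(policy_size(per_account), fits(per_account))
print(policy_size(org_wide), fits(org_wide))
```

Each additional account adds roughly 15 characters in this shape, so a per-account list runs out of room well before an estate of several thousand accounts is covered, while the organization-wide condition does not grow at all.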
Next iteration and challenges: Amazon Elastic Container Service (ECS) proxy fleet
The next iteration of the architecture, outlined in the following diagram, solved for some of these challenges:
Figure 2: Amazon ECS proxy fleet solution
Aligned to Figure 2:
- Traffic from on-premises systems would route via the same firm-wide Direct Connect connections.
- To a core-infrastructure owned Amazon S3 proxy VPC.
- Which hosts a Network Load Balancer (NLB) providing static IPs for on-premises connectivity.
- This NLB distributes traffic to core managed ECS proxy tasks operating in an Auto Scaling group.
- These proxies initiate connections onwards via a gateway VPC endpoint.
- To S3 buckets which may be owned by core or line-of-business accounts.
This approach significantly increased availability and scalability, and worked around the 20-KB policy limit by running multiple ECS deployments in different GS routable S3 proxy VPCs. However, it still required deployment and operation of multiple customer managed proxy platforms, each of which was still individually constrained by the 20-KB policy limit and by 55,000 connections per minute per NLB target.
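As a rough illustration of the per-target connection constraint, the sketch below estimates how many proxy targets a given peak connection rate would need. The function name, the headroom parameter, and the example rates are hypothetical; only the 55,000 connections per minute figure comes from the text above.

```python
import math

# Figure quoted in the text above: connections per minute per NLB target.
CONNECTIONS_PER_MINUTE_PER_TARGET = 55_000


def min_proxy_targets(peak_connections_per_minute, headroom=0.0):
    """Smallest number of targets that keeps each one under the limit.

    headroom is a hypothetical safety margin, e.g. 0.25 for 25% spare capacity.
    """
    required = peak_connections_per_minute * (1 + headroom)
    return max(1, math.ceil(required / CONNECTIONS_PER_MINUTE_PER_TARGET))


print(min_proxy_targets(200_000, headroom=0.25))
```

A calculation like this only helps when the peak is known in advance, which is why unpredictable spikes from compute farms were hard to provision for.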
Latest evolution: Elimination of hosted proxy solution
With the announcement of
Figure 3: Amazon Web Services PrivateLink for Amazon S3 solution
Aligned to Figure 3:
- Traffic from on-premises systems would route via the same firm-wide Direct Connect connections.
- To a core infrastructure owned S3 endpoint VPC.
- This VPC hosts S3 interface VPC endpoints (PrivateLink).
- Which provide direct access to S3 buckets which may be owned by core or line-of-business accounts.
This update delivered the following benefits:
- Improved operational efficiency: Hybrid use cases no longer need a managed proxy fleet to connect to cloud native services, which greatly benefits the Core Engineering team.
- Reduced blast radius: Each Business Unit (BU) or Line of Business (LoB) has dedicated Amazon S3 connectivity for production workloads, with a shared general-purpose endpoint used for cost effective development and testing purposes. The dedicated endpoints reduced potential impact of BU cross talk (‘noisy neighbor’ scenario) while providing simple, clear boundaries for BUs to operate within. This is particularly helpful for LoBs with a high number of accounts and use cases with high and unpredictable throughput leading to contention and the potential to swamp other LoBs on a common endpoint.
- Security guardrails enforcement: Multiple interface endpoints can be configured in a single VPC, each with its own VPC endpoint policy. Each endpoint can be allocated to a single line of business or use case, improving security posture and adhering to the principle of least privilege.
To dive a little deeper into the final point above, it’s most effective to bring this to life with sample
Here is a sample minimalistic endpoint policy that combines these possibilities:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:Get<List of Get S3 Permissions defined by enterprise or required by line of business / use case>",
        "s3:List<List of List S3 Permissions defined by enterprise or required by line of business / use case>",
        "s3:Put<List of Put S3 Permissions defined by enterprise or required by line of business / use case>",
        // Other S3 actions
      ],
      "Resource": "*",
      "Principal": "*", // A specific Principal or list of Principals can be provided for highly restricted use cases
      "Condition": {
        // Note that some combination of the following conditions would be used, not necessarily all of them
        // Keys that share a condition operator are grouped in one block, because an operator can appear only once per Condition
        "StringEquals": {
          // PrincipalOrgID is used to easily provide a boundary mandating the use of enterprise credentials
          "aws:PrincipalOrgID": ${Your enterprise Org IDs},
          // ResourceOrgID is used to easily provide a boundary only permitting access to enterprise owned S3 buckets
          "aws:ResourceOrgID": ${Your enterprise Org IDs},
          // A specific ResourceAccount or list of ResourceAccounts can be provided for highly restricted use cases
          "aws:ResourceAccount": ${Target resource account IDs}
        },
        "ForAllValues:StringLike": {
          // PrincipalOrgPaths is used to easily provide access to Principals within a specific line of business, in this example Machine Learning
          "aws:PrincipalOrgPaths": ["o-myorganizatio-1/*/ou-machinelearn/*"],
          // ResourceOrgPaths is used to easily provide access to resources owned by a specific line of business
          "aws:ResourceOrgPaths": ["o-myorganization/r-org-path-1/*","o-myorganization-2/r-org-path-2/*"]
        }
      }
    }
  ]
}
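The template above is annotated with comments and placeholders, so it is not valid JSON as written. As a minimal sketch of how such a policy could be rendered into deployable JSON, the following Python helper assembles only the conditions that are supplied and groups them so that each condition operator appears once. The function name, its parameters, and every ID and action in the example call are hypothetical and chosen for illustration only.

```python
import json


def build_endpoint_policy(
    actions,
    principal_org_id=None,
    principal_org_paths=None,
    resource_org_id=None,
    resource_org_paths=None,
    resource_accounts=None,
):
    """Assemble a single-statement endpoint policy from optional guardrails."""
    string_equals = {}
    for_all_values_like = {}

    # Exact-match guardrails share the StringEquals operator.
    if principal_org_id:
        string_equals["aws:PrincipalOrgID"] = principal_org_id
    if resource_org_id:
        string_equals["aws:ResourceOrgID"] = resource_org_id
    if resource_accounts:
        string_equals["aws:ResourceAccount"] = list(resource_accounts)

    # Wildcard org-path guardrails share the ForAllValues:StringLike operator.
    if principal_org_paths:
        for_all_values_like["aws:PrincipalOrgPaths"] = list(principal_org_paths)
    if resource_org_paths:
        for_all_values_like["aws:ResourceOrgPaths"] = list(resource_org_paths)

    statement = {
        "Effect": "Allow",
        "Principal": "*",
        "Action": list(actions),
        "Resource": "*",
    }
    condition = {}
    if string_equals:
        condition["StringEquals"] = string_equals
    if for_all_values_like:
        condition["ForAllValues:StringLike"] = for_all_values_like
    if condition:
        statement["Condition"] = condition

    return {"Version": "2012-10-17", "Statement": [statement]}


# Hypothetical values for a Machine Learning line-of-business endpoint.
example_policy = build_endpoint_policy(
    actions=["s3:GetObject", "s3:ListBucket", "s3:PutObject"],
    principal_org_id="o-exampleorgid",
    principal_org_paths=["o-exampleorgid/*/ou-machinelearn/*"],
    resource_org_id="o-exampleorgid",
)
print(json.dumps(example_policy, indent=2))
```

Generating the document from parameters rather than editing JSON by hand keeps one template per line of business reviewable, and leaves out any guardrail that does not apply to a given endpoint.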
You can use similar methods to create curated endpoints for third party vendor services, such as Snowflake. Keeping these vendor flows independent helps improve security posture, supports dedicated tenancy, and enables more targeted performance monitoring.
Here is a sample minimalistic endpoint policy focused on a third party vendor service use case:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:Get<List of Get S3 Permissions required by the third party vendor>",
        "s3:List<List of List S3 Permissions required by the third party vendor>",
        "s3:Put<List of Put S3 Permissions required by the third party vendor>",
        // Other S3 actions
      ],
      "Resource": "*",
      "Principal": "*",
      "Condition": {
        // Provide either ResourceAccount or ResourceOrgPaths based on the level of isolation provided and guaranteed by the vendor
        "ForAllValues:StringLike": {
          // PrincipalOrgPaths is used to easily provide access to Principals within a specific line of business, in this example Machine Learning
          "aws:PrincipalOrgPaths": ["o-myorganizatio-1/*/ou-machinelearn/*"],
          "aws:ResourceOrgPaths": ["o-vendororg-1/r-org-path-1/*"]
        },
        "StringEquals": {
          "aws:ResourceAccount": ${Target resource account IDs}
        }
      }
    }
  ]
}
The key differences between the policy above and the enterprise policy are as follows:
- Vendors often provide a dedicated resource account per tenant / customer, so using the aws:ResourceAccount condition key provides a reasonably constrained approach. The principal in this case can be a unique set of predefined principals, or can be constrained to a given org path according to business needs.
- For vendors who offer several dedicated resource accounts per tenant, multiple accounts can be listed, or the list can be replaced with aws:ResourceOrgPaths to support more dynamic growth, provided the vendor uses suitable architecture and isolation boundaries. GS seeks to use this approach when onboarding with large vendors such as Snowflake.
Please note that the VPC Endpoint policies outlined above are examples included to bring the implementation to life, and in practice are deployed in conjunction with suitably aligned
How has this evolution helped improve adoption?
Before adoption of PrivateLink for S3, Goldman Sachs’ connectivity to Amazon Web Services averaged several hundred Mbps of traffic to S3 due to scalability and operational challenges across Direct Connect, EC2/ECS, and the self-managed proxy software. Shortly following introduction of S3 PrivateLink, this ramped up to a peak of 12 Gbps with steady state now around 5-6 Gbps, driven by enhanced ease of adoption and ease of use, performance, and stability. The following diagrams show network throughput (egress and ingress) for the period December 2020 to February 2022, with key increases visible in early March 2021 when PrivateLink was enabled.
Adoption also enabled onboarding of applications that do not support HTTP proxies, but can use the Amazon S3 interface endpoint via
As a result of adopting PrivateLink for S3, Goldman Sachs has also been successful in enabling batch processing jobs to complete tasks including heavy read and write operations, which previously overwhelmed the proxies. Long-term monitoring has shown that Application Programming Interface (API) calls over PrivateLink are more efficient and more stable than with the previous proxy-based solution. Latencies have also improved by almost half in some cases, averaging 10 ms for us-east-1 across nearly all request types, with only large multipart upload tests operating slightly slower.
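One way an application without HTTP proxy support might reach an interface endpoint is through the endpoint-specific DNS names that S3 interface endpoints expose. The sketch below only assembles such a URL from its parts. The function name is hypothetical, and the endpoint label and Region in the example are placeholders rather than real values.

```python
# Prefixes used in S3 interface endpoint DNS names for the different API families.
VALID_PREFIXES = ("bucket", "accesspoint", "control")


def s3_interface_endpoint_url(endpoint_label, region, prefix="bucket"):
    """Build the HTTPS URL for an S3 interface endpoint's Regional DNS name.

    endpoint_label is the "vpce-..." label from the endpoint's DNS entries.
    """
    if prefix not in VALID_PREFIXES:
        raise ValueError(f"prefix must be one of {VALID_PREFIXES}")
    if not endpoint_label.startswith("vpce-"):
        raise ValueError("endpoint_label should start with 'vpce-'")
    return f"https://{prefix}.{endpoint_label}.s3.{region}.vpce.amazonaws.com"


# Placeholder endpoint label, for illustration only.
url = s3_interface_endpoint_url("vpce-1a2b3c4d-5e6f", "us-east-1")
print(url)
```

A URL of this form can typically be supplied as the endpoint override in an SDK or CLI configuration, for example the endpoint_url argument when creating a boto3 S3 client, so the application resolves the interface endpoint directly instead of going through a proxy.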
Where can you get the most value from using Amazon Web Services PrivateLink for Amazon S3?
The Core Engineering team at Goldman Sachs has delivered increased business value through the ease of deployment and adoption of PrivateLink. The benefits include improved security posture, improved workload isolation between lines of business, and an improvement in performance, all while reducing operational overhead associated with deploying and managing proxy fleets.
It is worth noting that this pattern is most impactful for use cases involving hybrid workloads, where producers or consumers of S3 data are on premises. For purely cloud based use cases, such as where workloads are in a private VPC, S3 gateway VPC endpoints may still be preferable due to their lower cost in high throughput environments.
Key advantages of PrivateLink for S3 include the ability to use security groups to control which traffic flows and sources are permitted to use the endpoints, and simpler VPC routing tables. Conversely, gateway VPC endpoints incur no data transfer charges. With both approaches, VPC endpoints offer significant advantages in improving security and ease of control of access to S3.
Conclusion
In this blog post, we outlined Goldman Sachs’ evolution from Amazon S3 gateway endpoints with EC2 based proxies, through ECS based proxies, to Amazon Web Services PrivateLink based S3 interface endpoints.
We’ve shared key advantages, and outlined solutions to secure management and adoption of S3 VPC endpoints at scale. We’ve also outlined the most advantageous use cases for each endpoint solution, such as gateway endpoints for high-volume on-cloud deployments, and interface endpoints for hybrid architectures. PrivateLink for Amazon S3 has delivered business benefits such as reduced operational overhead and improved availability, and technical benefits such as improving the security of third-party vendor integrations. These benefits have enabled Goldman Sachs to innovate with greater agility, and resulted in a continued increase in S3 adoption. At the time of writing, Goldman Sachs has transferred tens of petabytes of data via PrivateLink for Amazon S3.
For further information, review the