Bayer Creates Secure Self-Service Solution for Data Scientists on Amazon Web Services
This blog is guest authored by Dr. Stefan Schmitz, lead product owner of Bayer’s cross-divisional data science platform, and Maciej Wroblewski, Amazon Web Services architect from the Accenture Advanced Technology Center.
Leading global life sciences organization
Bayer built a cross-functional data science platform to provide curated, self-service access to a range of
Creating a Self-Service Platform for Data Scientists
As one of the world’s largest life sciences organizations with operations spread across 83 countries, Bayer’s data science teams needed greater efficiency and cost optimization in their operations. Bayer’s centralized, self-service data analytics platform allows data scientists to access a curated set of needed technologies and IT capabilities, while adhering to corporate compliance and security standards.
“Using this platform, teams no longer need to duplicate efforts and costs of setting up the basic infrastructure components and services for individual projects,” says Dr. Stefan Schmitz, lead product owner of Bayer’s cross-divisional data science platform. “There’s no need to reinvent the wheel over and over again.” In addition, data scientists can control the full lifecycle of their models and manage compute-intensive projects by spinning up customizable instances with preconfigured tooling.
Built on Amazon Web Services, the platform gives access to secure and resizable compute capacity using Amazon EC2. It provides a multi-tenant configuration within
The platform also uses
“Our main priority while working with Bayer to design the platform was to simplify the typical operations executed by data scientists,” says Maciej Wroblewski, Amazon Web Services architect from the Accenture Advanced Technology Center, which collaborated with Bayer to envision, implement and support development of the platform. “With this platform, Bayer is empowering its data scientists to focus on data processing instead of the deployment of infrastructure components.”
Bayer hosts its computing resources in three
Controlling Access to Individual Models through Amazon API Gateway
As the platform grew in size and scale, Bayer’s technical team realized that data scientists were interacting with a sizable number of production models through APIs that lacked a consistent URL structure and didn’t scale well to larger numbers of parallel deployments. “We realized that scalability of API deployments and access management for APIs were becoming more relevant,” Schmitz says. “Data scientists needed more governance and control over how individual model APIs were being used.”
The team then developed a self-service API Management Service to address the data scientists’ needs. Using the REST API service within
The Custom Authorizer feature within Amazon API Gateway allows customization of the identity-based policies that control who can access particular API endpoints for specific models. Amazon API Gateway not only authorizes incoming requests but also extracts details for logging, such as the model requested, the URL, the HTTP method, and user information. The platform integrates with Bayer’s corporate security standards to allow access controls through their Active Directory groups. Dedicated
The solution fully automates the provisioning and configuration of underlying resources and services. Model developers self-register their own API endpoints so they can be invoked for inference. As part of this process, developers can associate their models with access policies that are designed and maintained by dedicated policy stewards. Amazon EKS Namespaces and IAM policies secure access to the models and data within an Amazon Web Services account that is shared by many tenants, while keeping those tenants isolated from each other.
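To make the authorization flow more concrete, here is a minimal sketch of what a request-based custom (Lambda) authorizer for Amazon API Gateway could look like. It is not Bayer's implementation. The policy table, the `/models/<name>/...` path layout, and the `X-User` and `X-Groups` headers are all hypothetical stand-ins; a real deployment would derive the user and group memberships from a validated token or directory lookup rather than trusting request headers. Only the shape of the returned document (`principalId`, `policyDocument`, `context`) follows the format API Gateway expects from an authorizer.

```python
import json
import logging

logger = logging.getLogger(__name__)

# Hypothetical policy table: model name -> directory groups allowed to invoke it.
MODEL_POLICIES = {
    "cddd": {"pharma-research"},
    "demo-model": {"pharma-research", "consumer-health"},
}


def model_from_path(path):
    """Return the model name from a path shaped like /models/<name>/... (assumed layout)."""
    parts = [p for p in path.split("/") if p]
    if len(parts) >= 2 and parts[0] == "models":
        return parts[1]
    return None


def build_policy(principal_id, effect, method_arn, context):
    """Build the response document API Gateway expects from an authorizer."""
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": method_arn,
                }
            ],
        },
        # Context values are passed on to the backend and to access logs.
        "context": context,
    }


def handler(event, _context=None):
    """Allow the call only if the caller shares a group with the model's policy."""
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    # Assumption: identity arrives in these headers; real systems should verify a token.
    user = headers.get("x-user", "anonymous")
    groups = {g.strip() for g in headers.get("x-groups", "").split(",") if g.strip()}

    model = model_from_path(event.get("path", ""))
    allowed = bool(model) and bool(groups & MODEL_POLICIES.get(model, set()))

    # Log the details the text mentions: model, URL, HTTP method, and user.
    logger.info(json.dumps({
        "model": model,
        "path": event.get("path"),
        "method": event.get("httpMethod"),
        "user": user,
        "allowed": allowed,
    }))

    effect = "Allow" if allowed else "Deny"
    return build_policy(user, effect, event["methodArn"], {"model": model or "", "user": user})
```

Keeping the model-to-group mapping in a separate table mirrors the division of labor described above: model developers register endpoints, while policy stewards maintain who may call them.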
“Bayer scientists have the power,” says Wroblewski. “They have the permissions to create, modify, and delete all the API endpoints without needing to reach out to any technical team to maintain the platform. They can do it on their own.”
Growth in Adoption and Future Outlook
While the platform is already seeing much success across the Pharmaceuticals, Consumer Health and Enabling Function divisions, its stakeholder base is still growing in multiple areas. For example, Bayer Pharmaceuticals Research recently decided to adopt the new platform capabilities for model API management discussed above to maintain and govern their analytical models. Dr. Andreas Poehlmann, research scientist in the Machine Learning Research Group at Bayer Pharmaceuticals, says, “This setup streamlines the way we deploy our model APIs internally to make them easily available for researchers at Bayer while ensuring compliance. Having this centralized data science platform available allows us to quickly move from prototype to production and lets us focus on solving the scientific questions. For example, we’re already using it to make an in-house-developed and open-sourced molecular representation extraction tool available to our scientists, which allows extracting CDDDs (continuous and data-driven molecular descriptors) from molecular SMILES (Simplified Molecular Input Line Entry Specification) strings. It’s great to have these data-science services provided internally through a unified and centralized platform at Bayer.”
Bayer’s cross-divisional data science platform will continue to evolve. The 2023 roadmap includes topics like the integration of