Amazon Redshift extends data warehouse queries to your data lake, with no loading required. You can run analytic queries against petabytes of data stored locally in Redshift, and directly against exabytes of data stored in Amazon S3. It is simple to set up, automates most of your administrative tasks, and delivers fast performance at any scale.
Amazon Redshift delivers fast query performance on datasets ranging in size from gigabytes to exabytes. Redshift uses columnar storage, data compression, and zone maps to reduce the amount of I/O needed to perform queries. It uses a massively parallel processing (MPP) data warehouse architecture to parallelize and distribute SQL operations to take advantage of all available resources. The underlying hardware is designed for high performance data processing, using local attached storage to maximize throughput between the CPUs and drives, and a high bandwidth mesh network to maximize throughput between nodes.
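To make the zone map idea concrete, here is a minimal sketch in Python. It is not Redshift's implementation: the `Block` class, the block size, and the helper names are assumptions chosen for illustration. It only shows how keeping a minimum and maximum value per block of a column lets a range scan skip blocks that cannot contain a match.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """One block of a single column plus its zone map (hypothetical layout)."""
    values: List[int]
    min_value: int
    max_value: int


def build_blocks(column: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record min/max for each."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_range(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return the values in [low, high] and the number of blocks actually read."""
    matches: List[int] = []
    blocks_read = 0
    for block in blocks:
        # Zone map check: skip the block if its value range cannot overlap.
        if block.max_value < low or block.min_value > high:
            continue
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


# A sorted column of 100 values stored as 10 blocks of 10 values each.
blocks = build_blocks(list(range(1, 101)), block_size=10)
matches, blocks_read = scan_range(blocks, low=42, high=47)
print(matches, f"read {blocks_read} of {len(blocks)} blocks")
```

In this toy example the range 42 to 47 falls inside a single block, so nine of the ten blocks are never read. The benefit depends on the data being sorted or clustered on the filtered column; on randomly ordered data most blocks would overlap the range.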
Amazon Redshift uses machine learning to deliver high throughput, irrespective of your workloads or concurrent usage. Redshift uses sophisticated algorithms to predict incoming query run times and assigns each query to the optimal queue for the fastest processing. For example, queries such as dashboards and reports with high concurrency requirements are routed to an express queue for immediate processing. As concurrency increases further, Amazon Redshift predicts when queuing may begin and automatically deploys transient resources with the Concurrency Scaling feature to ensure consistently fast performance, irrespective of variability in demand on the cluster.
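The routing step described above can be sketched in a few lines of Python. This is an illustration only: the predicted run times are supplied as inputs rather than produced by a model, and the five-second threshold, the queue names, and the function names are hypothetical, not Redshift settings.

```python
from typing import Dict, List

# Hypothetical cutoff: queries predicted to finish faster than this go to
# the express queue. The real service chooses its own criteria.
EXPRESS_THRESHOLD_SECONDS = 5.0


def route_query(predicted_seconds: float,
                threshold: float = EXPRESS_THRESHOLD_SECONDS) -> str:
    """Pick a queue name from a predicted run time."""
    return "express" if predicted_seconds <= threshold else "standard"


def assign_queues(predictions: Dict[str, float]) -> Dict[str, List[str]]:
    """Group query ids by the queue they would be routed to."""
    queues: Dict[str, List[str]] = {"express": [], "standard": []}
    for query_id, seconds in predictions.items():
        queues[route_query(seconds)].append(query_id)
    return queues


# Predicted run times in seconds (made-up values for the example).
queues = assign_queues({"dashboard": 0.8, "nightly_etl": 900.0, "report": 3.0})
print(queues)
```

Separating short queries this way keeps a dashboard refresh from waiting behind a long-running load job, which is the effect the express queue is meant to achieve.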
Amazon Redshift uses result caching to deliver sub-second response times for repeat queries. Dashboard, visualization, and business intelligence tools that execute repeat queries experience a significant performance boost. When a query executes, Redshift searches the cache to see if there is a cached result from a prior run. If a cached result is found and the data has not changed, the cached result is returned immediately instead of re-running the query.
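The lookup described above can be sketched as follows. This is a simplified model, not Redshift's cache: the `ResultCache` class, the single data version counter, and keying on the exact SQL text are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple


class ResultCache:
    """Cache query results and reuse them while the data is unchanged."""

    def __init__(self, run_query: Callable[[str], List]) -> None:
        self._run_query = run_query
        self._data_version = 0  # bumped whenever the underlying data changes
        self._cache: Dict[str, Tuple[int, List]] = {}
        self.executions = 0     # how many times the query really ran

    def data_changed(self) -> None:
        """Record a data change so older cached results are no longer used."""
        self._data_version += 1

    def execute(self, sql: str) -> List:
        cached = self._cache.get(sql)
        if cached is not None and cached[0] == self._data_version:
            return cached[1]    # cache hit: the query is not re-run
        self.executions += 1
        result = self._run_query(sql)
        self._cache[sql] = (self._data_version, result)
        return result


# A stand-in "database": the query function just sorts an in-memory list.
table = [3, 1, 2]
cache = ResultCache(lambda sql: sorted(table))

first = cache.execute("SELECT v FROM t ORDER BY v")
second = cache.execute("SELECT v FROM t ORDER BY v")  # served from the cache

table.append(0)
cache.data_changed()
third = cache.execute("SELECT v FROM t ORDER BY v")   # re-run after the change

print(first, third, cache.executions)
```

The repeated query runs only twice across three calls: once initially and once after the data changes. A dashboard that reissues the same query against unchanged data would be served from the cache each time.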
Easy to set up, deploy, and manage
Amazon Redshift is simple to set up and operate. You can deploy a new data warehouse with just a few clicks in the Amazon Web Services Management Console, and Redshift automatically provisions the infrastructure for you. Most administrative tasks are automated, such as backups and replication, so you can focus on your data, not the administration. When you want control, Redshift provides options to help you make adjustments tuned to your specific workloads. New capabilities are released transparently, eliminating the need to schedule and apply upgrades and patches.
Amazon Redshift automatically and continuously backs up your data to Amazon S3. Redshift can asynchronously replicate your snapshots to S3 in another region for disaster recovery. You can use any system or user snapshot to restore your cluster using the Amazon Web Services Management Console or the Redshift APIs. Your cluster is available as soon as the system metadata has been restored, and you can start running queries while user data is spooled down in the background.
Amazon Redshift has multiple features that enhance the reliability of your data warehouse cluster. Redshift continuously monitors the health of the cluster, and automatically re-replicates data from failed drives and replaces nodes as necessary for fault tolerance.
Amazon Redshift gives you the flexibility to execute queries within the console or to connect the SQL client tools, libraries, or business intelligence tools you love. The Query Editor in the Amazon Web Services console provides a powerful interface for executing SQL queries on Redshift clusters and for viewing the query results and query execution plan (for queries executed on compute nodes) adjacent to your queries.
Integrated with third-party tools
Enhance Amazon Redshift by working with industry-leading tools and experts for loading, transforming and visualizing data.
No upfront costs, pay as you go
Amazon Redshift is the most cost-effective data warehouse, and you pay only for the resources you provision. Redshift is the only cloud data warehouse that offers On-Demand pricing with no upfront costs, Reserved Instance pricing, which can save you up to 75% when you commit to a 1- or 3-year term, and per-query pricing based on the amount of data scanned in your Amazon S3 data lake. For more information, see the Amazon Redshift pricing page.
Choose your node type
You can select from two node types to optimize Redshift for your data warehousing needs. Dense Compute (DC) nodes allow you to create very high performance data warehouses using fast CPUs, large amounts of RAM, and solid-state disks (SSDs). If you want to scale further or reduce costs, you can switch to our more cost-effective Dense Storage (DS) node types that use larger hard disk drives for a very low price point. Scaling your cluster or switching between node types requires a single API call or a few clicks in the Amazon Web Services Management Console.
Scale quickly to meet your needs
Petabyte-scale data warehousing
Amazon Redshift scales quickly as your needs change. With a few clicks in the console or a simple API call, you can change the number or type of nodes in your data warehouse to scale up or down.
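As a rough illustration of the API route, the sketch below wraps the `resize_cluster` operation exposed by the boto3 Redshift client. The wrapper name `resize_cluster_nodes`, the cluster identifier `example-cluster`, and the `RecordingClient` stand-in are assumptions made so the example runs without AWS credentials; only the keyword arguments passed to `resize_cluster` come from the boto3 API.

```python
from typing import Any, Dict, List, Optional


def resize_cluster_nodes(client: Any,
                         cluster_id: str,
                         number_of_nodes: int,
                         node_type: Optional[str] = None) -> Any:
    """Request a resize through a Redshift client's resize_cluster call."""
    params: Dict[str, Any] = {
        "ClusterIdentifier": cluster_id,
        "NumberOfNodes": number_of_nodes,
    }
    if node_type is not None:
        # Only sent when switching node types as part of the resize.
        params["NodeType"] = node_type
    return client.resize_cluster(**params)


class RecordingClient:
    """Dry-run stand-in for a real client: records the request, calls nothing."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def resize_cluster(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(kwargs)
        return kwargs


# Dry run: shows the request that would be sent for a hypothetical cluster.
dry_run = RecordingClient()
request = resize_cluster_nodes(dry_run, "example-cluster",
                               number_of_nodes=4, node_type="dc2.large")
print(request)
```

With boto3 installed and credentials configured, passing `boto3.client("redshift")` in place of `RecordingClient()` would send the request to the service. Check the cluster's current state and the resize options supported by your node type before doing so.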
Exabyte-scale data lake analytics
Redshift Spectrum, a feature of Redshift, enables you to run queries against exabytes of data in Amazon S3 without having to load or transform any data. You can use S3 as a highly available, secure, and cost-effective data lake to store unlimited data in open data formats.
Amazon Redshift provides consistently fast performance, even with thousands of concurrent queries, whether they query data in your Amazon Redshift data warehouse or directly in your Amazon S3 data lake.
Query your data lake
Amazon S3 data lake
Amazon Redshift is the only data warehouse that extends your queries to your Amazon S3 data lake without loading data. You can query open file formats you already use, such as Avro, CSV, Grok, JSON, ORC, Parquet, and more, directly in S3. This gives you the flexibility to store highly structured, frequently accessed data on Redshift local disks, keep exabytes of structured and unstructured data in S3, and query seamlessly across both to provide unique insights that you would not be able to obtain by querying independent datasets.
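To show what querying across both stores can look like, the sketch below builds a SQL statement that joins a table on Redshift local disks with an external table backed by files in S3. All names are hypothetical: it assumes an external schema called `spectrum` has already been created (for example with `CREATE EXTERNAL SCHEMA`), an external table `spectrum.clickstream`, and a local table `public.customers` sharing a `customer_id` column.

```python
def build_lake_join(local_table: str, external_table: str, key: str) -> str:
    """Build a join between a local table and an external (S3-backed) table.

    The identifiers are interpolated directly, so pass only trusted,
    hard-coded names, never user input.
    """
    return (
        f"SELECT l.{key}, COUNT(*) AS event_count\n"
        f"FROM {local_table} AS l\n"
        f"JOIN {external_table} AS e ON e.{key} = l.{key}\n"
        f"GROUP BY l.{key};"
    )


# Hypothetical table and column names, for illustration only.
sql = build_lake_join("public.customers", "spectrum.clickstream", "customer_id")
print(sql)
```

The resulting statement references the external table by its schema-qualified name just like a local table, so it can be submitted through any SQL client connected to the cluster. The data behind `spectrum.clickstream` stays in S3 and is not loaded into the cluster.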