Get Started with the Project

7 Steps  |  60 Minutes

Q: What is data warehousing?

Analytics is ubiquitous. We all use reports and dashboards to manage our work and report progress to stakeholders, and we run ad hoc analyses to support decision making. Under the hood, these reports, dashboards, and BI tools are powered by data warehouses, which store data efficiently to minimize I/O and deliver query results quickly to hundreds or thousands of concurrent users. Unlike transactional databases, data warehouses use specialized architectures and storage for fast query and data load performance. Data warehouses also need to be highly scalable so that you can keep adding data sources to enrich analytics and insights. Lastly, data warehouses should integrate seamlessly with third-party business intelligence tools and SQL clients, and support standard SQL so that customers can use skills they already have.

Q: Why should I run data warehousing on Amazon Web Services?

Amazon Redshift, our data warehousing solution, is fast, easy to use, and fully managed. It automates infrastructure provisioning and administrative tasks such as backups, replication, and patching. It integrates seamlessly with third-party BI and ETL tools, so you can get to your first report in just a few minutes. There is no limit to the amount of data you can load and analyze, and as your data grows, you don’t have to worry about expensive system upgrades or slow performance. Amazon Redshift is fast at any scale because it uses columnar storage and several optimization techniques. Amazon Redshift is also cost-effective: you pay only for what you use.

Q: What is Amazon Redshift?

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools. 

Q: How does the performance of Amazon Redshift compare to most traditional databases for data warehousing and analytics?

Amazon Redshift uses a variety of innovations to achieve up to ten times higher performance than traditional databases for data warehousing and analytics workloads:

  • Columnar Data Storage: Instead of storing data as a series of rows, Amazon Redshift organizes the data by column. Unlike row-based systems, which are ideal for transaction processing, column-based systems are ideal for data warehousing and analytics, where queries often involve aggregates performed over large data sets. Since only the columns involved in the queries are processed and columnar data is stored sequentially on the storage media, column-based systems require far fewer I/Os, greatly improving query performance.
  • Advanced Compression: Columnar data stores can be compressed much more than row-based data stores because similar data is stored sequentially on disk. Amazon Redshift employs multiple compression techniques and can often achieve significant compression relative to traditional relational data stores. In addition, Amazon Redshift doesn't require indexes or materialized views and so uses less space than traditional relational database systems. When loading data into an empty table, Amazon Redshift automatically samples your data and selects the most appropriate compression scheme.
  • Massively Parallel Processing (MPP): Amazon Redshift automatically distributes data and query load across all nodes. Amazon Redshift makes it easy to add nodes to your data warehouse and enables you to maintain fast query performance as your data warehouse grows.
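
The three ideas above can be illustrated with a small, self-contained Python sketch. It is a toy model only, not how Amazon Redshift is implemented internally: the sample table, the helper names (to_columns, run_length_encode, distribute), and the modulo-based distribution rule are all assumptions made for illustration. It shows that an aggregate over one column touches fewer values in a columnar layout, that a column of similar adjacent values compresses well with run-length encoding, and that per-node partial results can be combined into the same answer.

```python
from itertools import groupby

# Hypothetical sample table: (order_id, region, amount).
rows = [
    (1, "east", 10.0),
    (2, "east", 20.0),
    (3, "east", 5.0),
    (4, "west", 7.5),
    (5, "west", 2.5),
]


def to_columns(rows, names):
    """Pivot row tuples into a dict mapping column name -> list of values."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def distribute(rows, num_nodes):
    """Assign each row to a node by taking its first field modulo num_nodes."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[row[0] % num_nodes].append(row)
    return nodes


columns = to_columns(rows, ["order_id", "region", "amount"])

# Row layout: summing `amount` still reads every field of every row.
row_values_touched = sum(len(row) for row in rows)
total_from_rows = sum(row[2] for row in rows)

# Column layout: the same aggregate reads only the `amount` column.
column_values_touched = len(columns["amount"])
total_from_columns = sum(columns["amount"])

# Compression: adjacent equal values in a column collapse into short runs.
encoded_region = run_length_encode(columns["region"])

# Parallelism: each node sums its own slice, then the partials are combined.
nodes = distribute(rows, num_nodes=2)
partial_sums = [sum(row[2] for row in node) for node in nodes]

print(row_values_touched, column_values_touched)
print(encoded_region)
print(partial_sums, sum(partial_sums))
```

In this toy table the row layout reads 15 values to compute the total while the column layout reads 5, and the five region values collapse into two runs. Real savings depend on table width, data distribution, and the encoding chosen.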

Q: How do I access my running data warehouse cluster?

Once your data warehouse cluster is available, you can retrieve its endpoint and its JDBC and ODBC connection strings from the Amazon Web Services Management Console or by using the Redshift APIs. You can then use a connection string with your favorite database tool, programming language, or Business Intelligence (BI) tool. You will also need to authorize network requests to your running data warehouse cluster. For a detailed explanation, please refer to our Getting Started Guide.
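
As a rough sketch of what such a connection string looks like, the Python snippet below assembles a JDBC URL from an endpoint. The host name, the database name "dev", and the helper build_jdbc_url are placeholders assumed for illustration; the console shows the actual values for your cluster. The endpoint dictionary mirrors the Address and Port fields that the Redshift API returns for a cluster endpoint, and 5439 is the default Redshift port.

```python
def build_jdbc_url(host: str, port: int, database: str) -> str:
    """Format a JDBC URL of the form jdbc:redshift://host:port/database."""
    return f"jdbc:redshift://{host}:{port}/{database}"


# Placeholder endpoint (assumption): substitute the values for your own cluster.
endpoint = {
    "Address": "examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com",
    "Port": 5439,
}

# "dev" is a placeholder database name, not a value taken from this document.
jdbc_url = build_jdbc_url(endpoint["Address"], endpoint["Port"], "dev")
print(jdbc_url)
```

The resulting string can be pasted into any SQL client or BI tool that uses the Amazon Redshift JDBC driver, provided the cluster accepts network requests from the client.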

Q: Is Amazon Redshift compatible with my preferred business intelligence software package and ETL tools?

Amazon Redshift uses industry-standard SQL and is accessed using standard JDBC and ODBC drivers. You can download Amazon Redshift custom JDBC and ODBC drivers from the Connect Client tab of our Console. We have validated integrations with popular BI and ETL vendors, a number of which are offering free trials to help you get started loading and analyzing your data.
