Amazon EMR is a managed service that makes it fast, easy, and cost-effective to run Apache Hadoop and Spark to process vast amounts of data. Amazon EMR also supports powerful and proven Hadoop tools such as Presto, Hive, Pig, HBase, and more. In this project, you will deploy a fully functional Hadoop cluster, ready to analyze log data, in just a few minutes. You will start by launching an Amazon EMR cluster and then use a HiveQL script to process sample log data stored in an Amazon S3 bucket. HiveQL is a SQL-like scripting language for data warehousing and analysis. You can then use a similar setup to analyze your own log files.
What you'll need before starting:
An Amazon Web Services Account: You will need an Amazon Web Services account to begin provisioning the resources for your Amazon EMR cluster. Sign up for Amazon Web Services.
IT Experience: Prior experience with Hadoop is recommended, but not required, to complete this project.
Amazon Web Services Experience: Basic familiarity with Amazon S3 and Amazon EC2 key pairs is suggested, but not required, to complete this project.
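As a rough preview of the workflow described in the introduction (launch a cluster, then run a HiveQL script against log data in Amazon S3), the sketch below assembles the request that such a launch might use. It is a minimal illustration under stated assumptions, not the project's own procedure: the bucket paths, key pair name, release label, instance types, and the `INPUT`/`OUTPUT` variable names are all hypothetical placeholders, and the helper functions `build_hive_step` and `build_cluster_request` are invented for this example. The code only builds dictionaries and does not contact AWS.

```python
import json


def build_hive_step(script_uri, input_uri, output_uri):
    """Describe an EMR step that runs a HiveQL script stored in S3.

    The INPUT and OUTPUT variable names are assumptions; they must match
    whatever variables the HiveQL script itself refers to.
    """
    return {
        "Name": "Process sample log data with Hive",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hive-script", "--run-hive-script", "--args",
                "-f", script_uri,
                "-d", f"INPUT={input_uri}",
                "-d", f"OUTPUT={output_uri}",
            ],
        },
    }


def build_cluster_request(name, log_uri, key_pair, steps):
    """Assemble keyword arguments for launching a small Hive-enabled cluster."""
    return {
        "Name": name,
        "LogUri": log_uri,
        "ReleaseLabel": "emr-6.15.0",  # assumption: any release that includes Hive
        "Applications": [{"Name": "Hadoop"}, {"Name": "Hive"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumption: sizes are illustrative
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,
            "Ec2KeyName": key_pair,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "Steps": steps,
        "JobFlowRole": "EMR_EC2_DefaultRole",  # default EMR roles, assumed to exist
        "ServiceRole": "EMR_DefaultRole",
    }


# Hypothetical S3 locations and key pair name, used for illustration only.
step = build_hive_step(
    "s3://example-bucket/scripts/analyze_logs.q",
    "s3://example-bucket/input",
    "s3://example-bucket/output",
)
request = build_cluster_request(
    "log-analysis", "s3://example-bucket/emr-logs/", "example-key-pair", [step]
)
print(json.dumps(request, indent=2))
```

With the boto3 library installed and AWS credentials configured, a request of this shape could be passed to `boto3.client("emr").run_job_flow(**request)`. Launching a real cluster incurs charges, so review the values first.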