Q: How does Kinesis Data Analytics for SQL differ from Amazon Managed Service for Apache Flink Studio and Amazon Managed Service for Apache Flink? Which should I use?

We do not recommend building new applications with Kinesis Data Analytics for SQL. Instead, we recommend building new applications with Amazon Managed Service for Apache Flink Studio or Amazon Managed Service for Apache Flink. We also recommend that customers with existing Kinesis Data Analytics for SQL applications migrate them to one of these two offerings. You can use the Migration Examples in the developer guide to help with this task.

Q: How do I migrate to Amazon Managed Service for Apache Flink Studio?

A library of code examples and example architectures is available to help with migration. The code examples cover common queries such as windows and aggregation, transforming date time values, joins, alerts and errors, and more.

Configuring input for SQL applications

Q: What inputs are supported in a Kinesis Data Analytics SQL application?

SQL applications in Kinesis Data Analytics support two types of inputs: streaming data sources and reference data sources. A streaming data source is continuously generated data that is read into your application for processing. A reference data source is static data that your application uses to enrich data coming in from streaming sources. Each application can have no more than one streaming data source and no more than one reference data source. An application continuously reads and processes new data from streaming data sources, including Amazon Kinesis Data Streams or Amazon Kinesis Data Firehose. An application reads a reference data source, including Amazon S3, in its entirety for use in enriching the streaming data source through SQL JOINs.

Q: What is a reference data source?

A reference data source is static data that your application uses to enrich data coming in from streaming sources. You store reference data as an object in your S3 bucket. When the SQL application starts, Kinesis Data Analytics reads the S3 object and creates an in-application SQL table to store the reference data. Your application code can then join it with an in-application stream. You can update the data in the SQL table by calling the UpdateApplication API.

Q: How do I set up a streaming data source in my SQL application?

A streaming data source can be an Amazon Kinesis data stream or an Amazon Kinesis Data Firehose delivery stream. Your Kinesis Data Analytics SQL application continuously reads new data from streaming data sources as it arrives in real time. The data is made accessible in your SQL code through an in-application stream. An in-application stream acts like a SQL table because you can create, insert, and select from it. However, the difference is that an in-application stream is continuously updated with new data from the streaming data source.

You can use the Amazon Web Services Management Console to add a streaming data source. You can learn more about sources in the Configuring Application Input section of the Kinesis Data Analytics for SQL Developer Guide.

Q: How do I set up a reference data source in my SQL application?

A reference data source can be an Amazon S3 object. Your Kinesis Data Analytics SQL application reads the S3 object in its entirety when it starts running. The data is made accessible in your SQL code through a table. The most common use case for using a reference data source is to enrich the data coming from the streaming data source using a SQL JOIN.

Using the Amazon CLI, you can add a reference data source by specifying the S3 bucket, object, IAM role, and associated schema. Kinesis Data Analytics loads this data when you start the application, and reloads it each time you make any update API call.
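As a minimal sketch of the enrichment use case described above, the following application code joins the streaming source with a reference table. The table name "COMPANY_NAMES" and its columns TICKER and COMPANY are hypothetical, as is the pump name; substitute the table name and schema you specified when adding the reference data source.

```sql
-- Output stream that carries the enriched records.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    company       VARCHAR(64),
    price         DOUBLE);

-- "COMPANY_NAMES" is a hypothetical in-application reference table
-- (columns TICKER and COMPANY) loaded from the S3 object.
-- LEFT JOIN keeps stream rows that have no match in the reference table.
CREATE OR REPLACE PUMP "ENRICH_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM s.ticker_symbol, c.company, s.price
    FROM "SOURCE_SQL_STREAM_001" AS s
    LEFT JOIN "COMPANY_NAMES" AS c
      ON s.ticker_symbol = c.ticker;
```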

Q: What data formats are supported for SQL applications?

SQL applications in Kinesis Data Analytics can detect the schema of UTF-8 encoded JSON and CSV records and parse them automatically using the DiscoverInputSchema API. This schema is applied to the data read from the stream as part of the insertion into an in-application stream.

For other UTF-8 encoded data that does not use a delimiter or uses a different delimiter than CSV, or in cases where the discovery API did not fully discover the schema, you can define a schema using the interactive schema editor or use string manipulation functions to structure your data. For more information, see Using the Schema Discovery Feature and Related Editing in the Amazon Kinesis Data Analytics for SQL Developer Guide.
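The string manipulation approach might look like the following sketch for fixed-width records with no delimiter. It assumes, purely for illustration, that the source schema maps each record to a single VARCHAR column named raw_record, with the ticker symbol in characters 1 to 4 and the price in characters 5 to 12. The column name, layout, and pump name are hypothetical.

```sql
-- Structured output stream built from the unstructured input column.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    price         DOUBLE);

-- raw_record is an assumed single-column mapping of each input record.
-- SUBSTRING slices the fixed-width fields; TRIM removes padding spaces
-- before the price text is cast to a number.
CREATE OR REPLACE PUMP "PARSE_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM
        SUBSTRING(raw_record FROM 1 FOR 4),
        CAST(TRIM(SUBSTRING(raw_record FROM 5 FOR 8)) AS DOUBLE)
    FROM "SOURCE_SQL_STREAM_001";
```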

Q: How is my input stream exposed to my SQL code?

Kinesis Data Analytics for SQL applies your specified schema and inserts your data into one or more in-application streams for streaming sources, and into a single SQL table for reference sources. The default number of in-application streams meets the needs of most use cases. Increase it if your application is not keeping up with the latest data in your source stream, as indicated by the CloudWatch metric MillisBehindLatest. The number of in-application streams required depends on both the throughput of your source stream and the complexity of your queries. The parameter that specifies the number of in-application streams mapped to your source stream is called input parallelism.

Authoring application code for SQL applications

Application code is a series of SQL statements that process input and produce output. These SQL statements operate on in-application streams and reference tables. An in-application stream is like a continuously updating table on which you can perform the SELECT and INSERT SQL operations. Your configured sources and destinations are exposed to your SQL code through in-application streams. You can also create additional in-application streams to store intermediate query results. 

You can use the following pattern to work with in-application streams: 

  • Always use a SELECT statement in the context of an INSERT statement. When you select rows, you insert results into another in-application stream.
  • Use an INSERT statement in the context of a pump. You use a pump to make an INSERT statement continuous, and write to an in-application stream.
  • You use a pump to tie in-application streams together, selecting from one in-application stream and inserting into another in-application stream.

The following SQL code provides a simple, working application:  

CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    change DOUBLE,
    price DOUBLE);

CREATE OR REPLACE PUMP "STREAM_PUMP" AS 
  INSERT INTO "DESTINATION_SQL_STREAM"    
    SELECT STREAM ticker_symbol, change, price    
    FROM "SOURCE_SQL_STREAM_001";
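As a variation on the application above, the following sketch shows the pump pattern with an intermediate in-application stream: one pump filters the source into an intermediate stream, and a second pump selects from that stream into the destination. The stream name "FILTERED_STREAM", the pump names, and the filter threshold are illustrative assumptions.

```sql
-- Intermediate stream holding only the rows that pass the filter.
CREATE OR REPLACE STREAM "FILTERED_STREAM" (
    ticker_symbol VARCHAR(4),
    change DOUBLE,
    price DOUBLE);

-- First pump: source stream -> intermediate stream.
CREATE OR REPLACE PUMP "FILTER_PUMP" AS
  INSERT INTO "FILTERED_STREAM"
    SELECT STREAM ticker_symbol, change, price
    FROM "SOURCE_SQL_STREAM_001"
    WHERE ABS(change) > 1.0;

-- Destination stream with a narrower set of columns.
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    price DOUBLE);

-- Second pump: intermediate stream -> destination stream.
CREATE OR REPLACE PUMP "OUTPUT_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol, price
    FROM "FILTERED_STREAM";
```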

For more information about application code, see Application Code in the Amazon Kinesis Data Analytics for SQL Developer Guide. 

Q: How does Kinesis Data Analytics help me with writing SQL code?

Kinesis Data Analytics includes a library of analytics templates for common use cases including streaming filters, tumbling time windows, and anomaly detection. You can access these templates from the SQL editor in the Amazon Web Services Management Console. After you create an application and navigate to the SQL editor, the templates are available in the upper-left corner of the console.
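To give a sense of the kind of query these templates cover, here is a minimal sketch of a tumbling time window that counts records per ticker symbol in one-minute windows. It is an illustration written for this FAQ, not the text of a console template, and the output column name ticker_count is an assumption.

```sql
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    ticker_count  INTEGER);

-- STEP rounds each row's ROWTIME down to the start of its 60-second
-- interval, so grouping by it yields non-overlapping (tumbling) windows.
CREATE OR REPLACE PUMP "STREAM_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol, COUNT(*) AS ticker_count
    FROM "SOURCE_SQL_STREAM_001"
    GROUP BY ticker_symbol,
             STEP("SOURCE_SQL_STREAM_001".ROWTIME BY INTERVAL '60' SECOND);
```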

Q: How can I perform real-time anomaly detection in Kinesis Data Analytics?

Kinesis Data Analytics includes pre-built SQL functions for several advanced analytics, including one for anomaly detection. You can call this function from your SQL code to detect anomalies in real time. Kinesis Data Analytics uses the Random Cut Forest algorithm to implement anomaly detection.
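A minimal sketch of such a call, using the RANDOM_CUT_FOREST function with its default parameters, might look as follows. The choice of input columns and the pump name are assumptions for illustration. The function scores the numeric columns of each input row and returns the input columns together with an ANOMALY_SCORE column.

```sql
CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" (
    ticker_symbol VARCHAR(4),
    price         DOUBLE,
    anomaly_score DOUBLE);

-- The CURSOR feeds rows from the source stream to the function;
-- here only price is numeric, so it alone contributes to the score.
CREATE OR REPLACE PUMP "ANOMALY_PUMP" AS
  INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM ticker_symbol, price, anomaly_score
    FROM TABLE(RANDOM_CUT_FOREST(
        CURSOR(SELECT STREAM ticker_symbol, price
               FROM "SOURCE_SQL_STREAM_001")));
```

Higher scores indicate records that are more unusual relative to recent data; a downstream pump could filter on a threshold that suits your data.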

Configuring destinations in SQL applications

Q: What destinations are supported?

Kinesis Data Analytics for SQL supports up to three destinations per application. You can persist SQL results to Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service (all through Amazon Kinesis Data Firehose), as well as to Amazon Kinesis Data Streams. To write to a destination that Kinesis Data Analytics does not support directly, send SQL results to Amazon Kinesis Data Streams and use its integration with Amazon Lambda to forward them to a destination of your choice.

Q: How do I set up a destination?

In your application code, you write the output of SQL statements to one or more in-application streams. Optionally, you can add an output configuration to your application to persist everything written to specific in-application streams to up to three external destinations. These external destinations can be an Amazon S3 bucket, an Amazon Redshift table, or an Amazon Elasticsearch Service domain (each through Amazon Kinesis Data Firehose), or an Amazon Kinesis data stream. Each application supports up to three destinations, in any combination of the above. For more information, see Configuring Output Streams in the Amazon Kinesis Data Analytics for SQL Developer Guide.

Q: My preferred destination is not directly supported. How can I send SQL results to this destination?

You can use Amazon Lambda to write to a destination that Kinesis Data Analytics for SQL applications do not support directly. We recommend that you write results to an Amazon Kinesis data stream, and then use Amazon Lambda to read the processed results and send them to the destination of your choice. For more information, see Example: Amazon Lambda Integration in the Amazon Kinesis Data Analytics for SQL Developer Guide. Alternatively, you can use a Kinesis Data Firehose delivery stream to load the data into Amazon S3, and then trigger an Amazon Lambda function to read that data and send it to the destination of your choice.

Q: What delivery model does Kinesis Data Analytics provide?

SQL applications in Kinesis Data Analytics use an "at least once" delivery model for application output to the configured destinations. Kinesis Data Analytics applications take internal checkpoints, which are points in time when output records were delivered to the destinations and there was no data loss. The service uses the checkpoints as needed to ensure that your application output is delivered at least once to the configured destinations. For more information about the delivery model, see Configuring Application Output in the Amazon Kinesis Data Analytics for SQL Developer Guide.
