Tracking Pixel driven web analytics with Amazon Web Services Edge Services: Part 1
Being able to analyze web traffic and user behavior is essential to understanding the impacts of new features, content updates, or current product iterations for websites and applications. Tracking website activity can provide insight into who visits your website, where they come from, and what content they view. A web beacon is a common technique used to track user behavior. By embedding a small piece of HTML, a company can increase visibility into how users interact with its products. Common metrics tracked by web beacons are events-per-hour, visitor count, user agents, abnormal events, aggregate event count, referrers, and recent events. Although there are many types of HTML elements that can be used as beacons, images were the first web beacons used and are the focus of this post.
A tracking pixel is a 1×1 pixel image whose loading request carries the tracking information to a backend server. Instead of using a traditional JavaScript API call, the information is sent in the query parameters of the image GET request and in the HTTP headers themselves, so the pixel can be included in any component that supports HTML, like a webpage or even an email.
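As a minimal sketch of how tracking data rides on the image request, the snippet below builds a pixel URL and the matching `<img>` element. The CloudFront domain, the object key `pixel.png`, and the parameter names `event`, `page`, and `campaign` are hypothetical placeholders, not values from this post.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical distribution domain and object key; replace with your own.
PIXEL_BASE_URL = "https://d111111abcdef8.cloudfront.net/pixel.png"


def build_pixel_url(event: str, page: str, campaign: str) -> str:
    """Encode the tracking fields as query parameters on the image URL."""
    query = urlencode({"event": event, "page": page, "campaign": campaign})
    return f"{PIXEL_BASE_URL}?{query}"


def build_img_tag(pixel_url: str) -> str:
    """Return a 1x1 <img> element; rendering it triggers the GET request."""
    safe_url = escape(pixel_url, quote=True)  # '&' becomes '&amp;' inside HTML
    return f'<img src="{safe_url}" width="1" height="1" alt="">'


url = build_pixel_url("page_view", "/pricing", "spring sale")
tag = build_img_tag(url)
print(tag)
```

Because the element is plain HTML, the same tag can be placed in a webpage or an email body without any JavaScript.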
We covered building a tracking pixel in our post
Architecture overview
The architecture for the pixel tracking solution starts with CloudFront, which is a CDN service built for high performance and security. A 1×1 pixel image is stored in an origin, which is specified as the location where CloudFront sends requests for files. A distribution can use several different kinds of origins. Using Amazon S3 as an origin for the distribution has CloudFront deliver files stored in the S3 bucket. When a request is made to fetch the pixel image, the tracking data is passed with the request. The information on this request is collected and stored by CloudFront real-time logs in fields such as c-ip, timestamp, and cs-uri-query. The sampling rate controls the percentage of log records that are received. For a pixel tracking use case, it should be set to 100%. Then, the real-time log configuration is attached to the distribution’s cache behavior.
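To illustrate how the logged fields can be turned back into tracking data, here is a small sketch that parses one log record. It assumes records arrive as tab-delimited lines and that only `timestamp`, `c-ip`, and `cs-uri-query` were selected, in that order; the sample line and the `params` key are made up for the example.

```python
from urllib.parse import parse_qs

# Assumed field order; it must match the fields chosen in your
# real-time log configuration.
FIELDS = ["timestamp", "c-ip", "cs-uri-query"]


def parse_log_record(line: str) -> dict:
    """Split one tab-delimited log line and decode the query string."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    # Keep the first value of each query parameter for simplicity.
    record["params"] = {
        key: vals[0] for key, vals in parse_qs(record["cs-uri-query"]).items()
    }
    return record


# Hypothetical sample record for illustration only.
sample = "1700000000.123\t203.0.113.10\tevent=page_view&page=%2Fpricing\n"
parsed = parse_log_record(sample)
print(parsed["c-ip"], parsed["params"])
```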
With real-time logs, you can customize the information collected and where it gets delivered. The real-time logs are integrated with Amazon Kinesis Data Streams.
Amazon Kinesis Data Firehose is a streaming ETL service that loads streaming data into data stores and analytics tools. The streaming information is buffered for up to 15 minutes to consolidate it into fewer files to store in the data lake. This helps minimize the costs of storage and future queries. Data lakes help break down data silos to maximize end-to-end data insights. Amazon S3 is the best place to build data lakes because of its unmatched durability, availability, scalability, security, compliance, and audit capabilities. A data catalog provides search and query capabilities over the data stored in the data lake.
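The effect of the buffer interval on file count can be estimated with simple arithmetic. The sketch below assumes a single delivery stream that writes one object per flush and flushes only when the time interval elapses (it ignores size-based flushes), so treat the numbers as an upper-level estimate rather than a guarantee.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def objects_per_day(buffer_interval_seconds: int) -> int:
    """Estimate how many objects a time-based flush writes per day."""
    if buffer_interval_seconds <= 0:
        raise ValueError("buffer interval must be positive")
    return math.ceil(SECONDS_PER_DAY / buffer_interval_seconds)


fifteen_minutes = objects_per_day(15 * 60)  # 96 objects per day
one_minute = objects_per_day(60)            # 1440 objects per day
print(fifteen_minutes, one_minute)
```

Under these assumptions, a 15-minute buffer produces 15 times fewer objects than a 1-minute buffer, which means fewer files for later queries to open.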
Cost optimization
Data transfer out (DTO) for CloudFront is free for origin fetches from any Amazon Web Services origin. Using Amazon S3 to host the pixel image ensures no charges are incurred when fetching the image. A further improvement is adding aggregation. Aggregation refers to the storage of multiple records in a Kinesis data stream record; it increases the number of records sent per API call and increases producer throughput. This is a cost optimization because, at the time of this publication, Kinesis Data Firehose ingestion pricing is tiered and billed per GB ingested in 5KB increments. You can use the Kinesis Producer Library (KPL) to aggregate records. However, KPL is currently only available as a Java API wrapper around a C++ executable, which may not be suitable for all deployment environments. An alternative is a Lambda function that aggregates by reading with the LATEST shard iterator, processing incoming records, and combining them into the Kinesis user record. The Lambda function can be configured to produce a user record that contains roughly 5KB of Kinesis records. Then, the Lambda function publishes to another Kinesis data stream, and the data is compatible with consumers using the
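To see why 5KB billing increments reward aggregation, the sketch below rounds each record up to the next 5KB and compares the billed volume before and after packing. The packing step is a simple newline-delimited batching stand-in written for illustration; it is not the KPL aggregation format, and the record sizes are hypothetical.

```python
import math

INCREMENT_BYTES = 5 * 1024  # 5KB billing increment described above


def billed_bytes(record_sizes) -> int:
    """Round every record up to the next 5KB increment and sum the result."""
    return sum(
        math.ceil(size / INCREMENT_BYTES) * INCREMENT_BYTES for size in record_sizes
    )


def aggregate(records, target_bytes: int = INCREMENT_BYTES):
    """Greedily pack byte records into newline-delimited batches.

    A batch is flushed before it would exceed target_bytes; a single record
    larger than target_bytes is emitted as its own batch.
    """
    batches, current, current_size = [], [], 0
    for rec in records:
        size = len(rec) + 1  # +1 for the newline delimiter
        if current and current_size + size > target_bytes:
            batches.append(b"\n".join(current) + b"\n")
            current, current_size = [], 0
        current.append(rec)
        current_size += size
    if current:
        batches.append(b"\n".join(current) + b"\n")
    return batches


# 100 hypothetical log records of 511 bytes each (512 with the delimiter).
records = [b"x" * 511 for _ in range(100)]
batches = aggregate(records)

unaggregated = billed_bytes(len(r) for r in records)
aggregated = billed_bytes(len(b) for b in batches)
print(len(batches), unaggregated, aggregated)
```

In this example each small record is billed as a full 5KB on its own, so packing ten of them into one 5KB batch cuts the billed volume by a factor of ten.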
Potential use cases
Advertising Technology (Ad Tech)
Ad tech exchange servers track billable beacons through a combination of data collection, tracking mechanisms, and reporting systems. CloudFront real-time logs can help you monitor and analyze the incoming beacons in real time, providing valuable insights into ad performance and enabling prompt decision-making. Ad tech platforms collect data from various sources, such as publishers, advertisers, and third-party providers, including user behavior, ad impressions, clicks, conversions, and other relevant metrics. Tracking mechanisms like cookies, pixels, tags, or JavaScript code embedded within web pages or mobile apps are used to monitor user interactions with ads. When a user interacts with an ad, a beacon is sent to the ad tech exchange server.
Personal blog
A personal blog would leverage pixel tracking in the way demonstrated in this post for similar reasons to ad tech, but at a smaller scale and more personal level. A person could track activities across particular blog topics or have a deeper understanding of their viewers’ geographic locations to drive their content selection. If more of their users are consuming their content on a mobile platform, then perhaps it makes more sense to tailor blogging toward more short-form content to be easily digested on-the-go. On the smaller scale of a personal blog, each impression is inherently more valuable than impressions at scale, and using pixel tracking to squeeze every insight possible out of these impressions can be a differentiator for growth.
E-Commerce
With the ultimate objective of getting a user to make a purchase, e-commerce platforms that have a better understanding of the online patterns and habits of their users can drive increased sales. Is the front page failing to recommend pertinent items, so that users end their browsing session? Are users abandoning their cart at a specific step in the sales cycle? Answering these questions through the analysis of information collected through tracking pixels can create a more integrated and streamlined online shopping experience for customers.
Conclusion
In this post, we showed how you can replace web beacon servers with Amazon Web Services edge services, using CloudFront. Amazon Web Services edge services improve performance by moving compute, data processing, and storage closer to end-user devices. CloudFront then leverages the Amazon Web Services analytical services Kinesis Data Streams, Kinesis Data Firehose, Amazon S3, Glue Data Catalog, and Athena to enable insight into website activity. This solution’s use of managed and serverless services makes it advantageous over a traditional beacon server by providing automatic scaling and cost savings, with a pay-for-use billing model. If you are interested in learning other ways to customize at the edge, then check out our
In Part 2 of this series, we demonstrate how to implement the pixel tracking solution with CloudFront real-time logs.