This Git repository hosts a data engineering project focused on real-time streaming data analysis in the AWS ecosystem. Using AWS Cloud9, Kinesis Streams / Data Firehose, S3 buckets, Glue, and Athena, the project demonstrates an end-to-end pipeline for ingesting, processing, and querying streaming data.
Key Features:
- Uses AWS Cloud9 to generate and manage events in the cloud.
- Uses boto3 to interact with AWS services from Python.
- Delivers data records through Kinesis Data Firehose and stores them in Parquet format in an S3 bucket.
- Uses AWS Glue to build the data catalog automatically, so the stored data is discoverable and queryable.
- Enables SQL-based exploration and analysis of the streamed data using Athena.

This repository is intended as a reference for data engineers who want to design and implement real-time data processing pipelines on AWS.
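
As an illustration of the event generation and Firehose delivery steps listed under Key Features, here is a minimal sketch of how events might be sent with boto3. It is not taken from the repository: the stream name `example-delivery-stream`, the event fields, and the helper names `make_event`, `encode_record`, `send_events`, and `main` are all hypothetical. It assumes the delivery stream already exists and that AWS credentials are configured (as they typically are in a Cloud9 environment).

```python
import json
import random
from datetime import datetime, timezone

# Hypothetical delivery stream name; replace with the one in your account.
STREAM_NAME = "example-delivery-stream"

# Firehose accepts at most 500 records per PutRecordBatch call.
MAX_BATCH_SIZE = 500


def make_event(rng):
    """Build one synthetic event (field names are illustrative only)."""
    return {
        "event_id": rng.randrange(1_000_000),
        "event_type": rng.choice(["click", "view", "purchase"]),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def encode_record(event):
    """Serialize an event as newline-delimited JSON bytes for Firehose."""
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}


def send_events(client, stream_name, events, batch_size=MAX_BATCH_SIZE):
    """Send events in batches and return the number of failed records."""
    failed = 0
    for start in range(0, len(events), batch_size):
        batch = [encode_record(e) for e in events[start:start + batch_size]]
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=batch,
        )
        failed += response.get("FailedPutCount", 0)
    return failed


def main():
    """Send a handful of synthetic events to the delivery stream."""
    import boto3  # imported here so the helpers above work without boto3

    firehose = boto3.client("firehose")
    rng = random.Random()
    events = [make_event(rng) for _ in range(10)]
    print("failed records:", send_events(firehose, STREAM_NAME, events))
```

Calling `main()` from a Cloud9 terminal session would push ten events to the stream. The sketch only counts failed records; a fuller version would retry them. Conversion to Parquet happens inside Firehose itself when record format conversion is enabled against a Glue table schema, so the producer only needs to send JSON.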