Amazon Athena Training Summary
Overview of Amazon Athena
- Serverless Query Service: Athena is a serverless service that allows you to analyze data in S3 using SQL.
- Built on Presto: Uses the open-source Presto distributed SQL engine; newer Athena engine versions are based on Trino, a fork of Presto.
- Direct Analysis: Analyzes data directly in S3 without the need to move it.
- Supports Various Formats: Works with CSV, JSON, ORC, Avro, Parquet, among others.
- Cost-Effective: You pay per terabyte of data each query scans, and there is no database infrastructure to provision.
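Because billing depends on bytes scanned, a rough per-query cost can be estimated from the scan size. The Python sketch below is illustrative only: the USD 5.00 per TB price, the 10 MB per-query minimum, rounding up to whole megabytes, and treating 1 TB as 1024**4 bytes are assumptions to check against the current Athena pricing page for your Region, and `estimate_query_cost` is a hypothetical helper.

```python
import math

# Assumed values; verify against the current Athena pricing for your Region.
PRICE_PER_TB_USD = 5.00
MIN_BILLED_MB = 10          # assumed per-query minimum charge
BYTES_PER_MB = 1024 ** 2    # assumption: binary units
BYTES_PER_TB = 1024 ** 4


def estimate_query_cost(bytes_scanned: int,
                        price_per_tb: float = PRICE_PER_TB_USD,
                        min_billed_mb: int = MIN_BILLED_MB) -> float:
    """Estimate the USD cost of one query from the bytes it scanned.

    Rounds the scan up to whole megabytes and applies a per-query
    minimum, then converts the billed megabytes to a fraction of a TB.
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    billed_mb = max(math.ceil(bytes_scanned / BYTES_PER_MB), min_billed_mb)
    return billed_mb * BYTES_PER_MB / BYTES_PER_TB * price_per_tb
```

For example, comparing `estimate_query_cost(500 * 1024**3)` with the same query after the data has been converted to a smaller columnar, compressed layout shows how directly scan size drives the bill.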
Integration with Amazon QuickSight
- Reporting and Dashboards: Athena is often used with Amazon QuickSight for creating visual reports and dashboards.
Use Cases for Amazon Athena
- Ad-hoc querying
- Business intelligence (BI)
- Analytics and reporting
- Log analysis for AWS services (e.g., VPC Flow Logs, ELB access logs, CloudTrail logs)
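For ad-hoc or log-analysis queries run from code, a query is submitted, polled until it reaches a terminal state, and then its statistics can be read. The sketch below assumes the boto3 Athena client methods `start_query_execution` and `get_query_execution`; the helper name `run_athena_query`, the polling interval, and the error handling are illustrative choices, not an official pattern. The client is passed in as a parameter so the logic can be exercised without AWS credentials.

```python
import time
from typing import Any, Callable, Tuple

# States after which Athena will not change the query's status again.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(client: Any, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> Tuple[str, int]:
    """Submit a query, poll until it finishes, and return (query_id, bytes_scanned).

    `client` is expected to behave like boto3.client("athena").
    Raises RuntimeError if the query ends in any state other than SUCCEEDED.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        sleep(poll_seconds)  # still QUEUED or RUNNING: wait before polling again

    if state != "SUCCEEDED":
        reason = execution["Status"].get("StateChangeReason", "unknown")
        raise RuntimeError(f"Query {query_id} ended in {state}: {reason}")

    # Bytes read by the query; this is the figure Athena bills on.
    scanned = execution.get("Statistics", {}).get("DataScannedInBytes", 0)
    return query_id, scanned
```

With real credentials, a call might look like `run_athena_query(boto3.client("athena"), "SELECT eventname, count(*) AS n FROM cloudtrail_logs GROUP BY eventname", "default", "s3://your-bucket/athena-results/")`, where the `cloudtrail_logs` table, the `default` database, and the bucket path are placeholders for your own setup.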
Performance Improvements
- Columnar Data Formats: Store data as Apache Parquet or ORC. Because these formats let Athena read only the columns a query references, they reduce both the data scanned (cost) and query time.
- Data Compression: Compress data (for example, with Snappy or GZIP) to reduce the number of bytes Athena has to scan.
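- Data Partitioning: Organize data in S3 into partition folders by commonly filtered keys (for example, year/month/day); queries that filter on partition columns then scan only the matching folders.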
- Larger Files: Prefer fewer, larger files over many small ones; each file adds overhead (listing, opening, and reading its metadata), so fewer files make scans more efficient.
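The savings from columnar formats and compression can be reasoned about with a simplified model: a row-oriented file is read in full, a columnar file is read only for the referenced columns, and compression shrinks whatever is read. The Python sketch below uses hypothetical column sizes and an assumed compression ratio; it ignores file metadata, row-group skipping, and format-specific encodings, so treat it as an illustration rather than a prediction of Athena's actual scan size.

```python
import math
from typing import Dict, Iterable


def bytes_scanned_row_format(column_bytes: Dict[str, int]) -> int:
    """Row-oriented files (CSV, JSON): every row is read in full,
    so the whole file is scanned regardless of the selected columns."""
    return sum(column_bytes.values())


def bytes_scanned_columnar(column_bytes: Dict[str, int], selected: Iterable[str]) -> int:
    """Columnar files (Parquet, ORC): only the referenced columns are read.
    Simplified model: ignores footers, metadata, and row-group skipping."""
    return sum(column_bytes[name] for name in set(selected))


def with_compression(uncompressed_bytes: int, ratio: float) -> int:
    """Apply an assumed compression ratio (4.0 means 4:1), rounding up."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return math.ceil(uncompressed_bytes / ratio)
```

For instance, with hypothetical columns of 100, 200, and 700 MB, a query that references only the first two reads 300 MB from a columnar file instead of 1,000 MB from CSV, and about 75 MB if that columnar data is compressed 4:1.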
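Athena can use Hive-style partition folders such as `year=2024/month=05/`. The sketch below builds such prefixes and mimics partition pruning by keeping only the object keys whose partition values match a filter. The helpers `partition_prefix`, `parse_partitions`, and `prune` and the example keys are hypothetical; Athena itself prunes using the partition metadata registered for the table, not by running code like this.

```python
from typing import Dict, Iterable, List


def partition_prefix(table_root: str, **partitions: str) -> str:
    """Build a Hive-style prefix such as 'logs/year=2024/month=05/'.
    Keyword order determines folder order (most general key first)."""
    parts = [f"{key}={value}" for key, value in partitions.items()]
    return "/".join([table_root.rstrip("/"), *parts]) + "/"


def parse_partitions(key: str) -> Dict[str, str]:
    """Extract key=value folder segments from an S3 object key."""
    values: Dict[str, str] = {}
    for segment in key.split("/")[:-1]:  # the last segment is the file name
        if "=" in segment:
            name, _, value = segment.partition("=")
            values[name] = value
    return values


def prune(keys: Iterable[str], **filters: str) -> List[str]:
    """Keep only objects whose partition values match every filter,
    mimicking how a WHERE clause on partition columns limits what is read."""
    return [key for key in keys
            if all(parse_partitions(key).get(col) == val for col, val in filters.items())]
```

A query filtering on `year = '2024' AND month = '05'` would, in this model, touch only the objects under `partition_prefix("logs", year="2024", month="05")`, leaving every other month unread.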
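One way to act on the larger-files advice is to periodically merge small objects into fewer, larger ones. The sketch below only plans the merge: it groups files into batches whose total size stays at or under a target, using first-fit decreasing bin packing. The target size, file names, and the `plan_compaction` helper are hypothetical, and the actual rewrite (for example with an Athena CTAS query or a Spark job) is out of scope.

```python
from typing import List, Tuple


def plan_compaction(files: List[Tuple[str, int]], target_bytes: int) -> List[List[str]]:
    """Group (name, size) files into batches of at most target_bytes each,
    using first-fit decreasing. Each batch would be rewritten as one file.
    A file larger than the target ends up alone in its own batch."""
    batches: List[List[str]] = []
    used: List[int] = []
    # Place the largest files first so small ones fill the remaining gaps.
    for name, size in sorted(files, key=lambda f: f[1], reverse=True):
        for i, total in enumerate(used):
            if total + size <= target_bytes:
                batches[i].append(name)
                used[i] += size
                break
        else:
            # No existing batch has room: start a new one.
            batches.append([name])
            used.append(size)
    return batches
```

First-fit decreasing is a simple heuristic that usually yields few, well-filled batches; it is not guaranteed to be optimal, which is acceptable when the goal is just to reduce the file count.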
Federated Query Feature