ETL Technologies

Parquet File Format and AWS Glue

Parquet

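Apache Parquet is an open-source file format that stores data by column, rather than by row as CSV files do. Analytical queries usually touch only some of a table's columns, so this columnar layout reduces the amount of data read from disk. That lowers query latency and makes better use of hardware.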
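As a toy illustration of the difference between the two layouts (not Parquet's actual on-disk encoding), the sketch below pivots row-oriented CSV text into a column-oriented dict of lists. The CSV content and column names are hypothetical. Real Parquet files also split data into row groups and store one column chunk per column within each group, which this sketch ignores.

```python
import csv
import io

# Hypothetical CSV input: each line holds one complete row (row-oriented).
CSV_TEXT = """id,city,amount
1,Paris,10.5
2,Berlin,7.0
3,Paris,3.25
"""


def csv_to_columns(text):
    """Pivot row-oriented CSV text into a column-oriented dict of lists.

    Toy model only: values stay as strings and there is no row grouping,
    type information, or compression as in a real Parquet file.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns


columns = csv_to_columns(CSV_TEXT)
print(columns["city"])  # ['Paris', 'Berlin', 'Paris']
```

After the pivot, all values of one column sit next to each other, which is the property the following sections rely on.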

Columnar Storage

A CSV file stores each row as one line. A Parquet file instead stores the values of each column together. A query against a large table can therefore read only the columns it references and skip the rest. This greatly reduces I/O and, on services that bill by the amount of data scanned, cost.
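The I/O saving can be made concrete with a toy model that counts the bytes each layout must read to answer a single-column query such as "sum of amount". The table, its 1,000 rows, and the JSON serialization are hypothetical stand-ins. Real Parquet uses binary encodings and per-column metadata, but the principle is the same: a column store reads only the columns a query needs.

```python
import json

# Hypothetical sample table: three columns, 1,000 rows.
rows = [
    {"id": i, "city": "Berlin" if i % 2 else "Paris", "amount": i * 1.5}
    for i in range(1000)
]


def row_layout(rows):
    # Row-oriented (CSV-like): one serialized record per row.
    return [json.dumps(r).encode() for r in rows]


def column_layout(rows):
    # Column-oriented (toy model): one serialized chunk per column.
    return {name: json.dumps([r[name] for r in rows]).encode() for name in rows[0]}


def bytes_scanned_row(layout):
    # A row store must read every full record to extract one field.
    return sum(len(record) for record in layout)


def bytes_scanned_column(layout, column):
    # A column store reads only the chunk for the requested column.
    return len(layout[column])


rows_store = row_layout(rows)
cols_store = column_layout(rows)

# "SELECT SUM(amount)" answered from the column layout alone.
amount_total = sum(json.loads(cols_store["amount"]))
print(bytes_scanned_row(rows_store), bytes_scanned_column(cols_store, "amount"))
```

The row layout has to read the `id` and `city` fields of every record even though the query never uses them. The column layout reads only the `amount` chunk.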

Superior Compression

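All values in a column share the same data type and often repeat. As a result, they compress much better than the mixed values of a row, and type-specific encodings such as dictionary encoding and run-length encoding can be applied to each column.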
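Dictionary encoding and run-length encoding are two encodings that the Parquet format supports. Below is a simplified sketch of both, applied to one hypothetical `city` column. It is not Parquet's actual bit-packed RLE/dictionary format. It shows why a single-typed, repetitive column shrinks so well: each distinct value is stored once, and each run of repeats becomes one (index, count) pair.

```python
def dictionary_encode(values):
    """Replace each value with a small integer index into a dictionary."""
    dictionary = []   # distinct values, in first-seen order
    index_of = {}     # value -> position in dictionary
    indices = []
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        indices.append(index_of[v])
    return dictionary, indices


def run_length_encode(indices):
    """Collapse consecutive repeats into (index, run_length) pairs."""
    runs = []
    for i in indices:
        if runs and runs[-1][0] == i:
            runs[-1][1] += 1
        else:
            runs.append([i, 1])
    return [tuple(r) for r in runs]


def decode(dictionary, runs):
    """Invert both encodings to recover the original column."""
    return [dictionary[i] for i, n in runs for _ in range(n)]


# Hypothetical column: nine strings reduce to 2 dictionary entries + 3 runs.
city = ["Paris"] * 4 + ["Berlin"] * 3 + ["Paris"] * 2
dictionary, indices = dictionary_encode(city)
runs = run_length_encode(indices)
print(dictionary)  # ['Paris', 'Berlin']
print(runs)        # [(0, 4), (1, 3), (0, 2)]
```

In a row-oriented file the same strings would be interleaved with `id` and `amount` values, which would break up the runs these encodings depend on.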

AWS Glue

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development.

Data Integration

AWS Glue lets you build data integration (ETL) jobs either visually, in AWS Glue Studio, or in code, by writing job scripts in Python or Scala.
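However a job is authored, it can also be driven from code through the AWS SDK. The sketch below assumes a boto3 Glue client (`boto3.client("glue")`) and uses its `start_job_run` and `get_job_run` calls to start a job and poll until the run finishes. The job name `nightly-csv-to-parquet` and the `--target_path` argument are hypothetical. The client is passed in as a parameter so the function can be exercised without AWS credentials.

```python
import time

# Job run states after which the run will not change again
# (assumed from the Glue JobRunState values).
TERMINAL_STATES = {"STOPPED", "SUCCEEDED", "FAILED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_glue_job(glue, job_name, arguments=None, poll_seconds=30, max_wait_seconds=3600):
    """Start a Glue job run and poll until it reaches a terminal state.

    `glue` is expected to be a boto3 Glue client. `arguments` maps job
    parameter names (which Glue expects to start with "--") to values.
    Returns (run_id, final_state).
    """
    request = {"JobName": job_name}
    if arguments:
        request["Arguments"] = arguments
    run_id = glue.start_job_run(**request)["JobRunId"]

    waited = 0
    while True:
        state = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]["JobRunState"]
        if state in TERMINAL_STATES:
            return run_id, state
        if waited >= max_wait_seconds:
            raise TimeoutError(f"job run {run_id} still {state} after {waited}s")
        time.sleep(poll_seconds)
        waited += poll_seconds
```

Against a real account the call would look like `run_glue_job(boto3.client("glue"), "nightly-csv-to-parquet", {"--target_path": "s3://..."})`. The polling interval and timeout are arbitrary defaults chosen for this sketch.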

Access Data

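The AWS Glue Data Catalog is a central metadata repository that records table definitions, schemas, and storage locations. Users and services such as Amazon Athena use it to find and query data, and Glue crawlers can populate it automatically by scanning data sources.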
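As a sketch of finding data programmatically, the function below lists the tables in one catalog database with their storage locations and column names. It assumes a boto3 Glue client and its `get_tables` paginator. The database name `sales_db` is hypothetical. As before, the client is injected so the function can be tested with a stand-in.

```python
def list_catalog_tables(glue, database):
    """Return {table_name: (location, [column names])} for one catalog database.

    `glue` is expected to be a boto3 Glue client. Paginating is needed
    because get_tables returns results a page at a time.
    """
    tables = {}
    paginator = glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        for table in page["TableList"]:
            # StorageDescriptor holds the data location and column schema.
            sd = table.get("StorageDescriptor", {})
            columns = [col["Name"] for col in sd.get("Columns", [])]
            tables[table["Name"]] = (sd.get("Location"), columns)
    return tables
```

Called as `list_catalog_tables(boto3.client("glue"), "sales_db")`, it would return one entry per table. Partition columns are stored separately in each table's `PartitionKeys` and are not included here.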