Hive Tables And File Formats

Nov 19, 2022

Hive Tables

Internal Table

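  • Hive manages both the table's metadata and its data
  • the data is stored in the Hive warehouse directory
  • dropping the table deletes the data as well as the metadata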

External Table

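  • Hive manages only the table's metadata
  • the data stays at the external source location the table points to
  • dropping the table removes only the metadata; the source data is left in place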
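The practical difference between the two table types shows up when a table is dropped. In HiveQL an external table is declared with `CREATE EXTERNAL TABLE ... LOCATION '<path>'`, while an internal (managed) table gets a directory under the warehouse path set by `hive.metastore.warehouse.dir`. The sketch below is a toy model of those drop semantics in plain Python, not Hive's API: the `ToyMetastore` class, the table names, and the temporary directories are all hypothetical.

```python
import os
import shutil
import tempfile


class ToyMetastore:
    """Hypothetical toy model of managed vs external tables (not Hive's API)."""

    def __init__(self, warehouse_dir):
        self.warehouse_dir = warehouse_dir
        self.tables = {}  # table name -> (data location, is_external)

    def create_managed(self, name):
        # Managed tables get their own directory inside the warehouse.
        location = os.path.join(self.warehouse_dir, name)
        os.makedirs(location, exist_ok=True)
        self.tables[name] = (location, False)
        return location

    def create_external(self, name, location):
        # External tables only record where the data already lives.
        self.tables[name] = (location, True)
        return location

    def drop(self, name):
        location, is_external = self.tables.pop(name)  # metadata always goes
        if not is_external:
            shutil.rmtree(location)  # managed: the data is deleted too


root = tempfile.mkdtemp()
warehouse = os.path.join(root, "warehouse")
source = os.path.join(root, "source")
os.makedirs(warehouse)
os.makedirs(source)

store = ToyMetastore(warehouse)
managed_dir = store.create_managed("sales_managed")
with open(os.path.join(managed_dir, "part-0.csv"), "w") as f:
    f.write("1,apple\n")
with open(os.path.join(source, "part-0.csv"), "w") as f:
    f.write("1,apple\n")
store.create_external("sales_external", source)

store.drop("sales_managed")
store.drop("sales_external")
print(os.path.exists(managed_dir))  # False: managed data removed
print(os.path.exists(source))       # True: external data left in place
```

Both drops remove the table from the toy metastore, but only the managed table's directory disappears from disk.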

File Formats

Parquet File

Parquet is an open-source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes, giving good performance when handling complex data in bulk.
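To see why a column-oriented layout helps, compare reading one field from row-wise storage with reading it from column-wise storage. The sketch below is a toy illustration in plain Python, not Parquet's actual file layout; the sales rows are hypothetical, and run-length encoding stands in for the kind of encodings (such as dictionary and run-length encoding) that columnar formats apply before compression. In Hive, a table would use the real format via `STORED AS PARQUET`.

```python
from itertools import groupby

# Hypothetical rows of a small sales table: (country, year, amount).
rows = [
    ("US", 2022, 10.0),
    ("US", 2022, 12.5),
    ("US", 2022, 7.0),
    ("DE", 2022, 3.0),
    ("DE", 2022, 4.5),
]
column_names = ["country", "year", "amount"]

# Row-oriented layout: the fields of each row are stored next to each other.
row_layout = [value for row in rows for value in row]

# Column-oriented layout: the values of each column are stored next to each other.
columns = {name: [row[i] for row in rows] for i, name in enumerate(column_names)}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Reading "amount" from the row layout means stepping over every other field ...
print(row_layout[2::3])  # [10.0, 12.5, 7.0, 3.0, 4.5]
# ... while the column layout hands over one contiguous list.
print(columns["amount"])  # [10.0, 12.5, 7.0, 3.0, 4.5]
# Similar values sit together in a column, so simple encodings shrink them.
print(run_length_encode(columns["country"]))  # [('US', 3), ('DE', 2)]
print(run_length_encode(columns["year"]))     # [(2022, 5)]
```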

ORC File

The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data. It was designed to overcome limitations of the other Hive file formats. Using ORC files improves performance when Hive is reading, writing, and processing data.
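One reason ORC reads are fast is that an ORC file keeps lightweight column statistics, such as minimum and maximum values, for the file, for each stripe, and for row groups within stripes, so a reader can skip data that cannot match a filter. The sketch below is a toy model of that skipping in plain Python; the `Stripe` class, the order IDs, and the stripe size of 4 values are hypothetical and far smaller than real ORC stripes. In Hive, a table would use the real format via `STORED AS ORC`.

```python
from dataclasses import dataclass


@dataclass
class Stripe:
    """Hypothetical stand-in for an ORC stripe: one column plus min/max stats."""
    values: list
    min_value: int
    max_value: int


def build_stripes(values, stripe_size):
    # Split the column into fixed-size chunks and record min/max for each.
    stripes = []
    for start in range(0, len(values), stripe_size):
        chunk = values[start:start + stripe_size]
        stripes.append(Stripe(chunk, min(chunk), max(chunk)))
    return stripes


def scan_greater_than(stripes, threshold):
    """Return values > threshold and how many stripes were actually read."""
    matches, stripes_read = [], 0
    for stripe in stripes:
        if stripe.max_value <= threshold:
            continue  # statistics prove no row can match: skip the stripe
        stripes_read += 1
        matches.extend(v for v in stripe.values if v > threshold)
    return matches, stripes_read


# Hypothetical sorted order IDs, split into stripes of 4 values.
order_ids = list(range(1, 17))
stripes = build_stripes(order_ids, stripe_size=4)
matches, stripes_read = scan_greater_than(stripes, threshold=13)
print(matches)       # [14, 15, 16]
print(stripes_read)  # 1 (the other 3 stripes are skipped)
```

Skipping works best when the filtered column is sorted or clustered, because then each stripe covers a narrow range of values.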

Avro File

Avro is a row-oriented format that stores the data's schema, defined in JSON, together with the data. Avro files include sync markers that can be used to split large data sets into subsets suitable for Apache MapReduce processing. Some data exchange services use a code generator to interpret the data definition and produce code to access the data. Avro doesn't require this step, which makes it well suited to scripting languages.
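In an Avro object container file, the header ends with a 16-byte sync marker generated for that file, and the same marker is written after every block of records. A split that starts at an arbitrary byte offset can therefore scan forward to the next marker and read whole blocks from there. The sketch below is a toy model of that idea in plain Python; the `HDR` header, the fixed marker, and the byte layout are hypothetical simplifications (real Avro blocks also carry a record count and a byte size, and the header stores the schema).

```python
# Hypothetical fixed 16-byte marker; real Avro generates a random one per file.
MARKER = b"#SYNC#MARKER#16#"


def write_container(blocks):
    """Toy layout: a header ending in the marker, then each block + marker."""
    out = bytearray(b"HDR" + MARKER)
    for block in blocks:
        out += block + MARKER
    return bytes(out)


def read_split(data, start, end):
    """Return the blocks owned by the byte range [start, end).

    A block belongs to the split in which its preceding sync marker starts,
    so every block is read by exactly one split.
    """
    blocks = []
    pos = data.find(MARKER, start)  # scan forward to the first sync point
    while pos != -1 and pos < end:
        block_start = pos + len(MARKER)
        next_marker = data.find(MARKER, block_start)
        if next_marker == -1:
            break  # the trailing marker: no block follows it
        blocks.append(data[block_start:next_marker])
        pos = next_marker
    return blocks


data = write_container([b"rec1,rec2", b"rec3", b"rec4,rec5,rec6"])
split_size = 40
splits = [read_split(data, s, s + split_size) for s in range(0, len(data), split_size)]
print(splits)  # [[b'rec1,rec2', b'rec3'], [b'rec4,rec5,rec6'], []]
```

The split boundaries at bytes 40 and 80 fall in the middle of blocks, yet each block is still read exactly once.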

CSV File

A CSV (comma-separated values) file is a plain-text file that stores tabular data: each line holds one row, and the fields in a row are separated by commas.
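A quick way to see the format is Python's standard `csv` module. The sketch below writes a small hypothetical table to an in-memory buffer and reads it back. It shows two things worth knowing about CSV: a field that contains a comma must be quoted, and every value comes back as a string because the file carries no schema or types.

```python
import csv
import io

# A small hypothetical table; the header row names the columns.
rows = [
    ["id", "name", "city"],
    [1, "Alice", "Berlin"],
    [2, "Bob", "New York, NY"],  # contains a comma, so the writer quotes it
]

buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
text = buffer.getvalue()
print(text)

# Read it back: each line becomes a dict keyed by the header row.
records = list(csv.DictReader(io.StringIO(text)))
for record in records:
    print(record["id"], record["name"], "->", record["city"])
```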

Comparison between Parquet, ORC, and Avro