An AVRO file is a data file created by Apache Avro, an open source data serialization system used by Apache Hadoop. It contains data serialized in a compact binary format, along with a schema in JSON format that defines the data types. AVRO files also contain sync markers between blocks of data. These markers allow large datasets to be split into subsets for parallel processing by Apache MapReduce in Apache Hadoop.
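The file layout described above can be sketched in Python using only the standard library. This is a minimal illustration based on the Apache Avro specification's object container format. A header holds the magic bytes `Obj` followed by a byte with value 1, a metadata map whose `avro.schema` entry stores the JSON schema, and a 16-byte sync marker. Each data block that follows ends with that same marker. The `User` schema, the function names, and the two-records-per-block setting are hypothetical choices made for this example. The sketch handles only `string` and `long` fields and the uncompressed `null` codec, and it reads only files that its own writer produces. It is not a substitute for an Avro library.

```python
import io
import json
import os

MAGIC = b"Obj\x01"  # every Avro object container file starts with these 4 bytes

# Hypothetical example schema; this sketch only supports "string" and "long" fields.
SCHEMA = {"type": "record", "name": "User",
          "fields": [{"name": "name", "type": "string"}, {"name": "age", "type": "long"}]}

def encode_long(n: int) -> bytes:
    """Avro int/long: zigzag-encode (assumes 64-bit range), then 7-bit varint."""
    n = (n << 1) ^ (n >> 63)
    out = bytearray()
    while n & ~0x7F:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)

def decode_long(buf: io.BytesIO) -> int:
    shift = result = 0
    while True:
        b = buf.read(1)[0]
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return (result >> 1) ^ -(result & 1)  # undo zigzag
        shift += 7

def encode_bytes(b: bytes) -> bytes:
    return encode_long(len(b)) + b  # Avro bytes/string: length prefix, then raw bytes

def decode_bytes(buf: io.BytesIO) -> bytes:
    return buf.read(decode_long(buf))

def encode_record(schema: dict, record: dict) -> bytes:
    # Fields are written in schema order; field names are never stored per record.
    enc = {"string": lambda v: encode_bytes(v.encode("utf-8")), "long": encode_long}
    return b"".join(enc[f["type"]](record[f["name"]]) for f in schema["fields"])

def decode_record(schema: dict, buf: io.BytesIO) -> dict:
    dec = {"string": lambda b: decode_bytes(b).decode("utf-8"), "long": decode_long}
    return {f["name"]: dec[f["type"]](buf) for f in schema["fields"]}

def write_container(schema, records, records_per_block=2, sync=None) -> bytes:
    sync = sync or os.urandom(16)  # 16-byte sync marker, random per file by default
    meta = {"avro.schema": json.dumps(schema).encode("utf-8"), "avro.codec": b"null"}
    out = bytearray(MAGIC)
    out += encode_long(len(meta))  # metadata map<bytes>: one block of entries...
    for key, value in meta.items():
        out += encode_bytes(key.encode("utf-8")) + encode_bytes(value)
    out += encode_long(0) + sync  # ...a zero-count terminator, then the sync marker
    for i in range(0, len(records), records_per_block):
        chunk = records[i:i + records_per_block]
        data = b"".join(encode_record(schema, r) for r in chunk)
        # Each data block: object count, byte size, serialized objects, sync marker.
        out += encode_long(len(chunk)) + encode_long(len(data)) + data + sync
    return bytes(out)

def read_container(data: bytes):
    buf = io.BytesIO(data)
    if buf.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while (count := decode_long(buf)) != 0:  # read metadata map blocks until count 0
        for _ in range(count):
            key = decode_bytes(buf).decode("utf-8")  # key must be read before value
            meta[key] = decode_bytes(buf)
    if meta.get("avro.codec", b"null") != b"null":
        raise NotImplementedError("only the uncompressed 'null' codec is handled")
    schema = json.loads(meta["avro.schema"])  # the JSON schema travels with the data
    sync = buf.read(16)
    records = []
    while buf.tell() < len(data):
        count = decode_long(buf)
        size = decode_long(buf)
        block = io.BytesIO(buf.read(size))
        records.extend(decode_record(schema, block) for _ in range(count))
        if buf.read(16) != sync:
            raise ValueError("sync marker mismatch")
    return schema, records

def block_starts(data: bytes, sync: bytes) -> list:
    """Offsets just past each sync marker. A reader handed an arbitrary byte range
    (for example, one input split) can scan to the next marker and decode from there."""
    starts, pos = [], data.find(sync)
    while pos != -1:
        starts.append(pos + len(sync))
        pos = data.find(sync, pos + len(sync))
    return starts
```

For example, writing three records with `records_per_block=2` produces two data blocks. With a marker that does not happen to occur inside the data, `block_starts` then reports three offsets: the end of the header and the end of each data block. Because field names are stored only once, in the header's schema, each record body here consists of nothing more than its length-prefixed strings and varint-encoded numbers.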
Avro was developed within the Apache Hadoop project, an open source platform used to store and process structured, semi-structured, and unstructured data without requiring a predefined format. In Apache Hadoop, Avro serves as a serialization format for persistent (infrequently accessed) data. Because it stores data in a compact binary format, Avro is especially well suited to exchanging very large datasets.
NOTE: Apache Spark SQL can also use Avro files as a data source.