Apache Avro

2019. 11. 13. 12:17 | Backend

Wikipedia definition

Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format.
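
To make "compact binary format" a bit more concrete, here is a minimal sketch of how the Avro specification encodes two primitive types: a long is written as a zigzag variable-length integer, and a string is written as its UTF-8 byte length (as a long) followed by the bytes. The function names are my own for illustration and are not part of any Avro library.

```python
def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one so small magnitudes stay small."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def encode_long(n: int) -> bytes:
    """Write a long as a zigzag varint: 7 payload bits per byte, low bits first."""
    z = zigzag_encode(n)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_long(data: bytes) -> int:
    """Read one zigzag varint from the start of data."""
    z = 0
    shift = 0
    for byte in data:
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)  # undo the zigzag mapping


def encode_string(s: str) -> bytes:
    """Write a string as its UTF-8 length (a long) followed by the bytes."""
    raw = s.encode("utf-8")
    return encode_long(len(raw)) + raw


print(encode_long(64).hex())        # 8001
print(encode_string("foo").hex())   # 06666f6f
```

Note that no field names or type tags appear in the output. The reader needs the schema to interpret the bytes, which is a large part of why the encoding is so small.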

 

Its primary use is in Apache Hadoop, where it provides both a serialization format for persistent data and a wire format for communication between Hadoop nodes and from client programs to Hadoop services. Avro uses a schema to structure the data being encoded. It has two schema languages: one intended for human editing (Avro IDL) and a more machine-readable one based on JSON.
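
As an illustration of the JSON schema language, the sketch below defines a small record schema and loads it with Python's standard json module. The record name, namespace, and fields (User, example.avro, name, favorite_number, tags) are hypothetical and chosen only for this example. A real project would usually hand the same JSON to an Avro library instead of reading it by hand.

```python
import json

# Hypothetical record schema written in Avro's JSON schema language.
USER_SCHEMA_JSON = """
{
  "type": "record",
  "name": "User",
  "namespace": "example.avro",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["null", "long"], "default": null},
    {"name": "tags", "type": {"type": "array", "items": "string"}}
  ]
}
"""

# A schema is plain JSON, so any JSON parser can inspect it.
schema = json.loads(USER_SCHEMA_JSON)
field_names = [field["name"] for field in schema["fields"]]

print(schema["name"], field_names)
```

The ["null", "long"] type is a union, which is the usual way to express an optional field in Avro.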

 

It is similar to Thrift and Protocol Buffers, but does not require running a code-generation program when a schema changes (unless desired for statically-typed languages).
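
One way to picture why code generation is optional: in a dynamic language a record can simply be a dictionary, and the schema is consulted at run time. The sketch below is a simplified take on the idea behind Avro's schema resolution, under the assumption that the record has already been decoded into a dict. A field the reader expects but the data lacks takes the reader's default, and a field the reader does not know is dropped. The resolve function and the sample schema are hypothetical. A real Avro library does this while decoding the binary data, using both the writer's and the reader's schema.

```python
def resolve(record: dict, reader_schema: dict) -> dict:
    """Project a generic (dict) record onto the fields of the reader's schema."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            # Field added after the data was written: use the reader's default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return resolved


# Newer reader schema: 'email' was added, 'favorite_number' was removed.
reader_v2 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

# Record that was written with an older version of the schema.
old_record = {"name": "Ada", "favorite_number": 7}

print(resolve(old_record, reader_v2))  # {'name': 'Ada', 'email': None}
```

Nothing had to be regenerated or recompiled when the schema changed. Only the schema passed in at run time is different.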

Apache Spark SQL can access Avro as a data source.

 


Avro provides:

  • Rich data structures.
  • A compact, fast, binary data format.
  • A container file, to store persistent data.
  • Remote procedure call (RPC).
  • Simple integration with dynamic languages. Code generation is not required to read or write data files, or to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages.

 

 

 

References

https://avro.apache.org/docs/1.9.1/

 


https://en.wikipedia.org/wiki/Apache_Avro
