2019. 11. 13. 12:17 · Backend
Wikipedia definition
Avro is a row-oriented remote procedure call and data serialization framework developed within Apache's Hadoop project. It uses JSON for defining data types and protocols, and serializes data in a compact binary format.
Its primary use is in Apache Hadoop, where it can provide both a serialization format for persistent data and a wire format for communication between Hadoop nodes, and from client programs to the Hadoop services. Avro uses a schema to structure the data being encoded. It has two schema languages: one intended for human editing (Avro IDL) and another, based on JSON, that is more machine-readable.
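For illustration, a minimal record schema in the JSON form might look like the following (the record and field names here are hypothetical, not from the source):

```json
{
  "type": "record",
  "name": "User",
  "namespace": "example.avro",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age",  "type": ["null", "int"], "default": null}
  ]
}
```

The same record in the human-oriented Avro IDL would read roughly `record User { string name; union { null, int } age = null; }`.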
It is similar to Thrift and Protocol Buffers, but does not require running a code-generation program when a schema changes (unless desired for statically-typed languages).
Apache Spark SQL can access Avro as a data source.
Avro provides:
- Rich data structures.
- A compact, fast, binary data format.
- A container file, to store persistent data.
- Remote procedure call (RPC).
- Simple integration with dynamic languages. Code generation is not required to read or write data files, nor to use or implement RPC protocols. Code generation is an optional optimization, only worth implementing for statically typed languages.
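The "compact, fast, binary data format" point can be made concrete: per the Avro specification, `int` and `long` values are written with zigzag encoding followed by variable-length (7-bits-per-byte) integers, so small magnitudes take a single byte. A minimal Python sketch of that encoding (illustrative only, not the API of any Avro library):

```python
def zigzag(n: int) -> int:
    """Map a signed long to an unsigned value (Avro zigzag encoding):
    0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ..."""
    return (n << 1) ^ (n >> 63)

def encode_long(n: int) -> bytes:
    """Encode a signed long as Avro's variable-length zigzag format."""
    z = zigzag(n)
    out = bytearray()
    while True:
        byte = z & 0x7F      # low 7 bits of the remaining value
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # final byte: high bit clear
            return bytes(out)

print(encode_long(1))   # b'\x02'
print(encode_long(-1))  # b'\x01'
```

Zigzag interleaves negative and positive values so that numbers close to zero, positive or negative, stay small after the shift and therefore encode in few bytes.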