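Apache Arrow 0.9.0 has been released. Apache Arrow is a top-level project of the Apache Software Foundation. It aims to serve as a cross-platform data layer that speeds up big data analytics projects. It includes a set of canonical in-memory representations of flat and hierarchical data, along with bindings in multiple languages for manipulating those structures. It also provides low-overhead streaming and batch messaging, zero-copy interprocess communication (IPC), and vectorized in-memory analytics libraries.
Changes:
New features and improvements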
- ARROW-1021 – [Python] Add documentation about using pyarrow from other Cython and C++ projects
- ARROW-1035 – [Python] Add ASV benchmarks for streaming columnar deserialization
- ARROW-1394 – [Plasma] Add optional extension for allocating memory on GPUs
- ARROW-1463 – [Java] Restructure ValueVector hierarchy to minimize compile-time generated code
- ARROW-1579 – [Java] Add dockerized test setup to validate Spark integration
- ARROW-1580 – [Python] Instructions for setting up nightly builds on Linux
- ARROW-1623 – [C++] Add convenience method to construct Buffer from a string that owns its memory
- ARROW-1632 – [Python] Permit categorical conversions in Table.to_pandas on a per-column basis
- ARROW-1643 – [Python] Accept hdfs:// prefixes in parquet.read_table and attempt to connect to HDFS
- ARROW-1705 – [Python] Create StructArray from sequence of dicts given a known data type
- ARROW-1706 – [Python] StructArray.from_arrays should handle sequences that are coercible to arrays
- ARROW-1712 – [C++] Add method to BinaryBuilder to reserve space for value data
- ARROW-1757 – [C++] Add DictionaryArray::FromArrays alternate ctor that can check or sanitize “untrusted” indices
- ARROW-1815 – [Java] Rename MapVector to StructVector
Reposted from https://www.oschina.net/news/94482/apache-arrow-0-9-0-released