Apache Arrow 0.9.0 released: an in-memory data interchange format - Linuxeden Open Source Community

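Apache Arrow 0.9.0 has been released. Apache Arrow is one of the Apache Software Foundation's top-level projects. It is designed to act as a cross-platform data layer that speeds up big data analytics systems. It defines a set of standardized in-memory representations for flat and hierarchical (nested) data, along with bindings in multiple languages for manipulating those structures. It also provides low-overhead streaming and batch messaging, zero-copy inter-process communication (IPC), and libraries for vectorized in-memory analytics.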
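To make "standardized in-memory representation" concrete, here is a minimal plain-Python sketch (it does not use pyarrow) of how a nullable string column is laid out in Arrow's columnar format: a validity bitmap with one bit per slot in least-significant-bit order, a buffer of 32-bit little-endian offsets with n + 1 entries, and one contiguous data buffer. The function names are hypothetical. Real Arrow buffers are additionally aligned and padded, which this sketch skips. Because the layout is fixed, a consumer can in principle read such buffers directly, without a per-row deserialization step. This is what makes zero-copy IPC possible.

```python
import struct
from typing import List, Optional, Tuple


def encode_string_column(values: List[Optional[str]]) -> Tuple[bytes, bytes, bytes]:
    """Hypothetical helper: pack strings into (validity bitmap, offsets, data).

    Illustrative only; real Arrow buffers are also aligned/padded.
    """
    n = len(values)
    bitmap = bytearray((n + 7) // 8)  # one bit per slot, 1 = valid
    offsets = [0]                      # n + 1 offsets, first one is 0
    data = bytearray()
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # least-significant-bit order
            data += v.encode("utf-8")
        # a null slot still gets an offset entry, with zero length
        offsets.append(len(data))
    return bytes(bitmap), struct.pack(f"<{n + 1}i", *offsets), bytes(data)


def decode_string_column(bitmap: bytes, offsets_buf: bytes, data: bytes,
                         n: int) -> List[Optional[str]]:
    """Hypothetical helper: read the n values back from the three buffers."""
    offsets = struct.unpack(f"<{n + 1}i", offsets_buf)
    out: List[Optional[str]] = []
    for i in range(n):
        if (bitmap[i // 8] >> (i % 8)) & 1:
            # value i occupies data[offsets[i]:offsets[i + 1]]
            out.append(data[offsets[i]:offsets[i + 1]].decode("utf-8"))
        else:
            out.append(None)
    return out


bm, off, buf = encode_string_column(["arrow", None, "ipc"])
print(bm, struct.unpack("<4i", off), buf)
print(decode_string_column(bm, off, buf, 3))
```

For `["arrow", None, "ipc"]`, the bitmap is the single byte `0b101`, the offsets are `(0, 5, 5, 8)`, and the data buffer is `b"arrowipc"`. The null slot costs one bit and one repeated offset, but no bytes in the data buffer.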

What's new:

New features and improvements

ARROW-1021 – [Python] Add documentation about using pyarrow from other Cython and C++ projects
ARROW-1035 – [Python] Add ASV benchmarks for streaming columnar deserialization
ARROW-1394 – [Plasma] Add optional extension for allocating memory on GPUs
ARROW-1463 – [JAVA] Restructure ValueVector hierarchy to minimize compile-time generated code
ARROW-1579 – [Java] Add dockerized test setup to validate Spark integration
ARROW-1580 – [Python] Instructions for setting up nightly builds on Linux
ARROW-1623 – [C++] Add convenience method to construct Buffer from a string that owns its memory
ARROW-1632 – [Python] Permit categorical conversions in Table.to_pandas on a per-column basis
ARROW-1643 – [Python] Accept hdfs:// prefixes in parquet.read_table and attempt to connect to HDFS
ARROW-1705 – [Python] Create StructArray from sequence of dicts given a known data type
ARROW-1706 – [Python] StructArray.from_arrays should handle sequences that are coercible to arrays
ARROW-1712 – [C++] Add method to BinaryBuilder to reserve space for value data
ARROW-1757 – [C++] Add DictionaryArray::FromArrays alternate ctor that can check or sanitize “untrusted” indices
ARROW-1815 – [Java] Rename MapVector to StructVector
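ARROW-1757 concerns dictionary-encoded arrays. In these arrays a column is stored as integer indices into a separate array of distinct values, and indices from an untrusted source may point outside that dictionary. The plain-Python sketch below illustrates the idea only and is not the C++ API. `dictionary_from_arrays`, `decode`, and the `sanitize` flag are hypothetical. Treating "sanitize" as "turn out-of-range indices into nulls" is an assumption made for illustration, not a description of Arrow's actual behavior.

```python
from typing import List, Optional, Sequence


def dictionary_from_arrays(indices: Sequence[Optional[int]],
                           dictionary: Sequence[str],
                           sanitize: bool = False) -> List[Optional[int]]:
    """Hypothetical check for untrusted dictionary indices.

    sanitize=False: raise IndexError on the first out-of-range index.
    sanitize=True:  replace out-of-range indices with None (null);
                    this policy is an assumption for illustration.
    """
    size = len(dictionary)
    checked: List[Optional[int]] = []
    for pos, idx in enumerate(indices):
        if idx is None or 0 <= idx < size:
            checked.append(idx)  # null or valid index: keep as-is
        elif sanitize:
            checked.append(None)  # out of range: demote to null
        else:
            raise IndexError(
                f"index {idx} at position {pos} is out of range "
                f"for a dictionary of size {size}")
    return checked


def decode(indices: Sequence[Optional[int]],
           dictionary: Sequence[str]) -> List[Optional[str]]:
    """Materialize values; only safe after the indices have been checked."""
    return [None if i is None else dictionary[i] for i in indices]


colors = ["red", "green"]
print(decode(dictionary_from_arrays([1, None, 0], colors), colors))
print(dictionary_from_arrays([0, 5, -1], colors, sanitize=True))
```

The check has to be explicit. In C++ an out-of-range index would read past the end of the dictionary buffer. In this Python sketch a negative index such as `-1` would silently wrap around instead of failing.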

For more details, see the full changelog and the download page.

Reposted from https://www.oschina.net/news/94482/apache-arrow-0-9-0-released
