
Linux Foundation and Graviti Announce Project OpenBytes to Make Open Data More Accessible to All

Graviti leads a community of developers and data scientists in creating data standards and formats that allow anyone to contribute

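Linux Foundation Member Summit, Napa Valley, Calif., November 2, 2021 - The Linux Foundation, the nonprofit organization enabling mass innovation through open source, today announced the new Project OpenBytes[1], spearheaded by Graviti. Project OpenBytes is dedicated to making open data more available and accessible by creating data standards and formats.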

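Edward Cui is the founder of Graviti and was previously a machine learning expert at Uber's Advanced Technologies Group. “For a long time, a large number of AI projects have stalled because they lacked high-quality data from real use cases,” Edward said. “Access to higher-quality data is critical if AI development is to make progress. To achieve that, an open data community built on collaboration and innovation is urgently needed. Graviti believes that playing our part is our social responsibility.”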

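By creating an open data standard and format, Project OpenBytes can reduce the liability risks that data contributors face. Dataset holders are often reluctant to share their datasets publicly because they lack knowledge of the various data licenses. If data contributors understand that their ownership of the data is well protected and that their data will not be misused, more open data will become accessible.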

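Project OpenBytes will also create a standard format for publishing, sharing, and exchanging data on its open platform. A unified format will help data contributors and users easily find the relevant data they need and will make collaboration easier. These capabilities will make high-quality data more available and accessible, which is valuable to the entire AI community and will save a large amount of the money and labor now spent on repetitive data collection.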

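“Project OpenBytes and its community will benefit all AI developers, both academic and professional, at large and small enterprises alike, by giving them access to more high-quality open datasets and making AI deployment faster and easier,” said Mike Dolan, general manager and senior vice president of Linux Foundation Projects.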

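The largest tech companies have already recognized the potential of open data and how it can lead to new academic machine learning breakthroughs and generate significant business value. However, there is not yet a well-established open data community with neutral and transparent collaborative governance across different organizations. Under the governance of the Linux Foundation, Project OpenBytes aims to create data standards and formats, enable contributions of high-quality data and, more importantly, be managed in a collaborative and transparent way.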

Supporting Quotes


“As one of the earliest AI/ML companies in the U.S., ElectrifAi is happy to support the OpenBytes project. We believe OpenBytes will help in the sharing of trusted datasets and accelerate practical AI/ML to solve real business problems,” said Luming Wang, CTO, ElectrifAi.

Jina AI

“The future of software is being eaten by open source, as well as data-sharing. OpenBytes’ announcement is a great signal for all developers on the accessibility of datasets. We are very excited to see standardized datasets available to a broader community, which will massively benefit AI engineers,” said Bing He, Co-founder & COO at Jina AI.


“Project OpenBytes will be essential to establish a vibrant open source dataset community. At Motional we are happy to contribute our freely available nuScenes and nuPlan datasets to this community. By standardizing datasets and licenses, we are making an important step towards interoperable machine learning systems and in particular safer autonomous vehicles,” said Holger Caesar, Data-Algorithms Team Lead at Motional.


“At Predibase, we’re building the open source Ludwig AI project to make state-of-the-art deep learning accessible to everyone, but the biggest barrier to tackling more tasks has always been the lack of standards for training datasets over unstructured data like text and images. Project OpenBytes provides a common structure to unstructured data that makes it possible for low-code deep learning tools like Ludwig to automate a host of advanced computer vision, NLP, and other machine learning tasks that previously required bespoke solutions. I’m excited to see how the combination of OpenBytes and Ludwig can enable data scientists and ML engineers to spend less time figuring out how to stitch data and models together, and more time solving their business problems.”


“Data is crucial to the success of any Artificial Intelligence project. By sharing open datasets, Project OpenBytes will help more developers to understand, develop, and adopt AI/ML technologies. Project OpenBytes will be a fundamental component of the open-source AI ecosystem. At Zilliz, we are glad to participate and make contributions to this significant initiative,” said Jun Gu, Partner of Zilliz.


[1] Project OpenBytes: https://www.openbytes.io/