How Reddit Users Created the First Publicly Owned LLM
r/datadao Launched the First Initial Model Offering with ORA and Vana
Over the last couple of weeks, the conversation here has centered on algorithmic trading. The theme excites me for two reasons. First, I like patterns, and trading is all about finding patterns. Second, algorithmic trading is about using public data to customize a sequence of predefined actions based on probable scenarios.
The main purpose of this Substack is to explore the power of decentralized artificial intelligence applied to repetitive and predictive tasks, and how public knowledge can be used to improve it.
At the beginning of my AI experience, I worked as Head of Ecosystem for a decentralized LLM with a train-to-earn data processing model. The logic is simple: the better the data you share to train the model, the better the rewards you get. The main challenge was to come up with a transparent and scalable way to weigh data quality.
Last week I heard about the Reddit Data DAO. r/datadao boasts data from 140K Reddit users who have come together to train the first user-owned AI model. The DAO is currently integrating the model into a key feature on its homepage and intends to launch it with ORA as the world’s first Initial Model Offering (IMO). The DAO has also completed a sale of its data to an external AI company, the proceeds of which it will distribute to members. I think this is a huge step toward open-source and decentralized empowerment.
The social meaning of owning an LLM publicly lies in democratizing technology and giving individuals control over the products they help create, ensuring economic upside for contributors.
How the Technology Works
Vana’s Data Liquidity Pools (DLPs)
The technology behind public ownership of AI models is powered by blockchain infrastructure. Vana, an EVM-compatible Layer 1 blockchain, supports Data DAOs, where users contribute data to Data Liquidity Pools (DLPs). These pools validate the quality of data using Vana’s proof-of-contribution mechanism. High-quality data improves AI models, and users earn rewards based on their contributions. The blockchain architecture consists of three key layers:
Data Liquidity Layer: Data is pooled, validated, and ranked by governance tokens (VANA tokens) staked by users.
Data Portability Layer: This layer enables developers to build AI models using the pooled data, creating a marketplace where users can benefit from the value their data creates.
Connectome: A decentralized ledger that tracks data transactions and ensures cross-ecosystem functionality.
Together, these layers ensure transparency, security, and governance in how data is utilized to train AI models. The Reddit Data DAO, for example, has aggregated user data to develop the Reddit Language Model (RLM), which simulates Reddit-like responses to posts. This collective model ownership blurs the line between open-source and proprietary AI by establishing onchain ownership where everyone who contributes to the training data shares in the AI's success.
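To make the reward logic concrete, here is a minimal sketch of quality-weighted payouts from a pool. It is not Vana's actual proof-of-contribution implementation: the `Contribution` type, the `quality` score in [0, 1], the `min_quality` threshold, and the proportional split are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    user: str
    quality: float  # hypothetical score in [0, 1] produced by some validation step


def split_rewards(contributions, reward_pool, min_quality=0.2):
    """Split reward_pool across users in proportion to their accepted quality.

    Contributions scoring below min_quality are rejected and earn nothing.
    (Threshold and proportional rule are illustrative assumptions.)
    """
    totals = {}
    for c in contributions:
        if c.quality < min_quality:
            continue  # rejected: too low quality to enter the pool
        totals[c.user] = totals.get(c.user, 0.0) + c.quality

    weight_sum = sum(totals.values())
    if weight_sum == 0:
        return {}  # nothing was accepted, so nothing is paid out
    return {user: reward_pool * w / weight_sum for user, w in totals.items()}


contributions = [
    Contribution("alice", 0.9),
    Contribution("bob", 0.3),
    Contribution("carol", 0.1),  # below the threshold, so rejected
    Contribution("alice", 0.3),
]
rewards = split_rewards(contributions, reward_pool=1000.0)
for user, amount in rewards.items():
    print(f"{user}: {amount:.2f}")
```

In this toy run alice receives roughly 800 units and bob roughly 200, while carol's rejected contribution earns nothing. The hard part in practice, as noted earlier, is producing a quality score that is transparent and scales.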
ORA's Onchain AI Oracle (OAO) Framework
ORA's Onchain AI Oracle (OAO) framework enables r/datadao to connect its decentralized data pools with AI model training in a secure, transparent, and decentralized manner. This framework helps bridge data contributions from users with AI model building by ensuring that data is validated, securely transferred, and properly utilized to train AI models onchain.
For the Reddit Language Model (RLM), ORA's OAO framework manages how user-contributed Reddit data (from r/datadao) is processed and integrated into the LLM, ensuring the AI model captures the tone and sentiment of Reddit conversations.
Support for Initial Model Offering (IMO)
ORA helps enable r/datadao to conduct an Initial Model Offering (IMO), which allows the community to tokenize and commercialize the AI model they have collectively built. The IMO structure helps raise funding for further model development and enables users to own tokens representing stakes in the trained AI model. The collaboration between ORA and Vana allows r/datadao to launch the world’s first user-owned AI model, the Reddit Language Model, by facilitating the deployment of the model onchain through the IMO.
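As a rough illustration of what holding tokens that represent a stake in the model could mean economically, the sketch below splits a revenue amount pro rata across token balances. The source does not describe how IMO proceeds are actually distributed, so the `distribute_revenue` function, the holder names, and the largest-remainder rounding rule are hypothetical.

```python
def distribute_revenue(holdings, revenue):
    """Split an integer revenue amount pro rata over token holdings.

    holdings maps holder -> token balance. Integer math avoids rounding drift;
    units left over after floor division go to the largest remainders.
    (Illustrative rule only, not ORA's actual IMO mechanics.)
    """
    total_supply = sum(holdings.values())
    if total_supply <= 0:
        raise ValueError("no tokens outstanding")

    # Floor share for every holder.
    payouts = {h: revenue * bal // total_supply for h, bal in holdings.items()}

    # Hand out the remaining units, one each, by largest remainder.
    leftover = revenue - sum(payouts.values())
    by_remainder = sorted(
        holdings,
        key=lambda h: (revenue * holdings[h]) % total_supply,
        reverse=True,
    )
    for h in by_remainder[:leftover]:
        payouts[h] += 1
    return payouts


holdings = {"alice": 500, "bob": 300, "carol": 200}
payouts = distribute_revenue(holdings, revenue=1001)
print(payouts)
```

Working in whole units guarantees that the payouts always sum to exactly the revenue being distributed, which matters when balances are tracked onchain as integers.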
ORA’s infrastructure ensures that user data is kept private while still being leveraged to improve AI capabilities. ORA’s role includes handling the secure aggregation, encryption, and management of data to ensure that the entire AI training and deployment process complies with privacy regulations.
What Reddit users have accomplished could reshape the conversation around AI.
Appendix
I used this article and a research document I wrote to prompt GPT with the following:
“If we want to go deep into what social changes represent a user-owned LLM what will you add, go deep, go Noam Chomsky, go Amartya Sen. “
I think it is worth checking out its answer.