Exploring the intersection of AI and Blockchain
As generative AI moves into the mainstream, an opportunity emerges for blockchain to demonstrate a new case for product-market fit.
AI has clearly captured the narrative in the tech world and in mainstream tech coverage more broadly. Tools like ChatGPT make the technology easy for the public to experiment with and grasp, and the potential for these tools to keep evolving is easy to get excited about (or terrified by, depending on who you speak with). What is equally exciting from an investment standpoint, yet receives far less attention, is the impact blockchain technology could have in catalyzing this field and the set of solutions that can be explored with AI and machine learning models beyond what LLMs demonstrate today.
“It’s Airbnb - But for Compute”
Perhaps the leading idea explored here is using decentralized networks of computational resources to ease the critical supply constraints on the latest-generation GPUs at the forefront of this acceleration in AI development. Demand for these chips is so acute that venture funds are reportedly investing $100M in communal data centers of their own as an enticement for founders to take their capital and gain access to these scarce resources.
Over the past few years, companies such as Render (focused primarily on graphics rendering via decentralized GPUs), Akash, and the recently launched Gensyn (which lets anybody provide compute for ML training and other computationally intensive work) have all emerged to address this supply problem.
Another way we’ve seen builders tackle these computational bottlenecks is the Filecoin Virtual Machine (FVM). By functioning as an operating system for the Filecoin network, the FVM allows more intricate operations to run on the network, including AI models processing data directly at its source. This could meaningfully accelerate AI development and create new ways to harness its capabilities. The FVM is also EVM-compatible, so AI systems built on Ethereum can tap into Filecoin's storage capabilities with relatively little friction, expanding the potential for innovation. It doesn’t give builders access to the scarce GPU resources so actively sought after today, but it exemplifies the innovation occurring in the data and compute space as AI and data science continue to accelerate.
Decentralized GPU networks for training AI models are an intriguing idea, but they come with both challenges and advantages. One of the main difficulties is the speed and reliability of data transfer. Because the GPUs are spread across different locations, moving data between them can be slow and unpredictable, which can slow down training and reduce its effectiveness.
Another challenge is data security and privacy. In a network of GPUs strung together across the globe, ensuring that every location is secure and trustworthy is difficult without a defined design and compliance standard for the environments participants connect to the network. Without assurances of baseline security standards, concerns arise about the safety and privacy of the data being processed.
Despite these challenges, decentralized GPU networks have real advantages. They are particularly beneficial when the focus is on preventing data censorship or when the utmost privacy of the data is required, i.e., scenarios where centralized providers like AWS, GCP, or Azure could censor data or grant outside parties access to the data running through their systems. In these cases, the decentralized nature of the network makes it harder for any single party to control or access the data.
Lastly, running a decentralized GPU network requires significant computational power and energy. For some tasks, traditional centralized solutions (where all the data and processing power are located in one place) may be the more cost-effective and efficient choice.
So while decentralized GPU networks come with technical difficulties, they also offer unique benefits, and one can see a path to alleviating these challenges, where this model could have legs under the right design and oversight. Long term, I would bet on some form of this model becoming a real alternative to the centralized systems we have access to today.
The Data Opportunity for Blockchain AI
During my 4+ year stint at Two Sigma, I learned the immense value of data in constructing proprietary models that helped generate alpha. I also noticed an intriguing paradox: while copious amounts of data are available for analysis, just as much invaluable data sits securely stored away in corporate data centers and restricted databases (like HIPAA-compliant EMRs or corporate ERP systems).
Electronic Medical Records (EMRs), for example, represent perhaps one of the most exciting design spaces, given how blockchain-based data models can be combined with cryptographic techniques to unlock new possibilities in AI and healthcare research. Current estimates are that each patient produces more than 80 MB of electronic imaging and medical records data per year, which equates to more than 26 petabytes (26 million GB) of data annually in the United States alone.
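The 26-petabyte figure follows from multiplying the per-patient estimate by the U.S. population. The quick check below assumes roughly 330 million U.S. residents, a number not stated in the original estimate, and uses decimal storage units (1 GB = 1,000 MB).

```python
# Back-of-the-envelope check of the EMR data estimate.
# Assumption: ~330 million U.S. residents (not stated in the original estimate).
MB_PER_PATIENT_PER_YEAR = 80
US_POPULATION = 330_000_000

total_mb = MB_PER_PATIENT_PER_YEAR * US_POPULATION  # megabytes per year
total_gb = total_mb // 1_000                        # 1 GB = 1,000 MB (decimal units)
total_pb = total_mb / 1_000_000_000                 # 1 PB = 1,000,000,000 MB

print(f"{total_gb:,} GB per year, about {total_pb:.1f} PB")
```

Because 80 MB is a floor ("more than 80 MB"), roughly 26 PB per year is likewise a lower bound.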
Bringing more data to the blockchain could break down these data silos, unlock sensitive data currently walled off by privacy concerns and corporate secrecy, and potentially help address issues such as bias in AI and data privacy, so that AI systems are transparent and trustworthy.
These advancements could pave the way for a new era of responsible AI that respects individual rights and data privacy, demonstrating blockchain technology's potential to catalyze AI and machine learning models.
Examples: MPC, ZK, and Federated Learning
Three approaches to decentralized data analysis and AI stand to benefit uniquely from the infrastructure that blockchain technology enables: Multi-Party Computation (MPC), Zero-Knowledge (ZK) proofs, and Federated Learning.
Multi-Party Computation (MPC) is a cryptographic technique that lets multiple parties jointly compute a function over their private inputs without revealing those inputs to one another. It is a powerful tool in the blockchain realm for performing calculations on sensitive data in a secure, distributed environment. Researchers across the globe can participate and contribute their own data securely through MPC.
MPC can greatly benefit healthcare institutions seeking to collaborate on scientific research while keeping their sensitive data private from researchers at other institutions. By pooling proprietary datasets this way, hospital systems can generate insightful analyses from larger datasets than any of them could access individually. By overcoming critical privacy limitations, MPC could pave the way for medical breakthroughs and foster collaborative innovation across many fields.
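To make the MPC idea concrete, here is a minimal sketch of additive secret sharing, one of the simplest MPC building blocks, used to compute a joint sum such as a total patient count across hospitals. It assumes honest-but-curious parties that do not collude, simulates all parties in a single process instead of over a network, and uses hypothetical hospital counts; production MPC frameworks add authenticated channels, protection against malicious parties, and support for far more than addition.

```python
import secrets
from typing import List

# Field modulus for the shares; 2**61 - 1 is a Mersenne prime.
PRIME = 2**61 - 1


def make_shares(value: int, n_parties: int) -> List[int]:
    """Split a private value into n random additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values: List[int]) -> int:
    """Simulate n parties computing the sum of their inputs via additive secret sharing."""
    n = len(private_values)
    # Party i splits its input; dealt[i][j] is the share party i sends to party j.
    dealt = [make_shares(v, n) for v in private_values]
    # Party j sees only one random-looking share from each party and
    # publishes the sum of the shares it received.
    partial_sums = [sum(dealt[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Adding the published partial sums reveals the total; individual inputs
    # stay hidden as long as parties follow the protocol and do not collude.
    return sum(partial_sums) % PRIME


# Hypothetical per-hospital patient counts for some condition.
hospital_counts = [120, 45, 310]
print(secure_sum(hospital_counts))  # 475
```

Any single share is a uniformly random field element, so on its own it says nothing about the hospital's count; only the combination of all shares carries information.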
The second technique is Zero-Knowledge (ZK) proofs, which allow one party to prove to another that they know a value x without conveying any information beyond the fact that they know x. In the context of AI and EMR data, ZK proofs can be used to verify the integrity and authenticity of data without revealing the data itself, helping ensure that AI models are trained on valid and relevant data while preserving the privacy of individual patients. One can imagine reliably indexing vast sets of medical records and identifying individual datasets that match specific criteria without access to the underlying data. This could be especially interesting for clinical trial recruitment and other research activities. Companies like Genobank offer examples: blockchain-based models for collecting, storing, and verifying genetic data that put control of that data in the hands of individuals rather than third-party institutions.
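The definition above, proving knowledge of a value x without revealing it, is exactly what the classic Schnorr identification protocol achieves for a discrete logarithm. The sketch below uses its non-interactive Fiat-Shamir variant with deliberately tiny, insecure toy parameters; it illustrates the principle only and is not a description of how Genobank or any medical-records system works. Real deployments use standardized large groups or elliptic curves, and proving statements about records (such as matching trial criteria) requires general-purpose ZK proof systems.

```python
import hashlib
import secrets
from typing import Tuple

# Toy parameters (insecure, illustration only): P = 2*Q + 1 is a safe prime,
# and G = 4 generates the subgroup of prime order Q.
P = 2039
Q = 1019
G = 4


def fiat_shamir_challenge(y: int, t: int) -> int:
    """Derive the verifier's challenge by hashing the public values (Fiat-Shamir)."""
    digest = hashlib.sha256(f"{G}|{y}|{t}".encode()).digest()
    return int.from_bytes(digest, "big") % Q


def prove(x: int) -> Tuple[int, Tuple[int, int]]:
    """Prove knowledge of x such that y = G^x mod P, without revealing x."""
    y = pow(G, x, P)              # public value derived from the secret
    r = secrets.randbelow(Q)      # fresh random nonce, never reused
    t = pow(G, r, P)              # commitment
    c = fiat_shamir_challenge(y, t)
    s = (r + c * x) % Q           # response; the random r masks x
    return y, (t, s)


def verify(y: int, proof: Tuple[int, int]) -> bool:
    """Accept iff G^s == t * y^c (mod P); the verifier never learns x."""
    t, s = proof
    c = fiat_shamir_challenge(y, t)
    return pow(G, s, P) == (t * pow(y, c, P)) % P


secret_x = 777
public_y, proof = prove(secret_x)
print(verify(public_y, proof))  # True
```

The verifier sees only y, t, and s; because s is masked by a fresh random r, the transcript reveals nothing useful about x beyond the fact that the prover knows it.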
The third technique is Federated Learning, which is not itself a cryptographic technique but can be combined with MPC and ZK proofs to train AI models on decentralized data. In federated learning, the model is trained locally on each participant's device, and only the model updates (not the underlying data) are sent back to a coordinating server. This lets the model learn from a large dataset without the data ever leaving participants' devices. J.P. Morgan’s internal Onyx Blockchain team has demonstrated some notable applications here, a prime example of this combination's potential in financial data applications.
Federated learning is also exciting in the context of the “X-to-Earn” models that have proven successful in the Decentralized Physical Infrastructure Network (DePIN) market. Here we could see collaborative modeling efforts that reward individual researchers with shared ownership of the IP they generate, verified cryptographically by attaching their signatures and identities to their work, with contributors then sharing collectively in its monetization. By contrast, traditional research models concentrate IP at the level of a central entity, as with OpenAI and other research labs.
Oasis Protocol is another group in this area, developing solutions for responsible AI that address fairness and bias in AI models, AI pipelines that protect individual data, and flexible confidentiality tools for NFTs. They are also building infrastructure for Data DAOs that support individual data rights, reward data owners for the use of their data, and handle data confidentially while keeping operations verifiably transparent. Oasis or an upstart could emerge as a critical piece of this decentralized data and AI stack.
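The federated learning loop described above can be sketched with Federated Averaging (FedAvg), a widely used baseline algorithm: each client runs a few gradient steps on its own data, and the server only averages the returned parameters, weighted by client dataset size. This is a minimal illustration under stated assumptions: a one-feature linear model, two hypothetical clients with noise-free data, and no MPC or ZK layer protecting the updates themselves.

```python
from typing import List, Tuple

Params = Tuple[float, float]  # (w, b) for the linear model y = w * x + b
Dataset = Tuple[List[float], List[float]]


def local_update(params: Params, data: Dataset, lr: float = 0.01, steps: int = 5) -> Params:
    """Run a few steps of gradient descent on one client's private data."""
    w, b = params
    xs, ys = data
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n  # d(MSE)/dw
        grad_b = 2 * sum(errors) / n                             # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_averaging(clients: List[Dataset], rounds: int = 500) -> Params:
    """FedAvg: clients train locally; the server only averages returned parameters."""
    global_params: Params = (0.0, 0.0)
    total = sum(len(xs) for xs, _ in clients)
    for _ in range(rounds):
        # Each client sends back updated parameters, never its raw data.
        updates = [(local_update(global_params, data), len(data[0])) for data in clients]
        # Weight each client's parameters by how many examples it trained on.
        w = sum(p[0] * n for p, n in updates) / total
        b = sum(p[1] * n for p, n in updates) / total
        global_params = (w, b)
    return global_params


# Hypothetical private datasets held by two institutions (true relation: y = 2x + 1).
client_a: Dataset = ([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
client_b: Dataset = ([4.0, 5.0, 6.0], [9.0, 11.0, 13.0])
w, b = federated_averaging([client_a, client_b])
print(f"w = {w:.3f}, b = {b:.3f}")
```

The learned parameters should land very close to w = 2 and b = 1 even though the server never sees a single data point. In practice, the averaging step is where secure aggregation (an MPC technique) would be layered in so the server cannot inspect any individual client's update.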
While these techniques provide strong privacy guarantees, they also come with computational and complexity costs. MPC and ZK proofs, in particular, can be computationally intensive, making them slower than traditional methods. Additionally, implementing these techniques requires careful attention to security to ensure that they don't introduce new vulnerabilities.
But by applying these privacy-preserving tools like MPC, ZK proofs, and federated learning, researchers can analyze sensitive EMR data without compromising privacy. These tools can pave the way for revolutionary blockchain-based solutions and broaden the applicability of generative AI models in healthcare and other privacy-focused fields. As the field advances, we can expect these techniques to become increasingly widespread, fuelled by the evolution of blockchain and hardware infrastructure. Costs will likely decrease, making these technologies more accessible and efficient.
Identity in a world of ambiguous content
As AI permeates creative processes, it's becoming increasingly critical to distinguish between reality and content produced by generative models like ChatGPT, Stable Diffusion, and a rapidly expanding set of other generative tools.
Identity in the context of AI is a growing concern: deepfakes keep emerging, and generative AI is making it increasingly difficult to ascertain whether text, images, or audio originated from a human or from a generative AI model. Proving authenticity will be just as consequential as the potential AI unlocks for society.
Blockchains enable us to verify the validity of data and its provenance, whether it comes from a generative model, a human contributor, or a content producer. Provenance and cryptographic proof of origin matter just as much for the models being created around the world: soon it will be important to demonstrate not only whether the content you produce or consume came from a human or an AI, but also which AI model(s) generated it, to ensure it wasn't produced by known malicious sources or biased actors. Yes, things are getting complicated…
By building on this provable identity model and the immutability of blockchains, we can verifiably link all types of data and IP to their creators. That link is critical both for establishing the ownership and validity of data and for combating the misinformation likely to appear in greater volume online as these powerful AI tools spread.
While much of the investment world has seemingly migrated away from blockchain toward AI as the next exciting area of investment, it is important to recognize the blockchain opportunities that specifically benefit from this renewed focus on AI. They could be powerful accelerants for the generative AI businesses emerging daily, and they can also help us navigate the growing set of challenges generative AI brings as it becomes a ubiquitous part of daily life.
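A minimal way to see how an append-only, hash-linked record can tie content to its creator and to the generator that produced it is the toy ledger below. It is a sketch under loud assumptions: it runs in memory rather than on a real blockchain, identifies creators by plain strings instead of public-key signatures (a real system would have each creator sign the record, for example with Ed25519), and the creator names and the model label `hypothetical-model-v1` are made up for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List, Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class ProvenanceRecord:
    content_hash: str
    creator: str          # plain identifier; a real system would use a signing key
    generator: str        # "human" or a model identifier (hypothetical labels here)
    prev_hash: str        # links this record to the one before it
    record_hash: str = ""

    def compute_hash(self) -> str:
        payload = json.dumps([self.content_hash, self.creator, self.generator, self.prev_hash])
        return sha256_hex(payload.encode())


@dataclass
class ProvenanceLedger:
    records: List[ProvenanceRecord] = field(default_factory=list)

    def register(self, content: bytes, creator: str, generator: str) -> ProvenanceRecord:
        """Append a record linking the content's hash to its creator and generator."""
        prev = self.records[-1].record_hash if self.records else "0" * 64
        rec = ProvenanceRecord(sha256_hex(content), creator, generator, prev)
        rec.record_hash = rec.compute_hash()
        self.records.append(rec)
        return rec

    def lookup(self, content: bytes) -> Optional[ProvenanceRecord]:
        """Find the record for a piece of content by its hash, if registered."""
        h = sha256_hex(content)
        return next((r for r in self.records if r.content_hash == h), None)

    def verify_chain(self) -> bool:
        """Recompute every hash; any edit to past records breaks the chain."""
        prev = "0" * 64
        for rec in self.records:
            if rec.prev_hash != prev or rec.record_hash != rec.compute_hash():
                return False
            prev = rec.record_hash
        return True


ledger = ProvenanceLedger()
ledger.register(b"An essay written by hand", creator="alice", generator="human")
ledger.register(b"An AI-generated image", creator="bob", generator="hypothetical-model-v1")

record = ledger.lookup(b"An AI-generated image")
print(record.creator, record.generator)  # bob hypothetical-model-v1
print(ledger.verify_chain())             # True

ledger.records[0].creator = "mallory"    # tamper with history
print(ledger.verify_chain())             # False
```

Hash-linking makes tampering detectable rather than impossible; on a public blockchain, the additional guarantee comes from many independent nodes holding and validating the same history.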
At Factor, we’re actively looking to invest in founders building at this intersection, as it is an area of specific interest and expertise for us. We hope to see more founders who share our optimism that these technologies can be deeply complementary and potentially impactful for society.