Apoorva Ruparel

Posted on 6th October 2023

Intelligence Islands

It’s hard to outrun The Terminator or escape The Matrix

When it comes to thinking about AI, it’s hard to set aside the immersive world of sci-fi blockbusters. Futurists and sci-fi writers have always had enormous influence on Silicon Valley. But often their assumptions and visions have proven faulty. Star Trek is great, but does it seem like we’re heading for a future without money?

The Matrix, Terminator, and other gripping stories paint AGI as a singular, omnipotent, all-controlling entity. These visions of super-intelligent AIs taking over the world permeate the psyche of Silicon Valley, shaping how leaders perceive the trajectory of artificial intelligence. While the logic that leads us there makes sense, like many utopian or dystopian visions (crypto?), it lacks some street smarts.

Instead of a singular, all-knowing GPT-10 doing all the thinking for mankind, we see a different future.

Introducing: “Intelligence Islands” 

Imagine an archipelago, where each island represents a distinct domain of knowledge. Some islands are vast, covering broad topics like public coding frameworks. Others are smaller, more intricate, representing specialized knowledge like a company’s pricing strategy. These are “Intelligence Islands.” 

Our conviction is that, for the foreseeable future, many thousands of Intelligence Islands will each be served by their own purpose-built system. The massive economic incentives to keep trade secrets, codebases, business processes and more hidden from the prying eyes of foundational models aren’t going to disappear overnight.

Maybe Skynet can’t do my job

A good demonstration case comes from an area we already see being transformed by AI.

Software development is one of the jobs AI is already changing rapidly. The developers I speak to are at different stages of adopting LLMs, but it’s clear the disruption has begun. Trained on the vast public corpus of functional code, the likes of GPT-4 and Bard are already competent script writers. Developers will clearly be much more productive post-GPT, but few developers, even those deep into the latest LLM-powered tools, take it as a given that their profession’s days are numbered.

The obvious reason is that LLMs know a lot about writing code, but without purpose-built systems they know nothing about the intricate patterns of a specific company’s legacy systems, the unique quirks of dated software, or the labyrinth of undocumented fixes from yesteryear. They have general information, but they know nothing about local information.

The example of CodeRabbit, a tool that aims to automate and replace peer code reviews, is instructive. To do an effective code review, a developer or an AI needs to know more than just how to write code. You have to know:

  1. The goal of the PR

  2. The code it fits into

  3. The response patterns of any internal or external services it calls, and more.

 

In this blog post published by CodeRabbit, the challenges and the layers of local information required come to life. Here’s an excerpt from the post:

CodeRabbit is not just a simple wrapper that passes-through calls and responses of LLM models. To circumvent context size limits, CodeRabbit uses an innovative, multi-LLM and multi-stage approach to scale reviews for larger change sets. Unlike AI-based code completion tools, code reviews are a much more complex problem. The reviewer context is much broader than the developer context, as the reviewer needs to uncover not just obvious issues but also understand the larger context of the pull request and changes across multiple files. Below is a glimpse into the challenges we faced and the solutions we came up with:

Context window size: The LLM models have limited context windows, for instance, gpt-3.5-turbo has a context window of 4K or 16K tokens and gpt-4 has a context window of 8K tokens. This is often insufficient to pack larger change sets. To circumvent this, we provide various summaries while reviewing changes to each file and by smartly prioritizing context that is packed in each request.

Inputting and outputting structured content: LLMs are particularly bad at understanding and generating structured content and mathematical computation. We had to design new input formats, that are closer to how humans understand changes, instead of using the standard unified diff format. We also had to provide few-shot examples to the LLMs to get the desired results.

Noise: LLMs are terrible at differentiating between noise and signal. For instance, if you ask LLMs for 20 suggestions, you will get them, but only a few of them will be useful. This is particularly true for code reviews. We had to design a multi-stage review process that reinforces the signal and filters out the noise.

Costs: While advanced models like gpt-4 are great in performing complex tasks, they are several orders of magnitude more expensive than models like gpt-3.5-turbo. We had to design a multi-model approach that uses simpler models for summarizations, while complex models are used for tasks such as reviewing code. In addition, simpler models act as a triage filter that identifies the changes that need to be thoroughly reviewed by more complex models.

Inaccuracies: LLMs are not perfect and often return inaccurate results, and they sometimes even ignore instructions and completely fabricate a response. Rather than keep fighting the LLMs we wrote layers of sanity checks to fix or hide the inaccuracies from the user.

Data privacy: The biggest concern from our users is whether their code is being stored and used to train the models. We made sure that all queries to LLMs are ephemeral, and the data is discarded right away from our service. At the same time, it’s challenging to provide stateful incremental reviews without storing the data. We had to design a system that stores all state within the pull request itself and not in our service for maximum privacy.

CodeRabbit

This dispatch from the front lines of creating LLM-powered solutions provides a clear understanding of how Intelligence Islands come to be. Some of these limitations may be obviated by future OpenAI releases or AGI advances. That said, there is some physics to these challenges that won’t go away anytime soon.
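The excerpt describes a multi-stage, multi-model pipeline: cheap models summarize and triage, expensive models review, and cross-file context is packed deliberately. Here is a minimal sketch of that general pattern, not CodeRabbit’s actual implementation; the `call_llm` helper, model names, and prompts are hypothetical placeholders.

```python
# Sketch of a multi-stage, multi-model review pipeline (hypothetical helpers).

def call_llm(model: str, prompt: str) -> str:
    """Placeholder for a chat-completion call to the named model."""
    raise NotImplementedError

def summarize_file_change(diff: str) -> str:
    # Stage 1: a cheaper model condenses each file's diff so more of the
    # change set fits inside the stronger model's context window.
    return call_llm("cheap-summarizer", f"Summarize this diff:\n{diff}")

def needs_deep_review(summary: str) -> bool:
    # Stage 2: the cheaper model also acts as a triage filter, flagging only
    # the changes that look risky enough to justify an expensive review.
    verdict = call_llm(
        "cheap-summarizer",
        f"Answer YES or NO: does this change need a careful review?\n{summary}",
    )
    return verdict.strip().upper().startswith("YES")

def review_pull_request(file_diffs: dict[str, str], pr_description: str) -> dict[str, str]:
    summaries = {path: summarize_file_change(diff) for path, diff in file_diffs.items()}
    overview = "\n".join(f"{path}: {summary}" for path, summary in summaries.items())

    reviews = {}
    for path, diff in file_diffs.items():
        if not needs_deep_review(summaries[path]):
            continue  # noise is filtered out before it reaches the expensive model
        # Stage 3: the stronger model reviews one file at a time, but with the
        # PR goal and cross-file summaries packed in as local context.
        reviews[path] = call_llm(
            "expensive-reviewer",
            f"PR goal:\n{pr_description}\n\nOther changes in this PR:\n{overview}\n\n"
            f"Review this diff:\n{diff}",
        )
    return reviews
```

Even this toy version makes the point: most of the engineering lives in how local context is gathered, prioritized, and packed, not in the call to the foundational model itself.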

What defines the boundaries of an Intelligence Island?

We’ve identified the following factors that set the scale and boundaries of an Intelligence Island:

 

Required Precision

When answers have to be precise, more control has to be exercised within an LLM solution. As the CodeRabbit example shows, a solution that has to identify as many issues as an expert human reviewer, or more, faces a high bar of precision. The more a solution aims to replace a human (as in Intelligence-as-a-Service), the higher the precision bar. Controlling the weight of inputs into an answer is one of the most important parts of quality control, and these decisions tend to require weighting local, up-to-date, or high-trust information sources, the hallmarks of an Intelligence Island. Conversely, where precision is less important, general answers direct from the general intelligence are acceptable.
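As a concrete illustration, “controlling the weight of inputs” can be as simple as scoring candidate sources on relevance, trust, and freshness before any of them reach the prompt. The sketch below is illustrative only; the fields, weights, and half-life are assumptions, not a recommended formula.

```python
# Sketch: weight candidate sources by relevance, trust, and freshness
# before packing them into an LLM prompt. All numbers are illustrative.
from dataclasses import dataclass

@dataclass
class Source:
    text: str
    relevance: float   # e.g. semantic-search similarity, 0..1
    trust: float       # e.g. 1.0 for an internal system of record, lower for the open web
    age_days: float

def input_weight(source: Source, half_life_days: float = 90.0) -> float:
    freshness = 0.5 ** (source.age_days / half_life_days)  # older sources decay
    return source.relevance * source.trust * freshness

def select_context(sources: list[Source], budget: int = 5) -> list[Source]:
    # High-precision answers draw on local, up-to-date, high-trust sources first.
    return sorted(sources, key=input_weight, reverse=True)[:budget]
```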

 

Trust Boundaries

If information required for a quality answer from an AI system isn’t already included in the foundational model’s training set, that suggests the need for an Intelligence Island. Sensitive information, or simply rapidly changing information, is never going to be included in a general model. As noted above, this kind of information can be called up by an intelligent agent working with a foundational model, but that requires trust. Our conviction is that over time, enterprises and every other player will come to trust that their sensitive data won’t be ingested into the foundational models they rely on. That trust, however, will take time to build. To address the most significant trust gaps, companies and their vendors will build purpose-built intelligent systems leveraging open LLM foundational models. These are the epitome of Intelligence Islands.

 

Information Scale

Some sensitive information is narrow and clear. For example, a system that interacts with an LLM could serve up a short summary of all the transactions the company has had with a given client; that would easily fit in a context window. If, however, a quality answer requires synthesizing a thousand different source documents, that’s a different ballgame. A system that builds consulting deliverables out of thousands of documents given to past clients requires some special-purpose semi-intelligence: not a full-scale LLM trained on the document repository, but at least vector embeddings and some semantic search capability to dynamically create usable context for an LLM system.
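That embeddings-plus-semantic-search layer is straightforward to sketch. The example below uses the open sentence-transformers library; the model name, fixed-size chunking, and in-memory index are illustrative assumptions, not a prescription for any particular system.

```python
# Sketch: embed a document repository once, then retrieve only the most
# relevant chunks at question time to build context for an LLM.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative open model

def build_index(documents: list[str], chunk_size: int = 800) -> tuple[list[str], np.ndarray]:
    # Split each document into fixed-size chunks and embed them.
    chunks = [doc[i:i + chunk_size] for doc in documents for i in range(0, len(doc), chunk_size)]
    vectors = embedder.encode(chunks, normalize_embeddings=True)
    return chunks, vectors

def retrieve_context(question: str, chunks: list[str], vectors: np.ndarray, top_k: int = 5) -> str:
    # With normalized vectors, cosine similarity reduces to a dot product.
    query = embedder.encode([question], normalize_embeddings=True)[0]
    scores = vectors @ query
    best = np.argsort(scores)[::-1][:top_k]
    # Only the top-scoring chunks are passed to the LLM, keeping the prompt
    # inside the context window no matter how large the repository is.
    return "\n---\n".join(chunks[i] for i in best)
```

In production, the in-memory arrays would typically be replaced by a vector database and smarter chunking, but the shape of the system, a retrieval layer standing between local documents and a general-purpose LLM, is what an Intelligence Island looks like in practice.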

 

Network Effects

Some domains are so sensitive that their local information can’t be combined across participants. For example, it might be incredibly powerful for an AI to know all the pricing models used internally by every player in a market, but natural anti-competitive forces, and even regulation, will likely make it impossible to unlock that value. Other domains are less sensitive, so combining non-public local information from many sources may be possible: a single system used by competing hospitals may be able to create shared learnings on how patient outcomes can be optimized. Where the value of network effects overcomes the costs of information sharing, we’ll see larger Intelligence Islands.

 

Derivative Adjacencies

Some Intelligence Islands are likely to merge over time. Boundaries expand when the information required to dominate one island reaches a threshold where questions about a neighboring domain start to be answered with higher and higher accuracy. For example, if a system intended to score investment properties ingests enough prospectus documents, it may well become adept at writing such documents, or at predicting certain local market conditions.

Conclusions

Our conviction is that many systems with intelligence will rise to dominate their islands. While the foundational model titans will improve and widen their zone of competence, there are major forces constraining their access to local information. Startups and investors that understand the contours of these land masses will have a decisive advantage in the coming years. The structure of information, the criticality of private training sets, and economic incentives will all play a key role. As compute costs decrease and AI makes it easier to unlock the value of sequestered information, we will see a kind of plate tectonics, mashing islands together into larger continents. Even so, the criticality of protected information may prove decisive in reserving significant opportunities for far more players than the sci-fi writers would predict.

We’re looking for AI founders and like-minded limited partners.

Contact Us

Investing in the future of intelligence

© 2024 Vantage Point Inc. All rights reserved.
