TL;DR: For most AI users, and for most developers building with AI, "open-source" large language models are not very interesting: think of them as a distraction or a curiosity, to be engaged with lightly and infrequently. Proprietary models, offered as cloud-based APIs, are, for the time being, the AIs to focus on.
What does it mean for a large language model to be open?
This is an important question with a complicated answer.
The definition of "open-source" as it applies to traditional software is simple: the source code of the software is fully available, under a license that guarantees anyone the ability to use and adapt the software, for any purpose and without restriction.
Machine learning models, by their nature, are built not (just) by programming them, but by getting them to "learn" from input data. Therefore, "openness" can include three different aspects (in addition to the licensing terms):
The source code for the model's architecture and training process
The data set used to train the models
The resulting model "weights" (that is, the information resulting from the training - required for making use of the model)
A combination of all three is possible, but not always available. Many people are happy to get only the model weights, since that's what they need to run the model. This is similar to getting a compiled build of a traditional program, available to run for free. In many cases the source code for the model's architecture and training process is also available, making it possible for engineers and researchers to learn about the techniques used to create the model, or to create similar or derivative architectures. In some cases, the training data is also available under an open license, making it possible for researchers and engineers to recreate the model (assuming they have the budget for the very expensive training process) or at least learn how the model was trained.
As for licensing terms, those vary: some models are available under genuine "open-source" licenses, while others come with restrictions that purists would not consider open, but that may nevertheless be acceptable to some users. For example, Meta's Llama 2 comes under a license with restrictions that make the model unusable by very large companies serving hundreds of millions of users (Meta's commercial competitors, in other words).
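The combinations above can be summarised in a small sketch. The model entries below are illustrative examples based only on the discussion in this article; the classifications are deliberately simplified and are not licensing advice.

```python
# Illustrative sketch: the three openness dimensions plus licensing,
# applied to a couple of example models. Classifications follow the
# article's discussion and are simplified, not legal advice.

from dataclasses import dataclass

@dataclass
class ModelOpenness:
    name: str
    weights_available: bool   # can you download and run the model?
    code_available: bool      # architecture / training code published?
    data_available: bool      # training data set published?
    license_open: bool        # open license with no field-of-use limits?

    def summary(self) -> str:
        if not self.weights_available:
            return "closed (API-only)"
        if self.code_available and self.data_available and self.license_open:
            return "fully open"
        return "weights available, partially open"

models = [
    ModelOpenness("GPT-4", False, False, False, False),
    # Llama 2: weights and code released, but custom restricted license.
    ModelOpenness("Llama 2", True, True, False, False),
]

for m in models:
    print(f"{m.name}: {m.summary()}")
```

The point of the sketch is that "open" is not a single switch: each dimension can be released or withheld independently, and only the full combination matches what "open-source" means for traditional software.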
What is the current state of open and closed large language models?
For simplicity, let's bucket large language models from various sources into "generations", where each generation offers a range of capabilities.
4th-gen: the most capable models, of extremely large size (~trillion parameters), exhibiting highly sophisticated reasoning and understanding capabilities.
At the time of writing, the only model in this category is OpenAI's GPT-4 (proprietary, and available exclusively as an API from OpenAI or Microsoft Azure). It is highly likely that other models will enter this category in the coming months (Google DeepMind has already made announcements about a future model dubbed "Gemini"), and those will almost certainly be closed too.
There are no open models in this category.
3rd-gen: previous-generation models of a very large size (tens to hundreds of billions of parameters), providing very powerful language capabilities, as well as strong, but more limited, understanding, analysis, and reasoning.
Several proprietary models are available in this category, including OpenAI's GPT-3.5 (ChatGPT), Google's PaLM 2 (Bard), and Anthropic's Claude, all available exclusively as cloud-based APIs. In addition, some other proprietary models like Cohere's LLMs may be available for private deployment.
In the open category there are many contenders, but in practice only two come close to the capabilities of the best closed models: the largest version of Meta's Llama 2 (70bn parameters), and Falcon (40bn parameters) from Abu Dhabi's Technology Innovation Institute. Neither is as powerful and versatile as the best closed models of this generation, but both come close and may serve as a replacement for a closed model in some use cases.
2nd-gen: earlier, smaller models are seeing less use now due to their limited capabilities, but there are plenty of models in this category, both closed and (mostly) open, including OpenAI's GPT-2 and many similar models from commercial AI developers as well as research and community projects. These models are useful primarily for learning about LLMs and for creating simple, fast, customised models for specific tasks.
In summary: for state-of-the-art capabilities there is no competition to closed models. Users and developers who can settle for the more limited capabilities of previous-generation LLMs may be able to use an open model as a replacement, but for most, proprietary models are the only game in town.
Will open models ever catch up with or exceed the capabilities of closed models?
This is very hard to predict, and there are good arguments in both directions.
For: a lot of people all over the world want open models to succeed, and will do whatever they can to improve their chances, including banding together in worldwide organisations and communities and collaborating with, or co-opting, commercial companies, academic institutions, and governments. Why wouldn't they eventually succeed in catching up? Also, once the market is saturated with enough commercial players relying on a closed strategy, the only way left to compete is open, even if the economic rewards are smaller. That's what we saw with the Linux wave of open-source software, and it may be what we're seeing in Meta's approach of releasing its Llama models under increasingly permissive licenses.
Against: the costs of developing the best models are enormous, and developers will naturally want to protect their strategic advantage and make money by selling access. Commercial developers of proprietary models can gain an additional significant advantage if they can use high-quality proprietary data sets (for example, licensed books and internet publications) to train their models. Costs will decrease with time, due to falling hardware costs as well as a shrinking marginal advantage of proprietary knowledge as the industry and research community catch up with new techniques, but, the argument goes, the newest generation of models will always be closed, with open models only ever achieving previous-generation capabilities.
Is running an open model cheaper than using a closed model?
Not really. Maybe. It depends. The cost of operating a 3rd-gen or 4th-gen large language model comes primarily from the hardware: large arrays of very expensive GPUs. Commercial developers offering metered access via an API will want to add some margin on top of that, in order to cover development costs and eventually make a profit, but with the high cost of operation and competitive pressure from other commercial providers, this margin is likely to be rather thin. In addition, the largest commercial providers can run a near-optimal operation, with the costs of managing the data centre driven as low as possible and extremely good utilisation (the expensive hardware never sits idle).
Open models need to run on the same hardware. All major cloud providers now offer the ability to deploy and use open models, and you can even purchase your own hardware for operating the models in your basement. Unless you are running a very large operation, though, you will find it difficult to replicate the efficiency and utilisation of a commercial provider.
Assuming equivalent capabilities, it may be possible, in theory, to save costs by using an open model instead of a closed commercial API. In practice, though, it will be hard to save enough on the fees paid to a closed API provider to make up for the relative inefficiency of running your own setup.
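The utilisation argument can be made concrete with a back-of-the-envelope sketch. Every number below (GPU hourly cost, throughput, utilisation rates) is an illustrative assumption, not a quote from any real provider; only the shape of the arithmetic matters.

```python
# Back-of-the-envelope comparison: self-hosting an open model versus a
# metered closed API running on the same class of hardware. All numbers
# are illustrative assumptions, not real prices or benchmarks.

def cost_per_million_tokens(
    gpu_hourly_cost: float,    # $ per hour for the GPU setup
    tokens_per_second: float,  # sustained generation throughput
    utilisation: float,        # fraction of time doing useful work
) -> float:
    tokens_per_hour = tokens_per_second * 3600 * utilisation
    return gpu_hourly_cost / tokens_per_hour * 1_000_000

# A large provider: the same hardware, kept almost always busy.
provider = cost_per_million_tokens(
    gpu_hourly_cost=8.0, tokens_per_second=400, utilisation=0.9)

# A small self-hosted setup: identical hardware, but it often sits idle.
self_hosted = cost_per_million_tokens(
    gpu_hourly_cost=8.0, tokens_per_second=400, utilisation=0.15)

print(f"provider baseline: ${provider:.2f} per million tokens")
print(f"self-hosted:       ${self_hosted:.2f} per million tokens")
```

With these (hypothetical) numbers, the poorly utilised private deployment costs several times the provider's baseline per token, so even a healthy API margin on top of that baseline can leave the closed API cheaper overall.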
Why care about open models?
If the economics of using open LLMs are not advantageous and the capabilities are, at best, equivalent, if not worse, then why do some people care so much about open models? And should you care about them?
One reason people are passionate about open models is that free is its own special pricing category. We simply love free stuff, and without stopping to think carefully about the real costs of using AI, it is tempting to be lured by the promise of a model offered for free. As we have already seen, though, this doesn't matter much in practice.
For some people, open source is important for what can be called ideological reasons. We should not be dismissive. The last few decades of software development taught us an important lesson in the value of open source as a way to advance societal goals and improve software quality and availability through collaboration. That's clearly a good reason to care about open models, but as long as there are no open models that compete well with closed ones, that might not be a good enough reason to use them.
Open models are very interesting to AI researchers and engineers who want to learn about the current state of large language models and help develop them further. This is really important, because only a small number of people can join one of the few labs that develop the best closed models, but open models make the field accessible to everyone else. The problem, of course, is that most people not working for the largest commercial developers don't have the budget for training these models. And of course the large majority of people using AI to build applications and services are not AI researchers.
The best practical reason to care about open models is if you have no other choice but to use them. There are several scenarios where this might be the case:
You are required to use models in your own controlled environment for security, privacy, or compliance reasons.
You are required to use open-source software exclusively (this is rare, but some companies or government agencies do have rules like this in place).
You want to use AI for applications that are prohibited by the commercial providers, whether illegal activities or "controversial" uses like pornography. The commercial providers are rather restrictive, and there are many uses that are simply not possible with closed model APIs.
The fact that these usage scenarios exist and are only possible with open models is both a good reason to care, and a near guarantee that open models will continue playing a role in the development of AI capabilities.
And then there's everyone else. AI is becoming a central component of all software systems, and will be increasingly utilised by all software developers and users to enable new capabilities. For most users and developers, open models just don't matter very much right now. Closed, commercial models offer excellent capabilities at the best price possible, and make it easy for developers and users to focus on making the most of AI.