Pablo Elgueta is a Machine Learning Specialist. In this article, he discusses making data accessible through AI.
LLMs are machine learning models trained on large datasets containing billions or trillions of words, or tokens. Their architecture, the transformer, is what sets them apart: its performance scales with the amount of data it is trained on, so the more data, the better the performance. LLMs are generally provided by the companies that build them as an API-as-a-service, which changes the way they are consumed and integrated.
Google, Meta, and OpenAI are the biggest companies in the industry right now building these models. Previously, if you wanted to implement machine learning technology, you first had to build a model in-house. That meant hiring data scientists and machine learning engineers, collecting data, building data pipelines, and so on. It was expensive, difficult, and time-consuming.
LLMs are very versatile. They can perform multiple tasks, like translation, summarization, and content creation.
LLMs are complex, but much of that complexity is abstracted away. We access them through an API layer, which hides the complexity and makes integration easier.
A traditional development cycle had the following steps:
- Data collection
- Data curation
- Data exploration
- Data validation
- Data transformation
- Model training
- Evaluation
- Integration into the app
- Testing
This development cycle was time-consuming and expensive. Now, companies provide all of it through APIs, which makes building with these models cheaper, easier, and far more widely accessible.
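To make this concrete, here is a minimal sketch of what consuming an LLM through an API looks like, assuming the openai Python package (its pre-1.0 ChatCompletion interface) and an API key set in the environment; the model name and prompt are only illustrative.

```python
# Minimal sketch of consuming an LLM as an API-as-a-service.
# Assumes the `openai` Python package (pre-1.0 interface) and an
# OPENAI_API_KEY environment variable; model name and prompt are illustrative.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this quarter's sales trends in two sentences."},
    ],
    temperature=0.2,
)

print(response["choices"][0]["message"]["content"])
```

A single call like this replaces what used to require an in-house model and the full development cycle above.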
We have a whole new stack of technology: frameworks like LangChain and LlamaIndex, and vector databases, which are not new but are being used in a new way. There are new problems, like prompt engineering. We also see new paradigms, like the creation of agents. And finally, we have new challenges: token limits, hallucinations, and context windows.
While APIs simplify the building of AI-powered applications, they have also given rise to a new technical stack. We have the models themselves: OpenAI's GPT-3.5 and GPT-4, Anthropic's Claude, and many open-source models. We have embedding models, which convert text into vectors that can then be stored in vector databases for retrieval. We have orchestration frameworks, like LangChain and LlamaIndex, which coordinate these pieces and integrate them into our traditional apps. We have evaluation frameworks that allow us to test and measure the performance and accuracy of our models. And we have external APIs that the models can interact with. This is a whole new ecosystem, and teams need to retrain and learn new skills.
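As a rough sketch of the embedding-and-retrieval piece of this stack, the toy example below hashes words into vectors purely for illustration (a real system would call an embedding model) and uses cosine similarity over an in-memory list where a vector database would normally sit.

```python
# Toy sketch of embedding-based retrieval. The embed() function is a stand-in
# for a real embedding model, and the in-memory list plays the role of a
# vector database.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Illustration only: hash words into a fixed-size vector.
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) or 1.0))

documents = ["Q3 revenue grew 12%.", "Churn fell in Europe.", "Hiring slowed in August."]
index = [(doc, embed(doc)) for doc in documents]  # stands in for a vector database

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine_similarity(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

print(retrieve("What happened to revenue?"))
```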
Then, in terms of the problems, we have prompt engineering. Prompts are the instructions we provide to the model, and they have several components. First is the instruction, where we tell the model what to do: how to use any external information we provide, what to do with the query, and how to construct the output. We can also include external information in the prompt as context, supplying additional sources of knowledge through a vector database or API access.
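As a simple illustration of those components, the sketch below assembles a prompt from an instruction, external context, the user's question, and an output-format requirement. The wording and the build_prompt helper are illustrative, not a recommended template.

```python
# Sketch of a prompt built from the components described above: an instruction,
# external context (e.g. retrieved from a vector database), the user's query,
# and an output-format requirement.
PROMPT_TEMPLATE = """You are an assistant that answers questions about company data.
Use only the context below; if the answer is not in the context, say you don't know.

Context:
{context}

Question:
{question}

Answer in at most three sentences, in plain language."""

def build_prompt(context: str, question: str) -> str:
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    context="Q3 revenue grew 12%, driven mainly by the EU region.",
    question="What drove revenue growth last quarter?",
)
print(prompt)
```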
These models are extremely complex, and it is still not widely understood how they produce the results they do, or how to get them to provide useful results consistently.
The creation of agents is a new paradigm. Agents are LLMs that have been given access to tools, e.g., a Python interpreter, Google search, a SQL database, or Microsoft 365 apps. LLMs on their own are very versatile (they can translate, summarize, and so on), but they are still limited, and agents help overcome those limitations. We can have multiple agents that are experts in different things, for example, an agent that is an expert in Python and can do some math. This adds a lot of versatility to what we can build, and it is still very new.
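The sketch below shows the basic idea in a hand-rolled loop: the model decides whether to call a tool or give a final answer, the loop executes the tool, and the result is fed back to the model. The llm function is a placeholder for a real model call, and the reply format and tool set are assumptions made for illustration; frameworks like LangChain implement this pattern far more robustly.

```python
# Minimal hand-rolled agent loop, for illustration only.
def llm(prompt: str) -> str:
    # Placeholder: send `prompt` to a model and return its reply, which we
    # assume is either "TOOL: <name> <input>" or "ANSWER: <text>".
    raise NotImplementedError("plug in a real model call here")

TOOLS = {
    "python": lambda expr: str(eval(expr)),                        # toy interpreter
    "search": lambda query: f"(top search result for {query!r})",  # stubbed search
}

def run_agent(question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        reply = llm(transcript)
        if reply.startswith("ANSWER:"):
            return reply[len("ANSWER:"):].strip()
        if reply.startswith("TOOL:"):
            _, name, tool_input = reply.split(maxsplit=2)
            observation = TOOLS[name](tool_input)        # run the tool
            transcript += f"\n{reply}\nObservation: {observation}"  # feed result back
    return "No answer within the step limit."
```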
Every new technology comes with its own set of challenges. For example, LLMs have token limits. They are still constrained by technical capacity, such as compute and GPUs, so providers cap the number of tokens they will process per minute. This introduces problems, for example, when sending parallel calls to the API, especially if thousands or millions of users are using your application simultaneously. There are ways of mitigating it, but it still needs to be considered when building LLM applications.
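One common mitigation is to retry failed calls with exponential backoff, as in the sketch below; call_model and RateLimitError are placeholders for whichever client library and provider error type you actually use.

```python
# One common mitigation for per-minute token/request limits: retry with
# exponential backoff and jitter. call_model and RateLimitError are placeholders.
import random
import time

class RateLimitError(Exception):
    pass

def call_model(prompt: str) -> str:
    raise NotImplementedError("the actual API call goes here")

def call_with_backoff(prompt: str, max_retries: int = 5) -> str:
    delay = 1.0
    for _ in range(max_retries):
        try:
            return call_model(prompt)
        except RateLimitError:
            time.sleep(delay + random.random())  # jitter avoids synchronized retries
            delay *= 2                           # exponential backoff
    raise RuntimeError("Gave up after repeated rate-limit errors")
```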
We also have the context window, which can be a problem. The context window is the amount of data a model can ingest per call, effectively the attention span of the model. Models like GPT-3.5 are limited to a few thousand tokens, while Anthropic's Claude can reach 100,000. This may seem like a lot, but it is still limited.
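A common way to work within the context window is to split long documents into chunks that fit a token budget, as sketched below. The four-characters-per-token figure is only a rough rule of thumb; a real system would count tokens with the model's own tokenizer.

```python
# Sketch of fitting a long document into a limited context window by splitting
# it into chunks under a token budget. The 4-characters-per-token estimate is a
# rough rule of thumb used here for illustration.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def chunk_text(text: str, max_tokens: int = 3000) -> list[str]:
    chunks, current, current_tokens = [], [], 0
    for paragraph in text.split("\n\n"):
        cost = estimate_tokens(paragraph)
        if current and current_tokens + cost > max_tokens:
            chunks.append("\n\n".join(current))   # close the current chunk
            current, current_tokens = [], 0
        current.append(paragraph)
        current_tokens += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```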
Finally, we have hallucinations: cases where the model generates information that is not true. Hallucination is not consistent, it is a difficult problem to handle, and it is still not entirely clear why it happens.
Using LLMs
Consider using LLMs to democratize access to data by reducing technical barriers through natural language. In organizations, stakeholders need access to data to make decisions. Currently, most data is stored in databases, which are usually SQL-based. To retrieve information from such a database, you need to write a query in SQL, which means you need to know SQL. So you need a team of analysts who can query the database and format the data so that it is appropriate for consumption.
This means stakeholders don’t have immediate access to data; they must wait for the analysts to provide it. We can improve decision-making speed, productivity, and even the use of human resources by allowing LLMs to act as an interpreter between natural language and SQL queries.
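The sketch below illustrates the idea: the database schema and the stakeholder's question go into a prompt, the model returns SQL, and the query runs against a local SQLite database. The llm function is a placeholder for a real model call, and the schema is invented for illustration; in practice you would also restrict the generated SQL to read-only, allow-listed tables before executing it.

```python
# Sketch of using an LLM as an interpreter between natural language and SQL.
# llm() is a placeholder for a model call, and the schema is illustrative.
import sqlite3

SCHEMA = "CREATE TABLE sales (region TEXT, quarter TEXT, revenue REAL);"

def llm(prompt: str) -> str:
    raise NotImplementedError("send the prompt to a model and return its reply")

def answer_question(db_path: str, question: str) -> list[tuple]:
    prompt = (
        "Given this SQLite schema:\n"
        f"{SCHEMA}\n"
        "Write a single SQL query (and nothing else) that answers:\n"
        f"{question}"
    )
    sql = llm(prompt).strip()          # the model's natural-language-to-SQL step
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()

# e.g. answer_question("sales.db", "Which region had the highest revenue in Q3?")
```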
This is how LLMs help us improve productivity.
This is still in its infancy; we are still beta testing, and it is all very new. But you can already see the possibilities for improving efficiency and empowering people and stakeholders to do more with less.