Leveraging Large Language Models for your next app

Large language models (LLMs) like GPT-4, Claude 2, and Llama 2 have exploded in popularity, unlocking new possibilities for developers looking to build intelligent applications. While the hype is real, working with LLMs does require some nuance. In this post, we’ll break down different techniques for integrating LLMs into your apps, ordered by increasing complexity.

Prompting

A prompt contains any of the following elements:

Instruction – a specific task or instruction you want the model to perform
Context – external information or additional context that can steer the model to better responses
Input Data – the input or question that we are interested to find a response for
Output Indicator – the type or format of the output

The easiest way to leverage an LLM is through prompting. This simply involves sending the model instructions and seeing how it responds. For example, you could prompt an LLM to write a blog post by giving it a topic and length. The key is framing the prompt in a way that elicits the desired output. With a well-constructed prompt, you can build a basic prototype in a matter of minutes or hours, no training data required.

Prompting is great for mocking up ideas and testing capabilities quickly. However, performance can be inconsistent, since the model hasn’t been explicitly trained on your use case. Tweaking the prompt and trying different phrasing is often necessary. Think of prompting as a starting point that gives you a taste of what’s possible.

One-Shot, Few-Shot Learning & Chain-of-Thought Prompting

If prompting alone doesn’t cut it, one-shot or few-shot learning may help. Here, in addition to a prompt, you provide a small number of examples that demonstrate the task you want the LLM to complete. For instance, to build a chatbot, you could show it a few examples of question-answer pairs.

By learning from just a handful of examples, LLMs can produce more reliable results for a given task. One-shot learning uses a single example, while few-shot learning uses 2-9 examples. This technique bridges the gap between prompting and full fine-tuning.

Chain-of-thought (CoT) prompting, on the other hand, enables complex reasoning capabilities through intermediate reasoning steps. You can combine it with few-shot prompting to get better results on more complex tasks that require reasoning before responding.

Fine-Tuning

For more rigorous training, you’ll want to fine-tune your LLM on a dataset specific to your application. Fine-tuning involves further training of an already pretrained model on new data representing your problem.

The benefit of fine-tuning is that you can tailor the model directly to your use case versus relying on its general knowledge. Performance will be more predictable and accurate. Most state-of-the-art results with LLMs today utilize fine-tuning.

The downside is that you need a training dataset, which can be time-consuming to create. Hundreds to thousands of examples are ideal but even a couple dozen can be effective depending on the task. As long as your data is high-quality and representative, fine-tuning can take LLMs to the next level.

Luckily, the tools for fine-tuning LLMs are becoming more accessible. Services like MonsterTuner from Monster API allow you to upload a dataset and launch fine-tuning jobs taking a low-code or no-code approach. As these capabilities spread, fine-tuning will become standard practice for many applications.

Pretraining

Currently, most developers are using general-purpose LLMs pretrained on large text corpora. But you can also pretrain your own model from scratch if you have very specialized needs and lots of in-domain training data. Pretraining requires massive computational resources – you need hundreds of GPUs running for weeks or months to pretrain an LLM using self-supervised learning. Unsurprisingly, very few organizations attempt this. For most use cases, leveraging publicly available pretrained models gets you most of the way there.

That said, pretraining does enable tailoring to highly specialized domains. For example, Bloomberg is pretraining LLMs on financial texts and Google’s med-PaLM 2 is pretraining within the medical domain. If you operate in a niche area with unique data, exploring pretraining could be worthwhile. Otherwise, stick to the techniques above.

In summary, integrating LLMs into your applications is more accessible than ever. Start with prompting for quick prototyping, then level up to few-shot learning and fine-tuning as needed. With the right technique for your use case and resources, LLM-powered apps are within reach for all developers. What will you build next?

Guest contributor Priyank Kapadia is Product and Technology Partner at Accolite, delivering solutions through design-led product engineering and advising clients to adopt Generative AI responsibly. Any opinions expressed in this article are strictly those of the author.