Table of contents:
- Improving language models: This is how it works with OpenLLaMA and the Transformers library
- Basic models as token predictors
- Preparing training data and fine-tuning
- Further adapting the application and AI
- Outlook

(Article from iX 10/2023)
Large language models such as GPT-4, Bard, or LLaMA have changed everyday office life for many people within a very short time. Writing a perfectly worded email, summarizing documents, or implementing the bubble sort algorithm in Python – all done in no time thanks to large language models (LLMs). However, they do not always provide a perfect answer to every prompt. Often this is because the instructions in the prompt are not sufficiently covered by the LLM's training data, leaving the model uncertain about the correct answer. In such cases, LLMs tend to hallucinate: they give convincing-sounding but wrong answers.
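This behavior follows directly from how LLMs generate text: at each step the model assigns a probability to every token in its vocabulary and emits a likely one, whether or not the underlying facts were well covered in training. A minimal sketch with an invented toy vocabulary and made-up logits illustrates the point:

```python
import math

def softmax(logits):
    """Convert raw model scores (logits) into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary and logits for the next token after some prompt.
# The numbers are invented for illustration only.
vocab = ["Paris", "Berlin", "Rome"]
logits = [1.2, 1.1, 0.9]  # nearly uniform: the model is unsure

probs = softmax(logits)
prediction = vocab[max(range(len(vocab)), key=lambda i: probs[i])]

# Greedy decoding commits to the top token and phrases the answer with
# full fluency, even though the probabilities are close -- the mechanism
# behind convincing-sounding but wrong answers.
```

The sketch uses plain greedy decoding; real inference adds sampling strategies, but the core point stands: fluent output does not imply a confident or correct model.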
Fine-tuning a large language model counteracts this. It is particularly worthwhile when the LLM is used exclusively within a specific domain: through fine-tuning, a model can acquire much deeper knowledge of a specialist area such as medicine or one of its sub-disciplines. The procedure is the same as for turning a general base model into a chatbot. For demonstration purposes, this article therefore adapts a base model with a publicly available dataset to turn it into a personal assistant.
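For supervised fine-tuning, each instruction/response pair from the dataset is rendered into a single training text. The template below is an assumption (an Alpaca-style prompt format, common for assistant fine-tuning); the article's exact format may differ:

```python
# Hedged sketch: turning instruction/response pairs into training prompts
# for supervised fine-tuning of a base model such as OpenLLaMA.
# PROMPT_TEMPLATE is an assumed Alpaca-style format, not the article's
# verbatim template.

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{response}"
)

def build_training_example(instruction: str, response: str) -> str:
    """Render one supervised fine-tuning example as plain text."""
    return PROMPT_TEMPLATE.format(instruction=instruction, response=response)

example = build_training_example(
    "Summarize the following document in one sentence.",
    "The document describes fine-tuning open language models.",
)
```

At training time, every record of the dataset is passed through such a function before tokenization, so the model learns to continue the `### Response:` section when given an instruction.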
Martin Thissen is a content creator on YouTube and Medium as well as a research assistant at Darmstadt University of Applied Sciences. As a content creator, he explains how AI models work and how they can be used.
An LLM that already acts as a personal assistant can acquire a wider range of knowledge through further fine-tuning with datasets from a specific knowledge domain or from one's own company. The article shows this using the example of general medical questions. A company could likewise use larger datasets of questions and answers from customer support or sales to retrain an LLM for specific areas of application. The code, shown here in excerpts, is available on GitHub as a Jupyter notebook and can be used for experiments with your own data with minimal adjustments. The prerequisite for training is a CUDA graphics card with Ampere architecture or newer. A Colab notebook provides a version adapted for older graphics cards; it can be used with a free T4 GPU in the cloud.
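The Ampere requirement matters because Ampere GPUs (compute capability 8.0 and newer, e.g. A100 or the RTX 30 series) support bfloat16 in hardware, which many fine-tuning recipes rely on; older cards fall back to float16. A small sketch of that decision, with the capability passed in as a tuple so the logic stays testable without a GPU (in practice it would come from `torch.cuda.get_device_capability()`):

```python
# Hedged sketch: choosing a training dtype from the GPU's CUDA compute
# capability. The function name is hypothetical, not from the article.

def pick_training_dtype(capability: tuple) -> str:
    """Return the dtype name a fine-tuning run should use.

    Ampere and newer GPUs (compute capability >= 8.0) support
    bfloat16 natively; older cards use float16 instead.
    """
    major, minor = capability
    if (major, minor) >= (8, 0):  # Ampere or newer
        return "bfloat16"
    return "float16"  # e.g. a free Colab T4 (capability 7.5)
```

The free Colab T4 mentioned above reports compute capability 7.5, which is why the Colab variant of the notebook needs the float16 fallback.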