Hello, and welcome to
“Introduction to Large Language Models”. My name is John Ewald and I am a
Training Developer here at Google Cloud. In this course, you learn to: define large language models (LLMs), describe LLM use cases, explain prompt tuning, and describe Google's Gen AI development tools. Large Language Models (or LLMs) are a subset
of Deep Learning. To find out more about Deep Learning, see
our Introduction to Generative AI course video. LLMs and Generative AI intersect and they
are both a part of Deep Learning. Another area of AI you may be hearing a lot
about is generative AI. This is a type of artificial intelligence
that can produce new content, including text, images, audio, and synthetic data. So, what are large language models? Large language models refer to large, general-purpose
language models that can be pre-trained and then fine-tuned for specific purposes. What do pre-trained and fine-tuned mean? Imagine training a dog. Often you train your dog basic commands such
as sit, come, down, and stay. These commands are normally sufficient for
everyday life and help your dog become a good canine citizen. However, if you need a special-service dog
such as a police dog, a guide dog, or a hunting dog, you add special training. A similar idea applies to large language
models. These models are trained for general purposes
to solve common language problems such as text classification, question answering, document
summarization, and text generation across industries. The models can then be tailored to solve specific
problems in different fields such as retail, finance, and entertainment, using a relatively
small field dataset. Let's further break down the concept into
three major features of large language models. "Large" indicates two things: first, the enormous size of the training dataset, sometimes at the petabyte scale; and second, the parameter count (not to be confused with hyperparameters, which are configuration values set before training). Parameters are essentially the memories and the knowledge that the machine learned from model training. Parameters define the skill of a model in
solving a problem, such as predicting text. General-purpose means that the models are
sufficient to solve common problems. Two reasons lead to this idea: First is the
commonality of human language regardless of the specific task, and second is the resource
restriction. Only certain organizations have the capability
to train such large language models with huge datasets and a tremendous number of parameters. How about letting them create fundamental
language models for others to use? This leads to the last point, pre-trained
and fine-tuned, meaning to pre-train a large language model for a general purpose with
a large dataset and then fine-tune it for specific aims with a much smaller dataset. The benefits of using large language models
are straightforward: First, a single model can be used for different tasks. This is a dream come true. These large language models that are trained
with petabytes of data and have billions of parameters are smart enough to solve different
tasks including language translation, sentence completion, text classification, question
answering, and more. Second, large language models require minimal
field training data when you tailor them to solve your specific problem. Large language models obtain decent performance
even with little domain training data. In other words, they can be used for few-shot
or even zero-shot scenarios. In machine learning, “few-shot” refers
to training a model with minimal data and “zero-shot” implies that a model can recognize
things that have not explicitly been taught during training.
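To make the difference concrete, here is a minimal sketch of the two prompting styles in Python. The prompt wording and the generate() stub are illustrative assumptions, not part of any particular Google API.

```python
# Hypothetical stand-in for whichever LLM text endpoint you use.
def generate(prompt: str) -> str:
    raise NotImplementedError("replace with a call to your model of choice")

# Zero-shot: the model is asked to do the task with no examples at all.
zero_shot_prompt = (
    "Classify the sentiment of this review as positive or negative.\n"
    'Review: "The battery died after two days."\n'
    "Sentiment:"
)

# Few-shot: a handful of labeled examples sit inside the prompt itself.
# No weights are updated; the examples simply guide the next prediction.
few_shot_prompt = (
    'Review: "Absolutely love it, works perfectly."\nSentiment: positive\n'
    'Review: "Stopped working after a week."\nSentiment: negative\n'
    'Review: "The battery died after two days."\nSentiment:'
)

# Either prompt would then be sent to the model, e.g. generate(few_shot_prompt).
```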
Third, the performance of large language models continues to grow as you add more data and parameters. Let's take PaLM as an example. In April 2022, Google released PaLM (short
for Pathways Language Model), a 540-billion-parameter model that achieves state-of-the-art performance
across multiple language tasks. PaLM is a dense, decoder-only Transformer model. It leverages the new Pathways system, which
enabled Google to efficiently train a single model across multiple TPU v4 Pods. Pathways is a new AI architecture that will
handle many tasks at once, learn new tasks quickly, and reflect a better understanding
of the world. The system enables PaLM to orchestrate distributed
computation for accelerators. We previously mentioned that PaLM is a transformer
model. A Transformer model consists of an encoder and a decoder. The encoder encodes the input sequence and
passes it to the decoder, which learns how to decode the representations for
a relevant task.
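As a rough illustration of that encoder-decoder flow, here is a minimal sketch using PyTorch's built-in nn.Transformer module. This is a generic Transformer, not PaLM (which, as noted above, is decoder-only); the tensor shapes are arbitrary toy values.

```python
import torch
import torch.nn as nn

# A generic encoder-decoder Transformer (toy configuration).
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

# Toy inputs with shape (sequence length, batch size, embedding dimension).
src = torch.rand(10, 32, 512)  # the input sequence fed to the encoder
tgt = torch.rand(20, 32, 512)  # the target sequence the decoder works on

# The encoder encodes src; the decoder attends to that representation
# while producing output for the task at hand.
out = model(src, tgt)
print(out.shape)  # torch.Size([20, 32, 512])
```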
We've come a long way from traditional programming, to neural networks, to generative models! In traditional programming, we used to have
to hard code the rules for distinguishing a cat -
type: animal, legs: 4,
ears: 2, fur: yes,
likes: yarn, catnip. In the wave of neural networks, we could give
the network pictures of cats and dogs and ask - "Is this a cat?" - and it would predict
a cat. In the generative wave, we - as users - can
generate our own content - whether it be text, images, audio, video, etc. For example, models like PaLM (or Pathways
Language Model) or LaMDA (or Language Model for Dialogue Applications) ingest very large amounts of data from multiple sources across the Internet and build foundation language models
we can use simply by asking a question - whether typing it into a prompt or speaking it aloud. We - as users - can use these language models
to generate text or answer questions or summarize data, among other things. So, when you ask it “what’s a cat”,
it can give you everything it has learned about a cat. Let’s compare LLM Development using pre-trained
models with traditional ML development. First, with LLM Development, you don’t need
to be an expert. You don’t need training examples, and there
is no need to train a model. All you need to do is think about prompt design,
which is the process of creating a prompt that is clear, concise, and informative. It is an important part of natural language
processing (NLP). In traditional machine learning, you need
expertise, training examples, train a model and compute time and hardware. Let’s take a look at an example of a Text
Generation use case. Question answering (QA) is a subfield of natural
language processing that deals with the task of automatically answering questions posed
in natural language. QA systems are typically trained on a large
amount of text and code, and they are able to answer a wide range of questions, including
factual, definitional, and opinion-based questions. The key here is that you needed domain knowledge
to develop these Question Answering models. For example, domain knowledge is required
to develop a question answering model for customer IT support, healthcare, or supply chain. Using Generative QA, the model generates free
text directly based on the context. There is no need for domain knowledge. Let’s look at three questions given to Bard,
a large language model chatbot developed by Google AI. Question 1 - This year’s sales are 100,000
dollars. Expenses are 60,000 dollars. How much is net profit? Bard first shares how net profit is calculated
then performs the calculation. Then, Bard provides the definition of net
profit. Here is another question: Inventory on hand
is 6,000 units. A new order requires 8,000 units. How many more units do I need to fill the order? Again, Bard answers the question by performing
the calculation. And our last example, We have 1,000 sensors
in ten geographic regions. How many sensors do we have on average in
each region? Bard answers the question with an example
on how to solve the problem and some additional context.
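For reference, the arithmetic behind the three questions is simple enough to check directly; this quick sketch only uses the figures stated in the questions above.

```python
# Question 1: net profit = sales - expenses
sales, expenses = 100_000, 60_000
print(sales - expenses)        # 40000 dollars

# Question 2: units still needed = order size - inventory on hand
order, on_hand = 8_000, 6_000
print(order - on_hand)         # 2000 units

# Question 3: average sensors per region
sensors, regions = 1_000, 10
print(sensors / regions)       # 100.0 sensors per region
```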
In each of my questions, a desired response was obtained. This is due to prompt design. Prompt design and prompt engineering are two
closely related concepts in natural language processing. Both involve the process of creating a prompt
that is clear, concise, and informative. However, there are some key differences between
the two. Prompt design is the process of creating a
prompt that is tailored to the specific task that the system is being asked to perform. For example, if the system is being asked
to translate a text from English to French, the prompt should be written in English and
should specify that the translation should be in French. Prompt engineering is the process of creating
a prompt that is designed to improve performance. This may involve using domain-specific knowledge,
providing examples of the desired output, or using keywords that are known to be effective
for the specific system. In general, prompt design is a more general
concept, while prompt engineering is a more specialized concept. Prompt design is essential, while prompt
engineering is only necessary for systems that require a high degree of accuracy or
performance.
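To make the contrast concrete, here is a small sketch based on the translation example above: a plainly designed prompt, and an engineered version of the same prompt that adds examples of the desired output. The exact wording is an assumption for illustration, not a prescribed template.

```python
# Prompt design: tailored to the task (English-to-French translation),
# clear, concise, and informative - nothing more.
designed_prompt = (
    "Translate the following text from English to French:\n"
    "Where is the train station?"
)

# Prompt engineering: the same task, with examples of the desired output
# and task-specific phrasing intended to improve performance.
engineered_prompt = (
    "You are an expert English-to-French translator.\n"
    "English: Good morning\nFrench: Bonjour\n"
    "English: Thank you very much\nFrench: Merci beaucoup\n"
    "English: Where is the train station?\nFrench:"
)
```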
There are three kinds of Large Language Models - Generic Language Models, Instruction Tuned, and Dialog Tuned. Each needs prompting in a different way. Generic Language Models predict the next word (technically, a token) based on the language in the training data. Here is an example of a generic language model: given "the cat sat on", the next word should be "the", and you can see that "the" is the most likely next word. Think of this type as an "auto-complete"
in search. In Instruction tuned, the model is trained
to predict a response to the instructions given in the input. For example: summarize the text of "x", generate a poem in the style of "x", give me a list of keywords based on semantic similarity
for “x”. And in this example, classify the text into
neutral, negative or positive. In Dialog tuned, the model is trained to have
a dialog by predicting the next response. Dialog-tuned models are a special case of
instruction tuned where requests are typically framed as questions to a chat bot. Dialog tuning is expected to be in the context
of a longer back and forth conversation, and typically works better with natural question-like
phrasings.
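Side by side, the three styles might be prompted like this (the wording is illustrative only; the sentiment example mirrors the classification instruction above):

```python
# Generic (auto-complete style): the model simply continues the text.
generic_prompt = "The cat sat on"

# Instruction tuned: the input is an explicit instruction to carry out.
instruction_prompt = (
    "Classify the following text into neutral, negative or positive:\n"
    "Text: I loved the new update."
)

# Dialog tuned: the request is framed as a turn in a chat with the model.
dialog_prompt = (
    "User: Hi! Can you recommend a good book about machine learning?\n"
    "Assistant:"
)
```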
Chain of thought reasoning is the observation that models are better at getting the right answer when they first output text that explains
the reason for the answer. Let’s look at the question: Roger has 5
tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now? When this question is posed on its own, the model is less likely to get the correct answer directly. However, when the model is prompted to first output the intermediate reasoning steps, it is more likely to end with the correct answer.
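Here is a rough sketch of the two prompts. In the chain-of-thought version, an illustrative worked example (not taken from the course slides) shows its reasoning before the answer, which nudges the model to reason step by step on the new question:

```python
# Standard prompting: the question is asked directly.
standard_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A:"
)

# Chain-of-thought prompting: an illustrative worked example writes out its
# reasoning first, so the model tends to explain before answering.
cot_prompt = (
    "Q: A box holds 4 pencils. Maria has 3 boxes and 2 loose pencils. "
    "How many pencils does she have?\n"
    "A: 3 boxes hold 3 * 4 = 12 pencils. 12 + 2 = 14. The answer is 14.\n"
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A:"
)
```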
A model that can do everything has practical limitations. Task-specific tuning can make LLMs more reliable. Vertex AI provides task-specific foundation
models. Let’s say you have a use case where you
need to gather sentiments (or how your customers are feeling about your product or service),
you can use the sentiment analysis task model for classification. Same for vision tasks - if you need to perform
occupancy analytics, there is a task specific model for your use case. Tuning a model enables you to customize the
model response based on examples of the task that you want the model to perform. It is essentially the process of adapting
a model to a new domain or set of custom use cases by training the model on new data. For example, we may collect training data
and “tune” the model specifically for the legal or medical domain. You can also further tuned the model by “fine-tuning”,
where you bring your own dataset and retrain the model by tuning every weight in the LLM. This requires a big training job and hosting
your own fine-tuned model. Here is an example of a medical foundation
model trained on Healthcare data. The tasks include question answering, image
analysis, finding similar patients, etc. Fine-tuning is expensive and not realistic
in many cases. So, are there more efficient methods of tuning? Yes. Parameter-Efficient Tuning Methods are methods
for tuning a large language model on your own custom data without duplicating the model. The base model itself is not altered. Instead, a small number of add-on layers are
tuned, which can be swapped in and out at inference time.
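Conceptually, that looks something like this PyTorch sketch: the pre-trained weights stay frozen and only a small add-on layer receives gradient updates. This is a toy illustration of the general idea, not a specific Google tuning method or API.

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained layer inside a large model: its weights are frozen.
base_layer = nn.Linear(768, 768)
for param in base_layer.parameters():
    param.requires_grad = False  # the base model itself is not altered

# Small add-on (adapter-style) layer: the only part that gets tuned,
# and the part you could swap in and out at inference time.
adapter = nn.Sequential(nn.Linear(768, 16), nn.ReLU(), nn.Linear(16, 768))

x = torch.rand(4, 768)                # toy activations
out = base_layer(x) + adapter(x)      # base output plus the tuned adjustment

# Only the adapter's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(adapter.parameters(), lr=1e-4)
```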
Generative AI Studio lets you quickly explore and customize generative AI models that you can leverage in your applications on Google
Cloud. Generative AI Studio helps developers create
and deploy generative AI models by providing a variety of tools and resources that make
it easy to get started. For example, there is a:
Library of pre-trained models
Tool for fine-tuning models
Tool for deploying models to production
Community forum for developers to share ideas and collaborate
Generative AI App Builder lets you create Gen AI apps without having to write any code. Gen AI App Builder has a:
Drag-and-drop interface that makes it easy to design and build apps
Visual editor that makes it easy to create and edit app content
Built-in search engine that allows users to search for information within the app
Conversational AI engine that allows users to interact with the app using natural language
You can create your own:
Chatbots
Digital assistants
Custom search engines
Knowledge bases
Training applications
And more
The PaLM API lets you test and experiment with Google's Large Language Models and
Gen AI tools. To make prototyping quick and more accessible,
developers can integrate PaLM API with MakerSuite and use it to access the API using a graphical
user interface. The suite includes a number of different tools,
such as a model training tool, a model deployment tool, and a model monitoring tool. The model training tool helps developers train
ML models on their data using different algorithms. The model deployment tool helps developers
deploy ML models to production with a number of different deployment options. The model monitoring tool helps developers
monitor the performance of their ML models in production using a dashboard and a number
of different metrics.
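As a final illustration, here is a minimal sketch of calling the PaLM API from Python. It assumes the google.generativeai client library and the text-bison model name as documented at the time; exact names and availability may have changed.

```python
# Assumes: pip install google-generativeai, and an API key from MakerSuite.
import google.generativeai as palm

palm.configure(api_key="YOUR_API_KEY")  # placeholder key

response = palm.generate_text(
    model="models/text-bison-001",   # PaLM text model name at the time
    prompt="Summarize in one sentence what a large language model is.",
    temperature=0.2,
    max_output_tokens=100,
)
print(response.result)
```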