Generative AI for beginners

As the name suggests, generative AI is artificial intelligence that generates something new, such as text, images, or audio.

Here are some general guidelines to consider when using generative AI tools:

  1. Ethical Considerations: Always consider the ethical implications of any content you generate. Avoid creating content that could be interpreted as misleading, offensive, or harmful to others.

  2. Legal Considerations: Be aware of any potential copyright violations or other regulations. Do not use generative AI to produce content that infringes on intellectual property rights, violates privacy, or stands in conflict with applicable laws.

  3. Transparency: Disclose any content that has been generated using AI tools in an effort to maintain transparency and make others aware of the nature of the source.

  4. Bias and Fairness: AI models may inadvertently perpetuate biases present in their training data, so take steps to recognize these issues and make an effort to mitigate them.

  5. Security Measures: Access to generative AI tools and the outputs they produce should be secured to prevent unauthorized use or the distribution of potentially harmful content.

The birth of Artificial Intelligence

In 1950, Alan Turing posed the question of whether machines can think in his paper "Computing Machinery and Intelligence."

Within machine learning, we have:

  1. Supervised learning

  2. Unsupervised learning

  3. The perceptron, an early model of an artificial neuron

These milestones form a timeline of human and system interaction.

The rise of Deep Learning

By the early 2000s, processing power had grown significantly and reignited AI research. Systems could now play games like chess so well that they were defeating Grandmaster champions. Around the same time, software companies were releasing the next generation of computer vision and voice recognition applications, and the Internet blossomed into what was dubbed Web 2.0, bringing to market improved web-based applications and better tools that let users create even more information. These accomplishments all served as catalysts in the explosion of computing and data that would help make systems even smarter.

Earlier we mentioned neural networks, a type of AI model inspired by how the human brain works. Like the brain, neural networks are composed of many artificial neurons, or nerve cells, that process and transmit information. These cells are interconnected by a complex network that allows the neurons to pass signals to one another. Each connection can apply a weight, a numerical value that determines how much influence it has on other connections. Like the perceptron, neurons in the input layer receive data, perform a calculation, and then pass the result on to the next layer. This process is repeated in each layer until the output layer is reached, where a function completes the processing and delivers the final results.

Neural networks are trained using sample data along with examples of expected outputs. The algorithm adjusts the weights repeatedly until the output comes closer to what is desired. With lots of training and weight adjustments, neural networks eventually are able to perform human-like tasks with high degrees of accuracy. Other key advantages of neural networks are the ability to model complex non-linear relationships, adapt to new data, and process complex items like images and language. Historically, such complex tasks were difficult for algorithms to complete. Neural networks pioneered the development of intelligent applications that could perform language translation, computer vision, and image and speech recognition.

The downside to developing these applications was the requirement for large amounts of data and powerful computers, which increased development time and cost. Technologies like cloud computing would evolve and help address these factors by offering more resources and services for training AI applications. As a result, neural network architectures have become more accessible and provide design and development opportunities for many.

Deep learning is a branch of machine learning that employs techniques borrowed from organic brains. Like other AI innovations, deep learning is not a new concept; it took the arrival of fast processors, data at Internet scale, and new services to accelerate its development. Researchers and engineers are now able to conduct research and develop applications with deep learning over the course of days and weeks, as opposed to months and years.

The first forms of Generative AI

Earlier, we mentioned Eliza, a primitive chatbot created in the 1960s. Eliza represented the natural language processing of its time, following a simple pattern of recognizing words in the input to generate a response. This chatbot's ability to talk proved that machines were capable of mimicking human speech. The problem, however, was that early machines interpreted words as character data, which didn't carry meaning the way it would for a human. By Weizenbaum's own admission, Eliza was sort of a parody of a human with little intelligence. Even with limited speech capabilities, Eliza showed promise and opened the door for advances in natural language processing for decades to come.

Natural language processing, or NLP, is when a computer uses algorithms and knowledge about linguistics to better understand language. This includes syntax, word definitions, classification, text processing, and sentiment analysis.

Generative AI Models

Let's start with what a prompt is.

A prompt is a short piece of text used to provide input or instructions to a generative AI system. When you submit a prompt, you are giving the system the information it needs to generate something.

Prompts are very important to generative AI because, while these systems are full of knowledge, they have no clear way to demonstrate that knowledge unprompted. A good prompt acts like a writing instruction: it gives a generative AI app a starting point and direction.

For example, if you wanted to create a poem about nature, you could prompt the system with "Write a short poem about nature and appreciating the natural world."

The prompt provides important context, like the topic, the length of the response, and the style or format you want. Without these details, it would be unclear to the AI system what output to generate.
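To make this concrete, here is a minimal sketch of what submitting a prompt programmatically might look like. The TextClient class and its generate method are hypothetical placeholders rather than any real library's API; the point is simply that the prompt string carries the topic, length, and style.

```python
# Minimal sketch of prompting a text-generation service.
# TextClient and generate() are hypothetical placeholders,
# not a real library API.

class TextClient:
    def generate(self, prompt: str) -> str:
        # A real client would call a hosted model here; we return a stub.
        return f"[model output for: {prompt!r}]"

client = TextClient()

# The prompt supplies the topic ("nature"), the length ("short"),
# and the format ("poem") so the model knows what to produce.
prompt = "Write a short poem about nature and appreciating the natural world."
print(client.generate(prompt))
```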

What is a model?

Models are at the core of generative AI and are built with neural networks, system resources, data, and prompts all working together. Unlike AI innovations of the past, models extend far beyond simple classification and prediction to create output that is unique, wide-ranging, and in many cases indistinguishable from what humans produce.

When you build a model, you fill it with the knowledge it needs to generate unique output based on what it has learned. Neural networks were inspired by the human brain, but a model can be thought of as the actual brains of generative AI. Think about songs, which are an integral part of our lives, and hardly a day goes by where we don't listen to a favorite playlist or hear tunes coming out of a speaker. Now, if someone were to hand you a microphone or guitar and request a new song, it's likely you will be able to produce something based on your listening experience.

In some ways, the ability to create a song on request, no matter how good or bad, represents the human way of generating unique output with help from previous examples. Okay, probably not the best analogy, but it will help as we continue to describe data, word relationships, and training.

Some of the early AI models could create text, speech, and other sequential pieces of data. Considered groundbreaking at the time, they were limited in what they could produce. It's safe to say that these models and their abilities were rather small.

It would take deep learning techniques that rapidly process vast amounts of data to create the performance and scale found in models today.

In fact, they are called large language models because of their size, the complexity of their development, and the amount of data they can process or generate. It's worth repeating: when you leverage data from across the Internet as a source, you have many petabytes of information at your disposal. That equates to hundreds of billions of word passages, phrases, and patterns in many languages; it doesn't get much larger than that.

How do models work?

Large language models, or LLMs for short, excel at understanding and creating natural language due to the neural network architecture used for development.

The transformer is the most common architecture and is a huge upgrade from previous architectures, which could each perform only a single task, like determining whether a comment is positive or negative, or translating between languages.

Transformers use encoders to process input sequences and decoders to generate the desired output. The power of a transformer is its ability to pack numerous capabilities into a single model. Popular LLMs today, like BERT, GPT-3, and Claude, provide chat conversation, software code creation, text summarization, and short stories for children.

In addition to language processing, transformers can also be used for creating models that generate images, synthetic data, and audio output.

Let's have a high-level overview of the inner workings of a large language model. We already know that huge amounts of data containing word patterns and language structure are used. Processing those patterns and structures at scale is possible with help from three key components: embeddings, positional encodings, and self-attention.

Large language models are good at understanding and generating natural language. Earlier in the course, we talked about how primitive models interpreted words as character data, which limited their ability to process language effectively.

Current techniques represent words as encoded mathematical vectors called embeddings. Embeddings make it easier for a model to associate words with similar meanings by identifying similarities in their vector locations. This ultimately helps a model perform many language tasks, like word prediction. For example, we all know that apples and bananas are similar because they are classified as fruit. A spaceship, on the other hand, is not a type of food and therefore would occupy a different location in the vector space from words representing foods or fruit.

If you prompted an LLM with a phrase like "apples and ___ are healthy food options," bananas would be a much more logical choice than spaceship or other words not located where foods are represented or embedded.

Another advantage of word embeddings is that they support more complex mathematical operations that reveal deeper relationships in text. For example, a computation that takes the vector for Madrid, subtracts the vector for Spain, and adds the vector for France is likely to yield a value that closely represents the vector for Paris.

Prompted in natural language, this would look something like "Madrid is to Spain as ___ is to France."

In response to this prompt, Paris has a high probability of being generated.
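Both ideas can be sketched in a few lines of code: related words sit close together, and analogies become vector arithmetic. The three-dimensional vectors below are made-up toy values chosen for illustration; real embeddings are learned from data and have hundreds or thousands of dimensions.

```python
import numpy as np

# Toy word embeddings (made-up values; real embeddings are learned
# and have far more dimensions).
vectors = {
    "apple":     np.array([0.9, 0.8, 0.1]),
    "banana":    np.array([0.85, 0.75, 0.15]),
    "spaceship": np.array([0.1, 0.2, 0.95]),
    "madrid":    np.array([0.6, 0.1, 0.3]),
    "spain":     np.array([0.7, 0.2, 0.3]),
    "paris":     np.array([0.55, 0.15, 0.8]),
    "france":    np.array([0.65, 0.25, 0.8]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 means identical direction, near 0 means unrelated.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Related words sit close together in the vector space...
print(cosine(vectors["apple"], vectors["banana"]))     # high similarity
print(cosine(vectors["apple"], vectors["spaceship"]))  # low similarity

# ...and analogies become arithmetic: Madrid - Spain + France ≈ Paris.
# (Real systems also exclude the input words from the candidates.)
target = vectors["madrid"] - vectors["spain"] + vectors["france"]
print(max(vectors, key=lambda w: cosine(vectors[w], target)))  # "paris"
```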

Word embeddings provide a dense, low-dimensional representation, which reduces model complexity. This helps shorten training times and reduce the size of the model.

Word embeddings also allow a model to be pre-trained on large amounts of unlabeled data and then used for other tasks like classification, sentiment analysis, or summarization, which is what gives a large language model so many capabilities.

Positional encoding is a technique used to identify word order and position. Take the following sentence: "The hat sits on the table."

Without positional encoding, the model has no understanding that "the" is the first word in the sentence, "hat" is the second, and so on. It would just see a collection of independent words. Positional encoding assigns each word a value that uniquely represents its location in the sentence.

For example, "the" could be assigned position 0, "hat" position 1, "sits" position 2, and the encoding assignments would continue until every word is represented.

These positional encodings make it easier for a model to determine that the phrase "the hat sits on the table" has a different meaning than "at the table sits a man wearing a hat."

Even though the words that appear in both sentences, like hat, sits, and table, might have the same embeddings, their positional encodings are different.

With time and more data for training, the model will get better at deriving meaning and context from billions of sentences with various word orders and other patterns.
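One well-known way to compute these position values is the sinusoidal encoding from the original transformer paper, sketched below. Some models use learned positional embeddings instead, so treat this as one illustrative option rather than the only approach.

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding from "Attention Is All You Need".

    Each row is a unique vector for one position; it is added to the
    word embedding at that position so the model can tell word order.
    """
    positions = np.arange(seq_len)[:, np.newaxis]     # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]          # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])       # even dims use sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])       # odd dims use cosine
    return encoding

# "the hat sits on the table" has 6 positions; each gets a distinct vector.
pe = positional_encoding(seq_len=6, d_model=8)
print(pe.shape)   # (6, 8)
print(pe[0])      # encoding for position 0 ("the")
print(pe[1])      # encoding for position 1 ("hat")
```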

The third core mechanism in a transformer is self-attention. With self-attention, the model can look at the entire prompt and learn which words are most relevant for predicting additional words.

That is, the model pays attention to whatever provides the most context, similar to how a human reads by focusing on the important words to grasp the overall meaning.

What's innovative about self attention is that as words are analyzed, the model starts to create an internal interpretation of language automatically.

For example, a model will eventually learn that programmer, software engineer, and developer are all synonymous. Transformers also naturally learn rules of grammar, gender, and word tense, giving them the ability to produce more intelligent and natural language.

A well-developed model given the instruction to create five fictional names for an airline pilot will produce both male and female pilot names, because it knows not all pilots are men. Again, self-attention is very effective at helping a model understand language when it is focused on the input text. Here are two additional examples, both using the word server in different senses, along the lines of "ask the server to check on our order" and "check the server hardware for faults."

Self-attention, along with knowledge from the embeddings and positional encodings, helps the model understand the meaning of words based on the other words that surround them.

When the model processes server in the first sentence, it will attend to other words like check or ask and ultimately determine that the sentence is referring to a human.

In a large dataset with lots of phrases and statements, it is likely that related examples will help establish this learning pattern. In the second sentence, the model attends to the word check again but this time it will also use hardware as part of the context to determine that server has a meaning that is synonymous with computer.

Again, having additional examples to train against is a huge help. Let's look at two more examples, this time using the word crane.

A prompt that mentions a crane by the river could be interpreted by the model as referring to a bird residing in its natural habitat.

A prompt that mentions a crane at a bridge construction site could be interpreted as referring to a large piece of machinery used for building the bridge.

If the above statements were prompt inputs to an image generation model, we would expect the first image to depict a bird by the river and the second to depict a bridge with a machine nearby.

Again, it depends on the location of the words, the attention placed on surrounding words, and the large amount of data available for the LLM to learn from.
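Under the hood, self-attention in a transformer is commonly implemented as scaled dot-product attention. The sketch below shows the mechanism with random toy vectors standing in for learned query, key, and value projections, so the actual weight values here are meaningless; only the computation pattern matters.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention as used in transformers.

    scores[i, j] measures how much token i attends to token j; a softmax
    turns each row of scores into weights that mix the value vectors.
    """
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (tokens, tokens)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V, weights

# Six tokens (think "the crane stood by the river"), each represented by
# a 4-dimensional vector. Real Q, K, V come from learned projections of
# the token embeddings; random values are used here for illustration.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))
output, weights = scaled_dot_product_attention(tokens, tokens, tokens)
print(weights.round(2))   # row i shows how token i spreads its attention
```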

Here's a list of some key criteria that should be considered when designing a large language model:

Data:

The data used for training models is crucial. It should be high quality, diverse, and representative of the topics and tasks the model will be expected to handle. More data is typically better as it provides additional context for learning. However, great care should be taken to ensure that the data comes from reliable sources.

Model architecture:

The structure and number of parameters contained within a model should be suitable for performing all of its tasks. Parameters define characteristics like the size of the embeddings and the attention and other weights applied during development and training. Like knobs on a machine, parameters are adjusted to properly optimize performance. Large language models contain hundreds of billions of parameters, with some models in development exceeding the trillion-parameter mark.

Bigger does not always mean better, as trade-offs must be made between size, training time, computational cost, and complexity. A large language model for medicine and health care should be designed specifically for that use case and may exclude tasks like creative writing or audio generation that aren't relevant to patient care.
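In practice, these architectural choices show up as a handful of hyperparameters. The configuration below is entirely hypothetical, with illustrative values rather than any real model's settings; it simply shows the kinds of knobs that drive a model's parameter count.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    """Hypothetical transformer hyperparameters (illustrative values only)."""
    vocab_size: int = 50_000   # number of distinct tokens
    d_model: int = 1024        # embedding width
    n_heads: int = 16          # self-attention heads per layer
    n_layers: int = 24         # stacked transformer blocks
    d_ff: int = 4096           # feed-forward hidden size

def rough_param_count(cfg: ModelConfig) -> int:
    """Back-of-envelope estimate: embeddings plus per-layer weights."""
    embeddings = cfg.vocab_size * cfg.d_model
    attention = 4 * cfg.d_model * cfg.d_model        # Q, K, V, output
    feed_forward = 2 * cfg.d_model * cfg.d_ff
    return embeddings + cfg.n_layers * (attention + feed_forward)

print(f"{rough_param_count(ModelConfig()):,}")  # roughly 0.35B parameters
```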

Training methods:

The techniques used to train a model must consider factors like transfer learning from existing models, data accuracy, and calibration for continuous improvement. Models that suffer from a lack of training often miss important patterns in the dataset, which can carry over to the newly generated data. This creates the potential for hallucination: details that seem realistic but are not grounded in the actual training data.

For example, an image model may hallucinate a tree in the background of an ocean scene even if no images of trees in the ocean were included in the training data. The risk of hallucination increases as generative models become more powerful and advanced at producing realistic content. False details often go unnoticed, which is problematic when output needs to accurately reflect the real world. A large language model could hallucinate false information or inaccurate quotes, and the effects could be damaging in areas like news reporting or legal reviews in a court case. Earlier, we mentioned transfer learning from existing models.

One approach that involves transfer learning is fine-tuning, a process that extends the capabilities of a pre-trained model with new data or tasks. Rather than starting with a new model, fine-tuning customizes an existing model for a desired use case, making the build process more efficient. Another benefit of fine-tuning is that it incrementally improves model performance on the new dataset while maintaining the general capabilities learned during pre-training.

Overall, knowledge transfer and fine-tuning help create more specialized generative AI tools and services.
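As a rough illustration of what fine-tuning can look like in code, the PyTorch sketch below freezes a stand-in pretrained backbone and trains only a small new task head. Real fine-tuning would load an actual checkpoint and dataset, and might instead update all weights at a low learning rate.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained backbone (a real one would be loaded from a
# checkpoint); it maps 32-dimensional inputs to 64-dimensional features.
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
for param in backbone.parameters():
    param.requires_grad_(False)             # freeze pretrained knowledge

# New task head trained on the new data (here: 3-way classification).
head = nn.Linear(64, 3)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy fine-tuning batch (random stand-in for a real labeled dataset).
inputs = torch.randn(16, 32)
labels = torch.randint(0, 3, (16,))

for step in range(10):                      # a few fine-tuning steps
    features = backbone(inputs)             # reuse frozen pretrained features
    loss = loss_fn(head(features), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```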

Safety and ethics:

Careful thought should be put into how to make models safe, avoid harmful biases, and ensure that ethical AI practices are followed. Techniques like human oversight, information filtering, and parameter adjustments that adhere to best practices all help. Model design should make personal privacy and confidentiality a priority by avoiding details like an individual's home address, phone number, medical history, and other sensitive information.

Evaluation:

Rigorous testing on diverse datasets is needed to understand a model's capabilities and limitations. Both automatic metrics and human evaluation should be used. Users of generative AI applications should also be able to provide feedback for model operators to review and act on accordingly.

Deployment:

Real world requirements like latency, compute cost, and model size should be considered when determining optimal designs for a production system.