Algorithm
A sequence of rules that a computer uses to complete a task or solve a mathematical problem.
Learn more: Data science and AI glossary | The Alan Turing Institute
Author
The OED defines an author as: an inventor, founder, or constructor (of something); a creator (cf. sense II.4c). The identity of the author is important for intellectual property and copyright law. AI-generated work raises a number of complex issues in this area.
Learn more: Authorship in artificial intelligence-generated works
Bias
An underlying preference that leads to a distortion of facts. Bias in AI models refers to output errors caused by skewed training data or by algorithms selecting irrelevant or misleading data traits over meaningful patterns. Such bias can cause models to produce inaccurate, offensive, or misleading predictions.
Big data
A wide-ranging field of research that deals with large datasets. The field has grown rapidly over the past couple of decades as computer systems became capable of storing and analysing the vast amounts of data increasingly being collected about our lives and our planet. A key challenge in big data is working out how to generate useful insights from the data without inappropriately compromising the privacy of the people to whom the data relates.
Learn more: Data science and AI glossary | The Alan Turing Institute
Copyright
provides protection for original works, a form of intellectual property, by making it unlawful to copy or alter them without the permission of the copyright holder. Copyright does not need to be officially registered; the act of creating a work is enough for it to be protected.
Learn more: Copyright Statement | Coventry University
Data
IBM define data as a collection of facts, numbers, words, observations or other useful information. Data is used to create predictive models and map out patterns to support decision making.
Hallucinations
is the term used to describe cases where large language models generate factually inaccurate or illogical answers as a result of their training data and architecture.
Large language model, or LLM
An AI model, trained on massive amounts of text data, that uses computer algorithms to analyse or synthesise human speech and text. The algorithms look for linguistic patterns in how sentences and paragraphs are constructed, and in how the words, context and structure work together to create meaning. This allows the model to predict the next most probable word and to generate novel content in human-like language.
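The idea of predicting the next most probable word can be illustrated with a far simpler model than an LLM. The sketch below is not how a large language model works internally; it only counts which word follows which in a tiny, invented training text (`training_text` and both function names are hypothetical) and returns the most frequent follower.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follower_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follower_counts[current_word][next_word] += 1
    return follower_counts


def most_probable_next_word(model, word):
    """Return the most frequent follower of `word`, or None if none was seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy training text, invented for this illustration.
training_text = "the cat sat on the mat and the cat slept"
model = train_bigram_model(training_text)
print(most_probable_next_word(model, "the"))  # prints "cat"
```

In this toy text "the" is followed by "cat" twice and "mat" once, so "cat" is chosen. An LLM makes the same kind of choice, but from patterns learned across far more text and much longer contexts.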
Machine learning
is a branch of artificial intelligence in which algorithms are used to teach a machine to perform a task by learning from data. There are different types of machine learning, including:
Deep learning: A subfield of machine learning that uses artificial neural networks with multiple layers to recognise complex patterns in pictures, sound and text. The process is inspired by the human brain.
Diffusion: A method of machine learning that takes an existing piece of data, like a photo, and adds random noise. A diffusion model trains its network to remove that noise and recover the original photo.
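The noise-adding half of diffusion can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a full diffusion model: the "photo" is a hypothetical row of five pixel values, the function name `add_noise` is invented here, and the square-root weighting is one common way of blending data with Gaussian noise. Training a network to reverse the process is not shown.

```python
import math
import random


def add_noise(values, noise_level, rng):
    """Blend the data with Gaussian noise.

    noise_level = 0.0 returns the data unchanged;
    noise_level = 1.0 returns pure noise.
    """
    keep = math.sqrt(1.0 - noise_level)  # how much of the data survives
    mix = math.sqrt(noise_level)         # how much noise is mixed in
    return [keep * v + mix * rng.gauss(0.0, 1.0) for v in values]


rng = random.Random(0)               # fixed seed so runs are repeatable
pixels = [0.1, 0.5, 0.9, 0.5, 0.1]   # hypothetical one-row "photo"
for noise_level in (0.0, 0.5, 1.0):
    noisy = add_noise(pixels, noise_level, rng)
    print(noise_level, [round(v, 2) for v in noisy])
```

As the noise level rises, the original pattern becomes harder to see. A diffusion model is trained on pairs like these so that it learns to work backwards from the noisy version.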
Natural language processing
A branch of AI that uses machine learning and deep learning to give computers the ability to understand, analyse, manipulate, and potentially generate human language, often using learning algorithms, statistical models and linguistic rules.
Neural networks
A mathematical system that creates predictions by finding statistical patterns in data. It consists of layers of artificial neurons: the first layer receives the input data and the last layer outputs the results, while the arrangement of the middle layers determines what type of network it is.
For examples see The Neural Network Zoo - The Asimov Institute
Some of the most common ones are:
Convolutional neural network (CNN or ConvNet): A network architecture for deep learning that learns directly from data. CNNs are particularly useful for finding patterns in images to recognise objects, classes, and categories. They can also be quite effective for classifying audio, time-series, and signal data.
Generative adversarial networks, or GANs: A generative AI model that uses two neural networks, a generator and a discriminator, to generate new data. The generator creates new content, and the discriminator checks the output against the original. This can be used to create realistic ‘deepfake’ images, which are difficult to distinguish from the data the model was trained on.
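The layered structure described under Neural networks (input layer, middle layers, output layer) can be sketched as follows. This is a minimal illustration, not a CNN or a GAN: the weights are hypothetical, hand-picked numbers rather than learned ones, and the sigmoid activation is an assumption chosen for simplicity.

```python
import math


def sigmoid(x):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs, weights, biases):
    """One layer: each neuron takes a weighted sum of its inputs, then sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


def network_forward(inputs, layers):
    """Pass the input data through each layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations


# Hypothetical, hand-picked weights; a real network learns these from data.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),  # middle layer: 2 inputs -> 2 neurons
    ([[1.0, -1.0]], [0.0]),                    # last layer: 2 neurons -> 1 output
]
print(network_forward([1.0, 0.0], layers))
```

Training a network means adjusting the weights and biases until the outputs match known examples; that step is left out here.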
Open Source Software
is software with source code that anyone can inspect, modify, and enhance.
Optimisation
in the context of mathematics, engineering, and various fields including machine learning and AI, refers to the process of finding the best solution or the most favourable outcome among a set of possible choices or configurations. The goal of optimisation is to either maximise or minimise a particular objective or criterion while adhering to a set of constraints or limitations.
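A small worked example may help. The sketch below minimises a hypothetical objective, (x - 3) squared, first with very loose limits on x and then under the constraint that x must stay between 0 and 2. It uses gradient descent with clipping, which is one of many optimisation methods; the function names, step size and number of steps are assumptions made for this illustration.

```python
def objective(x):
    """Hypothetical objective to minimise: smallest when x is 3."""
    return (x - 3.0) ** 2


def gradient(x):
    """Slope of the objective at x."""
    return 2.0 * (x - 3.0)


def minimise(start, lower, upper, step_size=0.1, steps=200):
    """Gradient descent that keeps x within the limits [lower, upper]."""
    x = start
    for _ in range(steps):
        x = x - step_size * gradient(x)  # move downhill
        x = min(max(x, lower), upper)    # clip x back inside the constraint
    return x


loosely_limited = minimise(0.0, -100.0, 100.0)
constrained = minimise(0.0, 0.0, 2.0)
print(round(loosely_limited, 6), objective(loosely_limited))
print(constrained, objective(constrained))
```

With loose limits the search settles at x close to 3, where the objective is lowest. With the constraint it stops at x = 2, the best value that is still allowed.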
Output
The ICO define AI outputs as the result generated by an artificial intelligence system after processing input data. For example:
Predictions: Forecasting weather conditions or predicting stock market trends.
Recommendations: Suggesting products or content based on user preferences.
Decisions: Automated decisions in applications like loan approvals or medical diagnoses.
Prompts
The set of instructions given to an AI system that defines the task. Prompts that include specific details, clear boundaries and expectations generally produce better results.
Examples from Microsoft: Create effective prompts
Semi-supervised
A training approach that uses a small amount of labelled data combined with a larger amount of unlabelled data to build models.
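One way to combine the two kinds of data is self-training: fit a model on the labelled examples, use it to guess labels for the unlabelled ones, then refit on everything. The sketch below is an illustration only. Self-training is just one semi-supervised approach, the nearest-average classifier is a deliberately simple stand-in for a real model, and all data values and names are hypothetical.

```python
def fit_centroids(examples):
    """Return the average value for each label."""
    grouped = {}
    for value, label in examples:
        grouped.setdefault(label, []).append(value)
    return {label: sum(vs) / len(vs) for label, vs in grouped.items()}


def predict(centroids, value):
    """Pick the label whose average is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def self_train(labelled, unlabelled, rounds=3):
    """Fit on labelled data, then repeatedly pseudo-label and refit."""
    centroids = fit_centroids(labelled)
    for _ in range(rounds):
        pseudo_labelled = [(v, predict(centroids, v)) for v in unlabelled]
        centroids = fit_centroids(labelled + pseudo_labelled)
    return centroids


# Hypothetical data: two labelled examples and eight unlabelled ones.
labelled = [(1.0, "low"), (9.0, "high")]
unlabelled = [0.5, 1.5, 2.0, 3.5, 7.0, 8.0, 8.5, 9.5]
centroids = self_train(labelled, unlabelled)
print(centroids)
print(predict(centroids, 4.0))  # prints "low"
```

The unlabelled values shift the two averages away from the original labelled points, so the final model reflects the shape of all the data rather than only the two labelled examples.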
Training data
The datasets used to help AI models learn, including text, images, code or data. The quality of this data affects the reliability of the model's responses.