Web LLM attacks

What is a large language model?

Large Language Models (LLMs) are AI algorithms that can process user inputs and create plausible responses by predicting sequences of words. They are trained on huge semi-public data sets, using machine learning to analyze how the component parts of language fit together.

LLMs usually present a chat interface to accept user input, known as a prompt. The input allowed is controlled in part by input validation rules.

LLMs can have a wide range of use cases in modern websites

Customer service, such as a virtual assistant.
Translation.
SEO improvement.
Analysis of user-generated content, for example to track the tone of on-page comments.

LLM attacks and prompt injection

Many web LLM attacks rely on a technique known as prompt injection. This is where an attacker uses crafted prompts to manipulate an LLM's output. Prompt injection can result in the AI taking actions that fall outside of its intended purpose, such as making incorrect calls to sensitive APIs or returning content that does not correspond to its guidelines.

Detecting LLM vulnerabilities

Our recommended methodology for detecting LLM vulnerabilities is:

Identify the LLM's inputs, including both direct (such as a prompt) and indirect (such as training data) inputs.
Work out what data and APIs the LLM has access to.
Probe this new attack surface for vulnerabilities.

Last updated