Red Teaming: How to challenge and secure LLMs

LLMOps

LLM Governance

Welcome to AI Watch episode 19!

This episode discusses the concept of red teaming in the context of LLMs. Hear 2021.AI’s lead product data scientist, Ahmed Zewain, explain the definition, goals, and methods of red teaming, emphasizing the importance of continuous testing and improvement for the safe and ethical use of LLMs.

Learn More

LISTEN ON SPOTIFY

Transcript

Ditte: Hi, and welcome to this episode of AI Watch. My name is Ditte, and with me today I have Ahmed. Ahmed is our product lead data scientist here at 2021.AI, and he's responsible for our great LLM solutions. Welcome, Ahmed. Ahmed: Thank you. Ditte: Today we're going to talk about red teaming and how to challenge and secure your LLMs as AI Watch is this brief piece we have. We also have a deep dive in a webinar in September that you're very welcome to join if you want to hear even more about this subject. LLMs, of course, are a big thing these days. We hear a lot about them. We hear about all of the great successes and opportunities they have, but we also hear about some of the challenges. And red teaming is about facing those challenges and fixing it, right? Ahmed: Yes. Ditte: So in your definition, Ahmed, what is red teaming? Ahmed: It's actually a military word, and it comes from the concept of trying to simulate an attack by the red team and you are the blue team, and then you try to simulate what could go wrong and how would I react. So you try to figure out what vulnerabilities you have and how you can mitigate those. So that concept has been used in IT and security generally, and now it is finding its way to LLM and LLM applications. So in the context of LLMs, you simulate and you try to figure out what can go wrong, and you do that by the method of red teaming. Ditte: Okay, that's really cool. So red teaming right now, is it a big topic for data scientists? This is something that a lot of different people can do, or where are we right now? Ahmed: Well, I mean, it is becoming a very important topic because people are realizing the nature of these large language models and their applications. And so this way of thinking is finding its way there. And so I think we are in this place where we're establishing standards and how to do it. But the thing is, the hardest part by far is actually defining those test cases and trying to figure out what are my vulnerabilities in different scenarios. And that is still a very difficult task to do because you cannot automate this. You cannot build a tool for that, just like that. You need experts, you need to actually map out your specific use case. What could go wrong? It's very, very difficult from use case to use case, even though technically use cases might be similar, but the way they are used is very different. So therefore the risk is different. Vulnerabilities are different. Ditte: So what is the goal of red teaming when it comes to large language models, like ChatGPT? Ahmed: So the goal is to save yourself from the potential scenario where you might be, you know, harming someone directly or sometimes also embarrassing an organization. You might be saying stuff or giving guidance to people that is wrong, that is not in line with your code of conduct and your philosophy. The idea is to figure out where it can go wrong and to identify those weaknesses and biases and ethical issues. Sometimes that's, I would say some of those are trivial, right? If you have an application in healthcare, then you might say an initial thing would be that it must never talk about self harm. It will never guide people towards those things. And then you try to find your way through some more non-trivial cases, and you try to figure out if your application is giving some kind of a dosage or some number. You want to make sure that number is correct all the time. Yeah, it depends on use case to use case. Ditte: Do you have any real examples where an LLM has been completely out of line, something that we must avoid? What could that be? Ahmed: So if you bypass the red teaming process and do not test and find the vulnerabilities in your system, you might end up in a situation where you have, let's say, an LLM solution that is customer facing, and it is, you know, helping customers navigate around your product. So let's say a customer tricks it into, can I get this product for $1? Maybe you're selling a car or something else, and then it says, yes, sure, you can have this, you can buy this car for $1, and it is a legally binding contract, right? So that's an embarrassing moment, but it could happen. Ditte: Something that we would never want to happen. Ahmed: Yes, because it turns out that there have been real world cases out there where the answers of such an application was harming people and was actually legally binding. So you want to make sure that that doesn't happen. Ditte: That's where people become a little bit worried about using these LLMs, right? Especially in facing customers. Ahmed: Yes. Ditte: So this is the tool to avoid it? We can fix it, right? Ahmed: Definitely, yes. It's a way of thinking. It's a methodology. It is well known. And now we're simply using that methodology in the red team so that we avoid these simple cases, at least. You'll never be able to solve everything. Even big giants like ChatGPT are still fixing those biases, are still addressing issues about, you know, but these are corner cases that we are jumping into. And maybe later we're going to talk about continuous testing, while that is important as well. Ditte: Yeah, we'll get back to that. First, let's dive into this. How do you actually do this? How do experts check if an LLM, such as ChatGPT 4, is safe and fair to use? Ahmed: So first you're going to study the different scenarios that you might be running into. You need to think about where the solution is going to be, and so that's going to determine how and also which sector you're in. So these are the things you need to think about. Then there are methods for starting to attack this problem. You do some adversarial testing, you pretend to be someone who wants to hack your LLM. So you're thinking about, if I was the attacker, what would I do? And then you start mapping out those things and from there you're gonna have a test and check. We'll do that. So you're actually going to try to break your own system. Ditte: So you really need to be creative here, I could imagine? Ahmed: You need to be very creative and you need to have a very diverse team that can think in different ways. As a technical person, I don't have an overview of what “ethically” is. I know normal things, but there are different corner cases that I might not think about as a developer. So you need different people with different perspectives or also domain knowledge, right? So as a technical person, I know that it should not talk about death, but I, in the case of healthcare, for example, but a nurse or a doctor might say, that's fine, but what about doses? It's giving me something about doses. So is that correct? How wrong can it be? Can it be wrong at all? Or, you know, how should we handle that? And also you think about what ethics mean, right? So what is ethical? So in the beginning, ChatGPT when it was launched. It had some obvious flaws before it was launched and they showed that where it would actually give instructions for, you know, explosives, that's obviously not something you want to go live with. So that was easily removed. But then you have other corner cases where it's like it's a debate. So politically, what's wrong, what's right? So you need to think about what ethics means for your organizations and what is beyond your lines and etcetera. Then you also have simple tests for performance. So my application will need to accommodate a million people, can it do that? You need to know if that's possible. So you simulate a million people trying to go in and ask simple questions. Can the system handle that? Ditte: A little example of like how, where it would be up to a company to also decide, like, what do I want my LLM to do? Something that wouldn't be a dangerous thing would be for the LLM to actually write a song for a party or something like that. But it wouldn't probably wouldn't seem professional. Like, if that's not the use that you want for your LLM. Is that also something that you do in the red teaming process where you kind of set the tone, or is that a separate part of the LLM project? Ahmed: Yes, that's definitely part of it. Because the tone of speaking would be in, this is in line with my company values and also ethics and fairness and social norms, right? It is exactly like having a human being. We are all in one company. There's ways to behave and then, as you know, we all behave in our way, but we are restricted within what the company allows us to do. You cannot have your customer facing LLM speak like a gangster when clearly that's not what we do. Ditte: Or it tries to be funny. I think we've all tried to get an LLM to tell a joke or something like that, and it is just not funny. Ahmed: That's very subjective. I would make a funny LLM. Ditte: Yes, you’d make a funny LLM. Ahmed: But still, is that professional tone, right? When I'm sitting in sales meetings or here in this company, there's a certain tone that I as a human know I need to adhere to. Can I guarantee that the LLM would do that? So one way to do that, red teaming, I set up test cases that would test if that is true and also make sure that that is always true. Just to map up the four areas where you would say experts go in to test the LLM that is adversarial testing, that you call it, scenario analysis, the ethical review, and the stress testing. Ahmed: Yes, there might be more, but if you want to cook it down to something quite simple, understandable, then yes. Ditte: And all of these you will go more in depth within the webinar that we are having, and also there we will have the opportunity to ask questions. So if you really think this is interesting, please join us there. Why is it important that we test continuously and keep improving these large language models and not just do this as an exercise in the beginning? Ahmed: Just like IT systems, you still need to make sure that they are reliable and safe to use. So obviously tests are a part of building IT systems and therefore also AI systems. But think about this, because it has some kind of intelligence or human behavior in the loop, I would draw other analogies. So just like we have, we as employees in a company, we have yearly reviews where part of it is how is your behavior in this company? And so that is also based on that someone has observed how I was. You know that you have different arguments for why you're still behaving. So how will you do that with a digital assistant? You run red teaming continuously to make sure that you are still behaving the exact same way as when the product was launched. And there are multiple factors that can make an LLM drift, right? This is known in AI that models will drift over time. This is no different for LLM systems for other reasons, more other non-trivial reasons, but there are factors that could make an LLM go in a different direction. And so you simply want to make sure that you continuously check, is it still behaving? Ditte: So if you are a data scientist who hasn't worked with this previously and you are responsible for an LLM and you think, I need to get started by this on my own, like, is that even possible? Do you need a team? You know, we thought about the creativity before, needed to actually do the attacks properly. Where would you start if you were a data scientist responsible for an LLM? Ahmed: So I would first, as I mentioned, it is really, really important to mention that this is a framework, a way of thinking. There's no way you can apply a solution on everything and then it's going to work. There are going to be some overlaps, obviously, but you need to study your case. Initially, when you're designing your prompts, there is a way to, you need to align with business, the tone and the boundaries, I would say that is possible to do. But there are a lot of corner cases that you simply cannot think about. You have to include others before releasing the product. So make sure that if it's an LLM for your salespeople, include them in the process. Have them test it and get their feedback and then change, get feedback [again]. Include the user very early on in the process, because there's no way you have any knowledge on how people in their domain would attack this. For each case, map out what can go wrong, and we can, and other experts can, help you do that. Map out how this system should be tested and checked. So that's the first thing you're going to do. So when you have mapped out this, then we need to actually start to operationalize these tests and actually build a system for running the red teaming exercise. Get a sign off from everyone that this is actually okay to move on and then you set a schedule where you want to run this. Remember, this is going to take time, so allocate good time for a couple of periodical tests before you release a product. There's no way you're going to get this live even though the prompting was fast or usually that process is also not fast, but you need to include people very early on. Ditte: So also, if you're pushed by management or someone else who really wants this product out there quickly, it's also an important exercise to sit down and really explain why and how this is done in order for it to be a good solution for a company and not just quickly have an LLM. Ahmed: Well, I mean, put reality in front of them. If you are out to embarrass yourself, by all means, put it up. Ditte: By all means. So we have to round this off. So thank you, Ahmed, and thank you so much for watching. If you want to learn more about this, join the webinar. I'm not going to be there, but Julian, our very good, talented colleague, will [be here] to discuss all the red teaming exercises with you and also provide some information on a specific case that we actually did to really go deep. So register for the webinar above. If you want to watch more AI watch episodes, you can also register here on the page. Thank you!

Red Teaming: How to challenge and secure LLMs

Transcript

More news

The Danish Agency for Climate Data Leads Wide-Scale AI Push

From policy to practice: AI Agent for controls extraction

The Right Fit: Choosing Open-Source LLMs

Red Teaming: How to challenge and secure LLMs

Transcript

More news

The Danish Agency for Climate Data Leads Wide-Scale AI Push

From policy to practice: AI Agent for controls extraction

The Right Fit: Choosing Open-Source LLMs

Get the latest news