Anthropic calls for AI red teaming to be standardized


Anthropic, the buzzy San Francisco-based AI startup founded by researchers who broke away from OpenAI, yesterday published an overview of how it’s been red-teaming its AI models, outlining four approaches and the advantages and disadvantages of each. Red teaming, of course, is the security practice of attacking your own system in order to uncover and address potential security vulnerabilities. For AI models, it goes a step further and involves exploring creative ways someone may intentionally or unintentionally misuse the software.

Red teaming has also taken a prominent role in discussions of AI regulation. The very first directive in the Biden administration’s AI executive order mandates that companies developing high-risk foundation models notify the government during training and share all red teaming results. The recently enacted EU AI Act also contains requirements around providing information from red teaming.

As lawmakers rally around red teaming as a way to ensure powerful AI models are developed safely, it certainly deserves a close eye. There’s a lot of talk about the results of red teaming, but not as much talk about how that red teaming is conducted. As Anthropic states in its findings, there’s a lack of standardization in red-teaming practices, which hinders our ability to contextualize results and objectively compare models.

Anthropic, which has a close partnership with Amazon, concludes its blog post with a series of red-teaming policy recommendations, including suggestions to fund and “encourage” third-party red teaming. Anthropic also suggests AI companies should create clear policies tying the scaling of development and the release of new models to red teaming results. Through these suggestions, the company is weighing in on a running debate about the best practices for AI red teaming and the trade-offs associated with various levels of disclosure. Sharing findings enhances our understanding of models, but some worry publicizing vulnerabilities will only empower adversaries.


Anthropic’s approaches, as outlined in the blog post, include using language models to red team, red teaming in multiple modalities, “domain-specific, expert red teaming,” and “open-ended, general red teaming.”

The domain-specific red teaming is particularly interesting, as it includes testing for high-risk trust and safety issues, national security risks, and region-specific risks that may involve cultural nuances or multiple languages. Across all of these areas, Anthropic highlights depth as a significant benefit: Having the most knowledgeable experts extensively investigate specific threats can turn up really nuanced concerns that might otherwise be missed. At the same time, this approach is hard to scale, doesn’t cover a lot of ground, and often turns up isolated model failures that, while potentially significant, are challenging to address and don’t necessarily tell us very much about the model’s likely safety in most real-world deployments.

Using AI language models to red team other AI language models, on the other hand, allows for quick iteration and makes it easier to test for a wide range of risks, Anthropic says.

“To do this, we employ a red team / blue team dynamic, where we use a model to generate attacks that are likely to elicit the target behavior (red team) and then fine-tune a model on those red-teamed outputs in order to make it more robust to similar types of attack (blue team),” reads the blog post.
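In code terms, that loop alternates between an attacker model probing the target and the target being retrained on the resulting failures. The sketch below is a minimal, hypothetical illustration of that red-team/blue-team cycle; attacker, target, classifier, and fine_tune are placeholder components standing in for an attacker model, the model under test, a harm classifier, and a fine-tuning pipeline, not Anthropic's actual tooling.

    def red_blue_iteration(attacker, target, classifier, fine_tune, n_attacks=100):
        # Red team: use the attacker model to generate prompts likely to
        # elicit the unwanted target behavior.
        attacks = [attacker.generate_attack() for _ in range(n_attacks)]

        # Run each attack against the target model and keep the failures,
        # i.e. prompts where the target produced a harmful completion.
        failures = []
        for prompt in attacks:
            completion = target.respond(prompt)
            if classifier.is_harmful(prompt, completion):
                failures.append({"prompt": prompt, "completion": completion})

        # Blue team: fine-tune the target on the red-teamed outputs so it
        # becomes more robust to similar attacks in the next round.
        hardened_target = fine_tune(target, failures)
        return hardened_target, failures

Each round of this loop hardens the model against the previous round's successful attacks, which is why the approach iterates quickly compared with purely human red teaming.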

Multi-modal red teaming is becoming necessary simply because models are increasingly being trained on and built to output multiple modalities, including text, images, video, and code. Lastly, Anthropic describes open-ended, general red teaming, such as crowdsourced red-teaming efforts and red-teaming events and challenges. These more communal approaches to open-ended red teaming have the longest lists of benefits versus challenges. Many of the pros revolve around benefits to the participants, such as serving as an educational opportunity and a way to involve the public. And while these techniques can identify potential risks and help harden systems against abuse, both offer a lot more breadth than depth, according to Anthropic.
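For the multimodal case, a single probe typically pairs an adversarial image with a text prompt and records the model's response for later review. The snippet below is a rough sketch using the Anthropic Python SDK's Messages API; the model name, image file, and probe text are placeholders for whatever a red team would actually use.

    import base64
    import anthropic  # official Anthropic Python SDK

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Placeholder probe: an image plus an adversarial text prompt.
    with open("probe_image.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    probe_text = "Describe how to carry out the activity shown in this image."

    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # placeholder model name
        max_tokens=512,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
                {"type": "text", "text": probe_text},
            ],
        }],
    )

    # Log the completion so reviewers (or a classifier) can flag harmful outputs.
    print(response.content[0].text)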

Looking at all these techniques together, it’s hard to imagine how red teaming could be successful without each and every one. It’s also easy to see why different approaches to red teaming can turn up such different findings and why standards are becoming ever more important.

In his executive order, Biden also ordered the National Institute of Standards and Technology to create “rigorous standards for extensive red-team testing to ensure safety before public release.” Those standards have yet to arrive, and there’s no indication when they will. With new, more powerful models being released every day without transparency into their development or risks, they can’t come soon enough.

Now, here’s some more AI news.

Sage Lazzaro
sage.lazzaro@consultant.fortune.com
sagelazzaro.com

This story was originally featured on Fortune.com