Testing our safety defenses with a new bug bounty program


Today, we’re launching a new bug bounty program to stress-test our latest safety measures. Similar to the program we announced last summer, we’re challenging researchers to find universal jailbreaks in safety classifiers that we haven’t yet deployed publicly. These safeguards are part of the advanced protections we’ve developed to help us meet the AI Safety Level-3 (ASL-3) Deployment Standard under our Responsible Scaling Policy, the framework that governs how we develop and deploy increasingly capable AI models safely.

The bug bounty program, run in partnership with HackerOne, will test an updated version of our Constitutional Classifiers system. Constitutional Classifiers are a technique we built to guard against jailbreaks that could elicit information related to CBRN (chemical, biological, radiological, and nuclear) weapons. The system follows a list of principles that define what type of content should and shouldn’t be allowed when interacting with Claude, and focuses narrowly on specific harms.

Participants will receive early access to test our classifiers on Claude 3.7 Sonnet. We’re offering bounty rewards of up to $25,000 for verified universal jailbreaks found on the unreleased system. A universal jailbreak is a vulnerability that consistently bypasses Claude’s safety measures across many topics. For this initiative, we’re interested in universal jailbreaks that could be exploited to enable misuse on CBRN-related topics.

Our models are becoming increasingly capable, and as we’ve shared before, we believe some future models may require the advanced ASL-3 security and safety protections outlined in our Responsible Scaling Policy. This bug bounty initiative will contribute to the work we’ve done over the last several months to iterate on and stress-test our ASL-3 safeguards.

We’ve kicked off this new bug bounty initiative with participation from the researchers who joined our earlier program last year, and we’re offering the opportunity for new researchers to take part. If you’re an experienced red teamer or have demonstrated expertise in identifying jailbreaks in language models, we encourage you to apply for an invitation through our application form. Detailed instructions and feedback will be shared with selected participants. Applications open today, and the program will run through Sunday, May 18. This initiative is invite-only so we can respond to submissions with timely feedback.

We’re grateful to the security community for its partnership in helping us make AI systems safer.
