Inside the British Lab Addressing A.I. Risks


Located in London’s Parliament Square, the A.I. Security Institute is a pioneer in managing the risks of artificial intelligence. Its staff includes former weapons inspectors, epidemiologists, and code breakers, and it is setting an example for nations confronting the emerging risks of A.I.

In an Edwardian government building, a team of four A.I. experts recently ran an experiment. Their task was to manipulate an A.I. chatbot into revealing how to make anthrax, a dangerous bioweapon. They began by asking the chatbot for a list of ingredients. When it refused, responding, “I’m sorry I can’t help with that,” they turned to a custom algorithm that bombarded the A.I. tool with thousands of automated queries.

Eventually, the A.I. relented, providing an extensive list of materials and instructions for producing the hazardous agent at home. The specific A.I. system involved remains undisclosed for security reasons.

“There are some questions that you definitely don’t want the model to give the answer to,” said Xander Davies, the 25-year-old American who leads the institute’s red team. “We try really hard to get the answers out.”

The red team at the A.I. Security Institute specializes in simulating attacks on A.I. technologies. It recently breached the defenses of OpenAI’s latest ChatGPT, extracting hacking instructions within six hours. When the team identifies vulnerabilities, it shares its findings with the companies concerned.

“They try to fix it, report something back to us,” said Mr. Davies, a computer scientist who chose a position at the institute over a tech career in San Francisco. “They actually strengthen their system with us.”