Programming

Researchers train chatbots to “hack” other AI systems

Researchers train chatbots to

pixabay.com

If it is true that practically all chatbot IA in circulation are equipped with systems to prevent their abuse, through filters or various limitations, this does not seem to stop users’ attempts to obtain answers not in line with the platforms’ policies.

In the first few months, with complex prompts, it was possible to easily gain access to potentially dangerous information. Today this is no longer the case but, through the creation of an AI system, a mechanism has been created that allows a chatbot to hack some “colleagues”.

The researchers of Nanyang Technological University (NTU) of Singapore, have examined the ethics of various large language models (LLM) in circulation, having finally found a way to train chatbots to bypass the defense mechanisms of their peers, with a real jailbreak.

Experts have described how, in this process, the first step is constituted by understanding of defense systems of the specific chatbot. Once the defensive logic is discovered, it is then possible to form another similar system to circumvent the limitations of the first.

Masterkey is a sort of “passepartout” to avoid any filter of AI platforms

The method, created by the professor Liu Yang and by his students, he was baptized Masterkey. It is a platform designed precisely as a “passepartout” to undermine any type of chatbot. Even if an LLM is patched to narrow the filters, Masterkey seems to be able to adapt and find ways to become effective again.

All things considered, the Masterkey technique isn’t that complex. The system takes advantage of the addition of extra spaces between words to get around the possibility black list. In other cases, the “victim” chatbot is asked to respond as if he had no moral constraints.

With tailor-made prompts, therefore, the tool constantly finds a way to obtain the desired output, in defiance of any barrier previously created to limit AI.

Leave a Reply

Your email address will not be published. Required fields are marked *