
SynthID, the watermark to recognize AI-generated content. How the hell does this work with text?

At the end of August 2023 we talked about SynthID, the Google technology that embeds a “digital watermark” in images to help users distinguish real photos from those produced with generative artificial intelligence tools. To identify images created with AI, Google explained that it modifies a handful of pixels in the generated images: these few altered “squares” act as a warning light for recognizing artificially produced content, without degrading image quality.

The progress made by generative AI has made it possible for anyone to easily use these tools to generate images and videos that look real. However, this also carries the risk that users may spread false information, intentionally or unintentionally.

How SynthID works to recognize AI-generated images

Last summer, Google DeepMind engineers announced SynthID, a tool that adds a watermark to images generated by AI. This special watermark, applied by modifying some pixels, makes it possible to establish whether an image was generated by artificial intelligence, even when the metadata has been stripped or the image has been modified.
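
DeepMind has not published the details of the embedding, which relies on a trained neural network. Still, the core idea of an imperceptible, key-derived pixel perturbation that a detector can later correlate against can be sketched in a few lines. This is an illustrative toy, not SynthID itself:

```python
import numpy as np

def embed_watermark(image: np.ndarray, key: int, strength: float = 2.0) -> np.ndarray:
    """Add a faint key-derived +/- pattern to the pixel values.
    Toy illustration only: SynthID's real embedding is a learned model."""
    rng = np.random.default_rng(key)
    pattern = rng.choice([-1.0, 1.0], size=image.shape)
    marked = image.astype(np.float64) + strength * pattern
    return np.clip(marked, 0, 255).astype(np.uint8)

def detect_watermark(image: np.ndarray, key: int, strength: float = 2.0) -> bool:
    """Correlate the image with the key's pattern: the score lands near
    `strength` when the watermark is present and near 0 otherwise."""
    rng = np.random.default_rng(key)
    pattern = rng.choice([-1.0, 1.0], size=image.shape)
    centered = image.astype(np.float64) - image.mean()
    return float((centered * pattern).mean()) > strength / 2
```

Because the pattern is spread across every pixel and keyed to a secret, a crop or a mild re-encode only partially weakens the correlation, which is roughly why such marks survive everyday edits.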

On May 14, 2024, DeepMind engineers revealed that SynthID’s capabilities will be extended to watermarking videos generated by a new AI model called Veo. Since a video is made up of individual frames, SynthID’s watermarking mechanism is similar to the one adopted for static images.

The watermark, Google explains, is embedded in the pixels of each frame that makes up the video, making the identification of AI-generated videos automatic without the information being visible to the human eye. DeepMind adds that all content produced with VideoFX, the new video generation tool that integrates Veo, will include the SynthID watermark.
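
Since the mechanism is applied frame by frame, watermarking a video reduces to looping the image routine over its frames. A minimal sketch, reusing the hypothetical embed_watermark from the image example above:

```python
def watermark_video(frames: list, key: int) -> list:
    """Apply the per-image watermark to every frame of a video.
    `frames` is a list of H x W x 3 numpy arrays (one per frame)."""
    return [embed_watermark(frame, key) for frame in frames]
```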

Digital watermarking also works with text, according to Google DeepMind. How is it possible?

In describing the progress made with SynthID, DeepMind assures that it can now also insert a sort of watermark into AI-generated text. In other words, with a simple analysis and verification of any written text, it would be possible to establish with certainty whether it was written by a real person or produced by a generative model. But is that really the case?

As for text, SynthID works by manipulating the distribution of tokens. Tokens are the basic elements a language model uses to produce text, such as single characters, parts of words, or whole words. We explained this in our article on how Large Language Models (LLMs) work.
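
As a concrete illustration (a toy example; real tokenizers learn their vocabularies from data), a sentence might be split into sub-word tokens and mapped to integer IDs like this:

```python
# Toy illustration of sub-word tokenization; real LLM vocabularies
# are learned, not hand-written.
sentence = "SynthID watermarks text"
tokens = ["Synth", "ID", " water", "marks", " text"]
vocab = {tok: i for i, tok in enumerate(tokens)}
print([vocab[t] for t in tokens])  # [0, 1, 2, 3, 4]
```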

SynthID subtly modifies the probability with which certain tokens are selected. This behavior allows a watermark to be inserted into the text generated by the LLM.
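
Google has not detailed the exact algorithm in this announcement. The sketch below therefore uses the well-known “green list” logit-bias technique from the academic watermarking literature, not SynthID’s actual method, to show concretely what biasing token selection can look like; all function names and parameters here are illustrative:

```python
import hashlib

import numpy as np

def green_mask(prev_token_id: int, vocab_size: int, key: str, fraction: float = 0.5) -> np.ndarray:
    """Derive a keyed pseudo-random 'green list' from the previous token."""
    seed = int(hashlib.sha256(f"{key}:{prev_token_id}".encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.random(vocab_size) < fraction  # True marks a "green" token

def watermarked_sample(logits: np.ndarray, prev_token_id: int, key: str, bias: float = 2.0) -> int:
    """Sample the next token after nudging green tokens up by a small logit bias."""
    biased = logits + bias * green_mask(prev_token_id, logits.size, key)
    probs = np.exp(biased - biased.max())
    probs /= probs.sum()
    return int(np.random.default_rng().choice(logits.size, p=probs))
```

The text still reads naturally because the bias is small: likely tokens stay likely, but over many tokens the generator picks “green” ones slightly more often than chance would allow.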

Obviously, while detection accuracy for images and videos is very high and essentially error-free, when it comes to watermarking text you have to be a little more cautious. DeepMind claims that editing a long text usually does not remove the watermark completely, because the signal tends to remain recognizable in some of the tokens. However, it is clear that a heavy rewrite of the AI-produced text, or a translation from one language to another, can defeat the watermarking.
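
Detection in such a scheme is a statistical test: count how many tokens fall into their keyed “green lists” and measure how far that count deviates from the roughly 50% expected of unwatermarked text. A heavy rewrite or a translation regenerates most tokens and drags the count back toward chance, which is exactly how the signal gets lost. A minimal sketch, reusing green_mask from the example above:

```python
def detect_text_watermark(token_ids: list, vocab_size: int, key: str, z_threshold: float = 4.0) -> bool:
    """z-test on the fraction of 'green' tokens; watermarked text scores high."""
    hits = sum(
        green_mask(prev, vocab_size, key)[cur]
        for prev, cur in zip(token_ids, token_ids[1:])
    )
    n = len(token_ids) - 1                      # number of scored tokens
    expected, std = 0.5 * n, (0.25 * n) ** 0.5  # binomial(n, 0.5) under no watermark
    z = (hits - expected) / std
    return z > z_threshold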

Honestly, we are inclined to say that the automated recognition of AI-generated text is a daunting challenge. Far better to concentrate on verifying sources, on in-depth analysis of the text, and on the critical spirit that only a human can develop, rather than relying on an algorithm that, in the end, judges on a statistical basis whether or not a text was created by an AI.
