Web scraping: when it’s legal and when it’s not. The Guarantor explains it

The Italian Privacy Guarantor has just published the text of a decision taken against a person based in our country who used dedicated crawlers (or spiders) to scan the contents of web pages for telephone numbers and related contact details. The web scraping activity was set up to create and publish online a telephone directory that was as complete and up to date as possible.

As the Guarantor Authority for the protection of personal data explains, the legislation currently in force does not allow the creation of generic telephone directories that are not extracted from the DBU, the single database that contains the numbers and identification data of the customers of all national operators, on fixed and mobile networks.

What is meant by web scraping

The expression web scraping refers to the process of automatically extracting information from web pages using dedicated software. It makes it possible to retrieve structured or unstructured data from websites without manual intervention.

This type of activity can be carried out for various reasons: collecting public data for later analysis, research, price tracking, content aggregation, competitor monitoring, and more. For example, a huge volume of text has been fed to generative models in order to train them to answer user questions.

The downloaded web pages are usually then analyzed automatically to extract the data of interest. This is done using specialized libraries, HTML parsing tools, regular expressions and other data mining techniques.
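As a rough illustration of the parsing step, here is a minimal sketch that combines an HTML parser with a regular expression, using only the Python standard library. The sample markup, the class names `name` and `price`, and the helper `extract_prices` are assumptions made up for this example; they do not come from any real site. The sketch deliberately extracts product prices (the price tracking case mentioned above), not personal data.

```python
from html.parser import HTMLParser
import re

# Hypothetical page fragment; a real scraper would first download it over HTTP.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Keyboard</span> <span class="price">EUR 49.90</span></li>
  <li class="product"><span class="name">Mouse</span> <span class="price">EUR 19.50</span></li>
</ul>
"""

# Matches a number with an optional one- or two-digit decimal part.
PRICE_RE = re.compile(r"(\d+(?:\.\d{1,2})?)")


class PriceParser(HTMLParser):
    """Collects one {name, price} record per <li> element."""

    def __init__(self):
        super().__init__()
        self._field = None      # which <span> we are currently inside, if any
        self._current = {}      # record being built for the current <li>
        self.products = []

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in ("name", "price"):
            self._field = css_class

    def handle_data(self, data):
        if self._field == "name":
            self._current["name"] = data.strip()
        elif self._field == "price":
            match = PRICE_RE.search(data)
            if match:
                self._current["price"] = float(match.group(1))

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._current:
            self.products.append(self._current)
            self._current = {}


def extract_prices(html):
    """Return the list of product records found in the given HTML text."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.products


if __name__ == "__main__":
    for product in extract_prices(SAMPLE_HTML):
        print(product["name"], product["price"])
```

The HTML parser locates the elements of interest, while the regular expression isolates the numeric value inside the text. Real pages are rarely this regular, so the selectors and the pattern would need to be adapted case by case.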

Many websites take steps to prevent or hinder web scraping, for example by implementing CAPTCHAs, blocking the IP addresses used by unrecognized crawlers, or using the robots.txt file to indicate which pages or content should not be extracted.
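A crawler that wants to honor these indications can check robots.txt before requesting a page. The following is a minimal sketch based on `urllib.robotparser` from the Python standard library; the rules, the `example-bot` user agent and the `example.com` URLs are invented for illustration, and a real crawler would load the file from the target site instead of an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser instance."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(rules, user_agent, url):
    """Return True if the rules let this user agent fetch the URL."""
    return rules.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_rules(SAMPLE_ROBOTS_TXT)
    for url in (
        "https://example.com/public/page.html",
        "https://example.com/private/data.html",
    ):
        print(url, is_allowed(rules, "example-bot", url))
```

Note that robots.txt only expresses the site operator's wishes about crawling: it is a technical convention, not an authorization to collect whatever the allowed pages contain.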

Web scraping activities are illegal if they involve personal data

In general, web scraping is not an illegal activity in itself, provided that the individual website operators do not prohibit it and that no personal data is extracted.

The terms of service of many websites state whether web scraping is allowed or forbidden. Some sites explicitly prohibit it, while others require permission or compliance with certain restrictions. Furthermore, this type of activity could violate intellectual property rights if protected material such as texts, images, videos or other original content is extracted outside normal browsing activity, with the precise aim of reusing it for other purposes.

In its decision, the Guarantor notes that it has already ruled several times in the past “regarding the illegality of using data collected through Web scraping for purposes incompatible with the initial ones”. This means that third parties are not entitled to appropriate and process personal data taken from information that was legitimately published on the Web after consent had been collected from the data subjects.

The precedents in Italy and in Europe

Indeed, in 2022 the Italian Guarantor had already sanctioned Clearview AI for using web scraping to build a vast database from billions of photos posted online. In that case, the collected material was used to improve the company's facial recognition system. There is also the 2016 decision, which likewise concerned the compilation of telephone directories from data collected on the Web through automated scans.

However, responsibility does not lie only with those who carry out web scraping without having the right to do so. In the background remains the problem of the adequate safeguards that the data controller must provide. The Irish Privacy Guarantor has in the past imposed an administrative fine on Facebook for failing to adequately protect users' personal data. At the time of the events, the social network allegedly did not effectively hinder the compilation of a telephone directory containing the mobile numbers of its subscribers. The result was a sort of White Pages of mobile numbers from Italy and other countries which, unfortunately, is still circulating online today, especially on the dark web.
