Extract text from a PDF or image with Google Docs

Google Docs keeps adding new features to its online application. Alongside the ability to upload files of any type to Google Drive, it now offers a particularly interesting one: an OCR function that extracts text from a PDF or an image.
It is easy to see how useful this can be, both at work and in everyday life: we can capture text from a scanned image or from a protected PDF document.
For example, you could, in theory, scan a book, save it on your computer, extract the text, and edit it; or you could take existing documentation, adapt it, and reuse it (be careful, however, when copying and pasting from documents protected by copyright).
Let's see how to extract text from a PDF or image with Google Docs; for completeness, we will also show some valid alternatives for extracting text via OCR.

How to extract text with Google Docs

To activate the OCR function in Docs, open the Google Drive page, click the gear icon at the top right and then Settings; in the window that opens, check the item Convert uploaded files to Google Docs editor format.

At this point, just upload a PDF or an image containing text to Google Drive, then right-click the uploaded file and choose Open with – Google Docs. The PDF or image will not open in its original format; instead, we get a text document that can be edited directly with the Google Docs tools. The text can then be saved to the computer as a PDF, as a Word file, as TXT, as RTF, or in a format compatible with LibreOffice (ODT).
Clearly, if you upload a PDF and extract the text, you will lose the paragraph formatting, even though font settings, italics, and bold should be preserved (much depends on the quality of the images that make up the original PDF). It remains a quick and easy way to bring paper books to your computer without retyping them from scratch.
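
For those who would rather script these steps than click through them, the same upload, convert, and export flow can be sketched in Python. This is only a sketch under stated assumptions, not something the steps above require: it assumes the Google Drive API v3 and the third-party google-api-python-client package, and that `service` is an already authorized Drive client. The helper names `build_upload_request` and `ocr_to_text` are hypothetical and chosen for illustration.

```python
from pathlib import Path

# MIME type that asks Drive to convert an upload into a Google Docs document.
GOOGLE_DOC_MIME = "application/vnd.google-apps.document"

# Export formats mentioned in the text, mapped to their MIME types.
EXPORT_MIME_TYPES = {
    "pdf": "application/pdf",
    "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "txt": "text/plain",
    "rtf": "application/rtf",
    "odt": "application/vnd.oasis.opendocument.text",
}

# Source files this sketch accepts (an assumption; Drive may accept others).
SOURCE_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
}


def build_upload_request(path, ocr_language=None):
    """Build the arguments for a converting upload (hypothetical helper)."""
    source = Path(path)
    suffix = source.suffix.lower()
    if suffix not in SOURCE_MIME_TYPES:
        raise ValueError(f"unsupported source file type: {suffix!r}")
    request = {
        # Requesting the Google Docs MIME type triggers the conversion.
        "body": {"name": source.stem, "mimeType": GOOGLE_DOC_MIME},
        "source_mime": SOURCE_MIME_TYPES[suffix],
        "fields": "id",
    }
    if ocr_language:
        # Optional language hint for OCR, e.g. "it" for Italian.
        request["ocrLanguage"] = ocr_language
    return request


def ocr_to_text(service, path, export_format="txt", ocr_language=None):
    """Upload a scan, let Drive convert it, then export the result.

    `service` is assumed to be an authorized Drive v3 client created with
    google-api-python-client; building it is outside this sketch.
    """
    from googleapiclient.http import MediaFileUpload  # third-party package

    request = build_upload_request(path, ocr_language)
    media = MediaFileUpload(str(path), mimetype=request.pop("source_mime"))
    created = service.files().create(media_body=media, **request).execute()
    return service.files().export(
        fileId=created["id"], mimeType=EXPORT_MIME_TYPES[export_format]
    ).execute()
```

With an authorized client, a call such as `ocr_to_text(service, "scan.pdf", "txt", ocr_language="it")` would follow the same path as the manual steps: upload, conversion to a Google Docs document, then export in the chosen format.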

How to extract text on Windows 10

If the OCR in Google Docs has not fully convinced us, we can use the PDF24 tool instead, available for free for any version of Windows.

After installing the app, start it, click Recognize text and, in the next window, click Add files and then Start. The program will automatically begin recognizing the text in the PDF's images; when it finishes, click Save file to create a new PDF with the text extracted from the images (much more readable and accurate).

Alternatively, still on Windows, we can use FreeOCR, one of the best free tools.

Once the program is open, click Open PDF and choose the PDF to load, then click OCR at the top to extract the text it contains. At the end of the process, we choose whether to save the recovered text in a new PDF file (recommended) or in any other supported text format.

To try out other alternatives for Windows, we recommend reading our guide to OCR programs for converting scanned images, faxes, and PDFs.

How to extract text from a PDF on Mac

If we are looking for something similar to the programs above on Mac, we can try OCRKit, available as a free 14-day trial.

Once this small tool is open, just load the PDF containing the images and start the conversion: within a few minutes it reads all the images and generates a file with all the extracted text, ready to be copied, edited, or shared.

How to extract text from a PDF online

If we cannot install programs on our company PC, or we work on a PC with a limited user account, we can still extract text from a PDF made up of images or scans using the online service onlineocr.net.

Once the site is open, click the Select file button, load the PDF file with the text to extract, select ITALIAN and Microsoft Word (Docx) from the next drop-down menus and, finally, click Convert.
The PDF will be read and converted into an easily editable Word document, which the browser downloads like any other file, ready to be edited with Word or with LibreOffice Writer (the free alternative available to everyone).

If the site above does not convince us and we want to try another one, we can get free OCR for PDFs by going to the Convertio site, which has a section dedicated to recognizing characters in scans or images.

To use the site, click the Choose files button, load the PDF to be scanned, check that all the options match our needs, then click Recognize at the bottom. The site will immediately read all the images and generate an editable Word file, ready to use.
In the free version, we can convert only 10 pages; if we need more, we will have to register first by clicking Sign in at the top right.

Conclusions

As we have seen, there are many ways to extract text from a PDF or an image: we started with Google Docs (the simplest and most immediate tool) and then looked at other tools and programs useful for the purpose.

Still on the subject of PDFs, we can learn how to modify this type of file by reading our guides How to edit PDF files and Top 10 PDF Editing Programs.
If, on the other hand, we are looking for a way to edit and fill out PDFs on our phone, we recommend reading our article How to edit and fill out PDFs from Android and iPhone.
