PDF: Extract text to a Field

Use this action to extract text from a PDF file and save the text to a field.

To get the same functionality with Word documents, first convert the .docx file to a PDF with the PDF: Create PDF Document action, then apply this action.

✅   Heads-up: This action cannot extract text that was typed into the PDF file itself, such as entries in a user-fillable form.

Use case

This action extracts all text from a PDF file and stores the results in a field. The stored text field is then available for parsing with our other Text actions.
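As an illustration of the kind of parsing the extracted text supports, here is a minimal Python sketch that runs outside the platform. It assumes the output field holds plain text with one "Label: value" pair per line. The sample text and the parse_labeled_values helper are hypothetical and are not part of this action or the Text actions.

```python
import re

# Hypothetical sample of what the output field might contain after extraction.
extracted_text = """Invoice Number: INV-2041
Date: 2023-04-18
Total Due: $1,250.00
"""


def parse_labeled_values(text: str) -> dict:
    """Collect 'Label: value' pairs, one per line, into a dictionary."""
    pairs = {}
    for line in text.splitlines():
        # Split on the first colon; trim whitespace around label and value.
        match = re.match(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$", line)
        if match:
            pairs[match.group(1)] = match.group(2)
    return pairs


fields = parse_labeled_values(extracted_text)
print(fields["Invoice Number"])
```

Lines without a colon are skipped, so free-form paragraphs in the extracted text do not produce entries. Real PDFs often extract with irregular line breaks, so a pattern like this usually needs adjusting to the document at hand.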

How to configure this action

This action only works with non-scanned PDFs. For similar functionality with PDFs generated through scanning, use the Images: Optical Character Recognition action.

Fields for this action

  • PDF File

    • Enter the PDF file you want to work with. Use a field reference to a PDF file uploaded in a prior task or instance.
  • Output Field Name

    • The name of the field where the result is saved. The field will contain the text extracted from the PDF.

What will this output?

This action outputs one text field. The field is named with the value entered for Output Field Name in the action configuration.

Output fields for this action

  • Output Field Name

    • The field will store the text content of the PDF file.

Why is text missing from a PDF form I processed?

This action is unable to extract text entered into a PDF file itself, such as a user-fillable form. This is a limitation of the PDF file type and OCR technology.

If you can export the PDF as another file type, the filled-in form values are typically “flattened” into the exported file. For example, export the PDF as a PNG, then run it through the Images: Optical Character Recognition action.

