Text: Find words that match a pattern

Use this action to search a block of text for a custom pattern and return all matches into a data table.

Use case

  • Parse through blocks of text to find a company ID or SSN.
  • Configure this action with a POS system spreadsheet and look for IDs that match a specific pattern.
  • Use this action as part of a text analysis Workflow that extracts and labels text.

How to configure this action

By creating a custom pattern, you can use this action to find words or phrases that follow a consistent format or structure. For example, Social Security Numbers always follow the format 000-00-0000, and a company ID number may always follow the format A1-00-00, so you can build a pattern to find these values in text.

Taking the SSN example, the SSN is always an 11-character string containing 2 dashes in the same positions: 000-00-0000. You can create a pattern that finds any string with the same structure: any 3 digits, a dash, any 2 digits, a dash, then any 4 digits.

Patterns are built from 3 special characters, #, @, and ?, each of which represents a type of character. For example, the # symbol represents any digit from 0 to 9. These characters are often referred to as wildcards, because each one can stand in for any character of its type.

Character   Represents
#           Any digit (0-9)
@           Any letter (a-z)
?           Any letter, digit, or dash

The SSN structure has two types of characters: digits and dashes. Since the # character represents any digit (0-9), you can use the # and the dash character to create a pattern that searches for the same format: ###-##-####.

Here are example strings and whether they would match the ###-##-#### pattern.

String        Result
A1-292-2992   No match. The pattern character # is looking for digits, but the first character is a letter. The string does not match the pattern.
000-00-0000   Match. The pattern is looking for any 3 digits, a dash, any 2 digits, a dash, then any 4 more digits. The string matches the pattern.
00-0000-000   No match. This string has 2 digits, a dash, 4 digits, a dash, then 3 digits, but the pattern is looking for 3 digits, a dash, 2 digits, a dash, then 4 more digits. The string does not match the pattern.
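The table above can be reproduced with a short Python sketch. It is only an illustration, not how the action is implemented: the function name pattern_to_regex is hypothetical, the translation of each wildcard into a regular expression character class is an assumption, and letters are treated as case-insensitive because the documented examples match uppercase strings.

```python
import re

# Assumed translation of each wildcard into a regex character class.
# Case-insensitive letters are an assumption based on the documented examples.
WILDCARDS = {
    "#": "[0-9]",         # any digit
    "@": "[A-Za-z]",      # any letter
    "?": "[A-Za-z0-9-]",  # any letter, digit, or dash
}


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern into a compiled regular expression."""
    # Wildcards become character classes; every other character is literal.
    parts = [WILDCARDS.get(ch, re.escape(ch)) for ch in pattern]
    return re.compile("".join(parts))


ssn = pattern_to_regex("###-##-####")
for candidate in ["A1-292-2992", "000-00-0000", "00-0000-000"]:
    # fullmatch requires the whole string to fit the pattern.
    verdict = "Match" if ssn.fullmatch(candidate) else "No match"
    print(candidate, verdict)
```

Only 000-00-0000 satisfies the pattern in this sketch, which agrees with the table.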

Fields for this action

  • Text to search

    • A block of text or field reference to search for matching words.
  • Pattern to match 1

    • Enter a set of characters that define the pattern.
    • Character   Represents
      #           Any digit (0-9)
      @           Any letter (a-z)
      ?           Any letter, digit, or dash

      To search for a literal @ or # symbol, put a backslash before it: \@ or \#.

      Using the pattern characters, the following examples illustrate different pattern structures and the strings that would match them.

    • Pattern       Matched strings
      ###-##-####   121-12-1212, 393-29-2929, 344-34-3434
      ##:##:##      12:34:56, 02:21:18, 09:11:23
      @#_@@_@@@     A1_BC_DEF, Z4_XY_LMN, D2_EF_GHI
      ???-###       A8C-111, 123-222, A-B-333
  • Pattern to match 2 (3, 4, etc.)

    • Any additional patterns to look for in the Text to search. Each pattern is matched against the text independently, and its matches are listed separately in the results.
  • Output Field Prefix

    • This action may output more than one field. To help keep output fields organized, choose a prefix to add to the beginning of each output field name.
    • The step’s name is used as the prefix by default.
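As a rough illustration of the Pattern to match fields above, the following Python sketch handles backslash-escaped symbols and runs several patterns over the same text independently. The names compile_pattern and find_words are hypothetical, and two behaviors are assumptions rather than documented facts: a "word" is taken to be any run of non-whitespace characters, and letters are matched case-insensitively.

```python
import re

# Assumed regex equivalents of the three wildcard characters.
CLASSES = {"#": "[0-9]", "@": "[A-Za-z]", "?": "[A-Za-z0-9-]"}


def compile_pattern(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern, honoring backslash escapes such as \\# and \\@."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # The character after a backslash is matched literally.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        out.append(CLASSES.get(ch, re.escape(ch)))
        i += 1
    return re.compile("".join(out))


def find_words(text: str, patterns: list[str]) -> dict[str, list[str]]:
    """Return the matching words for each pattern, keyed by pattern."""
    words = text.split()  # assumption: words are separated by whitespace
    results = {}
    for pattern in patterns:
        regex = compile_pattern(pattern)
        # Each pattern scans the full word list on its own.
        results[pattern] = [w for w in words if regex.fullmatch(w)]
    return results


text = "IDs 121-12-1212 and A8C-111 logged at 09:11:23 under tag #42"
patterns = ["###-##-####", "???-###", "##:##:##", r"\###"]
results = find_words(text, patterns)
for pattern, matches in results.items():
    print(pattern, matches)
```

In the last pattern, the escaped \# matches a literal # symbol and the two characters that follow it are digit wildcards, so it finds #42 in the sample text.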

What will this output?

This action outputs the matched results in a data table. If you use multiple patterns, the results are ordered by pattern, then by the position of the matched text in the text body. The table lists matches in the order they were found, so the first row of the results is the first match found.

This action may generate multiple fields. To help keep output fields organized, the prefix above is added to the beginning of each output field name, separated by two dashes. Each field is named in the form {{output-field-prefix--output-field}}.

Output fields for this action

  • Data table

    • This field provides the ID for a Data Table where the results are stored. The table will have 3 columns:

    • Column          Description
      Matching Text   A string that matched the pattern.
      Row             The row number of the match.
      Position        The position of the matching text in the text body, in #,# syntax, where the first value is the row number and the second is the character number within the row.
    • Each row of the table is another instance of matched text.
  • First match

    • The first match found for the first pattern.
  • Number of Matches Found

    • The number of matches found across all patterns.
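The shape of these outputs can be sketched in Python for a single pattern. This is an illustration under stated assumptions, not the action's actual behavior: the function name match_table is hypothetical, a "row" is assumed to be a line of the text, row and character numbers are assumed to start at 1, and the regular expression is a hand-written equivalent of the ###-##-#### pattern.

```python
import re

# Hand-written regex equivalent of the ###-##-#### pattern (an assumption).
SSN = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")


def match_table(text: str) -> list[dict]:
    """Build rows with the three documented columns for every matching word."""
    rows = []
    # Assumption: each line of the text is one row, numbered from 1.
    for row_number, line in enumerate(text.splitlines(), start=1):
        for word in re.finditer(r"\S+", line):
            if SSN.fullmatch(word.group()):
                char_number = word.start() + 1  # assumed 1-based character number
                rows.append({
                    "Matching Text": word.group(),
                    "Row": row_number,
                    "Position": f"{row_number},{char_number}",
                })
    return rows


text = "Employee 121-12-1212 joined\nManager 393-29-2929 approved 344-34-3434"
table = match_table(text)
first_match = table[0]["Matching Text"] if table else None
number_of_matches = len(table)

for row in table:
    print(row)
print(first_match, number_of_matches)
```

With this sample text the sketch produces three rows, with positions 1,10, 2,9, and 2,30. The first match is 121-12-1212.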
