Text: Find words that match a pattern
Use this action to search a block of text for a custom pattern and return all matches in a data table.
Use cases
- Parse through blocks of text to find a company ID or SSN.
- Configure this action with a POS system spreadsheet and look for IDs that match a specific pattern.
- Use this action as part of a text analysis Workflow that extracts and labels text.
How to configure this action
By using a custom pattern, this action can find words or phrases that follow a consistent format or structure. For example, Social Security Numbers (SSNs) always use the format 000-00-0000, and a company ID number might always use the format A1-00-00. You can build a pattern to find values like these in text.
In the SSN example, an SSN is always an 11-character string with dashes in the same 2 positions: 000-00-0000. You can create a pattern that finds any string with the same structure: any 3 digits, a dash, any 2 digits, a dash, then any 4 digits.
To create a pattern, use 3 special characters, #, @, and ?, each of which represents a type of character. For example, the # symbol represents any digit from 0-9. These placeholder characters are often called wildcards, because each one can stand for any character of its type.
Character | Represents |
---|---|
# | Any digit (0-9) |
@ | Any letter (a-z) |
? | Any letter, digit, or dash |
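The article doesn't describe how the action evaluates wildcards internally. As one way to reason about them, the sketch below maps each wildcard to an equivalent regular-expression character class and checks single characters against it. The names WILDCARD_CLASSES and wildcard_accepts are hypothetical, and @ is assumed to accept uppercase letters as well, since the pattern examples later in this article (such as A1_BC_DEF) include them.

```python
import re

# Hypothetical mapping from each wildcard to an equivalent regex character class.
# Assumption: "@" also accepts uppercase letters, because the article's examples
# (e.g. "A1_BC_DEF") show uppercase letters matching it.
WILDCARD_CLASSES = {
    "#": "[0-9]",         # any digit
    "@": "[A-Za-z]",      # any letter
    "?": "[A-Za-z0-9-]",  # any letter, digit, or dash (a trailing "-" in a class is literal)
}


def wildcard_accepts(wildcard: str, ch: str) -> bool:
    """Return True if the single character ch fits the given wildcard."""
    return re.fullmatch(WILDCARD_CLASSES[wildcard], ch) is not None


for wildcard, ch in [("#", "7"), ("#", "A"), ("@", "q"), ("?", "-"), ("@", "-")]:
    print(wildcard, repr(ch), wildcard_accepts(wildcard, ch))
```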
The SSN structure has two types of characters: digits and dashes. Since the # character represents any digit (0-9), you can combine # with the dash character to create a pattern that searches for the same format: ###-##-####.
Here are example strings and whether they would match the ###-##-#### pattern.
String | Result |
---|---|
A1-292-2992 | No match. The pattern character # is looking for digits, but the first character is a letter. The string does not match the pattern. |
000-00-0000 | Match. The pattern is looking for any 3 digits, a dash, any 2 digits, a dash, then any 4 more digits. The string matches the pattern. |
00-0000-000 | No match. This string has 2 digits, a dash, 4 digits, a dash, then 3 digits, but the pattern is looking for 3 digits, a dash, 2 digits, a dash, then 4 digits. The string does not match the pattern. |
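The comparison in the table above can be sketched as a character-by-character check: the string must be the same length as the pattern, and each character must fit the wildcard or literal character at the same position. This is a hypothetical helper (matches_pattern) for whole-string comparison only, not the action's actual implementation, and it assumes @ accepts both upper- and lowercase ASCII letters.

```python
def matches_pattern(pattern: str, text: str) -> bool:
    """Whole-string check: does text follow the wildcard pattern exactly?

    Hypothetical sketch. Assumes "@" accepts upper- and lowercase ASCII letters
    and that every non-wildcard pattern character must appear literally.
    """
    if len(pattern) != len(text):
        return False  # strings of a different length can never line up
    for p, ch in zip(pattern, text):
        if p == "#":
            ok = ch.isascii() and ch.isdigit()
        elif p == "@":
            ok = ch.isascii() and ch.isalpha()
        elif p == "?":
            ok = (ch.isascii() and ch.isalnum()) or ch == "-"
        else:
            ok = ch == p  # literal character, such as the dash
        if not ok:
            return False
    return True


for s in ["A1-292-2992", "000-00-0000", "00-0000-000"]:
    print(s, "Match" if matches_pattern("###-##-####", s) else "No match")
```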
Fields for this action
- Text to search
  - A block of text or a field reference to search for matching words.
- Pattern to match 1
  - Enter a set of characters that defines the pattern.

Character | Represents |
---|---|
# | Any digit (0-9) |
@ | Any letter (a-z) |
? | Any letter, digit, or dash |

  - To search for the literal symbols @ or #, put a backslash before them: \@ or \#.
  - Using the pattern characters, the following examples illustrate different pattern structures and the strings that would match them.

Pattern | Matched strings |
---|---|
###-##-#### | 121-12-1212, 393-29-2929, 344-34-3434 |
##:##:## | 12:34:56, 02:21:18, 09:11:23 |
@#_@@_@@@ | A1_BC_DEF, Z4_XY_LMN, D2_EF_GHI |
???-### | A8C-111, 123-222, A-B-333 |
- Pattern to match 2 (3, 4, etc.)
  - Any additional patterns to look for in the Text to search. Each pattern is applied to the text independently, and its matches are listed separately in the results.
- Output Field Prefix
  - Because this action may output more than one field, choose a prefix to add to the beginning of each output field name to keep the fields organized.
  - By default, the step’s name is used as the prefix.
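One way to model the pattern fields, including the backslash escapes, is to translate each pattern into a regular expression. The sketch below assumes that a backslash makes the character after it literal (the article documents this for \@ and \#) and that any other non-wildcard character is matched as-is; pattern_to_regex and CLASSES are hypothetical names. It checks the example strings from the Pattern to match 1 table.

```python
import re

# Wildcard -> regex character class. "@" includes uppercase letters because the
# article's examples (e.g. "A1_BC_DEF") show uppercase letters matching it (assumption).
CLASSES = {"#": "[0-9]", "@": "[A-Za-z]", "?": "[A-Za-z0-9-]"}


def pattern_to_regex(pattern: str) -> str:
    """Translate a wildcard pattern into a regular expression (hypothetical helper).

    Assumption: a backslash makes the next character literal, as the article
    describes for \\@ and \\#; every other non-wildcard character is literal.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # e.g. \# becomes a literal '#'
            i += 2
            continue
        parts.append(CLASSES.get(ch, re.escape(ch)))  # wildcard class or escaped literal
        i += 1
    return "".join(parts)


# Example patterns and strings from the Pattern to match 1 table.
examples = {
    "###-##-####": ["121-12-1212", "393-29-2929", "344-34-3434"],
    "##:##:##": ["12:34:56", "02:21:18", "09:11:23"],
    "@#_@@_@@@": ["A1_BC_DEF", "Z4_XY_LMN", "D2_EF_GHI"],
    "???-###": ["A8C-111", "123-222", "A-B-333"],
}
for pattern, strings in examples.items():
    regex = pattern_to_regex(pattern)
    print(pattern, all(re.fullmatch(regex, s) for s in strings))

# An escaped "#" matches only a literal "#" (the pattern "ID\###" is hypothetical).
print(re.fullmatch(pattern_to_regex(r"ID\###"), "ID#42") is not None)
```

Because each pattern is translated on its own, additional patterns (Pattern to match 2, 3, and so on) can be run independently against the same text.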
What will this output?
This action outputs the matched results in a data table. If you use multiple patterns, the results are ordered first by pattern, then by the position of the matched text in the text body. The first row of the results is the first match found.
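The ordering rule above can be sketched as follows: each pattern is searched independently, and the results are sorted by pattern index, then by character offset. This assumes a plain left-to-right, non-overlapping search (as Python's re.finditer performs) with no word-boundary requirement; the article doesn't specify either point. The helpers to_regex and find_all are hypothetical, and to_regex omits backslash escapes for brevity.

```python
import re

# Wildcard -> regex class; "@" is assumed to accept uppercase letters too.
CLASSES = {"#": "[0-9]", "@": "[A-Za-z]", "?": "[A-Za-z0-9-]"}


def to_regex(pattern: str) -> str:
    # Wildcards become character classes; everything else is matched literally.
    return "".join(CLASSES.get(ch, re.escape(ch)) for ch in pattern)


def find_all(text: str, patterns: list[str]) -> list[tuple[int, int, str]]:
    """Return (pattern index, offset, matched text), ordered by pattern, then position."""
    results = []
    for p_index, pattern in enumerate(patterns):
        for m in re.finditer(to_regex(pattern), text):
            results.append((p_index, m.start(), m.group()))
    # Tuples sort by pattern index first, then by offset in the text.
    return sorted(results)


text = "Call 555-12-3456 at 09:30:00, then 111-22-3333 at 14:05:10."
for row in find_all(text, ["###-##-####", "##:##:##"]):
    print(row)
```

Both SSN-style matches are listed before either time match, even though the first time appears earlier in the text, because results are grouped by pattern first.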
This action may generate multiple fields. To help keep output fields organized, the prefix above is added to the beginning of each output field name, separated by two dashes. Each field name takes the form {{output-field-prefix--output-field}}.
Output fields for this action
- Data table
  - This field provides the ID of a Data Table where the results are stored. The table has 3 columns:

Column | Description |
---|---|
Matching Text | A string that matched the pattern. |
Row | The row number of the match. |
Position | The position of the matching text in the text body, using #,# syntax, where the first value is the row # and the second is the character # in the row. |

  - Each row of the table is another instance of matched text.
- First match
  - The first match found for the first pattern.
- Number of Matches Found
  - The total number of matches found across all patterns.
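The sketch below assembles the three output fields from a list of matches that is already ordered by pattern, then by position. It assumes that a row is a line of the text body (separated by newlines) and that both the row and character numbers start at 1; the article doesn't state the numbering base. The helper names row_and_char and build_outputs, and the dictionary layout, are hypothetical and illustrative, not the action's actual output format.

```python
def row_and_char(text: str, offset: int) -> tuple[int, int]:
    """Convert a character offset into (row, char).

    Assumption: rows are newline-separated lines, and both values are 1-based.
    """
    row = text.count("\n", 0, offset) + 1
    line_start = text.rfind("\n", 0, offset) + 1  # 0 when the match is on the first row
    return row, offset - line_start + 1


def build_outputs(text: str, matches: list[tuple[int, int, str]]) -> dict:
    """Build illustrative output fields from (pattern index, offset, matched text) tuples."""
    table = []
    for _, offset, found in matches:
        row, char = row_and_char(text, offset)
        table.append({"Matching Text": found, "Row": row, "Position": f"{row},{char}"})
    first_pattern_hits = [found for p_index, _, found in matches if p_index == 0]
    return {
        "Data table": table,
        "First match": first_pattern_hits[0] if first_pattern_hits else None,
        "Number of Matches Found": len(matches),  # counted across all patterns
    }


text = "ID A1_BC_DEF\nSSN 121-12-1212 and 393-29-2929"
# Matches listed by hand for illustration: pattern 0 is ###-##-####, pattern 1 is @#_@@_@@@.
matches = [
    (0, text.index("121-12-1212"), "121-12-1212"),
    (0, text.index("393-29-2929"), "393-29-2929"),
    (1, text.index("A1_BC_DEF"), "A1_BC_DEF"),
]
outputs = build_outputs(text, matches)
for entry in outputs["Data table"]:
    print(entry)
print(outputs["First match"], outputs["Number of Matches Found"])
```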