These functions extract parts of strings based on a pattern.

str_extract_first(), str_extract_nth() and str_extract_last() extract the first, nth and last occurrence of a pattern in each string, into a character vector the same length as strings.

str_extract_all() extracts a character vector of all occurrences of a pattern for each string, into a list the same length as strings.

chr_extract_all() extracts all occurrences of a pattern from strings into a character vector.

Usage

str_extract_first(strings, pattern, fixed = FALSE)

str_extract_all(strings, pattern, fixed = FALSE)

str_extract_nth(strings, pattern, n, fixed = FALSE)

str_extract_last(strings, pattern, fixed = FALSE)

chr_extract_all(strings, pattern, fixed = FALSE)

Arguments

strings

A character vector, where each element of the vector is a character string.

pattern

A single character string to be searched for in each element of strings. By default, pattern is interpreted as a regular expression (regex). If the fixed argument is set to TRUE, pattern will be treated as a literal string to be matched exactly.

fixed

Logical; whether pattern should be matched exactly, treating regex special characters as regular string characters. Default FALSE.

n

(str_extract_nth only) Integer, the nth occurrence of the pattern to extract. Negative values count back from the end.
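
The difference between regex and literal matching can be sketched with base R alone. This is not the {suitestrings} source; the helper name `extract_all_base` is hypothetical and only illustrates what the `fixed` argument is described as doing.

```r
# Base R sketch (not the suitestrings implementation): regex vs. literal matching.
x <- c("a.b", "axb", "a+b")

# `extract_all_base` is a hypothetical helper, for illustration only.
extract_all_base <- function(strings, pattern, fixed = FALSE) {
  # Base R ignores perl = TRUE (with a warning) when fixed = TRUE,
  # so PCRE is only requested for regex matching.
  m <- gregexpr(pattern, strings, perl = !fixed, fixed = fixed)
  regmatches(strings, m)
}

# As a regex, "." matches any single character.
regex_hits <- unlist(extract_all_base(x, "a.b"))

# As a literal string, "." matches only a dot.
literal_hits <- unlist(extract_all_base(x, "a.b", fixed = TRUE))
```

Here `regex_hits` contains all three strings, while `literal_hits` contains only "a.b".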

Value

str_extract_first(), str_extract_nth() and str_extract_last(): each return a character vector the same length as the input vector strings. It contains the extracted portion of each string, corresponding to the first, nth and last match of the pattern, respectively. Strings with no corresponding match are represented as NA values.

str_extract_all(): returns a list of character vectors, where each list element corresponds to a string in the input vector. Each element is a character vector of all matches in that string. If no matches are found in a string, the corresponding list element is an empty character vector i.e. character(0). The list is the same length as the input vector strings.

chr_extract_all(): returns a character vector containing every single match in the input vector. Non-matches are ignored. This is equivalent to calling unlist() on the output of str_extract_all().
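
The three return shapes described above can be approximated in base R. This is a minimal sketch, not the package's code: `nth_match` is a hypothetical helper, and it assumes that a negative n of -1 refers to the last match.

```r
# Base R sketch of the return shapes; helper names are hypothetical.
strings <- c("mat", "pig", "cat-in-a-hat")

# A list with one character vector of matches per input string.
all_matches <- regmatches(strings, gregexpr(".at", strings, perl = TRUE))

# Pick the nth match per string; negative n counts back from the end
# (assumed here: -1 is the last match). No such match gives NA.
nth_match <- function(matches, n) {
  vapply(matches, function(m) {
    i <- if (n < 0) length(m) + n + 1 else n
    if (i >= 1 && i <= length(m)) m[i] else NA_character_
  }, character(1))
}

first <- nth_match(all_matches, 1)   # one element per input string
last <- nth_match(all_matches, -1)   # one element per input string
flat <- unlist(all_matches)          # every match, non-matches dropped
```

`first` and `last` keep one element per input string (NA for "pig"), whereas `flat` drops the non-matching string entirely, so its length depends on the number of matches rather than the number of inputs.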

Details

These functions are built on the base R regular expression functions. {suitestrings} uses Perl-compatible Regular Expressions (PCRE), which is achieved by setting perl = TRUE in the underlying base functions. See R's regexp documentation for information on the regex implementation, and https://www.pcre.org/current/doc/html/ for complete syntax details.
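
As an illustration of what perl = TRUE makes available, the base R sketch below uses a lookbehind, which is PCRE syntax. The input strings are made up for the example, and the calls shown are base R functions rather than {suitestrings} internals.

```r
# Base R sketch: perl = TRUE enables PCRE-only syntax such as lookbehind.
prices <- c("cost: $15", "no price", "$7 and $120")

# Match runs of digits that are immediately preceded by a dollar sign,
# without including the dollar sign in the match.
m <- gregexpr("(?<=\\$)\\d+", prices, perl = TRUE)
amounts <- regmatches(prices, m)
```

`amounts` is a list with one element per input string; the second element is character(0) because "no price" contains no match.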

See also

regmatches() for base R matched substring extraction.

Examples

str_extract_first(c("mat", "bat", "pig", "cat-in-a-hat"), ".at")
#> [1] "mat" "bat" NA    "cat"

str_extract_all(c("mat", "bat", "pig", "cat-in-a-hat"), ".at")
#> [[1]]
#> [1] "mat"
#>
#> [[2]]
#> [1] "bat"
#>
#> [[3]]
#> character(0)
#>
#> [[4]]
#> [1] "cat" "hat"

str_extract_nth(c("mat", "bat", "pig", "cat-in-a-hat"), ".at", 2)
#> [1] NA    NA    NA    "hat"

str_extract_last(c("mat", "bat", "pig", "cat-in-a-hat"), ".at")
#> [1] "mat" "bat" NA    "hat"

chr_extract_all(c("mat", "bat", "pig", "cat-in-a-hat"), ".at")
#> [1] "mat" "bat" "cat" "hat"