Extract complete matches from strings
Source: R/string-pattern-matching.R, R/vector-pattern-matching.R
str_extract.Rd
These functions extract parts of strings based on a pattern.
str_extract_first(), str_extract_nth() and str_extract_last() extract
the first, nth and last occurrence of a pattern in each string,
into a character vector the same length as strings.
str_extract_all() extracts a character vector of all occurrences of a pattern
for each string, into a list the same length as strings.
chr_extract_all() extracts all occurrences of a pattern from strings
into a character vector.
Usage
str_extract_first(strings, pattern, fixed = FALSE)
str_extract_all(strings, pattern, fixed = FALSE)
str_extract_nth(strings, pattern, n, fixed = FALSE)
str_extract_last(strings, pattern, fixed = FALSE)
chr_extract_all(strings, pattern, fixed = FALSE)
Arguments
- strings
A character vector, where each element of the vector is a character string.
- pattern
A single character string to be searched for in each element of strings. By default, pattern is interpreted as a regular expression (regex). If the fixed argument is set to TRUE, pattern will be treated as a literal string to be matched exactly.
- fixed
Logical; whether pattern should be matched exactly, treating regex special characters as regular string characters. Default FALSE.
- n
(str_extract_nth only) Integer, the nth occurrence of the pattern to extract. Negative values count back from the end.
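To illustrate what the fixed argument changes, here is a minimal sketch that uses base R's gregexpr() and regmatches() directly instead of the package functions. The input vector x is made up for the example, and the sketch only assumes that fixed behaves as it does in base R.

```r
# Made-up inputs: one contains a literal dot, one does not.
x <- c("1.5", "125")

# As a regex, "." matches any character, so "1.5" matches both strings.
regex_hits <- regmatches(x, gregexpr("1.5", x, perl = TRUE))

# As a literal string, "1.5" only matches where a real dot is present.
literal_hits <- regmatches(x, gregexpr("1.5", x, fixed = TRUE))

lengths(regex_hits)    # 1 1
lengths(literal_hits)  # 1 0
```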
Value
str_extract_first(), str_extract_nth() and str_extract_last()
each return a character vector the same length as the input vector strings.
It contains the extracted portion of the string, corresponding to
the first, nth and last match of the pattern, respectively. Strings
with no corresponding match are represented as NA values.
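The nth-match behaviour, including NA for strings without a corresponding match, can be sketched in base R as follows. pick_nth() is a hypothetical helper written for this illustration, not a package function, and it assumes that n = -1 refers to the last match, n = -2 to the second-to-last, and so on.

```r
# Hypothetical helper (not part of the package): return the nth element of a
# vector of matches, counting back from the end when n is negative
# (assumption: -1 means the last match). Out-of-range positions give NA.
pick_nth <- function(matches, n) {
  idx <- if (n < 0) length(matches) + n + 1 else n
  if (idx >= 1 && idx <= length(matches)) matches[idx] else NA_character_
}

strings <- c("mat", "bat", "pig", "cat-in-a-hat")
all_matches <- regmatches(strings, gregexpr(".at", strings, perl = TRUE))

second <- vapply(all_matches, pick_nth, character(1), n = 2)
last <- vapply(all_matches, pick_nth, character(1), n = -1)

second  # NA NA NA "hat"
last    # "mat" "bat" NA "hat"
```

The result always has one element per input string, which is why unmatched positions must be filled with NA rather than dropped.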
str_extract_all(): returns a list of character vectors, where each list element corresponds
to a string in the input vector. Each element is a character vector of all matches in that string.
If no matches are found in a string, the corresponding list element is an empty character vector, i.e. character(0).
The list is the same length as the input vector strings.
chr_extract_all(): returns a character vector containing every single match in the input vector.
Non-matches are ignored. This is equivalent to calling unlist() on the output of str_extract_all().
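The difference in shape between the list result and the flattened result can be seen with a small base R sketch. It stands in for the package functions by calling gregexpr() and regmatches() directly; the input strings are invented for the example.

```r
# Made-up inputs: no match, two matches, three matches.
strings <- c("pig", "cat-in-a-hat", "a-rat-sat-on-a-mat")

# List shape: one element per input string, character(0) where nothing matched.
per_string <- regmatches(strings, gregexpr(".at", strings, perl = TRUE))

# Flat shape: every match in one character vector; empty elements vanish.
flat <- unlist(per_string)

length(per_string)  # 3
flat                # "cat" "hat" "rat" "sat" "mat"
```

Flattening discards the link between a match and the string it came from, so the list form is the one to keep when that link matters.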
Details
These functions are built using the base R regular expression functions.
{suitestrings} uses Perl-compatible Regular Expressions (PCRE).
This is achieved by setting perl = TRUE in the underlying base functions.
See R's regexp documentation for info on the regex implementation.
For complete syntax details see https://www.pcre.org/current/doc/html/
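As a rough illustration of what perl = TRUE makes available, the sketch below uses a lookbehind assertion, which is PCRE syntax, with the base R functions directly. It is not the package's internal code, only an example of the kind of call described above.

```r
x <- "cat-in-a-hat"

# (?<=-) is a lookbehind: match lowercase runs that follow a hyphen,
# without including the hyphen in the match.
after_hyphen <- regmatches(x, gregexpr("(?<=-)[a-z]+", x, perl = TRUE))[[1]]

after_hyphen  # "in" "a" "hat"
```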
See also
regmatches() for base R matched substring extraction.
Examples
str_extract_first(c("mat", "bat", "pig", "cat-in-a-hat"), ".at")
#> [1] "mat" "bat" NA "cat"
str_extract_all(c("mat", "bat", "pig", "cat-in-a-hat"), ".at")
#> [[1]]
#> [1] "mat"
#>
#> [[2]]
#> [1] "bat"
#>
#> [[3]]
#> character(0)
#>
#> [[4]]
#> [1] "cat" "hat"
str_extract_nth(c("mat", "bat", "pig", "cat-in-a-hat"), ".at", 2)
#> [1] NA NA NA "hat"
str_extract_last(c("mat", "bat", "pig", "cat-in-a-hat"), ".at")
#> [1] "mat" "bat" NA "hat"
chr_extract_all(c("mat", "bat", "pig", "cat-in-a-hat"), ".at")
#> [1] "mat" "bat" "cat" "hat"