Extract complete matches from strings
Source: R/string-pattern-matching.R, R/vector-pattern-matching.R
str_extract.Rd
These functions extract parts of strings based on a pattern.
str_extract_first(), str_extract_nth() and str_extract_last() extract the first, nth and last occurrence of a pattern in each string, into a character vector the same length as strings.
str_extract_all() extracts a character vector of all occurrences of a pattern for each string, into a list the same length as strings.
chr_extract_all() extracts all occurrences of a pattern from strings into a character vector.
Usage
str_extract_first(strings, pattern, fixed = FALSE)
str_extract_all(strings, pattern, fixed = FALSE)
str_extract_nth(strings, pattern, n, fixed = FALSE)
str_extract_last(strings, pattern, fixed = FALSE)
chr_extract_all(strings, pattern, fixed = FALSE)
Arguments
- strings
A character vector, where each element of the vector is a character string.
- pattern
A single character string to be searched for in each element of strings. By default, pattern is interpreted as a regular expression (regex). If the fixed argument is set to TRUE, pattern will be treated as a literal string to be matched exactly.
- fixed
Logical; whether pattern should be matched exactly, treating regex special characters as regular string characters. Default FALSE.
- n
(str_extract_nth only) Integer, the nth occurrence of the pattern to extract. Negative values count back from the end.
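The effect of fixed can be sketched with base R alone. The helper below, extract_all_sketch(), is a hypothetical name used only for illustration and is not the package's implementation; it assumes the behaviour described above, where fixed = TRUE switches off regex interpretation of pattern.

```r
# Hypothetical helper (not part of the package): all matches per string,
# built on base R's gregexpr() and regmatches().
extract_all_sketch <- function(strings, pattern, fixed = FALSE) {
  # perl = TRUE only makes sense for regex matching, so it is turned off
  # whenever the pattern is treated as a literal string.
  regmatches(strings, gregexpr(pattern, strings, fixed = fixed, perl = !fixed))
}

x <- c("a.b", "abc")

# As a regex, "." matches any single character.
regex_hits <- extract_all_sketch(x, "a.b")

# As a literal string, "." matches only an actual dot.
fixed_hits <- extract_all_sketch(x, "a.b", fixed = TRUE)
```

In this sketch the regex pattern finds a match in both strings, while the literal pattern finds one only in the string that contains a real dot.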
Value
str_extract_first(), str_extract_nth() and str_extract_last() each return a character vector the same length as the input vector strings. It contains the extracted portion of the string, corresponding to the first, nth and last match of the pattern, respectively. Strings with no corresponding match are represented as NA values.
str_extract_all(): returns a list of character vectors, where each list element corresponds to a string in the input vector. Each element is a character vector of all matches in that string. If no matches are found in a string, the corresponding list element is an empty character vector, i.e. character(0). The list is the same length as the input vector strings.
chr_extract_all(): returns a character vector containing every single match in the input vector. Non-matches are ignored. This is equivalent to calling unlist() on the output of str_extract_all().
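The return shapes described above can be reproduced with a small base R sketch. The helper pick_nth_sketch() is a hypothetical name introduced here for illustration; it is an assumption about how first, nth and last selection could work on top of a per-string list of matches, not the package's own code.

```r
# Example input taken from the Examples section of this page.
strings <- c("mat", "bat", "pig", "cat-in-a-hat")

# Base R: one character vector of matches per string (character(0) if none).
all_matches <- regmatches(strings, gregexpr(".at", strings, perl = TRUE))

# Hypothetical helper, not the package's code: pick the nth match per string,
# counting back from the end when n is negative, and NA when there is none.
pick_nth_sketch <- function(matches, n) {
  vapply(matches, function(m) {
    i <- if (n < 0) length(m) + n + 1 else n
    if (i >= 1 && i <= length(m)) m[[i]] else NA_character_
  }, character(1))
}

first_match  <- pick_nth_sketch(all_matches, 1)   # same length as strings
second_match <- pick_nth_sketch(all_matches, 2)
last_match   <- pick_nth_sketch(all_matches, -1)

# Flattening the list drops strings with no match entirely.
flat_matches <- unlist(all_matches)
```

The list keeps one element per input string, the picked vectors keep one value (or NA) per input string, and the flattened vector has one element per match.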
Details
These functions are built using the base R regular expression functions. {suitestrings} uses Perl-compatible Regular Expressions (PCRE). This is achieved by setting perl = TRUE in the underlying base functions.
See R's regexp documentation for info on the regex implementation. For complete syntax details see https://www.pcre.org/current/doc/html/
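As a rough illustration of what perl = TRUE makes available, the base R sketch below uses a lookbehind, a Perl-style construct. It calls gregexpr() and regmatches() directly and is only an assumption about the kind of call the package wraps; the input strings are made up for the example.

```r
x <- c("cost $42", "no price")

# perl = TRUE asks base R to use PCRE, which supports lookbehind: (?<=\$)
# requires a preceding dollar sign without including it in the match.
amounts <- regmatches(x, gregexpr("(?<=\\$)\\d+", x, perl = TRUE))
```

Here the first element holds the digits that follow the dollar sign, and the second element is an empty character vector because nothing matched.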
See also
regmatches() for base R matched substring extraction.
Examples
str_extract_first(c("mat", "bat", "pig", "cat-in-a-hat"), ".at")
#> [1] "mat" "bat" NA "cat"
str_extract_all(c("mat", "bat", "pig", "cat-in-a-hat"), ".at")
#> [[1]]
#> [1] "mat"
#>
#> [[2]]
#> [1] "bat"
#>
#> [[3]]
#> character(0)
#>
#> [[4]]
#> [1] "cat" "hat"
str_extract_nth(c("mat", "bat", "pig", "cat-in-a-hat"), ".at", 2)
#> [1] NA NA NA "hat"
str_extract_last(c("mat", "bat", "pig", "cat-in-a-hat"), ".at")
#> [1] "mat" "bat" NA "hat"
chr_extract_all(c("mat", "bat", "pig", "cat-in-a-hat"), ".at")
#> [1] "mat" "bat" "cat" "hat"