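R: strsplit with multiple unordered split arguments?

Given the strings

    test_1 <- "abc def,ghi klm"
    test_2 <- "abc, def ghi klm"

I would like to obtain

    "abc" "def" "ghi"

However, it seems that with strsplit you have to know the order of the split values in the string, because strsplit uses the first value for the first split, the second value for the second split, and so on, and then recycles.

None of these does what I want:

    strsplit(test_1, c(",", " "))
    strsplit(test_2, c(" ", ","))
    strsplit(test_2, split = c("[:punct:]", "[:space:]"))[[1]]

I am looking to split the string wherever any of my split values occurs, in a single step.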
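A note on why the calls in the question do not behave as hoped: according to the strsplit documentation, a vector passed as split is recycled along x, so each pattern is paired with one input string rather than applied in turn to the same string. Separately, POSIX classes such as [:punct:] only have their special meaning inside a bracket expression, that is, written as [[:punct:]]. A minimal sketch of the recycling (the vector x is made up for illustration):

```r
# Hypothetical input: the same string twice, so each copy gets its own pattern
x <- c("a,b c", "a,b c")

res <- strsplit(x, c(",", " "))
res[[1]]  # first string, split on "," only
# [1] "a"   "b c"
res[[2]]  # second string, split on " " only
# [1] "a,b" "c"

# With a single input string, only the first pattern is ever used
single <- strsplit("a,b c", c(",", " "))[[1]]
single
# [1] "a"   "b c"
```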
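Actually, strsplit uses grep-style patterns (regular expressions):

    > strsplit(test_1, "\\, |\\,| ")
    [[1]]
    [1] "abc" "def" "ghi" "klm"

    > strsplit(test_2, "\\, |\\,| ")
    [[1]]
    [1] "abc" "def" "ghi" "klm"

If you do not include both "\\, " and "\\," (note the extra space in the first one, which SO does not display), you will get some empty ("") values in the result. It may be clearer if I write it like this:

    > strsplit(test_2, "\\,\\s|\\,|\\s")
    [[1]]
    [1] "abc" "def" "ghi" "klm"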
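To see what goes wrong without the comma-plus-space alternative, here is a small check. The variable names naive and with_pair are only for illustration:

```r
test_2 <- "abc, def ghi klm"

# Only "," or " " as separators: the space after the comma is matched
# on its own, leaving an empty string between the two matches
naive <- strsplit(test_2, ",| ")[[1]]
naive
# [1] "abc" ""    "def" "ghi" "klm"

# Adding ", " as its own alternative consumes both characters in one match
with_pair <- strsplit(test_2, ", |,| ")[[1]]
with_pair
# [1] "abc" "def" "ghi" "klm"
```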
@Fojtasek is quite right: using a character class usually simplifies the task, because it creates an implicit logical OR:

    > strsplit(test_2, "[, ]+")
    [[1]]
    [1] "abc" "def" "ghi" "klm"

    > strsplit(test_1, "[, ]+")
    [[1]]
    [1] "abc" "def" "ghi" "klm"
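One caveat with the character-class pattern: if the string starts with a separator, strsplit still returns a leading empty string, while a trailing separator adds nothing. A small sketch with a made-up input x, using nzchar() to drop empty pieces:

```r
x <- " abc, def "  # hypothetical input with leading and trailing separators

parts <- strsplit(x, "[, ]+")[[1]]
parts
# [1] ""    "abc" "def"

# Keep only the non-empty pieces
clean <- parts[nzchar(parts)]
clean
# [1] "abc" "def"
```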
You could go with strsplit(test_1, "\\W").
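Note that "\\W" matches a single non-word character, so it works as-is for test_1 but leaves an empty string where a comma is followed by a space, as in test_2. A quick sketch of the difference, with "\\W+" as one possible adjustment (the variable names are illustrative):

```r
test_2 <- "abc, def ghi klm"

# One non-word character per match: ", " is two separate matches
single <- strsplit(test_2, "\\W")[[1]]
single
# [1] "abc" ""    "def" "ghi" "klm"

# One or more non-word characters per match
runs <- strsplit(test_2, "\\W+")[[1]]
runs
# [1] "abc" "def" "ghi" "klm"

# The underscore counts as a word character, so it is not a separator
under <- strsplit("abc_def ghi", "\\W+")[[1]]
under
# [1] "abc_def" "ghi"
```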
If you don't like regular expressions, you can call strsplit() multiple times:

    strsplits <- function(x, splits, ...) {
      # Apply each split value in turn to every piece produced so far
      for (split in splits) {
        x <- unlist(strsplit(x, split, ...))
      }
      return(x[x != ""])  # Remove empty values
    }

    strsplits(test_1, c(" ", ","))
    # "abc" "def" "ghi" "klm"
    strsplits(test_2, c(" ", ","))
    # "abc" "def" "ghi" "klm"

Updated for the added example:

    strsplits(test_1, c("[[:punct:]]", "[[:space:]]"))
    # "abc" "def" "ghi" "klm"
    strsplits(test_2, c("[[:punct:]]", "[[:space:]]"))
    # "abc" "def" "ghi" "klm"
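Because strsplits() forwards ... to strsplit(), it can also be used with literal separators by passing fixed = TRUE. This is a small sketch with a made-up input, where "." and "|" would otherwise be treated as regex metacharacters:

```r
# Same helper as above, repeated so the example runs on its own
strsplits <- function(x, splits, ...) {
  for (split in splits) {
    x <- unlist(strsplit(x, split, ...))
  }
  x[x != ""]  # Remove empty values
}

# fixed = TRUE makes strsplit() match "." and "|" literally
literal <- strsplits("a.b|c", c(".", "|"), fixed = TRUE)
literal
# [1] "a" "b" "c"
```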
However, if you are going to use regular expressions anyway, you may as well use @Dinin's approach:

    strsplit(test_1, "[[:punct:][:space:]]+")[[1]]
    # "abc" "def" "ghi" "klm"
    strsplit(test_2, "[[:punct:][:space:]]+")[[1]]
    # "abc" "def" "ghi" "klm"
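    library(stringr)  # provides str_c() and str_extract_all()

    test_1 <- "abc def,ghi klm"
    test_2 <- "abc, def ghi klm"

    key_words <- c("abc", "def", "ghi")
    matches <- str_c(key_words, collapse = "|")  # "abc|def|ghi"

    # Extract the key words instead of splitting on separators
    str_extract_all(test_1, matches)
    str_extract_all(test_2, matches)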
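If you would rather not depend on stringr, the same extraction idea can be sketched in base R with gregexpr() and regmatches(). This is an alternative under the same assumptions (the key words are known in advance), not the answer's own code:

```r
test_1 <- "abc def,ghi klm"
test_2 <- "abc, def ghi klm"

key_words <- c("abc", "def", "ghi")
pattern <- paste(key_words, collapse = "|")  # "abc|def|ghi"

# gregexpr() locates every match; regmatches() pulls out the matched text
found_1 <- regmatches(test_1, gregexpr(pattern, test_1))[[1]]
found_2 <- regmatches(test_2, gregexpr(pattern, test_2))[[1]]

found_1
# [1] "abc" "def" "ghi"
found_2
# [1] "abc" "def" "ghi"
```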