Error converting text to lowercase with tm_map(..., tolower)

I tried using tm_map and it gave the following error. How can I fix this?

    require(tm)
    byword <- tm_map(byword, tolower)
    Error in UseMethod("tm_map", x) :
      no applicable method for 'tm_map' applied to an object of class "character"

Use the base R function tolower():

    tolower(c("THE quick BROWN fox"))
    # [1] "the quick brown fox"

Expanding my comment into a more detailed answer: you have to wrap tolower inside content_transformer so that it does not break the VCorpus object, like this:

    > library(tm)
    > data('crude')
    > crude[[1]]$content
    [1] "Diamond Shamrock Corp said that\neffective today it had cut its contract prices for crude oil by\n1.50 dlrs a barrel.\n The reduction brings its posted price for West Texas\nIntermediate to 16.00 dlrs a barrel, the copany said.\n \"The price reduction today was made in the light of falling\noil product prices and a weak crude oil market,\" a company\nspokeswoman said.\n Diamond is the latest in a line of US oil companies that\nhave cut its contract, or posted, prices over the last two days\nciting weak oil markets.\n Reuter"
    > tm_map(crude, content_transformer(tolower))[[1]]$content
    [1] "diamond shamrock corp said that\neffective today it had cut its contract prices for crude oil by\n1.50 dlrs a barrel.\n the reduction brings its posted price for west texas\nintermediate to 16.00 dlrs a barrel, the copany said.\n \"the price reduction today was made in the light of falling\noil product prices and a weak crude oil market,\" a company\nspokeswoman said.\n diamond is the latest in a line of us oil companies that\nhave cut its contract, or posted, prices over the last two days\nciting weak oil markets.\n reuter"

    myCorpus <- Corpus(VectorSource(byword))
    myCorpus <- tm_map(myCorpus, tolower)
    print(myCorpus[[1]])

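Using tolower this way has an undesirable side effect: if you later try to create a term-document matrix from the corpus, it will fail. This is because of a recent change in tm, which can no longer handle the return type of tolower. Instead, use:

    myCorpus <- tm_map(myCorpus, PlainTextDocument)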
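
Putting the answers above together, here is a minimal sketch of the whole flow, assuming tm 0.6 or later (where content_transformer is available). The sample strings in byword are made up for illustration; the question does not show the real data.

```r
library(tm)

# Hypothetical input: a plain character vector, like `byword` in the question.
byword <- c("THE quick BROWN fox", "Jumps OVER the lazy DOG")

# tm_map() has no method for character vectors, so build a corpus first.
myCorpus <- VCorpus(VectorSource(byword))

# Wrapping tolower in content_transformer() keeps each element a text document
# instead of replacing it with a bare character string.
myCorpus <- tm_map(myCorpus, content_transformer(tolower))

# Text of the first document after lowercasing.
lowered <- content(myCorpus[[1]])
print(lowered)

# The corpus can still be turned into a document-term matrix afterwards.
dtm <- DocumentTermMatrix(myCorpus)
print(dim(dtm))
```

If byword is already a corpus, skip the VCorpus(VectorSource(...)) step and apply only the tm_map call.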