Convert column classes in data.table

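I have a question about data.table: how do I convert column classes? Here is a simple example. With a data.frame I have no trouble converting them, but with a data.table I just don't know how: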

    df <- data.frame(ID=c(rep("A", 5), rep("B",5)), Quarter=c(1:5, 1:5), value=rnorm(10))
    #One way: http://stackoverflow.com/questions/2851015/r-convert-data-frame-columns-from-factors-to-characters
    df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)
    #Another way
    df[, "value"] <- as.numeric(df[, "value"])

    library(data.table)
    dt <- data.table(ID=c(rep("A", 5), rep("B",5)), Quarter=c(1:5, 1:5), value=rnorm(10))
    dt <- data.table(lapply(dt, as.character), stringsAsFactors=FALSE)
    #Error in rep("", ncol(xi)) : invalid 'times' argument
    #Produces error, does data.table not have the option stringsAsFactors?
    dt[, "ID", with=FALSE] <- as.character(dt[, "ID", with=FALSE])
    #Produces error: Error in `[<-.data.table`(`*tmp*`, , "ID", with = FALSE, value = "c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)") :
    #unused argument(s) (with = FALSE)

Am I missing something obvious here?

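Update following Matthew's post: I was using an older version before, but even after updating to 1.6.6 (the version I am using now) I still get an error.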

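Update 2: Suppose I want to convert every column of class "factor" to a "character" column, but I don't know in advance which column has which class. With a data.frame, I can do the following: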

    classes <- as.character(sapply(df, class))
    colClasses <- which(classes=="factor")
    df[, colClasses] <- sapply(df[, colClasses], as.character)

Can I do something similar with data.table?

Update 3:

    sessionInfo()
    R version 2.13.1 (2011-07-08)
    Platform: x86_64-pc-mingw32/x64 (64-bit)

    locale:
    [1] C

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] data.table_1.6.6

    loaded via a namespace (and not attached):
    [1] tools_2.13.1

For a single column:

    dtnew <- dt[, Quarter:=as.character(Quarter)]
    str(dtnew)

    Classes 'data.table' and 'data.frame': 10 obs. of 3 variables:
     $ ID     : Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
     $ Quarter: chr "1" "2" "3" "4" ...
     $ value  : num -0.838 0.146 -1.059 -1.197 0.282 ...
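One point worth spelling out: `:=` modifies the table by reference, so after the call above `dt` itself has the converted column and `dtnew` is just another name for the same table. A minimal sketch, assuming a recent data.table version and using the made-up name `dt_chr`, of how `copy()` can be used when the original should stay unchanged:

```r
library(data.table)

dt <- data.table(ID = rep(c("A", "B"), each = 5),
                 Quarter = rep(1:5, 2),
                 value = rnorm(10))

# copy() takes a snapshot first, so the := below leaves dt untouched
dt_chr <- copy(dt)[, Quarter := as.character(Quarter)]

class(dt$Quarter)      # original column is still integer
class(dt_chr$Quarter)  # copied column is character

# Without copy(), := changes dt itself
dt[, Quarter := as.character(Quarter)]
class(dt$Quarter)      # now character
```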

Using lapply and as.character:

    dtnew <- dt[, lapply(.SD, as.character), by=ID]
    str(dtnew)

    Classes 'data.table' and 'data.frame': 10 obs. of 3 variables:
     $ ID     : Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
     $ Quarter: chr "1" "2" "3" "4" ...
     $ value  : chr "1.487145280568" "-0.827845218358881" "0.028977182770002" "1.35392750102305" ...
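In the grouped call above, ID stays a factor because it is the grouping column, and grouping columns are not part of `.SD`. A small sketch, assuming a recent data.table version, of the same idea without `by`, which converts every column and returns a new table (`dt_chr` is a name chosen here for illustration):

```r
library(data.table)

dt <- data.table(ID = factor(rep(c("A", "B"), each = 5)),
                 Quarter = rep(1:5, 2),
                 value = rnorm(10))

# No `by`, so .SD holds every column, including ID
dt_chr <- dt[, lapply(.SD, as.character)]

sapply(dt_chr, class)  # every column is now character
```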

Try this:

    DT <- data.table(X1 = c("a", "b"), X2 = c(1,2), X3 = c("hello", "you"))
    changeCols <- colnames(DT)[which(as.vector(DT[,lapply(.SD, class)]) == "character")]
    DT[,(changeCols):= lapply(.SD, as.factor), .SDcols = changeCols]
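The same pattern can be turned around for the case in Update 2, converting every factor column to character without knowing the column classes in advance. This is only a sketch with made-up example data; `factor_cols` is a helper name introduced here, and it assumes a data.table version that supports `.SDcols` and the `(cols) :=` form.

```r
library(data.table)

DT <- data.table(ID = factor(c("A", "B")),
                 X2 = c(1, 2),
                 X3 = factor(c("hello", "you")))

# Names of all columns that are currently factors
factor_cols <- names(DT)[sapply(DT, is.factor)]

# Convert only those columns, in place; skip if there is nothing to convert
if (length(factor_cols) > 0) {
  DT[, (factor_cols) := lapply(.SD, as.character), .SDcols = factor_cols]
}

sapply(DT, class)
```

Wrapping the left-hand side in parentheses tells data.table to use the contents of `factor_cols` as column names rather than creating a column literally called `factor_cols`.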

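This is a bad way to do it! I am only leaving this answer here in case it helps with other odd problems. The better methods are probably partly the result of newer data.table versions, so it is worth documenting this harder way. It is also a good example of the eval and substitute syntax.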

    library(data.table)
    dt <- data.table(ID = c(rep("A", 5), rep("B",5)),
                     fac1 = c(1:5, 1:5),
                     fac2 = c(1:5, 1:5) * 2,
                     val1 = rnorm(10),
                     val2 = rnorm(10))

    names_factors = c('fac1', 'fac2')
    names_values = c('val1', 'val2')

    for (col in names_factors){
      e = substitute(X := as.factor(X), list(X = as.symbol(col)))
      dt[ , eval(e)]
    }
    for (col in names_values){
      e = substitute(X := as.numeric(X), list(X = as.symbol(col)))
      dt[ , eval(e)]
    }

    str(dt)

This gives you:

    Classes 'data.table' and 'data.frame': 10 obs. of 5 variables:
     $ ID  : chr "A" "A" "A" "A" ...
     $ fac1: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5
     $ fac2: Factor w/ 5 levels "2","4","6","8",..: 1 2 3 4 5 1 2 3 4 5
     $ val1: num 0.0459 2.0113 0.5186 -0.8348 -0.2185 ...
     $ val2: num -0.0688 0.6544 0.267 -0.1322 -0.4893 ...
     - attr(*, ".internal.selfref")=<externalptr>
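For comparison, the same loop can probably be written without eval and substitute by putting the column name in parentheses on the left of `:=` and looking the column up with `get()` on the right. A minimal sketch, assuming a reasonably recent data.table version and that the table has no column that is itself named `col`:

```r
library(data.table)

dt <- data.table(ID = rep(c("A", "B"), each = 5),
                 fac1 = rep(1:5, 2),
                 fac2 = rep(1:5, 2) * 2,
                 val1 = rnorm(10),
                 val2 = rnorm(10))

names_factors <- c("fac1", "fac2")

for (col in names_factors) {
  # (col) on the left means "the column whose name is stored in col";
  # get(col) on the right fetches that column's values
  dt[, (col) := as.factor(get(col))]
}

str(dt)
```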

I tried several approaches.

    # BY {dplyr}
    library(data.table)
    library(magrittr)  # provides the %<>% operator
    data.table(ID = c(rep("A", 5), rep("B",5)),
               Quarter = c(1:5, 1:5),
               value = rnorm(10)) -> df1
    df1 %<>% dplyr::mutate(ID = as.factor(ID),
                           Quarter = as.character(Quarter))
    # check classes
    dplyr::glimpse(df1)
    # Observations: 10
    # Variables: 3
    # $ ID (fctr) A, A, A, A, A, B, B, B, B, B
    # $ Quarter (chr) "1", "2", "3", "4", "5", "1", "2", "3", "4", "5"
    # $ value (dbl) -0.07676732, 0.25376110, 2.47192852, 0.84929175, -0.13567312, -0.94224435, 0.80213218, -0.89652819...

Or, alternatively:

    # from list to data.table using data.table::setDT
    library(data.table)
    # build the columns with the desired classes, then convert the list
    df2 <- setDT(list(ID = as.factor(c(rep("A", 5), rep("B",5))),
                      Quarter = as.character(c(1:5, 1:5)),
                      value = rnorm(10)))
    class(df2)
    # [1] "data.table" "data.frame"

Try:

    dt <- data.table(A = c(1:5), B= c(11:15))
    x <- ncol(dt)
    for(i in 1:x) {
      dt[[i]] <- as.character(dt[[i]])
    }
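The loop above uses base-style assignment, which may copy the table on each iteration. A possible variant of the same loop uses data.table's `set()`, which assigns by reference; this is a sketch under the assumption that a data.table version providing `set()` is installed.

```r
library(data.table)

dt <- data.table(A = 1:5, B = 11:15)

# seq_along(dt) iterates over column positions and is safe for zero columns
for (j in seq_along(dt)) {
  set(dt, j = j, value = as.character(dt[[j]]))
}

sapply(dt, class)
```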