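Meaning of the ddply error: 'names' attribute must be the same length as the vector

I'm working through Machine Learning for Hackers, and I'm stuck at this line: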

from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject)) 

which produces the following error:

 Error in attributes(out) <- attributes(col) : 'names' attribute [9] must be the same length as the vector [1] 

Here is the traceback():

    > traceback()
    11: FUN(1:5[[1L]], ...)
    10: lapply(seq_len(n), extract_col_rows, df = x, i = i)
    9: extract_rows(x$data, x$index[[i]])
    8: `[[.indexed_df`(pieces, i)
    7: pieces[[i]]
    6: function (i)
       {
           piece <- pieces[[i]]
           if (.inform) {
               res <- try(.fun(piece, ...))
               if (inherits(res, "try-error")) {
                   piece <- paste(capture.output(print(piece)), collapse = "\n")
                   stop("with piece ", i, ": \n", piece, call. = FALSE)
               }
           }
           else {
               res <- .fun(piece, ...)
           }
           progress$step()
           res
       }(1L)
    5: .Call("loop_apply", as.integer(n), f, env)
    4: loop_apply(n, do.ply)
    3: llply(.data = .data, .fun = .fun, ..., .progress = .progress,
           .inform = .inform, .parallel = .parallel, .paropts = .paropts)
    2: ldply(.data = pieces, .fun = .fun, ..., .progress = .progress,
           .inform = .inform, .parallel = .parallel, .paropts = .paropts)
    1: ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))

The priority.train object is a data frame; here is some more information about it:

    > mode(priority.train)
    [1] "list"
    > names(priority.train)
    [1] "Date"       "From.EMail" "Subject"    "Message"    "Path"
    > sapply(priority.train, mode)
           Date  From.EMail     Subject     Message        Path
         "list" "character" "character" "character" "character"
    > sapply(priority.train, class)
    $Date
    [1] "POSIXlt" "POSIXt"
    $From.EMail
    [1] "character"
    $Subject
    [1] "character"
    $Message
    [1] "character"
    $Path
    [1] "character"
    > length(priority.train)
    [1] 5
    > nrow(priority.train)
    [1] 1250
    > ncol(priority.train)
    [1] 5
    > str(priority.train)
    'data.frame': 1250 obs. of 5 variables:
     $ Date      : POSIXlt, format: "2002-01-31 22:44:14" "2002-02-01 00:53:41" "2002-02-01 02:01:44" "2002-02-01 10:29:23" ...
     $ From.EMail: chr "removed@removed.ca" "removed@removed.net" "removed@removed.ca" "removed@removed.net" ...
     $ Subject   : chr "please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" "re: please help a newbie compile mplayer :-)" ...
     $ Message   : chr " \n Hello,\n \n I just installed redhat 7.2 and I think I have everything \nworking properly. Anyway I want to in"| __truncated__ "Make sure you rebuild as root and you're in the directory that you\ndownloaded the file. Also it might complain of a few depen"| __truncated__ "Lance wrote:\n\n>Make sure you rebuild as root and you're in the directory that you\n>downloaded the file. Also it might compl"| __truncated__ "Once upon a time, rob wrote :\n\n> I dl'd gcc3 and libgcc3, but I still get the same error message when I \n> try rpm --rebuil"| __truncated__ ...
     $ Path      : chr "../03-Classification/data/easy_ham/01061.6610124afa2a5844d41951439d1c1068" "../03-Classification/data/easy_ham/01062.ef7955b391f9b161f3f2106c8cda5edb" "../03-Classification/data/easy_ham/01063.ad3449bd2890a29828ac3978ca8c02ab" "../03-Classification/data/easy_ham/01064.9f4fc60b4e27bba3561e322c82d5f7ff" ...
    Warning messages:
    1: In encodeString(object, quote = "\"", na.encode = FALSE) :
      it is not known that wchar_t is Unicode on this platform
    2: In encodeString(object, quote = "\"", na.encode = FALSE) :
      it is not known that wchar_t is Unicode on this platform

I would post a sample, but the content is rather long and I don't think it is relevant here.

The same error also occurs here:

    > ddply(priority.train, .(Subject))
    Error in attributes(out) <- attributes(col) :
      'names' attribute [9] must be the same length as the vector [1]

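Does anyone know what is going on here? The error seems to be generated by an object other than priority.train, because the names attribute in question evidently has 9 elements.

I'd appreciate any help. Thanks!

Problem solved

I found the problem thanks to @user1317221_G's tip to use the dput function. The problem was the Date field, which is a POSIXlt object and is therefore stored as a list of 9 fields (sec, min, hour, mday, mon, year, wday, yday, isdst). To work around this, I simply converted the dates to a character vector, ran ddply, and then restored the original Date column: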

    > tmp <- priority.train$Date
    > priority.train$Date <- as.character(priority.train$Date)
    > from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject))
    > priority.train$Date <- tmp
    > rm(tmp)
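To see why a POSIXlt column trips up code that expects one value per row, it can help to look at how the column is stored. The following is a minimal sketch using base R only; the `toy` data frame and its addresses are hypothetical stand-ins for priority.train, not the original data.

```r
# Hypothetical stand-in for priority.train (not the original e-mail data).
toy <- data.frame(
  From.EMail = c("a@example.com", "b@example.com", "a@example.com"),
  Subject    = c("hello", "re: hello", "re: hello"),
  stringsAsFactors = FALSE
)
# Assigning with $<- keeps the POSIXlt class that strptime() returns.
toy$Date <- strptime(
  c("2002-01-31 22:44:14", "2002-02-01 00:53:41", "2002-02-01 02:01:44"),
  "%Y-%m-%d %H:%M:%S", tz = "UTC"
)

# Underneath, a POSIXlt value is a list of time components, not a plain vector.
components <- names(unclass(toy$Date))
print(components)

# Flag every column whose underlying storage is a list.
list_backed <- vapply(toy, function(col) is.list(unclass(col)), logical(1))
print(list_backed)
```

Depending on the R version and time zone, the component list may contain extra entries such as zone and gmtoff in addition to the nine named above. A check like `list_backed` is one way to spot such columns before calling ddply.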

I fixed this same problem by converting the column from POSIXlt to POSIXct, as Hadley suggested above. It takes one line of code:

    # original conversion from datetime string:
    mydata$datetime <- strptime(mydata$datetime, "%Y-%m-%d %H:%M:%S")
    > class(mydata$datetime)
    [1] "POSIXlt" "POSIXt"
    # convert to POSIXct to use in data frames / ddply:
    mydata$datetime <- as.POSIXct(mydata$datetime)
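If a data frame has several date-time columns, the same conversion can be applied to all of them at once. This is only a sketch: `convert_posixlt` is a hypothetical helper name, and the sample values are made up for illustration.

```r
# Hypothetical helper: convert every POSIXlt column of a data frame to POSIXct.
convert_posixlt <- function(df) {
  is_lt <- vapply(df, inherits, logical(1), what = "POSIXlt")
  for (nm in names(df)[is_lt]) {
    df[[nm]] <- as.POSIXct(df[[nm]])
  }
  df
}

# Made-up sample data with one POSIXlt column.
mydata <- data.frame(id = 1:2)
mydata$datetime <- strptime(
  c("2002-01-31 22:44:14", "2002-02-01 00:53:41"),
  "%Y-%m-%d %H:%M:%S", tz = "UTC"
)

fixed <- convert_posixlt(mydata)
print(class(fixed$datetime))
```

POSIXct stores each timestamp as a single number, so the column behaves like an ordinary vector with one element per row.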

You have probably already seen this, and it did not help. I think we may not have an answer yet because people cannot reproduce your error.

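The output of dput(priority.train), or of the smaller dput(head(priority.train)), might help. In the meantime, here is an alternative that uses base R: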

    x <- data.frame(A=c("a","b","c","a"), B=c("e","d","d","d"))
    ddply(x, .(A), summarise, Freq = length(B))
      A Freq
    1 a    2
    2 b    1
    3 c    1
    tapply(x$B, x$A, length)
    a b c
    2 1 1

Does this work for you?

    x2 <- data.frame(A=c("removed@removed.ca", "removed@removed.net"),
                     B=c("please help a newbie compile mplayer :-)",
                         "re: please help a newbie compile mplayer :-)"))
    tapply(x2$B, x2$A, length)
     removed@removed.ca removed@removed.net
                      1                   1
    ddply(x2, .(A), summarise, Freq = length(B))
                        A Freq
    1  removed@removed.ca    1
    2 removed@removed.net    1

You could also try something even simpler:

    table(x2$A)
     removed@removed.ca removed@removed.net
                      1                   1
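table() returns a table object rather than a data frame. If a data frame shaped like the ddply result is wanted, one possible base R approach is as.data.frame() on the table; this is a sketch on the small sample from this answer, not on the original data.

```r
# Small sample from the answer above.
x <- data.frame(A = c("a", "b", "c", "a"),
                B = c("e", "d", "d", "d"),
                stringsAsFactors = FALSE)

# Count rows per value of A and return the counts as a data frame
# with columns A and Freq.
freq <- as.data.frame(table(A = x$A),
                      responseName = "Freq",
                      stringsAsFactors = FALSE)
print(freq)
```

Only the grouping column is passed to table(), so any other columns in the data frame, whatever their class, are never touched.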

I had a very similar problem, though I don't know whether it is the same one. I got the error below.

 Error in attributes(out) <- attributes(col) : 'names' attribute [20388] must be the same length as the vector [128] 

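I don't have any variables in list mode, so Mota's solution doesn't apply to my case. The way I sorted out the problem was to remove plyr 1.8 and manually install plyr 1.7. The error then disappeared. I also tried reinstalling plyr 1.8, and the problem came back.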

HTH。

I also ran into a similar problem with ddply, with the following code and error:

    test <- ddply(test, "catColumn", function(df) df[1:min(nrow(df), 3),])
    Error: 'names' attribute [11] must be the same length as the vector [2]

The data frame test contained quite a few categorical variables.

Converting the categorical variables to character variables, as follows, made the ddply command work:

  test <- data.frame(lapply(test, as.character), stringsAsFactors=FALSE) 
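Note that applying as.character to every column also turns numeric columns into text. If that is not wanted, a variant that converts only the factor columns might look like the sketch below; the sample data frame here is made up for illustration.

```r
# Made-up sample: one factor column and one numeric column.
test <- data.frame(catColumn = factor(c("u", "v", "u")),
                   value     = c(1.5, 2.5, 3.5))

# Convert only the factor columns to character; leave the rest as they are.
is_fac <- vapply(test, is.factor, logical(1))
test[is_fac] <- lapply(test[is_fac], as.character)

print(vapply(test, class, character(1)))
```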

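Once you realize that it is a date column causing the interference, you can also simply leave that column out when you run the command, rather than converting it…

So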

 from.weight <- ddply(priority.train, .(From.EMail), summarise, Freq = length(Subject)) 

can become

 from.weight <- ddply(priority.train[,c(1:7,9:10)], .(From.EMail), summarise, Freq = length(Subject)) 

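if, for example, the POSIXlt date happens to be in column 8 of the data frame. The odd thing about the reported error is that it may have nothing to do with the columns you are grouping by or the output you are asking for.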
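Selecting columns by position breaks as soon as the column order changes. A possible alternative is to drop the POSIXlt columns by class, as in the sketch below; the `toy` data frame is a hypothetical stand-in for priority.train.

```r
# Hypothetical stand-in for priority.train.
toy <- data.frame(
  From.EMail = c("a@example.com", "b@example.com", "a@example.com"),
  Subject    = c("hello", "re: hello", "re: hello"),
  stringsAsFactors = FALSE
)
toy$Date <- strptime(
  c("2002-01-31 22:44:14", "2002-02-01 00:53:41", "2002-02-01 02:01:44"),
  "%Y-%m-%d %H:%M:%S", tz = "UTC"
)

# Keep every column that is not POSIXlt, wherever it sits in the data frame.
keep <- !vapply(toy, inherits, logical(1), what = "POSIXlt")
slim <- toy[keep]
print(names(slim))
```

The reduced data frame `slim` could then be passed to ddply in place of the positional subset.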

I ran into the same problem when using ddply and solved it with doBy:

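    library(doBy)
    # count the number of elements in each group
    bylength = function(x){length(x)}
    # summaryBy() applies FUN to X within each From.EMail / To.EMail combination
    newdt = summaryBy(X ~ From.EMail + To.EMail, data = dt, FUN = bylength)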

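I faced the same problem too. I solved it by keeping only the data that ddply needs and by converting the filter variables and all the required text variables to character with as.character.

It worked.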