Repeat a data.frame N times

I have the following data frame:

    data.frame(a = c(1,2,3),b = c(1,2,3))
      a b
    1 1 1
    2 2 2
    3 3 3

I want to turn it into

      a b
    1 1 1
    2 2 2
    3 3 3
    4 1 1
    5 2 2
    6 3 3
    7 1 1
    8 2 2
    9 3 3

or, more generally, repeat it N times. Is there a simple function in R for this? Thanks!

You can use replicate() and then rbind the results back together. The row names are automatically changed to run from 1 to nrow of the result.

    d <- data.frame(a = c(1,2,3),b = c(1,2,3))
    n <- 3
    do.call("rbind", replicate(n, d, simplify = FALSE))

A more traditional way is to use plain indexing, but here the row names are not altered quite so neatly (though they are more informative):

    d[rep(seq_len(nrow(d)), n), ]
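To make the row-name difference between the two approaches concrete, here is a small sketch. The variable names by_rbind and by_index are only illustrative, and resetting the row names with NULL is one possible cleanup, not something the answer above requires.

```r
d <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))
n <- 3

# Approach 1: replicate the data frame and bind the copies together
by_rbind <- do.call("rbind", replicate(n, d, simplify = FALSE))

# Approach 2: repeat the row indices
by_index <- d[rep(seq_len(nrow(d)), n), ]

rownames(by_rbind)  # sequential: "1" "2" ... "9"
rownames(by_index)  # made unique with suffixes: "1" "2" "3" "1.1" "2.1" ...

# Dropping the row names restores the default sequential numbering
rownames(by_index) <- NULL
all.equal(by_rbind, by_index)
```

The suffixed names show which original row each copy came from, which is why they can be considered more informative.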

For data.frame objects, this solution is several times faster than @mdsummer's and @wojciech-sobala's.

    d[rep(seq_len(nrow(d)), n), ]

For data.table objects, @mdsummer's is a bit faster than applying the above after converting to data.frame. For large n this might flip.

[Figure: microbenchmark results]

Full code:

    library(data.table)  # needed for data.table() below

    Repeat1 <- function(d, n) {
      return(do.call("rbind", replicate(n, d, simplify = FALSE)))
    }

    Repeat2 <- function(d, n) {
      return(Reduce(rbind, list(d)[rep(1L, times = n)]))
    }

    Repeat3 <- function(d, n) {
      # data.table uses d[i] for row indexing; data.frame needs d[i, ]
      if ("data.table" %in% class(d)) return(d[rep(seq_len(nrow(d)), n)])
      return(d[rep(seq_len(nrow(d)), n), ])
    }

    Repeat3.dt.convert <- function(d, n) {
      if ("data.table" %in% class(d)) d <- as.data.frame(d)
      return(d[rep(seq_len(nrow(d)), n), ])
    }

    # Try with data.frames
    mtcars1 <- Repeat1(mtcars, 3)
    mtcars2 <- Repeat2(mtcars, 3)
    mtcars3 <- Repeat3(mtcars, 3)

    library(RUnit)
    checkEquals(mtcars1, mtcars2)
    # Only difference is row.names having ".k" suffix instead of "k" from 1 & 2,
    # so this check may report a row.names mismatch
    checkEquals(mtcars1, mtcars3)

    # Works with data.tables too
    mtcars.dt <- data.table(mtcars)
    mtcars.dt1 <- Repeat1(mtcars.dt, 3)
    mtcars.dt2 <- Repeat2(mtcars.dt, 3)
    mtcars.dt3 <- Repeat3(mtcars.dt, 3)

    # No row.names mismatch since data.tables don't have row.names
    checkEquals(mtcars.dt1, mtcars.dt2)
    checkEquals(mtcars.dt1, mtcars.dt3)

    # Time test
    library(microbenchmark)
    res <- microbenchmark(Repeat1(mtcars, 10),
                          Repeat2(mtcars, 10),
                          Repeat3(mtcars, 10),
                          Repeat1(mtcars.dt, 10),
                          Repeat2(mtcars.dt, 10),
                          Repeat3(mtcars.dt, 10),
                          Repeat3.dt.convert(mtcars.dt, 10))
    print(res)

    library(ggplot2)
    ggsave("~/gdrive/repeat_microbenchmark.png", autoplot(res))


The dplyr package contains the function bind_rows(), which directly combines all data frames in a list, so there is no need to use do.call() together with rbind():

    df <- data.frame(a = c(1, 2, 3), b = c(1, 2, 3))
    library(dplyr)
    bind_rows(replicate(3, df, simplify = FALSE))

For a large number of repetitions, bind_rows() is also much faster than rbind():

    library(microbenchmark)
    microbenchmark(rbind = do.call("rbind", replicate(1000, df, simplify = FALSE)),
                   bind_rows = bind_rows(replicate(1000, df, simplify = FALSE)),
                   times = 20)
    ## Unit: milliseconds
    ##       expr       min        lq      mean   median        uq       max neval cld
    ##      rbind 31.796100 33.017077 35.436753 34.32861 36.773017 43.556112    20   b
    ##  bind_rows  1.765956  1.818087  1.881697  1.86207  1.898839  2.321621    20  a

Another option uses Reduce() with rbind():

    d <- data.frame(a = c(1,2,3),b = c(1,2,3))
    r <- Reduce(rbind, list(d)[rep(1L, times=3L)])

Just use simple indexing with the rep() function.

    mydata <- data.frame(a = c(1,2,3), b = c(1,2,3))  # create your data frame
    n <- 10                                           # number of times to repeat the rows
    mydata <- mydata[rep(rownames(mydata), n), ]      # use rep() while indexing
    rownames(mydata) <- 1:NROW(mydata)                # renumber rows for a cleaner look
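The indexing idea can be wrapped in a small helper. This is only a sketch: the name repeat_df is hypothetical, and it assumes a plain data.frame (not a data.table). It adds drop = FALSE because, without it, indexing a single-column data frame with d[i, ] returns a vector rather than a data frame.

```r
# Hypothetical helper: repeat all rows of a data.frame n times
repeat_df <- function(d, n) {
  stopifnot(is.data.frame(d), length(n) == 1, n >= 0)
  # drop = FALSE keeps the result a data frame even with one column
  out <- d[rep(seq_len(nrow(d)), times = n), , drop = FALSE]
  rownames(out) <- NULL  # reset to default sequential row names
  out
}

one_col <- data.frame(a = c(1, 2, 3))
res <- repeat_df(one_col, 2)
res
```

With n = 0 the helper returns a zero-row data frame with the same columns, which can be convenient when n is computed.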