Nested ifelse statement in R

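I'm new here and a beginner in R. I'm using the latest R 3.0.1 on Windows 7.

I'm still learning how to translate SAS code into R, and I get warnings. I need to understand where I'm making mistakes. What I want to do is create a variable that summarizes and distinguishes three statuses of a population: mainland, overseas, foreigner. I have a database with 2 variables: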

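  • id nationality: idnat (french, foreigner),

If idnat is french, then:

  • id birthplace: idbp (mainland, colony, overseas)

I want to summarize the information from idnat and idbp into a new variable called idnat2:

  • status: k (mainland, overseas, foreigner)

All these variables are of type "character".

Expected results in column idnat2: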

        idnat     idbp   idnat2
    1  french mainland mainland
    2  french   colony overseas
    3  french overseas overseas
    4 foreign  foreign  foreign

Here is the SAS code I want to translate into R:

    if idnat = "french" then do;
       if idbp in ("overseas","colony") then idnat2 = "overseas";
       else idnat2 = "mainland";
    end;
    else idnat2 = "foreigner";
    run;

Here is my attempt in R:

    if(idnat=="french"){
        idnat2 <- "mainland"
    } else if(idbp=="overseas"|idbp=="colony"){
        idnat2 <- "overseas"
    } else {
        idnat2 <- "foreigner"
    }

I receive this warning:

    Warning message:
    In if (idnat=="french") { :
      the condition has length > 1 and only the first element will be used

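I was advised to use a "nested ifelse" instead because it is simpler, but I get more warnings:

    idnat2 <- ifelse (idnat=="french", "mainland",
            ifelse (idbp=="overseas"|idbp=="colony", "overseas")
         )
    else (idnat2 <- "foreigner")

According to the warning message, the length is greater than 1, so only what is between the first brackets is taken into account. Sorry, but I don't understand what this length has to do with anything here. Does anybody know where I'm going wrong?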

If you are using any spreadsheet application, there is a basic function if() with the syntax:

    if(<condition>, <yes>, <no>)

The syntax is exactly the same for ifelse() in R:

    ifelse(<condition>, <yes>, <no>)

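The only difference from if() in a spreadsheet application is that R's ifelse() is vectorized (it takes vectors as input and returns a vector as output). Compare the following spreadsheet formulas with the R equivalent, for an example where we want to test whether a > b and return 1 if so and 0 if not.

In a spreadsheet:

       A  B  C
    1  3  1  =if(A1 > B1, 1, 0)
    2  2  2  =if(A2 > B2, 1, 0)
    3  1  3  =if(A3 > B3, 1, 0)

In R:

    > a <- 3:1; b <- 1:3
    > ifelse(a > b, 1, 0)
    [1] 1 0 0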

ifelse() can be nested in many ways:

    ifelse(<condition>, <yes>, ifelse(<condition>, <yes>, <no>))

    ifelse(<condition>, ifelse(<condition>, <yes>, <no>), <no>)

    ifelse(<condition>,
           ifelse(<condition>, <yes>, <no>),
           ifelse(<condition>, <yes>, <no>)
          )

    ifelse(<condition>, <yes>,
           ifelse(<condition>, <yes>,
                  ifelse(<condition>, <yes>, <no>)
                 )
          )

To calculate the column idnat2 you can do:

    df <- read.table(header=TRUE, text="
    idnat idbp idnat2
    french mainland mainland
    french colony overseas
    french overseas overseas
    foreign foreign foreign"
    )

    with(df,
         ifelse(idnat=="french",
                ifelse(idbp %in% c("overseas","colony"),"overseas","mainland"),
                "foreign")
        )

See the R documentation for ifelse.

What does "the condition has length > 1 and only the first element will be used" mean? Let's see:

    > # What is the first condition really testing?
    > with(df, idnat=="french")
    [1]  TRUE  TRUE  TRUE FALSE
    > # This is the result of a vectorized function: every element of idnat
    > # is compared with the string "french".
    > # A vector of logical values is returned (same length as idnat)
    > df$idnat2 <- with(df,
    +   if(idnat=="french"){
    +   idnat2 <- "xxx"
    +   }
    +   )
    Warning message:
    In if (idnat == "french") { :
      the condition has length > 1 and only the first element will be used
    > # Note that the first element of the comparison is TRUE, and that's why we get:
    > df
        idnat     idbp idnat2
    1  french mainland    xxx
    2  french   colony    xxx
    3  french overseas    xxx
    4 foreign  foreign    xxx
    > # There really is logic in it, you just have to get used to it
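To make the point above concrete, here is a minimal sketch. The vector `nat` and the other variable names are made up for illustration and are not part of the question's data. if() wants exactly one TRUE or FALSE, so a whole vector has to be reduced first (for example with any() or all()), tested one element at a time, or handed to ifelse(). As far as I know, from R 4.2.0 onward a condition of length greater than one is an error rather than a warning, so the sketch never passes a vector to if().

```r
# Hypothetical example vector (not the question's data)
nat <- c("french", "french", "foreign")

# The comparison is vectorized: it yields one logical value per element
is_french <- nat == "french"
n_conditions <- length(is_french)   # more than one, which is what if() complains about

# if() needs a single logical value, so reduce the vector first ...
if (any(is_french)) message("at least one person is french")
if (!all(is_french)) message("not everyone is french")

# ... or test a single element
first_label <- if (is_french[1]) "french" else "not french"

# ifelse() handles the whole vector in one call
all_labels <- ifelse(is_french, "french", "not french")
print(all_labels)
```

Which of the three options fits depends on the question being asked: any()/all() answer a question about the whole vector, while ifelse() produces one answer per row, which is what the recoding in this thread needs.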

Can I still use if()? Yes, you can, but the syntax is not so cool :)

    test <- function(x) {
      if(x=="french") {
        "french"
      } else{
        "not really french"
      }
    }

    apply(array(df[["idnat"]]),MARGIN=1, FUN=test)

If you are familiar with SQL, you can also use a CASE statement in the sqldf package.

Try something like the following:

    # some sample data
    idnat <- sample(c("french","foreigner"),100,TRUE)
    idbp <- rep(NA,100)
    idbp[idnat=="french"] <- sample(c("mainland","overseas","colony"),sum(idnat=="french"),TRUE)

    # recoding
    out <- ifelse(idnat=="french" & !idbp %in% c("overseas","colony"), "mainland",
                  ifelse(idbp %in% c("overseas","colony"),"overseas",
                         "foreigner"))
    cbind(idnat,idbp,out) # check result
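One detail in the recoding above is worth spelling out: the sample data leaves idbp as NA for foreigners, and the code tests it with %in% rather than ==. The sketch below, on a hypothetical two-row example of my own, shows why that matters: a comparison with NA yields NA, and ifelse() passes that NA through to the result, whereas %in% returns FALSE for a missing value.

```r
# Hypothetical two-row example: one french person, one foreigner without a birthplace
idnat <- c("french", "foreigner")
idbp  <- c("colony", NA)

eq_test <- idbp == "colony"      # TRUE NA    (comparing with NA gives NA)
in_test <- idbp %in% "colony"    # TRUE FALSE (%in% never returns NA)

# Recoding with == : the foreigner ends up as NA
with_eq <- ifelse(idnat == "french" & idbp != "colony", "mainland",
                  ifelse(idbp == "colony", "overseas", "foreigner"))

# Recoding with %in% : the foreigner is labelled correctly
with_in <- ifelse(idnat == "french" & !idbp %in% c("overseas", "colony"), "mainland",
                  ifelse(idbp %in% c("overseas", "colony"), "overseas", "foreigner"))

print(cbind(idnat, idbp, with_eq, with_in))
```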

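Your confusion comes from how SAS and R handle if-else constructions. In R, if and else are not vectorized, meaning they check whether a single condition is true (i.e., if("french"=="french") works) and cannot handle multiple logical values (i.e., if(c("french","foreigner")=="french") doesn't work), so R gives you the warning you're receiving.

By contrast, ifelse is vectorized, so it can take your vectors (aka input variables) and test the logical condition on each of their elements, like you're used to in SAS. An alternative would be to build a loop using if and else statements (as you've started to do here), but the vectorized ifelse approach will be more efficient and generally involves less code.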
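For comparison, the loop alternative mentioned above could look roughly like this. It is only a sketch on the four example rows from the question (typed in by hand here), mirroring the SAS logic with scalar if/else inside a for loop, followed by the vectorized ifelse() version of the same recoding.

```r
# The four example rows from the question, entered by hand
idnat <- c("french", "french", "french", "foreign")
idbp  <- c("mainland", "colony", "overseas", "foreign")

# Loop version: if/else only ever sees one element at a time
idnat2 <- character(length(idnat))
for (i in seq_along(idnat)) {
  if (idnat[i] == "french") {
    if (idbp[i] %in% c("overseas", "colony")) {
      idnat2[i] <- "overseas"
    } else {
      idnat2[i] <- "mainland"
    }
  } else {
    idnat2[i] <- "foreigner"
  }
}

# Vectorized version: the same logic in a single expression
vectorized <- ifelse(idnat == "french",
                     ifelse(idbp %in% c("overseas", "colony"), "overseas", "mainland"),
                     "foreigner")

# Both approaches should agree
print(identical(idnat2, vectorized))
```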

You can create the vector idnat2 without if and ifelse.

The function replace can be used to replace all occurrences of "colony" with "overseas":

    idnat2 <- replace(idbp, idbp == "colony", "overseas")
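Note that the one-liner above relies on idbp already holding the right label for foreigners. If idbp is missing for them instead (as in the sample data of another answer), one more replace() step is needed. The following is a sketch under that assumption, with hand-typed example vectors; it uses %in% so that the NA does not end up in the logical index.

```r
# Assumed example data: idbp is NA for the foreigner
idnat <- c("french", "french", "french", "foreign")
idbp  <- c("mainland", "colony", "overseas", NA)

# Step 1: fold "colony" into "overseas" (%in% is FALSE for NA)
idnat2 <- replace(idbp, idbp %in% "colony", "overseas")

# Step 2: label everyone who is not french
idnat2 <- replace(idnat2, idnat != "french", "foreign")

print(idnat2)
```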

If the data set contains many rows, it might be more efficient to join with a lookup table using data.table instead of nested ifelse().

Given the lookup table below

    lookup

         idnat     idbp   idnat2
    1:  french mainland mainland
    2:  french   colony overseas
    3:  french overseas overseas
    4: foreign  foreign  foreign

and a sample data set

    library(data.table)
    n_row <- 10L
    set.seed(1L)
    DT <- data.table(idnat = "french",
                     idbp = sample(c("mainland", "colony", "overseas", "foreign"), n_row, replace = TRUE))
    DT[idbp == "foreign", idnat := "foreign"][]

          idnat     idbp
     1:  french   colony
     2:  french   colony
     3:  french overseas
     4: foreign  foreign
     5:  french mainland
     6: foreign  foreign
     7: foreign  foreign
     8:  french overseas
     9:  french overseas
    10:  french mainland

then we can do an update while joining:

    DT[lookup, on = .(idnat, idbp), idnat2 := i.idnat2][]

          idnat     idbp   idnat2
     1:  french   colony overseas
     2:  french   colony overseas
     3:  french overseas overseas
     4: foreign  foreign  foreign
     5:  french mainland mainland
     6: foreign  foreign  foreign
     7: foreign  foreign  foreign
     8:  french overseas overseas
     9:  french overseas overseas
    10:  french mainland mainland
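The answer prints the lookup table but does not show how it was built. As a rough illustration of the same lookup idea without data.table, here is a base R sketch. This is not the answer's method: the data frame `dat`, the key variables, and the "|" separator are my own choices, and it makes no claim about speed on large data.

```r
# Lookup table with the same content as the one printed above
lookup <- data.frame(
  idnat  = c("french", "french", "french", "foreign"),
  idbp   = c("mainland", "colony", "overseas", "foreign"),
  idnat2 = c("mainland", "overseas", "overseas", "foreign"),
  stringsAsFactors = FALSE
)

# Hypothetical data to recode
dat <- data.frame(
  idnat = c("french", "foreign", "french", "french"),
  idbp  = c("colony", "foreign", "mainland", "overseas"),
  stringsAsFactors = FALSE
)

# Build a combined key per row and find its position in the lookup table
key_dat    <- paste(dat$idnat, dat$idbp, sep = "|")
key_lookup <- paste(lookup$idnat, lookup$idbp, sep = "|")
dat$idnat2 <- lookup$idnat2[match(key_dat, key_lookup)]

print(dat)
```

Combinations that do not appear in the lookup table come back as NA, which makes unexpected values easy to spot.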

Using the SQL CASE statement with the sqldf package, or case_when with dplyr:

Data

    df <- structure(list(idnat = structure(c(2L, 2L, 2L, 1L), .Label = c("foreign",
    "french"), class = "factor"), idbp = structure(c(3L, 1L, 4L, 2L), .Label = c("colony",
    "foreign", "mainland", "overseas"), class = "factor")), .Names = c("idnat",
    "idbp"), class = "data.frame", row.names = c(NA, -4L))

sqldf

    library(sqldf)
    sqldf("SELECT idnat, idbp,
            CASE
              WHEN idbp IN ('colony', 'overseas') THEN 'overseas'
              ELSE idbp
            END AS idnat2
           FROM df")

dplyr

    library(dplyr)
    df %>%
      mutate(idnat2 = case_when(idbp == 'mainland' ~ "mainland",
                                idbp %in% c("colony", "overseas") ~ "overseas",
                                TRUE ~ "foreign"))

Output

        idnat     idbp   idnat2
    1  french mainland mainland
    2  french   colony overseas
    3  french overseas overseas
    4 foreign  foreign  foreign

With data.table, the solution is:

    DT[, idnat2 := ifelse(idbp %in% "foreign", "foreign",
            ifelse(idbp %in% c("colony", "overseas"), "overseas", "mainland" ))]

ifelse is vectorized; if-else is not. Here, DT is:

        idnat     idbp
    1  french mainland
    2  french   colony
    3  french overseas
    4 foreign  foreign

This gives:

         idnat     idbp   idnat2
    1:  french mainland mainland
    2:  french   colony overseas
    3:  french overseas overseas
    4: foreign  foreign  foreign