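Is there a built-in function for finding the mode?

In R, mean() and median() are standard functions that do what you would expect. mode() tells you the internal storage mode of the object, not the value that occurs most often in its argument. But is there a standard library function that implements the statistical mode for a vector (or list)?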
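To make the distinction in the question concrete, here is a short check of what base R reports. The vector v is only an illustrative example, not data from the question.

```r
v <- c(2, 3, 3, 5)

mean(v)    # 3.25
median(v)  # 3
mode(v)    # "numeric": the storage mode, not the most frequent value

mode(c("a", "b", "b"))  # "character"
```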

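One more solution, which works for both numeric and character/factor data:

 Mode <- function(x) {
   ux <- unique(x)
   ux[which.max(tabulate(match(x, ux)))]
 }

On my dinky little machine, that can generate and find the mode of a 10M-integer vector in about half a second.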
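A brief usage sketch for this function, repeated here so the block runs on its own. The inputs are illustrative. Note that which.max() returns the first maximum, so when several values tie, the one that appears first in x is returned.

```r
Mode <- function(x) {
  ux <- unique(x)                          # distinct values, in order of appearance
  ux[which.max(tabulate(match(x, ux)))]    # value whose count is highest
}

Mode(c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19))  # 19
Mode(c("b", "a", "b", "a", "c"))              # "b" (tie with "a": first seen wins)
Mode(factor(c("lo", "hi", "hi")))             # hi, still a factor
```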

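There is the package modeest, which provides estimators of the mode of univariate unimodal (and sometimes multimodal) data, and values of the modes of the usual probability distributions.

 mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)
 library(modeest)
 mlv(mySamples, method = "mfv")

 Mode (most likely value): 19
 Bickel's modal skewness: -0.1
 Call: mlv.default(x = mySamples, method = "mfv")

For more information, see this page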

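Found this on the R mailing list; hope it's helpful. It is also what I was thinking anyway. You'll want to table() the data, sort it, and then pick the first name. It's hackish but should work.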

 names(sort(-table(x)))[1]
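One thing worth knowing about this one-liner: names() always returns character, so numeric input has to be converted back. A minimal sketch, where table_mode is a hypothetical helper name and x is illustrative data:

```r
# table_mode is a hypothetical wrapper around the one-liner above
table_mode <- function(x) {
  counts <- sort(table(x), decreasing = TRUE)  # most frequent value first
  names(counts)[1]                             # always a character string
}

x <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)
table_mode(x)              # "19"
as.numeric(table_mode(x))  # 19
```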

I found Ken Williams' post above to be great. I added a few lines to account for NA values and made it a function for ease.

 Mode <- function(x, na.rm = FALSE) {
   if (na.rm) {
     x <- x[!is.na(x)]
   }
   ux <- unique(x)
   return(ux[which.max(tabulate(match(x, ux)))])
 }
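To illustrate why the na.rm switch matters, here is a small sketch with the function repeated so it runs standalone. The vector y is made up for the example: NA is its most frequent entry, so it wins unless it is removed first.

```r
Mode <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]             # drop missing values on request
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

y <- c(1, NA, NA, 2, 2, NA)
Mode(y)                # NA, because NA occurs three times
Mode(y, na.rm = TRUE)  # 2
```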

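A quick and dirty way of estimating the mode of a vector of numbers you believe come from a continuous univariate distribution (e.g. a normal distribution) is to define and use the following function:

 estimate_mode <- function(x) {
   d <- density(x)
   d$x[which.max(d$y)]
 }

Then, to get the mode estimate:

 x <- c(5.8, 5.6, 6.2, 4.1, 4.9, 2.4, 3.9, 1.8, 5.7, 3.2)
 estimate_mode(x)
 ## 5.439788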

The following function comes in three forms:

method = "mode" [default]: calculates the mode for a unimodal vector, else returns NA
method = "nmodes": calculates the number of modes in the vector
method = "modes": lists all the modes for a unimodal or polymodal vector

 modeav <- function(x, method = "mode", na.rm = FALSE) {
   x <- unlist(x)
   if (na.rm) x <- x[!is.na(x)]
   u <- unique(x)
   n <- length(u)

   # get frequencies of each of the unique values in the vector
   frequencies <- rep(0, n)
   for (i in seq_len(n)) {
     if (is.na(u[i])) {
       frequencies[i] <- sum(is.na(x))
     } else {
       frequencies[i] <- sum(x == u[i], na.rm = TRUE)
     }
   }

   # mode if a unimodal vector, else NA
   if (method == "mode" | is.na(method) | method == "") {
     return(ifelse(length(frequencies[frequencies == max(frequencies)]) > 1,
                   NA, u[which.max(frequencies)]))
   }

   # number of modes
   if (method == "nmode" | method == "nmodes") {
     return(length(frequencies[frequencies == max(frequencies)]))
   }

   # list of all modes
   if (method == "modes" | method == "modevalues") {
     return(u[which(frequencies == max(frequencies), arr.ind = FALSE, useNames = FALSE)])
   }

   # error trap the method
   warning("Warning: method not recognised. Valid methods are 'mode' [default], 'nmodes' and 'modes'")
   return()
 }

Here is another solution:

 freq <- tapply(mySamples, mySamples, length)
 # or: freq <- table(mySamples)
 as.numeric(names(freq)[which.max(freq)])

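I can't vote yet, but Rasmus Bååth's answer is what I was looking for. However, I would modify it a bit to allow constraining the distribution, for example to values only between 0 and 1.

 estimate_mode <- function(x, from = min(x), to = max(x)) {
   d <- density(x, from = from, to = to)
   d$x[which.max(d$y)]
 }

We are aware that you may not want to constrain your distribution at all; in that case set from = -"BIG NUMBER" and to = "BIG NUMBER".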
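As a sketch of how the bounded version might be used, the block below repeats the function and applies it to simulated data on (0, 1). The seed, the sample size and the beta distribution are assumptions chosen only for illustration.

```r
estimate_mode <- function(x, from = min(x), to = max(x)) {
  d <- density(x, from = from, to = to)  # evaluate the estimate on [from, to] only
  d$x[which.max(d$y)]
}

set.seed(1)                # arbitrary seed, for reproducibility only
p <- rbeta(500, 2, 5)      # illustrative data confined to (0, 1)

m <- estimate_mode(p, from = 0, to = 1)
m >= 0 && m <= 1           # TRUE: the evaluation grid never leaves [from, to]
```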

I wrote the following code in order to generate the mode.

 MODE <- function(dataframe) {
   DF <- as.data.frame(dataframe)

   MODE2 <- function(x) {
     if (is.numeric(x) == FALSE) {
       df <- as.data.frame(table(x))
       df <- df[order(df$Freq), ]
       m <- max(df$Freq)
       MODE1 <- as.vector(as.character(subset(df, Freq == m)[, 1]))

       if (sum(df$Freq) / length(df$Freq) == 1) {
         warning("No Mode: Frequency of all values is 1", call. = FALSE)
       } else {
         return(MODE1)
       }
     } else {
       df <- as.data.frame(table(x))
       df <- df[order(df$Freq), ]
       m <- max(df$Freq)
       MODE1 <- as.vector(as.numeric(as.character(subset(df, Freq == m)[, 1])))

       if (sum(df$Freq) / length(df$Freq) == 1) {
         warning("No Mode: Frequency of all values is 1", call. = FALSE)
       } else {
         return(MODE1)
       }
     }
   }

   return(as.vector(lapply(DF, MODE2)))
 }

Let's try it:

 MODE(mtcars)
 MODE(CO2)
 MODE(ToothGrowth)
 MODE(InsectSprays)

Based on @Chris's function to calculate the mode or related metrics, but using Ken Williams's method to calculate frequencies. This one provides a fix for the case of no modes at all (all elements equally frequent), and some more readable method names.

 Mode <- function(x, method = "one", na.rm = FALSE) {
   x <- unlist(x)
   if (na.rm) {
     x <- x[!is.na(x)]
   }

   # Get unique values
   ux <- unique(x)
   n <- length(ux)

   # Get frequencies of all unique values
   frequencies <- tabulate(match(x, ux))
   modes <- frequencies == max(frequencies)

   # Determine number of modes
   nmodes <- sum(modes)
   nmodes <- ifelse(nmodes == n, 0L, nmodes)

   if (method %in% c("one", "mode", "") | is.na(method)) {
     # Return NA if not exactly one mode, else return the mode
     if (nmodes != 1) {
       return(NA)
     } else {
       return(ux[which(modes)])
     }
   } else if (method %in% c("n", "nmodes")) {
     # Return the number of modes
     return(nmodes)
   } else if (method %in% c("all", "modes")) {
     # Return NA if no modes exist, else return all modes
     if (nmodes > 0) {
       return(ux[which(modes)])
     } else {
       return(NA)
     }
   }
   warning("Warning: method not recognised. Valid methods are 'one'/'mode' [default], 'n'/'nmodes' and 'all'/'modes'")
 }

Since it uses Ken's method to calculate frequencies, the performance is also optimised. Using AkselA's post, I benchmarked some of the previous answers to show how close my function is to Ken's in performance, with the conditionals for the various output options causing only minor overhead: Comparison of Mode functions

This hack should work fine. It gives you the value as well as the count of the mode:

 Mode <- function(x) {
   a <- table(x)  # x is a vector
   return(a[which.max(a)])
 }

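R has so many add-on packages that some of them may well provide the [statistical] mode of a numeric list/series/vector.

However, the standard library of R itself doesn't seem to have such a built-in method! One way to work around this is to use a construct like the following (and to turn it into a function if you use it often):

 mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)
 tabSmpl <- tabulate(mySamples)
 SmplMode <- which(tabSmpl == max(tabSmpl))
 if (sum(tabSmpl == max(tabSmpl)) > 1) SmplMode <- NA
 > SmplMode
 [1] 19

For bigger sample lists, one should consider using a temporary variable for the max(tabSmpl) value (I don't know whether R would optimize this automatically).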
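The two suggestions above (wrap the construct in a function, and compute the maximum once) can be sketched as follows. The name tabulate_mode is hypothetical. The input check reflects the fact that tabulate() bins positive integers, so this approach is not suitable for negative, zero or fractional values.

```r
# tabulate_mode is a hypothetical name for the construct above
tabulate_mode <- function(x) {
  stopifnot(all(x >= 1), all(x == round(x)))  # tabulate() only bins positive integers
  counts <- tabulate(x)
  top <- max(counts)              # computed once and reused
  modes <- which(counts == top)   # the bin index is the value itself
  if (length(modes) > 1) NA_integer_ else modes
}

mySamples <- c(19, 4, 5, 7, 29, 19, 29, 13, 25, 19)
tabulate_mode(mySamples)      # 19
tabulate_mode(c(1, 1, 2, 2))  # NA, because two values tie
```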

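Reference: see "How about median and mode?" in this KickStarting R lesson.
This seems to confirm that (at least as of the writing of this lesson) there isn't a mode function in R (well, there is mode(), but as you found out it is used for asserting the type of variables).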

Here is a function to find the mode:

 mode <- function(x) {
   unique_val <- unique(x)
   counts <- vector()
   for (i in seq_along(unique_val)) {
     counts[i] <- length(which(x == unique_val[i]))
   }
   position <- c(which(counts == max(counts)))
   if (mean(counts) == max(counts))
     mode_x <- 'Mode does not exist'
   else
     mode_x <- unique_val[position]
   return(mode_x)
 }

This works pretty well:

 > a <- c(1,1,2,2,3,3,4,4,5)
 > names(table(a))[table(a)==max(table(a))]

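While I like Ken Williams' simple function, I would like to retrieve multiple modes if they exist. With that in mind, I use the following function, which returns all the modes if there are several, or the single one otherwise.

 rmode <- function(x) {
   x <- sort(x)
   u <- unique(x)
   y <- lapply(u, function(y) length(x[x == y]))
   u[which(unlist(y) == max(unlist(y)))]
 }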
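A quick check of the multi-mode behaviour, with the function repeated so the block is self-contained. The inputs are illustrative.

```r
rmode <- function(x) {
  x <- sort(x)
  u <- unique(x)
  y <- lapply(u, function(y) length(x[x == y]))  # count of each distinct value
  u[which(unlist(y) == max(unlist(y)))]          # every value that reaches the top count
}

rmode(c(1, 1, 2, 2, 3, 3, 4, 4, 5))  # 1 2 3 4
rmode(c("b", "a", "b"))              # "b"
```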

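I was looking through all these options and started to wonder about their relative features and performance, so I did some tests. In case anyone else is curious about the same, I'm sharing my results here.

Not wanting to bother with all the functions posted here, I chose to focus on a sample based on a few criteria: the function should work on character, factor, logical and numeric vectors; it should deal with NAs and other problematic values appropriately; and the output should be "sensible", i.e. no numerics returned as character or other such silliness.

I also added a function of my own, which is based on the same idea as chrispy's, except adapted for more general use:

 library(magrittr)

 Aksel <- function(x, freq = FALSE) {
   z <- 2
   if (freq) z <- 1:2
   run <- x %>% as.vector %>% sort %>% rle %>% unclass %>% data.frame
   colnames(run) <- c("freq", "value")
   run[which(run$freq == max(run$freq)), z] %>% as.vector
 }

 set.seed(2)

 F <- sample(c("yes", "no", "maybe", NA), 10, replace = TRUE) %>% factor
 Aksel(F)
 # [1] maybe yes

 C <- sample(c("Steve", "Jane", "Jonas", "Petra"), 20, replace = TRUE)
 Aksel(C, freq = TRUE)
 # freq value
 #    7 Steve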

I ended up running five functions, on two sets of test data, through microbenchmark. The function names refer to their respective authors:

[Figure: microbenchmark results for the five functions]

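Chris' function was set to method="modes" and na.rm=TRUE by default to make it more comparable, but other than that the functions were used as presented here by their authors.

In terms of speed alone Ken's version wins handily, but it is also the only one of these that reports just one mode, no matter how many there really are. As is often the case, there is a trade-off between speed and versatility. With method="mode", Chris' version returns a value if and only if there is exactly one mode, else NA. I think that's a nice touch. I also find it interesting how some of the functions are affected by an increased number of unique values, while others are affected far less. I haven't studied the code in detail to figure out why, apart from ruling out logical/numeric as the cause.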

Another simple option, which gives all values ordered by frequency, is to use rle:

 df = as.data.frame(unclass(rle(sort(mySamples))))
 df = df[order(-df$lengths), ]
 head(df)

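I would use the density() function to identify a smoothed maximum of a (possibly continuous) distribution:

 function(x) density(x, 2)$x[density(x, 2)$y == max(density(x, 2)$y)]

where x is the data collection. Pay attention to the adjust parameter of the density function, which regulates the smoothing (in the call above, the 2 is passed positionally as the bandwidth, bw).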
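The one-liner above evaluates density() three times. A minimal sketch that computes it once and exposes the smoothing through adjust might look like this. The name smooth_mode, the seed and the simulated data are assumptions for illustration.

```r
# smooth_mode is a hypothetical name; adjust scales the default bandwidth
smooth_mode <- function(x, adjust = 1) {
  d <- density(x, adjust = adjust)  # evaluate the kernel density estimate once
  d$x[which.max(d$y)]               # location of its highest point
}

set.seed(42)                  # arbitrary seed, for reproducibility only
x <- rnorm(1000, mean = 5)    # illustrative sample whose true mode is 5

smooth_mode(x)                # expected to land near 5
smooth_mode(x, adjust = 2)    # heavier smoothing
```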

Another possible solution:

 Mode <- function(x) {
   if (is.numeric(x)) {
     x_table <- table(x)
     return(as.numeric(names(x_table)[which.max(x_table)]))
   }
 }

Usage:

 set.seed(100)
 v <- sample(x = 1:100, size = 1000000, replace = TRUE)
 system.time(Mode(v))

Output:

    user  system elapsed
    0.32    0.00    0.31

A small modification to Ken Williams' answer, adding the optional parameters na.rm and return_multiple.

Unlike the answers relying on names(), this answer maintains the data type of x in the returned value(s).

 stat_mode <- function(x, return_multiple = TRUE, na.rm = FALSE) {
   if (na.rm) {
     x <- na.omit(x)
   }
   ux <- unique(x)
   freq <- tabulate(match(x, ux))
   mode_loc <- if (return_multiple) which(freq == max(freq)) else which.max(freq)
   return(ux[mode_loc])
 }

To show that it works with the optional parameters and maintains the data type:

 foo <- c(2L, 2L, 3L, 4L, 4L, 5L, NA, NA)
 bar <- c('mouse','mouse','dog','cat','cat','bird',NA,NA)

 str(stat_mode(foo))  # int [1:3] 2 4 NA
 str(stat_mode(bar))  # chr [1:3] "mouse" "cat" NA
 str(stat_mode(bar, na.rm=T))  # chr [1:2] "mouse" "cat"
 str(stat_mode(bar, return_mult=F, na.rm=T))  # chr "mouse"

Thanks to @Frank for the simplification.

Sorry, I might be taking this too simply, but doesn't this do the job? (1.3 seconds for 1E6 values on my machine):

 t0 <- Sys.time()
 summary(as.factor(round(rnorm(1e6), 2)))[1]
 Sys.time() - t0

You just have to replace "round(rnorm(1e6), 2)" with your vector.

You could also count the number of times each value occurs in your set and find the maximum count. For example:

 > temp <- table(as.vector(x))
 > names(temp)[temp == max(temp)]
 [1] "1"
 > as.data.frame(table(x))
   r5050 Freq
 1     0   13
 2     1   15
 3     2    6

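You can try the following function:

Convert the numeric values into a factor
Use summary() to get the frequency table
Return as the mode the entry whose frequency is largest
The factor levels can be converted back to numeric even if there is more than one mode, so this function works well!

 mode <- function(x) {
   y <- as.factor(x)
   freq <- summary(y)
   mode <- names(freq)[freq[names(freq)] == max(freq)]
   as.numeric(mode)
 }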

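The mode is mostly calculated for factor variables, in which case we can use

 labels(table(HouseVotes84$V1)[as.numeric(labels(max(table(HouseVotes84$V1))))])

HouseVotes84 is a dataset available in the "mlbench" package.

It will give the label with the maximum value. It is easier to use the built-in functions themselves than to write a function.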
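For readers without mlbench installed, the same idea can be sketched with built-in functions and a dataset that ships with R. Using mtcars$cyl as the factor is an assumption made for illustration; it is not part of the answer above.

```r
cyl <- factor(mtcars$cyl)          # mtcars is part of the datasets package
counts <- table(cyl)               # frequency of each level

names(counts)[which.max(counts)]   # "8", the most frequent level
```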

Below is code that can be used to find the mode of a vector variable in R.

 a <- table([vector])
 names(a[a==max(a)])

An easy way of calculating the MODE of a vector "v" containing discrete values is:

 names(sort(table(v)))[length(sort(table(v)))]
