Finding the median of a billion numbers

If you have one billion numbers and one hundred computers, what is the best way to find the median of these numbers?

One solution I have is:

  • Split the set evenly among the computers.
  • Sort them.
  • Find the medians for each set.
  • Sort the medians.
  • Merge two sets at a time from the lowest to the highest median.

If we have m1 < m2 < m3 ... then first merge Set1 and Set2, and in the resulting set we can discard all the numbers lower than the median of Set12 (the merged set). So at any point of time we have equal sized sets. By the way this cannot be done in a parallel manner. Any ideas?

 sort -g numbers | head -n 500000001 | tail -n 2 | dc -e "1 k ? ? + 2 / p" 

Ah, my brain has just kicked into gear, and I have a sensible suggestion now. Probably too late if this had been an interview, but never mind:

Machine 1 shall be called the "control machine", and for the sake of argument either it starts with all the data and sends it in equal parcels to the other 99 machines, or else the data starts evenly distributed between the machines, and it sends 1/99 of its data to each of the others. The partitions do not have to be equal, just close.

Each of the other machines sorts its data, and does so in a way which favours finding the lower values first. So for example a quicksort, always sorting the lower part of the partition first[*]. It writes its data back to the control machine in increasing order as soon as it can (using asynchronous IO so as to continue sorting, and probably with Nagle on: experiment a bit).

The control machine performs a 99-way merge on the data as it arrives, but discards the merged data, keeping just a count of the number of values it has seen. It calculates the median as the mean of the 500,000,000th and 500,000,001st values.

This suffers from the "slowest in the herd" problem. It can't complete until every value less than the median has been sent by a sorting machine. There's a reasonable chance that one such value will be quite high within its parcel of data. So once the initial partitioning of the data is done, the estimated running time is the combination of the time to sort 1/99th of the data and send it back to the control computer, plus the time for the control machine to read half of the data. The "combination" is somewhere between the maximum and the sum of those times, probably close to the maximum.

My instinct is that for sending data over a network to be faster than sorting it (let alone just selecting the median), it needs to be a pretty fast network. Might be a better prospect if the network can be assumed to be instantaneous, for example if you have 100 cores with equal access to RAM containing the data.

Since network I/O is likely to be the bound, there might be some tricks you can play, at least for the data coming back to the control machine. For example, instead of sending "1,2,3,... 100", perhaps a sorting machine could send a message meaning "100 values less than 101". The control machine could then perform a modified merge, in which it finds the least of all those top-of-a-range values, then tells all the sorting machines what it was, so that they can (a) tell the control machine how many values to "count" below that value, and (b) resume sending their sorted data from that point.

More generally, there's probably a clever challenge-response guessing game that the control machine can play with the 99 sorting machines.

This involves round trips between the machines, though, which my simpler first version avoids. I don't really know how to blindly estimate their relative performance, and since the trade-offs are complex, I imagine there are much better solutions out there than anything I'll think of myself, assuming this is ever a real problem.

[*] available stack permitting - your choice of which part to do first is constrained if you don't have O(N) extra space. But if you do have enough extra space you can take your pick, and if you don't have enough space you can at least use what you do have to cut some corners, by doing the small part first for the first few partitions.

I hate to be the contrarian here, but I don't believe sorting is required, and I think any algorithm involving sorting a billion/100 numbers is going to be slow. Let's consider an algorithm on one computer.

1) Select 1000 values at random from the billion, and use them to get an idea of the distribution of the numbers, especially a range.

2) Instead of sorting the values, allocate them to buckets based on the distribution you just calculated. The number of buckets is chosen so that the computer can handle them efficiently, but should otherwise be as large as convenient. The bucket ranges should be such that approximately equal numbers of values go in each bucket (this isn't critical to the algorithm, but it helps efficiency; 100,000 buckets might be appropriate). Note the number of values in each bucket. This is an O(n) process.

3) Find out which bucket range the median lies in. This can be done by simply examining the total counts in each bucket.

4) Find the actual median by examining the values in that bucket. You can use a sort here if you like, since you are only sorting maybe 10,000 numbers. If the number of values in that bucket is huge, then you can use this algorithm again until you have a small enough number to sort.

This approach parallelizes trivially by splitting the values between the computers. Each computer reports the totals in each bucket to a "control" computer which does step 3. For step 4 each computer sends the (sorted) values in the relevant bucket to the control computer (you can do both of those algorithms in parallel too, but it's probably not worth it).

The total process is O(n), since both steps 3 and 4 are trivial, provided the number of buckets is large enough.
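Here is a rough single-machine sketch of that bucketing idea in Java (my own illustrative code, not the answer's; it assumes 32-bit integers and an odd-length input, and, to keep it short, scans for the exact range instead of sampling 1000 values):

    import java.util.Arrays;
    import java.util.Random;

    public class BucketMedian {
        public static int median(int[] data, int numBuckets) {
            // Steps 1/2: find the range and count how many values fall into each
            // equal-width bucket (a 1000-value sample would also give a usable range).
            long min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
            for (int x : data) { min = Math.min(min, x); max = Math.max(max, x); }
            double width = Math.max(1.0, (double) (max - min + 1) / numBuckets);
            int[] counts = new int[numBuckets];
            for (int x : data) counts[bucketOf(x, min, width, numBuckets)]++;

            // Step 3: walk the bucket counts to find the bucket holding the median position.
            long target = data.length / 2;   // 0-based rank of the median (odd length assumed)
            long seen = 0;
            int medianBucket = 0;
            for (int b = 0; b < numBuckets; b++) {
                if (seen + counts[b] > target) { medianBucket = b; break; }
                seen += counts[b];
            }

            // Step 4: only that one bucket needs to be collected and sorted.
            int[] candidates = new int[counts[medianBucket]];
            int k = 0;
            for (int x : data)
                if (bucketOf(x, min, width, numBuckets) == medianBucket) candidates[k++] = x;
            Arrays.sort(candidates);
            return candidates[(int) (target - seen)];
        }

        private static int bucketOf(int x, long min, double width, int numBuckets) {
            return Math.min((int) ((x - min) / width), numBuckets - 1);
        }

        public static void main(String[] args) {
            Random rand = new Random(42);
            int[] data = new int[1_000_001];
            for (int i = 0; i < data.length; i++) data[i] = rand.nextInt();
            System.out.println("bucket median = " + median(data, 100_000));
        }
    }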

Order statistics like the median and the 99th percentile can be efficiently distributed with algorithms like t-digest or Q-digest.

Using either algorithm, each node produces a digest which represents the distribution of the values stored locally. The digests are collected on a single node, merged (effectively summing the distributions), and the median or any other percentile can then be looked up.

This approach is used by elasticsearch and, presumably, BigQuery (going by the description of the QUANTILES function).

A billion is actually quite a boring task for a modern computer. We are talking about 4 GB worth of 4-byte integers here... 4 GB... that's the RAM of some smartphones.

    import java.text.DecimalFormat;
    import java.util.Arrays;
    import java.util.Random;

    public class Median {
        public static void main(String[] args) {
            long start = System.currentTimeMillis();

            int[] numbers = new int[1_000_000_000];
            System.out.println("created array after " + (System.currentTimeMillis() - start) + " ms");

            Random rand = new Random();
            for (int i = 0; i < numbers.length; i++) {
                numbers[i] = rand.nextInt();
            }
            System.out.println("initialized array after " + (System.currentTimeMillis() - start) + " ms");

            Arrays.sort(numbers);
            System.out.println("sorted array after " + (System.currentTimeMillis() - start) + " ms");

            if (numbers.length % 2 == 1) {
                // odd count: the median is the single middle element
                System.out.println("median = " + numbers[numbers.length / 2]);
            } else {
                // even count: the median is the mean of the two middle elements
                int m1 = numbers[numbers.length / 2 - 1];
                int m2 = numbers[numbers.length / 2];
                double m = ((long) m1 + m2) / 2.0;
                System.out.println("median = " + new DecimalFormat("#.#").format(m));
            }
        }
    }

Output on my machine:

    created array after 518 ms
    initialized array after 10177 ms
    sorted array after 102936 ms
    median = 19196

So this completes in well under two minutes on my machine (1:43, of which 0:10 is for generating the random numbers), and it even does a full sort. Nothing fancy really.

This would surely be an interesting task for larger sets of numbers. I just want to make a point here: a billion is peanuts. So think twice before you start throwing complex solutions at surprisingly simple tasks ;)

The median of this set of numbers

2,3,5,7,11,13,67,71,73,79,83,89,97

is 67.

The median of this set of numbers

2,3,5,7,11,13,67,71,73,79,83,89

is 40.

Assuming the question was about roughly 1 billion integers x where 0 <= x <= 2,147,483,647, and that the OP was looking for (element(499,999,999) + element(500,000,000)) / 2 (if the numbers were sorted). Also assuming that all 100 computers were equal.

Using my laptop and GigE...

What I found is that my laptop can sort 10,000,000 Int32's in 1.3 seconds. So a rough estimate is that sorting a billion of them would take 100 x 1.3 seconds (2 minutes 10 seconds) ;).

An estimate of a one-way file transfer of a 40 MB file on gigabit Ethernet is 0.32 seconds. This means getting the sorted results back from all of the computers will take about 32 seconds (computer 99 didn't get its file until 30 seconds after the start). From there it shouldn't take long to discard the lowest 499,999,998 numbers, add the next 2 and divide by 2.

Oddly enough, I think if you have enough computers, you're better off sorting than using the O(n) median-finding algorithms. (Unless your cores are very, very slow, though, I'd just use one and run an O(n) median-finding algorithm for merely 1e9 numbers; if you had 1e12, that might be less practical.)

Anyway, let's suppose we have more than n cores to deal with this problem, and we don't care about power consumption, just getting the answer fast. Let's further assume that this is an SMP machine with all the data already loaded into memory. (Sun's 32-core machines are of this type, for instance.)

One thread chops the list up blindly into equal-sized pieces and tells the other M threads to sort them. Those threads diligently do so, in (n/M) log (n/M) time. They then return not only their medians, but also, say, their 25th and 75th percentiles (perverse worst cases are better if you choose slightly different numbers). Now you have 4M ranges of data. You then sort these ranges and work upwards through the list until you find a number such that, if you throw out every range that is smaller than or contains the number, you will have thrown out half your data. That's your lower bound for the median. Do the same for the upper bound. This takes something like M log M time, and all cores have to wait for it, so it really wastes M^2 log M time. Now you have your single thread tell the others to toss all data outside the range (you should throw out about half on each pass) and repeat - this is a trivially fast operation since the data is already sorted. You shouldn't have to repeat this more than log(n/M) times before it's faster to just grab the remaining data and use a standard O(n) median finder on it.

So, the total complexity is something like O((n/M) log (n/M) + M^2 log M log (n/M)). Thus, this is faster than an O(n) median sort on one core if M >> log(n/M) and M^3 log M < n, which is true for the scenario you've described.

I think this is a really bad idea given how inefficient it is, but it is faster.

This might surprise people, but if the numbers are integers small enough to fit inside 32 bits (or smaller), just do a bucket sort! It only needs 16 GB of RAM for any number of 32-bit ints and runs in O(n), which should outperform any distributed system for a reasonable n, e.g. a billion.

Once you have the sorted list, it's trivial to pick out the median. In fact, you do not need to construct the sorted list; only looking at the buckets should do it.

A simple implementation is shown below. It only works for 16-bit integers, but extension to 32 bits should be easy.

    #include <stdio.h>
    #include <string.h>

    int main()
    {
        unsigned short buckets[65536];
        int input, n = 0, count = 0, i = 0;

        // calculate buckets
        memset(buckets, 0, sizeof(buckets));
        while (scanf("%d", &input) != EOF) {
            buckets[input & 0xffff]++;
            n++;
        }

        // find median
        while (count <= n / 2) {
            count += buckets[i++];
        }

        printf("median: %d\n", i - 1);

        return 0;
    }

Using a text file with a billion (10^9) numbers and running it like this

 time ./median < billion 

gives a running time of 1m49.293s on my machine. Most of that running time is probably disk IO.

One computer is more than enough to solve the problem.

But let's assume that there are 100 computers. The only complex thing you have to do is to sort the list. Split it into 100 parts, send one part to each computer, let them be sorted there, and merge the parts after that.

Then take the number from the middle of the sorted list (i.e. with index 500 000 000).
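As an illustration of that merge step (my own sketch, not the answer's code): the coordinator can run a k-way merge over the sorted parts with a heap and stop as soon as it reaches the middle position, so it never has to materialize the full sorted list. This toy version assumes the sorted chunks fit in the coordinator's memory.

    import java.util.PriorityQueue;

    public class MergeToMedian {
        static long medianOfSortedChunks(int[][] chunks) {
            long n = 0;
            for (int[] c : chunks) n += c.length;

            // heap entries: {value, chunkIndex, offsetWithinChunk}
            PriorityQueue<long[]> heap = new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
            for (int i = 0; i < chunks.length; i++)
                if (chunks[i].length > 0) heap.add(new long[]{chunks[i][0], i, 0});

            long popped = 0, prev = 0, cur = 0;
            while (!heap.isEmpty()) {
                long[] top = heap.poll();
                prev = cur;
                cur = top[0];
                popped++;
                if (popped == n / 2 + 1)                          // reached the middle
                    return (n % 2 == 1) ? cur : (prev + cur) / 2; // integer average for the sketch
                int chunk = (int) top[1], next = (int) top[2] + 1;
                if (next < chunks[chunk].length)
                    heap.add(new long[]{chunks[chunk][next], chunk, next});
            }
            throw new IllegalStateException("empty input");
        }

        public static void main(String[] args) {
            int[][] chunks = {{1, 4, 9}, {2, 3, 10}, {5, 6, 7, 8}}; // 10 values, true median 5.5
            System.out.println(medianOfSortedChunks(chunks));       // prints 5 (integer average)
        }
    }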

It depends on your data. The worst-case scenario is that they are uniformly distributed numbers.

In that case you can find the median in O(N) time, as in this example:

Suppose your numbers are 2,7,5,10,1,6,4,4,6,10,4,7,1,8,4,9,9,3,4,3 (range being 1-10).

We create 3 buckets: 1-3, 4-7, 8-10. Note that the top and the bottom have equal size.

We fill the buckets with the numbers, counting how many fall in each, along with the max and the min:

  • low (5): 2, 1, 1, 3, 3, min 1, max 3
  • middle (10): 7, 5, 6, 4, 4, 6, 4, 7, 4, 4, min 4, max 7
  • high (5): 10, 10, 8, 9, 9, min 8, max 10

The median falls in the middle bucket, so we disregard the rest.

We create 3 new buckets: 4, 5-6, 7. The low side will start with a count of 5 and a max of 3, and the high side with a count of 5 and a min of 8.

For each number we count how many fall in the low and high buckets (plus their max and min), and we keep the middle bucket.

  • old low (5)
  • low (5): 4, 4, 4, 4, 4, max 4
  • middle (3): 5, 6, 6
  • high (2): 7, 7, min 7
  • old high (5)

Now we can compute the median directly: we have a situation like this

    old low   low         middle   high   old high
    xxxxx     4 4 4 4 4   5 6 6    7 7    xxxxx

so the median is 4.5.

Assuming you know a bit about the distribution, you can fine-tune how the ranges are defined to optimize speed. In any case, the performance should go with O(N), because 1 + 1/3 + 1/9 + ... = 1.5.

You need the min and max because of edge cases (for example, if the median is the average between the max of the old low and the next element).

All of these operations can be parallelized: you can give 1/100 of the data to each computer, calculate the 3 buckets on each node, then distribute the bucket you keep. This again makes you use the network efficiently, because each number is passed on average 1.5 times (so O(N)). You can even beat that if you only pass the minimal numbers among nodes (e.g. if node 1 has 100 numbers and node 2 has 150 numbers, then node 2 can give 25 numbers to node 1).

Unless you know more about the distribution, I doubt you can do better than O(N) here, because you actually need to count the elements at least once.
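Here is a hedged single-machine sketch of that narrowing idea in Java (my own illustrative code and names, not the answer's). It keeps an interval [lo, hi], counts how many values fall into three sub-ranges, and only keeps the sub-range that contains the median. For simplicity it assumes an odd number of elements, so the even-count edge case that needs the min/max bookkeeping above is not handled.

    import java.util.Arrays;

    public class ThreeBucketMedian {
        public static int median(int[] data) {
            long rank = data.length / 2;              // 0-based rank of the median (odd length)
            long lo = Integer.MIN_VALUE, hi = Integer.MAX_VALUE;
            long discardedBelow = 0;                  // how many values we dropped below [lo, hi]
            int[] current = data.clone();

            while (current.length > 1_000 && hi > lo) {  // stop once the remainder is cheap to sort
                long third = (hi - lo) / 3;
                long cut1 = lo + third, cut2 = hi - third; // ranges: [lo,cut1], (cut1,cut2], (cut2,hi]
                long lowCount = 0, midCount = 0;
                for (int x : current) {
                    if (x <= cut1) lowCount++;
                    else if (x <= cut2) midCount++;
                }
                long target = rank - discardedBelow;
                if (target < lowCount) {                            // median is in the low range
                    hi = cut1;
                } else if (target < lowCount + midCount) {          // median is in the middle range
                    lo = cut1 + 1; hi = cut2; discardedBelow += lowCount;
                } else {                                            // median is in the high range
                    lo = cut2 + 1; discardedBelow += lowCount + midCount;
                }
                // keep only the chosen range; on 100 machines each node would do this locally
                int k = 0;
                int[] next = new int[current.length];
                for (int x : current) if (x >= lo && x <= hi) next[k++] = x;
                current = Arrays.copyOf(next, k);
            }
            Arrays.sort(current);
            return current[(int) (rank - discardedBelow)];
        }

        public static void main(String[] args) {
            java.util.Random rand = new java.util.Random(1);
            int[] data = new int[1_000_001];
            for (int i = 0; i < data.length; i++) data[i] = rand.nextInt(1000) + 1;
            int[] check = data.clone();
            Arrays.sort(check);
            System.out.println(median(data) + " should equal " + check[check.length / 2]);
        }
    }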

Split the 10^9 numbers, 10^7 to each computer ~ 80 MB each. Each computer sorts its numbers. Then computer 1 merge-sorts its own numbers with those from computer 2, computers 3 and 4, etc. Then computer 1 writes half of the numbers back to 2, 3 to 4, etc. Then 1 merge-sorts the sorted numbers from computers 1, 2, 3, 4 and writes them back. And so on. Depending on the amount of RAM on the computers you may get away with not writing all the numbers back to the individual computers at each step; you might be able to accumulate the numbers on computer 1 for several steps, but you do the maths.

Oh, and finally get the mean of the 500000000th and 500000001st values (but check there are enough 00s in there, I haven't).

EDIT: @Roman - well, if you can't believe it even if it's true, then there is no point in my revealing the truth or falsehood of the proposition. What I meant to state is that brute force sometimes beats smart in a race. It took me about 15 seconds to devise an algorithm which I am confident I can implement, which will work, and which will be adaptable to a wide range of sizes of inputs and numbers of computers, and tunable to the characteristics of the computers and networking arrangements. If it takes you, or anyone else, say 15 minutes to devise a more sophisticated algorithm, I have a 14m45s advantage to code up my solution and start it running.

But I freely admit this is all assertion, I haven't measured anything.

I think Steve Jessop's answer will be the fastest.

If the network data transfer size is the bottleneck, here is another approach.

    Divide the numbers among the 100 computers (10 MB each).
    Loop until we have one element in each list:
        Find the median in each of the lists with quickselect, which is O(N), and we are
        processing in parallel. Each list ends up partitioned around its median.
        Send the medians to a central computer and find the median of the medians.
        Then send that median back to each computer.
        For each computer, if the overall median we just computed is smaller than its own
        median, continue in the lower part of the list (it is already partitioned), and if
        larger, in the upper part.
    When we have one number in each list, send them to the central computer and find and
    return the median.

This can beat the top-voted algorithm, which is O(n log n).

– Distributed order-statistics selection algorithm - O(n):
reduce the problem to the original problem of finding the k-th number in an unsorted array.
– Counting sort with a histogram - O(n):
you have to assume some properties about the range of the numbers - can the range fit in memory?
– External merge sort - O(n log n) - described above:
you basically sort the numbers on the first pass, then find the median on the second.
– If something is known about the distribution of the numbers, other algorithms can be produced.

For more details and implementation, see:
http://www.fusu.us/2013/07/median-in-large-set-across-1000-servers.html

This can be done on nodes using data that is not sorted per node (say from log files) in the following manner.

There is 1 parent node and 99 child nodes. The child nodes have two API calls:

  • stats(): returns min, max and count
  • compare(median_guess): returns count of matching values, count less than the value and count greater than the value

The parent node calls stats() on all child nodes, noting the minimum and maximum of all nodes.

A binary search may now be conducted in the following way:

  1. Bisect the minimum and maximum rounding down – this is the median 'guess'
  2. If the greater than count is more than the less than count, set the minimum to the guess
  3. If the greater than count is less than the less than count, set the maximum to the guess
  4. If count is odd finish when minimum and maximum are equal
  5. If count is even finish when maximum <= minimum + guess.match_count

If stats() and compare() can be pre-calculated with an O(N/M log N/M) sort, then an O(N/M) pre-calculation is possible with a memory complexity of O(N) for the pre-calculation. Then you can do compare() in constant time, so the whole thing (including the pre-calculation) would run in O(N/M log N/M) + O(log N).

Let me know if I have made a mistake!
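A minimal sketch of the parent-side binary search in Java (the ChildNode interface mirrors the stats()/compare() API above; all names are illustrative, and it assumes integer data so the search terminates after at most ~64 halvings):

    import java.util.List;

    public class MedianBinarySearch {
        interface ChildNode {
            long[] stats();                 // {min, max, count} of the node's local data
            long[] compare(long guess);     // {countLess, countEqual, countGreater} vs. guess
        }

        // Returns the lower median: the smallest value v such that at least ceil(n/2)
        // of the values are <= v.
        public static long lowerMedian(List<ChildNode> children) {
            long min = Long.MAX_VALUE, max = Long.MIN_VALUE, n = 0;
            for (ChildNode c : children) {
                long[] s = c.stats();
                min = Math.min(min, s[0]); max = Math.max(max, s[1]); n += s[2];
            }
            long need = (n + 1) / 2;        // how many values must be <= the median
            while (min < max) {
                long guess = min + (max - min) / 2;     // bisect, rounding down
                long atOrBelow = 0;
                for (ChildNode c : children) {          // one round trip per iteration
                    long[] r = c.compare(guess);
                    atOrBelow += r[0] + r[1];
                }
                if (atOrBelow >= need) max = guess;     // the median is <= guess
                else min = guess + 1;                   // the median is > guess
            }
            return min;
        }
    }

For an even count the upper median can be found the same way with need = n / 2 + 1, and the two results averaged.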

An easier method is to have weighted numbers.

  • Split the large set among computers
  • Sort each set
  • iterate through the small-set, and calculate weights to repeated elements
  • merge each 2 sets into 1 (each is sorted already) updating weights
  • keep merging sets until you get only one set
  • iterate through this set accumulating weights until you reach OneBillion/2

How about this: each node can take 1 billion/100 numbers. At each node the elements can be sorted and the median found. Find the median of those medians. We can, by aggregating the counts of numbers less than the median-of-medians on all nodes, find out the x%:y% split which the median-of-medians makes. Now ask all nodes to delete elements less than the median of medians (taking the example of a 30%:70% split). 30% of the numbers are deleted. 70% of 1 billion is 700 million. Now all nodes which deleted fewer than 3 million numbers can send those extra numbers back to a main computer. The main computer redistributes in such a way that all nodes now hold almost equal numbers of elements (7 million). Now that the problem is reduced to 700 million numbers... it goes on until we have a set small enough to be computed on one computer.

Let's first work out how to find the median of n numbers on a single machine: I am basically using a partitioning strategy.

Problem: selection(n, n/2): find the n/2-th number counting from the smallest.

You pick, say, the middle element k and partition the data into 2 sub-arrays. The 1st contains all elements < k and the 2nd contains all elements >= k.

If sizeof(1st sub-array) >= n/2, you know that this sub-array contains the median. You can then throw away the 2nd sub-array and solve selection(sizeof 1st sub-array, n/2).

Otherwise, throw away the 1st sub-array and solve selection(2nd sub-array, n/2 - sizeof(1st sub-array)).

Do it recursively.

The time complexity is O(n) expected time.
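A single-machine quickselect along those lines, as an illustration (my own sketch; it uses a randomized pivot and a 3-way partition so duplicates are handled cleanly):

    import java.util.Random;

    public class QuickSelect {
        private static final Random RAND = new Random();

        // Returns the k-th smallest element (k is 0-based), rearranging a[] in place.
        static int select(int[] a, int lo, int hi, int k) {
            while (true) {
                if (lo == hi) return a[lo];
                int pivot = a[lo + RAND.nextInt(hi - lo + 1)];
                int lt = lo, gt = hi, i = lo;
                while (i <= gt) {                      // 3-way partition: <pivot | ==pivot | >pivot
                    if (a[i] < pivot) swap(a, lt++, i++);
                    else if (a[i] > pivot) swap(a, i, gt--);
                    else i++;
                }
                if (k < lt) hi = lt - 1;               // the k-th element is in the "< pivot" part
                else if (k > gt) lo = gt + 1;          // ... or in the "> pivot" part
                else return pivot;                     // ... or it equals the pivot
            }
        }

        private static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

        public static void main(String[] args) {
            int[] data = {2, 7, 5, 10, 1, 6, 4, 4, 6, 10, 4};
            System.out.println(select(data, 0, data.length - 1, data.length / 2)); // prints 5
        }
    }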

Now if we have many machines, in each iteration we have to process an array to split, so we distribute the array across the different machines. Each machine processes its chunk of the array and sends a summary back to the hub/controlling machine, i.e. the size of the 1st sub-array and the size of the 2nd sub-array. The hub machine adds up the summaries, decides which sub-array (1st or 2nd) to process further and what the 2nd parameter of selection should be, and sends that back to each machine. And so on.

This algorithm can be implemented very neatly using map reduce?

How does it look?

I would do it like this:

In the beginning all 100 computers work to find the highest and the lowest number; each computer has its part of the database/file, which it queries;

When the highest and lowest numbers are found, one computer reads the data and distributes each number, evenly, to the rest of the 99; the numbers are distributed by equal intervals (one may take from -100 million to 0, another from 0 to 100 million, etc.);

While receiving the numbers, each of the 99 computers already sorts them;

Then, it's easy to find the median... See how many numbers each computer has, add all of those counts (the sum of how many numbers there are, not the numbers themselves), divide by 2; work out which computer holds that number, and at which index;

🙂 voilà

PS Seems there's a lot of confusion here; the MEDIAN – is the NUMBER IN THE MIDDLE OF A SORTED LIST OF NUMBERS!

You can use the tournament tree method for finding the median. We can create a tree with 1000 leaf nodes such that each leaf node is an array. We then conduct n/2 tournaments between the different arrays. The value at the root after the n/2 tournaments is the result.

http://www.geeksforgeeks.org/tournament-tree-and-binary-heap/

If the numbers are not distinct, and only belong to a certain range, that is they are repeated, then a simple solution that comes to my mind is to distribute the numbers among 99 machines equally, and keep one machine as the master. Now every machine iterates over its given numbers, and stores the count of each number in a hash set. Each time the number gets repeated in the set of numbers allotted to that particular computer, it updates its count in the hash set.

All the machines then return their hash set to the master machine. The master machine combines the hash sets, summing the counts for the same key found in the hash sets. For example machine #1's hash set had an entry of ("1", 7), and machine #2's hash set had an entry of ("1", 9), so the master machine, when combining the hash sets, makes an entry of ("1", 16), and so on.

Once the hash sets have been merged, just sort the keys, and now you can easily find the (n/2)-th item and the ((n/2)+1)-th item from the sorted counts.
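A small sketch of that merging step in Java (illustrative names, not the answer's code; TreeMap keeps the keys sorted, so walking to the (n/2)-th position is a single pass over the distinct values):

    import java.util.Map;
    import java.util.TreeMap;

    public class CountMerge {
        // Merge the per-machine value -> count maps into one ordered map.
        static TreeMap<Integer, Long> merge(Iterable<Map<Integer, Long>> perMachine) {
            TreeMap<Integer, Long> total = new TreeMap<>();
            for (Map<Integer, Long> m : perMachine)
                m.forEach((value, count) -> total.merge(value, count, Long::sum));
            return total;
        }

        // Walk the sorted keys until the cumulative count passes the target rank (0-based).
        static int valueAtRank(TreeMap<Integer, Long> counts, long rank) {
            long seen = 0;
            for (Map.Entry<Integer, Long> e : counts.entrySet()) {
                seen += e.getValue();
                if (seen > rank) return e.getKey();
            }
            throw new IllegalArgumentException("rank out of range");
        }
    }

For an even n the median would be the average of valueAtRank(counts, n/2 - 1) and valueAtRank(counts, n/2).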

This method won't be beneficial if the billion numbers are distinct.

Well, suppose you know that the number of distinct integers is (say) 4 billion, then you can bucket them into 64k buckets and get a distributed count for each bucket from each machine in the cluster(100 computers). Combine all these counts. Now, find the bucket which has the median, and this time only ask for buckets for the 64k elements that would lie in your target bucket. This requires O(1) (specifically 2) queries over your "cluster". :d

My penny worth, after all that has already been brought up by others:

Finding the median on a single machine is O(N): https://en.wikipedia.org/wiki/Selection_algorithm .

Sending N numbers to 100 machines is also O(N). So, in order to make using 100 machines interesting, either the communication must be relatively fast, or N is so large that a single machine cannot handle it while N/100 is doable, or we just want to consider the mathematical problem without bothering about data communication.

To cut things short I'll assume therefore that, within reasonable limits, we can send/distribute the numbers without affecting the efficiency analysis.

Consider then the following approach, where one machine is assigned to be the "master" for some general processing. This will be comparatively fast, so the "master" also participates in the common tasks that each machine performs.

  1. Each machine receives N/100 of the numbers, computes its own median and sends that information to the master.
  2. The master compiles a sorted list of all distinct medians and sends that back to each machine, defining an ordered sequence of buckets (on each machine the same), one for each median value (a single-value bucket) and one for each interval between adjacent medians. Of course there are also the lower-end and higher-end buckets for values below the lowest median and above the highest.
  3. Each machine computes how many numbers fall in each bucket and communicates that information back to the master.
  4. The master determines which bucket contains the median, how many lower values (in total) fall below that bucket, and how many above.
  5. If the selected bucket is a single-value bucket (one of the medians), or else if the selected bucket contains only 1 (N odd) or 2 (N even) values, we're done. Otherwise we repeat the steps above with the following (obvious) modifications:
  6. Only the numbers from the selected bucket are (re)distributed from the master to the 100 machines, and moreover
  7. We're not going to compute (on each machine) the median, but the k-th value, where we take into account how many higher numbers have been discarded from the total, and how many lower numbers. Conceptually each machine has also its share of the discarded low/high numbers and takes that into account when computing the new median in the set that (conceptually) includes (its share of) the discarded numbers.

Time-complexity:

  1. A little thinking will convince you that on each step the total number of values to analyse is reduced by a factor at least two (2 would be a rather sick case; you may expect a significantly better reduction). From this we get:
  2. Assuming that finding the median (or k-th value), which is O(N), takes c*N time where the prefactor c does not vary too wildly with N so that we can take it as a constant for the moment, we'll get our final result in at most 2*c*N/100 time. Using 100 machines gives us, therefore, a speedup factor of 100/2 (at least).
  3. As remarked initially: the time involved in communicating the numbers between the machines may make it more attractive to simply do everything on one machine. However, IF we go for the distributed approach, the total count of numbers to be communicated in all steps together will not exceed 2*N (N for the first time, <=N/2 the second time, <= half of that the third, and so on).
  1. Divide the 1 billion numbers into 100 machines. Each machine will have 10^7 numbers.

  2. For each incoming number to a machine, store the number in a frequency map, number -> count. Also store the min number in each machine.

  3. Find the median in each machine: starting from the min number in each machine, sum the counts until the median index is reached. The median in each machine will have approximately 5*10^6 numbers less than it and greater than it.

  4. Find the median of all the medians, which will have approximately 50*10^7 numbers less than it and greater than it, which is the median of the 1 billion numbers.

Now some optimization of the 2nd step: instead of storing in a frequency map, store the counts in a variable-length bit array. For example, let's say that starting from the min number in a machine, these are the frequency counts:

    [min number]   - 8 count
    [min+1 number] - 7 count
    [min+2 number] - 5 count

The above can be stored in a bit array as:

    [min number]   - 10000000
    [min+1 number] - 1000000
    [min+2 number] - 10000

Note that altogether it will cost about 10^7 bits for each machine, since each machine only handles 10^7 numbers. 10^7bits = 1.25*10^6 bytes, which is 1.25MB

So with the above approach each machine will need 1.25MB of space to compute local median. And median of medians can be computed from those 100 local medians, resulting in median of 1 billion numbers.

I suggest a method to calculate the median approximately. 🙂 If these one billion numbers are in a random order, I think I can pick 1/100 or 1/10 of the one billion numbers randomly, sort them with the 100 machines, then pick the median of them. Or let's split the billion numbers into 100 parts, let each machine pick 1/10 of its part randomly, and calculate the median of those. After that we have 100 numbers, and we can calculate the median of the 100 numbers more easily. Just a suggestion, I'm not sure if it's mathematically correct. But I think you can show the result to a not-so-good-at-math manager.
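A tiny sketch of the sampling idea on one machine (illustrative only; the result is an estimate of the median, not the exact value):

    import java.util.Arrays;
    import java.util.Random;

    public class SampledMedian {
        // Estimate the median from a random subsample (fraction of the data, e.g. 0.01).
        static int approxMedian(int[] data, double fraction, long seed) {
            Random rand = new Random(seed);
            int[] sample = Arrays.stream(data).filter(x -> rand.nextDouble() < fraction).toArray();
            Arrays.sort(sample);
            return sample[sample.length / 2];
        }

        public static void main(String[] args) {
            int[] data = new Random(7).ints(1_000_000, 0, 1_000_000).toArray();
            System.out.println("approx median ~ " + approxMedian(data, 0.01, 42));
        }
    }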

Steve Jessop's answer is wrong:

consider the following four groups:

{2, 4, 6, 8, 10}

{21, 21, 24, 26, 28}

{12, 14, 30, 32, 34}

{16, 18, 36, 38, 40}

The median is 21, which is contained in the second group.

The medians of the four groups are 6, 24, 30, 36. The median of those medians is 27.

So after the first loop, the four groups will become:

{6, 8, 10}

{24, 26, 28}

{12, 14, 30}

{16, 18, 36}

The 21 has already been wrongly discarded.

This algorithm only supports the case where there are two groups.