Sorting 1 million 8-digit numbers in 1 MB of RAM

I have a computer with 1 MB of RAM and no other local storage. I must use it to accept one million 8-digit decimal numbers over a TCP connection, sort them, and then send the sorted list out over another TCP connection.

The list of numbers may contain duplicates, which I must not discard. The code will be placed in ROM, so I do not need to subtract the size of my code from the 1 MB. I already have code to drive the Ethernet port and handle TCP/IP connections, and its state data requires 2 KB, including a 1 KB buffer through which the code will read and write data. Is there a solution to this problem?

Source of the question and answers:
slashdot.org

cleaton.net

Here is some working C++ code which solves the problem.

Proof that the memory constraint is satisfied:

Edit: The author does not provide a proof of the maximum memory requirement, neither in this post nor in his blog. Since the number of bits needed to encode a value depends on the values encoded previously, such a proof is probably non-trivial. The author notes that the largest encoded size he stumbled upon empirically was 1011732, and that he chose the buffer size of 1013000 arbitrarily.

 typedef unsigned int u32;

 namespace WorkArea
 {
     static const u32 circularSize = 253250;
     u32 circular[circularSize] = { 0 }; // consumes 1013000 bytes

     static const u32 stageSize = 8000;
     u32 stage[stageSize];               // consumes 32000 bytes

 ...

Together, these two arrays take 1045000 bytes of storage. That leaves 1048576 - 1045000 - 2×1024 = 1528 bytes for the remaining variables and stack space.

It runs in 23 seconds, at least on my Xeon W3520. You can verify that the program works using the following Python script, assuming the program name is sort1mb.exe.

 from subprocess import *
 import random

 sequence = [random.randint(0, 99999999) for i in xrange(1000000)]

 sorter = Popen('sort1mb.exe', stdin=PIPE, stdout=PIPE)
 for value in sequence:
     sorter.stdin.write('%08d\n' % value)
 sorter.stdin.close()

 result = [int(line) for line in sorter.stdout]
 print('OK!' if result == sorted(sequence) else 'Error!')

A detailed explanation of the algorithm can be found in the following series of posts:

  • 1MB Sorting Explained
  • Arithmetic Coding and the 1MB Sorting Problem
  • Arithmetic Coding Using Fixed-Point Math

There is one rather sneaky trick not mentioned here so far. We assume that you have no extra way to store data, but that is not strictly true.

One way around your problem is to do the following horrible thing, which should not be attempted by anyone under any circumstances: use the network traffic to store data. And no, I don't mean NAS.

You can sort the numbers with just a few bytes of RAM in the following way:

  • First take 2 variables: COUNTER and VALUE.
  • First set all the registers to 0;
  • Every time you receive an integer I, increment COUNTER and set VALUE to max(VALUE, I);
  • Then send an ICMP echo request packet with its data set to I to the router. Erase I and repeat.
  • Every time you receive a returned ICMP packet, you simply extract the integer and send it back out again in another echo request. This produces a huge number of ICMP requests ping-ponging back and forth, carrying the integers.

Once COUNTER reaches 1000000, you have all of the values stored in the incessant stream of ICMP requests, and VALUE now contains the maximum integer. Pick some threshold T >> 1000000. Set COUNTER to zero. Every time you receive an ICMP packet, increment COUNTER and send the contained integer I back out in another echo request, unless I = VALUE, in which case transmit it to the destination for the sorted integers. Once COUNTER = T, decrease VALUE by 1, reset COUNTER to zero and repeat. Once VALUE reaches zero you should have transmitted all integers in order from largest to smallest to the destination, and you have only used about 47 bits of RAM for the two persistent variables (and whatever small amount you need for the temporary values).
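
Purely for illustration, here is a minimal Python sketch of that loop, with a collections.deque standing in for the packets in flight; the function and variable names are made up for this sketch, and a real version would need raw ICMP sockets:

 from collections import deque

 def icmp_sort(incoming, transmit):
     network = deque()                      # stands in for packets bouncing off the router
     counter, value = 0, 0
     for i in incoming:                     # phase 1: receive, track the maximum, bounce
         counter += 1
         value = max(value, i)              # assumes non-negative inputs
         network.append(i)                  # "echo request" carrying i
     t = 10 * counter                       # threshold T, comfortably larger than the packet count
     while value >= 0:
         counter = 0
         while counter < t and network:
             counter += 1
             i = network.popleft()          # "echo reply" comes back
             if i == value:
                 transmit(i)                # deliver to the sorted output
             else:
                 network.append(i)          # bounce it out again
         value -= 1

 out = []
 icmp_sort([3, 1, 4, 1, 5], out.append)
 print(out)   # [5, 4, 3, 1, 1] - largest to smallest, duplicates kept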

I know this is horrible, and I know there can be all sorts of practical issues, but I thought it might give some of you a laugh, or at least horrify you.

Please see the first correct answer, or the later answer with arithmetic coding. Below you may find something fun, but not a 100% bullet-proof solution.

This is quite an interesting task, and here is another solution for it. I hope somebody will find the result useful (or at least interesting).

Stage 1: Initial data structure, rough compression approach, basic results

Let's do some simple math: we have 1M (1048576 bytes) of RAM initially available to store 10^6 8-digit decimal numbers in the range [0; 99999999]. So 27 bits are needed to store one number (assuming unsigned numbers are used), and ~3.5M of RAM would be needed to store the raw stream. Somebody has already said this doesn't seem feasible, but I would say the task can be solved if the input is "good enough". Basically, the idea is to compress the input data with a compression factor of 0.29 or better and do the sorting in a proper manner.

Let's solve the compression problem first. There are some relevant tests already available:

http://www.theeggeadventure.com/wikimedia/index.php/Java_Data_Compression

"I ran a test to compress one million consecutive integers using various forms of compression. The results are as follows:

 None     4000027
 Deflate  2006803
 Filtered 1391833
 BZip2     427067
 Lzma      255040

It looks like LZMA (the Lempel-Ziv-Markov chain algorithm) is a good choice to continue with. I've prepared a simple PoC, but there are still some details to be highlighted:

  1. Memory is limited, so the idea is to pre-sort the numbers and use compressed buckets (of dynamic size) as temporary storage
  2. It is easier to achieve a better compression factor with pre-sorted data, so each bucket has a static buffer (the numbers in the buffer are sorted before LZMA)
  3. Each bucket holds a specific range, so the final sort can be done for each bucket separately
  4. The bucket size can be set properly, so there will be enough memory to decompress the stored data and do the final sort for each bucket separately

In-memory sorting

Please note that the attached code is a POC; it can't be used as a final solution, it just demonstrates the idea of using several smaller buffers to store pre-sorted numbers in some optimal way (possibly compressed). LZMA is not proposed as a final solution. It is used as the fastest way to introduce compression to this PoC.

See the PoC code below (please note it is just a demo; compiling it will require LZMA-Java):

 import java.io.*;
 import java.util.*;
 // plus the Encoder/Decoder classes from the LZMA-Java library

 public class MemorySortDemo {

     static final int NUM_COUNT = 1000000;
     static final int NUM_MAX   = 100000000;

     static final int BUCKETS      = 5;
     static final int DICT_SIZE    = 16 * 1024; // LZMA dictionary size
     static final int BUCKET_SIZE  = 1024;
     static final int BUFFER_SIZE  = 10 * 1024;
     static final int BUCKET_RANGE = NUM_MAX / BUCKETS;

     static class Producer {
         private Random random = new Random();

         public int produce() { return random.nextInt(NUM_MAX); }
     }

     static class Bucket {
         public int size, pointer;
         public int[] buffer = new int[BUFFER_SIZE];

         public ByteArrayOutputStream tempOut = new ByteArrayOutputStream();
         public DataOutputStream tempDataOut = new DataOutputStream(tempOut);
         public ByteArrayOutputStream compressedOut = new ByteArrayOutputStream();

         public void submitBuffer() throws IOException {
             Arrays.sort(buffer, 0, pointer);
             for (int j = 0; j < pointer; j++) {
                 tempDataOut.writeInt(buffer[j]);
                 size++;
             }
             pointer = 0;
         }

         public void write(int value) throws IOException {
             if (isBufferFull()) {
                 submitBuffer();
             }
             buffer[pointer++] = value;
         }

         public boolean isBufferFull() {
             return pointer == BUFFER_SIZE;
         }

         public byte[] compressData() throws IOException {
             tempDataOut.close();
             return compress(tempOut.toByteArray());
         }

         private byte[] compress(byte[] input) throws IOException {
             final BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(input));
             final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(compressedOut));

             final Encoder encoder = new Encoder();
             encoder.setEndMarkerMode(true);
             encoder.setNumFastBytes(0x20);
             encoder.setDictionarySize(DICT_SIZE);
             encoder.setMatchFinder(Encoder.EMatchFinderTypeBT4);

             ByteArrayOutputStream encoderProperties = new ByteArrayOutputStream();
             encoder.writeCoderProperties(encoderProperties);
             encoderProperties.flush();
             encoderProperties.close();

             encoder.code(in, out, -1, -1, null);
             out.flush();
             out.close();
             in.close();

             return encoderProperties.toByteArray();
         }

         public int[] decompress(byte[] properties) throws IOException {
             InputStream in = new ByteArrayInputStream(compressedOut.toByteArray());
             ByteArrayOutputStream data = new ByteArrayOutputStream(10 * 1024);
             BufferedOutputStream out = new BufferedOutputStream(data);

             Decoder decoder = new Decoder();
             decoder.setDecoderProperties(properties);
             decoder.code(in, out, 4 * size);

             out.flush();
             out.close();
             in.close();

             DataInputStream input = new DataInputStream(new ByteArrayInputStream(data.toByteArray()));

             int[] array = new int[size];
             for (int k = 0; k < size; k++) {
                 array[k] = input.readInt();
             }
             return array;
         }
     }

     static class Sorter {
         private Bucket[] bucket = new Bucket[BUCKETS];

         public void doSort(Producer p, Consumer c) throws IOException {

             for (int i = 0; i < bucket.length; i++) { // allocate buckets
                 bucket[i] = new Bucket();
             }

             for (int i = 0; i < NUM_COUNT; i++) {     // produce some data
                 int value = p.produce();
                 int bucketId = value / BUCKET_RANGE;
                 bucket[bucketId].write(value);
                 c.register(value);
             }

             for (int i = 0; i < bucket.length; i++) { // submit non-empty buffers
                 bucket[i].submitBuffer();
             }

             byte[] compressProperties = null;
             for (int i = 0; i < bucket.length; i++) { // compress the data
                 compressProperties = bucket[i].compressData();
             }

             printStatistics();

             for (int i = 0; i < bucket.length; i++) { // decode & sort buckets one by one
                 int[] array = bucket[i].decompress(compressProperties);
                 Arrays.sort(array);

                 for (int v : array) {
                     c.consume(v);
                 }
             }
             c.finalCheck();
         }

         public void printStatistics() {
             int size = 0;
             int sizeCompressed = 0;

             for (int i = 0; i < BUCKETS; i++) {
                 int bucketSize = 4 * bucket[i].size;
                 size += bucketSize;
                 sizeCompressed += bucket[i].compressedOut.size();

                 System.out.println("  bucket[" + i
                         + "] contains: " + bucket[i].size
                         + " numbers, compressed size: " + bucket[i].compressedOut.size()
                         + String.format(" compression factor: %.2f",
                                 ((double) bucket[i].compressedOut.size()) / bucketSize));
             }

             System.out.println(String.format("Data size: %.2fM", (double) size / (1014 * 1024))
                     + String.format(" compressed %.2fM", (double) sizeCompressed / (1014 * 1024))
                     + String.format(" compression factor %.2f", (double) sizeCompressed / size));
         }
     }

     static class Consumer {
         private Set<Integer> values = new HashSet<>();
         int v = -1;

         public void consume(int value) {
             if (v < 0) v = value;

             if (v > value) {
                 throw new IllegalArgumentException("Current value is greater than previous: " + v + " > " + value);
             } else {
                 v = value;
                 values.remove(value);
             }
         }

         public void register(int value) {
             values.add(value);
         }

         public void finalCheck() {
             System.out.println(values.size() > 0 ? "NOT OK: " + values.size() : "OK!");
         }
     }

     public static void main(String[] args) throws IOException {
         Producer p = new Producer();
         Consumer c = new Consumer();
         Sorter sorter = new Sorter();

         sorter.doSort(p, c);
     }
 }

For random numbers it produces the following:

 bucket[0] contains: 200357 numbers, compressed size: 353679 compression factor: 0.44
 bucket[1] contains: 199465 numbers, compressed size: 352127 compression factor: 0.44
 bucket[2] contains: 199682 numbers, compressed size: 352464 compression factor: 0.44
 bucket[3] contains: 199949 numbers, compressed size: 352947 compression factor: 0.44
 bucket[4] contains: 200547 numbers, compressed size: 353914 compression factor: 0.44
 Data size: 3.85M compressed 1.70M compression factor 0.44

For a simple ascending sequence (a single bucket is used) it produces:

 bucket[0] contains: 1000000 numbers, compressed size: 256700 compression factor: 0.06
 Data size: 3.85M compressed 0.25M compression factor 0.06

EDIT

Conclusions:

  1. Don't try to fool Nature
  2. Use simpler compression with a lower memory footprint
  3. Some additional clues are really required. A common bullet-proof solution does not seem to be feasible.

Stage 2: Enhanced compression, final conclusion

As was already mentioned in the previous section, any suitable compression technique can be used. So let's get rid of LZMA in favour of a simpler and better (if possible) approach. There are a lot of good solutions, including arithmetic coding, radix trees, etc.

Anyway, a simple but useful encoding scheme will be more illustrative than yet another external library providing some nifty algorithm. The actual solution is pretty straightforward: since there are buckets with partially sorted data, deltas can be used instead of the numbers themselves.

Encoding scheme

A random input test shows slightly better results:

 bucket[0] contains: 10103 numbers, compressed size: 13683 compression factor: 0.34
 bucket[1] contains: 9885 numbers, compressed size: 13479 compression factor: 0.34
 ...
 bucket[98] contains: 10026 numbers, compressed size: 13612 compression factor: 0.34
 bucket[99] contains: 10058 numbers, compressed size: 13701 compression factor: 0.34
 Data size: 3.85M compressed 1.31M compression factor 0.34

Sample code

 public static void encode(int[] buffer, int length, BinaryOut output) {
     short size = (short)(length & 0x7FFF);

     output.write(size);
     output.write(buffer[0]);

     for(int i=1; i< size; i++) {
         int next = buffer[i] - buffer[i-1];
         int bits = getBinarySize(next);

         int len = bits;

         if(bits > 24) {
             output.write(3, 2);
             len = bits - 24;
         }else if(bits > 16) {
             output.write(2, 2);
             len = bits-16;
         }else if(bits > 8) {
             output.write(1, 2);
             len = bits - 8;
         }else{
             output.write(0, 2);
         }

         if (len > 0) {
             if ((len % 2) > 0) {
                 len = len / 2;
                 output.write(len, 2);
                 output.write(false);
             } else {
                 len = len / 2 - 1;
                 output.write(len, 2);
             }

             output.write(next, bits);
         }
     }
 }

 public static short decode(BinaryIn input, int[] buffer, int offset) {
     short length = input.readShort();
     int value = input.readInt();
     buffer[offset] = value;

     for (int i = 1; i < length; i++) {
         int flag = input.readInt(2);

         int bits;
         int next = 0;
         switch (flag) {
             case 0:
                 bits = 2 * input.readInt(2) + 2;
                 next = input.readInt(bits);
                 break;
             case 1:
                 bits = 8 + 2 * input.readInt(2) + 2;
                 next = input.readInt(bits);
                 break;
             case 2:
                 bits = 16 + 2 * input.readInt(2) + 2;
                 next = input.readInt(bits);
                 break;
             case 3:
                 bits = 24 + 2 * input.readInt(2) + 2;
                 next = input.readInt(bits);
                 break;
         }

         buffer[offset + i] = buffer[offset + i - 1] + next;
     }

     return length;
 }

Please note that this approach:

  1. does not consume a lot of memory
  2. works with streams
  3. provides not-so-bad results

The full code can be found here, and the BinaryInput and BinaryOutput implementations can be found here.

Final conclusion

No final conclusion :) Sometimes it is really a good idea to move one level up and review the task from a meta-level point of view.

It was fun to spend some time on this task. BTW, there are a lot of interesting answers below. Thank you for your attention and happy coding.

A solution is possible only because of the difference between 1 megabyte and 1 million bytes. There are roughly 2^8093730 different ways to pick 1 million 8-digit numbers with duplicates allowed and ordering unimportant, so a machine with only 1 million bytes of RAM doesn't have enough states to represent all the possibilities. But 1M (less 2k for TCP/IP) is 1022*1024*8 = 8372224 bits, so a solution is possible.

Part 1, initial solution

This approach needs a little more than 1M; I'll refine it to fit into 1M later.

I will store a compact sorted list of the numbers in the range 0 to 99999999 as a sequence of sublists of 7-bit numbers. The first sublist holds the numbers from 0 to 127, the second sublist holds the numbers from 128 to 255, and so on. 100000000/128 is exactly 781250, so 781250 such sublists are needed.

Each sublist consists of a 2-bit sublist header followed by a sublist body. The sublist body takes 7 bits per sublist entry. The sublists are all concatenated together, and the format makes it possible to tell where one sublist ends and the next begins. The total storage required for a fully populated list is 2*781250 + 7*1000000 = 8562500 bits, which is about 1.021 MB.

The 4 possible sublist header values are:

00 Empty sublist, nothing follows.

01 Singleton, there is only one entry in the sublist and the next 7 bits hold it.

10 The sublist holds at least 2 distinct numbers. The entries are stored in non-decreasing order, except that the last entry is less than or equal to the first. This allows the end of the sublist to be identified. For example, the numbers 2,4,6 would be stored as (4,6,2). The numbers 2,2,3,4,4 would be stored as (2,3,4,4,2).

11 The sublist holds 2 or more repetitions of a single number. The next 7 bits give the number. Then come zero or more 7-bit entries with the value 1, followed by a 7-bit entry with the value 0. The length of the sublist body dictates the number of repetitions. For example, the numbers 12,12 would be stored as (12,0), the numbers 12,12,12 would be stored as (12,1,0), 12,12,12,12 would be (12,1,1,0), and so on.
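
To make the format concrete, here is a small Python sketch (my own illustration, not part of the original answer; the helper names are invented) that splits a sorted input into these sublists and counts the bits of the compact list:

 def sublist_entries(values):
     """Return the 7-bit body entries for one sublist (values already sorted, all in 0..127)."""
     if not values:
         return []                                # header 00: empty, no body
     if len(values) == 1:
         return [values[0]]                       # header 01: singleton
     if values[0] == values[-1]:                  # header 11: one number repeated
         return [values[0]] + [1] * (len(values) - 2) + [0]
     return values[1:] + [values[0]]              # header 10: last entry <= first marks the end

 def compact_list_bits(sorted_numbers):
     """Total size in bits of the compact list for sorted numbers in 0..99999999."""
     buckets = [[] for _ in range(100000000 // 128)]    # 781250 sublists
     for n in sorted_numbers:
         buckets[n // 128].append(n % 128)
     return sum(2 + 7 * len(sublist_entries(b)) for b in buckets)

 import random
 nums = sorted(random.randint(0, 99999999) for _ in range(1000000))
 print(compact_list_bits(nums))   # always 2*781250 + 7*1000000 = 8562500 bits for one million values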

I start off with an empty list, read a bunch of numbers and store them as 32-bit integers, sort the new numbers in place (probably using heapsort), and then merge them into a new compact sorted list. Repeat until there are no more numbers to read, then walk the compact list once more to generate the output.

The line below represents memory just before the start of the list merge operation. The 'O's are the region that holds the sorted 32-bit integers. The 'X's are the region that holds the old compact list. The '=' signs are the expansion room for the compact list, 7 bits for each integer in the 'O's. The 'Z's are random overhead.

 ZZZOOOOOOOOOOOOOOOOOOOOOOOOOO==========XXXXXXXXXXXXXXXXXXXXXXXXXX 

The merge routine starts reading at the leftmost 'O' and at the leftmost 'X', and starts writing at the leftmost '='. The write pointer does not catch up with the compact-list read pointer until all of the new integers have been merged, because both pointers advance 2 bits for each sublist and 7 bits for each entry in the old compact list, and there is enough extra room for the 7-bit entries of the new numbers.

Part 2, cramming it into 1M

To squeeze the solution above into 1M, I need to make the compact list format a bit more compact. I'll get rid of one of the sublist types, so that there will be just 3 different sublist header values. Then I can use '00', '01' and '1' as the sublist header values and save a few bits. The sublist types are:

A Empty sublist, nothing follows.

B Singleton, there is only one entry in the sublist and the next 7 bits hold it.

C The sublist holds at least 2 distinct numbers. The entries are stored in non-decreasing order, except that the last entry is less than or equal to the first. This allows the end of the sublist to be identified. For example, the numbers 2,4,6 would be stored as (4,6,2). The numbers 2,2,3,4,4 would be stored as (2,3,4,4,2).

D The sublist holds 2 or more repetitions of a single number.

My 3 sublist header values will be used for 'A', 'B' and 'C', so I need a way to represent D-type sublists.

Suppose I have a C-type sublist header followed by 3 entries, such as "C[17][101][58]". This can't be part of a valid C-type sublist as described above, because the third entry is less than the second but greater than the first. I can use this type of construct to represent a D-type sublist. In bit terms, anywhere I have "C{00?????}{1??????}{01?????}" is an impossible C-type sublist. I will use it to represent a sublist consisting of 3 or more repetitions of a single number. The first two 7-bit words encode the number (the "N" bits below), followed by zero or more {0100001} words, followed by a {0100000} word.

 For example, 3 repetitions: "C{00NNNNN}{1NN0000}{0100000}", 4 repetitions: "C{00NNNNN}{1NN0000}{0100001}{0100000}", and so on. 

That just leaves sublists holding exactly 2 repetitions of a single number. I will represent those with yet another impossible C-type sublist pattern: "C{0??????}{11?????}{10?????}". There is plenty of room for the 7 bits of the number in the first 2 words, but this pattern is longer than the sublist it represents, which makes things a bit more complicated. The five question marks at the end can be considered not part of the pattern, so I have "C{0NNNNNN}{11N????}10" as my pattern, with the number to be repeated stored in the "N"s. That is 2 bits too long.

I will have to borrow 2 bits and pay them back later from the 4 unused bits in this pattern. When reading, on encountering "C{0NNNNNN}{11N00AB}10", output 2 instances of the number in the "N"s, overwrite the "10" at the end with the bits A and B, and rewind the read pointer by 2 bits. Destructive reads are OK for this algorithm, since each compact list only gets walked once.

When writing a sublist of 2 repetitions of a single number, write "C{0NNNNNN}11N00" and set the borrowed-bit counter to 2. On every write where the borrowed-bit counter is non-zero, it is decremented for each bit written, and "10" is written once the counter hits zero. So the next 2 bits written will go into slots A and B, and then the "10" will get dropped onto the end.

With "00", "01" and "1" representing the 3 sublist header values, I can assign "1" to the most popular sublist type. I will need a small table mapping sublist header values to sublist types, and an occurrence counter for each sublist type so that I know which sublist header mapping is best.

The worst case for a fully populated compact list occurs when all sublist types are equally popular. In that case I save 1 bit for every 3 sublist headers, so the list size is 2*781250 + 7*1000000 - 781250/3 = 8302083.3 bits. Rounding up to a 32-bit word boundary, that is 8302112 bits, or 1037764 bytes.

1M minus the 2k of TCP/IP state and buffers is 1022*1024 = 1046528 bytes, which leaves me 8764 bytes to play with.
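
A quick recomputation of that budget arithmetic, just to check the quoted figures:

 import math

 # Worst-case compact list: 2-bit headers, 7 bits per entry, minus one saved bit per 3 headers
 worst_case_bits = 2 * 781250 + 7 * 1000000 - 781250 / 3      # 8302083.3...
 padded_bits = math.ceil(worst_case_bits / 32) * 32           # round up to a 32-bit word: 8302112
 padded_bytes = padded_bits // 8                               # 1037764

 available = 1022 * 1024                                       # 1M minus 2k of TCP/IP state: 1046528
 print(padded_bytes, available - padded_bytes)                 # 1037764 8764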

But what about the process of changing the sublist header mapping? In the memory map below, "Z" is random overhead, "=" is free space and "X" is the compact list.

 ZZZ=====XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 

Start reading at the leftmost "X" and start writing at the leftmost "=", working rightwards. Once it's done, the compact list will be a little shorter, and it will be at the wrong end of memory:

 ZZZXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX======= 

So then I need to shunt it over to the right:

 ZZZ=======XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 

During the header mapping change, up to 1/3 of the sublist headers will go from 1 bit to 2 bits. In the worst case these will all be at the head of the list, so I'll need at least 781250/3 bits of free storage before I start, which takes me back to the memory requirements of the previous version of the compact list :(

To get around that, I'll split the 781250 sublists into 10 sublist groups of 78125 sublists each. Each group has its own independent sublist header mapping. Using the letters A to J for the groups:

 ZZZ=====AAAAAABBCCCCDDDDDEEEFFFGGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ 

During a sublist header mapping change, each sublist group either shrinks or stays the same:

 ZZZ=====AAAAAABBCCCCDDDDDEEEFFFGGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ
 ZZZAAAAAA=====BBCCCCDDDDDEEEFFFGGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ
 ZZZAAAAAABB=====CCCCDDDDDEEEFFFGGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ
 ZZZAAAAAABBCCC======DDDDDEEEFFFGGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ
 ZZZAAAAAABBCCCDDDDD======EEEFFFGGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ
 ZZZAAAAAABBCCCDDDDDEEE======FFFGGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ
 ZZZAAAAAABBCCCDDDDDEEEFFF======GGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ
 ZZZAAAAAABBCCCDDDDDEEEFFFGGGGGGGGGG=======HHIJJJJJJJJJJJJJJJJJJJJ
 ZZZAAAAAABBCCCDDDDDEEEFFFGGGGGGGGGGHH=======IJJJJJJJJJJJJJJJJJJJJ
 ZZZAAAAAABBCCCDDDDDEEEFFFGGGGGGGGGGHHI=======JJJJJJJJJJJJJJJJJJJJ
 ZZZAAAAAABBCCCDDDDDEEEFFFGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ=======
 ZZZ=======AAAAAABBCCCDDDDDEEEFFFGGGGGGGGGGHHIJJJJJJJJJJJJJJJJJJJJ

The worst case for sublist group expansion during a mapping change is 78125/3 = 26042 bits. If I allow 4k on top of the 1037764 bytes for a fully populated compact list, that leaves 8764 - 4096 = 4668 bytes for the "Z"s in the memory map.

That should be plenty for the 10 sublist header mapping tables, the 30 sublist header occurrence counts and the few other counters, pointers and small buffers I will need, plus the space I have used without noticing, such as stack space for function call return addresses and local variables.

Part 3, how long would it take to run?

With an empty compact list, the 1-bit list header will be used for an empty sublist, and the starting size of the list will be 781250 bits. In the worst case the list grows by 8 bits for each number added, so 32 + 8 = 40 bits of free space are needed for each of the 32-bit numbers placed at the top of the list buffer before being sorted and merged. In the worst case, changing the sublist header mapping results in a space usage of 2*781250 + 7*entries - 781250/3 bits.

With a policy of changing the sublist header mapping after every fifth merge once there are at least 800000 numbers in the list, a worst-case run would involve roughly 30M of compact-list reading and writing activity.

Resources:

http://nick.cleaton.net/ramsortsol.html

Gilmanov's answer is wrong in its assumptions. It starts speculating based on a pointless measurement of a million consecutive integers. That means no gaps. Those random gaps, however small, really make it a bad idea.

Try it yourself. Get 1 million random 27-bit integers, sort them, and compress them with 7-Zip, xz, whatever LZMA you like. The result is over 1.5 MB. The premise on top is compression of sequential numbers. Even delta encoding of that is over 1.1 MB. And never mind that this uses over 100 MB of RAM for the compression. So even the compressed integers do not fit the problem, never mind run-time RAM usage.

It saddens me how people upvote pretty graphics and rationalizations.

 #include <stdio.h>
 #include <stdint.h>
 #include <stdlib.h>
 #include <time.h>

 int32_t ints[1000000]; // Random 27-bit integers

 int cmpi32(const void *a, const void *b)
 {
     return ( *(int32_t *)a - *(int32_t *)b );
 }

 int main()
 {
     int32_t *pi = ints; // Pointer to input ints (REPLACE W/ read from net)

     // Fill pseudo-random integers of 27 bits
     srand(time(NULL));
     for (int i = 0; i < 1000000; i++)
         ints[i] = rand() & ((1<<27) - 1); // Random 32 bits masked to 27 bits

     qsort(ints, 1000000, sizeof (ints[0]), cmpi32); // Sort 1000000 int32s

     // Now delta encode, optional, store differences to previous int
     for (int i = 1, prev = ints[0]; i < 1000000; i++) {
         ints[i] -= prev;
         prev    += ints[i];
     }

     FILE *f = fopen("ints.bin", "w");
     fwrite(ints, 4, 1000000, f);
     fclose(f);
     exit(0);
 }

Now compress ints.bin with LZMA…

 $ xz -f --keep ints.bin       # 100 MB RAM
 $ 7z a ints.bin.7z ints.bin   # 130 MB RAM
 $ ls -lh ints.bin*
     3.8M ints.bin
     1.1M ints.bin.7z
     1.2M ints.bin.xz

I think one way to think about this is from a combinatorics viewpoint: how many possible combinations of sorted number orderings are there? If we give the combination 0,0,0,….,0 the code 0, and 0,0,0,…,1 the code 1, and 99999999, 99999999, … 99999999 the code N, what is N? In other words, how big is the result space?

Well, one way to think about this is noticing that this is a bijection of the problem of finding the number of monotonic paths in an N x M grid, where N = 1,000,000 and M = 100,000,000. In other words, if you have a grid that is 1,000,000 wide and 100,000,000 tall, how many shortest paths from the bottom left to the top right are there? Shortest paths of course require you only ever either move right or up (if you were to move down or left you would be undoing previously accomplished progress). To see how this is a bijection of our number sorting problem, observe the following:

You can imagine any horizontal leg in our path as a number in our ordering, where the Y location of the leg represents the value.

(figure: a monotonic path through the N x M grid, moving only right or up)

So if the path simply moves to the right all the way to the end, then jumps all the way to the top, that is equivalent to the ordering 0,0,0,…,0. If it instead begins by jumping all the way to the top and then moves to the right 1,000,000 times, that is equivalent to 99999999,99999999,…,99999999. A path where it moves right once, then up once, then right once, then up once, etc. to the very end (then necessarily jumps all the way to the top) is equivalent to 0,1,2,3,…,999999.

Luckily for us this problem has already been solved, such a grid has (N + M) Choose (M) paths:

(1,000,000 + 100,000,000) Choose (100,000,000) ~= 2.27 * 10^2436455

N thus equals 2.27 * 10^2436455, and so the code 0 represents 0,0,0,…,0 and the code 2.27 * 10^2436455 and some change represents 99999999,99999999,…, 99999999.

In order to store all the numbers from 0 to 2.27 * 10^2436455 you need lg2 (2.27 * 10^2436455) = 8.0937 * 10^6 bits.

1 megabyte = 8388608 bits > 8093700 bits
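
If you want to double-check that count without constructing a 2.4-million-digit integer, a couple of lines of Python using the log-gamma function will do (my own verification, not part of the original answer):

 from math import lgamma, log

 def log2_choose(n, k):
     # log2 of C(n, k) via log-gamma, avoiding the enormous exact binomial
     return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(2)

 bits = log2_choose(100_000_000 + 1_000_000, 1_000_000)
 print(bits)                    # roughly 8.09e6 bits, matching the figure above
 print(bits < 1024 * 1024 * 8)  # True: comfortably under the 8388608 bits in a mebibyte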

So it appears that we at least actually have enough room to store the result! Now of course the interesting bit is doing the sorting as the numbers stream in. I am not sure what the best approach to this is, given that we have 294908 bits remaining. I imagine an interesting technique would be to at each point assume that that is the entire ordering, find the code for that ordering, and then as you receive a new number go back and update the previous code. Hand wave hand wave.

My suggestions here owe a lot to Dan's solution

First off I assume the solution must handle all possible input lists. I think the popular answers do not make this assumption (which IMO is a huge mistake).

It is known that no form of lossless compression will reduce the size of all inputs.

All the popular answers assume they will be able to apply compression effective enough to allow them extra space: in effect, a chunk of extra space large enough to hold some portion of their partially completed list in uncompressed form, allowing them to perform their sorting operations. This is just a bad assumption.

For such a solution, anyone with knowledge of how they do their compression will be able to design some input data that does not compress well for this scheme, and the "solution" will most likely then break due to running out of space.

Instead I take a mathematical approach. Our possible outputs are all the lists of length LEN consisting of elements in the range 0..MAX. Here the LEN is 1,000,000 and our MAX is 100,000,000.

For arbitrary LEN and MAX, the amount of bits needed to encode this state is:

Log2(MAX Multichoose LEN)

So for our numbers, once we have completed receiving and sorting, we will need at least Log2(100,000,000 MC 1,000,000) bits to store our result in a way that can uniquely distinguish all possible outputs.

This is ~= 988kb. So we actually have enough space to hold our result. From this point of view, it is possible.

[Deleted pointless rambling now that better examples exist…]

Best answer is here .

Another good answer is here and basically uses insertion sort as the function to expand the list by one element (it buffers a few elements and pre-sorts, to allow insertion of more than one at a time, which saves a bit of time). It uses a nice compact state encoding too: buckets of seven-bit deltas.

Suppose this task is possible. Just prior to output, there will be an in-memory representation of the million sorted numbers. How many different such representations are there? Since there may be repeated numbers we can't use nCr (choose), but there is an operation called multichoose that works on multisets .

  • There are 2.2e2436455 ways to choose a million numbers in range 0..99,999,999.
  • That requires 8,093,730 bits to represent every possible combination, or 1,011,717 bytes.

So theoretically it may be possible, if you can come up with a sane (enough) representation of the sorted list of numbers. For example, an insane representation might require a 10MB lookup table or thousands of lines of code.

However, if "1M RAM" means one million bytes, then clearly there is not enough space. The fact that 5% more memory makes it theoretically possible suggests to me that the representation will have to be VERY efficient and probably not sane.

(My original answer was wrong, sorry for the bad math, see below the break.)

How about this?

The first 27 bits store the lowest number you have seen, then the difference to the next number seen, encoded as follows: 5 bits to store the number of bits used in storing the difference, then the difference. Use 00000 to indicate that you saw that number again.

This works because as more numbers are inserted, the average difference between numbers goes down, so you use less bits to store the difference as you add more numbers. I believe this is called a delta list.

The worst case I can think of is all numbers evenly spaced (by 100), e.g. assuming 0 is the first number:

 000000000000000000000000000 00111 1100100 ^^^^^^^^^^^^^ a million times 27 + 1,000,000 * (5+7) bits = ~ 427k 
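
As an illustration only (my sketch, not the answerer's code), the bit cost of this "5-bit length prefix plus difference" encoding can be tallied like this for a sorted input, with 00000 serving as the repeat marker:

 def delta_list_bits(sorted_values):
     """Bit cost of: 27 bits for the first value, then per value a 5-bit
     length field plus that many bits of difference (00000 = duplicate)."""
     bits = 27
     for prev, cur in zip(sorted_values, sorted_values[1:]):
         diff = cur - prev
         bits += 5 + (diff.bit_length() if diff else 0)   # 5-bit prefix, no extra bits for a repeat
     return bits

 import random
 values = sorted(random.randint(0, 99999999) for _ in range(1000000))
 print(delta_list_bits(values))   # total bit cost for this particular random input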

Reddit to the rescue!

If all you had to do was sort them, this problem would be easy. It takes 122k (1 million bits) to store which numbers you have seen (0th bit on if 0 was seen, 2300th bit on if 2300 was seen, etc.).

You read the numbers, store them in the bit field, and then shift the bits out while keeping a count.

BUT, you have to remember how many you have seen. I was inspired by the sublist answer above to come up with this scheme:

Instead of using one bit, use either 2 or 27 bits:

  • 00 means you did not see the number.
  • 01 means you saw it once
  • 1 means you saw it, and the next 26 bits are the count of how many times.

I think this works: if there are no duplicates, you have a 244k list. In the worst case you see each number twice (if you see one number three times, it shortens the rest of the list for you), that means you have seen 50,000 more than once, and you have seen 950,000 items 0 or 1 times.

50,000 * 27 + 950,000 * 2 = 396.7k.

You can make further improvements if you use the following encoding:

  • 0 means you did not see the number
  • 10 means you saw it once
  • 11 is how you keep count

Which will, on average, result in 280.7k of storage.

EDIT: my Sunday morning math was wrong.

The worst case is we see 500,000 numbers twice, so the math becomes:

500,000 *27 + 500,000 *2 = 1.77M

The alternate encoding results in an average storage of

500,000 * 27 + 500,000 = 1.70M

: (

There is one solution to this problem across all possible inputs. Cheat.

  1. Read m values over TCP, where m is near the max that can be sorted in memory, maybe n/4.
  2. Sort the 250,000 (or so) numbers and output them.
  3. Repeat for the other 3 quarters.
  4. Let the receiver merge the 4 lists of numbers it has received as it processes them. (It's not much slower than using a single list.)
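
On the receiving side, merging the four sorted batches in step 4 is an ordinary k-way merge; a minimal sketch of it (the toy input stands in for the four TCP batches):

 import heapq

 def merge_sorted_streams(streams, emit):
     # streams: iterables that each yield one already-sorted batch of numbers;
     # heapq.merge lazily interleaves them without loading everything at once
     for value in heapq.merge(*streams):
         emit(value)

 # toy usage with four pre-sorted quarters standing in for the four batches
 quarters = [[1, 5, 9], [2, 2, 8], [0, 7], [3, 4]]
 out = []
 merge_sorted_streams(quarters, out.append)
 print(out)   # [0, 1, 2, 2, 3, 4, 5, 7, 8, 9]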

I would try a Radix Tree . If you could store the data in a tree, you could then do an in-order traverse to transmit the data.

I'm not sure you could fit this into 1MB, but I think it's worth a try.

What kind of computer are you using? It may not have any other "normal" local storage, but does it have video RAM, for example? 1 megapixel x 32 bits per pixel (say) is pretty close to your required data input size.

(I largely ask in memory of the old Acorn RISC PC , which could 'borrow' VRAM to expand the available system RAM, if you chose a low resolution or low colour-depth screen mode!). This was rather useful on a machine with only a few MB of normal RAM.

A radix tree representation would come close to handling this problem, since the radix tree takes advantage of "prefix compression". But it's hard to conceive of a radix tree representation that could represent a single node in one byte — two is probably about the limit.

But, regardless of how the data is represented, once it is sorted it can be stored in prefix-compressed form, where the numbers 10, 11, and 12 would be represented by, say 001b, 001b, 001b, indicating an increment of 1 from the previous number. Perhaps, then, 10101b would represent an increment of 5, 1101001b an increment of 9, etc.

I have a computer with 1M of RAM and no other local storage

Another way to cheat: you could use non-local (networked) storage instead (your question does not preclude this) and call a networked service that could use straightforward disk-based mergesort (or just enough RAM to sort in-memory, since you only need to accept 1M numbers), without needing the (admittedly extremely ingenious) solutions already given.

This might be cheating, but it's not clear whether you are looking for a solution to a real-world problem, or a puzzle that invites bending of the rules… if the latter, then a simple cheat may get better results than a complex but "genuine" solution (which as others have pointed out, can only work for compressible inputs).

There are 10^6 values in a range of 10^8, so there's one value per hundred code points on average. Store the distance from the Nth point to the (N+1)th. Duplicate values have a skip of 0. This means that the skip needs an average of just under 7 bits to store, so a million of them will happily fit into our 8 million bits of storage.

These skips need to be encoded into a bitstream, say by Huffman encoding. Insertion is by iterating through the bitstream and rewriting after the new value. Output by iterating through and writing out the implied values. For practicality, it probably wants to be done as, say, 10^4 lists covering 10^4 code points (and an average of 100 values) each.

A good Huffman tree for random data can be built a priori by assuming a Poisson distribution (mean=variance=100) on the length of the skips, but real statistics can be kept on the input and used to generate an optimal tree to deal with pathological cases.
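
To get a feel for such an a-priori tree, here is a small sketch (my own, with invented helper names) that builds a Huffman code from a Poisson(mean = 100) prior over skip lengths and reports the expected code length:

 import heapq, math

 def poisson_pmf(mean, k):
     return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

 def huffman_code_lengths(probs):
     # probs: {symbol: probability}; returns {symbol: code length in bits}
     heap = [(p, i, (sym,)) for i, (sym, p) in enumerate(probs.items())]
     heapq.heapify(heap)
     lengths = {sym: 0 for sym in probs}
     while len(heap) > 1:
         p1, _, syms1 = heapq.heappop(heap)
         p2, i, syms2 = heapq.heappop(heap)
         for s in syms1 + syms2:
             lengths[s] += 1                      # every symbol under the merged node gets one bit deeper
         heapq.heappush(heap, (p1 + p2, i, syms1 + syms2))
     return lengths

 # skips 0..500 cover essentially all of the Poisson(100) mass
 probs = {k: poisson_pmf(100, k) for k in range(501)}
 lengths = huffman_code_lengths(probs)
 expected = sum(probs[k] * lengths[k] for k in probs)
 print(expected)   # average code length under this prior, comfortably under a flat 7-bit field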

I think the solution is to combine techniques from video encoding, namely the discrete cosine transformation. In digital video, rather than recording the brightness or colour of the video as regular values such as 110 112 115 116, each is subtracted from the last (similar to run-length encoding). 110 112 115 116 becomes 110 2 3 1. The values 2 3 1 require fewer bits than the originals.

So let's say we create a list of the input values as they arrive on the socket. We store in each element, not the value, but the offset from the one before it. We sort as we go, so the offsets are only going to be positive. But the offset could be 8 decimal digits wide, which fits in 3 bytes. We can't spend 3 bytes on every element, so we need to pack these. We could use the top bit of each byte as a "continue bit", indicating that the next byte is part of the number and the lower 7 bits of each byte need to be combined. Zero is valid for duplicates.

As the list fills up, the numbers should get closer together, meaning on average only 1 byte is needed to encode the distance to the next value. 7 bits of value and 1 continue bit is convenient, but there may be a sweet spot that requires fewer than 8 bits for a "continue" value.

Anyway, I did some experimenting. I use a random number generator and I can fit a million sorted 8-digit decimal numbers into about 1279000 bytes. The average space between each number is consistently 99…

 import java.io.*;
 import java.util.*;

 public class Test {
     public static void main(String[] args) throws IOException {
         // 1 million values
         int[] values = new int[1000000];

         // create random values up to 8 digits long
         Random random = new Random();
         for (int x = 0; x < values.length; x++) {
             values[x] = random.nextInt(100000000);
         }

         Arrays.sort(values);

         ByteArrayOutputStream baos = new ByteArrayOutputStream();

         int av = 0;
         writeCompact(baos, values[0]); // first value
         for (int x = 1; x < values.length; x++) {
             int v = values[x] - values[x - 1]; // difference
             av += v;
             System.out.println(values[x] + " diff " + v);
             writeCompact(baos, v);
         }

         System.out.println("Average offset " + (av / values.length));
         System.out.println("Fits in " + baos.toByteArray().length);
     }

     public static void writeCompact(OutputStream os, long value) throws IOException {
         do {
             int b = (int) value & 0x7f;
             value = (value & 0x7fffffffffffffffl) >> 7;
             os.write(value == 0 ? b : (b | 0x80));
         } while (value != 0);
     }
 }

We could play with the networking stack to send the numbers in sorted order before we have all the numbers. If you send 1M of data, TCP/IP will break it into 1500 byte packets and stream them in order to the target. Each packet will be given a sequence number.

We can do this by hand. Just before we fill our RAM we can sort what we have and send the list to our target, but leave holes in our sequence around each number. Then process the second half of the numbers the same way, using those holes in the sequence.

The networking stack on the far end will assemble the resulting data stream in order of sequence before handing it up to the application.

It's using the network to perform a merge sort. This is a total hack, but I was inspired by the other networking hack listed previously.

Google's (bad) approach, from the HN thread. Store RLE-style counts.

Your initial data structure is '99999999:0' (all zeros, no numbers seen yet). Then let's say you see the number 3,866,344, so your data structure becomes '3866343:0,1:1,96133654:0'. As you can see, the counts always alternate between runs of zero bits and runs of '1' bits, so you can just assume the odd-position numbers represent runs of 0 bits and the even-position numbers runs of 1 bits. This becomes (3866343,1,96133654).

Their problem doesn't seem to cover duplicates, but let's say they use "0:1" for duplicates.

Big problem #1: insertions for 1M integers would take ages.

Big problem #2: like all plain delta encoding solutions, some distributions can't be covered this way. For example, 1m integers with distances of 0:99 (e.g. +99 each). Now think of the same but with random distances in the range 0:99. (Note: 99999999/1000000 = 99.99)

Google's approach is both unworthy (slow) and incorrect. But in their defense, their problem might have been slightly different.

To represent the sorted array one can just store the first element and the difference between adjacent elements. In this way we are concerned with encoding 10^6 elements that can sum up to at most 10^8. Let's call this D . To encode the elements of D one can use a Huffman code . The dictionary for the Huffman code can be created on the go and the array updated every time a new item is inserted in the sorted array (insertion sort). Note that when the dictionary changes because of a new item the whole array should be updated to match the new encoding.

The average number of bits for encoding each element of D is maximized if we have an equal number of each unique element. Say the elements d1, d2, …, dN in D each appear F times. In that case (in the worst case we have both 0 and 10^8 in the input sequence) we have

sum(1 <= i <= N) F * di = 10^8

where

sum(1 <= i <= N) F = 10^6, or F = 10^6/N, and the normalized frequency will be p = F/10^6 = 1/N

The average number of bits will be -log2(p) = log2(N). Under these circumstances we should find a case that maximizes N. This happens if we have consecutive numbers for di starting from 0, i.e. di = i - 1, therefore

10^8 = sum(1 <= i <= N) F * di = sum(1 <= i <= N) (10^6/N) * (i - 1) = (10^6/N) * N(N-1)/2 = 10^6 * (N-1)/2

so N - 1 <= 200, i.e. N <= 201. For this case the average number of bits is log2(201) = 7.6511, which means we will need around 1 byte per input element to save the sorted array. Note that this doesn't mean D in general cannot have more than 201 elements. It just shows that if the elements of D are uniformly distributed, it cannot have more than 201 unique values.

I would exploit the retransmission behaviour of TCP.

  1. Make the TCP component create a large receive window.
  2. Receive some amount of packets without sending an ACK for them.
    • Process those in passes creating some (prefix) compressed data structure
    • Send duplicate ack for last packet that is not needed anymore/wait for retransmission timeout
    • Goto 2
  3. All packets were accepted

This assumes some kind of benefit of buckets or multiple passes.

Probably by sorting the batches/buckets and merging them. -> radix trees

Use this technique to accept and sort the first 80%, then read the last 20% and verify that the last 20% do not contain numbers that would land among the first 20% of the lowest numbers. Then send the 20% lowest numbers, remove them from memory, accept the remaining 20% of new numbers and merge.

We have 1 MB – 3 KB RAM = 2^23 – 3*2^13 bits = 8388608 – 24576 = 8364032 bits available.

We are given 10^6 numbers in a 10^8 range. This gives an average gap of ~100 < 2^7 = 128

Let's first consider the simpler problem of fairly evenly spaced numbers when all gaps are < 128. This is easy. Just store the first number and the 7-bit gaps:

(27 bits) + 10^6 7-bit gap numbers = 7000027 bits required

Note repeated numbers have gaps of 0.

But what if we have gaps larger than 127?

OK, let's say a gap size < 127 is represented directly, but a gap size of 127 is followed by a continuous 8-bit encoding for the actual gap length:

   10xxxxxx xxxxxxxx           = 127 .. 16,383
   110xxxxx xxxxxxxx xxxxxxxx  = 16384 .. 2,097,151

And so on.

Note this number representation describes its own length so we know when the next gap number starts.

With just small gaps < 127, this still requires 7000027 bits.

There can be up to (10^8)/(2^7) = 781250 23-bit gap numbers, requiring an extra 16*781,250 = 12,500,000 bits, which is too much. We need a more compact and slowly increasing representation of gaps.

The average gap size is 100 so if we reorder them as [100, 99, 101, 98, 102, …, 2, 198, 1, 199, 0, 200, 201, 202, …] and index this with a dense binary Fibonacci base encoding with no pairs of zeros (for example, 11011=8+5+2+1=16) with numbers delimited by '00' then I think we can keep the gap representation short enough, but it needs more analysis.

If the input stream could be received a few times this would be much easier (there is no information about that; it is an idea and a time-performance problem).

Then, we could count the decimal values. With counted values it would be easy to make the output stream. Compress by counting the values. It depends what would be in the input stream.

Sorting is a secondary problem here. As others have said, just storing the integers is hard, and cannot work on all inputs, since 27 bits per number would be necessary.

My take on this is: store only the differences between the consecutive (sorted) integers, as they will be most likely small. Then use a compression scheme, eg with 2 additional bits per input number, to encode how many bits the number is stored on. Something like:

 00 -> 5 bits
 01 -> 11 bits
 10 -> 19 bits
 11 -> 27 bits

It should be possible to store a fair number of possible input lists within the given memory constraint. The maths of how to pick the compression scheme to have it work on the maximum number of inputs, are beyond me.

I hope you may be able to exploit domain-specific knowledge of your input to find a good enough integer compression scheme based on this.

Oh and then, you do an insertion sort on that sorted list as you receive data.
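
Purely as an illustration of the scheme above (my own sketch; the 5/11/19/27-bit classes are taken from the table), here is how its storage cost could be tallied for a sorted input:

 def two_bit_tag_cost(sorted_values):
     """Bit cost of the scheme above: per delta, a 2-bit tag selecting a
     5-, 11-, 19- or 27-bit field, plus the field itself."""
     classes = [5, 11, 19, 27]
     bits = 27                                    # the first value, stored in full
     for prev, cur in zip(sorted_values, sorted_values[1:]):
         delta = cur - prev
         width = next(w for w in classes if delta < (1 << w))
         bits += 2 + width
     return bits

 import random
 values = sorted(random.randint(0, 99999999) for _ in range(1000000))
 print(two_bit_tag_cost(values) / 8, "bytes")     # depends heavily on the gap distribution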

Now aiming at an actual solution, covering all possible cases of input in the 8-digit range with only 1MB of RAM. NOTE: work in progress, will continue tomorrow. Using arithmetic coding of deltas of the sorted ints, the worst case for 1M sorted ints would cost about 7 bits per entry (since 99999999/1000000 is 99, and log2(99) is almost 7 bits).

But you need the 1m integers sorted to get to 7 or 8 bits! Shorter series would have bigger deltas, therefore more bits per element.

I'm working on taking as many as possible and compressing (almost) in place. The first batch of close to 250K ints would need about 9 bits each at best. So the result would take about 275KB. Repeat with the remaining free memory a few times. Then decompress-merge-in-place-compress those compressed chunks. This is quite hard, but possible. I think.

The merged lists would get closer and closer to the 7bit per integer target. But I don't know how many iterations it would take of the merge loop. Perhaps 3.

But the imprecision of the arithmetic coding implementation might make it impossible. If this problem is possible at all, it would be extremely tight.

Any volunteers?

You just need to store the differences between the numbers in sequence, and use an encoding to compress these sequence numbers. We have 2^23 bits. We shall divide it into 6-bit chunks, and let the last bit indicate whether the number extends into another 6 bits (5 bits plus the extending chunk).

Thus, 000010 is 1, and 000100 is 2. 000001100000 is 128. Now, we consider the worst case in representing differences in a sequence of numbers up to 10,000,000. There can be 10,000,000/2^5 differences greater than 2^5, 10,000,000/2^10 differences greater than 2^10, 10,000,000/2^15 differences greater than 2^15, etc.

So, we add up how many bits it will take to represent our sequence. We have 1,000,000*6 + roundup(10,000,000/2^5)*6 + roundup(10,000,000/2^10)*6 + roundup(10,000,000/2^15)*6 + roundup(10,000,000/2^20)*4 = 7935479.

2^23 = 8388608. Since 8388608 > 7935479, we should easily have enough memory. We will probably need another little bit of memory to keep track of where we are when we insert new numbers. We then go through the sequence, find where to insert our new number, decrease the next difference if necessary, and shift everything after it to the right.

If we don't know anything about those numbers, we are limited by the following constraints:

  • we need to load all numbers before we can sort them,
  • the set of numbers is not compressible.

If these assumptions hold, there is no way to carry out your task, as you will need at least 26,575,425 bits of storage (3,321,929 bytes).

What can you tell us about your data?

The trick is to represent the algorithm's state, which is an integer multi-set, as a compressed stream of "increment counter"="+" and "output counter"="!" characters. For example, the set {0,3,3,4} would be represented as "!+++!!+!", followed by any number of "+" characters. To modify the multi-set you stream out the characters, keeping only a constant amount decompressed at a time, and make changes in place before streaming them back in compressed form.

Details

We know there are exactly 10^6 numbers in the final set, so there are at most 10^6 "!" characters. We also know that our range has size 10^8, meaning there are at most 10^8 "+" characters. The number of ways we can arrange 10^6 "!"s amongst 10^8 "+"s is (10^8 + 10^6) choose 10^6, and so specifying some particular arrangement takes ~0.965 MiB of data. That'll be a tight fit.

We can treat each character as independent without exceeding our quota. There are exactly 100 times more "+" characters than "!" characters, which simplifies to 100:1 odds of each character being a "+" if we forget that they are dependent. Odds of 100:101 corresponds to ~0.08 bits per character , for an almost identical total of ~0.965 MiB (ignoring the dependency has a cost of only ~12 bits in this case!).
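
A two-line check of that per-character figure (my own arithmetic, using the 1-in-101 chance that any given character is a "!"):

 from math import log2

 p = 1 / 101                                     # probability that a character is "!"
 h = -(p * log2(p) + (1 - p) * log2(1 - p))      # entropy per character, ~0.08 bits
 total = h * (100_000_000 + 1_000_000)           # over all "+" and "!" characters
 print(h, total / 8 / 2**20, "MiB")              # about 0.965 MiB, as claimed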

The simplest technique for storing independent characters with known prior probability is Huffman coding. Note that we would need an impractically large tree (a Huffman tree for blocks of 10 characters has an average cost per block of about 2.4 bits, for a total of ~2.9 MiB; a Huffman tree for blocks of 20 characters has an average cost per block of about 3 bits, which is a total of ~1.8 MiB; we're probably going to need a block size on the order of a hundred, implying more nodes in our tree than all the computer equipment that has ever existed can store). However, ROM is technically "free" according to the problem, and practical solutions that take advantage of the regularity in the tree will look essentially the same.

Pseudo-code

  • Have a sufficiently large huffman tree (or similar block-by-block compression data) stored in ROM
  • Start with a compressed string of 10^8 "+" characters.
  • To insert the number N, stream out the compressed string until N "+" characters have gone past then insert a "!". Stream the recompressed string back over the previous one as you go, keeping a constant amount of buffered blocks to avoid over/under-runs.
  • Repeat one million times: [input, stream decompress>insert>compress], then decompress to output
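
Ignoring the compression layer entirely, a toy version of the insert and output steps on a raw "+"/"!" string looks like this (my own sketch; a real implementation would stream compressed blocks instead of holding the whole string in memory):

 def insert(state, n):
     # state: bytearray of b"+" and b"!"; insert number n by placing a "!"
     # right after the n-th "+" character
     pluses, pos = 0, 0
     while pluses < n:
         if state[pos] == ord('+'):
             pluses += 1
         pos += 1
     state[pos:pos] = b'!'

 def output(state):
     value = 0
     for ch in state:
         if ch == ord('+'):
             value += 1
         else:
             yield value          # each "!" emits the current counter value

 state = bytearray(b'+' * 10)     # toy range with 10 "+" characters instead of 10^8
 for n in [3, 0, 3, 7]:
     insert(state, n)
 print(list(output(state)))       # [0, 3, 3, 7]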

While receiving the stream, do these steps.

First, set some reasonable chunk size.

Pseudo Code idea:

  1. The first step would be to find all the duplicates, stick them in a dictionary with their counts, and remove them.
  2. The next step would be to place the numbers that form arithmetic sequences into special counter dictionaries with the first number and their step, like n, n+1 … ,n+2, 2n, 2n+1, 2n+2…
  3. Begin to compress in chunks some reasonable ranges of numbers, like every 1000 or every 10000, for the remaining numbers that appear to repeat less often.
  4. Uncompress that range if a number is found and add it to the range and leave it uncompressed for a while longer.
  5. Otherwise just add that number to a byte[chunkSize]

Continue the first 4 steps while receiving the stream. The final step would be to either fail if you exceeded memory, or, once all the data is collected, start outputting the result: sort the ranges and spit out the results in order, uncompressing those that need to be uncompressed and sorting them when you get to them.

This is assuming base 10, and as an example your memory is using 8-bit words: memory-map the entire range of numbers using 3-bit increments. The first 3 bits would correspond to the number 0. The second set of 3 bits would map to the number 1. The three hundred thousandth set of 3 bits would map to the number 300k. Repeat this until you have mapped out all the 8-digit numbers. This would total 375k bytes in total if the memory range was contiguous.

The 1st bit out of the 3 would mark the presence of the number. The next 2 bits would indicate the number of bytes (1..3) used to represent the duplicate count; if there are none, the duplicates field would be 00. There will be a second list that uses a counter that increments each time a 3-bit field is marked as having a duplicate. If it is marked with 1 it will have a single byte range to count the amount of duplicates it has. 8 bits can represent a range of 255.

As I'm losing track of my thoughts: the second list will keep track of how many duplicates there are for each number. If the 255th number has a duplicate and is the first number to have a duplicate, its index in the list will be 0. If 23,543 is the second number to have a duplicate, its index will be 1. Wash, rinse and repeat.

Worst case scenario is you have 500k numbers with duplicates. This can be represented by a single byte (since 1 fits in easily). So 375kB (ideally) + 500kB is close to 0.875MB. Depending on your processor this should leave enough room left over for pointers, indexing and all of the other fun stuff.

If you have a single number that has 1M duplicates, all you need is 3 bytes; since you're limited to 1M numbers, that's all you have to worry about. So on your second list it will just be 3 bytes with the total amount.

Now for the fun part. The second list will need to be sorted for each new number that comes in. In the 3-bit field, the last 2 are the number of bytes that contain the number of duplicates. Since the second list is expected to be in order it will need to be sorted, and since the number of bytes can vary, think insertion sort!

This would keep the amount of pointers and things you need to increment to a minimum so you should have a little bit of flexibility with the maybe 250k bytes left.

Good luck! This sounds so much more elegant in my mind…