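O(n log n) algorithm: find three evenly spaced ones within a binary string

I had this question on an Algorithms test yesterday, and I can't figure out the answer. It is driving me absolutely crazy, because it was worth about 40 points. I figure that most of the class didn't solve it correctly, because I haven't come up with a solution in the past 24 hours.

Given an arbitrary binary string of length n, find three evenly spaced ones within the string if they exist. Write an algorithm which solves this in O(n * log(n)) time.

So strings like these have three ones that are "evenly spaced": 11100000, 0100100100

Edit: It is a random number, so it should be able to work for any number. The examples I gave were to illustrate the "evenly spaced" property. So 1001011 is a valid number, with 1, 4, and 7 being ones that are evenly spaced.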

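Finally! Following up leads in sdcvvc's answer, we have it: the O(n log n) algorithm for the problem! It is simple too, once you understand it. Those who guessed FFT were right.

The problem: we are given a binary string S of length n, and we want to find three evenly spaced 1s in it. For example, S may be 110110010, where n = 9. It has evenly spaced 1s at positions 2, 5, and 8.

  1. Scan S left to right, and make a list L of the positions of the 1s. For the S=110110010 above, we have the list L = [1, 2, 4, 5, 8]. This step is O(n). The problem is now to find an arithmetic progression of length 3 in L, i.e. to find distinct a, b, c in L such that b-a = c-b, or equivalently a+c = 2b. For the example above, we want to find the progression (2, 5, 8).

  2. Make a polynomial p with a term x^k for each k in L. For the example above, we make the polynomial p(x) = x + x^2 + x^4 + x^5 + x^8. This step is O(n).

  3. Find the polynomial q = p^2, using the Fast Fourier Transform. For the example above, we get the polynomial q(x) = x^16 + 2x^13 + 2x^12 + 3x^10 + 4x^9 + x^8 + 2x^7 + 4x^6 + 2x^5 + x^4 + 2x^3 + x^2. This step is O(n log n).

  4. Ignore all terms except those corresponding to x^(2k) for some k in L. For the example above, we get the terms x^16, 3x^10, x^8, x^4, x^2. This step is O(n), if you choose to do it at all.

Here's the crucial point: the coefficient of any x^(2b) for b in L is precisely the number of pairs (a,c) in L such that a+c = 2b. [CLRS, Ex. 30.1-7] One such pair is always (b,b) (so the coefficient is at least 1), but if there exists any other pair (a,c), then the coefficient is at least 3, from (a,c) and (c,a). For the example above, the coefficient of x^10 is 3 precisely because of the AP (2,5,8). (These coefficients of x^(2b) will always be odd numbers, for the reasons above, and all other coefficients in q will always be even.)

So then, the algorithm is to look at the coefficients of these terms x^(2b), and see if any of them is greater than 1. If there is none, then there are no evenly spaced 1s. If there is a b in L for which the coefficient of x^(2b) is greater than 1, then we know that there is some pair (a,c), other than (b,b), for which a+c = 2b. To find the actual pair, we simply try each a in L (the corresponding c would be 2b-a) and see if there is a 1 at position 2b-a in S. This step is O(n).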
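The four steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the answerer's code: the function names (`fft`, `square_poly`, `find_evenly_spaced_ones_fft`) are made up for this sketch, positions are 0-based (so the example triple (2, 5, 8) is reported as (1, 4, 7)), and the hand-written recursive FFT uses floating point, so it is only trustworthy for modest n.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = -1 if invert else 1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def square_poly(coeffs):
    """Coefficients of p(x)^2, where coeffs[k] is the coefficient of x^k."""
    size = 1
    while size < 2 * len(coeffs):
        size *= 2
    fa = fft([complex(c) for c in coeffs] + [0j] * (size - len(coeffs)))
    prod = fft([v * v for v in fa], invert=True)
    # The unnormalised inverse transform needs a division by size.
    return [round(v.real / size) for v in prod]

def find_evenly_spaced_ones_fft(s):
    """Return 0-based positions (a, b, c) with a + c == 2 * b, or None."""
    p = [1 if ch == '1' else 0 for ch in s]   # steps 1 and 2
    if sum(p) < 3:
        return None
    q = square_poly(p)                        # step 3
    for b, bit in enumerate(p):               # step 4
        if bit and q[2 * b] > 1:
            # Some pair other than (b, b) sums to 2b; locate it in O(n).
            for a in range(b):
                c = 2 * b - a
                if c < len(s) and p[a] and p[c]:
                    return (a, b, c)
    return None

print(find_evenly_spaced_ones_fft("110110010"))
```

Squaring via an FFT is what keeps the whole procedure at O(n log n); everything else is a linear scan.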

That's all, folks.


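One might ask: do we need to use FFT? Many answers, such as beta's, flybywire's, and rsp's, suggest that the approach that checks each pair of 1s and sees if there is a 1 at the "third" position might work in O(n log n), based on the intuition that if there are too many 1s, we would find a triple easily, and if there are too few 1s, checking all pairs takes little time. Unfortunately, while this intuition is correct and the simple approach is better than O(n^2), it is not significantly better. As in sdcvvc's answer, we can take the "Cantor-like set" of strings of length n = 3^k, with 1s at the positions whose ternary representation has only 0s and 2s (no 1s) in it. Such a string has 2^k = n^((log 2)/(log 3)) ≈ n^0.63 ones in it and no evenly spaced 1s, so checking all pairs would be of the order of the square of the number of 1s in it: that's 4^k ≈ n^1.26, which unfortunately is asymptotically much larger than (n log n). In fact, the worst case is even worse: Leo Moser in 1953 constructed (effectively) such strings which have n^(1-c/√(log n)) 1s in them but no evenly spaced 1s, which means that on such strings the simple approach would take Θ(n^(2-2c/√(log n))), only a tiny bit better than Θ(n^2), surprisingly!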
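To make the Cantor-like counterexample concrete, here is a small sketch that builds the string of length 3^k and checks by brute force that it has 2^k ones and no evenly spaced triple. The helper names `cantor_like_string` and `has_evenly_spaced_ones` are hypothetical, and positions are taken as 0-based.

```python
def cantor_like_string(k):
    """String of length 3**k with a 1 wherever the 0-based position
    has no digit 1 in its ternary representation."""
    def no_digit_one(x):
        while x:
            if x % 3 == 1:
                return False
            x //= 3
        return True
    return ''.join('1' if no_digit_one(pos) else '0' for pos in range(3 ** k))

def has_evenly_spaced_ones(s):
    """Quadratic pair check: for every pair a < b of 1s, look for a 1 at 2b - a."""
    ones = [i for i, ch in enumerate(s) if ch == '1']
    one_set = set(ones)
    return any(2 * b - a in one_set
               for i, a in enumerate(ones)
               for b in ones[i + 1:])

for k in range(1, 6):
    s = cantor_like_string(k)
    print(k, len(s), s.count('1'), has_evenly_spaced_ones(s))
```

On these inputs the pair check never finds a triple, so it runs through all of the roughly 4^k / 2 pairs before giving up.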


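About the maximum number of 1s in a string of length n with no 3 evenly spaced ones (which we saw above was at least n^0.63 from the easy Cantor-like construction, and at least n^(1-c/√(log n)) with Moser's construction): this is OEIS A003002. It can also be calculated directly from OEIS A065825 as the k such that A065825(k) ≤ n < A065825(k+1). I wrote a program to find these, and it turns out that the greedy algorithm does not give the longest such string. For example, for n = 9, we can get 5 1s (110100011) but the greedy gives only 4 (110110000); for n = 26 we can get 11 1s (11001010001000010110001101) but the greedy gives only 8 (11011000011011000000000000); and for n = 74 we can get 22 1s (11000010110001000001011010001000000000000000010001011010000010001101000011) but the greedy gives only 16 (11011000011011000000000000011011000011011000000000000000000000000000000000). They do agree at quite a few places up to 50 (e.g. all of 38 to 50), though. As the OEIS references say, it seems that Jaroslaw Wroblewski is interested in this question, and he maintains a website on these non-averaging sets. The exact numbers are known only up to 194.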
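The gap between greedy and optimal can be reproduced for small n with an exhaustive search. The sketch below is not the program mentioned above; `max_ones` and `greedy_ones` are hypothetical names, and the exhaustive search is exponential, so it is only usable for very short strings.

```python
from itertools import combinations

def has_evenly_spaced(positions):
    """True if some a < c in positions has its midpoint in positions too."""
    chosen = set(positions)
    return any((a + c) % 2 == 0 and (a + c) // 2 in chosen
               for a, c in combinations(sorted(positions), 2))

def max_ones(n):
    """Largest number of 1s in a length-n string with no evenly spaced triple
    (exhaustive search over subsets, largest first)."""
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if not has_evenly_spaced(subset):
                return size
    return 0

def greedy_ones(n):
    """Scan left to right, placing a 1 whenever it creates no triple."""
    chosen = []
    for pos in range(n):
        if not has_evenly_spaced(chosen + [pos]):
            chosen.append(pos)
    return len(chosen)

print(max_ones(9), greedy_ones(9))
```

For n = 9 this reports 5 for the exhaustive search and 4 for the greedy scan, matching the figures quoted above.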

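Your problem is called AVERAGE in this paper (1999):

A problem is 3SUM-hard if there is a sub-quadratic reduction from the problem 3SUM: Given a set A of n integers, are there elements a, b, c in A such that a + b + c = 0? It is not known whether AVERAGE is 3SUM-hard. However, there is a simple linear-time reduction from AVERAGE to 3SUM, whose description we omit.

Wikipedia:

When the integers are in the range [-u ... u], 3SUM can be solved in time O(n + u lg u) by representing S as a bit vector and performing a convolution using FFT.

This is enough to solve your problem :).

What is very important is that the O(n log n) complexity is in terms of the number of zeroes and ones, not the count of ones (which could be given as an array, like [1,5,9,15]). Checking whether a set has an arithmetic progression, in terms of the number of 1s, is hard, and according to that paper, as of 1999 no algorithm faster than O(n^2) is known, and it is conjectured that none exists. Everybody who doesn't take this into account is attempting to solve an open problem.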

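Other interesting info, mostly irrelevant:

Lower bound:

An easy lower bound is the Cantor-like set (the numbers 1..3^n-1 not containing 1 in their ternary expansion); its density is n^(log_3 2) (the exponent is about 0.631). So checking whether the set is too large, and otherwise checking all pairs, is not enough to get O(n log n). You have to investigate the sequence more cleverly. A better lower bound is quoted here: it is n^(1-c/(log(n))^(1/2)). This means the Cantor set is not optimal.

Upper bound, my old algorithm:

It is known that for large n, a subset of {1,2,...,n} not containing an arithmetic progression has at most n/(log n)^(1/20) elements. The paper On triples in arithmetic progression proves more: the set cannot contain more than n * 2^28 * (log log n / log n)^(1/2) elements. So you could check whether that bound is exceeded and, if not, naively check pairs. This is an O(n^2 * log log n / log n) algorithm, faster than O(n^2). Unfortunately "On triples..." is on Springer, but the first page is available, and Ben Green's exposition is available here, page 28, theorem 24.

By the way, the papers are from 1999, the same year as the first one I mentioned, so that's probably why the first one doesn't mention that result.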

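This is not a solution, but a similar line of thought to what Olexiy was thinking.

I was playing around with creating sequences with the maximum number of ones, and they are all quite interesting. I got up to 125 digits, and here are the first 3 numbers it found by attempting to insert as many '1' bits as possible: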

  • 11011000011011000000000000001101100001101100000000000000000000000000000000000000000110110000110110000000000000011011000011011
  • 10110100010110100000000000010110100010110100000000000000000000000000000000000000000101101000101101000000000000101101000101101
  • 10011001010011001000000000010011001010011001000000000000000000000000000000000000010011001010011001000000000010011001010011001

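Notice they are all fractals (not too surprising given the constraints). There may be something in thinking backwards: perhaps if the string is not a fractal with a characteristic, then it must have a repeating pattern?

Thanks to beta for the better term to describe these numbers.

Update: Alas, it looks like the pattern breaks down when starting with a large enough initial string, such as 10000000000001: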



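I suspect that a simple approach that looks like O(n^2) will actually yield something better, like O(n ln(n)). The sequences that take the longest to test (for any given n) are the ones that contain no trios, and that puts severe restrictions on the number of 1's that can be in the sequence.

I've come up with some hand-waving arguments, but I haven't been able to find a tidy proof. I'm going to take a stab in the dark: the answer is a very clever idea that the professor has known for so long that it's come to seem obvious, but it's much too hard for the students. (Either that or you slept through the lecture that covered it.)

Revision: 2009-10-17 23:00

I've run this on large numbers (like strings of 20 million) and I now believe that this algorithm is not O(n log n). Notwithstanding that, it's a cool enough implementation and contains a number of optimizations that make it run really fast. It evaluates all the arrangements of binary strings of 24 or fewer digits in under 25 seconds.

I've updated the code to include the 0 <= L < M < U <= X-1 observation from earlier today.


Original

This is, in concept, similar to another question I answered. That code also looked at three values in a series and determined if a triplet satisfied a condition. Here is C# code adapted from that: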

    using System;
    using System.Collections.Generic;

    namespace StackOverflow1560523
    {
        class Program
        {
            public struct Pair<T>
            {
                public T Low, High;
            }

            static bool FindCandidate(int candidate, List<int> arr, List<int> pool, Pair<int> pair, ref int iterations)
            {
                int lower = pair.Low, upper = pair.High;
                while ((lower >= 0) && (upper < pool.Count))
                {
                    int lowRange = candidate - arr[pool[lower]];
                    int highRange = arr[pool[upper]] - candidate;
                    iterations++;
                    if (lowRange < highRange)
                        lower -= 1;
                    else if (lowRange > highRange)
                        upper += 1;
                    else
                        return true;
                }
                return false;
            }

            static List<int> BuildOnesArray(string s)
            {
                List<int> arr = new List<int>();
                for (int i = 0; i < s.Length; i++)
                    if (s[i] == '1')
                        arr.Add(i);
                return arr;
            }

            static void BuildIndexes(List<int> arr,
                                     ref List<int> even, ref List<int> odd,
                                     ref List<Pair<int>> evenIndex, ref List<Pair<int>> oddIndex)
            {
                for (int i = 0; i < arr.Count; i++)
                {
                    bool isEven = (arr[i] & 1) == 0;
                    if (isEven)
                    {
                        evenIndex.Add(new Pair<int> {Low=even.Count-1, High=even.Count+1});
                        oddIndex.Add(new Pair<int> {Low=odd.Count-1, High=odd.Count});
                        even.Add(i);
                    }
                    else
                    {
                        oddIndex.Add(new Pair<int> {Low=odd.Count-1, High=odd.Count+1});
                        evenIndex.Add(new Pair<int> {Low=even.Count-1, High=even.Count});
                        odd.Add(i);
                    }
                }
            }

            static int FindSpacedOnes(string s)
            {
                // List of indexes of 1s in the string
                List<int> arr = BuildOnesArray(s);
                //if (s.Length < 3)
                //    return 0;

                // List of indexes to odd indexes in arr
                List<int> odd = new List<int>(), even = new List<int>();

                // evenIndex has indexes into arr to bracket even numbers
                // oddIndex has indexes into arr to bracket odd numbers
                List<Pair<int>> evenIndex = new List<Pair<int>>(),
                    oddIndex = new List<Pair<int>>();
                BuildIndexes(arr, ref even, ref odd, ref evenIndex, ref oddIndex);

                int iterations = 0;
                for (int i = 1; i < arr.Count-1; i++)
                {
                    int target = arr[i];
                    bool found = FindCandidate(target, arr, odd, oddIndex[i], ref iterations) ||
                                 FindCandidate(target, arr, even, evenIndex[i], ref iterations);
                    if (found)
                        return iterations;
                }
                return iterations;
            }

            static IEnumerable<string> PowerSet(int n)
            {
                for (long i = (1L << (n-1)); i < (1L << n); i++)
                {
                    yield return Convert.ToString(i, 2).PadLeft(n, '0');
                }
            }

            static void Main(string[] args)
            {
                for (int i = 5; i < 64; i++)
                {
                    int c = 0;
                    string hardest_string = "";
                    foreach (string s in PowerSet(i))
                    {
                        // was find_spaced_ones, which does not exist in this class
                        int cost = FindSpacedOnes(s);
                        if (cost > c)
                        {
                            hardest_string = s;
                            c = cost;
                            Console.Write("{0} {1} {2}\r", i, c, hardest_string);
                        }
                    }
                    Console.WriteLine("{0} {1} {2}", i, c, hardest_string);
                }
            }
        }
    }

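The principal differences are:

  1. Exhaustive search of solutions
    This code generates a power set of data to find the hardest input to solve for this algorithm.
  2. All solutions versus hardest to solve
    The code for the previous question generated all the solutions using a python generator. This code just displays the hardest input for each pattern length.
  3. Scoring algorithm
    This code checks the distance from the middle element to its left- and right-hand edges. The python code tested whether a sum was above or below 0.
  4. Convergence on a candidate
    The current code works from the middle towards the edges to find a candidate. The code in the previous problem worked from the edges towards the middle. This last change gives a large performance improvement.
  5. Use of even and odd pools
    Based on the observations at the end of this write-up, the code searches pairs of even numbers or pairs of odd numbers to find L and U, keeping M fixed. This reduces the number of searches by pre-computing information. Accordingly, the code uses two levels of indirection in the main loop of FindCandidate and requires two calls to FindCandidate for each middle element: once for even numbers and once for odd ones.

The general idea is to work on indexes, not the raw representation of the data. Calculating an array of where the 1's appear allows the algorithm to run in time proportional to the number of 1's in the data rather than in time proportional to the length of the data. This is a standard transformation: create a data structure that allows faster operation while keeping the problem equivalent.

The results are out of date: removed.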


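Edit: 2009-10-16 18:48

On yx's data, which is given some credence in the other responses as representative of hard data to calculate on, I get these results... I removed these. They are out of date.

I would point out that this data is not the hardest for my algorithm, so I think the assumption that yx's fractals are the hardest to solve is mistaken. The worst case for a particular algorithm, I expect, will depend upon the algorithm itself and will not likely be consistent across different algorithms.


Edit: 2009-10-17 13:30

Further observations on this.

First, convert the string of 0's and 1's into an array of indexes for each position of the 1's. Say the length of that array A is X. Then the goal is to find

 0 <= L < M < U <= X-1 

such that

 A[M] - A[L] = A[U] - A[M] 

or

 2*A[M] = A[L] + A[U] 

Since A[L] and A[U] sum to an even number, they can't be (even, odd) or (odd, even). The search for a match could be improved by splitting A[] into odd and even pools and searching for matches on A[M] in the pools of odd and even candidates in turn.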
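The parity observation can be illustrated with a short Python sketch. It is not the C# implementation above: `split_by_parity` and `find_triple_by_parity` are hypothetical names, and the search is a plain pairwise loop inside each pool, so it is still quadratic in the number of 1's.

```python
def split_by_parity(positions):
    """Split the positions of the 1's into an even pool and an odd pool."""
    even = [p for p in positions if p % 2 == 0]
    odd = [p for p in positions if p % 2 == 1]
    return even, odd

def find_triple_by_parity(s):
    """Only pair up A[L] and A[U] of equal parity, so that their
    midpoint A[M] is an integer; return 0-based positions or None."""
    ones = [i for i, ch in enumerate(s) if ch == '1']
    one_set = set(ones)
    for pool in split_by_parity(ones):
        for i, low in enumerate(pool):
            for high in pool[i + 1:]:
                mid = (low + high) // 2
                if mid in one_set:
                    return (low, mid, high)
    return None

print(find_triple_by_parity("1001011"))
```

Pairs of mixed parity are never generated, which roughly halves the number of candidate pairs without changing the order of growth.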

However, this is more of a performance optimization than an algorithmic improvement, I think. The number of comparisons should drop, but the order of the algorithm should be the same.


Edit 2009-10-18 00:45

Yet another optimization occurs to me, in the same vein as separating the candidates into even and odd. Since the three indexes have to add to a multiple of 3 (a, a+x, a+2x — mod 3 is 0, regardless of a and x), you can separate L, M, and U into their mod 3 values:

     M  L  U
     0  0  0
        1  2
        2  1
     1  0  2
        1  1
        2  0
     2  0  1
        1  0
        2  2

In fact, you could combine this with the even/odd observation and separate them into their mod 6 values:

     M  L  U
     0  0  0
        1  5
        2  4
        3  3
        4  2
        5  1

and so on. This would provide a further performance optimization but not an algorithmic speedup.
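The residue tables above follow from the condition L + U ≡ 2M (mod m), so they can be generated mechanically. A small sketch, with `residue_table` as a hypothetical helper name:

```python
def residue_table(modulus):
    """For each residue m of the middle position, list the residue pairs
    (l, u) of the outer positions with l + u congruent to 2m."""
    return {
        m: [(l, u)
            for l in range(modulus)
            for u in range(modulus)
            if (l + u) % modulus == (2 * m) % modulus]
        for m in range(modulus)
    }

for m, pairs in residue_table(3).items():
    print(m, pairs)
```

With modulus 6 the same function gives the combined even/odd and mod 3 split; each middle residue again admits exactly `modulus` pairs out of `modulus` squared, which is where the constant-factor saving comes from.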

Wasn't able to come up with the solution yet :(, but have some ideas.

What if we start from the reverse problem: construct a sequence with the maximum number of 1s and WITHOUT any evenly spaced trios. If you can prove that the maximum number of 1s is o(n), then you can improve your estimate by iterating through the list of 1s only.

This may help….

This problem reduces to the following:

Given a sequence of positive integers, find a contiguous subsequence partitioned into a prefix and a suffix such that the sum of the prefix of the subsequence is equal to the sum of the suffix of the subsequence.

For example, given a sequence of [ 3, 5, 1, 3, 6, 5, 2, 2, 3, 5, 6, 4 ] , we would find a subsequence of [ 3, 6, 5, 2, 2] with a prefix of [ 3, 6 ] with prefix sum of 9 and a suffix of [ 5, 2, 2 ] with suffix sum of 9 .

The reduction is as follows:

Given a sequence of zeros and ones, and starting at the leftmost one, continue moving to the right. Each time another one is encountered, record the number of moves since the previous one was encountered and append that number to the resulting sequence.

For example, given a sequence of [ 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0 ] , we would find the reduction of [ 1, 3, 4 ] . From this reduction, we calculate the contiguous subsequence of [ 1, 3, 4 ] , the prefix of [ 1, 3 ] with sum of 4 , and the suffix of [ 4 ] with sum of 4 .

This reduction may be computed in O(n) .
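As a sketch of the reduction and of the reduced problem, the code below computes the gap sequence and then looks for a balanced prefix/suffix with a two-pointer walk from every split point. The names `gaps_between_ones` and `find_balanced_run` are made up here, and the search is quadratic in the number of gaps in the worst case, so it does not by itself give O(n log n).

```python
def gaps_between_ones(bits):
    """Distances between consecutive 1s; bits is a list of 0/1 integers."""
    ones = [i for i, b in enumerate(bits) if b == 1]
    return [b - a for a, b in zip(ones, ones[1:])]

def find_balanced_run(gaps):
    """Return (prefix, suffix) of a contiguous run with equal sums, or None."""
    for m in range(1, len(gaps)):          # split between gaps[m-1] and gaps[m]
        i, j = m - 1, m
        left, right = gaps[i], gaps[j]
        while True:
            if left == right:
                return gaps[i:m], gaps[m:j + 1]
            if left < right:               # grow the smaller side
                i -= 1
                if i < 0:
                    break
                left += gaps[i]
            else:
                j += 1
                if j >= len(gaps):
                    break
                right += gaps[j]
    return None

gaps = gaps_between_ones([0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0])
print(gaps, find_balanced_run(gaps))
```

Because all gaps are positive, both running sums only grow, which is what makes the two-pointer walk valid for a fixed split point. It may return a different balanced run than the one quoted in the longer example above, since several can exist.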

Unfortunately, I am not sure where to go from here.

For the simple problem type (i.e. you search for three "1"s with only "0"s, zero or more, between them), it's quite simple: you could just split the sequence at every "1" and look for two adjacent subsequences having the same length (the second subsequence not being the last one, of course). Obviously, this can be done in O(n) time.

For the more complex version (i.e. you search for an index i and a gap g > 0 such that s[i]==s[i+g]==s[i+2*g]=="1" ), I'm not sure whether there exists an O(n log n) solution, since there are possibly O(n²) triplets having this property (think of a string of all ones: there are approximately n²/2 such triplets). Of course, you are looking for only one of these, but I currently have no idea how to find it …

A fun question, but once you realise that the actual pattern between two '1's does not matter, the algorithm becomes:

  • scan for a '1'
  • starting from the next position, scan for another '1' (up to the end of the array minus the distance from the current first '1', or else the 3rd '1' would be out of bounds)
  • if at the position of the 2nd '1' plus the distance to the first '1' a third '1' is found, we have evenly spaced ones.

In code, JTest fashion, (Note this code isn't written to be most efficient and I added some println's to see what happens.)

    import java.util.Random;

    import junit.framework.TestCase;

    public class AlgorithmTest extends TestCase {

        /**
         * Constructor for GetNumberTest.
         *
         * @param name The test's name.
         */
        public AlgorithmTest(String name) {
            super(name);
        }

        /**
         * @see TestCase#setUp()
         */
        protected void setUp() throws Exception {
            super.setUp();
        }

        /**
         * @see TestCase#tearDown()
         */
        protected void tearDown() throws Exception {
            super.tearDown();
        }

        /**
         * Tests the algorithm.
         */
        public void testEvenlySpacedOnes() {
            assertFalse(isEvenlySpaced(1));
            assertFalse(isEvenlySpaced(0x058003));
            assertTrue(isEvenlySpaced(0x07001));
            assertTrue(isEvenlySpaced(0x01007));
            assertTrue(isEvenlySpaced(0x101010));

            // some fun tests
            Random random = new Random();
            isEvenlySpaced(random.nextLong());
            isEvenlySpaced(random.nextLong());
            isEvenlySpaced(random.nextLong());
        }

        /**
         * @param testBits
         */
        private boolean isEvenlySpaced(long testBits) {
            String testString = Long.toBinaryString(testBits);
            char[] ones = testString.toCharArray();
            final char ONE = '1';

            for (int n = 0; n < ones.length - 1; n++) {
                if (ONE == ones[n]) {
                    for (int m = n + 1; m < ones.length - m + n; m++) {
                        if (ONE == ones[m] && ONE == ones[m + m - n]) {
                            System.out.println(" IS evenly spaced: " + testBits + '=' + testString);
                            System.out.println(" at: " + n + ", " + m + ", " + (m + m - n));
                            return true;
                        }
                    }
                }
            }

            System.out.println("NOT evenly spaced: " + testBits + '=' + testString);
            return false;
        }
    }

I thought of a divide-and-conquer approach that might work.

First, in preprocessing you need to insert all numbers less than one half your input size ( n /2) into a list.

Given a string: 1000010101000100 (note that this particular example is valid)

Insert all numbers from 1 up to, but not including, (16/2) into a list: {1, 2, 3, 4, 5, 6, 7}

Then divide it in half:

10000101 01000100

Keep doing this until you get to strings of size 1. For all size-one strings with a 1 in them, add the index of the string to the list of possibilities; otherwise, return -1 for failure.

You'll also need to return a list of still-possible spacing distances, associated with each starting index. (Start with the list you made above and remove numbers as you go) Here, an empty list means you're only dealing with one 1 and so any spacing is possible at this point; otherwise the list includes spacings that must be ruled out.

So continuing with the example above:

1000 0101 0100 0100

10 00 01 01 01 00 01 00

1 0 0 0 0 1 0 1 0 1 0 0 0 1 0 0

In the first combine step, we have eight sets of two now. In the first, we have the possibility of a set, but we learn that spacing by 1 is impossible because of the other zero being there. So we return 0 (for the index) and {2,3,4,5,7} for the fact that spacing by 1 is impossible. In the second, we have nothing and so return -1. In the third we have a match with no spacings eliminated in index 5, so return 5, {1,2,3,4,5,7}. In the fourth pair we return 7, {1,2,3,4,5,7}. In the fifth, return 9, {1,2,3,4,5,7}. In the sixth, return -1. In the seventh, return 13, {1,2,3,4,5,7}. In the eighth, return -1.

Combining again into four sets of four, we have:

1000 : Return (0, {4,5,6,7}) 0101 : Return (5, {2,3,4,5,6,7}), (7, {1,2,3,4,5,6,7}) 0100 : Return (9, {3,4,5,6,7}) 0100 : Return (13, {3,4,5,6,7})

Combining into sets of eight:

10000101 : Return (0, {5,7}), (5, {2,3,4,5,6,7}), (7, {1,2,3,4,5,6,7}) 01000100 : Return (9, {4,7}), (13, {3,4,5,6,7})

Combining into a set of sixteen:

10000101 01000100

As we've progressed, we keep checking all the possibilities so far. Up to this step we've left stuff that went beyond the end of the string, but now we can check all the possibilities.

Basically, we check the first 1 with spacings of 5 and 7, and find that they don't line up to 1's. (Note that each check is CONSTANT, not linear time) Then we check the second one (index 5) with spacings of 2, 3, 4, 5, 6, and 7– or we would, but we can stop at 2 since that actually matches up.

Phew! That's a rather long algorithm.

I don't know 100% if it's O(n log n) because of the last step, but everything up to there is definitely O(n log n) as far as I can tell. I'll get back to this later and try to refine the last step.

EDIT: Changed my answer to reflect Welbog's comment. Sorry for the error. I'll write some pseudocode later, too, when I get a little more time to decipher what I wrote again. 😉

I'll give my rough guess here, and let those who are better at calculating complexity help me work out how my algorithm fares in O-notation.

  1. given binary string 0000010101000100 (as example)
  2. crop head and tail of zeroes -> 00000 101010001 00
  3. we get 101010001 from previous calculation
  4. check if the middle bit is 'one', if true, found valid three evenly spaced 'ones' (only if the number of bits is odd numbered)
  5. correlatively, if the remained cropped number of bits is even numbered, the head and tail 'one' cannot be part of evenly spaced 'one',
  6. we use 1010100001 as example (with an extra 'zero' to become even numbered crop), in this case we need to crop again, then becomes -> 10101 00001
  7. we get 10101 from previous calculation, and check middle bit, and we found the evenly spaced bit again

I have no idea how to calculate complexity for this, can anyone help?

edit: add some code to illustrate my idea

edit2: tried to compile my code and found some major mistakes, fixed

    #include <stdio.h>
    #include <string.h>

    const char *binaryStr = "0000010101000100";

    int find3even(int head, int tail);

    int main() {
        int head, tail, pos;
        head = 0;
        tail = strlen(binaryStr) - 1;
        if ((pos = find3even(head, tail)) >= 0)
            printf("found it at position %d\n", pos);
        return 0;
    }

    int find3even(int head, int tail) {
        int pos = 0;
        if (head >= tail) return -1;
        /* crop leading and trailing zeroes; checking the bounds in the loop
           condition keeps the loops from spinning forever on an all-zero range */
        while (head < tail && binaryStr[head] == '0') head++;
        while (head < tail && binaryStr[tail] == '0') tail--;
        if (head >= tail) return -1;
        if ((tail - head) % 2 == 0 && /* true if odd numbered */
            (binaryStr[head + (tail - head) / 2] == '1')) {
            return head;
        } else {
            if ((pos = find3even(head, tail - 1)) >= 0) return pos;
            if ((pos = find3even(head + 1, tail)) >= 0) return pos;
        }
        return -1;
    }

I came up with something like this:

    def IsSymetric(number):
        number = number.strip('0')

        if len(number) < 3:
            return False
        if len(number) % 2 == 0:
            # skip the first character, or skip the last character
            return IsSymetric(number[1:]) or IsSymetric(number[0:len(number)-1])
        else:
            if number[len(number)//2] == '1':
                return True
            return IsSymetric(number[:(len(number)//2)]) or IsSymetric(number[len(number)//2+1:])
        return False

This is inspired by andycjw.

  1. Truncate the zeros.
  2. If the length is even, then test two substrings: 0 – (len-2) (skip the last character) and 1 – (len-1) (skip the first character).
  3. If the length is not even, then if the middle character is one we have success. Otherwise divide the string in the middle, without the middle element, and check both parts.

As to the complexity this might be O(nlogn) as in each recursion we are dividing by two.

Hope this helps.

Ok, I'm going to take another stab at the problem. I think I can prove a O(n log(n)) algorithm that is similar to those already discussed by using a balanced binary tree to store distances between 1's. This approach was inspired by Justice's observation about reducing the problem to a list of distances between the 1's.

Could we scan the input string to construct a balanced binary tree around the positions of the 1's, such that each node stores the position of a 1 and each edge is labeled with the distance to the adjacent 1 for each child node? For example:

    10010001 gives the following tree

          3
         / \
      2 /   \ 3
       /     \
      0       7

This can be done in O(n log(n)) since, for a string of size n, each insertion takes O(log(n)) in the worst case.

Then the problem is to search the tree to discover whether, at any node, there is a path from that node through the left-child that has the same distance as a path through the right child. This can be done recursively on each subtree. When merging two subtrees in the search, we must compare the distances from paths in the left subtree with distances from paths in the right. Since the number of paths in a subtree will be proportional to log(n), and the number of nodes is n, I believe this can be done in O(n log(n)) time.

What am I missing?

This seemed like a fun problem so I decided to try my hand at it.

I am making the assumption that 111000001 would find the first 3 ones and be successful. Essentially the number of zeroes following the 1 is the important thing, since 0111000 is the same as 111000 according to your definition. Once you find two cases of 1, the next 1 found completes the trilogy.

Here it is in Python:

    def find_three(bstring):
        print(bstring)
        dict = {}
        lastone = -1
        zerocount = 0
        for i in range(len(bstring)):
            if bstring[i] == '1':
                print(i, ': 1')
                if lastone != -1:
                    if(zerocount in dict):
                        dict[zerocount].append(lastone)
                        if len(dict[zerocount]) == 2:
                            dict[zerocount].append(i)
                            return True, dict
                    else:
                        dict[zerocount] = [lastone]
                lastone = i
                zerocount = 0
            else:
                zerocount = zerocount + 1
        #this is really just book keeping, as we have failed at this point
        if lastone != -1:
            if(zerocount in dict):
                dict[zerocount].append(lastone)
            else:
                dict[zerocount] = [lastone]
        return False, dict

This is a first try, so I'm sure this could be written in a cleaner manner. Please list the cases where this method fails down below.

I assume the reason this is nlog(n) is due to the following:

  • To find the 1 that is the start of the triplet, you need to check (n-2) characters. If you haven't found it by that point, you won't (chars n-1 and n cannot start a triplet) (O(n))
  • To find the second 1 that is part of the triplet (started by the first one), you need to check m/2 characters (m = n-x, where x is the offset of the first 1). This is because, if you haven't found the second 1 by the time you're halfway from the first one to the end, you won't… since the third 1 must be exactly the same distance past the second. (O(log(n)))
  • It is O(1) to find the last 1, since you know the index it must be at by the time you find the first and second.

So, you have n, log(n), and 1… O(nlogn)

Edit: Oops, my bad. My brain had it set that n/2 was logn… which it obviously isn't (doubling the number of items still doubles the number of iterations of the inner loop). This is still at n^2, not solving the problem. Well, at least I got to write some code 🙂


Implementation in Tcl

    proc get-triplet {input} {
        for {set first 0} {$first < [string length $input]-2} {incr first} {
            if {[string index $input $first] != 1} {
                continue
            }
            set start [expr {$first + 1}]
            set end [expr {1+ $first + (([string length $input] - $first) /2)}]
            for {set second $start} {$second < $end} {incr second} {
                if {[string index $input $second] != 1} {
                    continue
                }
                set last [expr {($second - $first) + $second}]
                if {[string index $input $last] == 1} {
                    return [list $first $second $last]
                }
            }
        }
        return {}
    }

    get-triplet 10101      ;# 0 2 4
    get-triplet 10111      ;# 0 2 4
    get-triplet 11100000   ;# 0 1 2
    get-triplet 0100100100 ;# 1 4 7

I think I have found a way of solving the problem, but I can't construct a formal proof. The solution I made is written in Java, and it uses a counter 'n' to count how many list/array accesses it does. So n should be less than or equal to stringLength*log(stringLength) if it is correct. I tried it for the numbers 0 to 2^22, and it works.

It starts by iterating over the input string and making a list of all the indexes which hold a one. This is just O(n).

Then from the list of indexes it picks a firstIndex, and a secondIndex which is greater than the first. These two indexes must hold ones, because they are in the list of indexes. From there the thirdIndex can be calculated. If the inputString[thirdIndex] is a 1 then it halts.

    // requires: import java.util.ArrayList;
    public static int testString(String input){
        //n is the number of array/list accesses in the algorithm
        int n=0;

        //Put the indices of all the ones into a list, O(n)
        ArrayList<Integer> ones = new ArrayList<Integer>();
        for(int i=0;i<input.length();i++){
            if(input.charAt(i)=='1'){
                ones.add(i);
            }
        }

        //If less than three ones in list, just stop
        if(ones.size()<3){
            return n;
        }

        int firstIndex, secondIndex, thirdIndex;
        for(int x=0;x<ones.size()-2;x++){
            n++;
            firstIndex = ones.get(x);

            for(int y=x+1; y<ones.size()-1; y++){
                n++;
                secondIndex = ones.get(y);
                thirdIndex = secondIndex*2 - firstIndex;

                if(thirdIndex >= input.length()){
                    break;
                }

                n++;
                if(input.charAt(thirdIndex) == '1'){
                    //This case is satisfied if it has found three evenly spaced ones
                    //System.out.println("This one => " + input);
                    return n;
                }
            }
        }

        return n;
    }

additional note: the counter n is not incremented when it iterates over the input string to construct the list of indexes. This operation is O(n), so it won't have an effect on the algorithm complexity anyway.

One inroad into the problem is to think of factors and shifting.

With shifting, you compare the string of ones and zeroes with a shifted version of itself. You then take matching ones. Take this example shifted by two:

    1010101010
      1010101010
    ------------
    001010101000

The resulting 1's (bitwise ANDed), must represent all those 1's which are evenly spaced by two. The same example shifted by three:

    1010101010
       1010101010
    -------------
    0000000000000

In this case there are no 1's which are evenly spaced three apart.
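The shifting idea maps naturally onto integer bit operations. The sketch below is an illustration under a few assumptions: it ANDs the string with two shifted copies (by the gap and by twice the gap) so that a surviving bit marks a full triple rather than just a pair, the names `has_triple_with_gap` and `find_gap` are hypothetical, and it tries every gap rather than a subset of them.

```python
def has_triple_with_gap(s, gap):
    """True if s has 1s at positions i, i + gap and i + 2 * gap."""
    x = int(s, 2)
    return (x & (x >> gap) & (x >> (2 * gap))) != 0

def find_gap(s):
    """Smallest gap for which an evenly spaced triple exists, or None."""
    for gap in range(1, (len(s) - 1) // 2 + 1):
        if has_triple_with_gap(s, gap):
            return gap
    return None

print(has_triple_with_gap("1010101010", 2))  # the shift-by-two example
print(has_triple_with_gap("1010101010", 3))  # the shift-by-three example
print(find_gap("100010001"))
```

Each shift-and-AND touches the whole string, so trying all gaps costs on the order of n^2 bit operations (divided by the machine word size), not O(n log n). Note also that a triple with gap 4, as in 100010001, does not show up when shifting by 2.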

So what does this tell you? Well, that you only need to test shifts which are prime numbers. For example, say you have two 1's which are six apart. You would only have to test 'two' shifts and 'three' shifts (since these divide six). For example:

    10000010
      10000010 (Shift by two)
        10000010
          10000010 (We have a match)
    10000010
       10000010 (Shift by three)
          10000010 (We have a match)

So the only shifts you ever need to check are 2,3,5,7,11,13 etc. Up to the prime closest to the square root of size of the string of digits.

Nearly solved?

I think I am closer to a solution. Basically:

  1. Scan the string for 1's. For each 1, note its remainder after taking a modulus of its position. The modulus ranges from 1 to half the size of the string. This is because the largest possible separation size is half the string. This is done in O(n^2). But only prime moduli need be checked, so O(n^2/log(n)).
  2. Sort the list of modulus/remainders in order largest modulus first, this can be done in O(n*log(n)) time.
  3. Look for three consecutive moduli/remainders which are the same.
  4. Somehow retrieve the position of the ones!

I think the biggest clue to the answer, is that the fastest sort algorithms, are O(n*log(n)).

ERROR

Step 1 is wrong, as pointed out by a colleague. If we have 1's at positions 2, 12 and 102, then taking a modulus of 10 they would all have the same remainder, and yet they are not equally spaced apart! Sorry.

Here are some thoughts that, despite my best efforts, will not seem to wrap themselves up in a bow. Still, they might be a useful starting point for someone's analysis.

Consider the proposed solution as follows, which is the approach that several folks have suggested, including myself in a prior version of this answer. :)

  1. Trim leading and trailing zeroes.
  2. Scan the string looking for 1's.
  3. When a 1 is found:
    1. Assume that it is the middle 1 of the solution.
    2. For each prior 1, use its saved position to compute the anticipated position of the final 1.
    3. If the computed position is after the end of the string it cannot be part of the solution, so drop the position from the list of candidates.
    4. Check the solution.
  4. If the solution was not found, add the current 1 to the list of candidates.
  5. Repeat until no more 1's are found.

Now consider input strings like the following, none of which has a solution:

    101
    101001
    1010010001
    101001000100001
    101001000100001000001

In general, this is the concatenation of k strings of the form j 0's followed by a 1 for j from zero to k-1.

    k=2  101
    k=3  101001
    k=4  1010010001
    k=5  101001000100001
    k=6  101001000100001000001

Note that the lengths of the substrings are 1, 2, 3, etc. So, problem size n has substrings of lengths 1 to k such that n = k(k+1)/2.

    k=2  n= 3  101
    k=3  n= 6  101001
    k=4  n=10  1010010001
    k=5  n=15  101001000100001
    k=6  n=21  101001000100001000001

Note that k also tracks the number of 1's that we have to consider. Remember that every time we see a 1, we need to consider all the 1's seen so far. So when we see the second 1, we only consider the first, when we see the third 1, we reconsider the first two, when we see the fourth 1, we need to reconsider the first three, and so on. By the end of the algorithm, we've considered k(k-1)/2 pairs of 1's. Call that p.

    k=2  n= 3  p= 1  101
    k=3  n= 6  p= 3  101001
    k=4  n=10  p= 6  1010010001
    k=5  n=15  p=10  101001000100001
    k=6  n=21  p=15  101001000100001000001

The relationship between n and p is that n = p + k.

The process of going through the string takes O(n) time. Each time a 1 is encountered, a maximum of (k-1) comparisons are done. Since n = k(k+1)/2, n > k**2/2, so k < sqrt(2n). This gives us O(n sqrt(n)) or O(n**3/2). Note however that this may not be a really tight bound, because the number of comparisons goes from 1 to a maximum of k; it isn't k the whole time. But I'm not sure how to account for that in the math.

It still isn't O(n log(n)). Also, I can't prove those inputs are the worst cases, although I suspect they are. I think a denser packing of 1's to the front results in an even sparser packing at the end.
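For experimenting with this family of inputs, here is a short Python sketch that generates the k-th string and counts the position checks made by the candidate-list scan described above. It mirrors the outline of the steps rather than any particular implementation; `family_string` and `scan_with_candidates` are hypothetical names, and positions are 0-based.

```python
def family_string(k):
    """Concatenate j zeroes followed by a 1, for j = 0 .. k-1."""
    return ''.join('0' * j + '1' for j in range(k))

def scan_with_candidates(s):
    """Treat each 1 as the middle of a triple and test every earlier 1
    still on the candidate list. Returns (triple or None, checks made)."""
    candidates, checks = [], 0
    for i, ch in enumerate(s):
        if ch != '1':
            continue
        kept = []
        for p in candidates:
            j = 2 * i - p            # where the third 1 would have to be
            if j >= len(s):
                continue             # off the end: drop p for good
            checks += 1
            if s[j] == '1':
                return (p, i, j), checks
            kept.append(p)
        candidates = kept + [i]
    return None, checks

for k in range(2, 9):
    s = family_string(k)
    print(k, len(s), scan_with_candidates(s))
```

Running it for small k shows the check count staying at or below k(k-1)/2. It also suggests the family is only solution-free for small k: for k = 8 the 1s at positions 5, 20 and 35 are evenly spaced, which is consistent with the doubt expressed above about these being the worst cases.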

Since someone may still find it useful, here's my code for that solution in Perl:

    #!/usr/bin/perl

    # read input as first argument
    my $s = $ARGV[0];

    # validate the input
    $s =~ /^[01]+$/ or die "invalid input string\n";

    # strip leading and trailing 0's
    $s =~ s/^0+//;
    $s =~ s/0+$//;

    # prime the position list with the first '1' at position 0
    my @p = (0);

    # start at position 1, which is the second character
    my $i = 1;

    print "the string is $s\n\n";

    while ($i < length($s)) {
        if (substr($s, $i, 1) eq '1') {
            print "found '1' at position $i\n";
            my @t = ();
            # assuming this is the middle '1', go through the positions
            # of all the prior '1's and check whether there's another '1'
            # in the correct position after this '1' to make a solution
            while (scalar @p) {
                # $p is the position of the prior '1'
                my $p = shift @p;
                # $j is the corresponding position for the following '1'
                my $j = 2 * $i - $p;
                # if $j is off the end of the string then we don't need to
                # check $p anymore
                next if ($j >= length($s));
                print "checking positions $p, $i, $j\n";
                if (substr($s, $j, 1) eq '1') {
                    print "\nsolution found at positions $p, $i, $j\n";
                    exit 0;
                }
                # if $j isn't off the end of the string, keep $p for next time
                push @t, $p;
            }
            @p = @t;
            # add this '1' to the list of '1' positions
            push @p, $i;
        }
        $i++;
    }

    print "\nno solution found\n";

While scanning 1s, add their positions to a List. When adding the second and successive 1s, compare them to each position in the list so far. Spacing equals currentOne (center) – previousOne (left). The right-side bit is currentOne + spacing. If it's 1, the end.

The list of ones grows inversely with the space between them. Simply stated, if you've got a lot of 0s between the 1s (as in a worst case), your list of known 1s will grow quite slowly.

    using System;
    using System.Collections.Generic;

    namespace spacedOnes
    {
        class Program
        {
            static int[] _bits = new int[8] {128, 64, 32, 16, 8, 4, 2, 1};

            static void Main(string[] args)
            {
                var bytes = new byte[4];
                var r = new Random();
                r.NextBytes(bytes);
                foreach (var b in bytes) {
                    Console.Write(getByteString(b));
                }
                Console.WriteLine();
                var bitCount = bytes.Length * 8;
                var done = false;
                var onePositions = new List<int>();
                for (var i = 0; i < bitCount; i++)
                {
                    if (isOne(bytes, i)) {
                        if (onePositions.Count > 0) {
                            foreach (var knownOne in onePositions) {
                                var spacing = i - knownOne;
                                var k = i + spacing;
                                if (k < bitCount && isOne(bytes, k)) {
                                    Console.WriteLine("^".PadLeft(knownOne + 1) + "^".PadLeft(spacing) + "^".PadLeft(spacing));
                                    done = true;
                                    break;
                                }
                            }
                        }
                        if (done) {
                            break;
                        }
                        onePositions.Add(i);
                    }
                }
                Console.ReadKey();
            }

            static String getByteString(byte b) {
                var s = new char[8];
                for (var i=0; i<s.Length; i++) {
                    s[i] = ((b & _bits[i]) > 0 ? '1' : '0');
                }
                return new String(s);
            }

            static bool isOne(byte[] bytes, int i)
            {
                var byteIndex = i / 8;
                var bitIndex = i % 8;
                return (bytes[byteIndex] & _bits[bitIndex]) > 0;
            }
        }
    }

I thought I'd add one comment before posting the 22nd naive solution to the problem. For the naive solution, we don't need to show that the number of 1's in the string is at most O(log(n)), but rather that it is at most O(sqrt(n*log(n))).

Solver:

    def solve(Str):
        indexes=[]
        #O(n) setup
        for i in range(len(Str)):
            if Str[i]=='1':
                indexes.append(i)

        #O((number of 1's)^2) processing
        for i in range(len(indexes)):
            for j in range(i+1, len(indexes)):
                indexDiff = indexes[j] - indexes[i]
                k=indexes[j] + indexDiff
                if k<len(Str) and Str[k]=='1':
                    return True
        return False

It's basically quite similar to flybywire's idea and implementation, though looking ahead instead of back.

Greedy String Builder:

    #assumes final char hasn't been added, and would be a 1
    def lastCharMakesSolvable(Str):
        endIndex=len(Str)
        j=endIndex-1
        while j-(endIndex-j) >= 0:
            k=j-(endIndex-j)
            if k >= 0 and Str[k]=='1' and Str[j]=='1':
                return True
            j=j-1
        return False

    def expandString(StartString=''):
        if lastCharMakesSolvable(StartString):
            return StartString + '0'
        return StartString + '1'

    n=1
    BaseStr=""
    lastCount=0
    while n<1000000:
        BaseStr=expandString(BaseStr)
        count=BaseStr.count('1')
        if count != lastCount:
            print(len(BaseStr), count)
        lastCount=count
        n=n+1

(In my defense, I'm still in the 'learn python' stage of understanding)

Also, potentially useful output from the greedy building of strings: there's a rather consistent jump after hitting a power of 2 in the number of 1's… which I was not willing to wait around to witness hitting 2048.

    strlength   # of 1's
            1          1
            2          2
            4          3
            5          4
           10          5
           14          8
           28          9
           41         16
           82         17
          122         32
          244         33
          365         64
          730         65
         1094        128
         2188        129
         3281        256
         6562        257
         9842        512
        19684        513
        29525       1024
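The jumps in this table fit a known pattern: the greedy construction ends up placing 1s exactly at the 0-based positions whose base-3 representation contains no digit 2. The sketch below checks that on a small range. It is an illustration only; `greedy_positions` and `has_no_digit_two` are hypothetical helper names, not part of the code above.

```python
def greedy_positions(limit):
    """0-based positions of the 1s picked greedily among positions below `limit`."""
    chosen = []
    chosen_set = set()
    for c in range(limit):
        # Reject c if it would be the right end of a progression a, b, c
        # with a = 2*b - c and b both already chosen.
        if any((2 * b - c) in chosen_set for b in chosen):
            continue
        chosen.append(c)
        chosen_set.add(c)
    return chosen


def has_no_digit_two(m):
    """True when the base-3 representation of m uses only the digits 0 and 1."""
    while m:
        if m % 3 == 2:
            return False
        m //= 3
    return True


limit = 400
greedy = greedy_positions(limit)
base3 = [m for m in range(limit) if has_no_digit_two(m)]
print(greedy == base3)

# The 2**k-th one sits at 0-based position (3**k - 1) // 2,
# i.e. at string length (3**k - 1) // 2 + 1 (41 for k = 4, as in the table).
for k in range(1, 6):
    print(k, 2 ** k, greedy[2 ** k - 1] + 1)
```

Under this pattern the count of 1s doubles each time the length passes roughly a power of 3, which is why waiting for the next power of 2 takes about three times as long as the previous one.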

I'll try to present a mathematical approach. This is more a beginning than an end, so any help, comment, or even contradiction – will be deeply appreciated. However, if this approach is proven – the algorithm is a straight-forward search in the string.

  1. Given a fixed spacing k and a string S , the search for a k-spaced triplet takes O(n) : we simply test, for every 0<=i<n-2k , whether S[i] , S[i+k] and S[i+2k] are all 1 . Each test takes O(1) and we do it n-2k times where k is a constant, so it takes O(n-2k)=O(n) .

  2. Let us assume that there is an inverse proportion between the number of 1 's and the maximum spacing we need to search for. That is, if there are many 1 's, there must be a triplet and it must be quite dense; if there are only a few 1 's, the triplet (if any) can be quite sparse. In other words, I can prove that if I have enough 1 's, such a triplet must exist, and the more 1 's I have, the denser the triplet that must be found. This can be explained by the Pigeonhole principle. I hope to elaborate on this later.

  3. Say I have an upper bound k on the possible spacing I have to look for. Now, for each 1 located in S[i] we need to check for 1 in S[i-1] and S[i+1] , S[i-2] and S[i+2] , … S[i-k] and S[i+k] . This takes O((k^2-k)/2)=O(k^2) for each 1 in S , due to Gauss' Series Summation Formula . Note that this differs from section 1: here k is an upper bound on the spacing, not a constant spacing.

We need to prove O(n*log(n)) . That is, we need to show that k*(number of 1's) is proportional to log(n) .

If we can do that, the algorithm is trivial – for each 1 in S whose index is i , simply look for 1 's from each side up to distance k . If two were found in the same distance, return i and k . Again, the tricky part would be finding k and proving the correctness.
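A minimal sketch of that search, assuming the bound `k` is simply handed in as a parameter (finding a valid `k` is exactly the unproven part). The function name `find_triplet_within` is hypothetical.

```python
def find_triplet_within(s, k):
    """Return (i, d) for the first centre i and spacing d <= k such that
    s[i-d], s[i] and s[i+d] are all '1'; return None if there is none."""
    n = len(s)
    for i, ch in enumerate(s):
        if ch != '1':
            continue
        # The spacing cannot exceed k, nor reach past either end of the string.
        max_d = min(k, i, n - 1 - i)
        for d in range(1, max_d + 1):
            if s[i - d] == '1' and s[i + d] == '1':
                return i, d
    return None


print(find_triplet_within("1001011", 3))
print(find_triplet_within("1001011", 2))
```

As written, the loop does at most k lookups per 1, so the cost is bounded by k times the number of 1's; whether that product can be kept near log(n) is the open question above.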

I would really appreciate your comments here – I have been trying to find the relation between k and the number of 1 's on my whiteboard, so far without success.

Assumption:

This was just wrong: I was talking about log(n) as an upper limit on the number of ones.

Edit:

Now I found that, using Cantor numbers (if correct), the density of the set is (2/3)^Log_3(n) (what a weird function), and I agree, a log(n)/n density is too strong.

If this is the upper limit, there is an algorithm that solves this problem in O(n*(3/2)^(log(n)/log(3))) time and O((3/2)^(log(n)/log(3))) space. (Check Justice's answer for the algorithm.)

This is still far better than O(n^2).

The function ((3/2)^(log(n)/log(3))) really looks like n*log(n) at first sight.

How did I get this formula?

Applying the Cantor construction to the string.
Suppose that the length of the string is 3^p == n.
At each step in the generation of the Cantor string you keep 2/3 of the previous number of ones. Apply this p times.

That means (n * ((2/3)^p)) -> ((3^p) * ((2/3)^p)) remaining ones, which simplifies to 2^p. So there are 2^p ones in a string of length 3^p, a length-to-ones ratio of (3/2)^p. Substitute p=log(n)/log(3) and get
((3/2)^(log(n)/log(3)))
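The counting step can be checked numerically. The sketch below builds the Cantor-like string for small p (a 1 at every 0-based position whose base-3 digits are all 0 or 2), counts its ones, and brute-forces the absence of evenly spaced 1s. The helper names are hypothetical, and the brute-force check is only meant for small inputs.

```python
def cantor_like_string(p):
    """String of length 3**p with '1' at positions whose base-3 digits are only 0 or 2."""
    def only_zeros_and_twos(m):
        while m:
            if m % 3 == 1:
                return False
            m //= 3
        return True
    return ''.join('1' if only_zeros_and_twos(m) else '0' for m in range(3 ** p))


def has_evenly_spaced_ones(s):
    """Brute force over all pairs of 1s, looking up the third position in a set."""
    ones = [i for i, ch in enumerate(s) if ch == '1']
    one_set = set(ones)
    return any((2 * b - a) in one_set
               for i, a in enumerate(ones) for b in ones[i + 1:])


for p in range(1, 6):
    s = cantor_like_string(p)
    # columns: p, length 3**p, number of ones (expected 2**p), triplet present?
    print(p, len(s), s.count('1'), has_evenly_spaced_ones(s))
```

With n = 3^p this gives 2^p = n^(log 2 / log 3) ones, roughly n^0.63, so the density of ones is (2/3)^p as stated above.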

How about a simple O(n) solution, with O(n^2) space? (Uses the assumption that all bitwise operators work in O(1).)

The algorithm basically works in four stages:

Stage 1: For each bit in your original number, find out how far away the ones are, but consider only one direction. (I considered all the bits in the direction of the least significant bit.)

Stage 2: Reverse the order of the bits in the input;

Stage 3: Re-run Stage 1 on the reversed input.

Stage 4: Compare the results from Stage 1 and Stage 3. If any bits are equally spaced above AND below we must have a hit.

Keep in mind that no step in the above algorithm takes longer than O(n). ^ _ ^

As an added benefit, this algorithm will find ALL equally spaced ones from EVERY number. So for example if you get a result of "0x0005" then there are equally spaced ones at BOTH 1 and 3 units away

I didn't really try optimizing the code below, but it is compilable C# code that seems to work.

    using System;

    namespace ThreeNumbers
    {
        class Program
        {
            const int uint32Length = 32;

            static void Main(string[] args)
            {
                Console.Write("Please enter your integer: ");
                uint input = UInt32.Parse(Console.ReadLine());

                uint[] distancesLower = Distances(input);
                uint[] distancesHigher = Distances(Reverse(input));

                PrintHits(input, distancesLower, distancesHigher);
            }

            /// <summary>
            /// Returns an array showing how far the ones are from each bit in the input. Only
            /// considers ones at lower significant bits. Index 0 represents the least significant bit
            /// in the input. Index 1 represents the second least significant bit in the input and so
            /// on. If a one is 3 away from the bit in question, then the third least significant bit
            /// of the value will be set.
            ///
            /// As programmed this algorithm needs: O(n) time, and O(n*log(n)) space.
            /// (Where n is the number of bits in the input.)
            /// </summary>
            public static uint[] Distances(uint input)
            {
                uint[] distanceToOnes = new uint[uint32Length];
                uint result = 0;

                //Sets how far each bit is from other ones. Going in the direction of LSB to MSB
                for (uint bitIndex = 1, arrayIndex = 0; bitIndex != 0; bitIndex <<= 1, ++arrayIndex)
                {
                    distanceToOnes[arrayIndex] = result;
                    result <<= 1;

                    if ((input & bitIndex) != 0)
                    {
                        result |= 1;
                    }
                }

                return distanceToOnes;
            }

            /// <summary>
            /// Reverses the bits in the input.
            ///
            /// As programmed this algorithm needs O(n) time and O(n) space.
            /// (Where n is the number of bits in the input.)
            /// </summary>
            /// <param name="input"></param>
            /// <returns></returns>
            public static uint Reverse(uint input)
            {
                uint reversedInput = 0;
                for (uint bitIndex = 1; bitIndex != 0; bitIndex <<= 1)
                {
                    reversedInput <<= 1;
                    reversedInput |= (uint)((input & bitIndex) != 0 ? 1 : 0);
                }

                return reversedInput;
            }

            /// <summary>
            /// Goes through each bit in the input, to check if there are any bits equally far away in
            /// the distancesLower and distancesHigher
            /// </summary>
            public static void PrintHits(uint input, uint[] distancesLower, uint[] distancesHigher)
            {
                const int offset = uint32Length - 1;

                for (uint bitIndex = 1, arrayIndex = 0; bitIndex != 0; bitIndex <<= 1, ++arrayIndex)
                {
                    //hits checks if any bits are equally spaced away from our current value
                    bool isBitSet = (input & bitIndex) != 0;
                    uint hits = distancesLower[arrayIndex] & distancesHigher[offset - arrayIndex];

                    if (isBitSet && (hits != 0))
                    {
                        Console.WriteLine(String.Format("The {0}-th LSB has hits 0x{1:x4} away", arrayIndex + 1, hits));
                    }
                }
            }
        }
    }

Someone will probably comment that for any sufficiently large number, bitwise operations cannot be done in O(1). You'd be right. However, I'd conjecture that every solution that uses addition, subtraction, multiplication, or division (which cannot be done by shifting) would also have that problem.
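To make that caveat concrete, here is a sketch of the same idea in Python, whose integers have arbitrary precision, so the input is not limited to 32 bits. It is not a port of the C# above: it takes a bit string indexed from the left, and the name `evenly_spaced_hits` is hypothetical. Each shift and AND here touches an n-bit integer, so these are n big-integer operations rather than n constant-time ones.

```python
def evenly_spaced_hits(bits):
    """Yield (i, mask) for each '1' at index i that is the centre of a triplet.
    Bit d-1 of mask is set when bits[i-d] and bits[i+d] are both '1'."""
    # Pack the string so that bit i of x equals bits[i].
    x = int(bits[::-1], 2) if bits else 0
    below = 0  # bit d-1 set  <=>  bits[i-d] == '1'
    for i in range(len(bits)):
        above = x >> (i + 1)  # bit d-1 set  <=>  bits[i+d] == '1'
        if bits[i] == '1':
            hits = below & above
            if hits:
                yield i, hits
        # Everything seen so far moves one step further away from the next index.
        below = (below << 1) | int(bits[i] == '1')


for centre, mask in evenly_spaced_hits("1001011"):
    print(centre, bin(mask))
```

A mask with several bits set reports several spacings at once for the same centre, which mirrors the "0x0005" remark above.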

Below is a solution. There could be some little mistakes here and there, but the idea is sound.

Edit: It's not n * log(n)

PSEUDO CODE:

    foreach character in the string
      if the character equals 1 {
        if length cache > 0 { //we can skip the first one
          foreach location in the cache { //last in first out kind of order
            if ((currentlocation + (currentlocation - location)) < length string) {
              if (string[(currentlocation + (currentlocation - location))] equals 1)
                return found evenly spaced string
            }
            else
              break;
          }
        }
        remember the location of this character in some sort of cache.
      }
    return didn't find evenly spaced string

C# code:

    // requires: using System; using System.Collections.Generic;
    public static Boolean FindThreeEvenlySpacedOnes(String str) {
        List<int> cache = new List<int>();

        for (var x = 0; x < str.Length; x++) {
            if (str[x] == '1') {
                if (cache.Count > 0) {
                    // newest first; i >= 0 so that the oldest entry, cache[0], is checked too
                    for (var i = cache.Count - 1; i >= 0; i--) {
                        if ((x + (x - cache[i])) >= str.Length)
                            break;

                        if (str[(x + (x - cache[i]))] == '1')
                            return true;
                    }
                }
                cache.Add(x);
            }
        }

        return false;
    }

How it works:

    iteration 1:
    x
    |
    101101001
    // the location of this 1 is stored in the cache

    iteration 2:
     x
     |
    101101001

    iteration 3:
    a x b
    | | |
    101101001
    //we retrieve location a out of the cache and then based on a
    //we calculate b and check if the string contains a 1 on location b
    //and of course we store x in the cache because it's a 1

    iteration 4:
      axb
      |||
    101101001

    a  x  b
    |  |  |
    101101001

    iteration 5:
        x
        |
    101101001

    iteration 6:
       a x b
       | | |
    101101001

      a  x  b
      |  |  |
    101101001
    //return found evenly spaced string

Obviously we need to at least check bunches of triplets at the same time, so we need to compress the checks somehow. I have a candidate algorithm, but analyzing the time complexity is beyond my ability*time threshold.

Build a tree where each node has three children and each node contains the total number of 1's at its leaves. Build a linked list over the 1's, as well. Assign each node an allowed cost proportional to the range it covers. As long as the time we spend at each node is within budget, we'll have an O(n lg n) algorithm.

Start at the root. If the square of the total number of 1's below it is less than its allowed cost, apply the naive algorithm. Otherwise recurse on its children.

Now we have either returned within budget, or we know that there are no valid triplets entirely contained within one of the children. Therefore we must check the inter-node triplets.

Now things get incredibly messy. We essentially want to recurse on the potential sets of children while limiting the range. As soon as the range is constrained enough that the naive algorithm will run under budget, you do it. Enjoy implementing this, because I guarantee it will be tedious. There's like a dozen cases.

The reason I think that algorithm will work is that the sequences without valid triplets appear to alternate between bunches of 1's and lots of 0's. This effectively splits the nearby search space, and the tree emulates that splitting.

The run time of the algorithm is not obvious, at all. It relies on the non-trivial properties of the sequence. If the 1's are really sparse then the naive algorithm will work under budget. If the 1's are dense, then a match should be found right away. But if the density is 'just right' (eg. near ~n^0.63, which you can achieve by setting all bits at positions with no '2' digit in base 3), I don't know if it will work. You would have to prove that the splitting effect is strong enough.

No theoretical answer here, but I wrote a quick Java program to explore the running-time behavior as a function of k and n, where n is the total bit length and k is the number of 1's. I'm with the few answerers who say that the "regular" algorithm, which checks all pairs of bit positions and looks for the 3rd bit, is O(n ln n) in practice: it would require O(k^2) in the worst case, but the worst case needs sparse bitstrings.

Anyway here's the program, below. It's a Monte-Carlo style program which runs a large number of trials NTRIALS for constant n, and randomly generates bitsets for a range of k-values using Bernoulli processes with ones-density constrained between limits that can be specified, and records the running time of finding or failing to find a triplet of evenly spaced ones, time measured in steps NOT in CPU time. I ran it for n=64, 256, 1024, 4096, 16384* (still running), first a test run with 500000 trials to see which k-values take the longest running time, then another test with 5000000 trials with narrowed ones-density focus to see what those values look like. The longest running times do happen with very sparse density (eg for n=4096 the running time peaks are in the k=16-64 range, with a gentle peak for mean runtime at 4212 steps @ k=31, max runtime peaked at 5101 steps @ k=58). It looks like it would take extremely large values of N for the worst-case O(k^2) step to become larger than the O(n) step where you scan the bitstring to find the 1's position indices.

    package com.example.math;

    import java.io.PrintStream;
    import java.util.BitSet;
    import java.util.Random;

    public class EvenlySpacedOnesTest {
        static public class StatisticalSummary {
            private int n=0;
            private double min=Double.POSITIVE_INFINITY;
            private double max=Double.NEGATIVE_INFINITY;
            private double mean=0;
            private double S=0;

            public StatisticalSummary() {}
            public void add(double x) {
                min = Math.min(min, x);
                max = Math.max(max, x);
                ++n;
                double newMean = mean + (x-mean)/n;
                S += (x-newMean)*(x-mean);
                // this algorithm for mean,std dev based on Knuth TAOCP vol 2
                mean = newMean;
            }
            public double getMax() { return (n>0)?max:Double.NaN; }
            public double getMin() { return (n>0)?min:Double.NaN; }
            public int getCount() { return n; }
            public double getMean() { return (n>0)?mean:Double.NaN; }
            public double getStdDev() { return (n>0)?Math.sqrt(S/n):Double.NaN; }
            // some may quibble and use n-1 for sample std dev vs population std dev

            public static void printOut(PrintStream ps, StatisticalSummary[] statistics) {
                for (int i = 0; i < statistics.length; ++i) {
                    StatisticalSummary summary = statistics[i];
                    ps.printf("%d\t%d\t%.0f\t%.0f\t%.5f\t%.5f\n",
                            i,
                            summary.getCount(),
                            summary.getMin(),
                            summary.getMax(),
                            summary.getMean(),
                            summary.getStdDev());
                }
            }
        }

        public interface RandomBernoulliProcess // see http://en.wikipedia.org/wiki/Bernoulli_process
        {
            public void setProbability(double d);
            public boolean getNextBoolean();
        }

        static public class Bernoulli implements RandomBernoulliProcess {
            final private Random r = new Random();
            private double p = 0.5;
            public boolean getNextBoolean() { return r.nextDouble() < p; }
            public void setProbability(double d) { p = d; }
        }

        static public class TestResult {
            final public int k;
            final public int nsteps;
            public TestResult(int k, int nsteps) { this.k=k; this.nsteps=nsteps; }
        }

        ////////////
        final private int n;
        final private int ntrials;
        final private double pmin;
        final private double pmax;
        final private Random random = new Random();
        final private Bernoulli bernoulli = new Bernoulli();
        final private BitSet bits;

        public EvenlySpacedOnesTest(int n, int ntrials, double pmin, double pmax) {
            this.n=n;
            this.ntrials=ntrials;
            this.pmin=pmin;
            this.pmax=pmax;
            this.bits = new BitSet(n);
        }

        /*
         * generate random bit string
         */
        private int generateBits() {
            int k = 0; // # of 1's
            for (int i = 0; i < n; ++i) {
                boolean b = bernoulli.getNextBoolean();
                this.bits.set(i, b);
                if (b) ++k;
            }
            return k;
        }

        private int findEvenlySpacedOnes(int k, int[] pos) {
            int[] bitPosition = new int[k];
            for (int i = 0, j = 0; i < n; ++i) {
                if (this.bits.get(i)) {
                    bitPosition[j++] = i;
                }
            }
            int nsteps = n; // first, it takes N operations to find the bit positions.
            boolean found = false;
            if (k >= 3) // don't bother doing anything if there are less than 3 ones. :(
            {
                int lastBitSetPosition = bitPosition[k-1];
                for (int j1 = 0; !found && j1 < k; ++j1) {
                    pos[0] = bitPosition[j1];
                    for (int j2 = j1+1; !found && j2 < k; ++j2) {
                        pos[1] = bitPosition[j2];

                        ++nsteps;
                        pos[2] = 2*pos[1]-pos[0];
                        // calculate 3rd bit index that might be set;
                        // the other two indices point to bits that are set
                        if (pos[2] > lastBitSetPosition)
                            break;
                        // loop inner loop until we go out of bounds

                        found = this.bits.get(pos[2]);
                        // we're done if we find a third 1!
                    }
                }
            }
            if (!found)
                pos[0]=-1;
            return nsteps;
        }

        /*
         * run an algorithm that finds evenly spaced ones and returns # of steps.
         */
        public TestResult run() {
            bernoulli.setProbability(pmin + (pmax-pmin)*random.nextDouble());
            // probability of bernoulli process is randomly distributed between pmin and pmax

            // generate bit string.
            int k = generateBits();
            int[] pos = new int[3];
            int nsteps = findEvenlySpacedOnes(k, pos);
            return new TestResult(k, nsteps);
        }

        public static void main(String[] args) {
            int n;
            int ntrials;
            double pmin = 0, pmax = 1;
            try {
                n = Integer.parseInt(args[0]);
                ntrials = Integer.parseInt(args[1]);
                if (args.length >= 3)
                    pmin = Double.parseDouble(args[2]);
                if (args.length >= 4)
                    pmax = Double.parseDouble(args[3]);
            }
            catch (Exception e) {
                System.out.println("usage: EvenlySpacedOnesTest N NTRIALS [pmin [pmax]]");
                System.exit(0);
                return; // make the compiler happy
            }

            final StatisticalSummary[] statistics;
            statistics=new StatisticalSummary[n+1];
            for (int i = 0; i <= n; ++i) {
                statistics[i] = new StatisticalSummary();
            }

            EvenlySpacedOnesTest test = new EvenlySpacedOnesTest(n, ntrials, pmin, pmax);
            int printInterval=100000;
            int nextPrint = printInterval;
            for (int i = 0; i < ntrials; ++i) {
                TestResult result = test.run();
                statistics[result.k].add(result.nsteps);
                if (i == nextPrint) {
                    System.err.println(i);
                    nextPrint += printInterval;
                }
            }
            StatisticalSummary.printOut(System.out, statistics);
        }
    }

    # <algorithm>
    def contains_evenly_spaced?(input)
      return false if input.size < 3
      one_indices = []
      input.each_with_index do |digit, index|
        next if digit == 0
        one_indices << index
      end
      return false if one_indices.size < 3
      previous_indexes = []
      one_indices.each do |index|
        if !previous_indexes.empty?
          previous_indexes.each do |previous_index|
            multiple = index - previous_index
            success_index = index + multiple
            return true if input[success_index] == 1
          end
        end
        previous_indexes << index
      end
      return false
    end
    # </algorithm>

    def parse_input(input)
      input.chars.map { |c| c.to_i }
    end

I'm having trouble with the worst-case scenarios with millions of digits. Fuzzing from /dev/urandom essentially gives you O(n), but I know the worst case is worse than that. I just can't tell how much worse. For small n , it's trivial to find inputs at around 3*n*log(n) , but it's surprisingly hard to differentiate those from some other order of growth for this particular problem.

Can anyone who was working on worst-case inputs generate a string with length greater than say, one hundred thousand?
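One way to get long inputs with no evenly spaced 1s at all is the Cantor-like construction from sdcvvc's answer: start from "1" and repeatedly replace s by s, a block of zeros of the same length, then s again. The sketch below builds such strings and counts how many third-position lookups a look-ahead pair scan performs on them. This is only a family of hard inputs, not a claim about the true worst case, and both function names are hypothetical.

```python
def triple_free_string(p):
    """Length 3**p string with 2**p ones, built by p rounds of s -> s + zeros + s."""
    s = '1'
    for _ in range(p):
        s = s + '0' * len(s) + s
    return s


def count_checks(s):
    """Run the look-ahead pair scan; return (found, number of third-position lookups)."""
    ones = [i for i, ch in enumerate(s) if ch == '1']
    n = len(s)
    checks = 0
    for i, a in enumerate(ones):
        for b in ones[i + 1:]:
            c = 2 * b - a
            if c >= n:
                break  # any later b pushes c even further to the right
            checks += 1
            if s[c] == '1':
                return True, checks
    return False, checks


for p in range(1, 9):
    s = triple_free_string(p)
    found, checks = count_checks(s)
    print(len(s), s.count('1'), found, checks)
```

Calling `triple_free_string(11)` gives a string of length 177147 with 2048 ones, which is past the hundred-thousand mark asked about; the scan on it goes through on the order of a million pairs, so it is slower in pure Python but still feasible.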

An adaptation of the Rabin-Karp algorithm could be possible for you. Its complexity is O(n), so it could help you.

Take a look http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm

Could this be a solution? I'm not sure if it's O(nlogn), but in my opinion it's better than O(n²), because the only way not to find a triple would be a prime number distribution.

There's room for improvement, the second found 1 could be the next first 1. Also no error checking.

    #include <iostream>
    #include <string>

    int findIt(const std::string& toCheck) {
        const int n = static_cast<int>(toCheck.length());
        for (int i = 0; i < n; i++) {
            if (toCheck[i] == '1') {
                std::cout << i << ": " << toCheck[i];
                // i + 2*(j-i) is the third position; stop once it runs off the end
                for (int j = i + 1; i + 2 * (j - i) < n; j++) {
                    if (toCheck[j] == '1' && toCheck[i + 2 * (j - i)] == '1') {
                        std::cout << ", " << j << ":" << toCheck[j] << ", "
                                  << (i + 2 * (j - i)) << ":" << toCheck[i + 2 * (j - i)]
                                  << " found" << std::endl;
                        return 0;
                    }
                }
            }
        }
        return -1;
    }

    int main(int argc, char* args[]) {
        std::string toCheck("1001011");
        findIt(toCheck);
        std::cin.get();
        return 0;
    }

I think this algorithm has O(n log n) complexity (C++, DevStudio 2k5). Now, I don't know the details of how to analyse an algorithm to determine its complexity, so I have added some metric gathering information to the code. The code counts the number of tests done on the sequence of 1's and 0's for any given input (hopefully, I've not made a balls of the algorithm). We can compare the actual number of tests against the O value and see if there's a correlation.

    #include <iostream>
    #include <string>
    using namespace std;

    bool HasEvenBits (string &sequence, int &num_compares)
    {
        bool has_even_bits = false;
        num_compares = 0;

        // i is the spacing, j is the position of the first bit of the triplet
        for (unsigned i = 1 ; i <= (sequence.length () - 1) / 2 ; ++i)
        {
            for (unsigned j = 0 ; j < sequence.length () - 2 * i ; ++j)
            {
                ++num_compares;
                if (sequence [j] == '1' && sequence [j + i] == '1' && sequence [j + i * 2] == '1')
                {
                    has_even_bits = true;
                    // we could 'break' here, but I want to know the worst case scenario so keep going to the end
                }
            }
        }

        return has_even_bits;
    }

    int main ()
    {
        int count;
        string input = "111";

        for (int i = 3 ; i < 32 ; ++i)
        {
            HasEvenBits (input, count);
            cout << i << ", " << count << endl;
            input += "0";
        }
    }

This program outputs the number of tests for each string length from 3 to 31 characters. Here are the results:

     n   Tests   n log (n)
    =====================
     3       1       1.43
     4       2       2.41
     5       4       3.49
     6       6       4.67
     7       9       5.92
     8      12       7.22
     9      16       8.59
    10      20      10.00
    11      25      11.46
    12      30      12.95
    13      36      14.48
    14      42      16.05
    15      49      17.64
    16      56      19.27
    17      64      20.92
    18      72      22.59
    19      81      24.30
    20      90      26.02
    21     100      27.77
    22     110      29.53
    23     121      31.32
    24     132      33.13
    25     144      34.95
    26     156      36.79
    27     169      38.65
    28     182      40.52
    29     196      42.41
    30     210      44.31
    31     225      46.23

I've added the 'n log n' values as well. Plot these using your graphing tool of choice to see a correlation between the two results. Does this analysis extend to all values of n? I don't know.
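The Tests column can be reproduced without running the C++ at all, because the number of inner-loop iterations depends only on the length: for spacing i there are n - 2i starting positions. The sketch below (in Python, with a hypothetical helper `count_tests`) sums these and compares against the closed form floor((n-1)^2 / 4). The n log n column in the table uses a base-10 logarithm.

```python
import math


def count_tests(n):
    """Inner-loop iterations of the exhaustive scan on a string of length n:
    spacing i runs from 1 to (n-1)//2, each with n - 2*i starting positions."""
    return sum(n - 2 * i for i in range(1, (n - 1) // 2 + 1))


for n in range(3, 32):
    closed_form = (n - 1) ** 2 // 4
    print(n, count_tests(n), closed_form, round(n * math.log10(n), 2))
```

Since the count equals floor((n-1)^2 / 4), the number of tests grows quadratically in n, so this particular scan keeps pulling away from n log n as n increases.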