如何有效地从一堆袜子配对?

昨天我是从干净的洗衣店袜子配对,并找出我做的方式不是很有效。 我正在做一个天真的search – 挑一只袜子,“迭代”一堆,以find它的一对。 这要求平均迭代n / 2 * n / 4 = n 2/8袜子。

作为一名计算机科学家,我在想我该怎么办? sorting(根据大小/颜色/ …)当然想到实现O(NlogN)的解决scheme。

哈希或其他非原地解决scheme不是一种select,因为我无法复制我的袜子(尽pipe如果我可以很好)。

所以,这个问题基本上是:

给定一堆n对袜子,包含2n元素(假设每个袜子只有一对配对),那么将它们有效配对到对数额外空间的最佳方法是什么? (如果需要的话,我相信我可以记住这个数量的信息。)

我将非常感谢一个解决以下问题的答案:

  • 大量袜子的一般理论解决scheme。
  • 袜子的实际数量并不大,我不相信我的配偶和我有30多双。 (把袜子和她的袜子区分开来是相当容易的,这个也可以使用吗?)
  • 这是否相当于元素清晰度问题 ?

已经提出了sorting解决scheme,但sorting有点太多了 :我们不需要订单; 我们只需要平等组织

所以哈希将是足够的(和更快)。

  1. 对于袜子的每种颜色, 形成一堆 。 迭代您的input篮中的所有袜子, 并将它们分配到彩色桩上
  2. 遍历每一堆, 并通过一些其他度量 (例如模式) 分配到第二组桩
  3. recursion地应用这个scheme,直到你把所有的袜子分布到很小的一堆,你可以直观地进行处理

这种recursion散列分区实际上是由SQL Server在需要对大数据集进行散列连接或散列汇总时完成的。 它将构buildinputstream分发到多个独立的分区。 该scheme线性扩展到任意数量的数据和多个CPU。

如果可以find一个分配键(散列键),则不需要recursion分区,散列键提供的桶足够小,足以快速处理足够的桶 。 不幸的是,我不认为袜子有这样的财产。

如果每个袜子都有一个称为“PairID”的整数,则可以根据PairID % 10 (最后一个数字)轻松地将它们分配到10个桶中。

我能想到的最好的现实世界划分是创build一个长方形的堆 :一个是颜色,另一个是模式。 为什么是矩形? 因为我们需要O(1)随机访问桩。 (一个三维立方体也可以工作,但这不是很实际。)


更新:

平行度呢? 多个人可以更快地匹配袜子吗?

  1. 最简单的平行化策略是让多名工人从input篮子中取出,并将袜子放在堆上。 这只有这么大的膨胀 – 想象一下,有100多人在打斗​​10桩。 同步成本 (performance为手碰撞和人际交stream) 破坏了效率和速度 (见“ 通用可伸缩性法则” )。 这是否容易出现死锁 ? 不,因为每个工人一次只需要访问一个堆。 只有一个“锁”,就不会有死锁。 活锁可能是可能的,这取决于人类如何协调对桩的访问。 他们可能只是使用像网卡那样的随机退避来确定什么卡可以专门访问networking线。 如果它适用于网卡 ,它也适用于人类。
  2. 如果每个工人都有自己的一堆,它几乎无限期地缩放。 然后,工人们可以从input篮中取出大块的袜子(很less有争用,因为他们很less这样做),而且根本不需要在分配袜子时进行同步(因为它们具有线本地桩)。 最后,所有的工人都需要把他们的桩子结合起来。 如果工人组成一个聚合树,我相信可以在O(log(工人数量*每个工人的堆数))中完成。

元素清晰度问题呢? 正如文章所述,元素的显着性问题可以在O(N)解决。 这对于袜子问题也是一样的(也是O(N) ,如果你只需要一个分配步骤(我提出了多个步骤,因为人类计算不好 – 如果你分布在md5(color, length, pattern, ...)上,一步就足够了md5(color, length, pattern, ...) ,即所有属性的完美散列 ))。

显然,我们不能比O(N)快,所以我们已经达到了最优的下界

虽然输出不完全相同(在一种情况下,只是一个布尔值,而在另一种情况下,袜子对),渐近复杂性是相同的。

由于人脑的架构与现代CPU完全不同,这个问题没有实际意义。

人类可以赢得CPUalgorithm,使用“find匹配对”这个事实可以是一个不太大的集合。

我的algorithm:

 spread_all_socks_on_flat_surface(); while (socks_left_on_a_surface()) { // Thanks to human visual SIMD, this is one, quick operation. pair = notice_any_matching_pair(); remove_socks_pair_from_surface(pair); } 

至less这是我在现实生活中使用的,我觉得它非常有效。 缺点是它需要一个平坦的表面,但通常是丰富的。

案例1 :所有的袜子都是相同的(这就是我在现实生活中所做的)。

挑他们中的任何两个做一对。 时间不变

情况2 :组合的数量不变(所有权,颜色,大小,纹理等)。

使用基数sorting 。 这只是线性时间,因为不需要比较。

情况3 :组合的数量不是预先知道的(一般情况)。

我们必须做比较来检查两只袜子是否配对。 selectO(n log n)基于比较的sortingalgorithm之一。

然而,在现实生活中,当袜子的数量相对较less(常量)时,这些理论上最优的algorithm将不能很好地工作。 顺序search可能需要更多的时间,理论上这需要二次时间。

非algorithm的答案,但是当我这样做时“高效”:

  • 步骤1)丢弃所有现有的袜子

  • 步骤2)去沃尔玛购买10-n包白色和黑色包。 在日常生活中不需要其他颜色。

有时候,我必须再次这样做(袜子损坏,袜子损坏等),而且我讨厌经常丢弃完美的袜子(我希望他们一直在销售同样的袜子参考!),所以我最近一个不同的方法。

algorithm答案:

考虑一下,如果你像第二叠袜子那样只抽一只袜子,那么你在天真的search中find匹配的袜子的几率是很低的。

  • 所以随便拿起其中的五个,记住它们的形状或长度。

为什么五个? 通常人类是善于记忆工作记忆中五到七个不同的元素 – 有点像RPN堆栈的人类等价物 – 五是安全的默认值。

  • 从2n-5的堆栈拿起一个。

  • 现在,如果你找不到一个匹配(在视觉模式匹配 – 人类擅长于一个小堆栈),如果你找不到一个匹配,那么把它加到你的五个匹配中。

  • 随机挑选袜子,并与5 + 1袜子进行比较。 随着你的筹码增长,它会降低你的performance,但提高你的几率。 快多了。

随意写下公式来计算你需要抽取多less个样本才能获得50%的比赛胜率。 IIRC这是一个超几何定律。

我每天早上都会这样做,很less需要三次以上的抽签 – 但是我也有类似的一对(10左右,给予或丢失)米色白袜子。 现在你可以估计我的股票的大小:-)

顺便说一下,我发现,每次我需要一双袜子时,将所有袜子分类的交易成本总和远远低于一次袜子的绑定。 准时工作会更好,因为那样你就不必束缚袜子了,边际收益也就越来越小(也就是说,你一直在寻找那两到三个袜子,当你洗衣服的时候,你需要完成匹配你的袜子,你失去了时间)。

我所做的是拿起第一只袜子放下(比如在洗衣盆的边缘)。 然后我拿起另一只袜子,看看是否和第一只袜子一样。 如果是,我把它们都删除。 如果不是,我把它放在第一只袜子旁边。 然后,我拿起第三只袜子,并将其与前两个(如果他们仍然在那里)进行比较。 等等。

假设“去除”袜子是一种select,这种方法可以很容易地在一个数组中实现。 其实,你甚至不需要“移除”袜子。 如果你不需要对袜子进行sorting(见下面),那么你可以移动它们,最后得到一个数组,它将所有的袜子成对排列在arrays中。

假设袜子的唯一操作是比较等式,这个algorithm基本上仍然是一个n 2algorithm,虽然我不知道平均情况(从来没有学会计算)。

sorting,当然会提高效率,特别是在现实生活中,您可以轻松地将袜子插入另外两只袜子之间。 在计算相同可以通过一棵树来实现,但这是额外的空间。 当然,我们又回到了NlogN(或者更多一点,如果有几个袜子按照sorting标准是相同的,但不是来自同一对)。

除此之外,我想不出任何东西,但是这种方法在现实生活中似乎效率很高。 🙂

理论上的限制是O(n),因为你需要触摸每只袜子(除非有些已经配对)。

你可以用基数sorting来实现O(n)。 你只需要为桶select一些属性。

  1. 首先你可以select(她的,我的) – 分成两堆,
  2. 然后使用颜色(可以有颜色的任何顺序,例如按颜色名称按字母顺序排列) – 按颜色将它们分成堆(记住为同一堆中的所有袜子保留第1步的初始顺序),
  3. 那么袜子的长度,
  4. 然后纹理,…. –

如果你可以select有限数量的属性,但是有足够的属性可以唯一地标识每一对,那么你应该在O(k * n)中完成,如果我们可以认为k是有限的,那么就是O(n)。

这是问错的问题。 要问的正确的问题是,为什么我花时间sorting袜子? 当你按你select的X货币单位计算你的空闲时间时,每年多less钱?

更多的时候,这不仅仅是空闲时间, 早晨的空闲时间,你可以在床上消费,喝咖啡,或者早点离开,而不会被交通堵塞。

退后一步往往是好事,并想办法解决这个问题。

有一种方法!

找一个你喜欢的袜子。 综合考虑所有相关特征:不同照明条件下的颜色,整体质量和耐久性,不同气候条件下的舒适度以及气味吸收。 同样重要的是,他们不应该失去储存弹性,所以天然面料是好的,他们应该在塑料包装。

如果左脚和右脚袜子没有区别,那就更好了,但这并不重要。 如果袜子是左右对称的,find一对是O(1)操作,sorting袜子是近似的O(M)操作,其中M是你的房子,你已经散布袜子的地方的数量,理想情况下一些小常数。

如果你select了一个有不同左右袜子的花式对,那么对左右脚桶进行一个完整的桶sorting需要O(N + M),其中N是袜子的数量,M与上面相同。 其他人可以给出寻找第一对的平均迭代的公式,但是find一个盲search的对的最坏情况是N / 2 + 1,这对于合理的N变成天文学上不太可能的情况。这可以通过使用高级图像识别algorithm和启发式,用Mk1眼球扫描一堆未sorting的袜子。

因此,一个实现O(1)袜子配对效率的algorithm(假设对称的袜子)是:

  1. 你需要估计你有多less袜子需要你的余生,或者直到你退休,搬到温暖的气候,再也不需要穿袜子了。 如果你还年轻,还可以估计在我们所有的家庭都有机器人装袜机之前需要多长时间,整个问题就变得无关紧要了。

  2. 您需要了解如何订购您select的散装袜子,花费多less,是否交货。

  3. 订购袜子!

  4. 摆脱你的旧袜子。

另一个步骤3将涉及比较多年来一次购买同样数量的更便宜的袜子的成本,并增加分类袜子的成本,但是要听取我的意见:批量购买更便宜! 另外,袜子的储存价格也会随着袜子价格的上涨而上涨,这比你在许多投资上的收益要高。 然后又有存储成本,但袜子真的没有在衣柜顶部的架子上占用太多的空间。

问题解决了。 所以,只要得到新的袜子,扔掉/捐赠你的旧衣服,并且在知道自己每天都在节省金钱和时间之后,过上幸福的生活。

作为实际的解决scheme:

  1. 迅速做成一堆容易区分的袜子。 (用颜色来说)
  2. 快速排出每一堆,并使用袜子的长度进行比较。 作为一个人,你可以做出一个相当快的决定,用哪个袜子来划分,避免最坏的情况。 (你可以同时看到多个袜子,这对你有利!)
  3. 停止分拣时,他们到达一个门槛,你可以立即find点对和无可比拟的袜子

如果你有1000只袜子,有8种颜色和平均分布,你可以在每次125个袜子中做4根。 有5袜子的门槛,你可以分6堆每堆。 (计数2秒,把一个袜子扔在右边的一堆,它会带你4小时以下。)

如果你只有60只袜子,3种颜色和2种袜子(你的/你的妻子),你可以在一次运行中对每一堆10只袜子进行分类(同样阈值= 5)。 (计数2秒钟需要2分钟)。

最初的桶分类将加快你的过程,因为它将你的n个袜子分成k个桶,所以你只需要做c*n*log(k)工作。 (不考虑门槛)。 所以你要做的就是做n*c*(1 + log(k))工作,其中c是把一只袜子扔到一堆的时间。

与任何c*x*n + O(1)方法相比,这种方法将是有利的,只要log(k) < x - 1


在计算机科学中,这可能是有帮助的:我们收集了n 件东西 ,一个订单(长度)和一个等价关系(额外的信息,例如袜子的颜色)。 等价关系允许我们对原始集合进行划分,在每个等价类中,我们的订单仍然保持不变。 一个事物到它的等价类的映射可以在O(1)中完成,所以只需要O(n)来将每个项目分配给一个类。 现在我们已经使用了额外的信息,可以用任何方式对每个类进行sorting。 好处是数据集已经小得多了。

如果我们有多个等价关系 – >使颜色堆,比纹理上的每个堆分区,比长度sorting,也可以嵌套方法。 任何创build具有大于等于2的大于2的元素的分区的等价关系都会带来速度上的改善(只要我们可以直接将一个sock分配给它的堆),并且在较小的数据集上可以快速地进行sorting。

这个问题实际上是深刻的哲学。 从根本上讲,人们解决问题的能力(我们大脑的“湿件”)是否等同于algorithm可以实现的function。

一个明显的袜子sortingalgorithm是:

 Let N be the set of socks that are still unpaired, initially empty for each sock s taken from the dryer if s matches a sock t in N remove t from N, bundle s and t together, and throw them in the basket else add s to N 

现在这个问题的计算机科学就是关于这个步骤

  1. “如果s与N中的袜子配对”。 我们能够多快“记住”到目前为止所看到的?
  2. “从N中删除t”和“将s添加到N”。 跟踪我们到目前为止所看到的东西有多昂贵?

人类将使用各种策略来实现这些。 人类记忆联想的 ,就像一个哈希表,其中存储值的特征集与相应的值本身配对。 例如,“红色汽车”的概念映射到一个人能够记住的所有红色汽车。 有一个完美的记忆有一个完美的映射。 大多数人在这方面(和大多数人)是不完善的。 联想地图的容量有限。 在各种情况下(一个啤酒太多),映射可能会失去存在,被logging在错误中(即使我们注意到真相已经改变了,“我的名字叫贝蒂,不是内蒂”),汽车“唤起”橙色火鸟“,当我们知道他已经交易了红色卡玛洛)。

在袜子的情况下,完美的召回意味着看着袜子s总是产生其兄弟姐妹的记忆,包括足够的信息(在熨衣板上的位置)以定时定位t 。 具有摄影记忆的人在一定时间内完成1和2。

记忆力不够完美的人可能会根据他的能力追踪大小(爸爸,妈妈,宝宝),颜色(绿色,微红等),模式(菱形,简单等等),使用一些常识等价类。 ,风格(footie,knee-high等)。 所以烫衣板将被分成几类。 这通常允许类别通过记忆定时定位,但是需要通过类别“桶”进行线性search。

Someone with no memory or imagination at all (sorry) will just keep the socks in one pile and do a linear search of the whole pile.

A neat freak might use numeric labels for pairs as someone suggested. This opens the door to a total ordering, which allows the human to use exactly the same algorithms we might with a CPU: binary search, trees, hashes, etc.

So the "best" algorithm depends on the qualities of the wetware/hardware/software that is running it and our willingness to "cheat" by imposing a total order on pairs. Certainly a "best" meta -algorithm is to hire the worlds best sock-sorter: a person or machine that can aquire and quickly store a huge set N of sock attribute sets in a 1-1 associative memory with constant time lookup, insert, and delete. Both people and machines like this can be procured. If you have one, you can pair all the socks in O(N) time for N pairs, which is optimal. The total order tags allow you to use standard hashing to get the same result with either a human or hardware computer.

You are trying to solve the wrong problem.

Solution 1: Each time you put dirty socks in your laundry basket, tie them in a little knot. That way you will not have to do any sorting after the washing. Think of it like registering an index in a Mongo database. A little work ahead for some CPU savings in the future.

Solution 2: If it's winter, you don't have to wear matching socks. We are programmers. Nobody needs to know, as long as it works.

Solution 3: Spread the work. You want to perform such a complex CPU process asynchronously, without blocking the UI. Take that pile of socks and stuff them in a bag. Only look for a pair when you need it. That way the amount of work it takes is much less noticeable.

希望这可以帮助!

Here's an Omega(n log n) lower bound in comparison based model. (The only valid operation is comparing two socks.)

Suppose that you know that your 2n socks are arranged this way:

p 1 p 2 p 3 … p n p f(1) p f(2) … p f(n)

where f is an unknown permutation of the set {1,2,…,n}. Knowing this cannot make the problem harder. There are n! possible outputs (matchings between first and second half), which means you need log(n!) = Omega(n log n) comparisons. This is obtainable by sorting.

Since you are interested in connections to element distinctness problem: proving the Omega(n log n) bound for element distinctness is harder, because the output is binary yes/no. Here, the output has to be a matching and the number of possible outputs suffices to get a decent bound. However, there's a variant connected to element distinctness. Suppose you are given 2n socks and wonder if they can be uniquely paired. You can get a reduction from ED by sending (a 1 , a 2 , …, a n ) to (a 1 , a 1 , a 2 , a 2 , …, a n , a n ). (Parenthetically, the proof of hardness of ED is very interesting, via topology .)

I think that there should be an Omega(n 2 ) bound for the original problem if you allow equality tests only. My intuition is: Consider a graph where you add an edge after a test, and argue that if the graph is not dense the output is not uniquely determined.

Cost: Moving socks -> high, finding/search socks in line -> small

What we want to do is reduce the number of moves, and compensate with the number of searches. Also, we can utilize the multithreded environment of the Homo Sapiens to hold more things in the descision cache.

X = Yours, Y = Your spouses

From pile A of all socks:

Pick two socks, place corresponding X sock in X line, and Y sock in Y line at next available position.

Do until A is empty.

For each line X and Y

  1. Pick the first sock in line, search along the line until it finds the corresponding sock.

  2. Put into the corresponding finished line of socks.

  3. Optional While you are searching the line and and the current sock you are looking at is identical to the previous, do step 2 for these socks.

Optionally to step one, you pick up two sock from that line instead of two, as the caching memory is large enough we can quickly identify if either sock matches the current one on the line you are observing. If you are fortunate enough to have three arms, you could possibly parse three socks at the same time given that the memory of the subject is large enough.

Do until both X and Y is empty.

完成

However, as this have simillar complexity as selection sort, the time taken is far less due to the speeds of I/O(moving socks) and search(searching the line for a sock).

This is how I actually do it, for p pairs of socks ( n = 2p individual socks):

  • Grab a sock at random from the pile.
  • For the first sock, or if all previously-chosen socks have been paired, simply place the sock into the first "slot" of an "array" of unpaired socks in front of you.
  • If you have one or more selected unpaired socks, check your current sock against all the unpaired socks in the array.
    • It is possible to separate socks into general classes or types (white/black, ankle/crew, athletic/dress) when building your array, and "drill-down" to only compare like-for-like.
    • If you find an acceptable match, put both socks together and remove them from the array.
    • If you do not, put the current sock into the first open slot in the array.
  • Repeat with every sock.

The worst-case scenario of this scheme is that every pair of socks is different enough that it must be matched exactly, and that the first n/2 socks you pick are all different. This is your O (n 2 ) scenario, and it's extremely unlikely. If the number of unique types of sock t is less than the number of pairs p = n/2 , and the socks in each type are alike enough (usually in wear-related terms) that any sock of that type can be paired with any other, then as I inferred above, the maximum number of socks you will ever have to compare to is t , after which the next one you pull will match one of the unpaired socks. This scenario is much more likely in the average sock drawer than the worst-case, and reduces the worst-case complexity to O (n*t) where usually t << n .

Real world approach:

As rapidly as possible, remove socks from the unsorted pile one at a time and place in piles in front of you. The piles should be arranged somewhat space-efficiently, with all socks pointing the same direction; the number of piles is limited by the distance you can easily reach. The selection of a pile on which to put a sock should be — as rapidly as possible — by putting a sock on a pile of apparently like socks; the occasional type I (putting a sock on a pile it doesn't belong to) or type II (putting a sock in its own pile when there's an existing pile of like socks) error can be tolerated — the most important consideration is speed . Once all the socks are in piles, rapidly go through the multi-sock piles creating pairs and removing them (these are heading for the drawer). If there are non-matching socks in the pile, re-pile them to their best (within the as-fast-as-possible constraint) pile. When all the multi-sock piles have been processed, match up remaining pairable socks that weren't paired due to type II errors. Whoosh, you're done — and I have a lot of socks and don't wash them until a large fraction are dirty. Another practical note: I flip the top of one of a pair of socks down over the other, taking advantage of their elastic properties, so they stay together while being transported to the drawer and while in the drawer.

From your question it is clear you don't have much actual experience with laundry :). You need an algorithm that works well with a small number of non-pairable socks.

The answers till now don't make good use of our human pattern recognition capabilities. The game of Set provides a clue of how to do this well: put all socks in a two-dimensional space so you can both recognize them well and easily reach them with your hands. This limits you to an area of about 120 * 80 cm or so. From there select the pairs you recognize and remove them. Put extra socks in the free space and repeat. If you wash for people with easily recognizable socks (small kids come to mind), you can do a radix sort by selecting those socks first. This algorithm works well only when the number of single socks is low

 List<Sock> UnSearchedSocks = getAllSocks(); List<Sock> UnMatchedSocks = new list<Sock>(); List<PairOfSocks> PairedSocks = new list<PairOfSocks>(); foreach (Sock newSock in UnsearchedSocks) { Sock MatchedSock = null; foreach(Sock UnmatchedSock in UnmatchedSocks) { if (UnmatchedSock.isPairOf(newSock)) { MatchedSock = UnmatchedSock; break; } } if (MatchedSock != null) { UnmatchedSocks.remove(MatchedSock); PairedSocks.Add(new PairOfSocks(MatchedSock, NewSock)); } else { UnmatchedSocks.Add(NewSock); } } 

Pick up a first sock and place it on a table. Now pick another sock; if it matches the first picked, place it on top of the first. If not, place it on the table a small distance from the first. Pick a third sock; if it matches either of the previous two, place it on top of them or else place it a small distance from the third. Repeat until you have picked up all the socks.

In order to say how efficient it is to pair socks from a pile, we have to define the machine first, because the pairing isn't done whether by a turing nor by a random access machine, which are normally used as the basis for an algorithmic analysis.

The machine

The machine is an abstraction of a the real world element called human being. It is able to read from the environment via a pair of eyes. And our machine model is able to manipulate the environment by using 2 arms. Logical and arithmetic operations are calculated using our brain (hopefully ;-)).

We also have to consider the intrinsic runtime of the atomic operations that can be carried out with these instruments. Due to physical constraints, operations which are carried out by an arm or eye have non constant time complexity. This is because we can't move an endlessly large pile of socks with an arm nor can an eye see the top sock on an endlessly large pile of socks.

However mechanical physics give us some goodies as well. We are not limited to move at most one sock with an arm. We can move a whole couple of them at once.

So depending on the previous analysis following operations should be used in descending order:

  • logical and arithmetic operations
  • environmental reads
  • environmental modifications

We can also make use of the fact that people only have a very limited amount of socks. So an environmental modification can involve all socks in the pile.

The algorithm

So here is my suggestion:

  1. Spread all socks in the pile over the floor.
  2. Find a pair by looking at the socks on the floor.
  3. Repeat from 2 until no pair can be made.
  4. Repeat from 1 until there are no socks on the floor.

Operation 4 is necessary, because when spreading socks over the floor some socks may hide others. Here is the analysis of the algorithm:

The analysis

The algorithm terminates with high probability. This is due to the fact that one is unable to find pairs of socks in step number 2.

For the following runtime analysis of pairing n pairs of socks, we suppose that at least half of the 2n socks aren't hidden after step 1. So in the average case we can find n/2 pairs. This means that the loop is step 4 is executed O(log n) times. Step 2 is executed O(n^2) times. So we can conclude:

  • The algorithm involves O(ln n + n) environmental modifications (step 1 O(ln n) plus picking every pair of sock from the floor)
  • The algorithm involves O(n^2) environmental reads from step 2
  • The algorithm involves O(n^2) logical and arithmetic operations for comparing a sock with another in step 2

So we have a total runtime complexity of O(r*n^2 + w*(ln n + n)) where r and w are the factors for environmental read and environmental write operations respectively for a reasonable amount of socks. The cost of the logical and arithmetical operations are omitted, because we suppose that it takes a constant amount of logical and arithmetical operations to decide whether 2 socks belong to the same pair. This may not be feasible in every scenario.

I came out with another solution which would not promise fewer operations, neither less time consumption, but it should be tried to see if it can be a good-enough heuristic to provide less time consumption in huge series of sock pairing.

Preconditions: There is no guarantee that there are the same socks. If they are of the same color it doesn't mean they have the same size or pattern. Socks are randomly shuffled. There can be odd number of socks (some are missing, we don't know how many). Prepare to remember a variable "index" and set it to 0.

The result will have one or two piles: 1. "matched" and 2. "missing"

Heuristic:

  1. Find most distinctive sock.
  2. Find its match.
  3. If there is no match, put it on the "missing" pile.
  4. Repeat from 1. until there are no more most distinctive socks.
  5. If there are less then 6 socks, go to 11.
  6. Pair blindly all socks to its neighbor (do not pack it)
  7. Find all matched pairs, pack it and move packed pairs to "matched" pile; If there were no new matches – increment "index" by 1
  8. If "index" is greater then 2 (this could be value dependent on sock number because with greater number of socks there are less chance to pair them blindly) go to 11
  9. Shuffle the rest
  10. Go to 1
  11. Forget "index"
  12. Pick a sock
  13. Find its pair
  14. If there is no pair for the sock, move it to the "missing" pile
  15. If match found pair it, pack pair and move it to the "matched" pile
  16. If there are still more then one socks go to 12
  17. If there is just one left go to 14
  18. Smile satisfied 🙂

Also, there could be added check for damaged socks also, as if the removal of those. It could be inserted between 2 and 3, and between 13 and 14.

I'm looking forward to hear about any experiences or corrections.

When I sort socks, I do an approximate radix sort , dropping socks near other socks of the same colour/pattern type. Except in the case when I can see an exact match at/near the location I'm about to drop the sock I extract the pair at that point.

Almost all the other algorithms (including the top scoring answer by usr ) sort, then remove pairs. I find that, as a human, it is better to minimize the number of socks being considered at one time.

I do this by:

  1. Picking a distinctive sock (whatever catches my eye first in the pile).
  2. Starting a radix sort from that conceptual location by pulling socks from the pile based on similarity to that one.
  3. Place the new sock near into the current pile, with a distance based on how different it is. If you find yourself putting the sock on top of another because it is identical, form the pair there, and remove them. This means that future comparisons take less effort to find the correct place.

This takes advantage of the human ability to fuzzy-match in O(1) time, which is somewhat equivalent to the establishment of a hash-map on a computing device.

By pulling the distinctive socks first, you leave space to "zoom" in on the features which are less distinctive to begin with.

After eliminating the fluro coloured, the socks with stripes, and the three pairs of long socks, you might end up with mostly white socks roughly sorted by how worn they are.

At some point the differences between socks are small enough that other people won't notice the difference, and any further matching effort is not needed.

Whenever you pick up a sock, put it in one place. Then the next sock you pick up, if it doesn't match the first sock, set it beside the first one. If it does, there's a pair. This way it doesn't really matter how many combinations there are, and there are only two possibilities for each sock you pick up — either it has a match that's already in your array of socks, or it doesn't, which means you add it to a place in the array.

This also means that you will almost certainly never have all your socks in the array, because socks will get removed as they're matched.

Consider a hash-table of size 'N'.

If we assume normal distribution, then the estimated number of 'insertions' to have atleast one sock mapped to one bucket is NlogN (ie, all buckets are full)

I had derived this as a part of another puzzle,but I would be happy to be proven wrong. Here's my blog article on the same

Let 'N' correspond to an approximate upper-bound on the number of number of unique colors/pattern of socks that you have.

Once you have a collision(aka : a match) simply remove that pair of socks. Repeat the same experiment with the next batch of NlogN socks. The beauty of it is that you could be making NlogN parallel comparisons(collision-resolution) because of the way the human mind works. 🙂

Socks, whether real ones or some analogous data structure, would be supplied in pairs.

The simplest answer is prior to allowing the pair to be separated, a single data structure for the pair should have been initialized that contained a pointer to the left and right sock, thus enabling socks to be referred to directly or via their pair. A sock may also be extended to contain a pointer to its partner.

This solves any computational pairing problem by removing it with a layer of abstraction.

Applying the same idea to the practical problem of pairing socks, the apparent answer is: don't allow your socks to ever be unpaired. Socks are provided as a pair, put in the drawer as a pair (perhaps by balling them together), worn as a pair. But the point where unpairing is possible is in the washer, so all that's required is a physical mechanism that allows the socks to stay together and be washed efficiently.

There are two physical possibilities:

For a 'pair' object that keeps a pointer to each sock we could have a cloth bag that we use to keep the socks together. This seems like massive overhead.

But for each sock to keep a reference to the other, there is a neat solution: a popper (or a 'snap button' if you're American), such as these:

http://www.aliexpress.com/compare/compare-invisible-snap-buttons.html

Then all you do is snap your socks together right after you take them off and put them in your washing basket, and again you've removed the problem of needing to pair your socks with a physical abstraction of the 'pair' concept.

I hope I can contribute something new to this problem. I noticed that all of the answers neglect the fact that there are two points where you can perform preprocessing , without slowing down your overall laundry performance.

Also, we don't need to assume a large number of socks, even for large families. Socks are taken out of the drawer and are worn, and then are tossed in a place (maybe a bin) where they stay before being laundered. While I wouldn't call said bin a LIFO-Stack, I'd say it is safe to assume that

  1. people toss both of their socks roughly in the same area of the bin,
  2. the bin is not randomized at any point, and therefore
  3. any subset taken from the top of this bin generally contains both socks of a pair.

Since all washing machines I know about are limited in size (regardless of how many socks you have to wash), and the actual randomizing occurs in the washing machine, no matter how many socks we have, we always have small subsets which contain almost no singletons.

Our two preprocessing stages are "putting the socks on the clothesline" and "Taking the socks from the clothesline", which we have to do, in order to get socks which are not only clean, but also dry. As with washing machines, clothelines are finite, and I assume that we have the whole part of the line where we put our socks in sight.

Here's the algorithm for put_socks_on_line():

 while (socks left in basket) { take_sock(); if (cluster of similar socks is present) { Add sock to cluster (if possible, next to the matching pair) } else { Hang it somewhere on the line, this is now a new cluster of similar-looking socks. Leave enough space around this sock to add other socks later on } } 

Don't waste your time moving socks around or looking for the best match, this all should be done in O(n), which we would also need for just putting them on the line unsorted. The socks aren't paired yet, we only have several similarity clusters on the line. It's helpful that we have a limited set of socks here, as this helps us to create "good" clusters (for example, if there are only black socks in the set of socks, clustering by colors would not be the way to go)

Here's the algorithm for take_socks_from_line():

 while(socks left on line) { take_next_sock(); if (matching pair visible on line or in basket) { Take it as well, pair 'em and put 'em away } else { put sock in basket } 

I should point out that in order to improve speed of the remaining steps, it is wise not to randomly pick the next sock, but to sequentally take sock after sock from each cluster. Both preprocessing steps don't take more time than just putting the socks on the line or in the basket, which we have to do no matter what, so this should gretly enhance the laundry performance.

After this, it's easy to do the hash partioning algorithm. Usually, about 75% of the socks are already paired, leaving me with a very small subset of socks, and this subset is already (somewhat) clustered (I don't introduce much entropy into my basket after the preprocessing steps). Another thing is that the remaining clusters tend to be small enough to be handled at once, so it is possible to take a whole cluster out of the basket.

Here's the algorithm for sort_remaining_clusters():

 while(clusters present in basket) { Take out the cluster and spread it Process it immediately Leave remaining socks where they are } 

After that, there are only a few socks left. This is where I introduce previously unpaired socks into the system and process the remaining socks without any special algorithm – the remaining socks are very few and can be processed visually very fast.

For all remaining socks I assume that their counterparts are still unwashed and put them away for the next iteration. If you register a growth of unpaired socks over time (a "sock leak"), you should check your bin – it might get randomized (do you have cats which sleep in there?)

I know that these algorithms take a lot of assumptions: a bin which acts as some sort of LIFO stack, a limited, normal washing machine, and a limited, normal clothesline – but this still works with very large numbers of socks.

About parallelism: As long as you toss both socks into the same bin, you can easily parallelize all of those steps.

If the "move" operation is fairly expensive, and the "compare" operation is cheap, and you need to move the whole set anyway, into a buffer where search is much faster than in original storage… just integrate sorting into the obligatory move.

I found integrating the process of sorting into hanging to dry makes it a breeze. I need to pick up each sock anyway, and hang it (move) and it costs me about nothing to hang it in a specific place on the strings. Now just not to force search of the whole buffer (the strings) I choose to place socks by color/shade. Darker left, brighter right, more colorful front etc. Now before I hang each sock, I look in its "right vicinity" if a matching one is there already – this limits "scan" to 2-3 other socks – and if it is, I hang the other one right next to it. Then I roll them into pairs while removing from the strings, when dry.

Now this may not seem all that different from "forming piles by color" suggested by top answers but first, by not picking discrete piles but ranges, I have no problem classifying whether "purple" goes to "red" or "blue" pile; it just goes between. And then by integrating two operations (hang to dry and sort) the overhead of sorting while hanging is like 10% of what separate sorting would be.

I have taken simple steps to reduce my effort into a process taking O(1) time.

By reducing my inputs to one of two types of socks (white socks for recreation, black socks for work), I only need to determine which of two socks I have in hand. (Technically, since they are never washed together, I have reduced the process to O(0) time)

Some upfront effort is required to find desirable socks, and to purchase in sufficient quantity as to eliminate need for your existing socks. As I'd done this before my need for black socks, my effort was minimal, but mileage may vary.

Such an upfront effort has been seen many times in very popular and effective code. Examples include #DEFINE'ing pi to several decimals (other examples exist, but that's the one that comes to mind right now).

Create a hash table which will be used for unmatched socks, using the pattern as the hash. Iterate over the socks one by one. If the sock has a pattern match in the hash table, take the sock out of the table and make a pair. If the sock does not have a match, put it into the table.

The problem of sorting your n pairs of socks is O(n) . Before you throw them in the laundry basket , you thread the left one to the right one. On taking them out, you cut the thread and put each pair into your drawer – 2 operations on n pairs, so O(n).

Now the next question is simply whether you do your own laundry and your wife does hers. That is a problem likely in an entirely different domain of problems . 🙂

I've finished pairing my socks just right now, and I found that the best way to do it is the following:

  • Choose one of the socks and put it away (create a 'bucket' for that pair)
  • If the next one is the pair of the previous one, then put it to the existing bucket, otherwise create a new one.

In the worst case it means that you will have n/2 different buckets, and you will have n-2 determinations about that which bucket contains the pair of the current sock. Obviously, this algorithm works well if you have just a few pairs; I did it with 12 pairs.

It is not so scientific, but it works well:)

What about doing some preprocessing? I would stitch a mark or id number in every sock in a way that every pair has the same mark/id number. This process might be done every time you buy a new pair of socks. Then, you could do a radix sort to get O(n) total cost. Find a place for every mark/id number and just pick all the socks one by one and put them into the correct place.

Interesting Posts