使用优化的Levenshteinalgorithm寻找最近的邻居

我最近发布了一个关于优化algorithm来计算Levenshtein距离的问题，并且这些答复将我引向维基百科有关Levenshtein距离的文章。

文章提到，如果在最大距离上有一个边界k ，那么给定的查询可能会产生一个结果，那么运行时间可以从O（mn）减less到O（kn） ， m和n是string。我查了algorithm，但我真的不知道如何实现它。我希望在这里得到一些线索。

“可能的改进”下的优化是＃4。

令我困惑的部分是我们只需要计算以主对angular线（主对angular线被定义为坐标（i，i））为中心的宽度为2k + 1的对angular线条带。

如果有人能提供一些帮助/见解，我会非常感激。如果需要的话，我可以在这里发表完整的algorithm描述作为答案。

我已经做了很多次了。我这样做的方式是对可能发生变化的游戏树进行recursion深度优先树状漫步。有一个预算k的变化，我用来修剪树。有了这个程序，首先我运行它k = 0，然后k = 1，然后k = 2，直到我得到一个命中，或者我不想再去更高。

char* a = /* string 1 */; char* b = /* string 2 */; int na = strlen(a); int nb = strlen(b); bool walk(int ia, int ib, int k){ /* if the budget is exhausted, prune the search */ if (k < 0) return false; /* if at end of both strings we have a match */ if (ia == na && ib == nb) return true; /* if the first characters match, continue walking with no reduction in budget */ if (ia < na && ib < nb && a[ia] == b[ib] && walk(ia+1, ib+1, k)) return true; /* if the first characters don't match, assume there is a 1-character replacement */ if (ia < na && ib < nb && a[ia] != b[ib] && walk(ia+1, ib+1, k-1)) return true; /* try assuming there is an extra character in a */ if (ia < na && walk(ia+1, ib, k-1)) return true; /* try assuming there is an extra character in b */ if (ib < nb && walk(ia, ib+1, k-1)) return true; /* if none of those worked, I give up */ return false; }

添加解释trie-search：

 // definition of trie-node: struct TNode { TNode* pa[128]; // for each possible character, pointer to subnode }; // simple trie-walk of a node // key is the input word, answer is the output word, // i is the character position, and hdis is the hamming distance. void walk(TNode* p, char key[], char answer[], int i, int hdis){ // If this is the end of a word in the trie, it is marked as // having something non-null under the '\0' entry of the trie. if (p->pa[0] != null){ if (key[i] == '\0') printf("answer = %s, hdis = %d\n", answer, hdis); } // for every actual subnode of the trie for(char c = 1; c < 128; c++){ // if it is a real subnode if (p->pa[c] != null){ // keep track of the answer word represented by the trie answer[i] = c; answer[i+1] = '\0'; // and walk that subnode // If the answer disagrees with the key, increment the hamming distance walk(p->pa[c], key, answer, i+1, (answer[i]==key[i] ? hdis : hdis+1)); } } } // Note: you have to edit this to handle short keys. // Simplest is to just append a lot of '\0' bytes to the key.

现在，把它限制在预算之内，如果hdis太大，就拒绝下降。

使用优化的Levenshteinalgorithm寻找最近的邻居

哪个更好的select用于除以2的整数？

Javareflection性能

为什么（a％256）与（a＆0xFF）不同？

读取文件为string

最快的方法来sorting在JavaScript中的32位有符号整数数组？

编译器是否允许优化堆内存分配？

如果…按照概率进行sorting，sorting的效果如何？

在移动设备上testingNEON优化的cv :: threshold（）

有关GHC实施的好介绍性文字？

如何缩小JS或CSS的dynamic