解释使用位向量来确定是否所有字符都是唯一的

我很困惑如何位vector如何做到这一点(不太熟悉位vector)。 这是给出的代码。 有人可以通过这个走过我吗?

public static boolean isUniqueChars(String str) { int checker = 0; for (int i = 0; i < str.length(); ++i) { int val = str.charAt(i) - 'a'; if ((checker & (1 << val)) > 0) return false; checker |= (1 << val); } return true; } 

特别是checker在做什么?

int checker在这里用作位存储。 整数值中的每一位都可以看作一个标志,所以最终int是一个位数组(标志)。 代码中的每一位都表示是否在string中find具有位索引的字符。 你可以使用位向量,而不是int 。 他们之间有两个区别:

  • 大小int具有固定大小,通常是4个字节,这意味着8 * 4 = 32位(标志)。 位向量通常可以是不同的大小,或者你应该在构造函数中指定大小。

  • API 。 使用位向量,您将更容易阅读代码,可能是这样的:

    vector.SetFlag(4, true); // set flag at index 4 as true

    对于int您将拥有较低级别的位逻辑代码:

    checker |= (1 << 5); // set flag at index 5 to true

也可能是int可能会快一点,因为位操作是非常低的水平,可以按原样由CPU执行。 BitVector允许编写less一点神秘的代码,而且可以存储更多的标志。

为了将来的参考:位向量也被称为bitSet或bitArray。 以下是针对不同语言/平台的此数据结构的一些链接:

  • CPP: BitSet
  • Java: BitSet
  • C#: BitVector32和BitArray

我有一个潜行的怀疑,你从我正在阅读的同一本书得到这个代码…这里的代码本身并不像通常使用的运算符 – | =,&和<<我们这个外行人 – 作者并没有花费额外的时间来解释这个过程,也没有涉及这里涉及到的实际的机制。 我一开始就对这个线程的回答感到满意,但只是在抽象层面上。 我回到原来的地方,因为我觉得那里需要更具体的解释 – 缺乏一个总是让我感到不安。

这个运算符<<是一个左位移符,它取该数字或操作数的二进制表示,然后将其移到操作数指定的很多地方或右边的数字,就像二进制中的十进制数字一样。 我们乘以2的基数 – 当我们向上移动时,许多地方不是10的基数 – 所以右边的数字是指数,左边的数字是2的基数倍数。

这个操作符| =把操作数放在左边,或者把操作数放在右边,这个 – 和是操作数的左边和右边。

所以我们这里有一个哈希表,它被存储在一个32位的二进制数字中,每次检查器得到or'd( checker |= (1 << val) )与一个字母的指定二进制值对应的位正被设置为true。 字符的值和检查器( checker & (1 << val)) > 0 )是和的 – 如果它大于0,我们知道我们有一个重复 – 因为两个相同的位被设置为true并且会一起返回真或'1'。

有26个二进制位,每个对应一个小写字母 – 作者确实假设string只包含小写字母 – 这是因为我们只剩下6个(在32位整数)的地方消费,比我们碰撞

 00000000000000000000000000000001 a 2^0 00000000000000000000000000000010 b 2^1 00000000000000000000000000000100 c 2^2 00000000000000000000000000001000 d 2^3 00000000000000000000000000010000 e 2^4 00000000000000000000000000100000 f 2^5 00000000000000000000000001000000 g 2^6 00000000000000000000000010000000 h 2^7 00000000000000000000000100000000 i 2^8 00000000000000000000001000000000 j 2^9 00000000000000000000010000000000 k 2^10 00000000000000000000100000000000 l 2^11 00000000000000000001000000000000 m 2^12 00000000000000000010000000000000 n 2^13 00000000000000000100000000000000 o 2^14 00000000000000001000000000000000 p 2^15 00000000000000010000000000000000 q 2^16 00000000000000100000000000000000 r 2^17 00000000000001000000000000000000 s 2^18 00000000000010000000000000000000 t 2^19 00000000000100000000000000000000 u 2^20 00000000001000000000000000000000 v 2^21 00000000010000000000000000000000 w 2^22 00000000100000000000000000000000 x 2^23 00000001000000000000000000000000 y 2^24 00000010000000000000000000000000 z 2^25 

所以,对于inputstring“azya”,就像我们一步一步移动一样

string'a'

 a =00000000000000000000000000000001 checker=00000000000000000000000000000000 checker='a' or checker; // checker now becomes = 00000000000000000000000000000001 checker=00000000000000000000000000000001 a and checker=0 no dupes condition 

string'az'

 checker=00000000000000000000000000000001 z =00000010000000000000000000000000 z and checker=0 no dupes checker=z or checker; // checker now becomes 00000010000000000000000000000001 

string'azy'

 checker= 00000010000000000000000000000001 y = 00000001000000000000000000000000 checker and y=0 no dupes condition checker= checker or y; // checker now becomes = 00000011000000000000000000000001 string 'azya' checker= 00000011000000000000000000000001 a = 00000000000000000000000000000001 a and checker=1 we have a dupe 

现在,它宣布重复

我还假设你的例子来自“ 破解代码访问 ”一书,我的答案与这个上下文有关。

为了使用这个algorithm来解决这个问题,我们不得不承认,我们只是要传递从a到z(小写)的字符。

由于只有26个字母,并且这些字母在我们使用的编码表中被正确地sorting,所以这确保了所有的潜在差异str.charAt(i) - 'a'将低于32(intvariableschecker的大小) 。

正如Snowbear所解释的那样,我们将使用checkervariables作为一个位数组。 让我们举一个例子:

比方说str equals "test"

  • 第一遍(i = t)

检查器== 0(00000000000000000000000000000000)

 In ASCII, val = str.charAt(i) - 'a' = 116 - 97 = 19 What about 1 << val ? 1 == 00000000000000000000000000000001 1 << 19 == 00000000000010000000000000000000 checker |= (1 << val) means checker = checker | (1 << val) so checker = 00000000000000000000000000000000 | 00000000000010000000000000000000 checker == 524288 (00000000000010000000000000000000) 
  • 第二通(i = e)

检查器== 524288(00000000000010000000000000000000)

 val = 101 - 97 = 4 1 == 00000000000000000000000000000001 1 << 4 == 00000000000000000000000000010000 checker |= (1 << val) so checker = 00000000000010000000000000000000 | 00000000000000000000000000010000 checker == 524304 (00000000000010000000000000010000) 

等等..直到我们find一个已经设置的位检查特定字符的条件

 (checker & (1 << val)) > 0 

希望能帮助到你

我认为所有这些答案都解释了这是如何工作的,但是我想通过重命名一些variables,添加一些其他variables并添加注释,来提供我的看法:

 public static boolean isUniqueChars(String str) { /* checker is the bit array, it will have a 1 on the character index that has appeared before and a 0 if the character has not appeared, you can see this number initialized as 32 0 bits: 00000000 00000000 00000000 00000000 */ int checker = 0; //loop through each String character for (int i = 0; i < str.length(); ++i) { /* a through z in ASCII are charactets numbered 97 through 122, 26 characters total with this, you get a number between 0 and 25 to represent each character index 0 for 'a' and 25 for 'z' renamed 'val' as 'characterIndex' to be more descriptive */ int characterIndex = str.charAt(i) - 'a'; //char 'a' would get 0 and char 'z' would get 26 /* created a new variable to make things clearer 'singleBitOnPosition' It is used to calculate a number that represents the bit value of having that character index as a 1 and the rest as a 0, this is achieved by getting the single digit 1 and shifting it to the left as many times as the character index requires eg character 'd' 00000000 00000000 00000000 00000001 Shift 3 spaces to the left (<<) because 'd' index is number 3 1 shift: 00000000 00000000 00000000 00000010 2 shift: 00000000 00000000 00000000 00000100 3 shift: 00000000 00000000 00000000 00001000 Therefore the number representing 'd' is 00000000 00000000 00000000 00001000 */ int singleBitOnPosition = 1 << characterIndex; /* This peforms an AND between the checker, which is the bit array containing everything that has been found before and the number representing the bit that will be turned on for this particular character. eg if we have already seen 'a', 'b' and 'd', checker will have: checker = 00000000 00000000 00000000 00001011 And if we see 'b' again: 'b' = 00000000 00000000 00000000 00000010 it will do the following: 00000000 00000000 00000000 00001011 & (AND) 00000000 00000000 00000000 00000010 ----------------------------------- 00000000 00000000 00000000 00000010 Since this number is different than '0' it means that the character was seen before, because on that character index we already have a 1 bit value */ if ((checker & singleBitOnPosition) > 0) { return false; } /* Remember that checker |= singleBitOnPosition is the same as checker = checker | singleBitOnPosition Sometimes it is easier to see it expanded like that. What this achieves is that it builds the checker to have the new value it hasnt seen, by doing an OR between checker and the value representing this character index as a 1. eg If the character is 'f' and the checker has seen 'g' and 'a', the following will happen 'f' = 00000000 00000000 00000000 00100000 checker(seen 'a' and 'g' so far) = 00000000 00000000 00000000 01000001 00000000 00000000 00000000 00100000 | (OR) 00000000 00000000 00000000 01000001 ----------------------------------- 00000000 00000000 00000000 01100001 Therefore getting a new checker as 00000000 00000000 00000000 01100001 */ checker |= singleBitOnPosition; } return true; } 

上面已经提供了几个优秀的答案。 所以我不想重复已经说过的一切。 但是,我想添加几件事来帮助上面的程序,因为我刚刚通过同一个程序,并有几个问题,但花了一些时间之后,我对这个程序更清晰。

首先,“检查器”用于跟踪string中已经遍历的字符,以查看是否有任何字符正在重复。

现在“checker”是一个int数据types,所以它只能有32位或4个字节(取决于平台),所以这个程序只能在32个字符范围内的字符集正确工作。 这就是为什么这个程序从每个字符中减去“a”来使这个程序只运行小写字符。 但是,如果你混合使用小写字母和大写字母,那么它将不起作用。

顺便说一句,如果你不从每个字符中减去“a”(见下面的语句),那么这个程序将只对大写字符的string或只带有小写字符的string正常工作。 所以上述程序的范围也从小写字母增加到大写字母,但不能混合在一起。

 int val = str.charAt(i) - 'a'; 

但是我想写一个通用的程序使用位操作,应该适用于任何ASCII字符,而不必担心大写,小写,数字或任何特殊字符。 为了做到这一点,我们的“检查”应该足够大,以存储256个字符(ASCII字符集大小)。 但是Java中的int不能工作,因为它只能存储32位。 因此,在下面的程序中,我使用了JDK中可用的BitSet类,它可以在实例化BitSet对象时传递任何用户定义的大小。

这里是一个程序,做与上面使用Bitwise运算符编写的程序相同的东西,但是这个程序将适用于ASCII字符集中任何字符的string。

 public static boolean isUniqueStringUsingBitVectorClass(String s) { final int ASCII_CHARACTER_SET_SIZE = 256; final BitSet tracker = new BitSet(ASCII_CHARACTER_SET_SIZE); // if more than 256 ASCII characters then there can't be unique characters if(s.length() > 256) { return false; } //this will be used to keep the location of each character in String final BitSet charBitLocation = new BitSet(ASCII_CHARACTER_SET_SIZE); for(int i = 0; i < s.length(); i++) { int charVal = s.charAt(i); charBitLocation.set(charVal); //set the char location in BitSet //check if tracker has already bit set with the bit present in charBitLocation if(tracker.intersects(charBitLocation)) { return false; } //set the tracker with new bit from charBitLocation tracker.or(charBitLocation); charBitLocation.clear(); //clear charBitLocation to store bit for character in the next iteration of the loop } return true; } 

读伊万的答案真的帮助了我,虽然我会用略有不同的措辞。

<< in (1 << val)是一个位移运算符。 它需要1 (在二进制表示为000000001 ,与你喜欢/由内存分配尽可能多的前面的零),并通过val空格左移。 由于我们假设只有az,并且每次减去a字母,每个字母的值将是0-25,这是从checker整数的布尔表示中右边的字母索引,因为我们将向左移动1 checker val时间。

在每次检查结束时,我们看到|=运算符。 这将合并两个二进制数字,如果在该索引处的任一操作数中存在1 ,则用1replace全部0 。 在这里,这意味着无论在哪里存在(1 << val)1将被复制到checker ,而所有checker的现有1将被保留。

正如你可能猜到的,一个1在这里作为布尔标志为true。 当我们检查一个字符是否已经在string中被表示时,我们比较checker ,这个checker本质上是一个布尔标志( 1值)的数组,在已经被表示的字符的索引处,基本上是在当前字符的索引处带有1标志的布尔值数组。

&运算符完成这个检查。 与|=相似, 只有两个操作数在该索引处具有1时, &操作符才会复制1 。 所以,基本上,只有已经存在于checker中的标志(1 << val)标志将被复制。 在这种情况下,这意味着只有在当前字符已经被表示的情况下, checker & (1 << val)的结果中的任何地方都会出现1 。 如果在该操作的结果中存在1 ,则返回的布尔值的值> 0 ,该方法返回false。

这就是我猜测,为什么位向量也称为位数组 。 因为即使它们不是数组数据types,也可以使用它们类似于使用数组来存储布尔标志的方式。

让我们逐行分解代码。

int checker = 0; 我们正在启动一个检查器,这将帮助我们find重复的值。

int val = str.charAt(i) – 'a'; 我们在string的第i个位置获取字符的ASCII值,并用“a”的ASCII值减去它。 由于假定string只是低位字符,所以字符数限制为26. Hece,'val'的值总是> = 0。

如果((checker&(1 << val))> 0)返回false;

checker | =(1 << val);

现在这是棘手的部分。 让我们考虑一个string“abcda”的例子。 理想情况下,这应该返回false。

对于循环迭代1:

检查器:00000000000000000000000000000000

val:97-97 = 0

1 << 0:00000000000000000000000000000001

检查器&(1 << val):00000000000000000000000000000000不是> 0

因此检查器:00000000000000000000000000000001

对于循环迭代2:

检查器:00000000000000000000000000000001

val:98-97 = 1

1 << 0:00000000000000000000000000000010

检查器&(1 << val):00000000000000000000000000000000不是> 0

因此检查器:00000000000000000000000000000011

对于循环迭代3:

检查器:00000000000000000000000000000011

val:99-97 = 0

1 << 0:00000000000000000000000000000100

检查器&(1 << val):00000000000000000000000000000000不是> 0

因此检查器:00000000000000000000000000000111

对于循环迭代4:

检查器:00000000000000000000000000000111

val:100-97 = 0

1 << 0:00000000000000000000000000001000

检查器&(1 << val):00000000000000000000000000000000不是> 0

因此检查器:00000000000000000000000000001111

对于循环迭代5:

检查器:00000000000000000000000000001111

val:97-97 = 0

1 << 0:00000000000000000000000000000001

检查器&(1 << val):00000000000000000000000000000001> 0

因此返回false。

简单的解释(下面的JS代码)

  • 每个机器码的整数variables是一个32位的数组
  • 所有位操作都是32-bit
  • 他们不知道OS / CPU架构或select的语言编号系统,例如JS的DEC64
  • 这种重复查找方法类似于将字符存储在大小为32的数组中,其中,如果我们在string中finda ,则设置0th索引。
  • string中的重复字符将占用相应的位,或者在此情况下设置为1。
  • 伊万已经解释过 :这个指数计算如何在这个前面的答案中起作用 。

运营总结:

  • 执行checker和字符index之间的操作
  • 内部都是Int-32-Arrays
  • 这两者之间是一个比较明智的操作。
  • 检查操作的输出if1
  • 如果output == 1
    • checkervariables具有在两个数组中都设置的特定索引位
    • 因此它是重复的。
  • 如果output == 0
    • 这个angular色到目前为止还没有find
    • 在字符的checkerindex之间执行OR操作
    • 由此,将索引位更新为1
    • 将输出分配给checker

假设:

  • 我们假设我们会得到所有的小写字母
  • 而且,这个尺寸32就足够了
  • 因此,我们开始从96指数作为参考点考虑ascii代码为97

下面给出的是JavaScript源代码。

 function checkIfUniqueChars (str) { var checker = 0; // 32 or 64 bit integer variable for (var i = 0; i< str.length; i++) { var index = str[i].charCodeAt(0) - 96; var bitRepresentationOfIndex = 1 << index; if ( (checker & bitRepresentationOfIndex) > 1) { console.log(str, false); return false; } else { checker = (checker | bitRepresentationOfIndex); } } console.log(str, true); return true; } checkIfUniqueChars("abcdefghi"); // true checkIfUniqueChars("aabcdefghi"); // false checkIfUniqueChars("abbcdefghi"); // false checkIfUniqueChars("abcdefghii"); // false checkIfUniqueChars("abcdefghii"); // false 

请注意 ,在JS中,尽pipe整数是64位,但总是在32位上进行位操作。

例如:如果string是aa则:

 // checker is intialized to 32-bit-Int(0) // therefore, checker is checker= 00000000000000000000000000000000 

我= 0

 str[0] is 'a' str[i].charCodeAt(0) - 96 = 1 checker 'AND' 32-bit-Int(1) = 00000000000000000000000000000000 Boolean(0) == false // So, we go for the '`OR`' operation. checker = checker OR 32-bit-Int(1) checker = 00000000000000000000000000000001 

我= 1

 str[1] is 'a' str[i].charCodeAt(0) - 96 = 1 checker= 00000000000000000000000000000001 a = 00000000000000000000000000000001 checker 'AND' 32-bit-Int(1) = 00000000000000000000000000000001 Boolean(1) == true // We've our duplicate now 
 public static void main (String[] args) { //In order to understand this algorithm, it is necessary to understand the following: //int checker = 0; //Here we are using the primitive int almost like an array of size 32 where the only values can be 1 or 0 //Since in Java, we have 4 bytes per int, 8 bits per byte, we have a total of 4x8=32 bits to work with //int val = str.charAt(i) - 'a'; //In order to understand what is going on here, we must realize that all characters have a numeric value for (int i = 0; i < 256; i++) { char val = (char)i; System.out.print(val); } //The output is something like: // !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ //There seems to be ~15 leading spaces that do not copy paste well, so I had to use real spaces instead //To only print the characters from 'a' on forward: System.out.println(); System.out.println(); for (int i=0; i < 256; i++) { char val = (char)i; //char val2 = val + 'a'; //incompatible types. required: char found: int int val2 = val + 'a'; //shift to the 'a', we must use an int here otherwise the compiler will complain char val3 = (char)val2; //convert back to char. there should be a more elegant way of doing this. System.out.print(val3); } //Notice how the following does not work: System.out.println(); System.out.println(); for (int i=0; i < 256; i++) { char val = (char)i; int val2 = val - 'a'; char val3 = (char)val2; System.out.print(val3); } //I'm not sure why this spills out into 2 lines: //EDIT I cant seem to copy this into stackoverflow! System.out.println(); System.out.println(); //So back to our original algorithm: //int val = str.charAt(i) - 'a'; //We convert the i'th character of the String to a character, and shift it to the right, since adding shifts to the right and subtracting shifts to the left it seems //if ((checker & (1 << val)) > 0) return false; //This line is quite a mouthful, lets break it down: System.out.println(0<<0); //00000000000000000000000000000000 System.out.println(0<<1); //00000000000000000000000000000000 System.out.println(0<<2); //00000000000000000000000000000000 System.out.println(0<<3); //00000000000000000000000000000000 System.out.println(1<<0); //00000000000000000000000000000001 System.out.println(1<<1); //00000000000000000000000000000010 == 2 System.out.println(1<<2); //00000000000000000000000000000100 == 4 System.out.println(1<<3); //00000000000000000000000000001000 == 8 System.out.println(2<<0); //00000000000000000000000000000010 == 2 System.out.println(2<<1); //00000000000000000000000000000100 == 4 System.out.println(2<<2); // == 8 System.out.println(2<<3); // == 16 System.out.println("3<<0 == "+(3<<0)); // != 4 why 3??? System.out.println(3<<1); //00000000000000000000000000000011 == 3 //shift left by 1 //00000000000000000000000000000110 == 6 System.out.println(3<<2); //00000000000000000000000000000011 == 3 //shift left by 2 //00000000000000000000000000001100 == 12 System.out.println(3<<3); // 24 //It seems that the - 'a' is not necessary //Back to if ((checker & (1 << val)) > 0) return false; //(1 << val means we simply shift 1 by the numeric representation of the current character //the bitwise & works as such: System.out.println(); System.out.println(); System.out.println(0&0); //0 System.out.println(0&1); //0 System.out.println(0&2); //0 System.out.println(); System.out.println(); System.out.println(1&0); //0 System.out.println(1&1); //1 System.out.println(1&2); //0 System.out.println(1&3); //1 System.out.println(); System.out.println(); System.out.println(2&0); //0 System.out.println(2&1); //0 0010 & 0001 == 0000 = 0 System.out.println(2&2); //2 0010 & 0010 == 2 System.out.println(2&3); //2 0010 & 0011 = 0010 == 2 System.out.println(); System.out.println(); System.out.println(3&0); //0 0011 & 0000 == 0 System.out.println(3&1); //1 0011 & 0001 == 0001 == 1 System.out.println(3&2); //2 0011 & 0010 == 0010 == 2, 0&1 = 0 1&1 = 1 System.out.println(3&3); //3 why?? 3 == 0011 & 0011 == 3??? System.out.println(9&11); // should be... 1001 & 1011 == 1001 == 8+1 == 9?? yay! //so when we do (1 << val), we take 0001 and shift it by say, 97 for 'a', since any 'a' is also 97 //why is it that the result of bitwise & is > 0 means its a dupe? //lets see.. //0011 & 0011 is 0011 means its a dupe //0000 & 0011 is 0000 means no dupe //0010 & 0001 is 0011 means its no dupe //hmm //only when it is all 0000 means its no dupe //so moving on: //checker |= (1 << val) //the |= needs exploring: int x = 0; int y = 1; int z = 2; int a = 3; int b = 4; System.out.println("x|=1 "+(x|=1)); //1 System.out.println(x|=1); //1 System.out.println(x|=1); //1 System.out.println(x|=1); //1 System.out.println(x|=1); //1 System.out.println(y|=1); // 0001 |= 0001 == ?? 1???? System.out.println(y|=2); // ??? == 3 why??? 0001 |= 0010 == 3... hmm System.out.println(y); //should be 3?? System.out.println(y|=1); //already 3 so... 0011 |= 0001... maybe 0011 again? 3? System.out.println(y|=2); //0011 |= 0010..... hmm maybe.. 0011??? still 3? yup! System.out.println(y|=3); //0011 |= 0011, still 3 System.out.println(y|=4); //0011 |= 0100.. should be... 0111? so... 11? no its 7 System.out.println(y|=5); //so we're at 7 which is 0111, 0111 |= 0101 means 0111 still 7 System.out.println(b|=9); //so 0100 |= 1001 is... seems like xor?? or just or i think, just or... so its 1101 so its 13? YAY! //so the |= is just a bitwise OR! } public static boolean isUniqueChars(String str) { int checker = 0; for (int i = 0; i < str.length(); ++i) { int val = str.charAt(i) - 'a'; //the - 'a' is just smoke and mirrors! not necessary! if ((checker & (1 << val)) > 0) return false; checker |= (1 << val); } return true; } public static boolean is_unique(String input) { int using_int_as_32_flags = 0; for (int i=0; i < input.length(); i++) { int numeric_representation_of_char_at_i = input.charAt(i); int using_0001_and_shifting_it_by_the_numeric_representation = 1 << numeric_representation_of_char_at_i; //here we shift the bitwise representation of 1 by the numeric val of the character int result_of_bitwise_and = using_int_as_32_flags & using_0001_and_shifting_it_by_the_numeric_representation; boolean already_bit_flagged = result_of_bitwise_and > 0; //needs clarification why is it that the result of bitwise & is > 0 means its a dupe? if (already_bit_flagged) return false; using_int_as_32_flags |= using_0001_and_shifting_it_by_the_numeric_representation; } return true; }