Base 62 conversion

How would you convert an integer to base 62 (like hexadecimal, but with these digits: '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ')?

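I have been trying to find a good Python library for it, but they all seem to be occupied with converting strings. The Python base64 module only accepts strings and turns a single digit into four characters. I was looking for something akin to what URL shorteners use.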

There is no standard module for this, but I have written my own functions to achieve that.

    BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

    def encode(num, alphabet=BASE62):
        """Encode a positive number in Base X

        Arguments:
        - `num`: The number to encode
        - `alphabet`: The alphabet to use for encoding
        """
        if num == 0:
            return alphabet[0]
        arr = []
        base = len(alphabet)
        while num:
            num, rem = divmod(num, base)
            arr.append(alphabet[rem])
        arr.reverse()
        return ''.join(arr)

    def decode(string, alphabet=BASE62):
        """Decode a Base X encoded string into the number

        Arguments:
        - `string`: The encoded string
        - `alphabet`: The alphabet to use for encoding
        """
        base = len(alphabet)
        strlen = len(string)
        num = 0

        idx = 0
        for char in string:
            power = (strlen - (idx + 1))
            num += alphabet.index(char) * (base ** power)
            idx += 1

        return num

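Note that you can give it any alphabet to use for encoding and decoding. If you leave the alphabet argument out, you get the 62-character alphabet defined on the first line of code, and hence encoding/decoding to/from base 62.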
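The alphabet parameter is what makes these functions general. Here is a minimal sketch that restates the same idea, with the decode loop rewritten using Horner's rule instead of explicit powers. Passing a 16-character or 2-character alphabet should reproduce Python's own hexadecimal and binary formatting:

```python
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
HEX = "0123456789abcdef"
BIN = "01"

def encode(num, alphabet=BASE62):
    """Encode a non-negative integer; the base is len(alphabet)."""
    if num == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while num:
        num, rem = divmod(num, base)
        digits.append(alphabet[rem])
    return ''.join(reversed(digits))

def decode(string, alphabet=BASE62):
    """Decode a string produced by encode() with the same alphabet."""
    base = len(alphabet)
    num = 0
    for char in string:
        # Horner's rule: shift one digit to the left, then add the new digit
        num = num * base + alphabet.index(char)
    return num

print(encode(255, HEX), encode(5, BIN), encode(3843))  # ff 101 ZZ
```

Horner's rule avoids recomputing base ** power for every character. Several of the later answers use the same trick.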

Hope this helps.

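PS: For URL shorteners, I have found that it's better to leave out a few confusing characters like 0Ol1oI etc. Thus I use this alphabet for my URL shortening needs: "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"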
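Rather than typing the reduced alphabet by hand, you can derive it by filtering the ambiguous characters out of the base-62 alphabet. This is a minimal sketch, assuming 0Ol1oI is the full set of characters to drop. With that set, the result is exactly the 56-character alphabet quoted above:

```python
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
AMBIGUOUS = set("0Ol1oI")  # characters that are easy to misread

# Filtering keeps the original digits/lowercase/uppercase order
SAFE_ALPHABET = ''.join(c for c in BASE62 if c not in AMBIGUOUS)

def encode(num, alphabet=SAFE_ALPHABET):
    """Encode a non-negative integer; the base is len(alphabet)."""
    if num == 0:
        return alphabet[0]
    base = len(alphabet)
    out = []
    while num:
        num, rem = divmod(num, base)
        out.append(alphabet[rem])
    return ''.join(reversed(out))

print(len(SAFE_ALPHABET))  # 56
print(SAFE_ALPHABET)
```

Any alphabet works as long as it contains no duplicate characters. The shorter alphabet only makes the IDs slightly longer (base 56 instead of base 62).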

Have fun.

I once wrote a script to do this as well; I think it's quite elegant :)

    import string

    # 10 digits + 52 ASCII letters + 2 extra symbols = base 64
    BASE_LIST = string.digits + string.ascii_letters + '_@'
    BASE_DICT = dict((c, i) for i, c in enumerate(BASE_LIST))

    def base_decode(string, reverse_base=BASE_DICT):
        length = len(reverse_base)
        ret = 0
        for i, c in enumerate(string[::-1]):
            ret += (length ** i) * reverse_base[c]

        return ret

    def base_encode(integer, base=BASE_LIST):
        if integer == 0:
            return base[0]

        length = len(base)
        ret = ''
        while integer != 0:
            ret = base[integer % length] + ret
            integer //= length  # floor division; "/" would give a float in Python 3

        return ret

Example usage:

    for i in range(100):
        print(i, base_decode(base_encode(i)), base_encode(i))

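The following decoder-maker works with any reasonable base, has a much tidier loop, and gives an explicit error message when it meets an invalid character.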

    def base_n_decoder(alphabet):
        """Return a decoder for a base-n encoded string

        Argument:
        - `alphabet`: The alphabet used for encoding
        """
        base = len(alphabet)
        char_value = dict(((c, v) for v, c in enumerate(alphabet)))
        def f(string):
            num = 0
            try:
                for char in string:
                    num = num * base + char_value[char]
            except KeyError:
                raise ValueError('Unexpected character %r' % char)
            return num
        return f

    if __name__ == "__main__":
        func = base_n_decoder('0123456789abcdef')
        for test in ('0', 'f', '2020', 'ffff', 'abqdef'):
            print(test)
            print(func(test))  # 'abqdef' raises ValueError on 'q'
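The same closure pattern also works in the encoding direction. The base_n_encoder below is a hypothetical counterpart (it is not part of the answer above). It is shown next to a copy of the decoder so that the block runs on its own:

```python
def base_n_decoder(alphabet):
    """Return a decoder for a base-n encoded string (as in the answer above)."""
    base = len(alphabet)
    char_value = {c: v for v, c in enumerate(alphabet)}
    def f(string):
        num = 0
        try:
            for char in string:
                num = num * base + char_value[char]
        except KeyError:
            raise ValueError('Unexpected character %r' % char)
        return num
    return f

def base_n_encoder(alphabet):
    """Hypothetical counterpart: return an encoder closure for the alphabet."""
    base = len(alphabet)
    def f(num):
        if num < 0:
            raise ValueError('num must be non-negative')
        if num == 0:
            return alphabet[0]
        digits = []
        while num:
            num, rem = divmod(num, base)
            digits.append(alphabet[rem])
        return ''.join(reversed(digits))
    return f

B62 = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
enc62, dec62 = base_n_encoder(B62), base_n_decoder(B62)
print(enc62(2347878234))  # 2yTsnM
```

Building the lookup table once, inside the factory, means each returned function only does dictionary lookups and arithmetic.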

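If you're looking for the highest efficiency (like Django), you'll want something like the following. This code is a combination of efficient methods from Baishampayan Ghose, WoLpH and John Machin.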

    # Edit this list of characters as desired.
    BASE_ALPH = tuple("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
    BASE_DICT = dict((c, v) for v, c in enumerate(BASE_ALPH))
    BASE_LEN = len(BASE_ALPH)

    def base_decode(string):
        num = 0
        for char in string:
            num = num * BASE_LEN + BASE_DICT[char]
        return num

    def base_encode(num):
        if not num:
            return BASE_ALPH[0]

        encoding = ""
        while num:
            num, rem = divmod(num, BASE_LEN)
            encoding = BASE_ALPH[rem] + encoding
        return encoding

You may also want to precompute your dictionary. (Note: encoding with a string proved more efficient than with a list, even for very long numbers.)
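Precomputing the dictionary means each character costs one hash lookup instead of a linear str.index scan. Here is a small benchmark sketch. Absolute numbers depend on the machine and the Python version, and the gap depends on where the characters sit in the alphabet, so measure before relying on it:

```python
import timeit

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)
# Built once at import time, not on every call
CHAR_VALUE = {c: v for v, c in enumerate(ALPHABET)}

def decode_with_index(s):
    num = 0
    for char in s:
        num = num * BASE + ALPHABET.index(char)  # linear scan per character
    return num

def decode_with_dict(s):
    num = 0
    for char in s:
        num = num * BASE + CHAR_VALUE[char]  # hash lookup per character
    return num

sample = "zzZZ9aQ" * 4
for fn in (decode_with_index, decode_with_dict):
    t = timeit.timeit(lambda: fn(sample), number=20_000)
    print(f"{fn.__name__}: {t:.3f} s")
```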

    >>> timeit.timeit("for i in xrange(1000000): base.base_decode(base.base_encode(i))", setup="import base", number=1)
    2.3302059173583984

This encoded and decoded 1 million numbers in under 2.5 seconds (on a 2.2 GHz i7-2670QM).

You probably want base64, not base62. There's a URL-compatible version of it, so the two extra filler characters shouldn't be a problem.

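The process is fairly simple: a base64 character represents 6 bits, while a regular byte represents 8. Assign a value from 000000 to 111111 to each of the 64 characters chosen, and put 4 such values together to match each set of 3 base-256 bytes. Repeat for each set of 3 bytes, padding the end with your choice of padding character (0 is generally useful).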
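You can write out the 6-bit regrouping described above directly. This sketch uses the standard base64 alphabet and '=' as the padding character (instead of the 0 suggested above), so that its output can be compared with Python's base64.b64encode. The name b64_by_hand is illustrative:

```python
import base64

B64 = ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
       "0123456789+/")

def b64_by_hand(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        pad = 3 - len(chunk)              # 0, 1 or 2 missing bytes
        chunk += b"\0" * pad              # zero-fill the last group
        n = int.from_bytes(chunk, "big")  # 3 bytes = 24 bits
        # Split the 24 bits into four 6-bit values (000000..111111)
        sextets = [(n >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
        chars = [B64[s] for s in sextets]
        if pad:
            chars[-pad:] = "=" * pad      # mark the filler positions
        out.append("".join(chars))
    return "".join(out)

print(b64_by_hand(b"Man"))  # TWFu
```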

I have a Python library for doing exactly that here: http://www.djangosnippets.org/snippets/1431/

If all you need is to generate a short ID (since you mention URL shorteners) rather than to encode/decode something, this module might help:
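The shortuuid module has its own API, which is not reproduced here. As a minimal stdlib-only sketch of the same idea, you can draw a random ID from the base-62 alphabet with the secrets module. The function name short_id and its default length of 8 are assumptions for illustration:

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters

def short_id(length=8):
    """Random base-62 ID; callers must still check for collisions."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(short_id())  # a random 8-character string, different on every run
```

With 8 characters there are 62**8 (about 2.2e14) possible IDs. Random IDs can still collide, so a real shortener would check existing IDs before saving a new one.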

https://github.com/stochastic-technologies/shortuuid/

You can download the zbase62 module from PyPI. For example:

    >>> import zbase62
    >>> zbase62.b2a("abcd")
    '1mZPsa'

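I have benefited greatly from others' posts here. I originally needed the Python code for a Django project, but I have since moved to node.js, so here's a JavaScript version of the code (the encoding part) that Baishampayan Ghose provided.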

    var ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Numbers are doubles, so results are exact only up to Number.MAX_SAFE_INTEGER
    function base62_encode(n, alpha) {
      var num = n || 0;
      var alphabet = alpha || ALPHABET;

      if (num == 0) return alphabet[0];
      var arr = [];
      var base = alphabet.length;

      while (num) {
        var rem = num % base;  // declared with var so it doesn't leak as a global
        num = (num - rem) / base;
        arr.push(alphabet.substring(rem, rem + 1));
      }

      return arr.reverse().join('');
    }

    console.log(base62_encode(2390687438976, "123456789ABCDEFGHIJKLMNPQRSTUVWXYZ"));

I hope the following snippet can help.

    def num2sym(num, sym, join_symbol=''):
        # Check the type first so that non-numbers don't fail on the comparison
        if not isinstance(num, int) or num < 0:
            raise ValueError('num must be a non-negative integer')
        if num == 0:
            return sym[0]

        l = len(sym)  # target number base
        r = []
        div = num
        while div != 0:  # base conversion
            div, mod = divmod(div, l)
            r.append(sym[mod])

        return join_symbol.join([x for x in reversed(r)])

Usage for your case:

    number = 367891
    alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    print(num2sym(number, alphabet))  # will print '1xHJ'

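Obviously, you can specify another alphabet consisting of fewer or more symbols, and it will then convert your number to a smaller or larger number base. For example, providing '01' as the alphabet will output a string representing the input number in binary.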

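You could shuffle the alphabet first to get your own unique representation of the numbers. This can be helpful if you're building a URL shortener service.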
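Here is a minimal sketch of a shuffled alphabet, assuming a fixed seed with a private random.Random instance. The seed value and the helper names are hypothetical. It is safest to generate the alphabet once and store the resulting string, rather than regenerating it on each deployment. Note that this is obfuscation, not security: consecutive numbers still share all but their last character.

```python
import random

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def shuffled_alphabet(seed):
    """Return a fixed permutation of BASE62 for the given seed."""
    chars = list(BASE62)
    random.Random(seed).shuffle(chars)  # private RNG: same seed, same order
    return "".join(chars)

def encode(num, alphabet):
    if num == 0:
        return alphabet[0]
    base = len(alphabet)
    out = []
    while num:
        num, rem = divmod(num, base)
        out.append(alphabet[rem])
    return "".join(reversed(out))

def decode(s, alphabet):
    lookup = {c: v for v, c in enumerate(alphabet)}
    num = 0
    for ch in s:
        num = num * len(alphabet) + lookup[ch]
    return num

# Hypothetical seed: generate once, then keep the resulting alphabet string
MY_ALPHABET = shuffled_alphabet(seed=12345)
print(MY_ALPHABET)
print([encode(n, MY_ALPHABET) for n in range(1000, 1004)])
```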

Personally, I like Baishampayan's solution, mostly because it strips out the confusing characters.

For completeness and better performance, this post shows a way to use the Python base64 module.

I wrote this a while back and it has worked pretty well (negatives and all included).

    def code(number, base):
        try:
            int(number), int(base)
        except ValueError:
            raise ValueError('code(number,base): number and base must be in base10')
        else:
            number, base = int(number), int(base)
        # Clamp the base to the supported range 2..62
        if base < 2:
            base = 2
        if base > 62:
            base = 62
        numbers = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
        if number == 0:
            return "0"  # the digit loop below would otherwise return ""
        final = ""
        if number < 0:
            final = "-"
            number = abs(number)
        # Count how many digits are needed
        loc = 0
        while base**loc <= number:
            loc = loc + 1
        # From the highest power down, pick the largest digit that still fits
        for x in range(loc - 1, -1, -1):
            for y in range(base - 1, -1, -1):
                if y * (base**x) <= number:
                    final = "{}{}".format(final, numbers[y])
                    number = number - y * (base**x)
                    break
        return final

    def decode(number, base):
        try:
            int(base)
        except ValueError:
            raise ValueError('decode(value,base): base must be in base10')
        else:
            base = int(base)
        number = str(number)
        if base < 2:
            base = 2
        if base > 62:
            base = 62
        numbers = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
        final = 0
        neg = number.startswith("-")
        if neg:
            number = number[1:]  # strip the sign before decoding the digits
        loc = len(number) - 1
        for x in number:
            # A valid digit must be strictly smaller than the base
            if numbers.index(x) >= base:
                raise ValueError('{} is out of base{} range'.format(x, str(base)))
            final = final + (numbers.index(x) * (base**loc))
            loc = loc - 1
        if neg:
            return -final
        else:
            return final

Sorry about the length of all this.

    BASE_LIST = tuple("23456789ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz")
    BASE_DICT = dict((c, v) for v, c in enumerate(BASE_LIST))
    BASE_LEN = len(BASE_LIST)

    def nice_decode(s):
        num = 0
        # nice_encode emits the least significant digit first, so read in reverse
        for char in s[::-1]:
            num = num * BASE_LEN + BASE_DICT[char]
        return num

    def nice_encode(num):
        if not num:
            return BASE_LIST[0]

        encoding = ""
        while num:
            num, rem = divmod(num, BASE_LEN)
            encoding += BASE_LIST[rem]  # appends, so the output is least significant first
        return encoding

Here are a recursive and an iterative way to do it. The iterative one is a little faster, depending on the number of executions.

    def base62_encode_r(dec):
        s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
        return s[dec] if dec < 62 else base62_encode_r(dec // 62) + s[dec % 62]
    print(base62_encode_r(2347878234))

    def base62_encode_i(dec):
        s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
        if dec == 0:
            return s[0]  # the loop below would return '' for 0
        ret = ''
        while dec > 0:
            ret = s[dec % 62] + ret
            dec //= 62
        return ret
    print(base62_encode_i(2347878234))

    def base62_decode_r(b62):
        s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
        if len(b62) == 1:
            return s.index(b62)
        x = base62_decode_r(b62[:-1]) * 62 + s.index(b62[-1:])
        return x
    print(base62_decode_r("2yTsnM"))

    def base62_decode_i(b62):
        s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
        ret = 0
        for i in range(len(b62) - 1, -1, -1):
            ret = ret + s.index(b62[i]) * (62 ** (len(b62) - i - 1))
        return ret
    print(base62_decode_i("2yTsnM"))

    if __name__ == '__main__':
        import timeit
        print(timeit.timeit(stmt="base62_encode_r(2347878234)", setup="from __main__ import base62_encode_r", number=100000))
        print(timeit.timeit(stmt="base62_encode_i(2347878234)", setup="from __main__ import base62_encode_i", number=100000))
        print(timeit.timeit(stmt="base62_decode_r('2yTsnM')", setup="from __main__ import base62_decode_r", number=100000))
        print(timeit.timeit(stmt="base62_decode_i('2yTsnM')", setup="from __main__ import base62_decode_i", number=100000))

Timing results (seconds per 100,000 calls):

    0.270266867033
    0.260915645986
    0.344734796766
    0.311662500262

There is now a Python library for this.

I'm working on making a pip package for this.

I recommend you use my bases.py, https://github.com/kamijoutouma/bases.py, which was inspired by bases.js.

    from bases import Bases
    bases = Bases()

    bases.toBase16(200)                   # => 'c8'
    bases.toBase(200, 16)                 # => 'c8'
    bases.toBase62(99999)                 # => 'q0T'
    bases.toBase(99999, 62)               # => 'q0T'
    bases.toAlphabet(300, 'aAbBcC')       # => 'Abba'

    bases.fromBase16('c8')                # => 200
    bases.fromBase('c8', 16)              # => 200
    bases.fromBase62('q0T')               # => 99999
    bases.fromBase('q0T', 62)             # => 99999
    bases.fromAlphabet('Abba', 'aAbBcC')  # => 300

Refer to https://github.com/kamijoutouma/bases.py#known-basesalphabets for which bases are usable.

Here's my solution:

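    def base62(a):
        digits = ('0123456789abcdefghijklmnopqrstuvwxyz'
                  'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
        # Each call isolates one digit term with a % b, maps it to a digit via % 61,
        # then recurses on the remaining number with the next power of 62
        baseit = (lambda a=a, b=62: '' if not a else
                  baseit(a - a % b, b * 62) + digits[a % b % 61 or -1 * bool(a % b)])
        # The recursion ends with '', so 0 needs its own digit
        return baseit() or '0'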

Explanation

In any base, every number is equal to a0 + a1*base + a2*base**2 + a3*base**3 + ..., so the goal is to find all the a's.

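For every N = 0, 1, 2, ..., the code isolates the term aN*base**N by taking the number modulo b = base**(N+1), which slices off every term with an index greater than N. The terms with an index smaller than N are already gone, because each recursive call subtracts the term it has just isolated (a - a % b) before passing the rest on with the next power of the base.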

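Since base % (base-1) == 1, it follows that base**p % (base-1) == 1, and therefore q*base**p % (base-1) == q. There is only one exception: when q == base-1, the result is 0. To handle that case, the code uses -1 instead, which indexes the last character of the alphabet. A digit that really is 0 gives -1*bool(0) == 0, so it still maps to '0'. The case where the whole number is 0 is checked at the start of the function.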
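You can check the modular identity mechanically. This sketch pulls the digit-recovery expression out of base62 into a hypothetical helper, digit_index, and verifies that every term q*62**p maps back to digit q, including the q == 61 exception:

```python
DIGITS = ('0123456789abcdefghijklmnopqrstuvwxyz'
          'ABCDEFGHIJKLMNOPQRSTUVWXYZ')
BASE = len(DIGITS)  # 62

def digit_index(term):
    """Index into DIGITS for an isolated term q * 62**p (hypothetical helper)."""
    # 62 % 61 == 1, so q * 62**p % 61 == q for q <= 60; q == 61 gives 0,
    # and a nonzero term with remainder 0 is redirected to index -1 ('Z')
    return term % (BASE - 1) or -1 * bool(term)

# Check every digit at the first few powers of 62
for p in range(5):
    for q in range(BASE):
        assert DIGITS[digit_index(q * BASE ** p)] == DIGITS[q]
print(DIGITS[digit_index(61 * 62 ** 3)])  # Z
```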


Advantages

In this sample there's only one multiplication (instead of a division) and a few modulus operations, which are all relatively fast.

Sorry, I can't help you with a library here. I would prefer using base64 and just adding extra characters of your choice, if possible!

Then you can use the base64 module.
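To use the base64 module with integers, you first have to turn the number into bytes. This is a minimal sketch, assuming big-endian byte order and that stripping the '=' padding is acceptable in URLs. The names int_to_b64 and b64_to_int are illustrative:

```python
import base64

def int_to_b64(num):
    """Encode a non-negative int as URL-safe base64 without '=' padding."""
    if num < 0:
        raise ValueError("num must be non-negative")
    # Minimal big-endian byte string; 0 still needs one byte
    raw = num.to_bytes(max(1, (num.bit_length() + 7) // 8), "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def b64_to_int(text):
    """Inverse of int_to_b64: restore the padding, then decode."""
    padded = text + "=" * (-len(text) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")

print(int_to_b64(255))  # _w
```

The number is padded to whole bytes first, so the output can be one character longer than a true base-64 digit string for the same number.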

If that's really, really not possible:

You can do it yourself like this:

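    num = 12345  # example input
    myBase = 62
    base62vals = []  # digit values 0..61, most significant first
    while num > 0:
        remainder = num % myBase
        num //= myBase  # integer division; "/" would give a float in Python 3
        base62vals.insert(0, remainder)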
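To finish the job, the digit values still need to be mapped to characters, and the resulting string needs a way back to an integer. This is a minimal sketch, assuming the 0-9a-zA-Z alphabet from the question. The names to_digit_values, values_to_str and str_to_int are illustrative:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_digit_values(num, my_base=62):
    """The same loop as above, wrapped in a function; 0 becomes [0]."""
    vals = []
    while num > 0:
        num, remainder = divmod(num, my_base)
        vals.insert(0, remainder)  # prepend: most significant digit first
    return vals or [0]

def values_to_str(vals):
    # Each digit value is an index into the alphabet
    return "".join(ALPHABET[v] for v in vals)

def str_to_int(s):
    num = 0
    for ch in s:
        num = num * 62 + ALPHABET.index(ch)
    return num

encoded = values_to_str(to_digit_values(2347878234))
print(encoded)  # 2yTsnM
```

Prepending with insert(0, ...) is fine for the handful of digits a short URL needs. For very long numbers, appending and reversing once at the end avoids shifting the list on every step.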