使用PYODBC将BitTorrent位域插入MSSQL中的VarBinary(MAX)
我正在做一个与BitTorrent有关的项目,在这个项目中,我收到了一个以Python字符串形式表示的位图。比如说:
bitfield = "000001110100111000110101100010"
我想把这个Python字符串转换成一种格式,这样就可以直接插入到MSSQL数据库的varbinary(max)列中,使用PYODBC来操作。如果我直接把它作为字符串插入,当然会出现非法转换的错误。
根据PYODBC的文档,它需要一个字节数组或缓冲区作为varbinary字段的输入。
如果有任何建议,我会非常感激。
2 个回答
在开始写代码之前,我想先给大家一个建议:'bitfield'的值并不是可以直接用字节来划分的长度。我建议在处理位串的时候,最好以字节为单位来增加它的大小(比如说,如果len(bitfield)%8 != 0: print '确保bitfield可以完全用字节表示!'),这样可以避免在不同的编程语言、编程语言中的不同库以及不同的数据库中对字段操作时产生歧义。换句话说,数据库、Python以及我推荐的库等,都会以字节数组的形式来存储或表示这个位数组。如果提供的位数组不能均匀地划分成字节,会发生以下三种情况之一: 1) 会抛出一个错误(这是比较乐观的情况) 2) 位数组会自动在左边填充。 3) 位数组会自动在右边填充。
我建议使用某种位串库。我曾经使用过python-bitstring来实现这个功能。我没有花时间去处理ODBC,但基本思路是一样的,并且参考了srgerg的回答:
示例:
#!/usr/bin/python
import pymssql
from binascii import hexlify
from bitstring import BitArray
dbconninfo = {'host': 'hostname', 'user': 'username', 'password': 'secret', 'database': 'bitexample', 'as_dict': True}
conn = pymssql.connect(**dbconninfo)
cursor = conn.cursor()
bitfield = "000001110100111000110101100010"
ba = BitArray(bin=bitfield)
print '%32d (bitfield -> BitArray -> int)' % ba.int
cursor.execute("CREATE TABLE bin_test (bin_col varbinary(max) )")
cursor.execute("INSERT INTO bin_test values (%s)", (ba.int,))
cursor.execute("SELECT bin_col FROM bin_test")
results = cursor.fetchone()['bin_col'] # results now contains binary packed data '\x01\xd3\x8db'
conn.rollback()
results_int = int(hexlify(results),16)
print '%32d (bitfield -> BitArray -> int -> DB (where data is binary packed) -> unpacked with hexlify -> int)' % results_int
print '%32s (Original bitfield)' % bitfield
from_db_using_ba_hexlify_and_int_with_length = BitArray(int=int(hexlify(results),16), length=30).bin
print '%32s (From DB, decoded with hexlify, using int to instantiate BitArray, specifying length of int as 30 bits, out as bin)' %
from_db_using_ba_hexlify_and_int_with_length
from_db_using_ba_hex = BitArray(hex=hexlify(results)).bin # Can't specify length with hex
print '%32s (From DB, decoded with hexlify, using hex to instantiate BitArray, can not specify length, out as bin)' % from_db_using_ba_hex
from_db_using_ba_bytes_no_length = BitArray(bytes=results).bin # Can specify length with bytes... that's next.
print '%32s (From DB, using bytes to instantiate BitArray, no length specified, out as bin)' % from_db_using_ba_bytes_no_length
from_db_using_ba_bytes = BitArray(bytes=results,length=30).bin
print '%32s (From DB, using bytes to instantiate BitArray, specifying length of bytes as 30 bits, out as bin)' % from_db_using_ba_bytes
from_db_using_hexlify_bin = bin(int(hexlify(results),16))
print '%32s (from DB, decoded with hexlify -> int -> bin)' % from_db_using_hexlify_bin
from_db_using_hexlify_bin_ba = BitArray(bin=bin(int(hexlify(results),16))).bin
print '%32s (from DB, decoded with hexlify -> int -> bin -> BitArray instantiated with bin)' % from_db_using_hexlify_bin
from_db_using_bin = bin(int(results,16))
print '%32s (from DB, no decoding done, using bin)' % from_db_using_bin
这个的输出是:
30641506 (bitfield -> BitArray -> int)
30641506 (bitfield -> BitArray -> int -> DB (where data is binary packed) -> unpacked with hexlify -> int)
000001110100111000110101100010 (Original bitfield)
000001110100111000110101100010 (From DB, decoded with hexlify, using int to instantiate BitArray, specifying length of int as 30 bits, out as bin)
00000001110100111000110101100010 (From DB, decoded with hexlify, using hex to instantiate BitArray, can not specify length, out as bin)
00000001110100111000110101100010 (From DB, using bytes to instantiate BitArray, no length specified, out as bin)
000000011101001110001101011000 (From DB, using bytes to instantiate BitArray, specifying length of bytes as 30 bits, out as bin)
0b1110100111000110101100010 (from DB, decoded with hexlify -> int -> bin)
0b1110100111000110101100010 (from DB, decoded with hexlify -> int -> bin -> BitArray instantiated with bin)
Traceback (most recent call last):
File "./bitexample.py", line 38, in <module>
from_db_using_bin = bin(int(results,16))
ValueError: invalid literal for int() with base 16: '\x01\xd3\x8db'
注意,由于你没有一个可以直接拆分成字节的位串(它是一个表示30位的字符串),所以获取完全相同的字符串的唯一方法是指定一个长度,即使这样,结果也会因为BitArray的实例化方式不同而不一致。
假设你在使用较新版本的Python,你可以利用标准库中的struct
模块和bin
函数。这里有个简单的例子:
con = pyodbc.connect("...")
con.execute("CREATE TABLE bin_test ( bin_col varbinary(max) )")
con.execute("INSERT INTO bin_test VALUES (?)",
(int("000001110100111000110101100010", 2),))
result = con.execute("SELECT * FROM bin_test").fetchone()
bin(struct.unpack(">I", result[0])[0])
最后一行的结果是
'0b1110100111000110101100010'
这就是最初的位字段(去掉了前面的零)。
你可以在docs.python.org上找到struct
模块的文档。bin
函数的文档也可以在同一个地方找到。