An efficient way to create count vectors in Python from MySQL data

Posted 2024-05-29 04:17:36


I have data like this in MySQL (sample):

Table 1:

ID     ITEM    CNT
--------------------
0001    AAB     5
0001    BBA     3
0001    BBB     8
0001    AAC     10
0002    BBA     2
0002    BBC     7
0003    FFG     2
0003    JPO     4
0003    PUI     22
..........

I want to find a way to import this data into Python as count vectors, for example:

0001 = [5,10,3,8,0,0,0,0]
0002 = [0,0,2,0,7,0,0,0]
0003 = [0,0,0,0,0,2,4,22]

where the elements are the counts of every item for each ID, in this order: [AAB, AAC, BBA, BBB, BBC, FFG, JPO, PUI]

So my question is: what is the best and most efficient way to do this? Is it better done in Python or in MySQL, and how?

Thanks


2 answers

Where possible, manipulating data in SQL rather than in Python is usually more efficient.

With this setup:

import config
import MySQLdb
conn = MySQLdb.connect(
    host=config.HOST, user=config.USER,
    passwd=config.PASS, db='test')
cursor = conn.cursor()

sql = '''\
DROP TABLE IF EXISTS foo 
'''
cursor.execute(sql)

sql = '''\
CREATE TABLE foo (
    ID varchar(4),
    ITEM varchar(3),
    CNT int)
'''

cursor.execute(sql)

sql = '''\
INSERT INTO foo VALUES (%s,%s,%s)
'''

cursor.executemany(sql, [['0001', 'AAB', 5],
                         ['0001', 'BBA', 3],
                         ['0001', 'BBB', 8],
                         ['0002', 'BBA', 2]])

you can build the required SQL like this:

items = 'AAB AAC BBA BBB BBC FFG JPO PUI'.split()
fields = ', '.join('COALESCE({}.CNT, 0)'.format(item) for item in items)
joins = '\n'.join('''\
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = '{i}') as {i}
    ON T.ID = {i}.ID'''.format(i=item) for item in items)
sql = '''\
SELECT T.ID, {f}
FROM (SELECT DISTINCT ID from foo) as T
{j}
'''.format(f=fields, j=joins)

print(sql)

Use it like this:

result = dict()
cursor.execute(sql)
for row in cursor:
    result[row[0]] = row[1:]
print(result)    

The SQL query used is:

SELECT T.ID, COALESCE(AAB.CNT, 0), COALESCE(AAC.CNT, 0), COALESCE(BBA.CNT, 0), COALESCE(BBB.CNT, 0), COALESCE(BBC.CNT, 0), COALESCE(FFG.CNT, 0), COALESCE(JPO.CNT, 0), COALESCE(PUI.CNT, 0)
FROM (SELECT DISTINCT ID from foo) as T
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = 'AAB') as AAB
    ON T.ID = AAB.ID
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = 'AAC') as AAC
    ON T.ID = AAC.ID
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = 'BBA') as BBA
    ON T.ID = BBA.ID
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = 'BBB') as BBB
    ON T.ID = BBB.ID
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = 'BBC') as BBC
    ON T.ID = BBC.ID
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = 'FFG') as FFG
    ON T.ID = FFG.ID
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = 'JPO') as JPO
    ON T.ID = JPO.ID
LEFT JOIN (SELECT ID, CNT FROM foo WHERE ITEM = 'PUI') as PUI
    ON T.ID = PUI.ID

The result is:

{'0001': (5L, 0L, 3L, 8L, 0L, 0L, 0L, 0L), '0002': (0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L)}

I know you asked for

0001 = [5,10,3,8,0,0,0,0]
0002 = [0,0,2,0,7,0,0,0]
0003 = [0,0,0,0,0,2,4,22]

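but there are at least two problems with this. First, 0001 is not a valid Python variable name; variable names cannot start with a digit. Second, you do not want to define variable names dynamically, because it is hard to program with bare variable names that are only known at run time.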

Instead, use the would-be variable names as keys in a dict, result. You can then refer to the "variable" 0001 as result['0001'].
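If you would rather do the pivoting in Python, the same dict-keyed idea can be sketched as follows. This is only an illustration, not part of the setup above: it assumes the rows have already been fetched as (ID, ITEM, CNT) tuples (for example with SELECT ID, ITEM, CNT FROM foo), and the helper name build_count_vectors is hypothetical. The sample rows are copied from the question's table.

```python
from collections import defaultdict

# Rows as they might come back from "SELECT ID, ITEM, CNT FROM foo"
# (assumption: values copied from the question's sample table).
rows = [
    ('0001', 'AAB', 5), ('0001', 'BBA', 3), ('0001', 'BBB', 8),
    ('0001', 'AAC', 10), ('0002', 'BBA', 2), ('0002', 'BBC', 7),
    ('0003', 'FFG', 2), ('0003', 'JPO', 4), ('0003', 'PUI', 22),
]

# Derive the item order from the data instead of hard-coding it.
items = sorted({item for _, item, _ in rows})
position = {item: i for i, item in enumerate(items)}


def build_count_vectors(rows, position):
    """Hypothetical helper: map each ID to a list of counts, one slot per item."""
    vectors = defaultdict(lambda: [0] * len(position))
    for id_, item, cnt in rows:
        # += so that repeated (ID, ITEM) pairs are summed rather than overwritten
        vectors[id_][position[item]] += cnt
    return dict(vectors)


result = build_count_vectors(rows, position)
print(items)
print(result['0001'])
```

This makes a single pass over the rows, so it may be a reasonable choice when the number of distinct items is large and a query with one join per item becomes unwieldy.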

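You can also do this with a crosstab query, in which the row headings are the IDs, the column headings are the items, and CNT is the value being aggregated. You can then loop over the columns of each row to get the vector. For help with crosstab queries, see: http://allenbrowne.com/ser-67.html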
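One common way to write such a crosstab in plain SQL is conditional aggregation: one SUM(CASE ...) column per item, grouped by ID. The sketch below is an assumption about how that could look for a table shaped like foo; it is not taken from the linked page. To keep it runnable without a MySQL server it uses Python's built-in sqlite3 module as a stand-in, so the placeholder style is ? rather than the %s used by MySQLdb. The SELECT statement itself uses only standard SQL.

```python
import sqlite3

# Stand-in database (assumption): in-memory SQLite instead of MySQL.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE foo (ID varchar(4), ITEM varchar(3), CNT int)')
cursor.executemany(
    'INSERT INTO foo VALUES (?, ?, ?)',
    [('0001', 'AAB', 5), ('0001', 'BBA', 3), ('0001', 'BBB', 8),
     ('0001', 'AAC', 10), ('0002', 'BBA', 2), ('0002', 'BBC', 7),
     ('0003', 'FFG', 2), ('0003', 'JPO', 4), ('0003', 'PUI', 22)])

items = 'AAB AAC BBA BBB BBC FFG JPO PUI'.split()

# One aggregated column per item; IDs without that item contribute 0.
columns = ',\n    '.join(
    "SUM(CASE WHEN ITEM = '{i}' THEN CNT ELSE 0 END) AS {i}".format(i=item)
    for item in items)
sql = 'SELECT ID,\n    {c}\nFROM foo\nGROUP BY ID\nORDER BY ID'.format(c=columns)
print(sql)

# Loop over each row: the first column is the ID, the rest form the vector.
cursor.execute(sql)
crosstab = {row[0]: list(row[1:]) for row in cursor}
print(crosstab)
conn.close()
```

Because the item names are interpolated directly into the SQL text, this is only safe when the list of items comes from a trusted source.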
