用Python将列表转换为透视表
我有一个列表,里面包含了很多小列表,格式如下:
listA = [[142L, u'Work Load', [57.35, 19.57]],
[142L, u'Days', [84.0, 44.0]],
[142L, u'Payed', [5684.0, 3944.0]],
[547L, u'Work Load', [87.25, 12.70]],
[547L, u'Days', [98.0, 128.0]],
[547L, u'Payed', [3247.0, 4712.0]],
...]
我想把它转换成下面这个样子:
listB = [['id', u'Work Load', u'Days', u'Payed'],
[142L, 57.35, 84.0, 5684.0],
[142L, 19.57, 44.0, 3944.0],
[547L, 87.25, 98.0, 3247.0],
[547L, 12.70, 128.0, 4712.0],
...]
我该怎么把它们分组呢?我这么做的原因是因为我想把这个列表导出为csv格式的文件。有什么好主意吗?谢谢!
3 个回答
1
使用 pandas 库:
import pandas as pd
listA = [[142L, u'Work Load', [57.35, 19.57]],
[142L, u'Days', [84.0, 44.0]],
[142L, u'Payed', [5684.0, 3944.0]],
[547L, u'Work Load', [87.25, 12.70]],
[547L, u'Days', [98.0, 128.0]],
[547L, u'Payed', [3247.0, 4712.0]]]
dfA = pd.DataFrame(listA)
dfA.columns = ['id','field','data']
dfB = dfA.groupby('id').apply(
lambda grp: pd.DataFrame(zip(*grp['data']), columns=grp['field']))
dfB.index = dfB.index.droplevel(-1)
print(dfB)
会得到一个数据表(DataFrame)
field Work Load Days Payed
id
142 57.35 84 5684
142 19.57 44 3944
547 87.25 98 3247
547 12.70 128 4712
然后你可以用下面的代码把这个数据表写入一个CSV文件:
dfB.to_csv('/tmp/test.csv', sep=',')
结果看起来会是这样的:
id,Work Load,Days,Payed
142,57.35,84.0,5684.0
142,19.57,44.0,3944.0
547,87.25,98.0,3247.0
547,12.7,128.0,4712.0
2
因为你展示的这个列表是按照“id”排序的,所以你可以直接使用 itertools.groupby
和 zip
来处理。
In [189]: lst #if not sorted, lst.sort(key=lambda x: x[0]) first
Out[189]:
[[142L, u'Work Load', [57.35, 19.57]],
[142L, u'Days', [84.0, 44.0]],
[142L, u'Payed', [5684.0, 3944.0]],
[547L, u'Work Load', [87.25, 12.7]],
[547L, u'Days', [98.0, 128.0]],
[547L, u'Payed', [3247.0, 4712.0]]]
In [190]: lstB=[['id', u'Work Load', u'Days', u'Payed'],]
...: for k, g in itertools.groupby(lst, lambda x: x[0]):
...: t=zip(*(i[-1] for i in g))
...: for i in t:
...: lstB.append([k]+list(i))
#outputs:
In [587]: lstB
Out[587]:
[['id', u'Work Load', u'Days', u'Payed'],
[142L, 57.35, 84.0, 5684.0],
[142L, 19.57, 44.0, 3944.0],
[547L, 87.25, 98.0, 3247.0],
[547L, 12.7, 128.0, 4712.0]]
0
这是我问题的一个有效解决方案:
import pandas as pd
dt = [[142L, u'Work Load', [57.35, 19.57]],
[142L, u'Days', [84.0, 44.0]],
[142L, u'Payed', [5684.0, 3944.0]],
[547L, u'Work Load', [87.25, 12.70]],
[547L, u'Days', [98.0, 128.0]],
[547L, u'Payed', [3247.0, 4712.0]]]
dfA = pd.DataFrame(dt)
dfA.columns = [u'id','field','data']
dfB = dfA.groupby(u'id').apply(
lambda grp: pd.DataFrame(zip(*grp['data']), columns=grp['field']))
dfB.index = dfB.index.droplevel(-1)
data = StringIO()
dfB.to_csv(data, sep=';', encoding='utf-8')
self.response = HttpResponse(data.getvalue(), mimetype='text/csv')
self.add_response_headers()
self.response.close()
self.response.flush()
return self.response
谢谢你的帮助。