Storing spreadsheet columns in a Python dictionary

Species        Garden  Hedgerow  Parkland  Pasture  Woodland
Blackbird          47        10        40        2         2
Chaffinch          19         3         5        0         2
Great Tit          50         0        10        7         0
House Sparrow      46        16         8        4         0
Robin               9         3         0        0         2
Song Thrush         4         0         6        0         0

from xlrd import open_workbook

wb = open_workbook("Sample.xls")
headers = []
sdata = []
for s in wb.sheets():
    print("Sheet: %s" % s.name)
    if s.name.capitalize() == "Data":
        for row in range(s.nrows):
            values = []
            for col in range(s.ncols):
                data = s.cell(row, col).value
                if row == 0:
                    headers.append(data)  # the first row holds the column names
                else:
                    values.append(data)
            if row > 0:
                # skip the header row, whose values list stays empty
                sdata.append(values)

[[u'Blackbird', 47.0, 10.0, 40.0, 2.0, 2.0],
 [u'Chaffinch', 19.0, 3.0, 5.0, 0.0, 2.0],
 [u'Great Tit', 50.0, 0.0, 10.0, 7.0, 0.0],
 [u'House Sparrow', 46.0, 16.0, 8.0, 4.0, 0.0],
 [u'Robin', 9.0, 3.0, 0.0, 0.0, 2.0],
 [u'Song Thrush', 4.0, 0.0, 6.0, 0.0, 0.0]]

3 Answers

User

Answer 1 · edited 2024-05-14 08:55:05

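1. xlrd

I strongly recommend using defaultdict from the collections module. Each key's value starts out as the default, which in this case is an empty list. I haven't put much exception handling in there; you may want to add error handling that suits your use case.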

import xlrd
from collections import defaultdict

result = defaultdict(list)
# Note: xlrd 2.0 and later read only .xls files; an .xlsx path needs xlrd < 2.0.
workbook = xlrd.open_workbook("/Users/datafireball/Desktop/stackoverflow.xlsx")
worksheet = workbook.sheet_by_index(0)  # first sheet in the workbook

headers = worksheet.row(0)  # header cells; .value gives the column name
for index in range(1, worksheet.nrows):
    try:
        for header, col in zip(headers, worksheet.row(index)):
            result[header.value].append(col.value)
    except Exception as exc:
        print("Error in row %d: %r" % (index, exc))

print(result)

Output:

defaultdict(<type 'list'>, 
{u'Garden': [47.0, 19.0, 50.0, 46.0, 9.0, 4.0], 
u'Parkland': [40.0, 5.0, 10.0, 8.0, 0.0, 6.0], 
u'Woodland': [2.0, 2.0, 0.0, 0.0, 2.0, 0.0], 
u'Hedgerow': [10.0, 3.0, 0.0, 16.0, 3.0, 0.0], 
u'Pasture': [2.0, 0.0, 7.0, 4.0, 0.0, 0.0], 
u'Species': [u'Blackbird', u'Chaffinch', u'Great Tit', u'House Sparrow', u'Robin', u'Song Thrush']})
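To try the defaultdict grouping without an Excel file on hand, here is a minimal sketch that runs the same loop over in-memory rows. The `rows` list is a hypothetical stand-in for the worksheet contents (header row first) and is not part of the original answer.

```python
from collections import defaultdict

# Hypothetical stand-in for the worksheet: the first row is the header.
rows = [
    ["Species", "Garden", "Hedgerow"],
    ["Blackbird", 47, 10],
    ["Chaffinch", 19, 3],
    ["Great Tit", 50, 0],
]

columns = defaultdict(list)
headers = rows[0]
for row in rows[1:]:
    # Pair each cell with its column name and append it to that column's list.
    for header, value in zip(headers, row):
        columns[header].append(value)

# Convert to a plain dict so that a mistyped key raises KeyError
# instead of silently creating a new empty list.
columns = dict(columns)
print(columns["Garden"])
```

Converting back to a plain dict at the end is optional; it only matters if later code should fail loudly on unknown column names.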

2. pandas

import pandas as pd
xl = pd.ExcelFile("/Users/datafireball/Desktop/stackoverflow.xlsx")
df = xl.parse(xl.sheet_names[0])
print(df)

Output. You'd be surprised how much flexibility a DataFrame gives you:

             Species  Garden  Hedgerow  Parkland  Pasture  Woodland
0      Blackbird      47        10        40        2         2
1      Chaffinch      19         3         5        0         2
2      Great Tit      50         0        10        7         0
3  House Sparrow      46        16         8        4         0
4          Robin       9         3         0        0         2
5    Song Thrush       4         0         6        0         0
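Since the question asks for a dictionary of columns, one way to get there from a DataFrame is `to_dict(orient="list")`. This is a minimal sketch, assuming pandas is installed; the DataFrame is built from a few hypothetical in-memory values rather than parsed from the workbook, so it runs without the Excel file.

```python
import pandas as pd

# Hypothetical in-memory subset of the sheet, used in place of xl.parse(...).
df = pd.DataFrame({
    "Species": ["Blackbird", "Chaffinch", "Great Tit"],
    "Garden": [47, 19, 50],
    "Hedgerow": [10, 3, 0],
})

# orient="list" maps each column name to a list of that column's values.
columns = df.to_dict(orient="list")
print(columns["Garden"])
```

With the real file, the same call on the `df` returned by `xl.parse(...)` should give one list per spreadsheet column.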

User

Answer 2 · edited 2024-05-14 08:55:05

Once you have the columns, it's fairly easy:

dict(zip(headers, sdata))

Actually, sdata in your example looks like row data. Even so, this is still fairly simple, because you can also use zip to transpose the table:

dict(zip(headers, zip(*sdata)))

One of these is what you want.
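To make the difference between the two one-liners concrete, here is a small sketch on a cut-down copy of the question's data. The shortened `headers` and `sdata` values are illustrative, and the final list conversion is an addition, since `zip` yields tuples.

```python
headers = ["Species", "Garden", "Hedgerow"]
# Row-oriented data, as produced by the loop in the question.
sdata = [
    ["Blackbird", 47.0, 10.0],
    ["Chaffinch", 19.0, 3.0],
    ["Great Tit", 50.0, 0.0],
]

# Without transposing, each header is paired with a whole ROW,
# which is only right if sdata already holds columns.
by_row = dict(zip(headers, sdata))

# zip(*sdata) transposes rows into columns before pairing.
by_column = dict(zip(headers, zip(*sdata)))

# The transposed columns are tuples; convert them if lists are needed.
columns = {name: list(col) for name, col in by_column.items()}
print(columns["Garden"])  # [47.0, 19.0, 50.0]
```

For the row-oriented sdata in the question, the transposed form is the one that yields a dictionary of columns.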

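User

Answer 3 · edited 2024-05-14 08:55:05

I'll chip in with another answer to my own question!

Just after I posted the question, I found pyexcel, a fairly small Python library that acts as a wrapper around other spreadsheet-handling packages (namely xlrd and odfpy). It has a nice to_dict method that does exactly what I want, without even needing to transpose the table!

Here is an example, using the data above: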

from pyexcel import SeriesReader
from pyexcel.utils import to_dict

sheet = SeriesReader("Sample.xls")
print(sheet.series())  # just the headers, stored in a list
data = to_dict(sheet)
print(data)  # the full dataset, stored in a dictionary

Output:

[u'Species', u'Garden', u'Hedgerow', u'Parkland', u'Pasture', u'Woodland']
{u'Garden': [47.0, 19.0, 50.0, 46.0, 9.0, 4.0], u'Hedgerow': [10.0, 3.0, 0.0, 16.0, 3.0, 0.0], u'Pasture': [2.0, 0.0, 7.0, 4.0, 0.0, 0.0], u'Parkland': [40.0, 5.0, 10.0, 8.0, 0.0, 6.0], u'Woodland': [2.0, 2.0, 0.0, 0.0, 2.0, 0.0], u'Species': [u'Blackbird', u'Chaffinch', u'Great Tit', u'House Sparrow', u'Robin', u'Song Thrush']}

Hope that helps too!
