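How can I use BeautifulSoup in Python to store a table in a variable with one element per row and a delimiter between columns?
I want to store an HTML table in a variable called store.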
from bs4 import BeautifulSoup

html = ['<html><body><p align="center"><table><tr><td>row1col1</td><td>row1col2</td><td>row1col3</td></tr><tr><td>row2col1</td><td>row2col2</td><td>row2col3</td></tr></table></html>']
soup = BeautifulSoup(''.join(html), 'html.parser')
table = soup.find('table')
rows = table.find_all('tr')
store = []
row = []
numcols = []
for tr in rows:
    cols = tr.find_all('td')
    for td in cols:
        try:
            text = ''.join(td.find(string=True))
        except Exception:
            text = ''
        text = text + "|"
        # row is never reset, so cells from every <tr> pile up in one list
        row.append(text)
# everything is joined into a single string, losing the row boundaries
store = ''.join(row)
print(store)
Here is the output:
row1col1|row1col2|row1col3|row2col1|row2col2|row2col3|
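I would like each row stored separately in the "store" variable, so that each row is one element of "store" with its columns separated by the | character. As it stands, I cannot tell which items belong to which row. Any ideas on how to do this?
1 Answer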
3
I guess what you want is something like this:
from bs4 import BeautifulSoup

html = '<html><body><p align="center"><table><tr><td>row1col1</td><td>row1col2</td><td>row1col3</td></tr><tr><td>row2col1</td><td>row2col2</td><td>row2col3</td></tr></table></html>'
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table')
rows = table.find_all('tr')
store = []
for tr in rows:
    cols = tr.find_all('td')
    row = []  # start a fresh list for every <tr>
    for td in cols:
        try:
            row.append(''.join(td.find(string=True)))
        except Exception:
            row.append('')  # empty cell: find() returned None
    # one element of store per table row, columns separated by |
    store.append('|'.join(row))
print('\n'.join(store))
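
For what it's worth, the same idea can probably be written more compactly. The sketch below assumes BeautifulSoup 4 (the bs4 package) with the built-in html.parser; table_to_rows is a hypothetical helper name, not part of the library. It uses get_text() so that text nested inside a cell (for example inside a <b> tag) is included, whereas td.find(...) returns only the first text node.

```python
from bs4 import BeautifulSoup

html = ('<html><body><p align="center"><table>'
        '<tr><td>row1col1</td><td>row1col2</td><td>row1col3</td></tr>'
        '<tr><td>row2col1</td><td>row2col2</td><td>row2col3</td></tr>'
        '</table></html>')


def table_to_rows(markup, sep='|'):
    """Hypothetical helper: return one string per <tr>, cells joined by sep."""
    soup = BeautifulSoup(markup, 'html.parser')
    table = soup.find('table')
    if table is None:
        # No table in the markup: nothing to store.
        return []
    return [
        # get_text(strip=True) gathers all text inside the cell, trimmed.
        sep.join(td.get_text(strip=True) for td in tr.find_all('td'))
        for tr in table.find_all('tr')
    ]


store = table_to_rows(html)
print(store)

# Splitting an element on the separator gives back that row's columns.
print(store[0].split('|'))
```

Each element of store is then one row, and split('|') recovers its columns. Note that this only round-trips cleanly if no cell text contains the | character itself; if that can happen, a different separator or a list of lists may be a safer choice.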