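(Environment: Python 2.7.6 IDLE shell + BeautifulSoup 4.3.2+)
I want to extract some text from a batch of files (about 50) and put it neatly into an Excel file, either row by row or column by column.
The text in each file looks like this: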
<tr>
<td width=25%>
Arnold Ed
</td>
<td width=15%>
18 Feb 1959
</td>
</tr>
<tr>
<td width=15%>
Male
</td>
<td width=15%>
02 March 2002
</td>
</tr>
<tr>
<td width=15%>
Guangxi
</td>
</tr>
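For reference, the extraction step on a snippet like this (collect the text of every `<td>` in document order, with the surrounding newlines stripped) can be sketched with only the standard library. This swaps in Python 3's `html.parser.HTMLParser` for BeautifulSoup (in Python 2.7 the module is named `HTMLParser`). `TdTextCollector` and `td_texts` are hypothetical names, and the sketch assumes `<td>` elements are never nested inside each other.

```python
from html.parser import HTMLParser


class TdTextCollector(HTMLParser):
    """Collect the stripped text of every <td> element, in document order."""

    def __init__(self):
        super().__init__()
        self.in_td = False   # are we currently inside a <td>?
        self.current = []    # text fragments of the current <td>
        self.cells = []      # finished cell texts

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "td" and self.in_td:
            self.cells.append("".join(self.current).strip())
            self.in_td = False

    def handle_data(self, data):
        # Text outside <td> (e.g. newlines between tags) is ignored
        if self.in_td:
            self.current.append(data)


def td_texts(html):
    """Return the list of <td> texts found in an HTML string."""
    parser = TdTextCollector()
    parser.feed(html)
    parser.close()
    return parser.cells


SAMPLE = """<tr>
<td width=25%>
Arnold Ed
</td>
<td width=15%>
18 Feb 1959
</td>
</tr>
<tr>
<td width=15%>
Male
</td>
<td width=15%>
02 March 2002
</td>
</tr>
<tr>
<td width=15%>
Guangxi
</td>
</tr>
"""

print(td_texts(SAMPLE))
# ['Arnold Ed', '18 Feb 1959', 'Male', '02 March 2002', 'Guangxi']
```

With BeautifulSoup the equivalent list would be roughly `[td.get_text().strip() for td in soup.find_all("td")]`.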
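What I have done so far is shown below. The approach is to read the files one by one. The code works fine up to the text-extraction part, but it does not write the texts to the Excel file correctly.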
from bs4 import BeautifulSoup
import xlwt
list_open = open("c:\\file list.txt")
read_list = list_open.read()
line_in_list = read_list.split("\n")
for each_file in line_in_list:
    page = open(each_file)
    soup = BeautifulSoup(page.read())
    all_texts = soup.find_all("td")
    for a_t in all_texts:
        a = a_t.renderContents()
        #"print a" here works ok
book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
sheet = book.add_sheet('namelist', cell_overwrite_ok = True)
sheet.write (0, 0, a)
book.save("C:\\details.xls")
In practice, it only writes the last piece of text to the Excel file. How can I do this properly?
With laike9m's help, the final version is:
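from bs4 import BeautifulSoup
import xlwt

list_open = open("c:\\file list.txt")
read_list = list_open.read()
list_open.close()
# Skip blank lines (e.g. a trailing newline); open("") would raise an IOError
line_in_list = [line.strip() for line in read_list.split("\n") if line.strip()]

# Create the workbook and sheet once, before the loops
book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
sheet = book.add_sheet('namelist', cell_overwrite_ok = True)
for i, each_file in enumerate(line_in_list):      # file i -> row i
    page = open(each_file)
    soup = BeautifulSoup(page.read())
    page.close()
    all_texts = soup.find_all("td")
    for j, a_t in enumerate(all_texts):            # j-th <td> -> column j
        a = a_t.renderContents().strip()           # drop the surrounding newlines
        sheet.write(i, j, a)
# Save once, after every file has been written
book.save("C:\\details.xls")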
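The final version puts file `i` in row `i` and its `j`-th `<td>` in column `j`. The question also allowed one column per file, which amounts to swapping the two indices. The sketch below shows both layouts, using a plain dict keyed by `(row, col)` as a stand-in for an xlwt sheet so that it runs without xlwt installed; `place_cells` and its `by_column` flag are hypothetical names, and the cell values are placeholders.

```python
def place_cells(files_cells, by_column=False):
    """Assign each text to a (row, col) cell, mirroring the nested loops above.

    files_cells: one list of <td> texts per file.
    by_column=False -> file i fills row i    (like sheet.write(i, j, a))
    by_column=True  -> file i fills column i (like sheet.write(j, i, a))
    Returns {(row, col): text}, a stand-in for the worksheet.
    """
    grid = {}
    for i, cells in enumerate(files_cells):
        for j, text in enumerate(cells):
            key = (j, i) if by_column else (i, j)
            grid[key] = text
    return grid


files_cells = [["a0", "a1"], ["b0", "b1"]]   # placeholder texts from two files
rows = place_cells(files_cells)
cols = place_cells(files_cells, by_column=True)
print(rows[(1, 0)])  # b0  (second file, first cell, in row 1)
print(cols[(0, 1)])  # b0  (second file, first cell, in column 1)
```

In the final version, the column-per-file layout would mean calling `sheet.write(j, i, a)` instead of `sheet.write(i, j, a)`.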
You didn't put the last four lines inside the for loop. I think that's why only the last piece of text was written to the Excel file.
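To spell out why only one value survived: in the question's code, `sheet.write(0, 0, a)` runs once, after both loops, when `a` holds the last `<td>` text of the last file. Moving that call inside the loop while keeping `(0, 0)` would still not help, because every write would target the same cell and replace the previous one (`cell_overwrite_ok=True` only stops xlwt from raising an error about that). Hence the distinct `(i, j)` indices in the final version. The sketch below mimics both patterns; `FakeSheet` is a hypothetical helper, not part of xlwt.

```python
class FakeSheet(object):
    """Hypothetical stand-in for an xlwt sheet: keeps the last value per cell."""

    def __init__(self):
        self.cells = {}

    def write(self, row, col, value):
        # Like cell_overwrite_ok=True: a later write to a cell replaces the earlier one
        self.cells[(row, col)] = value


texts = ["Arnold Ed", "18 Feb 1959", "Male"]

# Every text aimed at cell (0, 0): each write overwrites the previous one
same_cell = FakeSheet()
for a in texts:
    same_cell.write(0, 0, a)

# Final version's pattern: a separate column for each text
indexed = FakeSheet()
for j, a in enumerate(texts):
    indexed.write(0, j, a)

print(same_cell.cells)                  # {(0, 0): 'Male'}
print(sorted(indexed.cells.items()))
# [((0, 0), 'Arnold Ed'), ((0, 1), '18 Feb 1959'), ((0, 2), 'Male')]
```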