Extract text from a batch of files and write it to an Excel file

Posted 2024-04-18 15:23:04

(Environment: Python 2.7.6 in the IDLE shell, BeautifulSoup 4.3.2+)

I want to extract some text from a batch of files (about 50) and put it neatly into a single Excel file, either row by row or column by column.

A sample of the text in each file looks like this:

<tr> 
    <td width=25%>
        Arnold Ed   
    </td>
    <td width=15%>
        18 Feb 1959     
    </td>
</tr>
<tr> 
    <td width=15%>
        Male
    </td>   
    <td width=15%>
        02 March 2002   
    </td>
</tr>
<tr>
    <td width=15%>
        Guangxi         
    </td>   
</tr>

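What I have done so far is shown below. The approach is to read the files one by one. The code works fine up through the text-extraction part, but it does not write the results to the Excel file correctly.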

from bs4 import BeautifulSoup
import xlwt

list_open = open("c:\\file list.txt")
read_list = list_open.read()
line_in_list = read_list.split("\n")


for each_file in line_in_list:
    page = open(each_file)
    soup = BeautifulSoup(page.read())

    all_texts = soup.find_all("td")

    for a_t in all_texts:
        a = a_t.renderContents()

        #"print a" here works ok

    book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
    sheet = book.add_sheet('namelist', cell_overwrite_ok = True)
    sheet.write (0, 0, a)
    book.save("C:\\details.xls")

In practice, it only writes the last piece of text to the Excel file. How can I do this correctly?


With laike9m's help, the final version is:

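from bs4 import BeautifulSoup
import xlwt

# Read the list of input files, one path per line.
list_open = open("c:\\file list.txt")
read_list = list_open.read()
line_in_list = read_list.split("\n")

# Create the workbook and sheet once, before looping over the files.
book = xlwt.Workbook(encoding='utf-8', style_compression=0)
sheet = book.add_sheet('namelist', cell_overwrite_ok=True)

# One row per file (i), one column per <td> cell (j).
for i, each_file in enumerate(line_in_list):
    page = open(each_file)
    soup = BeautifulSoup(page.read())

    all_texts = soup.find_all("td")

    for j, a_t in enumerate(all_texts):
        a = a_t.renderContents()
        sheet.write(i, j, a)

# Save once, after every file has been processed.
book.save("C:\\details.xls")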

1 answer
User
Answer 1 · posted 2024-04-18 15:23:04

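You didn't put the last four lines inside the inner for loop, so sheet.write runs only after that loop has finished, when a holds just the last piece of text. I think that's why only the last piece of text ends up in the Excel file.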
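
To see the difference without any libraries, here is a minimal sketch in which a plain dict keyed by (row, column) stands in for the worksheet. The function names and the sample cell texts are made up for illustration; they are not part of the original code. The first function mirrors the structure of the question's code, and the second writes inside the inner loop.

```python
# Illustration only: a dict keyed by (row, col) stands in for an xlwt sheet.
# The nested lists are made-up cell texts, one inner list per input file.
files = [["Arnold Ed", "18 Feb 1959"], ["Male", "02 March 2002"]]


def write_after_loop(files):
    """Mirror the question's code: the write sits outside the inner loop."""
    for cells in files:
        for text in cells:
            a = text
        sheet = {}         # a fresh "workbook" for every file, as in the question
        sheet[(0, 0)] = a  # runs once per file, with the last value of a
    return sheet


def write_inside_loop(files):
    """Mirror the fix: one sheet, one write per cell at (row i, column j)."""
    sheet = {}
    for i, cells in enumerate(files):
        for j, text in enumerate(cells):
            sheet[(i, j)] = text
    return sheet


broken = write_after_loop(files)
fixed = write_inside_loop(files)
print(broken)  # only the last cell of the last file survives
print(fixed)   # every cell is kept, one row per file
```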

Edit

book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
sheet = book.add_sheet('namelist', cell_overwrite_ok = True)

for i, each_file in enumerate(line_in_list):
    page = open(each_file)
    soup = BeautifulSoup(page.read())

    all_texts = soup.find_all("td")

    for j, a_t in enumerate(all_texts):
        a = a_t.renderContents()
        sheet.write(i, j, a)

book.save("C:\\details.xls")
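
Two details may be worth guarding against in practice: split("\n") yields an empty entry if the list file ends with a newline, and each cell in the sample HTML carries leading and trailing whitespace. Below is a minimal sketch of both cleanups using only the standard library. The helper names parse_file_list and clean_cell and the example paths are hypothetical, and the sketch assumes the cell content is passed in as text rather than bytes.

```python
def parse_file_list(raw_text):
    """Return the non-blank lines of the list file, trimmed of surrounding whitespace."""
    return [line.strip() for line in raw_text.splitlines() if line.strip()]


def clean_cell(raw):
    """Collapse runs of whitespace (including newlines) and trim both ends."""
    return " ".join(raw.split())


# Made-up inputs: a list file with trailing blank lines, and a padded cell.
paths = parse_file_list("c:\\a.html\nc:\\b.html\n\n")
cell = clean_cell("\n        Arnold Ed   \n    ")
print(paths)
print(cell)
```

Inside the loop, one possible use is sheet.write(i, j, clean_cell(a_t.get_text())), since BeautifulSoup's get_text() returns the cell's text as a string.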
