Need help with web scraping and saving the result as an Excel file using csv

Published 2024-04-25 02:00:19


I need to scrape a website and save the data to Excel, like the image I uploaded.

But I don't know what is wrong with my code.

There is only one row in my Excel file. Please help.

import requests
from bs4 import BeautifulSoup
import csv


for i in range(10):
    payload={'pageIndex':i}
    r=requests.post(url, params=payload)
    soup=BeautifulSoup(r.text, 'html.parser')
    table=soup.find('table')
    rows=table.find('tbody').find_all('tr')

    for j in range(len(rows)):
        col=rows[j].find_all('td')
        result=[]
        for item in col:
            result.append(item.get_text())

with open(r"C:\Users\lwt04\Desktop\TheaterInfo.csv","w",newline='') as out:
    theater = csv.writer(out)

with open(r"C:\Users\lwt04\Desktop\TheaterInfo.csv","a",newline='') as out:
    theater = csv.writer(out)
    theater.writerow(result)

Tags: csv, in, import, for, table, range, result, find
3 Answers

Your code only stores the last theater; that is a logic error. You need to collect each theater's result row in a list of all theaters and write that list to the file:

# ... your code snipped for brevity ...

theaters = []  # collect all theaters here

for i in range(10):
    payload={'pageIndex':i}

    # ... snip ...

    for j in range(len(rows)):
        col=rows[j].find_all('td')
        result=[]
        for item in col:
            result.append(item.get_text())

        theaters.append(result)

    # ... snip ...

headers = ['City','District','Code','Name','NumScreen','NumSeats', 
           'Permanent', 'Registered', 'License','OpenDate','Run']

# a single context manager is enough; opening with "w" already truncates
# any existing file every time you run your script
with open(r"C:\Users\lwt04\Desktop\TheaterInfo.csv","w",newline='') as out:
    theater = csv.writer(out)
    theater.writerow(headers)
    theater.writerows(theaters)  # writerowS here

If you want to append or create, look into Check a file exists or not without try-catch block, and consider making the opening mode a variable ('w' or 'a') depending on whether the file already exists: write the headers when creating the file, otherwise write only the data.
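A minimal sketch of that pattern (the path, headers, and sample rows here are placeholders for illustration, not taken from the question):

```python
import csv
import os

path = "theaters_demo.csv"                 # placeholder path for illustration
theaters = [["Seoul", "Gangnam", "001"]]   # placeholder rows

# Write headers only when the file does not exist yet; append otherwise.
write_header = not os.path.exists(path)
mode = 'w' if write_header else 'a'

with open(path, mode, newline='') as out:
    writer = csv.writer(out)
    if write_header:
        writer.writerow(["City", "District", "Code"])
    writer.writerows(theaters)
```

Running the script again then appends new rows below the existing ones without repeating the header line.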


Addendum: you are not writing Excel, you are writing a CSV file that can be opened by Excel. To write Excel directly, use an appropriate module, e.g. this one: https://openpyxl.readthedocs.io/en/stable/
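With openpyxl the same rows can be written straight into an .xlsx workbook (a sketch; the headers and sample rows are placeholders standing in for the scraped data):

```python
from openpyxl import Workbook

headers = ['City', 'District', 'Code', 'Name']
theaters = [['Seoul', 'Gangnam', '001', 'CGV']]  # placeholder rows

wb = Workbook()
ws = wb.active
ws.title = "Theaters"

# append() adds one row at a time, starting at the first empty row
ws.append(headers)
for row in theaters:
    ws.append(row)

wb.save("TheaterInfo.xlsx")
```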

HTH

Save the results to another list and write that list to the csv file.

import requests
from bs4 import BeautifulSoup
import csv

url='http://www.kobis.or.kr/kobis/business/mast/thea/findTheaterInfoList.do'
headers = ['City','District','Code','Name','NumScreen','NumSeats', 
           'Permanent', 'Registered', 'License','OpenDate','Run']

data=[]
for i in range(1,10):
    payload={'pageIndex':i}
    r=requests.post(url, params=payload)
    soup=BeautifulSoup(r.text, 'html.parser')
    table=soup.find("table", class_="tbl_comm")
    rows=table.find('tbody').find_all('tr')
    for row in rows:
        result=[]
        for cell in row.find_all(['td', 'th']):
            result.append(cell.get_text())
        if result:
            data.append(result)

with open(r"C:\Users\lwt04\Desktop\TheaterInfo.csv", 'w', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerow(headers)
    writer.writerows(data)

You can also use pandas for this purpose. You just need to do something like this with result.

import pandas as pd
df = pd.DataFrame([result], columns=['City','District','Code','Name','NumScreen','NumSeats', 'Permanent', 'Registered', 'License','OpenDate','Run'])

df.to_csv('filename.csv', sep=',', index=False)  # to_csv takes sep, not delimiter
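To write all scraped rows at once instead of a single result, the list of row lists collected in the scraping loop can be passed straight to the DataFrame constructor (a sketch; the sample rows here are made up and stand in for the real scraped data):

```python
import pandas as pd

headers = ['City', 'District', 'Code', 'Name', 'NumScreen', 'NumSeats',
           'Permanent', 'Registered', 'License', 'OpenDate', 'Run']

# placeholder rows standing in for the scraped data list
data = [
    ['Seoul', 'Gangnam', '001', 'Theater A', '5', '800', 'Y', 'Y', 'L1', '2001-01-01', 'Y'],
    ['Busan', 'Haeundae', '002', 'Theater B', '3', '450', 'Y', 'N', 'L2', '2005-06-15', 'Y'],
]

df = pd.DataFrame(data, columns=headers)
df.to_csv('filename.csv', index=False)  # index=False drops the row-number column
```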

For CSV:

You can simply use this with result, since it is just one row of data. If you build a listofresult, multiple entries can be handled.

listofresult = []
for i in range(10):
    payload={'pageIndex':i}
    r=requests.post(url, params=payload)
    soup=BeautifulSoup(r.text, 'html.parser')
    table=soup.find('table')
    rows=table.find('tbody').find_all('tr')

    for j in range(len(rows)):
        col=rows[j].find_all('td')
        result=[]
        for item in col:
            result.append(item.get_text())
        listofresult.append(result)  # append inside the loop, once per row

with open('filename.csv', 'w') as f:
    writer = csv.writer(f)
    # Write the headers
    headers = ['City','District','Code','Name','NumScreen','NumSeats', 
           'Permanent', 'Registered', 'License','OpenDate','Run']
    writer.writerow(headers)
    # writer.writerows([result])   # for the current single-row case
    writer.writerows(listofresult)  # for multiple rows
