Need help with web scraping and saving the result as an Excel file using csv

Published 2024-04-25 02:00:19


I need to scrape a website and save the data to Excel, like the image I uploaded.

But I don't know what is wrong with my code.

There is only one row in my Excel file. Please help.

import requests
from bs4 import BeautifulSoup
import csv


for i in range(10):
    payload={'pageIndex':i}
    r=requests.post(url, params=payload)
    soup=BeautifulSoup(r.text, 'html.parser')
    table=soup.find('table')
    rows=table.find('tbody').find_all('tr')

    for j in range(len(rows)):
        col=rows[j].find_all('td')
        result=[]
        for item in col:
            result.append(item.get_text())

with open(r"C:\Users\lwt04\Desktop\TheaterInfo.csv","w",newline='') as out:
    theater = csv.writer(out)

with open(r"C:\Users\lwt04\Desktop\TheaterInfo.csv","a",newline='') as out:
    theater = csv.writer(out)
    theater.writerow(result)

Tags: csv, in, import, for, table, range, result, find
3 Answers

Your code only stores the last theater; that is a logic error. You need to collect each theater's result row in a list of all theaters and write that list to the file:

# ... your code snipped for brevity ...

theaters = []  # collect all theaters here

for i in range(10):
    payload={'pageIndex':i}

    # ... snip ...

    for j in range(len(rows)):
        col=rows[j].find_all('td')
        result=[]
        for item in col:
            result.append(item.get_text())

        theaters.append(result)

    # ... snip ...

headers = ['City','District','Code','Name','NumScreen','NumSeats', 
           'Permanent', 'Registered', 'License','OpenDate','Run']

# a single context manager is enough; opening with "w" already truncates
# any existing file every time you run your script
with open(r"C:\Users\lwt04\Desktop\TheaterInfo.csv","w",newline='') as out:
    theater = csv.writer(out)
    theater.writerow(headers)
    theater.writerows(theaters)  # writerowS here

If you want to append or create, look into Check a file exists or not without try-catch block, and consider making the opening mode a variable ('w' or 'a') depending on whether the file already exists: write the headers when creating the file, otherwise write only the data.
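A minimal sketch of that pattern (the path, headers, and sample rows here are placeholders for illustration, not taken from the question):

```python
import csv
import os

path = "theaters_demo.csv"                 # placeholder path for illustration
theaters = [["Seoul", "Gangnam", "001"]]   # placeholder rows

# Write headers only when the file does not exist yet; append otherwise.
write_header = not os.path.exists(path)
mode = 'w' if write_header else 'a'

with open(path, mode, newline='') as out:
    writer = csv.writer(out)
    if write_header:
        writer.writerow(["City", "District", "Code"])
    writer.writerows(theaters)
```

Running the script again then appends new rows below the existing ones without repeating the header line.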


Addendum: you are not writing Excel, you are writing a CSV file that can be opened by Excel. To write Excel directly, use an appropriate module, e.g. this one: https://openpyxl.readthedocs.io/en/stable/
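With openpyxl the same rows can be written straight into an .xlsx workbook (a sketch; the headers and sample rows are placeholders standing in for the scraped data):

```python
from openpyxl import Workbook

headers = ['City', 'District', 'Code', 'Name']
theaters = [['Seoul', 'Gangnam', '001', 'CGV']]  # placeholder rows

wb = Workbook()
ws = wb.active
ws.title = "Theaters"

# append() adds one row at a time, starting at the first empty row
ws.append(headers)
for row in theaters:
    ws.append(row)

wb.save("TheaterInfo.xlsx")
```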

HTH

Save the results to another list and write that list to the csv file.

import requests
from bs4 import BeautifulSoup
import csv

url='http://www.kobis.or.kr/kobis/business/mast/thea/findTheaterInfoList.do'
headers = ['City','District','Code','Name','NumScreen','NumSeats', 
           'Permanent', 'Registered', 'License','OpenDate','Run']

data=[]
for i in range(1,10):
    payload={'pageIndex':i}
    r=requests.post(url, params=payload)
    soup=BeautifulSoup(r.text, 'html.parser')
    table=soup.find("table", class_="tbl_comm")
    rows=table.find('tbody').find_all('tr')
    for row in rows:
        result=[]
        for cell in row.find_all(['td', 'th']):
            result.append(cell.get_text())
        if result:
            data.append(result)

with open(r"C:\Users\lwt04\Desktop\TheaterInfo.csv", 'w', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerow(headers)
    writer.writerows(data)

You can also use pandas for this purpose. You just need to do something like this with result.

import pandas as pd
df = pd.DataFrame([result], columns=['City','District','Code','Name','NumScreen','NumSeats', 'Permanent', 'Registered', 'License','OpenDate','Run'])

df.to_csv('filename.csv', sep=',', index=False)  # to_csv takes sep, not delimiter
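To write all scraped rows at once instead of a single result, the list of row lists collected in the scraping loop can be passed straight to the DataFrame constructor (a sketch; the sample rows here are made up and stand in for the real scraped data):

```python
import pandas as pd

headers = ['City', 'District', 'Code', 'Name', 'NumScreen', 'NumSeats',
           'Permanent', 'Registered', 'License', 'OpenDate', 'Run']

# placeholder rows standing in for the scraped data list
data = [
    ['Seoul', 'Gangnam', '001', 'Theater A', '5', '800', 'Y', 'Y', 'L1', '2001-01-01', 'Y'],
    ['Busan', 'Haeundae', '002', 'Theater B', '3', '450', 'Y', 'N', 'L2', '2005-06-15', 'Y'],
]

df = pd.DataFrame(data, columns=headers)
df.to_csv('filename.csv', index=False)  # index=False drops the row-number column
```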

For CSV:

You can simply use this with result, since it is just one row of data. If you build a listofresult, multiple entries can be handled.

listofresult = []
for i in range(10):
    payload={'pageIndex':i}
    r=requests.post(url, params=payload)
    soup=BeautifulSoup(r.text, 'html.parser')
    table=soup.find('table')
    rows=table.find('tbody').find_all('tr')

    for j in range(len(rows)):
        col=rows[j].find_all('td')
        result=[]
        for item in col:
            result.append(item.get_text())
        listofresult.append(result)  # append inside the loop, once per row

with open('filename.csv', 'w') as f:
    writer = csv.writer(f)
    # Write the headers
    headers = ['City','District','Code','Name','NumScreen','NumSeats', 
           'Permanent', 'Registered', 'License','OpenDate','Run']
    writer.writerow(headers)
    # writer.writerows([result])   # for the current single-row case
    writer.writerows(listofresult)  # for multiple rows
