Scraping a web page with Python - newbie

Posted 2024-06-16 10:48:47


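I'm just learning Python and web scraping. I'm trying to scrape sectional times from Attheraces. I can get the data into a spreadsheet, but it all comes out vertical, and I want it as a horizontal table (as shown on the website). So far I have this: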

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "http://www.attheraces.com/ajax/getContent.aspx?ctype=sectionalsracecardresult&raceid=1062194&page=/racecard/Windsor/8-October-2018/1325&dtype=times"

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

containers = page_soup.findAll("div", {"class": "card-body__td card-body__td--centred card-cell__time card-cell__time--8-sectionals"})

filename = "sectionals.csv"
f = open(filename, "w")

headers = "sectional\n"

f.write(headers)

for container in containers:
    sectional = container.div.div.span.text

    print(sectional)

    f.write(sectional + "," + "\n")

f.close()
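The vertical layout comes from the write inside the loop: every value is followed by "\n", so each sectional lands on its own line. A minimal offline illustration of the difference, using made-up times (the times list is hypothetical, standing in for the scraped values):

```python
import csv
import io

# Hypothetical sectional times standing in for the scraped values.
times = ["12.34", "11.80", "11.95"]

# What the loop above does: a newline after every value -> a vertical column.
vertical = io.StringIO()
for t in times:
    vertical.write(t + "," + "\n")

# Writing the whole list as a single CSV row -> a horizontal row.
horizontal = io.StringIO()
csv.writer(horizontal).writerow(times)

print(vertical.getvalue())
print(horizontal.getvalue())
```

To get a horizontal table, the values belonging to one horse need to be collected into a list first and then written out as one row.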

1 answer
User
#1 · Posted 2024-06-16 10:48:47

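If you go straight to the cells, you have to make assumptions about where each row starts and ends. Start from the rows instead: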

containers = page_soup.findAll("div", {"class":"card-cell card-cell primary card-cell primary no-only"})

# Open a file handle here and use it to create a csv writer (I like to use DictWriter).

for container in containers:
    row = []

    for cell in container.findAll("div", {"class": "card-body__td card-body__td--centred card-cell__time card-cell__time--8-sectionals"}):
        sectional = cell.div.div.span.text
        row.append(sectional)

    # Write a row to your csv writer here.
    print(row)
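As a sketch of how the two placeholder comments might be filled in, here is a version using csv.DictWriter, which the answer says it prefers. It assumes each scraped row is a list of sectional strings, as built by the loop above; the rows data and the section_1, section_2, ... column names are hypothetical, and shorter rows are padded with empty cells:

```python
import csv
import io

# Hypothetical output of the loop above: one list of sectionals per horse.
rows = [
    ["12.34", "11.80", "11.95"],
    ["12.50", "11.72"],
]

# Hypothetical column names: one per sectional in the longest row.
width = max(len(row) for row in rows)
fieldnames = [f"section_{i}" for i in range(1, width + 1)]

# io.StringIO keeps the sketch self-contained; for a real file, pass a handle
# from open("sectionals.csv", "w", newline="") instead.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
for row in rows:
    # zip pairs each value with its column; missing columns get restval ("").
    writer.writerow(dict(zip(fieldnames, row)))

print(out.getvalue())
```

A plain csv.writer with writer.writerow(row) works just as well if you don't need named columns.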

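Look into using Python's csv module to avoid common problems. Also, the with statement is a good way to make sure resources are managed properly. urlopen supports this, as do files (with open('...', 'r') as f:), and these can be used together.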
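A minimal sketch of that pattern: urlopen and open are used as context managers in a single with statement, so both the HTTP response and the CSV file are closed even if an exception is raised. The function name save_rows and its parse_rows argument (any callable that turns the page HTML into a list of rows, for example a wrapper around the BeautifulSoup loop above) are hypothetical:

```python
import csv
from urllib.request import urlopen


def save_rows(url, filename, parse_rows):
    """Fetch url, convert the HTML to rows with parse_rows, and write them to filename as CSV."""
    # Both the response and the file are closed automatically when the block exits.
    # newline="" stops the csv module's line endings from being translated again.
    with urlopen(url) as response, open(filename, "w", newline="") as f:
        html = response.read()
        writer = csv.writer(f)
        writer.writerows(parse_rows(html))
```

For example, save_rows(my_url, "sectionals.csv", parse_rows) would replace the manual uClient.close() and f.close() calls from the question.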
