Scraping data using a for loop

Posted 2024-06-11 08:13:14

I am trying to learn web scraping, and by scraping horse data from the oddschecker website I have got this far. I am using Python and Spyder.

I am currently at the stage where the following code gives me the information I need:

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

# Create the fetchSoup function; it sends a Mozilla User-Agent header with the request
def fetchSoup(url, userAgent='Mozilla/5.0' ):
    req = Request(url, headers={'User-Agent': userAgent})
    with urlopen(req) as response:
        html = response.read()
    return BeautifulSoup(html, "lxml")

# Define the URL to scrape
url = 'https://www.oddschecker.com/horse-racing/chelmsford-city/20:30/top-3-finish'
soup = fetchSoup(url)

# Call prettify() (the brackets call the function) and assign the formatted markup to the html variable
html = soup.prettify()

# Attribute filter: each horse sits in its own <tr> with the class "diff-row evTabRow bc"
splitperhorse = {'class': 'diff-row evTabRow bc'}
# Collect one <tr> element per horse
horseinfo = soup.find_all('tr', splitperhorse)

This splits the data as follows:

 horseinfo[0]

 <tr class="diff-row evTabRow bc" data-best-bks="B3,BF,MK" data-best-dig="1.5" data-bid="26459041728" data-bname="Sharney" data-hcap="" data-hcap-sort="1" data-stall="7"><td class="cardnum">9</td><td class="sel nm has-silks basket-active"><span class="float-wrap"><span class="beta-sprite add-to-bet-basket" data-name="Sharney" data-ng-click="MainController.addToMultipleBetSlip(26459041728, 3490400883, 1.5)" data-track="&amp;lid=grid&amp;lpos=basket-add" title="Add Sharney to betslip"></span></span><img alt="Sharney silk" class="silks" height="29" src="https://static.oddschecker.com/content/racing-silks/24372.gif?v=1.0.15" width="39"/><span class="float-wrap name-wrap"><span class="tcell"><div class="top-row"><a class="popup selTxt" data-name="Sharney" href="https://www.oddschecker.com/horse-racing/chelmsford-city/20:30/top-3-finish/bet-history/sharney" target="_blank" title="View odds history for Sharney">Sharney<span class="stall"> (7)</span></a></div><div class="bottom-row jockey"><span class="current-form">0-40</span></div></span></span></td><td class="bc bs oi b" data-bk="B3" data-fodds="1.9" data-hcap="" data-o="1/2" data-odig="1.5"><p>1/2</p></td><td class="bc bs oi" data-bk="SK" data-fodds="2.0" data-hcap="" data-o="2/5" data-odig="1.4"><p>2/5</p></td><td class="bc bs oi" data-bk="LD" data-fodds="1.4" data-hcap="" data-o="4/11" data-odig="1.36"><p>4/11</p></td><td class="bc bs oi" data-bk="WH" data-ew-denom="1" data-ew-places="3" data-fodds="1.83" data-hcap="" data-o="4/11" data-odig="1.36"><p>4/11</p></td><td class="np o" data-bk="EE" data-fodds="" data-hcap="" data-o="" data-odig="0"></td><td class="bc bs oi" data-bk="FB" data-fodds="1.4" data-hcap="" data-o="4/11" data-odig="1.36"><p>4/11</p></td><td class="bc bs oi" data-bk="VC" data-fodds="1.91" data-hcap="" data-o="2/5" data-odig="1.4"><p>2/5</p></td><td class="bc bs oi" data-bk="PP" data-fodds="1.4" data-hcap="" data-o="4/11" data-odig="1.36"><p>4/11</p></td><td class="np o" data-bk="UN" data-fodds="" 
data-hcap="" data-o="" data-odig="0"></td><td class="bc bs oi" data-bk="CE" data-fodds="1.4" data-hcap="" data-o="4/11" data-odig="1.36"><p>4/11</p></td><td class="bc bs oi" data-bk="FR" data-fodds="1.8" data-hcap="" data-o="4/11" data-odig="1.36"><p>4/11</p></td><td class="bc bs oi" data-bk="WA" data-fodds="1.33" data-hcap="" data-o="2/7" data-odig="1.29"><p>2/7</p></td><td class="bc bs oi" data-bk="SA" data-fodds="1.25" data-hcap="" data-o="2/9" data-odig="1.22"><p>2/9</p></td><td class="bc bs o" data-bk="BY" data-fodds="1.36" data-hcap="" data-o="4/11" data-odig="1.36"><p>4/11</p></td><td class="np o" data-bk="VT" data-fodds="" data-hcap="" data-o="" data-odig="0"></td><td class="bc bs oi" data-bk="OE" data-fodds="1.25" data-hcap="" data-o="2/9" data-odig="1.22"><p>2/9</p></td><td class="np o" data-bk="SO" data-fodds="" data-hcap="" data-o="" data-odig="0"></td><td class="bc bs oi" data-bk="BH" data-fodds="1.25" data-hcap="" data-o="2/9" data-odig="1.22"><p>2/9</p></td><td class="bc bs o" data-bk="GN" data-fodds="1.36" data-hcap="" data-o="4/11" data-odig="1.36"><p>4/11</p></td><td class="bc bs o" data-bk="SX" data-ew-denom="0" data-ew-places="0" data-fodds="1.44" data-hcap="" data-o="4/9" data-odig="1.44"><p>4/9</p></td><td class="np o" data-bk="MR" data-fodds="" data-hcap="" data-o="" data-odig="0"></td><td class="wo wo-col"></td><td class="bc bs oi b" data-bk="BF" data-fodds="1.95" data-hcap="" data-o="8/15" data-odig="1.53" data-x-selection="27351256*1.169809702*horse-racing*29739583*1.169809702"><p>8/15</p></td><td class="np o" data-bk="BD" data-fodds="" data-hcap="" data-o="" data-odig="0"></td><td class="bc bs oo b" data-bk="MK" data-fodds="1.22" data-hcap="" data-o="8/15" data-odig="1.53"><p>8/15</p></td></tr>

What I want to do is get this into a format with one horse per row, i.e.

Sharney price1 price2 price3 price4
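A minimal sketch of how a for loop might produce that layout, assuming each horse's name is in the row's data-bname attribute and each bookmaker's fractional price is in the data-o attribute of a td that carries data-bk (as in the horseinfo[0] output above). The SAMPLE_HTML string is a trimmed stand-in for the real page, "Example Horse" is invented for illustration, and extract_rows is a hypothetical helper name. It uses the built-in "html.parser" so that it runs without lxml.

```python
from bs4 import BeautifulSoup

# Trimmed stand-in for the real page: two horse rows, three bookmaker cells each.
# "Example Horse" and its prices are invented for illustration.
SAMPLE_HTML = """
<table>
<tr class="diff-row evTabRow bc" data-bname="Sharney">
  <td class="cardnum">9</td>
  <td class="bc bs oi b" data-bk="B3" data-o="1/2" data-odig="1.5"><p>1/2</p></td>
  <td class="bc bs oi" data-bk="SK" data-o="2/5" data-odig="1.4"><p>2/5</p></td>
  <td class="np o" data-bk="EE" data-o="" data-odig="0"></td>
</tr>
<tr class="diff-row evTabRow bc" data-bname="Example Horse">
  <td class="cardnum">4</td>
  <td class="bc bs oi" data-bk="B3" data-o="5/1" data-odig="6"><p>5/1</p></td>
  <td class="np o" data-bk="SK" data-o="" data-odig="0"></td>
  <td class="bc bs oi" data-bk="EE" data-o="9/2" data-odig="5.5"><p>9/2</p></td>
</tr>
</table>
"""


def extract_rows(soup):
    """Return one list per horse: [name, price1, price2, ...]."""
    rows = []
    # Outer loop: one <tr> per horse, using the same class filter as the question.
    for tr in soup.find_all("tr", {"class": "diff-row evTabRow bc"}):
        name = tr.get("data-bname", "")
        prices = []
        # Inner loop: only the <td> cells that carry a bookmaker code (data-bk).
        for td in tr.find_all("td", attrs={"data-bk": True}):
            # Keep empty prices as "" so the columns stay aligned per bookmaker.
            prices.append(td.get("data-o", ""))
        rows.append([name] + prices)
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = extract_rows(soup)
for row in rows:
    print(" ".join(row))
```

With the real page, the soup returned by fetchSoup(url) could be passed to extract_rows in place of the sample. Swapping data-o for data-odig would give the decimal prices instead of the fractional ones.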

Then (not especially important right now) create a CSV to export.
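For the CSV step, a rough sketch using the standard-library csv module might look like the following. The header, the rows, the write_odds_csv helper name and the horse_odds.csv filename are all assumptions made up for illustration; the rows simply follow the one-horse-per-line shape described above.

```python
import csv
import io

# Hypothetical data in the "one horse per line" shape: name first, then one
# price per bookmaker. An empty string means that bookmaker quoted no price.
header = ["horse", "B3", "SK", "EE"]
rows = [
    ["Sharney", "1/2", "2/5", ""],
    ["Example Horse", "5/1", "", "9/2"],  # invented example row
]


def write_odds_csv(fileobj, header, rows):
    """Write the header, then one CSV line per horse, to an open text file."""
    writer = csv.writer(fileobj)
    writer.writerow(header)
    writer.writerows(rows)


# Demonstrate with an in-memory buffer so the sketch has no side effects.
buffer = io.StringIO()
write_odds_csv(buffer, header, rows)
csv_text = buffer.getvalue()
print(csv_text)

# To write a real file instead (the filename is an assumption):
# with open("horse_odds.csv", "w", newline="", encoding="utf-8") as f:
#     write_odds_csv(f, header, rows)
```

Passing newline="" to open() is what the csv module documentation recommends, so that the writer controls the line endings itself.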

I have tried to use a for loop, but I am struggling to get to grips with it.

If anyone could give me some guidance, I would be very grateful.

