Web scraping with BeautifulSoup in Python

Posted 2024-09-20 22:21:15


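I am still learning how to use Beautiful Soup.

I am trying to build a data frame from http://www.nfl.com/injuries?week=1 that holds each player's name, position, and game/injury status. I have been adapting code I found, but I get nothing back and am not getting anywhere. Any suggestions on what is going wrong?

Edit: after more digging, my original problem was with the tag. The data sits in <script type="text/javascript"> elements, so I changed my search to look for those. Now I am getting closer, but I don't know how to extract the relevant data. How can I pull out the {player: "...", position: "..."} records?

Here is my code and a sample of what I am trying to collect.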

import bs4
import requests as re
import pandas as pd    

alpha  = re.get('http://www.nfl.com/injuries?week=1')

beta = bs4.BeautifulSoup(alpha.text,'lxml')
#print(beta)

gama = beta.findAll('script', {'type':"text/javascript"})
print(gama)

Sample:

</script>, <script type="text/javascript">
nfl.use("node", "datatable", "datatable-sort", "mobile-panel", "overthrow", 
"overthrow-shadows", "tabview", function(Y) {
var isTeamAway      = false,
    isTeamHome      = false,
    isTeam          = false,
    homeAbbr        = 'DEN',
    awayAbbr        = 'LAC',
    gameWeek        = '1',
    teamTabHome     = Y.one('.colors-DEN-1'),
    teamTabAway     = Y.one('.colors-LAC-1'),
    datatableHome   = Y.one('.data-table-DEN-1'),
    datatableAway   = Y.one('.data-table-LAC-1');

var dataAway = [
    {player: "Inman Dontrelle ",   position: "WR", injury: "Groin", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Inman", firstName: "Dontrelle", esbId: "INM264861"  },
    {player: "McGrath Sean ",   position: "TE", injury: "Knee", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "McGrath", firstName: "Sean", esbId: "MCG631892"  },
    {player: "Attaochu Jeremiah ",   position: "DE", injury: "Hamstring", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Attaochu", firstName: "Jeremiah", esbId: "ATT290361"  },
    {player: "Boston Jayestin ",   position: "S", injury: "Calf", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Boston", firstName: "Jayestin", esbId: "BOS695248"  },
];

var dataHome = [
    {player: "Booker Devontae ",   position: "RB", injury: "Wrist", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Booker", firstName: "Devontae", esbId: "BOO019902"  },
    {player: "Talib Aqib ",   position: "CB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Talib", firstName: "Aqib", esbId: "TAL428789"  },
    {player: "Paradis Matthew ",   position: "C", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Paradis", firstName: "Matthew", esbId: "PAR002722"  },
    {player: "Kerr Zachariah ",   position: "DT", injury: "Knee", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Kerr", firstName: "Zachariah", esbId: "KER593782"  },
    {player: "Peko Kyle ",   position: "DT", injury: "Foot", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Peko", firstName: "Kyle", esbId: "PEK467819"  },
    {player: "Dixon Riley ",   position: "P", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Dixon", firstName: "Riley", esbId: "DIX641722"  },
    {player: "Crick Jared ",   position: "DE", injury: "Back", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Crick", firstName: "Jared", esbId: "CRI129618"  },
    {player: "Wolfe Derek ",   position: "DE", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Wolfe", firstName: "Derek", esbId: "WOL309455"  },
    {player: "Lynch Paxton ",   position: "QB", injury: "right Shoulder", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Lynch", firstName: "Paxton", esbId: "LYN526034"  },
    {player: "Gotsis Adam ",   position: "DE", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Gotsis", firstName: "Adam", esbId: "GOT428790"  },
    {player: "Thomas Demaryius ",   position: "WR", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Thomas", firstName: "Demaryius", esbId: "THO095855"  },
    {player: "Charles Jamaal ",   position: "RB", injury: "--", practiceStatus: "Full Participation in Practice", gameStatus: "--", lastName: "Charles", firstName: "Jamaal", esbId: "CHA561428"  },
];

1 answer
User
#1 · Posted 2024-09-20 22:21:15

You can use a regular expression (regex) like this:

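import re

import bs4
import pandas as pd
import requests

# Fetch the page and parse it.
alpha = requests.get('http://www.nfl.com/injuries?week=1')
beta = bs4.BeautifulSoup(alpha.text, 'lxml')

# The injury data lives in inline <script type="text/javascript"> blocks.
gama = beta.find_all('script', {'type': 'text/javascript'})
for g in gama:
    # re.search returns only the first "{player ..." line in each script.
    match = re.search(r'\{player(.*)', g.text)
    if match:
        print(match.group(0))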

Output:

{player: "Logan Bennie ",   position: "DT", injury: " ", practiceStatus: "Full Participation in Practice", gameStatus: " ", lastName: "Logan", firstName: "Bennie", esbId: "LOG113260"  },
{player: "Pelon Claudeson ",   position: "DE", injury: " ", practiceStatus: "Full Participation in Practice", gameStatus: " ", lastName: "Pelon", firstName: "Claudeson", esbId: "PEL747520"  },
{player: "Pasztor Austin ",   position: "T", injury: "Chest", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Pasztor", firstName: "Austin", esbId: "PAS822673"  },
{player: "Flacco Joseph ",   position: "QB", injury: " ", practiceStatus: "Full Participation in Practice", gameStatus: " ", lastName: "Flacco", firstName: "Joseph", esbId: "FLA009602"  },
{player: "Dupree Alvin ",   position: "LB", injury: "Shoulder", practiceStatus: "Did Not Participate In Practice", gameStatus: "Questionable", lastName: "Dupree", firstName: "Alvin", esbId: "DUP507860"  },
{player: "Palmer Carson ",   position: "QB", injury: " ", practiceStatus: "Full Participation in Practice", gameStatus: " ", lastName: "Palmer", firstName: "Carson", esbId: "PAL249055"  },
{player: "Bortles Robby ",   position: "QB", injury: " ", practiceStatus: "Full Participation in Practice", gameStatus: " ", lastName: "Bortles", firstName: "Robby", esbId: "BOR650964"  },
{player: "Cooper Amari ",   position: "WR", injury: " ", practiceStatus: "Full Participation in Practice", gameStatus: " ", lastName: "Cooper", firstName: "Amari", esbId: "COO487703"  },
{player: "Goode Najee ",   position: "LB", injury: " ", practiceStatus: "Full Participation in Practice", gameStatus: " ", lastName: "Goode", firstName: "Najee", esbId: "GOO217526"  },
{player: "Rogers Chester ",   position: "WR", injury: "Hamstring", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Rogers", firstName: "Chester", esbId: "ROG146742"  },
{player: "Vannett Nicholas ",   position: "TE", injury: " ", practiceStatus: "Full Participation in Practice", gameStatus: " ", lastName: "Vannett", firstName: "Nicholas", esbId: "VAN643509"  },
{player: "Norris Jared ",   position: "LB", injury: "Groin", practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", lastName: "Norris", firstName: "Jared", esbId: "NOR463803"  },
{player: "Apple Eli ",   position: "CB", injury: " ", practiceStatus: "Full Participation in Practice", gameStatus: " ", lastName: "Apple", firstName: "Eli", esbId: "APP195645"  },
{player: "Anthony Stephone ",   position: "LB", injury: "Ankle", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Anthony", firstName: "Stephone", esbId: "ANT204590"  },
{player: "Inman Dontrelle ",   position: "WR", injury: "Groin", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Inman", firstName: "Dontrelle", esbId: "INM264861"  },
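
The output above has one line per script because re.search stops at the first match. If every record is wanted, re.findall is one option. The following is a minimal sketch, not a tested scraper: SAMPLE_SCRIPT is an abbreviated copy of the script text from the question, find_player_records is a hypothetical helper name, and it assumes each record sits on one line and that no value contains a closing brace.

```python
import re

# Abbreviated copy of the inline script shown in the question.
# Assumption: the live page still embeds records in this shape.
SAMPLE_SCRIPT = '''
var dataAway = [
    {player: "Inman Dontrelle ",   position: "WR", injury: "Groin", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "Inman", firstName: "Dontrelle", esbId: "INM264861"  },

    {player: "McGrath Sean ",   position: "TE", injury: "Knee", practiceStatus: "Limited Participation in Practice", gameStatus: "Questionable", lastName: "McGrath", firstName: "Sean", esbId: "MCG631892"  },
];
'''

# Non-greedy match from "{player:" up to the first closing brace.
RECORD_RE = re.compile(r'\{player:.*?\}')


def find_player_records(script_text):
    """Return every {player: ...} object literal found in script_text."""
    return RECORD_RE.findall(script_text)


# The answer's approach: only the first record line is returned.
first_only = re.search(r'\{player(.*)', SAMPLE_SCRIPT)

# The findall variant: one string per record.
all_records = find_player_records(SAMPLE_SCRIPT)

print(first_only.group(0))
for record in all_records:
    print(record)
```

In the loop from the answer, find_player_records(g.text) could take the place of the re.search call, giving a list of record strings per script.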

Note that since I import re, I had to change the requests import, which the question had aliased as re (import requests as re).
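
To get from matched lines to the name, position, and game status the question asks for, each record still has to be split into fields. Here is a minimal sketch under stated assumptions: FIELD_RE and parse_record are hypothetical names, every value is a double-quoted string without embedded quotes, and the sample line is copied from the question.

```python
import re

# Matches  key: "value"  pairs inside one record.
# Assumption: values are double-quoted and contain no embedded quotes.
FIELD_RE = re.compile(r'(\w+):\s*"([^"]*)"')


def parse_record(record_text):
    """Turn one {player: "...", position: "...", ...} line into a dict."""
    return {key: value.strip() for key, value in FIELD_RE.findall(record_text)}


# One record line copied from the sample in the question.
line = ('{player: "Booker Devontae ",   position: "RB", injury: "Wrist", '
        'practiceStatus: "Did Not Participate In Practice", gameStatus: "Out", '
        'lastName: "Booker", firstName: "Devontae", esbId: "BOO019902"  },')

row = parse_record(line)
rows = [row]

# Keep only the columns the question asks for.
wanted = [
    {key: r[key] for key in ("player", "position", "gameStatus")}
    for r in rows
]
print(wanted)
```

Since pandas accepts a list of dicts, passing a list like wanted to pd.DataFrame should produce the data frame the question is after, with one row per player.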
