How do I use CSS selectors and BeautifulSoup to get data from a table?

Posted 2024-06-02 06:52:31


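On this page, I am trying to get specific data from an unnamed table with unnamed cells. I used Copy Selector from Inspect Element in Chrome to find the CSS selector. When I ask Python to print what that CSS selector matches, I get "'NoneType' object is not callable".

Specifically, I want the number "198" that appears in the table under "General Information", at article:nth-child(4), table:nth-child(2).

The CSS selector path is: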

"html body div#program-details section#general-info article.grid-50 table tbody tr td"

Using Copy Selector:

"#general-info > article:nth-child(4) > table:nth-child(2) > tbody > tr > td:nth-child(2)"

Most of the code just accesses the site and gets past the EULA. Skip to the bottom for the code I'm having trouble with.

import mechanize  
import requests
import urllib2
import urllib
import csv
from BeautifulSoup import BeautifulSoup

br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [("User-agent","Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13")]

sign_in = br.open('https://login.ama-assn.org/account/login')  #the login url

br.select_form(name = "go") #select the login form by its name attribute

br["username"] = "YOUR_USERNAME" #the key "username" is the variable that takes the username/email value
br["password"] = "YOUR_PASSWORD"    #the key "password" is the variable that takes the password value
logged_in = br.submit()   #submitting the login credentials
logincheck = logged_in.read()  #reading the page body that is redirected after successful login
#print (logincheck) #printing the body of the redirected url after login


# EULA agreement stuff
cont = br.open('https://freida.ama-assn.org/Freida/eula.do').read()
cont1 = br.open('https://freida.ama-assn.org/Freida/eulaSubmit.do').read()

# Begin request for page data
req = br.open('https://freida.ama-assn.org/Freida/user/programDetails.do?pgmNumber=1205712369').read()

#Da Soups!
soup = BeautifulSoup(req)
#print soup.prettify() # use this to read html.prettify()


for score in soup.select('#general-info > article:nth-child(4) > table:nth-child(2) > tbody > tr > td:nth-child(2)'):
    print score.string


1 answer

User

Answer 1 · Posted 2024-06-02 06:52:31

You need to initialize BeautifulSoup with the html5lib parser.

soup = BeautifulSoup(req, 'html5lib')
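Two things are probably going on here, and the Python 3 sketch below illustrates both with bs4 on a small piece of hypothetical markup (the real page is not reproduced). First, assuming the question's `from BeautifulSoup import BeautifulSoup` loads BeautifulSoup 3, that version has no `select` method, and an unknown attribute on a soup object is treated as a tag lookup that returns None; calling that result gives the error in the question. bs4 treats names it does not define the same way, which the sketch uses to reproduce the message. Second, the selector copied from Chrome contains `tbody`, which browsers and html5lib add to tables but `html.parser` does not.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical markup with no <tbody>, as is common in hand-written HTML.
SAMPLE_HTML = "<table><tr><td>Total positions</td><td>198</td></tr></table>"

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# 1) An attribute the soup does not define is looked up as a tag name.
#    There is no <blink> tag here, so the lookup gives None, and calling
#    None raises the TypeError quoted in the question.
error_message = None
try:
    soup.blink("td")
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# 2) html.parser keeps the markup as written: no <tbody>, so no match.
SELECTOR = "table > tbody > tr > td"
print(len(soup.select(SELECTOR)))

# html5lib (a separate package) builds the tree the way a browser does
# and inserts <tbody>, so the same selector can match the cells.
try:
    soup5 = BeautifulSoup(SAMPLE_HTML, "html5lib")
    print(len(soup5.select(SELECTOR)))
except FeatureNotFound:
    print("html5lib is not installed")
```

If html5lib is missing, `pip install html5lib` adds it; the alternative is to drop `tbody` from the selector when using another parser.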

BeautifulSoup only implements the nth-of-type pseudo-selector.

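for score in soup.select('#general-info > article:nth-of-type(4) > table:nth-of-type(2) > tbody > tr > td:nth-of-type(2)'):
    print score.string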
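One caveat when swapping the pseudo-selectors: nth-child counts every sibling element, while nth-of-type counts only siblings with the same tag name, so the index may need to change. The Python 3 sketch below shows this on hypothetical markup that stands in for the "General Information" section (the headings and the cell label are made up). Recent bs4 releases (4.7 and later) hand selectors to the Soup Sieve library, which also accepts nth-child, so the limitation applies to older versions.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; only the value 198 comes
# from the question.
SAMPLE_HTML = """
<section id="general-info">
  <h2>General Information</h2>
  <article>
    <h3>Program size</h3>
    <table>
      <tbody>
        <tr><td>Total positions</td><td>198</td></tr>
      </tbody>
    </table>
  </article>
</section>
"""

# <tbody> is written out in the markup, so html.parser is enough here.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The <article> is the 2nd child of the section (after <h2>) but the
# 1st <article>; the <table> is the 2nd child of the article (after
# <h3>) but the 1st <table>. Hence nth-of-type(1), not (2).
selector = (
    "#general-info > article:nth-of-type(1) > "
    "table:nth-of-type(1) > tbody > tr > td:nth-of-type(2)"
)
scores = [td.get_text(strip=True) for td in soup.select(selector)]
print(scores)
```

If a selector with nth-of-type returns an empty list on the real page, checking how many non-matching siblings precede each element is a reasonable first step.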
