Trying to parse the names (and PhDs) from a university faculty website. Having trouble getting only the

Posted 2021-11-29 23:07:57

from bs4 import BeautifulSoup  # HTML parsing
from urllib.request import urlopen  # Python 3 replacement for urllib2.urlopen

url = 'https://www.marshall.usc.edu/faculty/phd'  # faculty listing page
page = urlopen(url)
soup = BeautifulSoup(page.read(), "lxml")  # parse the page contents into soup

#names = soup.find_all('tr', {'class': 'odd views-row-first'})

names = soup.find_all('td', {'class': 'views-field views-field-field-faculty-name-last-value active'})  # list of <td> name cells, tags still included
#namesU = names.replaceAll("<[^>]*>","")

#names.strip('<td class="views-field views-field-field-faculty-name-last-value active">') 
#names2 = names.sub('<td class="views-field views-field-field-faculty-name-last-value active">', '')

print(names)
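
`find_all` returns a list of tag objects, which is why `print(names)` shows the `<td ...>` markup, and why `strip`, `sub`, and `replaceAll` do not work on it directly. One possible way to get just the text is to call `get_text()` on each cell. The sketch below is only an illustration: `SAMPLE_HTML` and `extract_names` are hypothetical names, and the sample markup is an assumption about what the faculty table looks like, not a copy of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the faculty table; the real page may differ.
SAMPLE_HTML = """
<table>
  <tr class="odd views-row-first">
    <td class="views-field views-field-field-faculty-name-last-value active">
      <a href="/faculty/1">Jane Doe</a>
    </td>
  </tr>
  <tr class="even">
    <td class="views-field views-field-field-faculty-name-last-value active">
      <a href="/faculty/2">John Roe</a>
    </td>
  </tr>
</table>
"""


def extract_names(html):
    """Return the text of every name cell, without the surrounding tags."""
    soup = BeautifulSoup(html, "html.parser")
    # Matching on a single class is enough; it also matches cells that
    # carry additional classes such as "active".
    cells = soup.find_all("td", class_="views-field-field-faculty-name-last-value")
    # get_text(strip=True) drops the markup and the surrounding whitespace.
    return [cell.get_text(strip=True) for cell in cells]


names = extract_names(SAMPLE_HTML)
for name in names:
    print(name)
```

With the real page, the same function could be called on `page.read()` instead of the sample string, assuming the name cells still use that class.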