I wrote some code to scrape the title URLs, but I get an error when extracting them. Could you please guide me? Here is my code:
import requests
from bs4 import BeautifulSoup
# import pandas as pd
import csv


def get_page(url):
    response = requests.get(url)
    if not response.ok:
        print('server responded:', response.status_code)
    else:
        # 1. html , 2. parser
        soup = BeautifulSoup(response.text, 'html.parser')
        return soup


def get_index_data(soup):
    try:
        titles_link = soup.find_all('a', class_="body_link_11")
    except AttributeError:
        # soup is None when get_page() could not fetch the page
        titles_link = []
    # urls = [item.get('href') for item in titles_link]
    print(titles_link)


def main():
    mainurl = "http://cgsc.cdmhost.com/cdm/search/collection/p4013coll8/" \
              "searchterm/1/field/all/mode/all/conn/and/order/nosort/page/1"
    get_index_data(get_page(mainurl))


if __name__ == '__main__':
    main()
If you want to get all the links, try the following:
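A minimal sketch of one way to collect every link, assuming the hrefs are present in the HTML the server returns (links filled in later by JavaScript would not show up). To keep it runnable without third-party packages, it swaps BeautifulSoup for the standard-library `html.parser`; the rough BeautifulSoup equivalent would be `soup.find_all('a', href=True)`. The names `LinkCollector` and `extract_links`, and the sample markup, are hypothetical and only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, optionally filtered by CSS class."""

    def __init__(self, base_url, css_class=None):
        super().__init__()
        self.base_url = base_url
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attrs = dict(attrs)
        href = attrs.get('href')
        if not href:
            return  # skip anchors that carry no href at all
        classes = (attrs.get('class') or '').split()
        if self.css_class is None or self.css_class in classes:
            # Resolve relative hrefs against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url, css_class=None):
    """Return absolute URLs for the matching <a> tags in html."""
    parser = LinkCollector(base_url, css_class)
    parser.feed(html)
    return parser.links


# Hypothetical markup for illustration; the real page may look different.
sample_html = '''
<a class="body_link_11" href="/cdm/ref/collection/p4013coll8/id/1">First title</a>
<a href="http://example.com/about">About</a>
<a name="top">No href</a>
'''
page_url = 'http://cgsc.cdmhost.com/cdm/search'
all_links = extract_links(sample_html, page_url)
title_links = extract_links(sample_html, page_url, css_class='body_link_11')
print(all_links)
print(title_links)
```

With the question's code, `response.text` from `get_page` could be passed in place of `sample_html`. If the filtered list comes back empty while the unfiltered one does not, the class name is probably not what the page actually uses.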
Output: