Python html parser not returning link

import bs4
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen
import re
#import xml.etree.ElementTree as ET

rss_url="https://news.google.com/news/rss/search/section/q/australia/australia?hl=en-AU&gl=AU&ned=au"
Client=urlopen(rss_url)
xml_page=Client.read()
Client.close()

soup_page=soup(xml_page,"html.parser")
#soup_page=ET.parse(xml_page)
news_list=soup_page.findAll("item")

# Print news title, url and publish date
for news in news_list:
    #text=news.text
    title=news.title.text
    link=news.link.text
    pubdate=news.pubDate.text
    description=news.description.text
    publisher = re.findall('<font color="#6f6f6f">(.*?)</font>', description)
    article_link=link
    article_info=[title,publisher,link,pubdate]
    print(article_info)

1 answer

User

#1 · Posted 2024-05-01 21:48:58

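On the pubDate and link fields:

The pubDate field can be retrieved using an all-lowercase name, because the html.parser backend lowercases every tag name it reads: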

pubdate=news.pubdate.text
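To see why the lowercase name is needed, here is a minimal sketch using only the standard-library `html.parser` module, which BeautifulSoup's `"html.parser"` backend is built on. The `TagRecorder` class and the sample RSS fragment are hypothetical, made up for illustration; the point is that the parser reports `<pubDate>` as `pubdate`, so a lookup for `pubDate` finds nothing.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Hypothetical helper: collects start-tag names exactly as html.parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes the tag name already converted to lower case
        self.tags.append(tag)


# Hypothetical RSS fragment, for illustration only
sample = "<item><title>T</title><pubDate>Mon, 28 May 2018 22:37:07 GMT</pubDate></item>"

recorder = TagRecorder()
recorder.feed(sample)
recorder.close()
print(recorder.tags)  # ['item', 'title', 'pubdate']
```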

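The link field was captured correctly by the earlier BeautifulSoup version 4.5.3, but not by the current version 4.6.0; with 4.6.0 you get the blank entries you are seeing. To install 4.5.3: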

$ pip3 uninstall beautifulsoup4
$ pip3 install 'beautifulsoup4==4.5.3'
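If pinning an older BeautifulSoup is not an option, one alternative (not what this answer proposes) is to parse the feed as XML with the standard-library `xml.etree.ElementTree` module that the question already has commented out. XML keeps tag names case-sensitive and treats `<link>` as an ordinary element with text content, whereas in HTML `<link>` is an empty element that cannot hold text. This is a sketch under assumptions: the inline `xml_page` bytes are a hypothetical stand-in for `Client.read()`. Note also that `ET.parse()` expects a file name or file object, so bytes that have already been read go to `ET.fromstring()` instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS bytes standing in for urlopen(rss_url).read()
xml_page = b"""<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel>
<item>
  <title>Sample headline</title>
  <link>https://example.com/story</link>
  <pubDate>Mon, 28 May 2018 22:37:07 GMT</pubDate>
</item>
</channel></rss>"""

# fromstring() parses bytes/str directly; ET.parse() wants a file name or file object
root = ET.fromstring(xml_page)

articles = []
for item in root.iter("item"):
    # XML is case-sensitive, so the tag is still "pubDate",
    # and <link> keeps the URL as its text
    articles.append([
        item.findtext("title"),
        item.findtext("link"),
        item.findtext("pubDate"),
    ])

for article in articles:
    print(article)
```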

Here is Beautiful Soup's release history: 4.5.3 was released on January 2, 2017, and 4.6.0 on May 7, 2017.

https://pypi.org/project/beautifulsoup4/#history

I'm using Python 3.6.0 on macOS.

Here are the first two lines of output, updated to show all the fields:

['Coalition party room split over national energy guarantee – politics live', ['The Guardian'], 'https://www.theguardian.com/australia-news/live/2018/may/29/nationals-barnaby-joyce-superannuation-coalition-banking-royal-commission-tax-politics-live', 'Mon, 28 May 2018 22:37:07 GMT']

['Residential rental agreements in Australia falling behind rest of the world: tenants union', ['ABC Online'], 'http://www.abc.net.au/news/2018-05-29/residential-rental-agreements-in-australia-need-updating/9809364', 'Mon, 28 May 2018 19:39:43 GMT']
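In the output above, the publisher appears as a one-item list such as `['The Guardian']` because `re.findall` always returns a list of matches. A minimal sketch, using a made-up `description` string shaped like the one the question's regex targets, contrasts it with `re.search`, which returns only the first match (or `None`):

```python
import re

# Hypothetical description snippet in the shape the question's regex expects
description = 'Some summary text&nbsp;&nbsp;<font color="#6f6f6f">The Guardian</font>'

pattern = '<font color="#6f6f6f">(.*?)</font>'

# re.findall returns a list containing the captured group for every match
publishers = re.findall(pattern, description)
print(publishers)  # ['The Guardian']

# re.search returns only the first match (or None), so the publisher
# can be stored as a plain string instead of a one-item list
match = re.search(pattern, description)
publisher = match.group(1) if match else None
print(publisher)  # The Guardian
```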
