Extracting the title attribute with BeautifulSoup

Published 2024-04-29 05:33:17


I have this:

date = chunk.find_all('a', title=True, class_='tweet-timestamp js-permalink     js-nav js-tooltip')

It returns:

<a class="tweet-timestamp js-permalink js-nav js-tooltip" href="/15colleen/status/537395294133313536" title="3:59 PM - 25 Nov 2014"><span class="_timestamp js-short-timestamp " data-aria-label-part="last" data-long-form="true" data-time="1416959997" data-time-ms="1416959997000">Nov 25</span></a>

Obviously get_text() returns Nov 25, but I want to extract the string 3:59 PM - 25 Nov 2014, which is stored in the title attribute.


2 answers

find_all returns a list, so index into the list first, then use the 'title' key to get the value of the title attribute.

>>> from bs4 import BeautifulSoup
>>> s = '<a class="tweet-timestamp js-permalink js-nav js-tooltip" href="/15colleen/status/537395294133313536" title="3:59 PM - 25 Nov 2014"><span class="_timestamp js-short-timestamp " data-aria-label-part="last" data-long-form="true" data-time="1416959997" data-time-ms="1416959997000">Nov 25</span></a>'
>>> soup = BeautifulSoup(s, 'html.parser')
>>> date = soup.find_all('a', title=True, class_='tweet-timestamp js-permalink js-nav js-tooltip')
>>> date
[<a class="tweet-timestamp js-permalink js-nav js-tooltip" href="/15colleen/status/537395294133313536" title="3:59 PM - 25 Nov 2014"><span class="_timestamp js-short-timestamp " data-aria-label-part="last" data-long-form="true" data-time="1416959997" data-time-ms="1416959997000">Nov 25</span></a>]
>>> date[0]['title']
'3:59 PM - 25 Nov 2014'
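
Matching on the full class string is brittle: Beautiful Soup compares it against the exact attribute value, so a different class order (and, in recent versions as far as I can tell, extra whitespace) makes the search come back empty. Below is a minimal sketch that matches on a single class name instead and reads the title from every hit. The markup is a shortened copy of the one in the question, and the choice of `tweet-timestamp` as the identifying class is an assumption about the page.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question.
html = (
    '<a class="tweet-timestamp js-permalink js-nav js-tooltip" '
    'href="/15colleen/status/537395294133313536" '
    'title="3:59 PM - 25 Nov 2014">'
    '<span class="_timestamp js-short-timestamp " data-time="1416959997">'
    'Nov 25</span></a>'
)

soup = BeautifulSoup(html, "html.parser")

# One class name is enough to match a multi-class tag;
# title=True keeps only tags that actually carry a title attribute.
anchors = soup.find_all("a", class_="tweet-timestamp", title=True)

# Read the attribute from every match rather than indexing a single element.
titles = [a["title"] for a in anchors]
print(titles)  # ['3:59 PM - 25 Nov 2014']
```

Because `title=True` already filters out anchors without the attribute, `a["title"]` cannot raise a KeyError here.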

You only need .find, then index the result with ["title"]:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # html holds the markup from the question
print(soup.find("a",attrs={"class":"tweet-timestamp js-permalink js-nav js-tooltip"})["title"])

3:59 PM - 25 Nov 2014
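
One caveat with the one-liner: .find returns None when nothing matches, and indexing None with ["title"] raises a TypeError. Here is a hedged sketch of the same lookup written with a CSS selector and a guard. The markup is a shortened copy of the one in the question, and `no-such-class` is a made-up class name used only to show the no-match case.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question.
html = (
    '<a class="tweet-timestamp js-permalink js-nav js-tooltip" '
    'href="/15colleen/status/537395294133313536" '
    'title="3:59 PM - 25 Nov 2014">'
    '<span class="_timestamp js-short-timestamp " data-time="1416959997">'
    'Nov 25</span></a>'
)

soup = BeautifulSoup(html, "html.parser")

# CSS selector: an <a> with class tweet-timestamp that has a title attribute.
link = soup.select_one("a.tweet-timestamp[title]")

# select_one returns None when nothing matches, so guard before reading.
title = link.get("title") if link is not None else None
print(title)  # 3:59 PM - 25 Nov 2014

# Hypothetical class name that is not in the markup: no match, no exception.
missing = soup.select_one("a.no-such-class[title]")
print(missing)  # None
```

The selector does not depend on the order of the class names, so it keeps working if the page reorders them.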
