How do I extract the meta description from a URL using Python?

<title>Book a Virgin Australia Flight | Virgin Australia </title> <meta name="keywords" content="" /> <meta name="description" content="Search for and book Virgin Australia and partner flights to Australian and international destinations." />

2 Answers

User

Answer #1 · edited 2024-05-15 23:10:53

Take a look at BeautifulSoup as a solution.

For the question above, you can extract the description with the following code:

import requests
from bs4 import BeautifulSoup

url = 'http://www.virginaustralia.com/au/en/bookings/flights/make-a-booking/'
response = requests.get(url)
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(response.text, 'html.parser')

metas = soup.find_all('meta')

# Collect the content of every <meta name="description"> tag (Python 3 print)
print([meta.attrs['content'] for meta in metas if meta.attrs.get('name') == 'description'])

Output:

['Search for and book Virgin Australia and partner flights to Australian and international destinations.']
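A slightly shorter variant is to ask BeautifulSoup for the one tag directly instead of filtering every meta tag. The sketch below is only an illustration: the helper name `extract_meta_description` and the variable `sample_html` are made up here, and the HTML is the snippet from the question, so no network request is needed.

```python
from bs4 import BeautifulSoup


def extract_meta_description(html):
    """Return the content of <meta name="description">, or None if absent.

    Hypothetical helper, not part of BeautifulSoup itself.
    """
    soup = BeautifulSoup(html, "html.parser")
    # "name" is also a find() parameter (the tag name), so the HTML
    # attribute has to be passed through the attrs dict.
    tag = soup.find("meta", attrs={"name": "description"})
    if tag is None:
        return None
    return tag.get("content")


# Sample markup copied from the question.
sample_html = (
    '<title>Book a Virgin Australia Flight | Virgin Australia </title> '
    '<meta name="keywords" content="" /> '
    '<meta name="description" content="Search for and book Virgin Australia '
    'and partner flights to Australian and international destinations." />'
)

print(extract_meta_description(sample_html))
```

Returning None when the tag is missing avoids the IndexError or KeyError that can occur on pages without a description.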

User

Answer #2 · edited 2024-05-15 23:10:53

Are you familiar with XPath for HTML? Using the lxml library with XPath is a fast way to extract HTML elements.

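import lxml.html

# html_content is assumed to hold the page's HTML source as a string
doc = lxml.html.document_fromstring(html_content)
title_element = doc.xpath("//title")
website_title = title_element[0].text_content().strip()
# The description is stored in the content attribute of <meta name="description">
meta_description_element = doc.xpath("//meta[@name='description']")
website_meta_description = meta_description_element[0].get("content").strip()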
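If installing a third-party parser is not an option, the same lookup can be sketched with the standard library's `html.parser` module. This is a swapped-in technique, not lxml or XPath, and the class and function names (`MetaDescriptionParser`, `meta_description_from_html`) are invented for this illustration.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Records the content of the first <meta name="description"> tag seen."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here by default.
        if tag != "meta" or self.description is not None:
            return
        attr_map = dict(attrs)
        # Attribute values can be None (valueless attributes), hence the "or".
        if (attr_map.get("name") or "").lower() == "description":
            self.description = attr_map.get("content")


def meta_description_from_html(html):
    """Return the meta description found in an HTML string, or None."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return parser.description


print(meta_description_from_html('<meta name="description" content="hello" />'))
```

The parser keeps only the first match and leaves `description` as None when the page has no such tag.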
