Getting the meta description from an external website

3 votes

2 answers

3270 views

Asked 2025-04-17 21:45

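I need to extract the meta description of an external website. I have searched, and a simple answer may already exist, but I have not managed to apply it to my code.

At the moment I can get the site's title like this: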

import urllib.request
from bs4 import BeautifulSoup

external_sites_html = urllib.request.urlopen(url)
soup = BeautifulSoup(external_sites_html, "html.parser")
title = soup.title.string

The description is a bit more complicated, though. It can appear in any of the following forms:

<meta name="og:description" content="blabla">
<meta property="og:description" content="blabla">
<meta name="description" content="blabla">

So what I want is to extract whichever of these descriptions appears first in the HTML. It will then be added to the database like this:

entry.description = extracted_desc
entry.save()

If no description can be found at all, only the title should be saved.

Tags: html-parsing, page-title, database-storage, metadata, web-scraping, meta-description-extraction

2 Answers

You can do it like this:

# List the selectors in order of preference
description_selectors = [
    {"name": "description"},
    {"name": "og:description"},
    {"property": "og:description"}
]

for selector in description_selectors:
    description_tag = soup.find(attrs=selector)
    if description_tag and description_tag.get('content'):
        description = description_tag['content']
        break
else:
    # No selector matched a tag with a non-empty content attribute
    description = ''

Note that the else here belongs to the for loop, not to the if condition: it runs only when the loop finishes without hitting break.
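To see the for/else behaviour in isolation, here is a minimal sketch that needs no BeautifulSoup. The function name `pick_description` and the plain list of candidate strings are assumptions made for illustration only; they stand in for the `content` values of the tags found above.

```python
def pick_description(candidates):
    """Return the first non-empty candidate, or '' if none is usable."""
    for candidate in candidates:
        if candidate:
            description = candidate
            break
    else:
        # Reached only when the loop ended without executing `break`,
        # i.e. no candidate was usable (or the list was empty).
        description = ''
    return description


print(pick_description([None, '', 'blabla']))  # blabla
print(repr(pick_description([None, ''])))      # ''
```

If the `else` were attached to the `if` instead, it would overwrite the description on every candidate that failed the check, which is not what is wanted here.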

Answered 2025-04-17 by Python大师


You can use the find method on the soup object to look up a tag with specific attributes. Here we need the meta tag whose name attribute is og:description or description, or whose property attribute is og:description.

# First get the meta description tag, trying each variant in turn
description = (
    soup.find('meta', attrs={'name': 'og:description'})
    or soup.find('meta', attrs={'property': 'og:description'})
    or soup.find('meta', attrs={'name': 'description'})
)

# If a description meta tag was found, save its content attribute to the db entry
if description:
    entry.description = description.get('content')
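Both approaches above try the variants in a fixed order of preference, whereas the question asks for whichever description comes first in the HTML. One possible way to get document order is sketched below. It is only a sketch: it uses `html.parser` from the standard library rather than BeautifulSoup so that it runs without extra packages, and the names `DESCRIPTION_KEYS`, `FirstDescriptionParser` and `first_description` are hypothetical helpers, not part of any library.

```python
from html.parser import HTMLParser

# The three (attribute, value) pairs from the question that mark a description.
DESCRIPTION_KEYS = {
    ("name", "og:description"),
    ("property", "og:description"),
    ("name", "description"),
}


class FirstDescriptionParser(HTMLParser):
    """Hypothetical helper: remembers the first usable description meta tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        # Start tags are reported in document order, so the first match wins.
        if tag != "meta" or self.description is not None:
            return
        attr_map = dict(attrs)  # attrs is a list of (name, value) pairs
        is_description = any(attr_map.get(k) == v for k, v in DESCRIPTION_KEYS)
        if is_description and attr_map.get("content"):
            self.description = attr_map["content"]


def first_description(html):
    """Return the first description found in the HTML text, or None."""
    parser = FirstDescriptionParser()
    parser.feed(html)
    return parser.description


sample = (
    '<html><head><title>T</title>'
    '<meta property="og:description" content="from og">'
    '<meta name="description" content="plain">'
    '</head></html>'
)
print(first_description(sample))  # from og
```

When `first_description` returns None, the caller can skip setting `entry.description` and save only the title, as the question requires.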

Answered 2025-04-17 by Python大师

