Python HTML strip: keep only the hyperlink text

Posted 2024-03-28 08:49:33


So I'm trying to parse this:

<a href="/define.php?term=dubstep&defid=5175360">dubstep</a> the music that is created from transformers having s$#

so that after parsing it reads like this:

dubstep - the music that is created from transformers having S$#

I want to extract the text dubstep from this HTML hyperlink.

How can I do this?

I read the answers here: How to remove tags from a string in python using regular expressions? (NOT in HTML)

but I get:

<class 'NameError'>, NameError("name 're' is not defined",), <traceback object at 0x036A41E8>)

3 answers

Use this:

from bs4 import BeautifulSoup

html = '<a href="/define.php?term=dubstep&defid=5175360">dubstep</a> the music that is created from transformers having s$#'
soup = BeautifulSoup(html, "html.parser")
# get_text() returns all of the text with the tags removed
print(soup.get_text())

Why not use BeautifulSoup?

In [44]: from bs4 import BeautifulSoup

In [45]: soup = BeautifulSoup('''<a href="/define.php?term=dubstep&defid=5175360">dubstep</a> the music that is created from transformers having s$#''', "html.parser")

In [46]: soup.find('a').text
Out[46]: u'dubstep'

Edit:

Or, if you just want all of the text:

In [48]: soup.text 
Out[48]: u'dubstep the music that is created from transformers having s$#'
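For comparison, here is a minimal sketch that does the same job with only the standard library's html.parser.HTMLParser instead of BeautifulSoup (a swapped-in technique, not what this answer used). It collects the text inside <a> tags separately from the text outside them, so it can print dubstep on its own or rebuild the "dubstep - ..." form from the question. The class name LinkTextParser is hypothetical, and the sketch assumes one link with no nested tags inside it.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Hypothetical helper: split text inside <a> tags from text outside them."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.link_texts = []   # one entry per <a>...</a>
        self.other_text = []   # text chunks found outside any <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self.link_texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.link_texts[-1] += data
        else:
            self.other_text.append(data)


html = ('<a href="/define.php?term=dubstep&defid=5175360">dubstep</a>'
        ' the music that is created from transformers having s$#')

parser = LinkTextParser()
parser.feed(html)
parser.close()  # flush any buffered trailing text

term = parser.link_texts[0]
definition = "".join(parser.other_text).strip()
line = f"{term} - {definition}"
print(term)  # dubstep
print(line)  # dubstep - the music that is created from transformers having s$#
```

Calling close() matters here: the parser may buffer trailing text until it is told the input has ended.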

Well,

 NameError("name 're' is not defined",),

means you forgot to import re at the start, but that's just a guess.

Also, since you only want the word between the <a></a> tags, you need a regexp along these lines:

 .*<a .*>([^<]*)</a>.*
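A minimal sketch of that regexp in use, including the import re whose absence causes the NameError above. It assumes the input is the single-line snippet from the question with exactly one link; on larger or messier HTML a parser is the safer choice.

```python
import re  # without this line, any call to re.* raises NameError: name 're' is not defined

html = ('<a href="/define.php?term=dubstep&defid=5175360">dubstep</a>'
        ' the music that is created from transformers having s$#')

# The pattern from this answer: group 1 captures the non-'<' text between <a ...> and </a>.
match = re.match(r'.*<a .*>([^<]*)</a>.*', html)
link_text = match.group(1) if match else None
print(link_text)  # dubstep

# Removing every tag instead keeps the link text inline with the rest of the sentence.
stripped = re.sub(r'<[^>]+>', '', html)
print(stripped)  # dubstep the music that is created from transformers having s$#
```

The greedy .* before > backtracks until ([^<]*)</a> can match, so with one link the group ends up holding just the link text.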
