How do I remove tags from my BeautifulSoup results (e.g. Address = [a, b, c, d, r, ...])?

Posted 2024-06-06 19:42:44


from bs4 import BeautifulSoup as bs
import requests

url = 'https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm'
url_get = requests.get(url)
soup = bs(url_get.content, 'html.parser')

address = soup.find_all('p', class_="nospc")
address
[<p class="nospc">Address: Nobels gate 32, N-0268 Oslo</p>,
<p class="nospc">Address: Akershus Festning, 0015 Oslo</p>,
<p class="nospc">Address: Frederiks gate 2, 0164 Oslo</p>,
<p class="nospc">Address: Universitetsgata 13, Oslo</p>,
<p class="nospc">Address: Tøyengata 53, 0578 Oslo</p>,
<p class="nospc">Address: Bellevue, Oslo</p>,
<p class="nospc">Address: Frederiks gate 2, 0164 Oslo</p>,
<p class="nospc">Address: Bygdøynesveien 39, 0286 Oslo</p>,
<p class="nospc">Address: Kongeveien 5, 0787 Oslo</p>,
<p class="nospc">Address: Karl Johansgt. 11, 0154 Oslo</p>,
<p class="nospc">Address: Rådhuset, 0037 Oslo</p>,
<p class="nospc">Address: Bryggegata 9, 0120 Oslo</p>,
<p class="nospc">Address: Sars gate 1, 0562 Oslo</p>,
<p class="nospc">Address: Kirsten Flagstads Plass 1, 0150 Oslo</p>]

I want something like this:

Address = ['Nobels gate 32, N-0268 Oslo', 'Akershus Festning, 0015 Oslo' ...]

3 Answers

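You can use the text attribute to get the content inside each tag. Note that the result still includes the "Address:" label: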

address=[x.text for x in soup.find_all('p', class_="nospc")]
print(address)
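
To try this without hitting the live site, here is a minimal sketch that runs the same idea against an inline HTML snippet. The SAMPLE_HTML string is a hypothetical stand-in for the page's markup, not its actual content, and get_text(strip=True) is used in place of .text only to trim surrounding whitespace.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mimics the page's markup, so no network request is needed.
SAMPLE_HTML = """
<div>
  <p class="nospc">Address: Nobels gate 32, N-0268 Oslo</p>
  <p class="nospc">Address: Akershus Festning, 0015 Oslo</p>
  <p class="other">Not an address</p>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns a list of Tag objects; get_text() is called on each Tag,
# not on the list itself. strip=True trims leading and trailing whitespace.
texts = [p.get_text(strip=True) for p in soup.find_all("p", class_="nospc")]
print(texts)
# ['Address: Nobels gate 32, N-0268 Oslo', 'Address: Akershus Festning, 0015 Oslo']
```

The tags are gone, but each string still starts with "Address:", so one more step is needed to match the format asked for in the question.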

Try the following code. It splits off the "Address:" label and keeps only the address part.

from bs4 import BeautifulSoup
import requests

url = 'https://www.planetware.com/tourist-attractions-/oslo-n-osl-oslo.htm'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')

address = soup.find_all('p', class_="nospc")
addrlist = []
for addr in address:
    # Split on the first colon only and keep the part after "Address:"
    addrlist.append(addr.text.split(':', 1)[1].strip())

print(addrlist)

Output:

['Nobels gate 32, N-0268 Oslo', 'Akershus Festning, 0015 Oslo', 'Frederiks gate 2, 0164 Oslo', 'Universitetsgata 13, Oslo', 'Tøyengata 53, 0578 Oslo', 'Bellevue, Oslo', 'Frederiks gate 2, 0164 Oslo', 'Bygdøynesveien 39, 0286 Oslo', 'Kongeveien 5, 0787 Oslo', 'Karl Johansgt. 11, 0154 Oslo', 'Rådhuset, 0037 Oslo', 'Bryggegata 9, 0120 Oslo', 'Sars gate 1, 0562 Oslo', 'Kirsten Flagstads Plass 1, 0150 Oslo']
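
Indexing the result of split assumes every paragraph contains a colon; a string without one raises IndexError. As a hedged alternative, the sketch below uses str.partition, which never raises. The strip_label helper and the sample strings are hypothetical and only illustrate the idea.

```python
def strip_label(text: str, label: str = "Address") -> str:
    """Return text without a leading 'label:' prefix, if one is present."""
    # partition splits at the first colon only and always returns three parts.
    head, sep, tail = text.partition(":")
    if sep and head.strip() == label:
        return tail.strip()
    # No matching label: return the text unchanged apart from whitespace.
    return text.strip()


# Hypothetical inputs: a normal entry, one with a colon inside the address,
# and one with no label at all.
raw = [
    "Address:  Nobels gate 32, N-0268 Oslo",
    "Address: Gate 1: entrance B, Oslo",
    "Bellevue, Oslo",
]
cleaned = [strip_label(t) for t in raw]
print(cleaned)
# ['Nobels gate 32, N-0268 Oslo', 'Gate 1: entrance B, Oslo', 'Bellevue, Oslo']
```

With the scraped results, this would be applied as strip_label(addr.text) for each element returned by find_all.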

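The .text attribute does this. However, you can't call it on the list returned by find_all; you have to iterate over the list and read it from each element: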

Address_text=[]

for a in address:
    Address_text.append(a.text)


In[14]: Address_text
Out[14]: 
['Address:  Nobels gate 32, N-0268 Oslo',
 'Address:  Akershus Festning, 0015 Oslo',
 'Address:  Frederiks gate 2, 0164 Oslo',
 'Address:  Universitetsgata 13, Oslo',
 'Address:  Tøyengata 53, 0578 Oslo',
 'Address:  Bellevue, Oslo',
 'Address:  Frederiks gate 2, 0164 Oslo',
 'Address:  Bygdøynesveien 39, 0286 Oslo',
 'Address:  Kongeveien 5, 0787 Oslo',
 'Address:  Karl Johansgt. 11, 0154 Oslo',
 'Address:  Rådhuset, 0037 Oslo',
 'Address:  Bryggegata 9, 0120 Oslo',
 'Address:  Sars gate 1, 0562 Oslo',
 'Address:  Kirsten Flagstads Plass 1, 0150 Oslo']
