How to stop article links printing twice with BeautifulSoup

Posted 2024-04-20 03:39:19


I am trying to print the link for each article from this site, but the article links are printed twice, and only 5 of them appear.

I tried expanding my range to (1, 20), which printed all ten article links, but each one was printed twice.

from bs4 import BeautifulSoup
from urllib.request import urlopen

url = urlopen("https://www.politico.com/newsletters/playbook/archive")
target = 'C:/Users/k/Politico/pol.csv'

content = url.read()

soup = BeautifulSoup(content,"lxml")

for article in range(1, 10):
    # Intended to print each article's link and save it to the csv file
    print(soup('article')[article]('a', {'target': '_top'}))

I expect the output to be 10 article links with no duplicates.


Tags: csv, from, import, url, target, website, link, article
3 answers

You can use the CSS selector .front-list h3 > a:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.politico.com/newsletters/playbook/archive#')
soup = bs(r.content, 'lxml')
links = [link['href'] for link in soup.select('.front-list h3 > a')]
print(links)
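
The question also wants the links written to a CSV file (the target path in the asker's code), which this answer only prints. A minimal sketch of that extra step, assuming the links list from the snippet above and a single, hypothetical 'link' column:

import csv

# Hypothetical follow-up: write one link per row to the path from the question.
target = 'C:/Users/k/Politico/pol.csv'

with open(target, 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['link'])   # assumed header, not in the original
    for link in links:          # links comes from the snippet above
        writer.writerow([link])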

Try printing your soup and you will see that on each iteration there are 2 identical links, which is why each one is printed twice.

Use a set to collect all of the str(data):

a = set()
for article in range(1, 20):
    a.add(str(soup('article')[article]('a', {'target': '_top'})))

print(a)
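
One caveat with the set above: each entry it stores is the string of an article's whole anchor list, so the two identical anchors still sit side by side inside a single entry. A small variation (a sketch, not part of the original answer) is to add the individual hrefs to the set instead, which collapses the duplicates:

# Collect unique hrefs rather than stringified tag lists, so the two
# identical anchors inside one article collapse to a single entry.
unique_links = set()
for article in soup('article'):
    for a_tag in article('a', {'target': '_top'}):
        unique_links.add(a_tag.get('href'))

print(unique_links)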

You can use the approach below; it works like a charm.

from bs4 import BeautifulSoup
from urllib.request import urlopen

url = urlopen("https://www.politico.com/newsletters/playbook/archive")
target = 'C:/Users/k/Politico/pol.csv'
content = url.read()
soup = BeautifulSoup(content, "lxml")

articles = soup.find_all('article', attrs={'class': 'story-frag format-l'})

for article in articles:
    link = article.find('a', attrs={'target': '_top'}).get('href')
    print(link)

(The expected output was shown as a screenshot in the original answer.)
