我正试图用BeautifulSoup从USPTO的网页上编译专利文件
df['link']
urls=df['link'].to_numpy()
urls
for i in urls:
page = requests.get(i)
## storing the content of the page in a variable
txt = page.text
## creating BeautifulSoup object
soup = bs4.BeautifulSoup(txt, 'html.parser')
soup
但是,它只打印其中一个URL,而不是全部5个链接。我需要所有5个链接刮为文本
如有任何建议,我们将不胜感激。干杯
我需要刮取的链接
array(['http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=1&f=G&l=50&co1=AND&d=PTXT&s1=g06n.CPCL.&OS=CPCL/g06n&RS=CPCL/g06n',
'http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=2&f=G&l=50&co1=AND&d=PTXT&s1=g06n.CPCL.&OS=CPCL/g06n&RS=CPCL/g06n',
'http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=3&f=G&l=50&co1=AND&d=PTXT&s1=g06n.CPCL.&OS=CPCL/g06n&RS=CPCL/g06n',
'http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=4&f=G&l=50&co1=AND&d=PTXT&s1=g06n.CPCL.&OS=CPCL/g06n&RS=CPCL/g06n',
'http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=5&f=G&l=50&co1=AND&d=PTXT&s1=g06n.CPCL.&OS=CPCL/g06n&RS=CPCL/g06n'],
dtype=object)
输出:View Online
相关问题 更多 >
编程相关推荐