Requests in Python returns an error, while opening the link manually works

Posted 2024-04-26 04:23:13


import requests

a = 'http://tmsearch.uspto.gov/bin/showfield?f=toc&state=4809%3Ak1aweo.1.1&p_search=searchstr&BackReference=&p_L=100&p_plural=no&p_s_PARA1={}&p_tagrepl%7E%3A=PARA1%24MI&expr=PARA1+or+PARA2&p_s_PARA2=&p_tagrepl%7E%3A=PARA2%24ALL&a_default=search&f=toc&state=4809%3Ak1aweo.1.1&a_search=Submit+Query'
a = a.format('coca-cola')

b = requests.get(a)

print(b.text)
print(b.url)

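If you copy the printed URL and paste it into a browser, the site opens without any problem, but when I fetch it with requests.get I get some kind of token error. Is there anything I can do?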

With requests.get I get the URL back, but none of the data I see when opening the link manually. The response says: <html><head><TITLE>TESS -- Error</TITLE></head><body>
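
For anyone reproducing this, one way to tell the error page apart from a real result is to look at the page title. The snippet below is only a rough sketch: the helper name `looks_like_tess_error` is made up for illustration, and it assumes the error page can be recognised by the word "Error" in its title, as in the response quoted above.

```python
import re


def looks_like_tess_error(html):
    """Return True if the page title contains the word 'error' (assumed marker)."""
    # Case-insensitive, because the quoted response uses an upper-case <TITLE> tag.
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return bool(match) and "error" in match.group(1).lower()


# The fragment quoted in the question.
sample = "<html><head><TITLE>TESS -- Error</TITLE></head><body>"
print(looks_like_tess_error(sample))
```

Calling it on `b.text` before doing anything else with the response would show whether the site sent back the error page.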


1 answer
User
#1 · Posted 2024-04-26 04:23:13

First of all, make sure you comply with the site's Terms of Use and usage policies.

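This is a bit more complicated than it looks. You need to maintain a certain state throughout the whole web scraping session. You also need an HTML parser, such as BeautifulSoup: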

from urllib.parse import parse_qs, urljoin

import requests
from bs4 import BeautifulSoup


SEARCH_TERM = 'coca-cola'

with requests.Session() as session:
    session.headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'}

    # get the current search state
    response = session.get("http://tmsearch.uspto.gov/")
    soup = BeautifulSoup(response.content, "html.parser")
    link = soup.find("a", text="Basic Word Mark Search (New User)")["href"]

    session.get(urljoin(response.url, link))

    # parse only the query string, so the path cannot end up in a parameter name
    state = parse_qs(link.partition('?')[2])['state'][0]

    # perform a search
    response = session.post("http://tmsearch.uspto.gov/bin/showfield", data={
        'f': 'toc',
        'state': state,
        'p_search': 'search',
        'p_s_All': '',
        'p_s_ALL': SEARCH_TERM + '[COMB]',
        'a_default': 'search',
        'a_search': 'Submit'
    })

    # print search results
    soup = BeautifulSoup(response.content, "html.parser")

    print(soup.find("font", color="blue").get_text())

    table = soup.find("th", text="Serial Number").find_parent("table")
    for row in table('tr')[1:]:
        print(row('td')[1].get_text())

For demonstration purposes, it prints all the serial number values from the first page of search results.
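
The `state` value is the piece that ties a search to the current session, so it can help to see how it is pulled out of a link on its own. Below is a minimal sketch that runs without any network access. The helper `extract_state` and the sample `href` are hypothetical (the path is invented, and the state value is simply the one from the URL in the question); the real links on the site may look different.

```python
from urllib.parse import parse_qs, urlsplit


def extract_state(href):
    """Return the `state` query parameter of a link, or None if it is absent."""
    # Split the query string off first so the path never leaks into a key name.
    query = urlsplit(href).query
    values = parse_qs(query).get("state")
    return values[0] if values else None


# Hypothetical link, using the state value from the question's URL.
href = "/bin/gate.exe?f=searchss&state=4809%3Ak1aweo.1.1"
print(extract_state(href))                     # 4809:k1aweo.1.1
print(extract_state("/bin/gate.exe?f=login"))  # None
```

Returning None instead of raising makes it easy to stop early when the page layout changes and the expected link no longer carries a state token.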
