Response page does not contain the posted data when scraping with Beautiful Soup and Python

import bs4 as bs
import urllib.request
import requests
import webbrowser
import urllib.parse

url_for_parse = "http://demo.testfire.net/feedback.aspx"

#PARSE THE WEBPAGE
sauce = urllib.request.urlopen(url_for_parse).read()
soup = bs.BeautifulSoup(sauce,"html.parser")

#GET FORM ATTRIBUTES
form = soup.find('form')
action_value = form.get('action')
method_value = form.get('method')
id_value = form.get('id')

#POST DATA
payload = {'txtSearch':'HELLOWORLD'}
r = requests.post(url_for_parse, payload)

#PARSING ACTION VALUE WITH URL
url2 = urllib.parse.urljoin(url_for_parse,action_value)

#READ RESPONSE
response = urllib.request.urlopen(url2)
page_source = response.read()
with open("results.html", "w") as f:
    f.write(str(page_source))

searchfile = open("results.html", "r")
for line in searchfile:
    if "HELLOWORLD" in line:
        print ("STRING FOUND")
    else:
        print ("STRING NOT FOUND")
searchfile.close()

3 Answers

User

Answer 1 · edited 2024-04-25 07:42:15

It is clear what you are doing:

1) Posting some data to a URL.
2) Scraping the same URL again with a separate request.
3) Checking the scraped page for some string.

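What you should be doing instead is posting the data and then searching the response to that same request, rather than fetching the URL a second time.

To do that, write the response body (r.content) to a local file and search it for the string.

Modify the code as follows: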

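 payload = {'txtSearch': 'HELLOWORLD'}
 url2 = urllib.parse.urljoin(url_for_parse, action_value)
 # auth takes a (username, password) tuple, and keyword arguments must come after positional ones
 r = requests.post(url2, data=payload, auth=("USERNAME", "PASSWORD"))

 # r.text is the decoded body; str(r.content) would write the b'...' bytes literal instead
 with open("results.html", "w", encoding="utf-8") as f:
     f.write(r.text)

 # Then continue searching for the string.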

Note: you need to send the payload to url2, not to the initial URL (url_for_parse).
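To see why url2 can differ from the page URL, here is a small sketch of how urljoin resolves a form's action against the page it came from. The action values are illustrative assumptions: the first corresponds to the comment.aspx target mentioned elsewhere in this thread, the other two are made up.

```python
from urllib.parse import urljoin

page_url = "http://demo.testfire.net/feedback.aspx"

# Hypothetical action values: relative, root-relative, and absolute.
actions = ["comment.aspx", "/search.aspx", "http://example.com/submit"]

# Resolve each action the way a browser would when submitting the form.
targets = {action: urljoin(page_url, action) for action in actions}

for action, target in targets.items():
    print(f"{action!r} -> {target}")
```

A relative action replaces the last path segment of the page URL, so the POST target is usually a different page from the one that served the form.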

User

Answer 2 · edited 2024-04-25 07:42:15

The response from the requests.post call contains the HTML you want to search. You can access it via

r.content

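However, in my test the site said I was not authenticated, so I assume you have already authenticated?

I would also recommend using requests throughout, rather than urllib for the GET and the POST.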
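One detail about r.content that affects the search step: it is a bytes object, and calling str() on bytes produces the b'...' literal on a single line rather than the page text. A minimal sketch of checking for the marker on decoded text follows. The helper name contains_marker, the sample markup, and the UTF-8 default are assumptions for illustration. With requests, r.text already gives you the decoded string.

```python
def contains_marker(body: bytes, marker: str, encoding: str = "utf-8") -> bool:
    """Decode a raw response body and report whether the marker occurs in it."""
    # errors="replace" keeps the check from failing on stray undecodable bytes.
    text = body.decode(encoding, errors="replace")
    return marker in text


# Stand-in for r.content; this markup is made up for illustration.
body = b"<html>\n<p>Thanks for your comment: HELLOWORLD</p>\n</html>"

# str() on bytes yields the b'...' literal, with newlines shown as escapes.
as_repr = str(body)
print(as_repr.startswith("b'"))

# One membership test over the whole body gives a single answer,
# instead of printing a result for every line of a saved file.
print(contains_marker(body, "HELLOWORLD"))
print(contains_marker(body, "GOODBYE"))
```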

User

Answer 3 · edited 2024-04-25 07:42:15

It is probably a good idea to persist session parameters across requests.

http://docs.python-requests.org/en/master/user/advanced/#session-objects

import requests

proxies = {
    "http": "",
    "https": "",
}

headers = {
        'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36'
}

data = {'item':'content'}
# You don't need basic auth here, but it's simple to add with requests.
auth = requests.auth.HTTPBasicAuth('fake@example.com', 'not_a_real_password')
# The form's target URL, as noted below.
url = "http://demo.testfire.net/comment.aspx"
s = requests.Session()
s.headers.update(headers)
s.proxies.update(proxies)
response = s.post(url=url, data=data, auth=auth)

The key point is that the form you are calling and waiting on simply posts to http://demo.testfire.net/comment.aspx
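If you want to confirm where a form posts without reading the page source by hand, something like the following rough sketch could work. It uses the standard library's HTMLParser rather than Beautiful Soup so that it has no third-party dependency. The FormActionParser class, the form_target helper, and the sample markup are all made up for illustration and are not taken from the actual page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormActionParser(HTMLParser):
    """Record the action attribute of the first <form> tag encountered."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.seen_form = False

    def handle_starttag(self, tag, attrs):
        if tag == "form" and not self.seen_form:
            self.seen_form = True
            self.action = dict(attrs).get("action")


def form_target(page_url, html):
    """Return the absolute URL that the first form on the page submits to."""
    parser = FormActionParser()
    parser.feed(html)
    # A missing or empty action means the form posts back to the page itself.
    return urljoin(page_url, parser.action or "")


# Hypothetical markup, made up for illustration.
sample_html = '<form name="cmt" method="post" action="comment.aspx"></form>'
print(form_target("http://demo.testfire.net/feedback.aspx", sample_html))
```

The URL this returns is the one to pass to s.post, together with the field names the form actually defines.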
