如何自动化与使用POST方法的网站交互

0 投票

3 回答

2788 浏览

提问于 2025-04-18 02:41

我需要在这个网站的文本框里输入一些文字：

http://www.link.cs.cmu.edu/link/submit-sentence-4.html

然后我需要获取返回页面的HTML内容。我查过其他的解决方案，但我知道没有一种方法适合所有情况。我见过Selenium这个工具，但我看不懂它的文档，也不知道怎么用。请帮帮我，谢谢。

顺便说一下，我对BeautifulSoup有一些经验，如果这有帮助的话。我之前问过，但只有requests是唯一的解决方案。不过我也不知道怎么用它。

html解析自动化 post方法 selenium requests 网站交互

3 个回答

根据提问者的要求，使用Python来处理这个过程。

我不会使用selenium，因为它会在你的电脑上启动一个浏览器，这样做对于仅仅填写一个表单并获取回复来说有点过于复杂了（如果你的页面有JavaScript或ajax的内容，那可以考虑使用它）。

填写表单的请求代码可以是这样的：

import requests

payload = {
    'Sentence': 'Once upon a time, there was a little red hat and a wolf.',
    'Constituents': 'on',
    'NullLinks': 'on',
    'AllLinkages': 'on',
    'LinkDisplay': 'on',
    'ShortLegth': '6',
    'PageFile': '/docs/submit-sentence-4.html',
    'InputFile': "/scripts/input-to-parser",
    'Maintainer': "sleator@cs.cmu.edu"
}

r = requests.post("http://www.link.cs.cmu.edu/cgi-bin/link/construct-page-4.cgi#submit", 
                  data=payload)

print r.text

这里的 r.text 是返回的HTML内容，你可以用比如BeautifulSoup这样的工具来解析它。

从HTML的回复来看，我觉得你可能会在处理 <pre> 标签中的文本时遇到问题，但这属于另一个话题，不在这个问题的范围内。

希望对你有帮助，

回答于 2025-04-18 由 Python大师

分享举报

首先，我觉得如果你只是在看一个网页，用BeautifulSoup来自动化处理有点过于复杂了。你不如直接查看网页的源代码，看看表单的结构。你的表单其实很简单：

<FORM METHOD="POST"
ACTION="/cgi-bin/link/construct-page-4.cgi#submit">
<input type="text" name="Sentence" size="120" maxlength="120"></input><br>
<INPUT TYPE="checkbox" NAME="Constituents" CHECKED>Show constituent tree &nbsp;
<INPUT TYPE="checkbox" NAME="NullLinks" CHECKED>Allow null links &nbsp;
<INPUT TYPE="checkbox" NAME="AllLinkages" OFF>Show all linkages &nbsp;
<INPUT TYPE="HIDDEN" NAME="LinkDisplay" VALUE="on">
<INPUT TYPE="HIDDEN" NAME="ShortLength" VALUE="6">
<INPUT TYPE="HIDDEN" NAME="PageFile" VALUE="/docs/submit-sentence-4.html">
<INPUT TYPE="HIDDEN" NAME="InputFile" VALUE="/scripts/input-to-parser">
<INPUT TYPE="HIDDEN" NAME="Maintainer" VALUE="sleator@cs.cmu.edu">
<br>
<INPUT TYPE="submit" VALUE="Submit one sentence">
<br>
</FORM>

所以你应该能够提取出字段并填写它们。

我会用 curl 和 -X POST 来处理（就像这里提到的那样——也看看那个答案 :))。

如果你真的想用Python来做，那你需要做一些像使用requests进行POST请求的操作。

回答于 2025-04-18 由 Python大师

分享举报

直接从文档中提取的内容，改成了你的例子。

from selenium import webdriver

# Create a new instance of the Firefox driver
driver = webdriver.Firefox()

# go to the page
driver.get("http://www.link.cs.cmu.edu/link/submit-sentence-4.html")

# the page is ajaxy so the title is originally this:
print driver.title

# find the element that's name attribute is Sentence
inputElement = driver.find_element_by_name("Sentence")

# type in the search
inputElement.send_keys("You're welcome, now accept the answer!")

# submit the form 
inputElement.submit()

这样至少可以帮助你输入文本。然后，可以看看这个例子，了解如何获取HTML内容。

回答于 2025-04-18 由 Python大师

分享举报

如何自动化与使用POST方法的网站交互

3 个回答

撰写回答