Trying to post data to a form that uses frames, and get the results back, with Python
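Thanks to Northcat and others, I managed to use the requests library to send a multipart/form-data request to http://www.camp.bicnirrh.res.in/featcalc/, and it worked very well. Now I want to send data to http://pro-161-70.ib.unicamp.br/~itaraju/tools/pimw/ and select the "Show pI/MW values" option. I am uploading a file named Denovo. This is what I have tried so far; I tried to follow the format that worked before.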
import requests
import urllib
session = requests.Session()
file={'file': (open('Bishop/Denovo.txt', 'r').read())}
url = 'http://pro-161-70.ib.unicamp.br/~itaraju/tools/pimw/pimw.htm'
payload = {"opShowpimw":"opShowpimw", "opUseTabs":"opUseTabs"}
raw = urllib.urlencode(payload)
response = session.post(url, files=file, data=payload)
print response.text
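I use this URL in the code instead of the one listed above because the site uses frames, and the listed URL only returns "This page uses frames, but your browser doesn't support them." I found the URL in the code with "View Frame Source". I got the payload by looking at ieheaders. The first payload entry corresponds to "Show pI/MW values"; the second tries to make the output simpler by choosing text format (on the form, clicking ".txt format"). But the result contains no values and looks like the first page. The frame source URL of the results page is 'http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/pimw.cgi', but using that URL did not get any response.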
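With frames, the URL that matters for a POST is whatever the form's `action` attribute names, not the page that displays the form; posting to a static .htm page most likely just returns that page again, which would match what happened above. The field names inside the form are also what the payload keys have to match. Below is a minimal Python 3, stdlib-only sketch of doing the "View Frame Source" step in code: it collects `<frame src>` values plus each form's action, method and field names, resolving relative URLs with `urljoin`. The class name `FrameAndFormFinder` is hypothetical, and both HTML snippets are made-up stand-ins (the sample `action` was chosen so it resolves to the pimw.cgi URL mentioned above); the real pages' markup may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameAndFormFinder(HTMLParser):
    """Collect frame sources and form targets from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # URL the HTML was downloaded from
        self.frames = []          # absolute URLs from <frame>/<iframe src>
        self.forms = []           # one dict per <form>: action, method, fields
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ('frame', 'iframe') and attrs.get('src'):
            self.frames.append(urljoin(self.base_url, attrs['src']))
        elif tag == 'form':
            self._in_form = True
            self.forms.append({
                # a missing or empty action means "submit to this same page"
                'action': urljoin(self.base_url, attrs.get('action') or ''),
                'method': (attrs.get('method') or 'get').lower(),
                'fields': [],
            })
        elif tag in ('input', 'select', 'textarea') and self._in_form:
            if attrs.get('name'):
                self.forms[-1]['fields'].append(attrs['name'])

    def handle_endtag(self, tag):
        if tag == 'form':
            self._in_form = False


# Hypothetical markup standing in for the real pages (not copied from the site)
frameset_html = ('<frameset cols="25%,75%">'
                 '<frame src="menu.htm"><frame src="pimw.htm"></frameset>')
form_html = ('<form action="../../cgi-bin/itaraju/bioinf/pimw.cgi" method="POST">'
             '<input type="file" name="arquivo">'
             '<textarea name="tbSeq"></textarea>'
             '<input type="checkbox" name="opShowpimw" value="ON">'
             '</form>')

# Step 1: the frameset page tells us where the form page lives
outer = FrameAndFormFinder('http://pro-161-70.ib.unicamp.br/~itaraju/tools/pimw/')
outer.feed(frameset_html)
outer.close()
print(outer.frames)

# Step 2: the form page tells us where to POST and which field names to use
inner = FrameAndFormFinder(outer.frames[1])
inner.feed(form_html)
inner.close()
print(inner.forms)
```

In practice the two strings would come from `requests.get(...).text` on the real pages; the printed action and field list are then what the `requests.post` call should use.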
1 Answer
I send a sequence as text in tbSeq. I found this sequence on this website.
It gives me some results and an image (shown below), which the code saves as 'output.gif'.
import requests
import lxml.html
url = 'http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/pimw.cgi'
payload = {
'arquivo': '',
'opShowTitle': 'ON',
'opShowSeq': 'ON',
'opShowStat': 'ON',
'opShowpimw': 'ON',
'opGelVirtual': 'ON',
'opMap': 'gel0.def',
'opPK': 'Default',
'tbCt': 3.55,
'tbNt': 7,
'tbArg': 12.01,
'tbAsp': 4.06,
'tbCys': 9,
'tbGlu': 4.45,
'tbHis': 5.985,
'tbLys': 10.01,
'tbTyr': 10.01,
'tbSeq': '''>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVH
CTNLMNTTVTTGLLLNGSYSENRTQIWQKHRTSNDS
ALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQ
KYNLRLRQAWCHFPSNWKGAWKEVKEEIVNLPKER
YRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPG
PCVQRTYVACHIRSVIIWLETISKKTYAPPREGHLECT
STVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRY
KLVEITPIGFAPTEVRRYTGGHERQKRVPFVXXXXXX
XXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK''',
}
# send POST
r = requests.post(url, data=payload)
#print r.text
# convert HTML string into HTML tree
html = lxml.html.fromstring(r.text)
# get all images
imgs = html.cssselect('img')
# get second image
if len(imgs) > 1:
    url = 'http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/' + imgs[1].attrib['src'].strip()
    print "Downloading ...", url
    with open('output.gif', 'wb') as handle:
        r = requests.get(url, stream=True)
        if not r.ok:
            # Something went wrong
            pass
        for block in r.iter_content(1024):
            if not block:
                break
            handle.write(block)
            print '.',
        print
# get data
for tr in html.cssselect('tr'):
    for td in tr.cssselect('tr'):
        print td.text_content().strip().replace('\n', ' | '),
    print
Result:
Downloading ... http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/../../../tools/htdocs/tmp/gel.15548.gif
. . . . . . . . . . . . . . . . . . . . . . . . . .
ORF:
gi|532319|pir|TVFV2E|TVFV2E envelope protein
Sequence:
ELRLRYCAPAGFALLKCNDADYDGFKTNCS NVSVVHCTNLMNTTVTTGLLLNGSYSENRT QIWQKHRTSNDSALILLNKHYNLTVTCKRP GNKTVLPVTIMAGLVFHSQKYNLRLRQAWC HFPSNWKGAWKEVKEEIVNLPKERYRGTND PKRIFFQRQWGDPETANLWFNCHGEFFYCK MDWFLNYLNNLTVDADHNECKNTSGTKSGN KRAPGPCVQRTYVACHIRSVIIWLETISKK TYAPPREGHLECTSTVTGMTVELNYIPKNR TNVTLSPQIESIWAAELDRYKLVEITPIGF APTEVRRYTGGHERQKRVPFVXXXXXXXXX XXXXXXXXXXXXXVQSQHLLAGILQQQKNL LAAVEAQQQMLKLTIWGVK
MW: | pI:
40969.02 | | 9.35
Amino-acid composition
Ala (A) | 20 | 5.3% | | Cys (C) | 12 | 3.2% | | Asp (D) | 10 | 2.6% | | Glu (E) | 19 | 5.0% | | Phe (F) | 12 | 3.2% | | Gly (G) | 20 | 5.3% | | His (H) | 11 | 2.9% | | Ile (I) | 16 | 4.2% | | Lys (K) | 24 | 6.3% | | Leu (L) | 34 | 9.0% | | | | | Met (M) | 5 | 1.3% | | Asn (N) | 27 | 7.1% | | Pro (P) | 16 | 4.2% | | Gln (Q) | 17 | 4.5% | | Arg (R) | 21 | 5.5% | | Ser (S) | 16 | 4.2% | | Thr (T) | 30 | 7.9% | | Val (V) | 24 | 6.3% | | Trp (W) | 10 | 2.6% | | Tyr (Y) | 13 | 3.4% Ala (A) | 20 | 5.3% Cys (C) | 12 | 3.2% Asp (D) | 10 | 2.6% Glu (E) | 19 | 5.0% Phe (F) | 12 | 3.2% Gly (G) | 20 | 5.3% His (H) | 11 | 2.9% Ile (I) | 16 | 4.2% Lys (K) | 24 | 6.3% Leu (L) | 34 | 9.0% Met (M) | 5 | 1.3% Asn (N) | 27 | 7.1% Pro (P) | 16 | 4.2% Gln (Q) | 17 | 4.5% Arg (R) | 21 | 5.5% Ser (S) | 16 | 4.2% Thr (T) | 30 | 7.9% Val (V) | 24 | 6.3% Trp (W) | 10 | 2.6% Tyr (Y) | 13 | 3.4%
Ala (A) | 20 | 5.3%
Cys (C) | 12 | 3.2%
Asp (D) | 10 | 2.6%
Glu (E) | 19 | 5.0%
Phe (F) | 12 | 3.2%
Gly (G) | 20 | 5.3%
His (H) | 11 | 2.9%
Ile (I) | 16 | 4.2%
Lys (K) | 24 | 6.3%
Leu (L) | 34 | 9.0%
Met (M) | 5 | 1.3%
Asn (N) | 27 | 7.1%
Pro (P) | 16 | 4.2%
Gln (Q) | 17 | 4.5%
Arg (R) | 21 | 5.5%
Ser (S) | 16 | 4.2%
Thr (T) | 30 | 7.9%
Val (V) | 24 | 6.3%
Trp (W) | 10 | 2.6%
Tyr (Y) | 13 | 3.4%
Total: | 379
Theoretical 2D gel:
A little red dot :)
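The download URL in the run above still contains `bioinf/../../../`. The server resolves that itself, but `urllib.parse.urljoin` can normalize it on the client the way a browser would. A small Python 3 sketch, assuming the result page sets no `<base href>` tag; the helper name `absolute_image_url` is made up for illustration, and `gel.15548.gif` is just the temporary file name from that one run, so it will differ each time.

```python
from urllib.parse import urljoin

# URL the form was POSTed to; relative <img src> values in the reply are
# resolved against it (assumes the page has no <base href> tag)
CGI_URL = 'http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/pimw.cgi'


def absolute_image_url(src, page_url=CGI_URL):
    """Resolve an <img src> value like a browser, collapsing '..' segments."""
    return urljoin(page_url, src.strip())


# src value from the run above (the part the answer appended after 'bioinf/')
gel_src = '../../../tools/htdocs/tmp/gel.15548.gif'
print(absolute_image_url(gel_src))
# → http://pro-161-70.ib.unicamp.br/~itaraju/tools/htdocs/tmp/gel.15548.gif
```

Unlike plain string concatenation, `urljoin` also leaves an already absolute `src` untouched, so the same helper works if the site ever switches to absolute image URLs.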
Edit: here is an example with a file. The file has to be sent in the field named arquivo.
import requests
import lxml.html
url = 'http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/pimw.cgi'
payload = {
# 'arquivo': '', # remove it
'opShowTitle': 'ON',
'opShowSeq': 'ON',
'opShowStat': 'ON',
'opShowpimw': 'ON',
'opGelVirtual': 'ON',
'opMap': 'gel0.def',
'opPK': 'Default',
'tbCt': 3.55,
'tbNt': 7,
'tbArg': 12.01,
'tbAsp': 4.06,
'tbCys': 9,
'tbGlu': 4.45,
'tbHis': 5.985,
'tbLys': 10.01,
'tbTyr': 10.01,
'tbSeq': '',
}
files = {'arquivo': open('sequence.fasta').read()}
#url = 'http://httpbin.org/post' # special portal for tests
# send POST
r = requests.post(url, data=payload, files=files)
#print r.text
# convert HTML string into HTML tree
html = lxml.html.fromstring(r.text)
# get all images
imgs = html.cssselect('img')
# get second image
if len(imgs) > 1:
    url = 'http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/' + imgs[1].attrib['src'].strip()
    print "Downloading ...", url
    with open('output.gif', 'wb') as handle:
        r = requests.get(url, stream=True)
        if not r.ok:
            # Something went wrong
            pass
        for block in r.iter_content(1024):
            if not block:
                break
            handle.write(block)
            print '.',
        print
# get data
for tr in html.cssselect('tr'):
    for td in tr.cssselect('tr'):
        print td.text_content().strip().replace('\n', ' | '),
    print
The file used, sequence.fasta:
>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVH
CTNLMNTTVTTGLLLNGSYSENRTQIWQKHRTSNDS
ALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQ
KYNLRLRQAWCHFPSNWKGAWKEVKEEIVNLPKER
YRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPG
PCVQRTYVACHIRSVIIWLETISKKTYAPPREGHLECT
STVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRY
KLVEITPIGFAPTEVRRYTGGHERQKRVPFVXXXXXX
XXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
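For reference, passing `files=` makes requests send a multipart/form-data body in which every ordinary field, and the uploaded file under the name `arquivo`, becomes its own part separated by a boundary string. The Python 3 stdlib sketch below builds such a body by hand just to show that layout; `encode_multipart` is a hypothetical helper, not part of requests, and the sample fields are a small subset of the payload above. In requests itself, `files={'arquivo': ('sequence.fasta', data)}` is the tuple form that sets the part's filename explicitly.

```python
import uuid


def encode_multipart(fields, files):
    """Build a multipart/form-data body (illustrative sketch).

    fields: {name: value} for ordinary form controls
    files:  {name: (filename, content_bytes)} for file-upload controls
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex  # random, so it should not occur in the data
    delimiter = b'--' + boundary.encode()
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(f'Content-Disposition: form-data; name="{name}"'.encode())
        lines.append(b'')                  # blank line ends the part headers
        lines.append(str(value).encode())
    for name, (filename, content) in files.items():
        lines.append(delimiter)
        lines.append(f'Content-Disposition: form-data; name="{name}"; '
                     f'filename="{filename}"'.encode())
        lines.append(b'Content-Type: application/octet-stream')
        lines.append(b'')
        lines.append(content)
    lines.append(delimiter + b'--')        # closing delimiter
    lines.append(b'')                      # trailing CRLF
    body = b'\r\n'.join(lines)
    return body, f'multipart/form-data; boundary={boundary}'


# First two lines of sequence.fasta, shortened for the example
fasta = (b'>gi|532319|pir|TVFV2E|TVFV2E envelope protein\n'
         b'ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVH\n')
body, content_type = encode_multipart(
    {'opShowpimw': 'ON', 'tbCt': 3.55},
    {'arquivo': ('sequence.fasta', fasta)},
)
print(content_type)
print(body.decode())
```

Seeing the raw body this way can help when comparing a script's request with what the browser sends in a header-capture tool.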