Using Python to post data to a form on a framed page and retrieve the results

Asked 2025-04-18 11:30

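Thanks to Northcat and others, I managed to use the requests library to send a multipart/form-data request to http://www.camp.bicnirrh.res.in/featcalc/, and it worked very well. Now I want to send data to http://pro-161-70.ib.unicamp.br/~itaraju/tools/pimw/ and select the "Show pI/MW values" option. I am uploading a file named Denovo. Here is what I have tried so far; I followed the format that worked before as closely as I could.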

import requests

session = requests.Session()
# A bare string under files= is sent as an upload whose filename is the field name ('file')
with open('Bishop/Denovo.txt') as fh:
    file = {'file': fh.read()}
# The static page that holds the form (found via "View frame source")
url = 'http://pro-161-70.ib.unicamp.br/~itaraju/tools/pimw/pimw.htm'
payload = {"opShowpimw": "opShowpimw", "opUseTabs": "opUseTabs"}
response = session.post(url, files=file, data=payload)
print(response.text)

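In the code I use this URL instead of the one listed above, because the site uses frames and that URL only returns the message "This page uses frames, but your browser doesn't support them". I found the URL in the code by using "View frame source". I got the payload by inspecting the request in ieheaders. The first payload entry corresponds to "Show pI/MW values"; the second is an attempt to get simpler output by choosing the text format (on the form, the ".txt format" option). However, the response contains no values and looks like the first page. The frame source URL of the results page is 'http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/pimw.cgi', but posting to that URL returns nothing at all.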
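Both manual steps in the question, finding the framed page with "View frame source" and working out which field names to send, can also be done from the HTML itself. A minimal sketch using only the standard library: it lists frame URLs and, for each form, its action and named controls. The markup below is a hypothetical stand-in, not the real site's HTML (in practice `page_html` would come from `requests.get(url).text`), and `blank.htm` is an invented frame name. What it illustrates is that the POST must go to the form's `action`, which in this hypothetical markup resolves to the same pimw.cgi the answer posts to.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageInspector(HTMLParser):
    """Collect frame URLs, form actions and named form controls from a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frames = []   # absolute URLs of <frame>/<iframe> src attributes
        self.actions = []  # absolute URLs of <form action> attributes
        self.fields = []   # (tag, type, name, value) for each named control

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ('frame', 'iframe') and a.get('src'):
            self.frames.append(urljoin(self.base_url, a['src'].strip()))
        elif tag == 'form':
            # a form without an action submits to the page's own URL
            self.actions.append(urljoin(self.base_url, (a.get('action') or '').strip()))
        elif tag in ('input', 'select', 'textarea') and a.get('name'):
            self.fields.append((tag, a.get('type', ''), a['name'], a.get('value')))


def inspect_page(page_html, base_url):
    inspector = PageInspector(base_url)
    inspector.feed(page_html)
    inspector.close()
    return inspector


# Hypothetical markup, for illustration only
frameset = '<frameset rows="50%,50%"><frame src="pimw.htm"><frame src="blank.htm"></frameset>'
form_page = '''<form method="post" enctype="multipart/form-data"
      action="../../cgi-bin/itaraju/bioinf/pimw.cgi">
  <input type="file" name="arquivo">
  <textarea name="tbSeq"></textarea>
  <input type="checkbox" name="opShowpimw" value="ON" checked>
</form>'''

top = inspect_page(frameset, 'http://pro-161-70.ib.unicamp.br/~itaraju/tools/pimw/')
print(top.frames)

form = inspect_page(form_page, top.frames[0])
print(form.actions)
for field in form.fields:
    print(field)
```

A checked checkbox is submitted with its `value` attribute (browsers send "on" when there is none), so the value to post comes from the form, not from the field name as in the question's payload.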

1 Answer


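I sent a sequence as text in the tbSeq field.

I found this sequence on this website.

The server returned some results and an image (shown below), which the script saves as 'output.gif'.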

import requests
import lxml.html

url = 'http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/pimw.cgi'
payload = {
    'arquivo': '',
    'opShowTitle': 'ON',
    'opShowSeq': 'ON',
    'opShowStat': 'ON',
    'opShowpimw': 'ON',
    'opGelVirtual': 'ON',
    'opMap': 'gel0.def',
    'opPK': 'Default',
    'tbCt': 3.55,
    'tbNt': 7,
    'tbArg': 12.01,
    'tbAsp': 4.06,
    'tbCys': 9,
    'tbGlu': 4.45,
    'tbHis': 5.985,
    'tbLys': 10.01,
    'tbTyr': 10.01,
    'tbSeq': '''>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVH
CTNLMNTTVTTGLLLNGSYSENRTQIWQKHRTSNDS
ALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQ
KYNLRLRQAWCHFPSNWKGAWKEVKEEIVNLPKER
YRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPG
PCVQRTYVACHIRSVIIWLETISKKTYAPPREGHLECT
STVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRY
KLVEITPIGFAPTEVRRYTGGHERQKRVPFVXXXXXX
XXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK''',
}

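# send POST to the CGI script that handles the form
r = requests.post(url, data=payload)
r.raise_for_status()

#print(r.text)

# convert HTML string into HTML tree
html = lxml.html.fromstring(r.text)

# get all images (cssselect() needs the cssselect package installed)
imgs = html.cssselect('img')

# the second image is the theoretical 2D gel
if len(imgs) > 1:
    img_url = 'http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/' + imgs[1].attrib['src'].strip()

    print("Downloading ...", img_url)

    img = requests.get(img_url, stream=True)
    img.raise_for_status()  # fail instead of saving an error page as output.gif

    with open('output.gif', 'wb') as handle:
        for block in img.iter_content(1024):
            handle.write(block)
            print('.', end=' ', flush=True)

    print()

# get data: print each table row with its line breaks replaced by ' | '
# (cssselect() on an element also matches the element itself, so rows of
# nested tables appear both inside their parent row and on their own line)
for tr in html.cssselect('tr'):
    for row in tr.cssselect('tr'):
        print(row.text_content().strip().replace('\n', ' | '), end=' ')
    print()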
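The script builds the image URL by string concatenation, which leaves the `../` segments in it (see the "Downloading ..." line in the result); the server still served the file. `urllib.parse.urljoin` resolves a relative `src` against the page URL the way a browser does. A small sketch using the path printed in the result; the numeric part of `gel.15548.gif` is a temporary file name that will differ on every run, and in the script above `r.url` could serve as `page_url`.

```python
from urllib.parse import urljoin

# URL of the CGI script whose result page contains the <img> tag
page_url = 'http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/pimw.cgi'
# relative src as it appeared in that page (temporary name, changes per request)
src = '../../../tools/htdocs/tmp/gel.15548.gif'

img_url = urljoin(page_url, src)
print(img_url)  # http://pro-161-70.ib.unicamp.br/~itaraju/tools/htdocs/tmp/gel.15548.gif
```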

Result:

Downloading ... http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/../../../tools/htdocs/tmp/gel.15548.gif
. . . . . . . . . . . . . . . . . . . . . . . . . .


ORF:
gi|532319|pir|TVFV2E|TVFV2E envelope protein
Sequence:
ELRLRYCAPAGFALLKCNDADYDGFKTNCS NVSVVHCTNLMNTTVTTGLLLNGSYSENRT QIWQKHRTSNDSALILLNKHYNLTVTCKRP GNKTVLPVTIMAGLVFHSQKYNLRLRQAWC HFPSNWKGAWKEVKEEIVNLPKERYRGTND PKRIFFQRQWGDPETANLWFNCHGEFFYCK MDWFLNYLNNLTVDADHNECKNTSGTKSGN KRAPGPCVQRTYVACHIRSVIIWLETISKK TYAPPREGHLECTSTVTGMTVELNYIPKNR TNVTLSPQIESIWAAELDRYKLVEITPIGF APTEVRRYTGGHERQKRVPFVXXXXXXXXX XXXXXXXXXXXXXVQSQHLLAGILQQQKNL LAAVEAQQQMLKLTIWGVK
MW: |       pI:
40969.02 |  |   9.35
Amino-acid composition
Ala (A) | 20 | 5.3% |  | Cys (C) | 12 | 3.2% |  | Asp (D) | 10 | 2.6% |  | Glu (E) | 19 | 5.0% |  | Phe (F) | 12 | 3.2% |  | Gly (G) | 20 | 5.3% |  | His (H) | 11 | 2.9% |  | Ile (I) | 16 | 4.2% |  | Lys (K) | 24 | 6.3% |  | Leu (L) | 34 | 9.0% |  |    |  |  | Met (M) | 5 | 1.3% |  | Asn (N) | 27 | 7.1% |  | Pro (P) | 16 | 4.2% |  | Gln (Q) | 17 | 4.5% |  | Arg (R) | 21 | 5.5% |  | Ser (S) | 16 | 4.2% |  | Thr (T) | 30 | 7.9% |  | Val (V) | 24 | 6.3% |  | Trp (W) | 10 | 2.6% |  | Tyr (Y) | 13 | 3.4% Ala (A) | 20 | 5.3% Cys (C) | 12 | 3.2% Asp (D) | 10 | 2.6% Glu (E) | 19 | 5.0% Phe (F) | 12 | 3.2% Gly (G) | 20 | 5.3% His (H) | 11 | 2.9% Ile (I) | 16 | 4.2% Lys (K) | 24 | 6.3% Leu (L) | 34 | 9.0% Met (M) | 5 | 1.3% Asn (N) | 27 | 7.1% Pro (P) | 16 | 4.2% Gln (Q) | 17 | 4.5% Arg (R) | 21 | 5.5% Ser (S) | 16 | 4.2% Thr (T) | 30 | 7.9% Val (V) | 24 | 6.3% Trp (W) | 10 | 2.6% Tyr (Y) | 13 | 3.4%
Ala (A) | 20 | 5.3%
Cys (C) | 12 | 3.2%
Asp (D) | 10 | 2.6%
Glu (E) | 19 | 5.0%
Phe (F) | 12 | 3.2%
Gly (G) | 20 | 5.3%
His (H) | 11 | 2.9%
Ile (I) | 16 | 4.2%
Lys (K) | 24 | 6.3%
Leu (L) | 34 | 9.0%
Met (M) | 5 | 1.3%
Asn (N) | 27 | 7.1%
Pro (P) | 16 | 4.2%
Gln (Q) | 17 | 4.5%
Arg (R) | 21 | 5.5%
Ser (S) | 16 | 4.2%
Thr (T) | 30 | 7.9%
Val (V) | 24 | 6.3%
Trp (W) | 10 | 2.6%
Tyr (Y) | 13 | 3.4%
Total:  | 379
Theoretical 2D gel:

Little red dot :)

(image: theoretical 2D gel, saved as output.gif)
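If the values are needed in Python rather than on screen, the composition rows printed above can be parsed back out. A sketch that assumes the exact `Ala (A) | 20 | 5.3%` row format shown in the result; it works on the printed text, not on the page's HTML. Because the pattern is anchored at both ends, the long combined line produced by the nested table and the `Total:` line are skipped.

```python
import re

# One composition row as printed by the script, e.g. "Ala (A) | 20 | 5.3%"
ROW = re.compile(r'^(\w{3}) \((\w)\) \| (\d+) \| ([\d.]+)%$')


def parse_composition(lines):
    """Map one-letter code -> (three-letter name, count, percent)."""
    composition = {}
    for line in lines:
        m = ROW.match(line.strip())
        if m:
            name, code, count, percent = m.groups()
            composition[code] = (name, int(count), float(percent))
    return composition


# A few lines copied from the result above
sample_output = """Amino-acid composition
Ala (A) | 20 | 5.3%
Cys (C) | 12 | 3.2%
Leu (L) | 34 | 9.0%
Total:  | 379"""

comp = parse_composition(sample_output.splitlines())
print(comp['L'])  # ('Leu', 34, 9.0)
```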


Edit: here is an example that uploads a file. The file has to be sent in the field named arquivo.

import requests
import lxml.html

url = 'http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/pimw.cgi'
payload = {
#    'arquivo': '', # remove it
    'opShowTitle': 'ON',
    'opShowSeq': 'ON',
    'opShowStat': 'ON',
    'opShowpimw': 'ON',
    'opGelVirtual': 'ON',
    'opMap': 'gel0.def',
    'opPK': 'Default',
    'tbCt': 3.55,
    'tbNt': 7,
    'tbArg': 12.01,
    'tbAsp': 4.06,
    'tbCys': 9,
    'tbGlu': 4.45,
    'tbHis': 5.985,
    'tbLys': 10.01,
    'tbTyr': 10.01,
    'tbSeq': '',
}

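# the file goes in the 'arquivo' field; a bare string is sent as an upload
with open('sequence.fasta') as fh:
    files = {'arquivo': fh.read()}

#url = 'http://httpbin.org/post' # test service that echoes back the request it receives

# send POST
r = requests.post(url, data=payload, files=files)
r.raise_for_status()

#print(r.text)

# convert HTML string into HTML tree
html = lxml.html.fromstring(r.text)

# get all images (cssselect() needs the cssselect package installed)
imgs = html.cssselect('img')

# the second image is the theoretical 2D gel
if len(imgs) > 1:
    img_url = 'http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/' + imgs[1].attrib['src'].strip()

    print("Downloading ...", img_url)

    img = requests.get(img_url, stream=True)
    img.raise_for_status()  # fail instead of saving an error page as output.gif

    with open('output.gif', 'wb') as handle:
        for block in img.iter_content(1024):
            handle.write(block)
            print('.', end=' ', flush=True)

    print()

# get data: print each table row with its line breaks replaced by ' | '
# (cssselect() on an element also matches the element itself, so rows of
# nested tables appear both inside their parent row and on their own line)
for tr in html.cssselect('tr'):
    for row in tr.cssselect('tr'):
        print(row.text_content().strip().replace('\n', ' | '), end=' ')
    print()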

The file used is sequence.fasta:

>gi|532319|pir|TVFV2E|TVFV2E envelope protein
ELRLRYCAPAGFALLKCNDADYDGFKTNCSNVSVVH
CTNLMNTTVTTGLLLNGSYSENRTQIWQKHRTSNDS
ALILLNKHYNLTVTCKRPGNKTVLPVTIMAGLVFHSQ
KYNLRLRQAWCHFPSNWKGAWKEVKEEIVNLPKER
YRGTNDPKRIFFQRQWGDPETANLWFNCHGEFFYCK
MDWFLNYLNNLTVDADHNECKNTSGTKSGNKRAPG
PCVQRTYVACHIRSVIIWLETISKKTYAPPREGHLECT
STVTGMTVELNYIPKNRTNVTLSPQIESIWAAELDRY
KLVEITPIGFAPTEVRRYTGGHERQKRVPFVXXXXXX
XXXXXXXXXXXXXXXXVQSQHLLAGILQQQKNL
LAAVEAQQQMLKLTIWGVK
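The commented-out httpbin.org line in the file example is one way to see what is actually sent. The request can also be inspected with no network traffic at all: `requests.Request(...).prepare()` builds the multipart/form-data body locally. A sketch with a hypothetical short sequence in place of sequence.fasta and a trimmed payload; it also passes a `(filename, content)` tuple, so the upload carries a real file name instead of the field name that requests falls back to for a bare string.

```python
import requests

# Hypothetical stand-ins for the real FASTA file and the full payload
fasta = '>example header\nELRLRYCAPAGF\n'
url = 'http://pro-161-70.ib.unicamp.br/~itaraju/cgi-bin/itaraju/bioinf/pimw.cgi'

req = requests.Request(
    'POST',
    url,
    data={'opShowpimw': 'ON', 'tbSeq': ''},
    # (filename, content) tuple; a bare string would use 'arquivo' as the filename
    files={'arquivo': ('sequence.fasta', fasta)},
)
prepared = req.prepare()  # builds headers and body, sends nothing

print(prepared.headers['Content-Type'])  # multipart/form-data; boundary=...
print(prepared.body.decode())
```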
