Pynav2
Headless programmatic web browser on top of Requests and Beautiful Soup
Requirements
Python 3.4+
Tested with unittest from Python 3.4 to 3.7
Installation
If python3 is the default Python binary:
pip install pynav2
If python2 is the default Python binary:
pip3 install pynav2
License
GNU LGPLv3 (GNU Lesser General Public License version 3)
Interactive mode examples
All examples require:
>>> from pynav2 import Browser
>>> b = Browser()
HTTP GET request and print the response
Get http://example.com (use https if it is available on the server)
>>> b.get('example.com')
<Response [200]>
>>> b.text  # alias for b.response.text
'<!DOCTYPE html>\n<html lang="mul" class="no-js">\n<head>\n<meta charset="utf-8">\n<title>example.com</title>...'
HTTP GET request and print the JSON response
Get http://example.com/user-agent/json; returns the JSON-encoded content of the response, if any
>>> b.get('example.com/user-agent/json')
<Response [200]>
>>> b.json  # alias for b.response.json()
{'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0'}
HTTP POST request and print the response
>>> data = {'q': 'python'}
>>> b.post('example.com/search', data=data)
<Response [200]>
>>> b.text
'<!DOCTYPE html>\n<html lang="mul" class="no-js">\n<head>\n<meta charset="utf-8">\n<title>example.com</title>...'
HTTP POST JSON request and print the JSON response
>>> import json
>>> data = {'login': 'user', 'password': 'pass'}
>>> b.post('example.com/login', json=json.dumps(data))  # JSON to send in the body of the request
<Response [200]>
>>> b.json
{'login': 'success'}
HTTP HEAD request and print the response headers
>>> b.head('example.com')
<Response [200]>
>>> b.response.headers
{'Server': 'nginx', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '48842', 'Age': '3154', 'Connection': 'keep-alive'}
HTTP PUT request and print the JSON response
>>> data = {'version': '2.1', 'licence': 'LGPL'}
>>> b.put('example.com/api/about/', data=data)
<Response [200]>
>>> b.json
{'update': 'success'}
HTTP PATCH request and print the JSON response
>>> data = {'version': '2.1'}
>>> b.patch('example.com/api/about/', data=data)
<Response [200]>
>>> b.json
{'patch': 'success'}
HTTP DELETE request and print the JSON response
>>> b.delete('example.com/api/user/102')
<Response [200]>
>>> b.json
{'delete': 'success'}
HTTP OPTIONS request and print the JSON response
>>> b.options('example.com/api/user')
<Response [200]>
>>> b.json
{'options': '...'}
Get all links
>>> b.get('example.com')
<Response [200]>
>>> b.links
['http://example.com/news', 'http://example.com/forum', 'http://example.com/contact']
>>> for link in b.links:
...     print(link)
...
http://example.com/news
http://example.com/forum
http://example.com/contact
Filter links
Any beautifulsoup.find_all() parameter can be added; see the Beautiful Soup documentation
>>> import re
>>> b.get('example.com')
<Response [200]>
>>> b.get_links(text='Python Events')  # regular expression
>>> b.get_links(class_="jump-link")  # no regular expression for the class attribute
>>> b.get_links(href="windows")  # regular expression
>>> b.get_links(title=re.compile('success'))  # manual regular expression
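The comments above suggest that plain string arguments (except for the class attribute) are treated as unanchored regular-expression searches against the attribute value. As a stdlib-only illustration of that kind of matching, with invented href values:

```python
import re

# Hypothetical href values, as they might appear in a crawled page.
hrefs = [
    'http://example.com/windows-download',
    'http://example.com/linux-download',
]

# A plain string such as 'windows' presumably behaves like an
# unanchored regex search, matching anywhere inside the value.
matching = [h for h in hrefs if re.search('windows', h)]
print(matching)  # ['http://example.com/windows-download']
```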
Get all images
>>> b.get('example.com')
<Response [200]>
>>> b.images
['http://example.com/img/logo.png', 'http://example.com/img/picture.jpg', 'http://there.com/news.gif']
Filter images
Any beautifulsoup.find_all() parameter can be added; see the Beautiful Soup documentation
>>> b.get('example.com')
<Response [200]>
>>> b.get_images(src='logo')  # regular expression
>>> b.get_images(class_='python-logo')  # no regular expression for the class attribute
>>> b.get_images(alt='yth')  # regular expression
Download a file
>>> b.verbose = True
>>> b.download('http://example.com/ubuntu-amd64', '/tmp')  # it will follow redirects and look for a Content-Disposition header to find the filename
downloading ubuntu-18.04.1-desktop-amd64.iso (1.8GB) to: /tmp/ubuntu-18.04.1-desktop-amd64.iso
download completed in 12 minutes 5 seconds (1.8GB)
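The comment above says download() inspects the Content-Disposition header to pick the filename. A minimal stdlib sketch of that lookup (an illustration, not pynav2's actual code; the header value is invented):

```python
from email.message import Message

def filename_from_header(content_disposition):
    # Parse an RFC 2183 Content-Disposition value with the stdlib
    # email machinery and extract its filename parameter.
    msg = Message()
    msg['Content-Disposition'] = content_disposition
    return msg.get_filename()

name = filename_from_header('attachment; filename="ubuntu-18.04.1-desktop-amd64.iso"')
print(name)  # ubuntu-18.04.1-desktop-amd64.iso
```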
Handle referer
>>> b.handle_referer = True
>>> b.get('somewhere.com')
>>> b.get('example.com')  # request headers will have http://somewhere.com as referer
>>> b.get('there.com')  # request headers will have http://example.com as referer
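A rough sketch of what handle_referer presumably does internally: remember the previously visited URL and send it as the Referer header of the next request (this is an illustration, not pynav2's source):

```python
class RefererTracker:
    """Remember the last visited URL and expose it as the next Referer."""

    def __init__(self):
        self.referer = None

    def headers_for(self, url):
        # Build the extra headers for this request, then record the
        # current URL as the referer for the following request.
        headers = {'Referer': self.referer} if self.referer else {}
        self.referer = url
        return headers

t = RefererTracker()
print(t.headers_for('http://somewhere.com'))  # {} (first request: no referer)
print(t.headers_for('http://example.com'))    # {'Referer': 'http://somewhere.com'}
```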
Set the referer manually
>>> b.referer = 'http://www.here.com'
>>> b.get('example.com')  # request headers will have http://www.here.com as referer
Set the user agent
The useragent module includes a list of user agents:
firefox_windows, chrome_windows, edge_windows, ie_windows, firefox_linux, chrome_linux, safari_mac
The default user agent is firefox_windows
>>> from pynav2 import useragent
>>> b.user_agent = useragent.firefox_linux
>>> b.get('example.com')  # request headers will have 'Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0' as User-Agent
>>> b.user_agent = 'my_app/v1.0'
>>> b.get('example.com')  # request headers will have my_app/v1.0 as User-Agent
Set a sleep time before requests
>>> b.set_sleep_time(0.5, 1.5)  # pick a random x seconds between 0.5 and 1.5 and wait x before each request
>>> b.get('example.com')  # waits x seconds before the request
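set_sleep_time(0.5, 1.5) presumably draws a fresh random delay in that range before each request; a stdlib sketch of that behaviour (not pynav2's actual implementation):

```python
import random
import time

def sleep_before_request(min_s, max_s):
    # Draw a uniform random delay in [min_s, max_s] and wait that long.
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay

# Small values so the example runs quickly.
d = sleep_before_request(0.01, 0.02)
print(0.01 <= d <= 0.02)  # True
```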
Define the request timeout
10 second timeout:
>>> b.timeout = 10
Close all opened TCP sessions
>>> b.get('example1.com')
>>> b.get('example2.com')
>>> b.get('example3.com')
>>> b.session.close()
Set an HTTP proxy for HTTPS requests, for one request
>>> b.get('https://httpbin.org/ip').json()['origin']
111.111.111.111
>>> proxies = {'https': '10.0.0.0:1234'}
>>> b.timeout = 10  # could be useful to wait 10 seconds if proxies are slow
>>> b.get('https://httpbin.org/ip', proxies=proxies).json()['origin']
10.0.0.0
Set an HTTP proxy for HTTPS requests, for all requests
>>> b.get('https://httpbin.org/ip').json()['origin']
111.111.111.111
>>> b.proxies = {'https': '10.0.0.0:1234'}
>>> b.timeout = 10  # could be useful to wait 10 seconds if proxies are slow
>>> b.get('https://httpbin.org/ip').json()['origin']
10.0.0.0
Set an HTTP proxy for all HTTPS requests, with a different proxy for a specific domain
>>> b.get('https://httpbin.org/ip').json()['origin']
111.111.111.111
>>> b.proxies = {'https': '10.0.0.0:1234', 'https://specific-domain.com': '10.11.12.13:1234'}
>>> b.timeout = 10  # could be useful to wait 10 seconds if proxies are slow
>>> b.get('https://httpbin.org/ip').json()['origin']
10.0.0.0
>>> b.get('https://specific-domain.com/ip').json()['origin']
10.11.12.13
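The example above works because a Requests-style proxies dict is resolved by preferring the most specific key, so 'https://specific-domain.com' wins over the bare 'https' entry for that host. A simplified stdlib sketch of that lookup (not the library's actual code):

```python
from urllib.parse import urlparse

def select_proxy(url, proxies):
    # Prefer a scheme://host key, then fall back to the bare scheme key.
    parts = urlparse(url)
    for key in ('{0}://{1}'.format(parts.scheme, parts.hostname), parts.scheme):
        if key in proxies:
            return proxies[key]
    return None

proxies = {'https': '10.0.0.0:1234',
           'https://specific-domain.com': '10.11.12.13:1234'}
print(select_proxy('https://httpbin.org/ip', proxies))          # 10.0.0.0:1234
print(select_proxy('https://specific-domain.com/ip', proxies))  # 10.11.12.13:1234
```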
Get the Beautiful Soup instance
After a GET or POST request, browser.bs (BeautifulSoup) is automatically instantiated with b.response.text
>>> b.get('example.com')
>>> b.bs.find_all('a')
Get the Requests object instances
>>> b.get('example.com')
>>> b.session
>>> b.request
>>> b.response