Headless programmatic web browser on top of Requests and Beautiful Soup

Detailed description of the pynav2 Python project


Pynav2

Headless programmatic web browser on top of Requests and Beautiful Soup

Requirements

Python 3.4+

Unit tests have been run on Python 3.4 through 3.7

Installation

If python3 is the default python binary

pip install pynav2

If python2 is the default python binary

pip3 install pynav2

License

GNU LGPLv3 (GNU Lesser General Public License version 3)

Interactive mode examples

All examples require

from pynav2 import Browser
b = Browser()

HTTP GET request and print the response

Get http://example.com (use https if it is available on the server)

>>> b.get('example.com')
<Response [200]>
>>> b.text  # alias for b.response.text
'<!DOCTYPE html>\n<html lang="mul" class="no-js">\n<head>\n<meta charset="utf-8">\n<title>example.com</title>...'
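
The call above passes 'example.com' without a scheme, and the note says the page is fetched as http://example.com. One plausible way to do that kind of normalization is sketched below. `normalize_url` is a hypothetical helper written for illustration; it is not part of pynav2's documented API, and the library's real logic may differ.

```python
def normalize_url(url, default_scheme="http"):
    """Prefix a bare host such as 'example.com' with a default scheme.

    Hypothetical helper: pynav2's actual normalization may differ.
    """
    # Leave URLs that already carry an explicit scheme untouched
    if "://" in url:
        return url
    return "{}://{}".format(default_scheme, url)


print(normalize_url("example.com"))          # http://example.com
print(normalize_url("https://example.com"))  # https://example.com
```

Passing `default_scheme="https"` would cover the case where the server supports HTTPS.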

HTTP GET request and print the JSON response

Get http://example.com/user-agent/json and return the JSON-encoded content of the response, if any

>>> b.get('example.com/user-agent/json')
<Response [200]>
>>> b.json  # alias for b.response.json()
{'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0'}

HTTP POST request and print the response

>>> data = {'q': 'python'}
>>> b.post('example.com/search', data=data)
<Response [200]>
>>> b.text
'<!DOCTYPE html>\n<html lang="mul" class="no-js">\n<head>\n<meta charset="utf-8">\n<title>example.com</title>...'

HTTP POST request with a JSON body and print the JSON response

>>> import json
>>> data = {'login': 'user', 'password': 'pass'}
>>> b.post('example.com/login', json=json.dumps(data))  # JSON to send in the body of the request
<Response [200]>
>>> b.json
{'login': 'success'}

HTTP HEAD request and print the response headers

>>> b.head('example.com')
<Response [200]>
>>> b.response.headers
{'Server': 'nginx', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '48842', 'Age': '3154', 'Connection': 'keep-alive'}

HTTP PUT request and print the JSON response

>>> data = {'version': '2.1', 'licence': 'LGPL'}
>>> b.put('example.com/api/about/', data=data)
<Response [200]>
>>> b.json
{'update': 'success'}

HTTP PATCH request and print the JSON response

>>> data = {'version': '2.1'}
>>> b.patch('example.com/api/about/', data=data)
<Response [200]>
>>> b.json
{'patch': 'success'}

HTTP DELETE request and print the JSON response

>>> b.delete('example.com/api/user/102')
<Response [200]>
>>> b.json
{'delete': 'success'}

HTTP OPTIONS request and print the JSON response

>>> b.options('example.com/api/user')
<Response [200]>
>>> b.json
{'options': '...'}

Get all links

>>> b.get('example.com')
<Response [200]>
>>> b.links
['http://example.com/news', 'http://example.com/forum', 'http://example.com/contact']
>>> for link in b.links:
...     print(link)
...
http://example.com/news
http://example.com/forum
http://example.com/contact

Filter links

Any Beautiful Soup find_all() parameter can be added; see the Beautiful Soup documentation

>>> import re
>>> b.get('example.com')
<Response [200]>
>>> b.get_links(text='Python Events')  # regular expression
>>> b.get_links(class_="jump-link")  # no regular expression for class attribute
>>> b.get_links(href="windows")  # regular expression
>>> b.get_links(title=re.compile('success'))  # manual regular expression
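
The comments above mark plain string values for text, href and title as regular expressions. To make that matching behaviour concrete, here is a standard-library sketch of regex filtering on link URLs. `LinkCollector` and `filter_links` are hypothetical names; pynav2 itself is described as passing its arguments to Beautiful Soup's find_all(), so this is an illustration of the idea rather than the library's code. Note that this sketch matches against the resolved absolute URL.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute href values from <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL
            self.links.append(urljoin(self.base_url, href))


def filter_links(html, base_url, href_pattern):
    """Return links whose URL matches href_pattern (regex search)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    regex = re.compile(href_pattern)
    return [link for link in collector.links if regex.search(link)]


page = '<a href="/downloads/windows">Win</a> <a href="/news">News</a>'
print(filter_links(page, "http://example.com", "windows"))
# ['http://example.com/downloads/windows']
```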

Get all images

>>> b.get('example.com')
<Response [200]>
>>> b.images
['http://example.com/img/logo.png', 'http://example.com/img/picture.jpg', 'http://there.com/news.gif']

Filter images

Any Beautiful Soup find_all() parameter can be added; see the Beautiful Soup documentation

>>> b.get('example.com')
<Response [200]>
>>> b.get_images(src='logo')  # regular expression
>>> b.get_images(class_='python-logo')  # no regular expression for class attribute
>>> b.get_images(alt='yth')  # regular expression

Download a file

>>> b.verbose = True
>>> b.download('http://example.com/ubuntu-amd64', '/tmp')  # follows redirects and reads the Content-Disposition header to find the filename
downloading ubuntu-18.04.1-desktop-amd64.iso (1.8GB) to: /tmp/ubuntu-18.04.1-desktop-amd64.iso
download completed in 12 minutes 5 seconds (1.8GB)
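
The download comment mentions looking at the Content-Disposition header to find the filename. A minimal sketch of that lookup, using only the standard library, might look as follows. `filename_from_headers` and the "download" fallback name are assumptions made for illustration; how pynav2 actually derives the name is not shown in this document.

```python
import posixpath
from email.message import Message
from urllib.parse import urlsplit


def filename_from_headers(url, headers):
    """Pick a filename from Content-Disposition, else from the URL path."""
    disposition = headers.get("Content-Disposition")
    if disposition:
        # Reuse the stdlib header parser to extract the filename parameter
        msg = Message()
        msg["Content-Disposition"] = disposition
        name = msg.get_filename()
        if name:
            # Drop any directory part sent by the server
            return posixpath.basename(name)
    # Fall back to the last path segment of the URL
    return posixpath.basename(urlsplit(url).path) or "download"


headers = {"Content-Disposition": 'attachment; filename="ubuntu-18.04.1-desktop-amd64.iso"'}
print(filename_from_headers("http://example.com/ubuntu-amd64", headers))
# ubuntu-18.04.1-desktop-amd64.iso
```

Without the header, the same call falls back to the last path segment, here `ubuntu-amd64`.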

Handle the referer

>>> b.handle_referer = True
>>> b.get('somewhere.com')
>>> b.get('example.com')  # request headers will have http://somewhere.com as referer
>>> b.get('there.com')  # request headers will have http://example.com as referer

Set the referer manually

>>> b.referer = 'http://www.here.com'
>>> b.get('example.com')  # request headers will have http://www.here.com as referer
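
The two referer modes above (automatic tracking and a manually set value) can be pictured with a small state holder. This is a sketch under assumptions: `RefererTracker` and `headers_for` are hypothetical names, and details such as whether a manual referer persists across requests in pynav2 are not documented here.

```python
class RefererTracker:
    """Remember the last visited URL and offer it as the Referer header."""

    def __init__(self, handle_referer=False):
        self.handle_referer = handle_referer
        self.referer = None  # may also be set manually

    def headers_for(self, url):
        """Build headers for the next request, then record url as visited."""
        headers = {}
        if self.referer:
            headers["Referer"] = self.referer
        if self.handle_referer:
            # The page being requested becomes the referer of the next one
            self.referer = url
        return headers


tracker = RefererTracker(handle_referer=True)
print(tracker.headers_for("http://somewhere.com"))  # {}
print(tracker.headers_for("http://example.com"))    # {'Referer': 'http://somewhere.com'}
```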

Set the user agent

The useragent module includes a list of user agents:

firefox_windows, chrome_windows, edge_windows, ie_windows, firefox_linux, chrome_linux, safari_mac

The default user agent is firefox_windows

>>> from pynav2 import useragent
>>> b.user_agent = useragent.firefox_linux
>>> b.get('example.com')  # request headers will have 'Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0' as User-Agent
>>> b.user_agent = 'my_app/v1.0'
>>> b.get('example.com')  # request headers will have my_app/v1.0 as User-Agent

Set the sleep time before each request

>>> b.set_sleep_time(0.5, 1.5)  # pick a random x between 0.5 and 1.5 seconds and wait x before each request
>>> b.get('example.com')  # waits x seconds before the request
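
The randomized delay described above can be sketched with the standard library. `wait_before_request` is a hypothetical function, and the injectable `sleep` argument is an assumption added so the delay can be inspected without actually waiting; this is not pynav2's implementation.

```python
import random
import time


def wait_before_request(min_seconds, max_seconds, sleep=time.sleep):
    """Sleep for a random duration between min_seconds and max_seconds; return it."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


# Pass a no-op sleep to inspect the chosen delay without actually waiting
delay = wait_before_request(0.5, 1.5, sleep=lambda seconds: None)
print(0.5 <= delay <= 1.5)  # True
```

Drawing a fresh delay for every request avoids a fixed request rhythm, which is the usual reason for randomizing the pause.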

Define the request timeout

10-second timeout

>>> b.timeout = 10

Close all open TCP sessions

>>> b.get('example1.com')
>>> b.get('example2.com')
>>> b.get('example3.com')
>>> b.session.close()

Set an HTTP proxy for HTTPS requests, for a single request

For SOCKS proxies, see the Requests documentation

>>> b.get('https://httpbin.org/ip').json()['origin']
111.111.111.111
>>> proxies = {'https': '10.0.0.0:1234'}
>>> b.timeout = 10  # could be useful to wait 10 seconds if proxies are slow
>>> b.get('https://httpbin.org/ip', proxies=proxies).json()['origin']
10.0.0.0

Set an HTTP proxy for HTTPS requests, for all requests

For SOCKS proxies, see the Requests documentation

>>> b.get('https://httpbin.org/ip').json()['origin']
111.111.111.111
>>> b.proxies = {'https': '10.0.0.0:1234'}
>>> b.timeout = 10  # could be useful to wait 10 seconds if proxies are slow
>>> b.get('https://httpbin.org/ip').json()['origin']
10.0.0.0

Set an HTTP proxy for HTTPS requests for all requests, and another proxy for a specific domain

For SOCKS proxies, see the Requests documentation

>>> b.get('https://httpbin.org/ip').json()['origin']
111.111.111.111
>>> b.proxies = {'https': '10.0.0.0:1234', 'https://specific-domain.com': '10.11.12.13:1234'}
>>> b.timeout = 10  # could be useful to wait 10 seconds if proxies are slow
>>> b.get('https://httpbin.org/ip').json()['origin']
10.0.0.0
>>> b.get('https://specific-domain.com/ip').json()['origin']
10.11.12.13
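
The example above relies on a host-specific key taking precedence over the plain scheme key. The sketch below mirrors the lookup order Requests is understood to use for its proxies mapping (scheme plus host first, then scheme, then the "all" keys). It is a simplified re-implementation for illustration, not code from Requests or pynav2, and it ignores environment variables and no-proxy rules.

```python
from urllib.parse import urlsplit


def select_proxy(url, proxies):
    """Return the proxy for url, preferring host-specific keys over scheme keys."""
    parts = urlsplit(url)
    candidates = [
        "{}://{}".format(parts.scheme, parts.hostname),  # e.g. https://specific-domain.com
        parts.scheme,                                     # e.g. https
        "all://{}".format(parts.hostname),
        "all",
    ]
    for key in candidates:
        if key in proxies:
            return proxies[key]
    return None  # no proxy configured for this URL


proxies = {
    "https": "10.0.0.0:1234",
    "https://specific-domain.com": "10.11.12.13:1234",
}
print(select_proxy("https://httpbin.org/ip", proxies))          # 10.0.0.0:1234
print(select_proxy("https://specific-domain.com/ip", proxies))  # 10.11.12.13:1234
```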

Get the Beautiful Soup instance

After a GET or POST request, the browser's bs attribute (a Beautiful Soup object) is automatically initialized with b.response.text

Beautiful Soup documentation

>>> b.get('example.com')
>>> b.bs.find_all('a')

Get the Requests object instances

Requests documentation

>>> b.get('example.com')
>>> b.session
>>> b.request
>>> b.response

Get the browser history

>>> b.get('example1.com')
>>> b.get('example2.com')
>>> b.get('example3.com')
>>> print(b.history)
['example1.com', 'example2.com', 'example3.com']

Disable "InsecureRequestWarning: Unverified HTTPS request is being made"

>>> import urllib3
>>> urllib3.disable_warnings()
>>> b.get('example.com')  # no warnings
