Helpers to fetch and parse text on pages with requests, lxml, and beautifulsoup4

Detailed description of the parse-helper Python project


Install

Install system requirements for lxml
% sudo apt-get install -y libxml2 libxslt1.1 libxml2-dev libxslt1-dev zlib1g-dev

or

% brew install libxml2

Install with pip
% pip3 install parse-helper

Usage

The ph-ddg, ph-download-files, ph-download-file-as, and ph-soup-explore scripts are provided.

$ venv/bin/ph-ddg --help
Usage: ph-ddg [OPTIONS] [QUERY]

  Pass a search query to duckduckgo api

Options:
  --help  Show this message and exit.

$ venv/bin/ph-download-files --help
Usage: ph-download-files [OPTIONS] [ARGS]...

  Download all links to local files

  - args: urls or filenames containing urls

Options:
  --help  Show this message and exit.
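
The help text above says each argument may be either a URL or the name of a file containing URLs. How parse-helper tells the two apart is not documented here, so the following is only a minimal sketch of one way to do it. The function name `expand_url_args` is hypothetical, and the one-URL-per-line file format is an assumption.

```python
import os


def expand_url_args(args):
    """Return a flat list of URLs from a mix of URLs and filenames.

    Hypothetical helper: assumes a file argument holds one URL per line.
    """
    urls = []
    for arg in args:
        if os.path.isfile(arg):
            # Existing file: read it and keep every non-empty line as a URL
            with open(arg) as fp:
                for line in fp:
                    line = line.strip()
                    if line:
                        urls.append(line)
        else:
            # Anything that is not an existing file is assumed to be a URL
            urls.append(arg)
    return urls
```

Checking for an existing file first keeps the command line simple: the caller can mix literal URLs and list files freely, in any order.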

$ venv/bin/ph-download-file-as --help
Usage: ph-download-file-as [OPTIONS] URL [LOCALFILE]

  Download link to local file

  - url: a string - localfile: a string

Options:
  --help  Show this message and exit.
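
Since LOCALFILE is optional in the usage line above, the script presumably picks a name when it is omitted. The source does not say how, so the sketch below only shows one plausible approach: take the last path segment of the URL. The function name `default_localfile` and the `index.html` fallback are assumptions, not parse-helper's actual behavior.

```python
import posixpath
from urllib.parse import unquote, urlparse


def default_localfile(url, fallback="index.html"):
    """Guess a local filename from the last path segment of a URL.

    Hypothetical helper: the fallback name is an arbitrary choice.
    """
    # Use only the path component, so query strings and fragments are ignored
    path = urlparse(url).path
    # Decode percent-escapes such as %20 before taking the basename
    name = posixpath.basename(unquote(path))
    # URLs ending in "/" have an empty basename, so fall back to a fixed name
    return name or fallback
```

For example, `default_localfile('https://duckduckgo.com/i/fb8f17fd.png')` yields `'fb8f17fd.png'`.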

$ venv/bin/ph-soup-explore --help
Usage: ph-soup-explore [OPTIONS] [URL_OR_FILE]

  Create a soup object from a url or file and explore with ipython

Options:
  --help  Show this message and exit.

In [1]: import parse_helper as ph

In [2]: ph.USER_AGENT
Out[2]: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/58.0.3029.110 Chrome/58.0.3029.110 Safari/537.36'

In [3]: ph.duckduckgo_api('adventure time')
2019-08-27 06:21:05,303: Fetching JSON from https://api.duckduckgo.com?q=adventure+time&format=json
Out[3]:
[{'text': 'Adventure Time An American animated television series created by Pendleton Ward for Cartoon Network.',
  'thumbnail': 'https://duckduckgo.com/i/fb8f17fd.png',
  'link': 'https://duckduckgo.com/Adventure_Time'},
 {'text': '"Adventure Time" (pilot) An animated short created by Pendleton Ward, as well as the pilot to the Cartoon Network series...',
  'thumbnail': 'https://duckduckgo.com/i/aa9b49e0.png',
  'link': 'https://duckduckgo.com/Adventure_Time_(pilot)'},
 {'text': "Adventure Time (1959 TV series) A local children's television show on WTAE-TV 4 in Pittsburgh, Pennsylvania, from 1959 to 1975.",
  'thumbnail': '',
  'link': 'https://duckduckgo.com/Adventure_Time_(1959_TV_series)'},
 {'text': "Adventure Time (1967 TV series) A Canadian children's adventure television series which aired on CBC Television in 1967 and 1968.",
  'thumbnail': '',
  'link': 'https://duckduckgo.com/Adventure_Time_(1967_TV_series)'},
 {'text': 'Adventure Time (album) The second album for the rock/pop trio The Elvis Brothers.',
  'thumbnail': '',
  'link': 'https://duckduckgo.com/Adventure_Time_(album)'}]
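
The session shows `ph.duckduckgo_api` fetching JSON from `https://api.duckduckgo.com` and returning dicts with `text`, `thumbnail`, and `link` keys. The sketch below approximates that flow with the standard library only; it is not parse-helper's implementation, which per the project description is built on requests. The names `build_query_url`, `simplify_topics`, and `search` are hypothetical, and the response fields read here (`RelatedTopics`, `Text`, `FirstURL`, `Icon.URL`) are an assumption about the API's JSON layout.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://api.duckduckgo.com"


def build_query_url(query):
    """Build the query URL in the same form as the log line in the session."""
    return API_URL + "?" + urlencode({"q": query, "format": "json"})


def simplify_topics(data):
    """Reduce a decoded API response to text/thumbnail/link dicts.

    Assumes results live under "RelatedTopics" (an assumption, see lead-in).
    """
    results = []
    for topic in data.get("RelatedTopics", []):
        if "Text" not in topic:
            # Skip entries without text, such as nested topic groups
            continue
        results.append({
            "text": topic["Text"],
            "thumbnail": (topic.get("Icon") or {}).get("URL", ""),
            "link": topic.get("FirstURL", ""),
        })
    return results


def search(query, timeout=10):
    """Fetch and simplify results for a query (needs network access)."""
    request = Request(build_query_url(query), headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return simplify_topics(json.load(response))
```

Keeping URL construction and response shaping separate from the network call means both can be checked offline with a canned response.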
