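Utilities for cleaning URLs
Detailed description of the urlclean Python project
Welcome to urlclean's documentation!
urlclean provides the following functionality:
- resolving HTTP redirects,
- resolving HTML meta redirects,
- removing Urchin and Facebook tracker URL parameters,
- plugins for further cleaning power,
- combining all of these to unshorten and resolve various URLs.
Try it from the command line: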
python -m urlclean <some url>
Documentation for the code
urlcleaner: a module that resolves redirected URLs and removes tracking URL parameters
urlclean.weedparams(url)
Removes Urchin Tracker and Facebook surveillance parameters from URLs.
Args:
url (str): The URL to scrub of tracking parameters
Returns:
(str). The cleaned URL
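The parameter scrubbing described above can be sketched with the standard library alone. This is a minimal illustration, not urlclean's own implementation: the function name strip_tracking_params and the lists of parameter names are assumptions, and the real module may remove a different set.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed for illustration: Urchin parameters start with "utm_",
# and a few Facebook parameters are matched by exact name.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "fb_source", "fb_ref"}


def strip_tracking_params(url):
    """Return url with the assumed tracking parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_NAMES
    ]
    # Rebuild the URL with only the parameters that were kept.
    return urlunsplit(parts._replace(query=urlencode(kept)))


cleaned = strip_tracking_params(
    "http://example.com/a?id=1&utm_source=x&fbclid=abc"
)
```

Parameters that are not on either list pass through untouched, so the order of the remaining query string is preserved.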
urlclean.httpresolve(url, ua=None, proxyhost='', proxyport='')
Resolves one redirection of an HTTP request.
Args:
url (str): The URL to follow for one redirect
ua (fn): A function returning a User-Agent string (optional)
proxyhost (str): HTTP proxy server (optional)
proxyport (int): HTTP proxy server port (optional)
Returns:
(str, httplib.response). The resolved URL and the response from the HTTP query
urlclean.unmeta(url, res)
Finds any meta redirects in a httplib.response object that has text/html as its content type.
Args:
url (str): The URL to follow for one redirect
res (httplib.response): a httplib.response object
Returns:
(str). The resolved URL
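A meta redirect is an HTML tag of the form meta http-equiv="refresh" content="0; url=...". The following sketch shows one way such a target could be extracted with html.parser. It works on already decoded HTML text rather than on a response object, and the names _MetaRefreshFinder and meta_redirect_target are hypothetical, so it should be read as an illustration of the idea rather than the module's actual code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _MetaRefreshFinder(HTMLParser):
    """Remembers the target of the first meta refresh tag it sees."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content attribute looks like "0; url=http://example.com/next".
        _, _, rest = attr.get("content", "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value.strip():
            self.target = value.strip().strip("'\"")


def meta_redirect_target(url, html_text):
    """Return the meta refresh target resolved against url, or url itself."""
    finder = _MetaRefreshFinder()
    finder.feed(html_text)
    return urljoin(url, finder.target) if finder.target else url
```

If the page contains no meta refresh tag, the input URL comes back unchanged, which makes the function safe to call on every text/html response.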
urlclean.unshorten(url, cache=None, ua=None, **kwargs)
Resolves all HTTP/META redirects and optionally caches them in any object supporting a __getitem__, __setitem__ interface.
Args:
url (str): The URL to resolve
cache (PersistentCryptoDict): an optional PersistentCryptoDict instance
ua (fn): A function returning a User-Agent string (optional); the default is googlebot
**kwargs (dict): optional proxy args for urlclean.httpresolve (default: localhost:8118)
Returns:
(str). The final cleaned URL
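The combination of repeated resolution and caching can be sketched as a small loop. Everything here is an assumption made for illustration: the name resolve_all, the resolve_step callable (standing in for one HTTP or meta redirect step), the max_hops limit, and the expectation that the cache raises KeyError on a miss. A plain dict is used where the documentation mentions a PersistentCryptoDict, since only __getitem__ and __setitem__ are required.

```python
def resolve_all(url, resolve_step, cache=None, max_hops=10):
    """Follow resolve_step until the URL stops changing; cache the result."""
    if cache is not None:
        try:
            return cache[url]  # only __getitem__ is assumed to exist
        except KeyError:
            pass
    current = url
    seen = {url}
    for _ in range(max_hops):
        nxt = resolve_step(current)
        if nxt == current or nxt in seen:
            break  # no further redirect, or a redirect loop
        seen.add(nxt)
        current = nxt
    if cache is not None:
        cache[url] = current  # only __setitem__ is assumed to exist
    return current


# Usage with a fake redirect table instead of real network requests.
hops = {"http://a.example/1": "http://b.example/2",
        "http://b.example/2": "http://c.example/3"}
cache = {}
final = resolve_all("http://a.example/1", lambda u: hops.get(u, u), cache)
```

Passing the single-step resolver in as an argument keeps the loop independent of how each redirect is actually detected, and the hop limit guards against endless redirect chains.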
Plugins
Plugins should have a convert function that receives and returns a URL. If an error occurs, it should return the URL unchanged.
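A plugin following this contract might look like the sketch below. The cleaning rule itself (dropping the URL fragment) is a hypothetical example, and how urlclean discovers or loads plugin modules is not described here, so only the convert contract is illustrated.

```python
from urllib.parse import urlsplit, urlunsplit


def convert(url):
    """Hypothetical plugin: drop the fragment, return url unchanged on error."""
    try:
        parts = urlsplit(url)
        return urlunsplit(parts._replace(fragment=""))
    except (ValueError, TypeError, AttributeError):
        # On any parsing problem, hand back exactly what was received.
        return url
```

Catching the error inside convert means a single malformed URL cannot interrupt the rest of the cleaning chain.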
Changelog
- v0.5.4 - fixed httpresolve for relative URLs
- v0.5.1 - install/doc fixes
- v0.5 - added plugins