SimpleSpider: Python package on PyPI

Makes web spiders easier to use

Project description for SimpleSpider

SimpleSpider Instructions

How to install

pip install SimpleSpider

This module helps you use web spiders more easily.


Command-line usage

The command-line tool accepts 9 arguments.

argument	type	default	description
url	str	None	The target URL
single	bool	True	Set to False to fetch content from a series of pages, and specify the pages with --index or --indexfile
re	str	None	Regular expression to apply; wrap it in double quotes, e.g. --re "ab*c"
xpath	str	None	XPath expression to apply; wrap it in double quotes, e.g. --xpath "//div[1]/text()"
index	str	default	Comma-separated list of page indexes, e.g. --index 1,2,3,4,5,6,7
print	bool	True	Set to False if you don't want the result printed to the console
output	str	None	Path to export the result to, e.g. --output "D:/data.xlsx"
mode	str	None	"img" to get image URLs, "xp" to use XPath, or "re" to use a regular expression
indexfile	str	None	Read the page indexes (links) from a file
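The argument table maps naturally onto Python's argparse module. The sketch below is a hypothetical parser written for illustration, not SimpleSpider's own code; the defaults follow the table except --index, which uses None here. It also shows a pitfall for the True/False arguments: argparse with type=bool would turn the string "False" into True, so a small converter is used instead.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Convert 'True'/'False'-style strings to bool.

    type=bool would treat any non-empty string, including "False", as True.
    """
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the argument table; not SimpleSpider's own code.
    parser = argparse.ArgumentParser(prog="SimpleSpider")
    parser.add_argument("--url", type=str, default=None)
    parser.add_argument("--single", type=str_to_bool, default=True)
    parser.add_argument("--re", type=str, default=None)
    parser.add_argument("--xpath", type=str, default=None)
    parser.add_argument("--index", type=str, default=None)
    parser.add_argument("--print", type=str_to_bool, default=True)
    parser.add_argument("--output", type=str, default=None)
    parser.add_argument("--mode", choices=["img", "xp", "re"], default=None)
    parser.add_argument("--indexfile", type=str, default=None)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--mode", "re", "--url", "https://www.163.com",
         "--re", "<title>(.*.?)</title>", "--single", "False"]
    )
    print(args.mode, args.single)
```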

Example 1: get data from a single page with a regular expression

SimpleSpider --mode re --url https://www.163.com --re "<title>(.*.?)</title>"

Output: 网易
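The pattern in Example 1, "<title>(.*.?)</title>", captures the whole title on a page with a single <title> element. Because ".*" is greedy, though, it runs to the last closing tag when more than one is present. The sketch below uses Python's standard re module to show the difference from the non-greedy "(.*?)"; it assumes SimpleSpider follows Python regex semantics, which the description does not state.

```python
import re

# Two title-like elements on one line, so more than one match is possible.
html = "<title>a</title><title>b</title>"

# Pattern from Example 1: ".*" is greedy, so it stretches to the LAST "</title>".
greedy = re.findall(r"<title>(.*.?)</title>", html)

# Non-greedy alternative: ".*?" stops at the FIRST "</title>".
lazy = re.findall(r"<title>(.*?)</title>", html)

print(greedy)  # ['a</title><title>b']
print(lazy)    # ['a', 'b']
```

Note also that "." does not match a newline by default, so a title split across lines would need re.DOTALL.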

Example 2: get data from a single page with XPath
SimpleSpider --mode xp --url https://www.163.com --xpath "//title/text()"

Output:
网易
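To see what an XPath such as "//title/text()" selects, the sketch below uses only the standard library's xml.etree.ElementTree. That module supports a limited XPath subset and has no text() step, so the text is read from each element's .text attribute; it also needs well-formed markup. Real-world HTML is usually parsed with an HTML-aware library instead (for example lxml, where html.fromstring(page).xpath("//title/text()") works directly). Which parser SimpleSpider uses is not stated in the description.

```python
import xml.etree.ElementTree as ET

# A small, well-formed XHTML snippet; real pages are often not valid XML.
page = "<html><head><title>网易</title></head><body><p>hello</p></body></html>"

root = ET.fromstring(page)

# ".//title" finds every <title> element below the root.
# ElementTree has no "text()" step, so read .text instead.
titles = [el.text for el in root.findall(".//title")]
print(titles)  # ['网易']
```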

Example 3: get data from multiple pages with a regular expression
SimpleSpider --mode re --url https://ent.163.com/20/0323/ --re "<title>(.*.?)</title>" --single False --index 08/F8D2BVI700038FO9.html,10/F8D8B35800038FO9.html

Output:
'疫情期间还出游？网友在巴厘岛偶遇霍建华林心如_网易娱乐'
'台湾女星刘真去世：上《康熙》走红当郭台铭红娘_网易娱乐'

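Example 4: get data from multiple pages/links listed in an index file
SimpleSpider --mode re --url https://ent.163.com/20/0323/ --re "<title>(.*.?)</title>" --single False --indexfile data.txt

The index file should be written like this:
1.html
2.html
3.html
with --url set to, for example, http://example.com/test; each line is an index appended to that URL.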
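Examples 3 and 4 suggest that each page URL is the --url value followed by one index. The sketch below reads indexes either from a comma-separated --index value or from an index file with one entry per line, then builds the page URLs. The function names, the one-per-line file layout, and the plain-concatenation rule are assumptions drawn from the examples, not SimpleSpider's documented behavior.

```python
from __future__ import annotations

from pathlib import Path


def indexes_from_string(index_arg: str) -> list[str]:
    """Split a comma-separated --index value such as "1,2,3" (hypothetical helper)."""
    return [part.strip() for part in index_arg.split(",") if part.strip()]


def parse_index_lines(text: str) -> list[str]:
    """Assumed --indexfile layout: one index per line, blank lines ignored."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def indexes_from_file(path: str) -> list[str]:
    return parse_index_lines(Path(path).read_text(encoding="utf-8"))


def build_urls(base_url: str, indexes: list[str]) -> list[str]:
    """Assumed rule: each page URL is the base URL followed by the index."""
    return [base_url + index for index in indexes]


if __name__ == "__main__":
    for url in build_urls(
        "https://ent.163.com/20/0323/",
        indexes_from_string("08/F8D2BVI700038FO9.html,10/F8D8B35800038FO9.html"),
    ):
        print(url)
```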

Example 5: get image URLs from a single page
SimpleSpider --mode img --url https://www.baidu.com

Output:
//www.baidu.com/img/gs.gif
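The "img" mode returns the src values of the page's images, and the output above is protocol-relative (it starts with "//"). The sketch below collects img src attributes with the standard library's html.parser and resolves them against the page URL with urllib.parse.urljoin. It illustrates the idea only; the helper names are hypothetical and SimpleSpider's own approach is not described.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        # HTMLParser lower-cases tag and attribute names; valueless attributes arrive as None.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.srcs.append(value)


def img_urls(html: str) -> list[str]:
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.srcs


def absolute_img_urls(page_url: str, html: str) -> list[str]:
    # Resolve relative and protocol-relative ("//host/path") URLs against the page URL.
    return [urljoin(page_url, src) for src in img_urls(html)]


if __name__ == "__main__":
    sample = '<div><img src="//www.baidu.com/img/gs.gif" alt="logo"><img alt="no src"></div>'
    print(img_urls(sample))  # ['//www.baidu.com/img/gs.gif']
    print(absolute_img_urls("https://www.baidu.com/", sample))  # ['https://www.baidu.com/img/gs.gif']
```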

To use the functions of this module in your own code, just import it:

from SimpleSpider import SimpleSpider

Several helper functions can simplify your code.
Example 1:

result = SinglePageGetByRegEx(Url="http://www.163.com", Re="<title>(.*?.)</title>")
The value of result is ['网易']
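A single-page regex helper boils down to "download the page, run re.findall on it". The sketch below is a hypothetical stand-in, not SimpleSpider's implementation: single_page_get_by_regex and its fetch parameter are invented for illustration, and decoding every page as UTF-8 is an assumption (some sites use other encodings). Passing the downloader in as a parameter lets the parsing step be tried offline.

```python
from __future__ import annotations

import re
import urllib.request
from typing import Callable


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it as UTF-8 (assumed encoding)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def single_page_get_by_regex(
    url: str, pattern: str, fetch: Callable[[str], str] = fetch_html
) -> list[str]:
    """Hypothetical stand-in for SinglePageGetByRegEx: all regex matches on one page."""
    return re.findall(pattern, fetch(url))


def fake_fetch(url: str) -> str:
    # Offline stand-in for the network: always returns the same HTML.
    return "<html><title>网易</title></html>"


if __name__ == "__main__":
    print(single_page_get_by_regex("http://www.163.com", r"<title>(.*?)</title>", fetch=fake_fetch))
```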

Example 2:
List = [53, 54, 55, 56]
result = MulityPageGetByRegEx(Url="http://www.oursteps.com.au/bbs/forum.php?mod=forumdisplay&fid=", IndexList=List, RegEx="<title>(.*?.)</title>")
The value of result is [['生活其他 - 新足迹 - 新足迹澳洲华人生活大全'], ['证券外汇 - 新足迹澳洲华人生活大全'], ['个人理财 - 新足迹澳洲华人生活大全'], ['生意种种 - 新足迹澳洲华人生活大全']]
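The result above is a list of lists: one inner list of matches per index, in the order of IndexList, and the integer indexes end up in the URL. The sketch below reproduces that shape with a hypothetical multi_page_get_by_regex; the str() conversion and plain concatenation are assumptions, and fetch can be any function that takes a URL and returns its HTML. The demo uses made-up example.com pages rather than the real forum.

```python
from __future__ import annotations

import re
from typing import Callable, Iterable


def multi_page_get_by_regex(
    base_url: str,
    indexes: Iterable[object],
    pattern: str,
    fetch: Callable[[str], str],
) -> list[list[str]]:
    """Hypothetical stand-in for MulityPageGetByRegEx.

    Each index is converted to str and appended to base_url; the result holds
    one list of matches per page, in the same order as the indexes.
    """
    compiled = re.compile(pattern)
    return [compiled.findall(fetch(base_url + str(index))) for index in indexes]


if __name__ == "__main__":
    # Offline demo with made-up pages keyed by their full URL.
    pages = {
        "http://example.com/forum?fid=53": "<title>page 53</title>",
        "http://example.com/forum?fid=54": "<title>page 54</title>",
    }
    result = multi_page_get_by_regex(
        "http://example.com/forum?fid=", [53, 54], r"<title>(.*?)</title>",
        fetch=lambda url: pages[url],
    )
    print(result)  # [['page 53'], ['page 54']]
```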

Both XPath and regular expressions can be used.

You can also extract the string between two markers directly from a page. Example 3: for a page whose HTML contains <title>网易</title>:

result = SinglePageGetMiddleStr("http://www.163.com", front="<title>", back="</title>")
Output:
['网易']
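Getting the "middle string" needs no regular expression: find the front marker, then the next back marker, and take what lies between them. The sketch below does this with str.find and returns every such substring as a list, matching the shape of the output above. The function name is hypothetical, and whether SinglePageGetMiddleStr returns all matches or only the first is not stated in the description.

```python
from __future__ import annotations


def middle_strings(text: str, front: str, back: str) -> list[str]:
    """Return every substring found between `front` and the next `back`."""
    if not front or not back:
        raise ValueError("front and back markers must be non-empty")
    results: list[str] = []
    start = 0
    while True:
        i = text.find(front, start)
        if i == -1:
            break
        i += len(front)
        j = text.find(back, i)
        if j == -1:
            break
        results.append(text[i:j])
        # Continue searching after the back marker just consumed.
        start = j + len(back)
    return results


if __name__ == "__main__":
    print(middle_strings("<html><title>网易</title></html>", "<title>", "</title>"))  # ['网易']
```

Because the markers are matched literally, characters such as "(" or "*" in front or back need no escaping, unlike in a regular expression.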

You can also get image URLs from a page directly:
result = SinglePageGetImgUrl("http://www.baidu.com")
Output:
//www.baidu.com/img/gs.gif

For more information, visit: https://github.com/shanzhengliu/SimpleSpider

You are welcome to join the QQ group: 979659372

SimpleSpider 0.1.3
