links = re.findall(r'\w+://\w+.\w+.\w+\w+\w.+"', page)
I'm trying to parse the links in a web page.
Any help would be appreciated. This is what I get from parsing http://www.soc.napier.ac.uk/~cs342/CSN08115/cw_webpage/index.html:
#my current output#
http://net.tutsplus.com/tutorials/other/8-regular-expressions-you-should-know/"
http://www.asecuritysite.com/content/icon_clown.gif" alt="if broken see alex@school.ac.uk +44(0)1314552759" height="100"
http://www.rottentomatoes.com/m/sleeper/"
http://www.rottentomatoes.com/m/sleeper/trailer/"
http://www.rottentomatoes.com/m/star_wars/"
http://www.rottentomatoes.com/m/star_wars/trailer/"
http://www.rottentomatoes.com/m/wargames/"
http://www.rottentomatoes.com/m/wargames/trailer/"
https://www.sans.org/press/sans-institute-and-crowdstrike-partner-to-offer-hacking-exposed-live-webinar-series.php"> SANS to Offer "Hacking Exposed Live"
https://www.sans.org/webcasts/archive/2013"
#I want to get this when I run the module#
http://net.tutsplus.com/tutorials/other/8-regular-expressions-you-should-know/
http://www.asecuritysite.com/content/icon_clown.gif
http://www.rottentomatoes.com/m/sleeper/
http://www.rottentomatoes.com/m/sleeper/trailer/
http://www.rottentomatoes.com/m/star_wars/
http://www.rottentomatoes.com/m/star_wars/trailer/
http://www.rottentomatoes.com/m/wargames/
http://www.rottentomatoes.com/m/wargames/trailer/
https://www.sans.org/press/sans-institute-and-crowdstrike-partner-to-offer-hacking-exposed-live-webinar-series.php
https://www.sans.org/webcasts/archive/2013
You should not use regular expressions to parse HTML. There are dedicated tools for that, called HTML parsers.
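A minimal sketch of the parser approach, using only Python's standard-library `html.parser` so nothing needs installing. This is a stand-in, not a copy of the library-based example this answer refers to. `LinkCollector` and `extract_links` are hypothetical names, the HTML fragment is made up (modeled on the output in the question), and keeping only absolute `http(s)` values from `href`/`src` is an assumption about what counts as a "link" here.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs found in href and src attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        for name, value in attrs:
            if name in ("href", "src") and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(page):
    parser = LinkCollector()
    parser.feed(page)
    parser.close()
    return parser.links


# Hypothetical fragment resembling the page in the question
sample = '''
<a href="http://www.rottentomatoes.com/m/sleeper/">Sleeper</a>
<img src="http://www.asecuritysite.com/content/icon_clown.gif" alt="if broken see alex@school.ac.uk +44(0)1314552759" height="100">
<a href="https://www.sans.org/press/sans-institute-and-crowdstrike-partner-to-offer-hacking-exposed-live-webinar-series.php"> SANS to Offer "Hacking Exposed Live"</a>
'''

for link in extract_links(sample):
    print(link)
```

Because the parser reads attribute values as whole tokens, the trailing `"`, the `alt` text and the link text never end up in the result, which is exactly the problem the regex in the question runs into.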
Here is an example using ^{} and ^{}:
Prints:
Or, using BeautifulSoup CSS selectors.
Try this. See the demo:
http://regex101.com/r/hQ9xT1/31
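If you do stay with a regex, the core fix is to stop the match at the closing quote instead of using the greedy `.+"` from the question, which runs on to the last `"` on the line. The sketch below is an assumption: the pattern is not necessarily the one in the regex101 demo, and `find_urls` and the sample string are hypothetical.

```python
import re

# Assumed pattern: a scheme, "://", then one or more characters that are
# not a double quote, whitespace, or angle bracket. The negated character
# class ends the match at the closing quote of the attribute value.
URL_RE = re.compile(r'\w+://[^"\s<>]+')


def find_urls(page):
    return URL_RE.findall(page)


# Hypothetical input mirroring two of the problem lines in the question
sample = (
    '<img src="http://www.asecuritysite.com/content/icon_clown.gif" '
    'alt="if broken see alex@school.ac.uk +44(0)1314552759" height="100">\n'
    '<a href="https://www.sans.org/webcasts/archive/2013">Archive</a>\n'
)

for url in find_urls(sample):
    print(url)
```

This still matches URLs that appear in plain text or comments and does not decode entities such as `&amp;`, which is why a parser remains the more robust choice.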