Python regex to get a URL without http or www

0 votes

2 answers

1257 views

Asked 2025-04-18 03:17

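How can I use Python to extract only the site.com part of each URL in my search results, so I can gather information about a word from a Google search?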

from xgoogle.search import GoogleSearch, SearchError

results = []  # defined before the try so the loop below still works if the search fails
try:
  gs = GoogleSearch("#hashtag insights")
  gs.results_per_page = 100
  while True:
    tmp = gs.get_results()  # keep calling until an empty list comes back
    if not tmp:  # no more results were found
      break
    results.extend(tmp)
  # ... do something with all the results ...
except SearchError as e:
  print("Search failed: %s" % e)

for res in results:
  print(res.url)

Tags: regex, data processing, URL extraction

2 Answers

You could try a regular expression, like this:

>>> import re
>>> s = 'http://www.google.com'
>>> print(re.search(r'^https?://www\.(.*)$', s).group(1))
google.com
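This pattern only matches URLs that start with http(s)://www., and re.search returns None when there is no match, so calling .group(1) directly raises AttributeError on a URL such as http://blogspot.com. A minimal sketch that guards against that, assuming the URLs arrive as plain strings (for example the res.url values from the question); the helper name strip_www is made up for illustration:

```python
import re

# Same idea as the pattern above; forward slashes need no escaping in Python regexes
WWW_PATTERN = re.compile(r'^https?://www\.(.*)$')

def strip_www(url):
    """Return everything after 'http(s)://www.', or None if the URL lacks that prefix."""
    match = WWW_PATTERN.search(url)
    return match.group(1) if match else None

for url in ['http://www.google.com', 'https://www.python.org/doc/', 'http://blogspot.com']:
    print(url, '->', strip_www(url))
```

Note that (.*) also captures any path, so https://www.python.org/doc/ yields python.org/doc/ rather than just the site name.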

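If your sites are more varied (for example, arbitrary subdomains instead of www), you can drop whatever comes before the first dot: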

>>> import re
>>> s = 'http://username.blogspot.com'
>>> print(re.search(r'^https?://[^.]*\.(.*)$', s).group(1))
blogspot.com
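This pattern removes the first dot-separated label no matter what it is, so it is worth checking against a URL that has no subdomain at all. Escaping the dot as \. makes it match only a literal '.', not any character. A small sketch (the helper name drop_first_label is hypothetical):

```python
import re

# Drop the scheme and the first dot-separated label of the host
FIRST_LABEL_PATTERN = re.compile(r'^https?://[^.]*\.(.*)$')

def drop_first_label(url):
    """Return the URL minus its scheme and first label, or None if it does not match."""
    match = FIRST_LABEL_PATTERN.search(url)
    return match.group(1) if match else None

for url in ['http://username.blogspot.com', 'http://www.google.com', 'http://google.com']:
    print(url, '->', drop_first_label(url))
```

The last case shows the limitation: for http://google.com it returns com, because the site name itself is treated as the label to drop.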

Answered 2025-04-18 by Python大师


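You don't need a regular expression for this; you can use urlparse instead. Note that .hostname returns the full host name (here www.techcrunch.com), so a leading www. still has to be stripped separately.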

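from urllib.parse import urlparse  # Python 3 (Python 2: from urlparse import urlparse)
hostname = urlparse("http://www.techcrunch.com/").hostname  # 'www.techcrunch.com'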

Docs: http://docs.python.org/library/urlparse.html (Python 2; in Python 3 the module is urllib.parse)
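To get just the site name the question asks for, the hostname can be combined with a www. check. A minimal sketch for Python 3, assuming the input is a full URL string; the helper name site_name is hypothetical:

```python
from urllib.parse import urlparse  # Python 3; in Python 2 this lives in the urlparse module

def site_name(url):
    """Return the URL's hostname without a leading 'www.' (None if there is no hostname)."""
    hostname = urlparse(url).hostname  # lowercased, with any port removed
    if hostname is None:
        return None
    if hostname.startswith('www.'):
        hostname = hostname[len('www.'):]
    return hostname

print(site_name('http://www.techcrunch.com/'))        # techcrunch.com
print(site_name('https://username.blogspot.com/x'))   # username.blogspot.com
```

Unlike the regex answers, this ignores the path, port, and letter case, and it leaves other subdomains (such as username.) in place rather than guessing which part is the "site".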

Answered 2025-04-18 by Python大师
