Scraping HTML data and parsing it into a list

Asked 2025-04-18 02:38

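I'm writing an Android app in Python (using sl4a), and I want it to fetch a page from a joke website and extract a joke. It will then read the joke aloud to help wake me up. At the moment it saves the raw HTML source into a list, but I need it to extract the data from inside the HTML tags, store that in a new list, and then read it out to me. The problem is that my parser doesn't work. Here is my code: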

import android
import urllib

droid = android.Android()
current = 0
newlist = []

# Download the page and store the raw HTML
sock = urllib.urlopen("http://m.funtweets.com/random")
htmlSource = sock.read()
sock.close()
rawhtml = []
rawhtml.append(htmlSource)  # one element: the whole page as a single string

while current < len(rawhtml):
    while current != "<div class=":
        if [current] == "</b></a>":
            newlist.append(current)
            current += 1

print newlist
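In the loop above, `current` is an integer index, so `current != "<div class="` is always true and `[current] == "</b></a>"` (a list compared with a string) is always false; the inner loop therefore never ends. Below is a minimal sketch of the same marker-scanning idea using `str.find`. It assumes each joke sits between `</b></a>` and the next `</div>`; the `extract_between` helper and the sample markup are hypothetical, not taken from the real page.

```python
def extract_between(html, start_marker, end_marker):
    """Return the stripped text found between each start_marker and the next end_marker."""
    results = []
    pos = 0
    while True:
        start = html.find(start_marker, pos)
        if start == -1:
            break  # no more start markers
        start += len(start_marker)
        end = html.find(end_marker, start)
        if end == -1:
            break  # start marker without a matching end marker
        results.append(html[start:end].strip())
        pos = end + len(end_marker)  # resume scanning after this match
    return results


# Hypothetical markup; the real page's structure may differ.
sample = (
    '<div class="tweet"><a href="#"><b>@alice</b></a> First joke</div>'
    '<div class="tweet"><a href="#"><b>@bob</b></a> Second joke</div>'
)
newlist = extract_between(sample, "</b></a>", "</div>")
print(newlist)
```

In the sl4a script you would pass `htmlSource` instead of `sample`. The function only uses `str.find` and slicing, so it also works under Python 2.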

Tags: html parsing, data scraping, list processing, Android development, joke extraction

2 Answers

Here's how to do it:

import re
import urllib2

# Download the page, then pull out usernames and joke text with regexes
page = urllib2.urlopen("http://m.funtweets.com/random").read()
user = re.compile(r'<span>@</span>(\w+)')
text = re.compile(r"</b></a> (\w.*)")
user_lst = [match.group(1) for match in user.finditer(page)]
text_lst = [match.group(1) for match in text.finditer(page)]
for _user, _text in zip(user_lst, text_lst):
    print '@{0}\n{1}\n'.format(_user, _text)

You need to import two modules: `re`, which handles regular expressions, and `urllib2`, which handles the HTTP request.
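`urllib2` exists only in Python 2; in Python 3 its role is taken by `urllib.request`. Below is a minimal Python 3 sketch of the same regex approach. It runs against an inline sample so it does not depend on the live site. The `SAMPLE_HTML` markup and the `fetch_page` and `extract_jokes` helpers are hypothetical; only the two patterns come from the answer.

```python
import re
from urllib.request import urlopen

# The two patterns from the answer.
USER_RE = re.compile(r"<span>@</span>(\w+)")
TEXT_RE = re.compile(r"</b></a> (\w.*)")

# Hypothetical markup shaped to match those patterns; the live page may differ.
SAMPLE_HTML = """
<div><span>@</span>alice <a href="#"><b>#</b></a> Why did the chicken cross the road?
</div>
<div><span>@</span>bob <a href="#"><b>#</b></a> To get to the other side.
</div>
"""


def fetch_page(url):
    """Python 3 counterpart of urllib2.urlopen(url).read(), decoded to str."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_jokes(page):
    """Pair each username with the joke text that follows it."""
    users = USER_RE.findall(page)  # one group, so findall returns the group strings
    texts = TEXT_RE.findall(page)  # '.' stops at a newline, so each match ends at end of line
    # zip stops at the shorter list, as in the original answer
    return list(zip(users, texts))


for user, text in extract_jokes(SAMPLE_HTML):
    print("@{0}\n{1}\n".format(user, text))
```

To use it on the real page, pass `fetch_page("http://m.funtweets.com/random")` to `extract_jokes`. Patterns like these stop matching as soon as the page's markup changes.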

Answered 2025-04-18 by Python大师

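To parse HTML on Android, you can use the jsoup library (http://jsoup.org/). It is a powerful parser that is well regarded by developers. Note that jsoup is a Java library, so a Python script cannot import it directly.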
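From a Python script, a parser-based approach (rather than regexes) can be sketched with the standard library's `html.parser` module. This is a minimal sketch, not a jsoup equivalent. It assumes each joke lives in a `<div class="tweet">` element; that class name, the `JokeParser` class, and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class JokeParser(HTMLParser):
    """Collect the text of every <div class="tweet"> element (class name is an assumption)."""

    def __init__(self):
        super().__init__()
        self.jokes = []
        self._depth = 0   # > 0 while inside a matching div (counts nested divs)
        self._parts = []  # text fragments of the current joke

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # nested div inside a joke
        elif ("class", "tweet") in attrs:  # attrs is a list of (name, value) pairs
            self._depth = 1
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the fragments and collapse runs of whitespace
                self.jokes.append(" ".join("".join(self._parts).split()))

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


parser = JokeParser()
parser.feed('<div class="tweet"><b>@alice</b> Knock knock.</div>'
            '<p>ignored</p>'
            '<div class="tweet"><b>@bob</b> Who is there?</div>')
parser.close()
print(parser.jokes)
```

Because the parser tracks tags instead of matching raw text, extra attributes or whitespace inside the markup do not break it the way they can break a regex.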

Answered 2025-04-18 by Python大师
