My Python program runs very slowly

4 votes

2 answers

18053 views

Asked 2025-04-17 16:56

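I'm writing a program that, for now at least, fetches stream information from TwitchTV (a live-streaming platform). It's a self-learning project, but when I run it, printing a streamer's name takes a full 2 minutes.

I'm using 64-bit Python 2.7.3 on Windows 7, in case that matters.

Here is classes.py: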

#imports:
import urllib2
import re

#classes:
class Streamer:

    #constructor:
    def __init__(self, name, mode, link):
        self.name = name
        self.mode = mode
        self.link = link

class Information:

    #constructor:
    def __init__(self, TWITCH_STREAMS, GAME, STREAMER_INFO):
        self.TWITCH_STREAMS = TWITCH_STREAMS
        self.GAME = GAME
        self.STREAMER_INFO = STREAMER_INFO

    def get_game_streamer_names(self):
        "Connects to Twitch.TV API, extracts and returns all streams for a specific game."

        #start connection
        self.con = urllib2.urlopen(self.TWITCH_STREAMS + self.GAME)
        self.info = self.con.read()
        self.con.close()

        #regular expressions to get all the stream names
        self.info = re.sub(r'"teams":\[\{.+?"\}\]', '', self.info) #remove all team names (they have the same name: parameter as streamer names)
        self.streamers_names = re.findall('"name":"(.+?)"', self.info) #looks for the name of each streamer in the pile of info


        #filter out all "live_user_NAME" values (build a new list: removing items while iterating over the same list skips elements)
        self.streamers_names = [name for name in self.streamers_names
                                if not name.startswith("live_user_")]

        #end method
        return self.streamers_names

    def get_streamer_mode(self, name):
        "Returns a streamer's mode (online/offline)"

        #start connection
        self.con = urllib2.urlopen(self.STREAMER_INFO + name)
        self.info = self.con.read()
        self.con.close()

        #check if stream is online or offline ("stream":null indicates offline stream)
        if self.info.count('"stream":null') > 0:
            return "offline"
        else:
            return "online"

Here is main.py:

#imports:
from classes import *

#consts:
TWITCH_STREAMS = "https://api.twitch.tv/kraken/streams/?game=" #add the game name at the end of the link (space = "+", eg: Game+Name)
STREAMER_INFO  = "https://api.twitch.tv/kraken/streams/" #add streamer name at the end of the link
GAME = "League+of+Legends"

def main():
    #create an information object
    info = Information(TWITCH_STREAMS, GAME, STREAMER_INFO)

    streamer_list = [] #create a streamer list
    for name in info.get_game_streamer_names():
        #run for every streamer name, create a streamer object and place it in the list
        mode =  info.get_streamer_mode(name)
        streamer_name = Streamer(name, mode, 'http://twitch.tv/' + name)
        streamer_list.append(streamer_name)

    #this line is just to try and print something
    print streamer_list[0].name, streamer_list[0].mode


if __name__ == '__main__':
    main()

The program itself works fine; it's just very slow.

Any suggestions?

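Tags: network requests, programming efficiency, Windows environment, program optimization, performance tuning, live-stream data retrieval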

2 Answers

You're using the wrong tool to parse the JSON data returned from the URL. You should use the json library that ships with Python instead of parsing the data with regular expressions. This will make your program run faster.

Replace the regular-expression parser

        #regular expressions to get all the stream names
        self.info = re.sub(r'"teams":\[\{.+?"\}\]', '', self.info) #remove all team names (they have the same name: parameter as streamer names)
        self.streamers_names = re.findall('"name":"(.+?)"', self.info) #looks for the name of each streamer in the pile of info

with the JSON parser

#requires "import json" at the top of classes.py
self.info = json.loads(self.info) #This will parse the json data into Python objects (dicts and lists)
#Pull the name out of each stream and return a generator
return (stream['name'] for stream in self.info[u'streams'])
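As a self-contained illustration of the JSON approach, here is a minimal sketch. The payload below is made up: it only mirrors the keys mentioned in the question and in this answer ("streams", "name", "teams"), and the real API response may be shaped differently. `extract_streamer_names` is a hypothetical helper name, not part of the asker's code.

```python
import json

# Made-up sample payload; the real response may contain different keys.
SAMPLE_RESPONSE = '''
{"streams": [
    {"name": "alpha", "teams": [{"name": "team_one"}]},
    {"name": "live_user_beta", "teams": []},
    {"name": "gamma", "teams": [{"name": "team_two"}]}
]}
'''

def extract_streamer_names(raw_json):
    """Parse the JSON text and return stream names, skipping "live_user_" entries."""
    data = json.loads(raw_json)
    # Team names sit in a nested list, so they never get mixed up with
    # stream names and no regular expression is needed to strip them out.
    return [stream["name"] for stream in data["streams"]
            if not stream["name"].startswith("live_user_")]

print(extract_streamer_names(SAMPLE_RESPONSE))
```

Because the parser understands the nesting, the team names do not have to be removed before the stream names are read.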

Answered 2025-04-17 by Python大师


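Program efficiency usually follows the 80/20 rule (some call it the 90/10 rule, or even the 95/5 rule). That is, a program spends 80% of its running time executing just 20% of its code. In other words, your code probably has a "bottleneck": one small piece of code that runs slowly while the rest runs quickly. Your goal is to find that bottleneck (or bottlenecks) and fix it so the program runs faster.

The best way to do this is to profile your code. That means recording when specific operations happen: you can use the logging module, use timeit as a commenter suggested, use one of the built-in profiling tools (cProfile, for example), or even print the current time at various points in the program. Eventually you'll find that one part of the code accounts for most of the time.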
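A minimal sketch of the "record the time around each step" idea. `fetch_names` and `parse_names` are hypothetical stand-ins (the network call is simulated with `time.sleep`), and `timed` is an illustrative helper; only `timeit.default_timer` and `time.sleep` are real library calls here.

```python
import time
from timeit import default_timer

def fetch_names():
    """Hypothetical stand-in for the HTTP request; sleep() mimics network latency."""
    time.sleep(0.05)
    return ["alpha", "beta", "gamma"]

def parse_names(names):
    """Hypothetical stand-in for the in-memory parsing step."""
    return [name.upper() for name in names]

def timed(func, *args):
    """Call func(*args) and return (result, elapsed wall-clock seconds)."""
    start = default_timer()
    result = func(*args)
    return result, default_timer() - start

names, fetch_seconds = timed(fetch_names)
parsed, parse_seconds = timed(parse_names, names)
print("fetch: %.4f s, parse: %.6f s" % (fetch_seconds, parse_seconds))
```

For the real program, running `python -m cProfile -s cumulative main.py` should give a per-function breakdown without any changes to the code.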

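Experience shows that I/O operations (reading from disk, or accessing a resource over the internet) are usually slower than in-memory computation. My guess is that the problem is this: you use one HTTP connection to fetch the list of streamers, and then one more HTTP connection per streamer to fetch that streamer's status. If there are 10000 streamers, your program has to make 10001 HTTP connections before it finishes.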
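To put rough numbers on that guess, here is a back-of-the-envelope sketch. The half-second cost per request is an assumption (it is the figure this answer uses further down), and `estimate_total_seconds` is a hypothetical helper, not part of the asker's code.

```python
def estimate_total_seconds(n_streamers, seconds_per_request=0.5):
    """One request for the streamer list plus one status request per streamer,
    all run one after another. seconds_per_request is an assumed average."""
    total_requests = 1 + n_streamers
    return total_requests * seconds_per_request

print(estimate_total_seconds(250))    # 125.5 seconds, just over two minutes
print(estimate_total_seconds(10000))  # 5000.5 seconds, more than 83 minutes
```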

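If that's the case, there are a few ways to deal with it:

See whether the Twitch.TV API has another option that returns the list of users together with their stream status in one go, so you don't need a separate API call for each streamer.
Cache the results. This won't make the first run any faster, but if the program runs again within a minute, it can reuse the earlier results.
Limit your application to a small number of streamers at a time. If there are 10000 streamers, does your application really need to check the status of all 10000? Perhaps fetching only the first 20 is enough, and the user can press a key to get the next 20 or close the application. Programming is often not just about writing code but also about managing user expectations. This looks like a personal project, so there may not be any "users", which means you are free to change what the application does.
Use multiple connections. Right now your application opens one connection to the server, waits for the result, parses it, saves it, and only then starts the next connection. That cycle might take half a second. With 250 streamers, processing them one after another takes more than two minutes in total. If you could handle four connections at once, however, the total might drop to roughly 30 seconds. Take a look at the multiprocessing module. Bear in mind that some APIs limit the number of simultaneous connections, so opening 50 at once may upset the provider and get you cut off from their API. Be careful here.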
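The last two suggestions (limit the number of streamers, use several connections) can be sketched together. This is only a sketch under assumptions: `fetch_mode` is a hypothetical stand-in that sleeps instead of calling the API, and `fetch_modes` is an illustrative helper. It uses the thread pool from `multiprocessing.pool` rather than separate processes, since the work is waiting on the network, not computing.

```python
import time
from multiprocessing.pool import ThreadPool

def fetch_mode(name):
    """Hypothetical stand-in for the per-streamer status request;
    sleep() mimics network latency, and the result is faked from the name."""
    time.sleep(0.05)
    return "offline" if name.endswith("_off") else "online"

def fetch_modes(names, fetch, connections=4, limit=20):
    """Fetch the mode of at most `limit` streamers using `connections` worker threads."""
    selected = list(names)[:limit]
    if not selected:
        return []
    pool = ThreadPool(min(connections, len(selected)))
    try:
        modes = pool.map(fetch, selected)  # results come back in input order
    finally:
        pool.close()
        pool.join()
    return list(zip(selected, modes))

sample_names = ["alpha", "beta_off", "gamma", "delta"]
print(fetch_modes(sample_names, fetch_mode))
```

Note that `get_streamer_mode` in the question stores the connection and the response on `self`, so sharing one `Information` object between threads would first need those turned into local variables. Keeping `connections` small is the cautious choice given the warning about API limits.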

Answered 2025-04-17 by Python大师
