How do I create functions for my pymongo/twitter scripts?

Question

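I'm writing some scripts using Python, MongoDB, and the pymongo module to fetch specific information from the Twitter API and store it in a Mongo database. I've written scripts that do different things, such as querying the Search API and fetching user timelines. I'm only just getting familiar with these tools, though, and it's time to make my code more efficient, so I'm now adding functions and classes to my scripts. Here is one of my scripts that uses neither functions nor classes: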

#!/usr/local/bin/python

import twitter
import datetime
from datetime import date, timedelta, datetime
import pymongo
from pymongo import Connection

# Twitter handle that we are scraping mentions for
SCREEN_NAME = '@twitterapi'

# Connect to the database
connection = Connection()
db = connection.test    
collection = db.twitterapi_mentions  # Change the name of this collection
t = twitter.Twitter(domain='search.twitter.com')

# Fetch the information from the API
results = []
for i in range(2):
    i+=1
    response = t.search(q=SCREEN_NAME, result_type='recent', rpp=100, page=i)['results']
    results.extend(response)

# Create a document in the database for each item taken from the API
for tweet in results:
    id_str = tweet['id_str']
    twitter_id = tweet['from_user']
    tweetlink = "http://twitter.com/#!/%s/status/%s" % (twitter_id, id_str)
    created_at = datetime.strptime(tweet['created_at'], "%a, %d %b %Y %H:%M:%S +0000")
    date = created_at.date().strftime("%m/%d/%y")
    time = created_at.time().strftime("%H:%M:%S")
    text = tweet['text']
    identifier = {'id' : id_str}
    entries = {'id' : id_str, 'tweetlink' : tweetlink, 'date' : date, 'time' : time, 'text' : text, 'twitter_id':twitter_id }
    collection.update(identifier, entries, upsert = True)

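These scripts work fine for me, but I need to run the same script against multiple Twitter accounts. For example, I copy the same script and change the following two lines: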

SCREEN_NAME = '@cocacola'

collection = db.cocacola_mentions

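That way I get the mentions for both @twitterapi and @cocacola. I've thought a lot about how to turn this code into a function. The biggest problem is how to change the collection name. For example, take this script: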

#!/usr/local/bin/python

import twitter
import datetime
from datetime import date, timedelta, datetime
import pymongo
from pymongo import Connection

def getMentions(screen_name):

    # Connect to the database
    connection = Connection()
    db = connection.test    
    collection = db.screen_name  # Change the name of this collection
    t = twitter.Twitter(domain='search.twitter.com')

    # Fetch the information from the API
    results = []
    for i in range(2):
        i+=1
        response = t.search(q=screen_name, result_type='recent', rpp=100, page=i)['results']
        results.extend(response)

    # Create a document in the database for each item taken from the API
    for tweet in results:
        id_str = tweet['id_str']
        twitter_id = tweet['from_user']
        tweetlink = "http://twitter.com/#!/%s/status/%s" % (twitter_id, id_str)
        created_at = datetime.strptime(tweet['created_at'], "%a, %d %b %Y %H:%M:%S +0000")
        date = created_at.date().strftime("%m/%d/%y")
        time = created_at.time().strftime("%H:%M:%S")
        text = tweet['text']
        identifier = {'id' : id_str}
        entries = {'id' : id_str, 'tweetlink' : tweetlink, 'date' : date, 'time' : time, 'text' : text, 'twitter_id':twitter_id }
        collection.update(identifier, entries, upsert = True)

getMentions("@twitterapi")
getMentions("@cocacola")

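If I use the script above, all the data is stored in a collection named "screen_name", but I want it stored in a collection that corresponds to the screen name passed in. Ideally, mentions of @twitterapi would be stored in the "twitterapi_mentions" collection and mentions of @cocacola in the "cocacola_mentions" collection. I think pymongo's Collection class might be the solution, and I've looked at the documentation, but I just can't get it to work. If you have any other suggestions for making this script more efficient, I'd really appreciate them. Also, please bear with any mistakes I've made; as I said, I'm still new to this.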
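
To make the desired behaviour concrete, here is a minimal sketch of one way it might be done. It relies on pymongo's Database objects supporting dictionary-style access (db[name]), which looks a collection up by a string computed at runtime, whereas db.screen_name always refers to the collection literally named "screen_name". The helper names mentions_collection_name, get_mentions_collection, and tweet_to_document are hypothetical and are not part of pymongo or the twitter module; the naming rule (strip the "@", append "_mentions") is an assumption based on the collection names mentioned above.

```python
from datetime import datetime


def mentions_collection_name(screen_name):
    """Hypothetical naming rule: '@cocacola' -> 'cocacola_mentions'."""
    return screen_name.lstrip('@') + '_mentions'


def get_mentions_collection(db, screen_name):
    """Look up the collection by a name built at runtime.

    db is expected to be a pymongo Database (anything supporting db[name]).
    """
    return db[mentions_collection_name(screen_name)]


def tweet_to_document(tweet):
    """Build the document to store from one search-result dict.

    Assumes the same fields and date format as the script in the question.
    """
    created_at = datetime.strptime(tweet['created_at'],
                                   "%a, %d %b %Y %H:%M:%S +0000")
    return {
        'id': tweet['id_str'],
        'twitter_id': tweet['from_user'],
        'tweetlink': "http://twitter.com/#!/%s/status/%s"
                     % (tweet['from_user'], tweet['id_str']),
        'date': created_at.strftime("%m/%d/%y"),
        'time': created_at.strftime("%H:%M:%S"),
        'text': tweet['text'],
    }
```

Inside getMentions, the collection would then come from get_mentions_collection(db, screen_name), and each tweet would be upserted with the document returned by tweet_to_document, using the same collection.update call as in the question. Keeping the naming rule and the document building in small pure functions also means they can be checked without a running MongoDB server or a Twitter connection.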
