How to add a location filter with the tweepy module

import sys
import tweepy

consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)


class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)

    def on_error(self, status_code):
        print('Encountered error with status code:', status_code, file=sys.stderr)
        return True  # Don't kill the stream

    def on_timeout(self):
        print('Timeout...', file=sys.stderr)
        return True  # Don't kill the stream


sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(track=['manchester united'])

3 Answers

User

Answer 1 · edited 2024-06-16 12:42:51

Juan gave the correct answer. I just use this to filter for Germany:

# Bounding boxes for geolocations
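# Online tool to create boxes (copy and paste the output as raw CSV): http://boundingbox.klokantech.com/
# Each box is [sw_lon, sw_lat, ne_lon, ne_lat]: southwest corner first, longitude before latitude
GEOBOX_WORLD = [-180, -90, 180, 90]
GEOBOX_GERMANY = [5.0770049095, 47.2982950435, 15.0403900146, 54.9039819757]

# stream is a tweepy Stream object, e.g. sapi from the question
stream.filter(locations=GEOBOX_GERMANY)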

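This is a fairly rough box that also includes parts of some other countries. If you want finer granularity, you can combine several smaller boxes to cover the area you need.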
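The Streaming API's `locations` parameter takes a flat list whose length is a multiple of four, one group of four numbers per box, so combining boxes amounts to concatenating them. A minimal sketch; the helper names `combine_boxes` and `in_any_box` and the two sub-boxes `NORTH` and `SOUTH` are hypothetical, chosen only for illustration, and the sketch does not handle boxes that cross the antimeridian:

```python
from typing import Iterable, List, Tuple

# Hypothetical type alias: (sw_lon, sw_lat, ne_lon, ne_lat)
Box = Tuple[float, float, float, float]


def combine_boxes(boxes: Iterable[Box]) -> List[float]:
    """Flatten several boxes into one list suitable for filter(locations=...)."""
    flat: List[float] = []
    for sw_lon, sw_lat, ne_lon, ne_lat in boxes:
        # Reject boxes given in the wrong corner order (NE before SW).
        if sw_lon >= ne_lon or sw_lat >= ne_lat:
            raise ValueError("box must be given as SW corner, then NE corner")
        flat.extend([sw_lon, sw_lat, ne_lon, ne_lat])
    return flat


def in_any_box(lon: float, lat: float, boxes: Iterable[Box]) -> bool:
    """True if the point lies inside at least one box (edges inclusive)."""
    return any(
        sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat
        for sw_lon, sw_lat, ne_lon, ne_lat in boxes
    )


# Hypothetical example: two smaller boxes instead of one coarse box.
NORTH: Box = (6.0, 51.0, 14.5, 55.0)
SOUTH: Box = (7.5, 47.3, 13.8, 51.0)

locations = combine_boxes([NORTH, SOUTH])
print(locations)
# stream.filter(locations=locations)  # same call style as above
```

`in_any_box` is useful on the client side, because tweets matched by the API can still fall slightly outside the area you actually care about.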

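Note, however, that filtering by geotags greatly limits the number of tweets. Here is a figure from my test database of roughly 5 million tweets (the query returns the fraction of tweets that actually contain a geolocation):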

> db.tweets.find({coordinates:{$ne:null}}).count() / db.tweets.count()
0.016668392651547598
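The same ratio can be computed outside MongoDB. A minimal sketch, assuming each stored tweet is a dict shaped like the v1.1 tweet JSON, where `coordinates` is either null or a GeoJSON point; the `sample` list is made up for illustration and is not the data above:

```python
from typing import Any, Dict, Iterable


def geotagged_fraction(tweets: Iterable[Dict[str, Any]]) -> float:
    """Fraction of tweets whose 'coordinates' field exists and is not None.

    Mirrors find({coordinates: {$ne: null}}).count() / count():
    a missing field counts as null in both cases.
    """
    total = 0
    geotagged = 0
    for tweet in tweets:
        total += 1
        if tweet.get("coordinates") is not None:
            geotagged += 1
    return geotagged / total if total else 0.0


# Made-up sample, for illustration only.
sample = [
    {"text": "a", "coordinates": None},
    {"text": "b", "coordinates": {"type": "Point", "coordinates": [13.4, 52.5]}},
    {"text": "c"},  # field missing: treated like null
    {"text": "d", "coordinates": None},
]

print(f"{geotagged_fraction(sample):.2%}")  # 25.00%
print(f"{0.016668392651547598:.2%}")        # the ratio above as a percentage: 1.67%
```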

So in my 1% stream sample, only 1.67% of tweets contained a geotag. There are, however, other ways to determine a user's location: http://arxiv.org/ftp/arxiv/papers/1403/1403.2345.pdf

User

Answer 2 · edited 2024-06-16 12:42:51

sapi.filter(track=['manchester united'], locations=['GPS Coordinates'])

User

Answer 3 · edited 2024-06-16 12:42:51

The Streaming API doesn't allow filtering by location AND keyword at the same time.

Bounding boxes do not act as filters for other filter parameters. For example track=twitter&locations=-122.75,36.8,-121.75,37.8 would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area.

Source: https://dev.twitter.com/docs/streaming-apis/parameters#locations

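What you can do is ask the Streaming API for tweets matching either the keyword or the location, and then filter the resulting stream in your application by inspecting each tweet.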
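A minimal sketch of such a client-side AND check, assuming the raw v1.1 tweet JSON, where `coordinates` is a GeoJSON point in `[longitude, latitude]` order. The function name `matches` and the sample tweets are hypothetical. Tweets that the API matched only through their `place` have `coordinates` set to null, and this sketch simply rejects them:

```python
from typing import Any, Dict, Sequence


def matches(tweet: Dict[str, Any], keyword: str, box: Sequence[float]) -> bool:
    """True only if the text contains keyword AND the exact coordinates
    fall inside box = (sw_lon, sw_lat, ne_lon, ne_lat)."""
    if keyword.lower() not in tweet.get("text", "").lower():
        return False
    point = tweet.get("coordinates")
    if not point:  # no exact geotag; rejected in this sketch
        return False
    lon, lat = point["coordinates"]  # GeoJSON order: longitude first
    sw_lon, sw_lat, ne_lon, ne_lat = box
    return sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat


UK_BOX = (-6.38, 49.87, 1.77, 55.81)  # the box used in this answer's code

# Hypothetical tweets, for illustration only.
tweets = [
    {"text": "Manchester United win!", "coordinates": {"type": "Point", "coordinates": [-2.24, 53.48]}},
    {"text": "Manchester United win!", "coordinates": None},
    {"text": "Nice weather", "coordinates": {"type": "Point", "coordinates": [-0.13, 51.51]}},
    {"text": "manchester united in NYC", "coordinates": {"type": "Point", "coordinates": [-74.0, 40.7]}},
]

for t in tweets:
    print(matches(t, "manchester united", UK_BOX))
```

Only the first tweet passes: the second has no geotag, the third lacks the keyword, and the fourth lies outside the box.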

If you modify your code as follows, you will capture tweets from the UK, and those tweets are then filtered to show only the ones containing "manchester united":

import sys
import tweepy

consumer_key=""
consumer_secret=""
access_key=""
access_secret=""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)


class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        # The API matched on location only; apply the keyword filter here.
        if 'manchester united' in status.text.lower():
            print(status.text)

    def on_error(self, status_code):
        print('Encountered error with status code:', status_code, file=sys.stderr)
        return True  # Don't kill the stream

    def on_timeout(self):
        print('Timeout...', file=sys.stderr)
        return True  # Don't kill the stream


sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
# Bounding box for the UK: sw_lon, sw_lat, ne_lon, ne_lat
sapi.filter(locations=[-6.38, 49.87, 1.77, 55.81])
