Scrape Twitter's frontend API without any authentication or limits.

tweetscrape: detailed description of the Python project


Tweet Scrapper

License: GPL v3

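Twitter's API is annoying to work with and has lots of limitations. Luckily, their frontend (JavaScript) has its own API, which I reverse-engineered. No API rate limits. No restrictions. Extremely fast.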

You can use this library to easily get the text of any user's tweets. Follow the creator's blog at shirishkadam.com for updates on progress.

Installation

Built for Python 3.5.x and 3.6.x

$ pip install tweetscrape
$ python -m tweetscrape.twitter_scrape --help

Getting started

$ python -m tweetscrape.twitter_scrape -u "5hirish"  -n 60 -d "twitter.csv" -f "csv"
$ python -m tweetscrape.twitter_scrape --hashtag "#Python" -n 60 -d "twitter.csv" -f "csv"
$ python -m tweetscrape.twitter_scrape --all "Avengers" --mention "@Marvel" -n 20 -d "twitter.csv" -f "csv"
$ python -m tweetscrape.twitter_scrape --near "Brooklyn" -n 20 -d "twitter.csv" -f "csv"
$ python -m tweetscrape.twitter_scrape --from "@CNN" --since "2019-06-20" --until "2019-06-23" -n 20 -d "twitter.csv" -f "csv"
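When the scraper needs to be driven from another script, the command lines above can be assembled programmatically. The following is a minimal sketch, not part of tweetscrape: `build_scrape_command` is a hypothetical helper, and it only uses flags that appear in the examples above (`-u`, `--hashtag`, `-n`, `-d`, `-f`). It builds the argument list without running it.

```python
import sys


def build_scrape_command(count, dump_path, fmt="csv", user=None, hashtag=None):
    """Build an argv list for `python -m tweetscrape.twitter_scrape`.

    Hypothetical helper: it only mirrors flags shown in the examples above.
    """
    cmd = [sys.executable, "-m", "tweetscrape.twitter_scrape"]
    if user is not None:
        cmd += ["-u", user]
    if hashtag is not None:
        cmd += ["--hashtag", hashtag]
    cmd += ["-n", str(count), "-d", dump_path, "-f", fmt]
    return cmd


cmd = build_scrape_command(60, "twitter.csv", user="5hirish")
print(" ".join(cmd[1:]))

# To actually run it (requires tweetscrape to be installed and network access):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with values such as "#Python".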

Usage

from tweetscrape.profile_tweets import TweetScrapperProfile

tweet_scrapper = TweetScrapperProfile("5hirish", 40, 'twitter.csv', 'csv')
tweet_count, tweet_id, tweet_time, dump_path = tweet_scrapper.get_profile_tweets()
print("Extracted {0} tweets till {1} at {2}".format(tweet_count, tweet_time, dump_path))
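The same call pattern can be repeated for several profiles. The sketch below is an illustration rather than part of the library: `scrape_profiles` is a hypothetical helper, and it assumes only what the usage snippet above shows, namely a constructor taking (username, count, dump path, format) and a `get_profile_tweets()` method returning four values. The scraper class is passed in as a parameter so the helper can be tried with a stand-in class.

```python
def scrape_profiles(usernames, scrapper_cls, count=40, fmt="csv"):
    """Scrape several profiles, writing one dump file per user.

    `scrapper_cls` is expected to behave like TweetScrapperProfile in the
    usage snippet above; that behaviour is an assumption of this sketch.
    """
    results = {}
    for username in usernames:
        # Hypothetical naming scheme: one dump file per username.
        dump_path = "{0}.{1}".format(username, fmt)
        scrapper = scrapper_cls(username, count, dump_path, fmt)
        tweet_count, tweet_id, tweet_time, dump_path = scrapper.get_profile_tweets()
        results[username] = {
            "tweet_count": tweet_count,
            "tweet_id": tweet_id,
            "tweet_time": tweet_time,
            "dump_path": dump_path,
        }
    return results


# Intended use (requires tweetscrape to be installed and network access):
# from tweetscrape.profile_tweets import TweetScrapperProfile
# summary = scrape_profiles(["5hirish"], TweetScrapperProfile)
```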

Read more about tweetscrape usage here: USAGE.md

id,type,time,author,author_id,re_tweeter,associated_tweet,text,links,hashtags,mentions,reply_count,favorite_count,retweet_count
993872079274508289,tweet,1525792543000,5hirish,428808036,,993872079274508289,"Built @twitter #scrapper inspired by @kennethreitz similar project. Does a bunch of other cool stuff like extracting user tweets with all meta-data, hastags, images, likes, etc. extracting tweets based on keyword or hastag search #python @Github https://github.com/5hirish/tweet_scrapper …pic.twitter.com/bXdnrWXNwr","['https://t.co/ID5hJ6InIu', 'https://t.co/bXdnrWXNwr']","['#scrapper', '#python']","['@Twitter', '@kennethreitz', '@github']",1,14,7
1141791578970894338,tweet,1561059300000,gracecondition,127701253,5hirish,1141791578970894338,everyone else using word2vec:king – man + woman = queenme using word2vec:fish + music = bassfish + friend = chumfish + hair = mulletfish + struggle = flounderoink - pig + bro = wassupyeti – snow + economics = homo economicushttps://graceavery.com/word2vec-fish-music-bass/ …,['https://t.co/UAiViuEnM2'],[],[],17,939,227
1141849459342610437,tweet,1561073100000,Reuters,1652541,5hirish,1141849459342610437,WATCH: Elon Musk gives #E3 audience a preview of gaming in Tesla carspic.twitter.com/u7rVedhDyW,['https://t.co/u7rVedhDyW'],"['#E3', '#E3']",[],3,49,18
1141812196453699584,tweet,1561064216000,xamat,9316452,5hirish,1141812196453699584,"The annoying pop-up about cookies on websites is basically teaching me to click ""ok"" on anything that gets in my way asap, which seems very dangerous and exactly the opposite of what is intended.",[],[],[],1,23,3
1141897990627446784,tweet,1561084671000,data_mike_j,1053368990695706624,5hirish,1141897990627446784,Check out my newest blog post where I build a graph visualization of the #MuellerReport using @spacy_io and #Python including paragraph recommendation engine.https://minimizeuncertainty.com/post/graph-visualization-of-the-mueller-report-with-spacy-and-pyvis/ …,['https://t.co/Q5GGKqmbYv'],"['#MuellerReport', '#Python']",['@spacy_io'],0,31,13
1142137189775507456,tweet,1561141700000,5hirish,428808036,,1142137187783213056,"Share your weekend goals here, could be anything, like reading a book, writing a blog post, preparing your favorite dessert or anything that will give you a positive feeling of #accomplishment for the coming week. Let's check back on Monday.",[],['#accomplishment'],[],1,0,0
1142137187783213056,tweet,1561141700000,5hirish,428808036,,1142137187783213056,"Weekend Goal: Convert my Flask app into a RESTful Flask API app template with Unit tests, Travis CI and Swagger docs. #python Repo: https://github.com/5hirish/flask-restful-template … (Contributors Welcome!)#weekendgoal #accountability",['https://t.co/m7isdCd6cc'],"['#python', '#weekendgoal', '#accountability']",[],1,3,1
1141840676394434560,tweet,1561071006000,naval,745273,5hirish,1141840676394434560,"Lasting novels don’t come from literature departments. Successful businesses don’t come from business schools. Scientific revolutions don’t come from research universities.Get your education, then get moving. Find the loners tinkering at the edge.",[],[],[],149,10188,2562
1141740790542213121,tweet,1561047191000,WSJ,3108351,5hirish,1141740790542213121,"Slack shares open at $38.50 in their trading debut, above $26 reference price and giving the company a valuation of about $23.2 billionhttps://on.wsj.com/2Xmkitn",['https://t.co/uo7yCGqSmC'],[],[],3,76,48
1141511813709717504,tweet,1560992599000,quocleix,989251872107085824,5hirish,1141511813709717504,"XLNet: a new pretraining method for NLP that significantly improves upon BERT on 20 tasks (e.g., SQuAD, GLUE, RACE)arxiv: https://arxiv.org/abs/1906.08237 github (code + pretrained models): https://github.com/zihangdai/xlnet with Zhilin Yang, @ZihangDai, Yiming Yang, Jaime Carbonell, @rsalakhupic.twitter.com/JboOekUVPQ","['https://t.co/C1tFMwZvyW', 'https://t.co/kI4jsVzT1u', 'https://t.co/JboOekUVPQ']",[],"['@ZihangDai', '@rsalakhu']",21,1763,715
1141736965311569920,tweet,1561046279000,justinkan,28917111,5hirish,1141736965311569920,"One of the most important skills I’ve built is the ability to sit with discomfort. Being able to be uncomfortable (bored, on the receiving end of anger, in pain) and not needing to escape has changed my happiness and my life. It would have seemed impossible to me 12 months ago.",[],[],[],38,1578,242
....
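A dump like the sample above can be read back with the standard library alone. The sketch below is not part of tweetscrape and rests on two assumptions drawn from the sample rows: the `time` column holds milliseconds since the Unix epoch, and the `links`, `hashtags` and `mentions` columns hold Python-style list literals. The embedded row is an abbreviated copy of the first sample row, and `parse_row` and `load_tweets` are hypothetical names.

```python
import ast
import csv
import io
from datetime import datetime, timezone

LIST_COLUMNS = ("links", "hashtags", "mentions")
COUNT_COLUMNS = ("reply_count", "favorite_count", "retweet_count")


def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed values."""
    tweet = dict(row)
    # Assumption: `time` is milliseconds since the Unix epoch.
    tweet["time"] = datetime.fromtimestamp(int(row["time"]) / 1000, tz=timezone.utc)
    for col in LIST_COLUMNS:
        # Assumption: list columns are stored as Python list literals.
        tweet[col] = ast.literal_eval(row[col]) if row[col] else []
    for col in COUNT_COLUMNS:
        tweet[col] = int(row[col])
    return tweet


def load_tweets(fileobj):
    """Read every row from an open CSV file object."""
    return [parse_row(row) for row in csv.DictReader(fileobj)]


# Abbreviated copy of the first sample row (tweet text shortened).
SAMPLE = (
    "id,type,time,author,author_id,re_tweeter,associated_tweet,text,links,"
    "hashtags,mentions,reply_count,favorite_count,retweet_count\n"
    "993872079274508289,tweet,1525792543000,5hirish,428808036,,"
    "993872079274508289,\"Built @twitter #scrapper\","
    "\"['https://t.co/ID5hJ6InIu']\",\"['#scrapper', '#python']\","
    "\"['@Twitter']\",1,14,7\n"
)

tweets = load_tweets(io.StringIO(SAMPLE))
print(tweets[0]["time"].isoformat(), tweets[0]["hashtags"])
```

For a real dump, open the file with `open("twitter.csv", newline="", encoding="utf-8")` and pass it to `load_tweets`. `ast.literal_eval` is used instead of `eval` because it only accepts literals and does not execute code from the file.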

Requirements

Python package dependencies are listed in requirements.txt

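Features

  • Extract user tweets with all metadata
  • Extract external links, hashtags and mentions from a tweet
  • Extract a tweet's reply, favorite and retweet counts
  • Export data to a file in CSV or JSON format with UTF-8 encoding
  • Scrape tweets in a recursive and greedy manner
  • Support for proxied requests and request delays
  • Extract user information, including profile details, location and statistics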

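To do

  • [X] Extract tweets from a Twitter user's profile
  • [X] Extract tweets from Twitter search using advanced filters
  • [X] Export tweets to a file
  • [X] Support infinite scrolling
  • [X] Extract the tweets of a Twitter thread, given that thread
  • [ ] Extract quoted tweets along with the tweet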

Contributing

See the contributing documentation for some tips on getting started.

Maintainers
