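VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and it also works well on texts from other domains.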



==================================== VADER Sentiment Analysis

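VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. It is fully open-sourced under the [MIT License] <http://choosealicense.com/> (we sincerely appreciate all attributions and readily accept most contributions, but please don't hold us liable).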

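  • Features and Updates
  • Introduction
  • Citation Information
  • Installation
  • Resources and Dataset Descriptions
  • Python Code Example
  • About the Scoring
  • Ports to Other Programming Languages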

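==================================== Features and Updates

Many thanks to George Berry, Ewan Klein, and Pierpaolo Pantone for key contributions to VADER. The new updates include the following capabilities: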

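1. Refactoring for Python 3 compatibility, improved modularity, and incorporation into [NLTK] <http://www.nltk.org/_modules/nltk/sentiment/vader.html> ...many thanks to Ewan & Pierpaolo.
2. Restructuring for improved speed/performance, reducing the time complexity from O(N^4) to O(N) ...many thanks to George.
3. Simplified pip install and better support for vaderSentiment module and component import. (The dependency on the vader_lexicon.txt file now uses automated file location discovery, so you don't need to manually designate its location in the code or copy the file into your executing code's directory.)
4. More complete demo in the __main__ for vader_sentiment.py. The demo has: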

* examples of typical use cases for sentiment analysis, including proper handling of sentences with:

	- typical negations (e.g., "*not* good")
	- use of contractions as negations (e.g., "*wasn't* very good")
	- conventional use of **punctuation** to signal increased sentiment intensity (e.g., "Good!!!")
	- conventional use of **word-shape** to signal emphasis (e.g., using ALL CAPS for words/phrases)
	- using **degree modifiers** to alter sentiment intensity (e.g., intensity *boosters* such as "very" and intensity *dampeners* such as "kind of")
	- understanding many **sentiment-laden slang** words (e.g., 'sux')
	- understanding many sentiment-laden **slang words as modifiers** such as 'uber' or 'friggin' or 'kinda'
	- understanding many sentiment-laden **emoticons** such as :) and :D
	- translating **utf-8 encoded emojis** such as 💘 and 💋 and 😁
	- understanding sentiment-laden **initialisms and acronyms** (for example: 'lol')

* more examples of **tricky sentences** that confuse other sentiment analysis tools
* example for how VADER can work in conjunction with NLTK to do **sentiment analysis on longer texts**...i.e., decomposing paragraphs, articles/reports/publications, or novels into sentence-level analyses
* examples of a concept for assessing the sentiment of images, video, or other tagged **multimedia content**
* if you have access to the Internet, the demo has an example of how VADER can work with analyzing sentiment of **texts in other languages** (non-English text sentences).
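The sentence-level decomposition of longer texts mentioned in the list above can be sketched as follows. This is a minimal illustration, not the demo's code: the regex splitter is a naive stand-in for the NLTK sentence tokenizer, `toy_score` is a hypothetical placeholder for a real per-sentence compound score, and averaging the per-sentence scores is one simple aggregation choice assumed here.

```python
import re
from statistics import mean
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def paragraph_score(text: str, score_fn: Callable[[str], float]) -> float:
    """Average a per-sentence score over every sentence in the text."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return mean(score_fn(s) for s in sentences)


def toy_score(sentence: str) -> float:
    """Hypothetical stand-in scorer; swap in a real compound score here."""
    lowered = sentence.lower()
    if "good" in lowered:
        return 0.5
    if "bad" in lowered:
        return -0.5
    return 0.0


paragraph = "The food was good. The service was good. The decor was plain."
print(split_sentences(paragraph))
print(round(paragraph_score(paragraph, toy_score), 3))
```

Passing the scorer in as a function keeps the splitting and aggregation logic independent of whichever sentiment engine supplies the per-sentence value.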

==================================== Introduction

This README file describes the dataset of the paper:

|  **VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text**
|  (by C.J. Hutto and Eric Gilbert) 
|  Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014. 

| For questions, please contact:
|     C.J. Hutto
|     Georgia Institute of Technology, Atlanta, GA 30032
|     cjhutto [at] gatech [dot] edu

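==================================== Citation Information

If you use either the dataset or any of the VADER sentiment analysis tools (the VADER sentiment lexicon or the Python code for the rule-based sentiment analysis engine) in your research, please cite the above paper. For example:

Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.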

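==================================== Installation

There are a couple of ways to install and use VADER sentiment:

1. The simplest is to use the command line to do an installation from [PyPI] <https://pypi.python.org/pypi/vaderSentiment> using pip, e.g.,
   > pip install vaderSentiment
2. Or, you might already have VADER and simply need to upgrade to the latest version, e.g.,
   > pip install --upgrade vaderSentiment
3. You could also clone this [GitHub repository] <https://github.com/holek/vader_sentiment>
4. You could download and unzip the [full master branch zip file] <https://github.com/holek/vader_sentiment/archive/master.zip>

In addition to the VADER sentiment analysis Python module, options 3 or 4 will also download all the additional resources and datasets (described below).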
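After a pip install, a quick way to confirm that the package is visible to the current interpreter is to look it up without importing it. The sketch below assumes the top-level module is named `vaderSentiment`, as in the upstream package; adjust the name if your installation differs. The helper name `is_installed` is made up for this example.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Assumed module name; a fork may install under a different one.
print("vaderSentiment installed:", is_installed("vaderSentiment"))
```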

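==================================== Resources and Dataset Descriptions

The package here includes PRIMARY RESOURCES (items 1-3) as well as additional DATASETS AND TESTING RESOURCES (items 4-12):

1. vader_icwsm2014_final.pdf
The original paper for the dataset, see citation information (above).

2. vader_lexicon.txt
FORMAT: the file is tab delimited with TOKEN, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-HUMAN-SENTIMENT-RATINGS
NOTE: The current algorithm makes immediate use of the first two elements (token and mean valence). The final two elements (SD and raw ratings) are provided for rigor. For example, if you want to follow the same rigorous process that we used for the study, you should find 10 independent humans to evaluate/rate each new token you want to add to the lexicon, make sure the standard deviation doesn't exceed 2.5, and take the average rating for the valence. This will keep the file consistent.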

DESCRIPTION: 
Empirically validated by multiple independent human judges, VADER incorporates a "gold-standard" sentiment lexicon that is especially attuned to microblog-like contexts.

The VADER sentiment lexicon is sensitive to both the **polarity** and the **intensity** of sentiments expressed in social media contexts, and is also generally applicable to sentiment analysis in other domains.

Sentiment ratings from 10 independent human raters (all pre-screened, trained, and quality checked for optimal inter-rater reliability). Over 9,000 token features were rated on a scale from "[–4] Extremely Negative" to "[4] Extremely Positive", with allowance for "[0] Neutral (or Neither, N/A)".  We kept every lexical feature that had a non-zero mean rating, and whose standard deviation was less than 2.5 as determined by the aggregate of those ten independent raters.  This left us with just over 7,500 lexical features with validated valence scores that indicated both the sentiment polarity (positive/negative), and the sentiment intensity on a scale from –4 to +4. For example, the word "okay" has a positive valence of 0.9, "good" is 1.9, and "great" is 3.1, whereas "horrible" is –2.5, the frowning emoticon :( is –2.2, and "sucks" and its slang derivative "sux" are both –1.5.

Manually creating (much less, validating) a comprehensive sentiment lexicon is a labor-intensive and sometimes error-prone process, so it is no wonder that many opinion mining researchers and practitioners rely so heavily on existing lexicons as primary resources. We are pleased to offer ours as a new resource. We began by constructing a list inspired by examining existing well-established sentiment word-banks (LIWC, ANEW, and GI). To this, we next incorporated numerous lexical features common to sentiment expression in microblogs, including:

* a full list of Western-style emoticons, for example, :-) denotes a smiley face and generally indicates positive sentiment
* sentiment-related acronyms and initialisms (e.g., LOL and WTF are both examples of sentiment-laden initialisms)
* commonly used slang with sentiment value (e.g., nah, meh and giggly). 

We empirically confirmed the general applicability of each feature candidate to sentiment expressions using a wisdom-of-the-crowd (WotC) approach (Surowiecki, 2004) to acquire a valid point estimate for the sentiment valence (polarity & intensity) of each context-free candidate feature. 
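The admission rule described above (ten independent ratings, a non-zero mean, and a standard deviation below 2.5) can be sketched as follows. This is an illustration under stated assumptions: the rating values are invented, the function names are hypothetical, and the use of the population standard deviation (`pstdev`) is a guess, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev
from typing import List, Optional, Tuple

MAX_SD = 2.5  # threshold quoted in the lexicon description


def summarize_ratings(token: str, ratings: List[int]) -> Optional[Tuple[str, float, float]]:
    """Return (token, mean, sd) if the token passes the filter, else None."""
    avg = mean(ratings)
    sd = pstdev(ratings)  # assumption: population SD; the source does not specify
    if avg == 0 or sd >= MAX_SD:
        return None  # neutral on average, or raters disagree too much
    return token, round(avg, 1), round(sd, 5)


def format_lexicon_line(token: str, avg: float, sd: float, ratings: List[int]) -> str:
    """Tab-delimited line: TOKEN, MEAN, SD, RAW RATINGS (layout of the last field assumed)."""
    return "\t".join([token, str(avg), str(sd), str(ratings)])


# Invented ratings on the -4..+4 scale, for illustration only.
candidate_ratings = [2, 2, 1, 2, 2, 2, 2, 2, 2, 2]
summary = summarize_ratings("good", candidate_ratings)
if summary is not None:
    print(format_lexicon_line(*summary, candidate_ratings))
```

A token whose raters split evenly between strongly positive and strongly negative would be rejected twice over: its mean would be zero and its standard deviation would exceed the threshold.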

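3. vaderSentiment.py
The Python code for the rule-based sentiment analysis engine. It implements the grammatical and syntactical rules described in the paper, incorporating empirically derived quantifications for the impact of each rule on the perceived intensity of sentiment in sentence-level text. Importantly, these heuristics go beyond what would normally be captured in a typical bag-of-words model. They incorporate word-order sensitive relationships between terms. For example, degree modifiers (also called intensifiers, booster words, or degree adverbs) impact sentiment intensity by either increasing or decreasing the intensity. Consider these examples: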

(a) "The service here is extremely good" 
(b) "The service here is good" 
(c) "The service here is marginally good" 

From Table 3 in the paper, we see that for 95% of the data, using a degree modifier increases the positive sentiment intensity of example (a) by 0.227 to 0.36, with a mean difference of 0.293 on a rating scale from 1 to 4. Likewise, example (c) reduces the perceived sentiment intensity by 0.293, on average.
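The effect of a degree modifier can be sketched as a fixed shift applied to a word's base valence. The snippet below is an illustration only: the word sets and the function name are hypothetical, and applying the 0.293 mean difference directly to a lexicon valence is an assumption made for this example rather than a description of the engine's exact rule.

```python
from typing import Optional

BOOST = 0.293  # mean difference quoted from Table 3 of the paper

# Hypothetical, tiny modifier sets for illustration only.
BOOSTERS = {"extremely", "very"}
DAMPENERS = {"marginally", "slightly"}


def adjust_valence(valence: float, modifier: Optional[str]) -> float:
    """Shift a base valence away from zero (booster) or toward zero (dampener)."""
    if modifier is None:
        return valence
    word = modifier.lower()
    if word in BOOSTERS:
        delta = BOOST
    elif word in DAMPENERS:
        delta = -BOOST
    else:
        return valence  # unknown modifier: leave the valence unchanged
    # For negative words the shift is mirrored so that boosters intensify them.
    return valence + delta if valence >= 0 else valence - delta


good = 1.9  # valence of "good" given in the lexicon description
for label, modifier in [("(a)", "extremely"), ("(b)", None), ("(c)", "marginally")]:
    print(label, round(adjust_valence(good, modifier), 3))
```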

4. tweets_GroundTruth.txt
FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, and TWEET-TEXT

DESCRIPTION: includes "tweet-like" text as inspired by 4,000 tweets pulled from Twitter’s public timeline, plus 200 completely contrived tweet-like texts intended to specifically test syntactical and grammatical conventions of conveying differences in sentiment intensity. The "tweet-like" texts incorporate a fictitious username (@anonymous) in places where a username might typically appear, along with a fake URL (http://url_removed) in places where a URL might typically appear, as inspired by the original tweets. The ID and MEAN-SENTIMENT-RATING correspond to the raw sentiment rating data provided in 'tweets_anonDataRatings.txt' (described below).
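A ground-truth file in this three-column layout (ID, mean rating, text, separated by tabs) can be read with a few lines of Python. This is a minimal sketch: the two sample rows are invented for illustration, and the names `RatedText` and `read_ground_truth` are hypothetical.

```python
import io
from typing import Iterable, List, NamedTuple


class RatedText(NamedTuple):
    text_id: str
    mean_rating: float
    text: str


def read_ground_truth(lines: Iterable[str]) -> List[RatedText]:
    """Parse tab-delimited rows of ID, MEAN-SENTIMENT-RATING, TEXT."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first two tabs only, so tabs inside the text survive.
        text_id, rating, text = line.split("\t", 2)
        rows.append(RatedText(text_id, float(rating), text))
    return rows


# Invented sample rows; a real file would be opened with open(path, encoding="utf-8").
sample = io.StringIO(
    "1\t2.7\tSomehow I was blessed with some really amazing friends :)\n"
    "2\t-1.4\t@anonymous this is not okay http://url_removed\n"
)
for row in read_ground_truth(sample):
    print(row.text_id, row.mean_rating, row.text)
```

The same reader should apply to the other GroundTruth files described in this section, since they are described with the same three-column layout.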

5. tweets_anonDataRatings.txt
FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-SENTIMENT-RATINGS

DESCRIPTION: Sentiment ratings from a minimum of 20 independent human raters (all pre-screened, trained, and quality checked for optimal inter-rater reliability).

6. nytEditorialSnippets_GroundTruth.txt
FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, and TEXT-SNIPPET

DESCRIPTION: includes 5,190 sentence-level snippets from 500 New York Times opinion news editorials/articles; we used the NLTK tokenizer to segment the articles into sentence phrases, and added sentiment intensity ratings. The ID and MEAN-SENTIMENT-RATING correspond to the raw sentiment rating data provided in 'nytEditorialSnippets_anonDataRatings.txt' (described below).

7. nytEditorialSnippets_anonDataRatings.txt
FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-SENTIMENT-RATINGS

DESCRIPTION: Sentiment ratings from a minimum of 20 independent human raters (all pre-screened, trained, and quality checked for optimal inter-rater reliability).

8. movieReviewSnippets_GroundTruth.txt
FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, and TEXT-SNIPPET

DESCRIPTION: includes 10,605 sentence-level snippets from rotten.tomatoes.com. The snippets were derived from an original set of 2000 movie reviews (1000 positive and 1000 negative) in Pang & Lee (2004); we used the NLTK tokenizer to segment the reviews into sentence phrases, and added sentiment intensity ratings. The ID and MEAN-SENTIMENT-RATING correspond to the raw sentiment rating data provided in 'movieReviewSnippets_anonDataRatings.txt' (described below).

9. movieReviewSnippets_anonDataRatings.txt
FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-SENTIMENT-RATINGS

DESCRIPTION: Sentiment ratings from a minimum of 20 independent human raters (all pre-screened, trained, and quality checked for optimal inter-rater reliability).

10. amazonReviewSnippets_GroundTruth.txt
FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, and TEXT-SNIPPET

11. amazonReviewSnippets_anonDataRatings.txt
FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-SENTIMENT-RATINGS

DESCRIPTION: Sentiment ratings from a minimum of 20 independent human raters (all pre-screened, trained, and quality checked for optimal inter-rater reliability).

12. Comp.Social website with more papers/research: Comp.Social

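==================================== Python Code Example

For a more complete demo, point your terminal to VADER's install directory (e.g., if you installed using pip, it might be \Python3x\lib\site-packages\vaderSentiment), and then run python vaderSentiment.py

The demo has more examples of tricky sentences that confuse other sentiment analysis tools. It also demonstrates how VADER can work in conjunction with NLTK to do sentiment analysis on longer texts, i.e., decomposing paragraphs, articles/reports/publications, or novels into sentence-level analyses. It also demonstrates a concept for assessing the sentiment of images, video, or other tagged multimedia content.

If you have access to the Internet, the demo will also show how VADER can analyze the sentiment of non-English text sentences.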
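A minimal usage sketch follows. It assumes the import path and class name of the upstream vaderSentiment package (`vaderSentiment.vaderSentiment.SentimentIntensityAnalyzer` and its `polarity_scores` method); a fork may expose a different module name. The import is guarded so that the script still runs, printing nothing, when the package is not installed. The helper name `score_sentences` and the sample sentences are chosen for illustration.

```python
try:
    # Import path of the upstream vaderSentiment package (an assumption here).
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
except ImportError:
    SentimentIntensityAnalyzer = None


def score_sentences(sentences):
    """Return {sentence: scores}; empty when the package is unavailable."""
    if SentimentIntensityAnalyzer is None:
        return {}
    analyzer = SentimentIntensityAnalyzer()
    # polarity_scores returns a dict with 'neg', 'neu', 'pos', and 'compound'.
    return {s: analyzer.polarity_scores(s) for s in sentences}


sentences = [
    "VADER is smart, handsome, and funny.",
    "The book was kind of good.",
    "Today SUX!",
]
for sentence, scores in score_sentences(sentences).items():
    print(f"{sentence:<40} {scores}")
```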

For a more complete demo, go to the install directory and run python vaderSentiment.py. (Be sure you are set to handle UTF-8 encoding in your terminal or IDE.)

==================================== Output for the Above Example Code

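==================================== About the Scoring

  • The compound score is computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to be between -1 (most extreme negative) and +1 (most extreme positive). This is the most useful metric if you want a single unidimensional measure of sentiment for a given sentence. Calling it a "normalized, weighted composite score" is accurate.

    It is also useful for researchers who would like to set standardized thresholds for classifying sentences as either positive, neutral, or negative.
    Typical threshold values (used in the literature cited on this page) are:

    1. positive sentiment: compound score >= 0.05
    2. neutral sentiment: (compound score > -0.05) and (compound score < 0.05)
    3. negative sentiment: compound score <= -0.05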

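  • The pos, neu, and neg scores are ratios for the proportions of text that fall in each category (so these should all add up to 1, or close to it given floating-point operations). These are the most useful metrics if you want multidimensional measures of sentiment for a given sentence.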
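The two kinds of score described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the engine's code: the normalization formula `x / sqrt(x^2 + alpha)` with `alpha = 15` and the "+1 per sentiment-bearing word" weighting in `proportions` are assumptions not stated in this README, and the input is taken to be a list of per-word valences that have already been adjusted by the rules. Only the 0.05 thresholds come directly from the text.

```python
import math
from typing import Dict, List


def normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into [-1, 1] (alpha is an assumed constant)."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def classify(compound: float) -> str:
    """Apply the typical thresholds quoted above."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def proportions(valences: List[float]) -> Dict[str, float]:
    """Share of the text that is negative, neutral, and positive (simplified)."""
    # Each sentiment-bearing word gets +1 so it is comparable to a neutral
    # word, which counts as 1 (an assumed weighting).
    pos_sum = sum(v + 1 for v in valences if v > 0)
    neg_sum = sum(-v + 1 for v in valences if v < 0)
    neu_count = sum(1 for v in valences if v == 0)
    total = pos_sum + neg_sum + neu_count
    if total == 0:
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0}
    return {
        "neg": round(neg_sum / total, 3),
        "neu": round(neu_count / total, 3),
        "pos": round(pos_sum / total, 3),
    }


# Three neutral words followed by "good" (valence 1.9 in the lexicon description).
valences = [0.0, 0.0, 0.0, 1.9]
compound = normalize(sum(valences))
print(round(compound, 4), classify(compound), proportions(valences))
```

Because the three proportions are rounded independently, their sum can differ from 1 by a small amount, which matches the caveat in the text.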

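==================================== Ports to Other Programming Languages

Feel free to let me know about ports of VADER Sentiment to other programming languages. So far, I know about these helpful ports: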

Java: VaderSentimentJava <https://github.com/apanimesh061/VaderSentimentJava>, by apanimesh061

JavaScript: vaderSentiment-js <https://github.com/vaderSentiment/vaderSentiment-js>, by nimaeskandary

PHP: php-vadersentiment <https://github.com/abusby/php-vadersentiment>, by abusby

Scala: Sentiment <https://github.com/ziyasal/Sentiment>, by ziyasal
