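<p>As @engineercoding pointed out in a comment on @rmalouf's answer, the Treebank tag set contains more tags than WordNet has POS categories; see <a href="https://web.stanford.edu/~jurafsky/slp3/10.pdf" rel="nofollow noreferrer">here for details</a>.</p>
<p>The following mapping covers as many bases as possible, and it also explicitly lists the POS tags that have no match in WordNet:</p>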
<pre><code># Create a map between Treebank and WordNet
from nltk.corpus import wordnet as wn
# WordNet POS tags are: NOUN = 'n', ADJ = 'a', VERB = 'v', ADV = 'r', ADJ_SAT = 's'
# Descriptions (c) https://web.stanford.edu/~jurafsky/slp3/10.pdf
tag_map = {
'CC':None, # coordin. conjunction (and, but, or)
'CD':wn.NOUN, # cardinal number (one, two)
'DT':None, # determiner (a, the)
'EX':wn.ADV, # existential ‘there’ (there)
'FW':None, # foreign word (mea culpa)
'IN':wn.ADV, # preposition/sub-conj (of, in, by)
'JJ':[wn.ADJ, wn.ADJ_SAT], # adjective (yellow)
'JJR':[wn.ADJ, wn.ADJ_SAT], # adj., comparative (bigger)
'JJS':[wn.ADJ, wn.ADJ_SAT], # adj., superlative (wildest)
'LS':None, # list item marker (1, 2, One)
'MD':None, # modal (can, should)
'NN':wn.NOUN, # noun, sing. or mass (llama)
'NNS':wn.NOUN, # noun, plural (llamas)
'NNP':wn.NOUN, # proper noun, sing. (IBM)
'NNPS':wn.NOUN, # proper noun, plural (Carolinas)
'PDT':[wn.ADJ, wn.ADJ_SAT], # predeterminer (all, both)
'POS':None, # possessive ending (’s )
'PRP':None, # personal pronoun (I, you, he)
'PRP$':None, # possessive pronoun (your, one’s)
'RB':wn.ADV, # adverb (quickly, never)
'RBR':wn.ADV, # adverb, comparative (faster)
'RBS':wn.ADV, # adverb, superlative (fastest)
'RP':[wn.ADJ, wn.ADJ_SAT], # particle (up, off)
'SYM':None, # symbol (+,%, &)
'TO':None, # “to” (to)
'UH':None, # interjection (ah, oops)
'VB':wn.VERB, # verb base form (eat)
'VBD':wn.VERB, # verb past tense (ate)
'VBG':wn.VERB, # verb gerund (eating)
'VBN':wn.VERB, # verb past participle (eaten)
'VBP':wn.VERB, # verb non-3sg pres (eat)
'VBZ':wn.VERB, # verb 3sg pres (eats)
'WDT':None, # wh-determiner (which, that)
'WP':None, # wh-pronoun (what, who)
'WP$':None, # possessive wh- (whose)
'WRB':None, # wh-adverb (how, where)
'$':None, # dollar sign ($)
'#':None, # pound sign (#)
'``':None, # left quote (‘ or “)
"''":None, # right quote (’ or ”)
'(':None, # left parenthesis ([, (, {, <)
')':None, # right parenthesis (], ), }, >)
',':None, # comma (,)
'.':None, # sentence-final punc (. ! ?)
':':None # mid-sentence punc (: ; ... – -)
}
</code></pre>
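<p>Because the values in the mapping are a mix of <code>None</code>, a single tag, and a list of tags, a small lookup helper can make it easier to use. The following is only a sketch: the helper name <code>wordnet_pos_candidates</code> and the shortened <code>tag_map_excerpt</code> are hypothetical, and the WordNet constants are written as plain strings (the values of <code>wn.NOUN</code>, <code>wn.VERB</code>, and so on) so that the snippet runs without NLTK data installed.</p>

```python
# WordNet POS constants written out as plain strings; these are the values
# of wn.NOUN, wn.VERB, wn.ADJ, wn.ADJ_SAT and wn.ADV in nltk.corpus.wordnet.
NOUN, VERB, ADJ, ADJ_SAT, ADV = 'n', 'v', 'a', 's', 'r'

# Hypothetical excerpt of the full mapping above, for illustration only.
tag_map_excerpt = {
    'DT': None,
    'JJ': [ADJ, ADJ_SAT],
    'NN': NOUN,
    'NNS': NOUN,
    'RB': ADV,
    'VBD': VERB,
}

def wordnet_pos_candidates(treebank_tag, mapping):
    """Return the WordNet POS values for a Treebank tag as a list.

    The list is empty when the tag is unknown or is mapped to None.
    """
    value = mapping.get(treebank_tag)
    if value is None:
        return []
    if isinstance(value, list):
        return list(value)  # copy so callers cannot mutate the mapping
    return [value]

# Example input in the (word, tag) shape that a Treebank-style tagger produces.
tagged = [('the', 'DT'), ('llamas', 'NNS'), ('ate', 'VBD'), ('quickly', 'RB')]
for word, tag in tagged:
    print(word, tag, wordnet_pos_candidates(tag, tag_map_excerpt))
```

<p>With NLTK available, the first candidate (if there is one) could then be passed as the <code>pos</code> argument of <code>WordNetLemmatizer().lemmatize(word, pos=...)</code>; words whose tag yields an empty list can be left unchanged.</p>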