Python: how to assign one-hot encoded vectors to string values

from sklearn.preprocessing import LabelBinarizer
encoder = LabelBinarizer()
transfomed_label = encoder.fit_transform([
    "CC", "CD", "DT", "EX", "FW", "IN", "JJ", "JJR", "JJS", "LS", "MD", "NN",
    "NNS", "NNP", "NNPS", "PDT", "POS", "PRP", "PRP$", "RB", "RBR", "RBS", "RP",
    "SYM", "TO", "UH", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "WDT", "WP",
    "WP$", "WRB"])
# print(transfomed_label)
# START: get the mapping between each label and its index
# print(encoder.classes_)
labels = encoder.classes_
mappings = {}
for index, label in enumerate(labels):
    mappings[label] = index
# print(mappings)
# END: get the mapping between each label and its index
for item in transfomed_label:
    print(item)

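2 answers

User

Answer 1 · edited 2024-05-16 23:23:52

First, let's get all the POS tags from the nltk package. (Careful! The tag set depends on the Penn Treebank for the language you are working with.)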

 pos_tags_list = ['CC', 'CD', 'EX', 'FW', 'IN', 'JJ', 'JJR', 'JJS', 'LS', 'MD', 'NN', 'NNS','NNP', 'NNPS', 'PDT', 'POS', 'PRP', 'PRP$', 'RB', 'RBR', 'RBS', 'RP', 'TO', 'UH', 'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ', 'WDT', 'WP', 'WP$', 'WRB']

Now build two mapping dictionaries:

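The two dictionaries could look something like the sketch below. This is only an illustration of the idea, not the answerer's original snippet; the names tag2idx and idx2tag are assumptions chosen here, mapping each tag to an integer index and back.

```python
# Hypothetical names: tag2idx and idx2tag are illustrative, not from the original answer.
pos_tags_list = ['CC', 'CD', 'EX', 'FW', 'IN', 'JJ', 'JJR', 'JJS', 'LS', 'MD',
                 'NN', 'NNS', 'NNP', 'NNPS', 'PDT', 'POS', 'PRP', 'PRP$', 'RB',
                 'RBR', 'RBS', 'RP', 'TO', 'UH', 'VB', 'VBD', 'VBG', 'VBN',
                 'VBP', 'VBZ', 'WDT', 'WP', 'WP$', 'WRB']

# Tag -> integer index, following the order of the list.
tag2idx = {tag: idx for idx, tag in enumerate(pos_tags_list)}

# Integer index -> tag, the inverse lookup.
idx2tag = {idx: tag for tag, idx in tag2idx.items()}

print(tag2idx['NN'])
print(idx2tag[0])
```

Keeping both directions makes it easy to turn tags into indices for encoding and to turn predicted indices back into tags later.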

Now you can extract all the tags from a sentence and encode them with scikit-learn's OneHotEncoder, pandas' get_dummies, or Keras' to_categorical method.
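As a rough illustration of what these encoders produce, the sketch below one-hot encodes the tags of one sentence with plain NumPy by selecting rows of an identity matrix, which is the same idea as applying to_categorical to integer indices. The sample tags and the name tag2idx are assumptions made for the example, not part of the original answer.

```python
import numpy as np

pos_tags_list = ['CC', 'CD', 'EX', 'FW', 'IN', 'JJ', 'JJR', 'JJS', 'LS', 'MD',
                 'NN', 'NNS', 'NNP', 'NNPS', 'PDT', 'POS', 'PRP', 'PRP$', 'RB',
                 'RBR', 'RBS', 'RP', 'TO', 'UH', 'VB', 'VBD', 'VBG', 'VBN',
                 'VBP', 'VBZ', 'WDT', 'WP', 'WP$', 'WRB']

# Hypothetical helper: tag -> integer index, in list order.
tag2idx = {tag: idx for idx, tag in enumerate(pos_tags_list)}

# Assumed sample: tags already extracted from one tagged sentence.
sentence_tags = ['PRP', 'VBP', 'NN']

# Convert tags to indices, then pick the matching rows of the identity matrix.
indices = [tag2idx[tag] for tag in sentence_tags]
one_hot = np.eye(len(pos_tags_list), dtype=int)[indices]

print(one_hot.shape)  # (3, 34): one row per tag, one column per known tag
```

Each row contains a single 1 at the column given by tag2idx, so argmax on a row recovers the index of the tag.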

User

Answer 2 · edited 2024-05-16 23:23:52

If I understand correctly, you want something like this:

# tagged is a list of (word, tag) pairs; Python 3 has no xrange, so iterate directly
res = [transfomed_label[mappings[tag]] for _, tag in tagged]
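One caveat with indexing transfomed_label through mappings: LabelBinarizer keeps classes_ in sorted order, while the rows of transfomed_label follow the order of the list passed to fit_transform. In the question's list NNS comes before NNP, so the two orders disagree for NNS, NNP and NNPS. A sketch that sidesteps this is to call encoder.transform on the tags directly. The tagged sample below is an assumption standing in for the output of a POS tagger.

```python
from sklearn.preprocessing import LabelBinarizer

tags = ["CC", "CD", "DT", "EX", "FW", "IN", "JJ", "JJR", "JJS", "LS", "MD",
        "NN", "NNS", "NNP", "NNPS", "PDT", "POS", "PRP", "PRP$", "RB", "RBR",
        "RBS", "RP", "SYM", "TO", "UH", "VB", "VBD", "VBG", "VBN", "VBP",
        "VBZ", "WDT", "WP", "WP$", "WRB"]

encoder = LabelBinarizer()
encoder.fit(tags)

# Assumed sample of (word, tag) pairs, as a POS tagger might return them.
tagged = [("the", "DT"), ("dogs", "NNS"), ("bark", "VBP")]

# transform() looks each tag up in encoder.classes_, so the order of the
# list used for fitting no longer matters.
res = encoder.transform([tag for _, tag in tagged])

# Sanity check: decode the one-hot rows back into tags.
decoded = encoder.inverse_transform(res)
print(list(decoded))
```

The result has one row per word and one column per class, and inverse_transform gives back the original tags.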
