A Python package for text visualization

Detailed description of the titulus Python project


titulus

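import numpy as np
from titulus import color, print_

# Sample sentence, split on whitespace into tokens
test = "Nous sommes le 12/24/2018 aujourd'hui. Mon numéro de tel est le (301)227-1340"
tokens = test.split()

# One random integer weight in [0, 10) per token
weights = np.random.randint(low=0, high=10, size=len(tokens))

# Color each token according to its weight and print the result
print_(' '.join(color(tokens, weights, n=10)))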


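import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from titulus import color, print_

# Load two categories of the 20 newsgroups dataset
categories = ['alt.atheism', 'talk.religion.misc']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories)
newsgroups_test = fetch_20newsgroups(subset='test', categories=categories)
X_train, X_test = newsgroups_train.data, newsgroups_test.data
y_train, y_test = newsgroups_train.target, newsgroups_test.target

# Assumed setup (not shown in the original README): a default TfidfVectorizer
# and its built-in tokenizer
vectorizer = TfidfVectorizer()
X_vec_arr = vectorizer.fit_transform(X_train).toarray()
tokenizer = vectorizer.build_tokenizer()
voc = vectorizer.vocabulary_  # maps each term to its column index

# Pick a random training document and look up the TF-IDF weight of each token
idx = np.random.randint(len(X_train))
tokens = tokenizer(X_train[idx])
# The vocabulary is lowercased; tokens missing from it get index -1 and weight 0
token_idx = [voc.get(t.lower(), -1) for t in tokens]
weights = [X_vec_arr[idx, i] if i >= 0 else 0 for i in token_idx]
print_(' '.join(color(tokens, weights, start_hex="#FEFEFE", finish_hex="#00a4e4", n=20)))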
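
The calls to color above pass start_hex, finish_hex and n, which suggests that weights are binned onto a color gradient. The README does not show how titulus does this internally, so the following is only a minimal sketch of one way such a mapping could work, assuming n is the number of gradient steps and that tokens are rendered with 24-bit ANSI background colors. All function names here (hex_to_rgb, linear_gradient, weight_to_bin, color_tokens) are hypothetical and are not part of the titulus API.

```python
def hex_to_rgb(h):
    """Convert '#RRGGBB' to an (r, g, b) tuple of ints."""
    h = h.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def linear_gradient(start_hex, finish_hex, n):
    """Return n RGB colors evenly spaced from start_hex to finish_hex."""
    start, finish = hex_to_rgb(start_hex), hex_to_rgb(finish_hex)
    if n == 1:
        return [start]
    return [
        tuple(round(s + (f - s) * k / (n - 1)) for s, f in zip(start, finish))
        for k in range(n)
    ]


def weight_to_bin(w, lo, hi, n):
    """Map a weight w in [lo, hi] to a bin index in [0, n - 1]."""
    if hi == lo:
        return 0  # all weights equal: everything falls in the first bin
    return min(int((w - lo) / (hi - lo) * n), n - 1)


def color_tokens(tokens, weights, start_hex="#FEFEFE", finish_hex="#00A4E4", n=10):
    """Wrap each token in an ANSI background color chosen by its weight."""
    palette = linear_gradient(start_hex, finish_hex, n)
    lo, hi = min(weights), max(weights)
    colored = []
    for tok, w in zip(tokens, weights):
        r, g, b = palette[weight_to_bin(w, lo, hi, n)]
        # ESC[48;2;r;g;bm sets a truecolor background, ESC[0m resets it
        colored.append(f"\x1b[48;2;{r};{g};{b}m{tok}\x1b[0m")
    return colored


# Hypothetical usage: higher weights get a stronger blue background
print(" ".join(color_tokens(["low", "medium", "high"], [0.0, 0.5, 1.0], n=3)))
```

The output only shows colors in a terminal that supports 24-bit ANSI escape sequences. A diverging scheme with a middle color could be sketched by concatenating two such gradients.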


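# Train a linear classifier on TF-IDF features.
# Assumes vectorizer is a TfidfVectorizer and tokenizer is its tokenizer,
# e.g. vectorizer.build_tokenizer()
text_clf = Pipeline([
    ('vect', vectorizer),
    ('clf', SGDClassifier(loss='hinge', penalty='l2', tol=0.2, alpha=1e-3,
                          max_iter=15, random_state=42)),
])
_ = text_clf.fit(X_train, y_train)

# TF-IDF matrix of the training documents and the term-to-column mapping
X_vec_arr = vectorizer.transform(X_train).toarray()
voc = vectorizer.vocabulary_

# Pick a random training document
idx = np.random.randint(len(X_train))
tokens = tokenizer(X_train[idx])
# The vocabulary is lowercased; tokens missing from it get index -1 and weight 0
token_idx = [voc.get(t.lower(), -1) for t in tokens]

# Weight each term by its TF-IDF value times the classifier coefficient,
# so the sign shows which class the term pushes the document towards
weights_ = np.multiply(X_vec_arr[idx, :], text_clf.named_steps['clf'].coef_[0, :])
weights = [weights_[i] if i >= 0 else 0 for i in token_idx]
print_(' '.join(color(tokens, weights, start_hex="#34BF49", finish_hex="#BE0027", middle_hex="#FEFEFE", n=20)))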

