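By their nature, K-Means clusters are mutually exclusive: each point is assigned to exactly one cluster. I found some code online that clusters text. Admittedly it's a bit unorthodox, but it's also kind of cool. Is there a way to get the sample code below to assign each text to a cluster and make sure the texts in each cluster are mutually exclusive?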
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
documents = ["This little kitty came to play when I was eating at a restaurant.",
             "Merley has the best squooshy kitten belly.",
             "Google Translate app is incredible.",
             "If you open 100 tab in google you get a smiley face.",
             "Best cat photo I've ever taken.",
             "Climbing ninja cat.",
             "Impressed with google map feedback.",
             "Key promoter extension for Google Chrome."]
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(documents)
true_k = 8
model = KMeans(n_clusters=true_k, init='k-means++', max_iter=1000, n_init=1)
model.fit(X)
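print("Top terms per cluster:")
# Indices of each centroid's term weights, heaviest first.
# This lists the top terms of each centroid, not the documents in each cluster.
order_centroids = model.cluster_centers_.argsort()[:, ::-1]
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
terms = vectorizer.get_feature_names_out()
for i in range(true_k):
    print("Cluster %d:" % i)
    for ind in order_centroids[i, :10]:
        print(' %s' % terms[ind])
    print()  # a bare `print` is a no-op in Python 3
print("\n")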
print("Prediction")
Y = vectorizer.transform(["chrome browser to open."])
prediction = model.predict(Y)
print(prediction)
Y = vectorizer.transform(["My cat is hungry."])
prediction = model.predict(Y)
print(prediction)
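The "Top terms per cluster" loop prints the highest-weighted terms of each cluster centroid, not the documents in each cluster, so a word such as `google` can show up under several clusters even though every document is in exactly one. A minimal sketch of listing the documents per cluster instead, using the fitted model's `labels_` attribute (one cluster index per input row). `documents_by_cluster` is a hypothetical helper name, and `random_state=0` is an assumption added only to make runs repeatable.

```python
from collections import defaultdict

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

documents = [
    "This little kitty came to play when I was eating at a restaurant.",
    "Merley has the best squooshy kitten belly.",
    "Google Translate app is incredible.",
    "If you open 100 tab in google you get a smiley face.",
    "Best cat photo I've ever taken.",
    "Climbing ninja cat.",
    "Impressed with google map feedback.",
    "Key promoter extension for Google Chrome.",
]


def documents_by_cluster(docs, n_clusters, random_state=0):
    """Hypothetical helper: fit K-Means on TF-IDF vectors and group docs by label.

    labels_ holds exactly one cluster index per document, so each document
    ends up in exactly one group.
    """
    X = TfidfVectorizer(stop_words="english").fit_transform(docs)
    model = KMeans(n_clusters=n_clusters, init="k-means++", n_init=1,
                   max_iter=1000, random_state=random_state)
    model.fit(X)
    groups = defaultdict(list)
    for doc, label in zip(docs, model.labels_):
        groups[int(label)].append(doc)
    return dict(groups)


groups = documents_by_cluster(documents, n_clusters=8)
for label in sorted(groups):
    print("Cluster %d:" % label)
    for doc in groups[label]:
        print("  " + doc)
```

With `n_clusters=8` and eight distinct sentences, each group will most likely hold a single sentence; a smaller `n_clusters` is needed for groups that actually contain several texts.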
Result:
Top terms per cluster:
Cluster 0:
translate
app
incredible
google
eating
impressed
feedback
face
extension
ve
Cluster 1:
kitten
belly
squooshy
merley
best
eating
google
feedback
face
extension
Cluster 2:
eating
kitty
little
came
restaurant
play
ve
feedback
face
extension
Cluster 3:
ve
taken
photo
best
cat
eating
google
feedback
face
extension
Cluster 4:
impressed
map
feedback
google
ve
eating
face
extension
climbing
key
Cluster 5:
100
open
tab
smiley
face
google
feedback
extension
eating
climbing
Cluster 6:
chrome
extension
promoter
key
google
eating
impressed
feedback
face
ve
Cluster 7:
climbing
ninja
cat
eating
impressed
google
feedback
face
extension
ve
I tried this:
documents = list(set(documents))
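The same text items still show up in multiple clusters. I'm probably missing something simple, but I've been at this all morning (yes, on a Saturday) and I'm tired now, so I just can't see the solution.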
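`list(set(documents))` only removes exact duplicate strings. The eight example sentences are already distinct, so it leaves the list's contents unchanged (it may reorder it, since set iteration order is not guaranteed) and cannot change which terms are printed per cluster. If deduplication is wanted at all, `dict.fromkeys` is an order-preserving alternative; a short check:

```python
documents = [
    "This little kitty came to play when I was eating at a restaurant.",
    "Merley has the best squooshy kitten belly.",
    "Google Translate app is incredible.",
    "If you open 100 tab in google you get a smiley face.",
    "Best cat photo I've ever taken.",
    "Climbing ninja cat.",
    "Impressed with google map feedback.",
    "Key promoter extension for Google Chrome.",
]

# set() drops exact duplicates but does not preserve the original order.
deduped_with_set = list(set(documents))
# dict.fromkeys keeps the first occurrence of each string, in order (Python 3.7+).
deduped_in_order = list(dict.fromkeys(documents))

print(len(documents), len(deduped_with_set), len(deduped_in_order))  # 8 8 8
print(deduped_in_order == documents)  # True
```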
No answers yet.