Choosing n clusters for chemical fingerprints

    import pandas as pd
    from rdkit import Chem, DataStructs
    from rdkit.Chem.Fingerprints import FingerprintMols
    from sklearn import cluster

    # `cs` and `mol` are not defined in this snippet.
    # mol["idx"] contains the names of the molecules.
    fps = []
    for x in mol["idx"]:
        res = cs.search(x)
        # get the SMILES string of the molecule
        smi = res[0].smiles
        # get the fingerprint of the molecule
        m = Chem.MolFromSmiles(str(smi))
        fps.append(FingerprintMols.FingerprintMol(m))

    # compute the similarity scores (this ends up as a molecule-by-molecule
    # matrix in which each entry is the Tanimoto score of one pair)
    dists = []
    nfps = len(fps)
    for i in range(0, nfps):
        sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps)
        dists.append(sims)

    # store the values in a DataFrame and apply k-means
    mol_dist = pd.DataFrame(dists)
    k_means = cluster.KMeans(n_clusters=13)
    k1 = k_means.fit_predict(mol_dist)
    mol["cluster"] = k1

    # get the result
    final = mol[["idx", "cluster"]]

1 answer

User

#1 · Posted on 2024-05-20 22:35:06

I think the real question in this clustering task is how to choose an appropriate k. You can approach it as follows:

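Determine an appropriate number of clusters k. You can use methods such as the elbow method, ... See the link below: https://datasciencelab.wordpress.com/2013/12/27/finding-the-k-in-k-means-clustering
Once you have k, choose appropriate features, then cluster the dataset with that k and evaluate the result.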
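To make the elbow idea concrete, here is a minimal, dependency-free sketch. It is not the code from the linked post: the helper names (`toy_points`, `kmeans_inertia`, `elbow_k`) are made up for illustration, the toy 2-D points stand in for the rows of `mol_dist`, and the deterministic farthest-point initialization is an assumption of this sketch (`cluster.KMeans` initializes differently). The elbow is taken as the k with the largest second difference of the within-cluster sum of squares, which is only one of several possible heuristics.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points, k, n_iter=100):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    # Assumption of this sketch: deterministic farthest-point initialization,
    # so that the result does not depend on a random seed.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if updated == centers:
            break
        centers = updated
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def elbow_k(inertias):
    """Pick the k where the inertia curve bends most (largest second difference).

    `inertias` maps consecutive integer k values to their inertia.
    """
    ks = sorted(inertias)
    return max(
        ks[1:-1],  # the second difference needs a neighbor on each side
        key=lambda k: inertias[k - 1] - 2 * inertias[k] + inertias[k + 1],
    )


def toy_points():
    """Hypothetical data: three tight, well-separated groups of five points."""
    blob_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
    return [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]


points = toy_points()
inertias = {k: kmeans_inertia(points, k) for k in range(1, 7)}
print(elbow_k(inertias))  # prints 3 for this toy data
```

With scikit-learn you would not need the hand-written k-means: fitting `cluster.KMeans` for each candidate k and reading its `inertia_` attribute gives the same curve. For the evaluation step, `sklearn.metrics.silhouette_score` is one commonly used option.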

Best regards!
