I have this code:
arfffile = []
inputed = raw_input("Enter Evaluation for name including file extension...")
reader = open(inputed, 'r')
verses = []
for line in reader:
    verses.append(line)
# Drop ARFF header lines (@relation, @attribute, @data); they are not data rows.
verses = [line for line in verses if not line.startswith('@')]
numclusters = int(raw_input("Enter the number of clusters"))
clusters = {}
for i in range(1,numclusters+1):
    clusters["cluster"+str(i)] = 0
print clusters
# If verse belongs to a cluster, increment the cluster count by one in the dictionary value.
for verse in verses:
    for k in clusters:
        if k in verse:
            clusters[k] += 1
        else:
            print "not in"
print clusters
yeslist = []
for verse in verses:
    for k in clusters:
        if k not in yeslist:
            yeslist.append((k,0))
        elif k in yeslist:
            print "already in" + k
for verse in verses:
    for k in clusters:
        if k in verse and "Yes" in verse:
            yeslist.append(yeslist.index(k), +1)
# iterate through dictionary and iterate through the lines
# need to read in file line by line,
# if "yes" and cluster x increment cluster
# need to work out percentage of positive verses in each cluster.
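The final comment states the end goal: the percentage of positive ("Yes") lines in each cluster. Once the per-cluster counts exist, that step is a single division per cluster. A minimal sketch, assuming two hypothetical dictionaries: `totals` (lines per cluster) and `yes_counts` ("Yes" lines per cluster):

```python
def yes_percentages(totals, yes_counts):
    """Return {cluster: percentage of that cluster's lines that are "Yes"}.

    `totals` maps each cluster name to its number of lines and
    `yes_counts` maps it to its number of "Yes" lines. Both are
    hypothetical inputs, not names from the original script.
    """
    percentages = {}
    for cluster, total in totals.items():
        yes = yes_counts.get(cluster, 0)
        # An empty cluster gets 0.0 instead of raising ZeroDivisionError.
        percentages[cluster] = 100.0 * yes / total if total else 0.0
    return percentages


result = yes_percentages({"cluster1": 4, "cluster2": 2}, {"cluster1": 1})
print(sorted(result.items()))  # [('cluster1', 25.0), ('cluster2', 0.0)]
```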
An example of the ARFF file is:
^{pr2}$
As it stands, the program reads in data lines such as:
0,1,0,0,0,0,0,0,0,1,1,No,cluster3
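ARFF files start with header lines (`@relation`, `@attribute`, `@data`), and `%` begins a comment, so only the remaining lines are data rows like the one above. A minimal sketch that keeps just those rows and splits each into fields, assuming values are plain comma-separated text with no quoting; `data_rows` is a hypothetical helper name:

```python
def data_rows(lines):
    """Yield each ARFF data line as a list of comma-separated fields.

    Skips blank lines, '%' comment lines and '@' header lines
    (@relation, @attribute, @data). Assumes unquoted values.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith('%') or line.startswith('@'):
            continue
        yield line.split(',')


# The two header lines are made up; the data line is the one from the question.
sample = [
    "@relation example\n",
    "@data\n",
    "0,1,0,0,0,0,0,0,0,1,1,No,cluster3\n",
]
for fields in data_rows(sample):
    print(fields[-2] + " " + fields[-1])  # prints: No cluster3
```

Because a file object also yields lines, the same helper can be fed the opened ARFF file directly, e.g. `rows = list(data_rows(open(inputed)))`.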
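I create a dictionary with one entry per cluster in the data file. In this example there are 3: cluster1, cluster2 and cluster3. The code then adds each cluster to the dictionary "clusters" as a string key.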
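Instead of asking the user for the number of clusters, the keys could be collected from the data itself. A minimal sketch, assuming (as in the sample line) that the cluster label is always the last comma-separated field; `cluster_names` is a hypothetical helper name and the rows are made-up, shortened examples in the same format:

```python
def cluster_names(rows):
    """Return the distinct cluster labels (last field of each row), sorted."""
    return sorted(set(row[-1] for row in rows))


# Made-up, shortened rows in the same format as the sample data line.
rows = [
    ["0", "1", "No", "cluster3"],
    ["1", "0", "Yes", "cluster1"],
    ["1", "1", "Yes", "cluster3"],
]
print(cluster_names(rows))  # ['cluster1', 'cluster3']

# One counter per cluster actually present, replacing the raw_input prompt.
clusters = dict((name, 0) for name in cluster_names(rows))
```

Note that a plain string sort places "cluster10" before "cluster2"; that only matters if the display order is important.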
I then loop through all the verses (lines) and count each line under the cluster it belongs to.
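The test `if k in verse` is a substring check, so once there are ten or more clusters, "cluster1" would also match lines labelled "cluster10". Comparing the last field exactly avoids that. A minimal sketch using `collections.Counter`, assuming the cluster label is the last field of each already-split row; `count_per_cluster` is a hypothetical helper name:

```python
from collections import Counter


def count_per_cluster(rows):
    """Count how many data rows belong to each cluster.

    The last field is compared exactly, so rows labelled 'cluster10'
    are not also counted for 'cluster1'.
    """
    return Counter(row[-1] for row in rows)


rows = [
    ["0", "1", "No", "cluster3"],
    ["1", "0", "Yes", "cluster1"],
    ["1", "1", "Yes", "cluster10"],
]
counts = count_per_cluster(rows)
print(counts["cluster1"])  # 1
```

A `Counter` returns 0 for a missing key, so clusters with no lines need no initialisation.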
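My next step is to count, for each cluster, how many of its lines contain "Yes". For example, if 10 lines in a cluster contain "Yes", the code should report 10 for that cluster.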
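Counting "Yes" lines per cluster is the same idea with a filter in front. A minimal sketch, assuming the Yes/No label is the second-to-last field (as in `...,1,1,No,cluster3`); `yes_per_cluster` is a hypothetical helper name and the rows are made-up examples:

```python
from collections import Counter


def yes_per_cluster(rows):
    """Count rows whose class field (second to last) is exactly 'Yes', per cluster."""
    counts = Counter()
    for row in rows:
        if row[-2] == "Yes":
            counts[row[-1]] += 1
    return counts


rows = [
    ["0", "1", "No", "cluster3"],
    ["1", "0", "Yes", "cluster1"],
    ["1", "1", "Yes", "cluster3"],
    ["0", "0", "Yes", "cluster3"],
]
print(yes_per_cluster(rows)["cluster3"])  # 2
```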
The code I have so far for this step is:
for verse in verses:
    for k in clusters:
        if k in verse and "Yes" in verse:
            yeslist.append(yeslist.index(k), +1)
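I'm creating a list of tuples called "yeslist" with values like [('cluster1', 0), ('cluster2', 3)].
So for each line (a string), check whether it contains "Yes"; if it does, find which cluster it belongs to and increase that cluster's tuple value by one.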
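In the loop above, `yeslist.index(k)` searches for the bare string `k`, which is never an element of a list of tuples, and `list.append` accepts only one argument. Tuples are also immutable, so "increasing the tuple value" means building a new tuple and storing it back at the same position. A minimal sketch of that step; `increment` is a hypothetical helper name:

```python
def increment(pairs, cluster):
    """Add one to the count paired with `cluster` in a list of (name, count) tuples.

    Tuples are immutable, so the matching tuple is replaced by a new one.
    If the cluster is not in the list yet, (cluster, 1) is appended.
    """
    for i, (name, count) in enumerate(pairs):
        if name == cluster:
            pairs[i] = (name, count + 1)
            return
    pairs.append((cluster, 1))


yeslist = [("cluster1", 0), ("cluster2", 3)]
increment(yeslist, "cluster2")
print(yeslist)  # [('cluster1', 0), ('cluster2', 4)]
```

Each call scans the list, which is fine for a handful of clusters; a dictionary keyed by cluster name avoids the scan entirely.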
I can't work out how to do this... can anyone help?
Thanks.
You get two dictionaries:
^{pr2}$
If you really need a list of tuples:
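As a general illustration (not necessarily the answerer's code), a dictionary of counts converts to the list-of-tuples shape from the question with `sorted(...items())`; the dictionary below is made-up example data:

```python
yes_counts = {"cluster2": 3, "cluster1": 0}

# items() yields (key, value) pairs; sorted() orders them by cluster name.
yeslist = sorted(yes_counts.items())
print(yeslist)  # [('cluster1', 0), ('cluster2', 3)]
```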