How to increment a tuple value in a Python loop and search for a string

Question

I have the following code.

arfffile = []

inputed = raw_input("Enter Evaluation for name including file extension...")

reader = open(inputed, 'r')

verses = []

for line in reader:
    verses.append(line)

for line in verses:
    if line.split('@') == "@":
        verses.pop(line)


numclusters = int(raw_input("Enter the number of clusters"))

clusters = {}

for i in range(1,numclusters+1):
    clusters["cluster"+str(i)] = 0



print clusters
# If verse belongs to a cluster, increment the cluster count by one in the dictionary value.
for verse in verses:
    for k in clusters:
        if k in verse:
            clusters[k] += 1
        else:
            print "not in"

print clusters

yeslist = []

for verse in verses:
    for k in clusters:
        if k not in yeslist:
            yeslist.append((k,0))
        elif k in yeslist:
            print "already in" + k


for verse in verses:
    for k in clusters:
        if k in verse and "Yes" in verse:
            yeslist.append(yeslist.index(k), +1)


    # iterate through dictionary and iterate through the lines
    # need to read in file line by line, 



    # if "yes" and cluster x increment cluster 
    # need to work out percentage of positive verses in each cluster.

Here is an example of the ARFF file:

@relation tester999.arff_clustered

@attribute Instance_number numeric
@attribute allah numeric
@attribute day numeric
@attribute lord numeric
@attribute people numeric
@attribute earth numeric
@attribute men numeric
@attribute truth numeric
@attribute verily numeric
@attribute chapter numeric
@attribute verse numeric
@attribute CLASS {Yes,No}
@attribute Cluster {cluster1,cluster2,cluster3}

@data
0,1,0,0,0,0,0,0,0,1,1,No,cluster3
1,1,0,0,0,0,0,0,0,1,2,No,cluster3
2,0,0,0,0,0,0,0,0,1,3,No,cluster3
3,0,1,0,0,0,1,0,0,1,4,No,cluster3
4,0,0,0,0,0,0,0,0,1,5,No,cluster3
5,0,0,0,0,0,0,0,0,1,6,No,cluster3
6,0,0,0,0,0,0,0,0,1,7,No,cluster3
7,0,0,0,0,0,0,0,0,2,1,No,cluster3
8,1,0,0,0,0,0,0,0,2,2,No,cluster3
9,0,0,0,0,0,0,0,0,2,3,No,cluster3
10,0,0,0,0,0,0,0,0,2,4,No,cluster3
11,0,0,1,0,0,0,0,0,2,5,No,cluster2

At the moment the program reads the data lines, for example:

0,1,0,0,0,0,0,0,0,1,1,No,cluster3

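I created a dictionary to track how many clusters the data file contains. In this example there are 3 clusters: cluster1, cluster2 and cluster3. The code stores each cluster name as a string key in the dictionary "clusters", with its count as the value.
I then iterate over all the lines and count how many lines belong to each cluster.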
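
The per-cluster tally described above could be sketched roughly as follows. This is a minimal sketch rather than the original program: it assumes Python 3, that header lines are the ones starting with "@" (or blank), and that the cluster label is the last comma-separated field of each data line. The function name count_clusters and the sample list are hypothetical.

```python
def count_clusters(lines):
    """Count how many data lines fall in each cluster.

    Assumption: the cluster label is the last comma-separated field.
    """
    counts = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and ARFF header lines such as "@data".
        if not line or line.startswith("@"):
            continue
        cluster = line.split(",")[-1]
        counts[cluster] = counts.get(cluster, 0) + 1
    return counts


# Hypothetical sample: a header line plus three data lines from the example file.
sample = [
    "@data",
    "0,1,0,0,0,0,0,0,0,1,1,No,cluster3",
    "1,1,0,0,0,0,0,0,0,1,2,No,cluster3",
    "11,0,0,1,0,0,0,0,0,2,5,No,cluster2",
]
print(count_clusters(sample))
```

Comparing the whole last field, instead of testing whether the cluster name occurs anywhere in the line, also avoids counting "cluster1" for a line that actually ends in "cluster10".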

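My next step is to count, for each cluster, how many lines contain "Yes". For example, if 10 of the lines in a cluster contain "Yes", the code should report 10 for that cluster.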
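
One possible way to get the "Yes" count per cluster, and the percentage mentioned in the code comments, is to keep two dictionaries instead of a list of tuples. The sketch below is only an illustration under stated assumptions: Python 3, the last two comma-separated fields are CLASS and Cluster (as in the example file), and the name yes_stats and the sample lines (including the made-up "Yes" line) are hypothetical.

```python
def yes_stats(lines):
    """Return {cluster: (yes_count, total, percent_yes)} for the data lines.

    Assumption: the last two comma-separated fields are CLASS and Cluster.
    """
    totals = {}
    yes_counts = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and ARFF header lines.
        if not line or line.startswith("@"):
            continue
        fields = line.split(",")
        if len(fields) < 2:
            continue  # not a data line in the assumed layout
        label, cluster = fields[-2], fields[-1]
        totals[cluster] = totals.get(cluster, 0) + 1
        if label == "Yes":
            yes_counts[cluster] = yes_counts.get(cluster, 0) + 1

    stats = {}
    for cluster, total in totals.items():
        yes = yes_counts.get(cluster, 0)
        stats[cluster] = (yes, total, 100.0 * yes / total)
    return stats


# Made-up sample data: the first line is altered to "Yes" for illustration.
sample = [
    "0,1,0,0,0,0,0,0,0,1,1,Yes,cluster3",
    "1,1,0,0,0,0,0,0,0,1,2,No,cluster3",
    "11,0,0,1,0,0,0,0,0,2,5,No,cluster2",
]
for cluster, (yes, total, percent) in sorted(yes_stats(sample).items()):
    print(cluster, yes, total, percent)
```

A dictionary fits here because its values can be updated in place, whereas a tuple has to be replaced as a whole.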

Here is the code I have written so far:

for verse in verses:
        for k in clusters:
            if k in verse and "Yes" in verse:
                yeslist.append(yeslist.index(k), +1)

I am basically creating a list of tuples called "yeslist", with values like [(cluster1, 0), (cluster2, 3)].

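So for each line (represented as a string), check whether it contains "Yes"; if it does, find which cluster it belongs to and increment that tuple's value by one.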
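
If the list-of-tuples layout is to be kept, two details of the snippet above matter: list.append accepts exactly one argument, and list.index(k) searches for an element equal to k, so it raises ValueError when the elements are (name, count) tuples. Tuples are also immutable, so "incrementing" one means replacing it. The following is a minimal sketch of that idea, assuming Python 3; the helper name increment is hypothetical.

```python
def increment(pairs, name):
    """Add one to the count paired with name in a list of (name, count) tuples."""
    for i, (key, count) in enumerate(pairs):
        if key == name:
            # Tuples are immutable, so build a new tuple and store it at the same index.
            pairs[i] = (key, count + 1)
            return
    # Name not present yet: start its count at one.
    pairs.append((name, 1))


yeslist = [("cluster1", 0), ("cluster2", 3)]
increment(yeslist, "cluster2")  # existing entry is replaced
increment(yeslist, "cluster3")  # new entry is appended
print(yeslist)
```

Each call scans the list, so for many clusters a dictionary keyed by cluster name would likely be simpler and faster.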

I'm having some trouble working out the logic... can anyone help?

Thanks.

tuple, string processing, dictionary, counting, loop, cluster count, arff file

