Iterate over each row and increment the count of terms associated with a specific researcher in a CSV file

[Table 1]

    Terms                 Researcher
    1.Asthma              Dr. Roberts
    2.Brochial cancer     Dr. Lee
    3.HIV                 Dr.Roberts
    4.HIV                 Dr. Lee
    5.Influenzae          Dr. Wang
    6.Bronchial Cancer    Dr. Wang
    7.Influenzae          Dr. Roberts
    8.dengue              prof. christopher
    9.Arthritis           prof. swaminathan
    10.Arthritis          prof. christopher
    11.Asthma             Dr. Roberts
    12.HIV                Dr. Lee
    13.Bronchial Cancer   Dr. Wang
    14.dengue             prof. christopher
    15.HIV                prof. christopher
    16.HIV                Dr. Lee

Term you are looking for: HIV

    Names of the researchers   Frequency
    Dr. Roberts                1
    Dr. Lee                    3
    prof. christopher          1

In[2]:

    term = input("Enter the term you are looking for:")
    term = term.lower()
    list_of_terms = []
    for row in data:
        if row[data.Terms] == term
            researcher1 += 1
        elif data.Terms == term
            researcher2 += 1
        elif data.Terms == term
            researcher3 += 1
        else
            print("Sorry!", term, "not found in the database!")
    print("Term you are looking for : ", term)
    print("Dr. Roberts:", researcher1)
    print("Dr. Lee:", researcher2)
    print("prof. christopher:", researcher3)

3 answers

User

Answer 1 · edited 2024-06-10 03:04:31

In Python, a line that opens an `if`, `elif`, `else`, or `for` block must end with a colon. Your code therefore needs to be updated as follows:

    for row in data: 
        if row[data.Terms] == term:
            researcher1 += 1

        elif data.Terms == term:
            researcher2 += 1

        elif data.Terms == term:
            researcher3 += 1

        else:
            print("Sorry!", term, "not found in the database!")

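Once that is fixed, your code still appears to have a bug. You convert the user input to lowercase but do not do the same to the data read from the CSV file, so any term that contains an uppercase letter in the file (such as `HIV`) can never equal the lowercased input.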
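To make the case-normalization point concrete, here is a minimal sketch in plain Python. It assumes the CSV has `Terms` and `Researcher` columns and that each term carries a leading number such as `1.`, as in Table 1. The function name `count_researchers` and the inline sample data are hypothetical and only serve the illustration.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample standing in for the CSV file from the question.
SAMPLE_CSV = """Terms,Researcher
1.Asthma,Dr. Roberts
2.HIV,Dr. Lee
3.HIV,Dr. Roberts
4.hiv,Dr. Lee
5.dengue,prof. christopher
"""

def count_researchers(csv_file, term):
    """Count how often each researcher appears next to `term`, ignoring case."""
    wanted = term.strip().lower()
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Drop a leading "12." style prefix, then lowercase the term
        # so both sides of the comparison use the same case.
        cleaned = re.sub(r"^\d+\.\s*", "", row["Terms"]).strip().lower()
        if cleaned == wanted:
            counts[row["Researcher"].strip()] += 1
    return counts

counts = count_researchers(io.StringIO(SAMPLE_CSV), "HIV")
if counts:
    for name, n in counts.items():
        print(name, n)
else:
    print("Sorry! HIV not found in the database!")
```

Using a `Counter` keyed by researcher name avoids one hard-coded variable per researcher, and the "not found" message is printed once after the loop instead of once per non-matching row.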

User

Answer 2 · edited 2024-06-10 03:04:31

`groupby` and `value_counts`

Simple and intuitive

# Strip the leading "1." style numbering, then uppercase the term.
df.Terms = df.Terms.str.replace(r'\d+\.\s*', '', regex=True).str.upper()
df.Researcher = df.Researcher.str.title()
s = df.groupby('Terms').Researcher.value_counts()

s

Terms             Researcher       
ARTHRITIS         Prof. Christopher    1
                  Prof. Swaminathan    1
ASTHMA            Dr. Roberts          2
BROCHIAL CANCER   Dr. Lee              1
BRONCHIAL CANCER  Dr. Wang             2
DENGUE            Prof. Christopher    2
HIV               Dr. Lee              3
                  Dr.Roberts           1
                  Prof. Christopher    1
INFLUENZAE        Dr. Roberts          1
                  Dr. Wang             1
Name: Researcher, dtype: int64

You can use `loc` or `xs` to access individual terms

s.loc['HIV']

Researcher
Dr. Lee              3
Dr.Roberts           1
Prof. Christopher    1
Name: Researcher, dtype: int64

Or

s.xs('HIV')

Researcher
Dr. Lee              3
Dr.Roberts           1
Prof. Christopher    1
Name: Researcher, dtype: int64
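Note that `s.loc[...]` raises a `KeyError` when the term is absent, which is the "not found" case from the question. The following is a minimal sketch of a guarded lookup; the helper name `lookup` and the small sample frame are hypothetical, and the terms are assumed to be already uppercased as above.

```python
import pandas as pd

# Hypothetical sample; the real df would be loaded from the CSV file.
df = pd.DataFrame({
    "Terms": ["HIV", "HIV", "ASTHMA", "HIV"],
    "Researcher": ["Dr. Lee", "Dr. Roberts", "Dr. Roberts", "Dr. Lee"],
})
s = df.groupby("Terms").Researcher.value_counts()

def lookup(counts, term):
    """Return per-researcher counts for `term`, or None if the term is absent."""
    key = term.upper()
    # Level 0 of the MultiIndex holds the terms.
    if key not in counts.index.get_level_values(0):
        return None
    return counts.loc[key]

hiv = lookup(s, "hiv")
missing = lookup(s, "malaria")
print(hiv)
print(missing)
```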

`pd.factorize` and `np.bincount`

import re

import numpy as np
import pandas as pd

pat = re.compile(r'\d+\.\s*')
f, u = pd.factorize(list(zip(
    (re.sub(pat, '', x).upper() for x in df.Terms),
    df.Researcher.str.title()
)))

s = pd.Series(dict(zip(u, np.bincount(f))))

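You can access the result in the same way as above.

User

Answer 3 · edited 2024-06-10 03:04:31

You can iterate over the DataFrame in much the same way as you did, but since you are using pandas it is worth taking advantage of pandas functions. They are usually much faster than iterating, and the resulting code is cleaner.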

term_of_interest = 'HIV'

(df.groupby('Researcher')
 .apply(lambda x: x.Terms.str.contains(term_of_interest)
        .sum())
 .rename('Frequency').to_frame())

                   Frequency
Researcher                  
Dr. Lee                    3
Dr. Roberts                0
Dr. Wang                   0
Dr.Roberts                 1
prof. christopher          1
prof. swaminathan          0
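The result above lists researchers with a frequency of 0. If only the researchers who actually worked on the term are wanted, one alternative is to filter the rows first and then count. This is a sketch under the assumption that a case-insensitive substring match on `Terms` is acceptable; the sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample; the real df would be loaded from the CSV file.
df = pd.DataFrame({
    "Terms": ["1.HIV", "2.Asthma", "3.hiv", "4.HIV", "5.Dengue"],
    "Researcher": ["Dr. Lee", "Dr. Roberts", "Dr. Lee",
                   "prof. christopher", "Dr. Wang"],
})

term_of_interest = "HIV"

# Boolean mask of the rows whose term contains the search string, ignoring case.
mask = df["Terms"].str.contains(term_of_interest, case=False)

# Count researchers among the matching rows only.
freq = df.loc[mask, "Researcher"].value_counts().rename("Frequency").to_frame()
print(freq)
```

Because `str.contains` does substring matching, a search for a short term would also match longer terms that contain it; an exact comparison on a cleaned column is safer when that matters.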
