Filter data across multiple columns and print the matching rows?

Posted 2024-04-20 14:54:54


This is a follow-up to my last question. I have data in a .csv file that looks like this:

 id,first_name,last_name,email,gender,ip_address,birthday
 1,Ced,Begwell,cbegwell0@google.ca,Male,134.107.135.233,17/10/1978
 2,Nataline,Cheatle,ncheatle1@msn.com,Female,189.106.181.194,26/06/1989  
 3,Laverna,Hamlen,lhamlen2@dot.gov,Female,52.165.62.174,24/04/1990
 4,Gawen,Gillfillan,ggillfillan3@hp.com,Male,83.249.190.232,31/10/1984
 5,Syd,Gilfether,sgilfether4@china.com.cn,Male,180.153.199.106,11/07/1995

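What I want is for the Python program to ask the user, when it runs, which keywords to search for. It should then take all the keywords entered (perhaps stored in a list?) and print every row that contains all of them, regardless of which column each keyword appears in.

I have been experimenting with csv and pandas and searching Google for hours, but I can't get it to work the way I want. I'm still fairly new to Python 3. Please help.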

Edit to show what I have so far:

import csv

# Asks for search criteria from user
search_parts = input("Enter search criteria:\n").split(",")
# Opens csv data file
file = csv.reader(open("MOCK_DATA.csv"))
# Go over each row and print it if it contains user input.
for row in file:
    if all([x in row for x in search_parts]):
        print(row)

This works fine when I search for a single keyword, but I want to be able to filter by one or more keywords.


2 Answers

Try the following code for an AND search on the keywords:

import numpy as np

def AND_serach(df, list_of_keywords):
    # Row labels matched by every keyword so far; None until the first keyword is processed
    # (testing for an empty array instead would reset the search after an empty intersection)
    index_arr = None
    for keyword in list_of_keywords:
        # Mask non-matching cells as NaN, drop rows that are entirely NaN, keep the remaining row labels
        index = df[df == keyword].dropna(how='all').index.values
        # First keyword: take its matches; afterwards: intersect with the running result
        index_arr = index if index_arr is None else np.intersect1d(index_arr, index)
    # An empty keyword list matches nothing
    if index_arr is None:
        return df.iloc[0:0]
    # Select the matching rows by label
    return df.loc[index_arr]

Try the following code for an OR search on the keywords:

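def OR_serach(df, list_of_keywords):
    # Start from an empty (float) array of row labels
    index_arr = np.array([])
    for keyword in list_of_keywords:
        # Row labels of the rows in which at least one cell equals the keyword
        index = df[df == keyword].dropna(how='all').index.values
        # Union with the labels collected so far, without duplicates
        index_arr = np.unique(np.concatenate((index_arr, index), 0))
    # Cast back to int (this assumes an integer index) and select those rows
    return df.loc[index_arr.astype(int)]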

Output:

import pandas as pd

d = {'A': [1,2,3], 'B': [10,1,5]}
df = pd.DataFrame(data=d)
print(df)
   A   B
0  1  10
1  2   1
2  3   5

keywords = [1,5]
AND_serach(df,keywords) # returns an empty DataFrame: no row contains both 1 and 5
Out[]:
    A   B

OR_serach(df,keywords)
Out[]: 
    A   B
0   1   10
1   2   1
2   3   5
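
The same AND/OR logic can also be written with a boolean mask instead of index arithmetic. The following is only a minimal sketch, not the answer's method: the names `filter_rows` and `match_all` are hypothetical, and it assumes every cell should be compared as text, so that a keyword typed by the user (always a string) can match a numeric column.

```python
import pandas as pd

def filter_rows(df, keywords, match_all=True):
    # Compare everything as text so a typed keyword such as "1" matches an integer cell
    as_text = df.astype(str)
    # One boolean Series per keyword: True where any cell in the row equals that keyword
    hits = [as_text.eq(str(k)).any(axis=1) for k in keywords]
    if not hits:
        # No keywords given: return an empty frame with the same columns
        return df.iloc[0:0]
    mask = pd.concat(hits, axis=1)
    # AND search keeps rows hit by every keyword, OR search keeps rows hit by at least one
    mask = mask.all(axis=1) if match_all else mask.any(axis=1)
    return df[mask]

df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 1, 5]})
print(filter_rows(df, [1, 5]))                   # AND search
print(filter_rows(df, [1, 5], match_all=False))  # OR search
```

For the question's file, the frame could come from `pd.read_csv("MOCK_DATA.csv")` and the keywords from splitting the user's input on commas.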

try and except are used here because an error is raised if the data type does not match the keyword:

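import pandas as pd

def fun(data, keyword):
    # Accumulates every row in which some column equals the keyword
    ans = pd.DataFrame()
    for i in data.columns:
        try:
            # Prepend the rows whose value in column i equals the keyword
            ans = pd.concat((data[data[i] == keyword], ans))
        except TypeError:
            # Skip columns whose dtype cannot be compared with the keyword
            pass
    # A row that matches in several columns was added more than once; keep one copy
    return ans.drop_duplicates()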
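
The function above handles a single keyword. For the multi-keyword case in the question, the csv module alone may be enough. One possible reason the original loop appears to fail with several keywords is whitespace: typing "Female, Hamlen" produces the term " Hamlen" with a leading space, which never equals a cell. The sketch below strips and lowercases both sides before comparing. The helpers `parse_keywords` and `rows_matching` are hypothetical names, and the sample rows are copied from the question so the example runs on its own.

```python
import csv
import io

# Sample rows copied from the question; a real script would open MOCK_DATA.csv instead
SAMPLE = """id,first_name,last_name,email,gender,ip_address,birthday
1,Ced,Begwell,cbegwell0@google.ca,Male,134.107.135.233,17/10/1978
2,Nataline,Cheatle,ncheatle1@msn.com,Female,189.106.181.194,26/06/1989
3,Laverna,Hamlen,lhamlen2@dot.gov,Female,52.165.62.174,24/04/1990
"""

def parse_keywords(raw):
    # Split on commas, trim whitespace, lowercase, and drop empty terms
    return [part.strip().lower() for part in raw.split(",") if part.strip()]

def rows_matching(lines, keywords):
    matches = []
    for row in csv.reader(lines):
        # Normalise the cells the same way as the keywords before comparing
        cells = {cell.strip().lower() for cell in row}
        # Keep the row only if every keyword equals some cell (AND search)
        if all(keyword in cells for keyword in keywords):
            matches.append(row)
    return matches

keywords = parse_keywords("Female, Hamlen")
for row in rows_matching(io.StringIO(SAMPLE), keywords):
    print(row)
```

In an interactive script, `parse_keywords(input("Enter search criteria:\n"))` would supply the keywords, and replacing `all` with `any` turns the AND search into an OR search.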
