Search a CSV if list elements exist in it

Posted 2024-04-20 11:20:56


I'm new to Python. I'm trying to use csv.reader to import 2 CSV files, then check whether elements of one file exist in the other and, if they do, delete the whole row.

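Other questions about similar problems suggest that a list comprehension is one way to do this, but when I loop to check whether the entries of appList exist in machine, all I get is empty brackets like this: [].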

My code so far is:

import csv

appList = csv.reader(open('applist.csv', encoding = "ISO-8859-1"))
appList = list(appList)

machine = csv.reader(open('machine.csv', encoding = "ISO-8859-1"))
machine = list(machine)

for app in appList:
    machine = [app for app in machine if app not in machine]
    print(machine)

The applist.csv looks like this (it's a list of the applications in a standard macOS build):

Adobe Creative Cloud for Enterprise
Adobe Acrobat DC Professional
Adobe Bridge CC
Adobe Extension Manager CC
Adobe Illustrator CC 2015
Adobe InDesign CC 2015
Adobe Photoshop CC 2015
Adobe Media Encoder CC 2015
AirPort Utility 6
App Store
Automator 2
[...]

The machine.csv looks like this:

"Application name";"Metric";"Last used";"Requirement";"Entitlement state";"Remark"
"Adobe Creative Cloud for Enterprise (Mac)";"Installations";"2018-03-28T10:45:00+01:00";"1";"Not covered";""
"Adobe Acrobat DC Professional (Mac)";"Installations";"2018-03-22T17:08:00+00:00";"0";"No requirement";"Installation included in software bundle"
"Adobe Bridge CC (Mac)";"No license required";"2018-03-12T13:45:00+00:00";"";"";"Installation included in software bundle"
"Adobe Extension Manager CC (Mac)";"No license required";"";"";"";"Installation included in software bundle"
"Adobe Illustrator CC 2015 (Mac)";"Installations";"2018-03-12T13:41:00+00:00";"0";"No requirement";"Installation included in software bundle"

[Updated to add]

My current code:

#!/usr/local/bin/python3

import os
import csv

def csv_reader(machine_dir, machine):
    mach_list = list(csv.reader(open(machine_dir + "/" + machine, encoding="ISO-8859-1"), delimiter=";"))
    return mach_list

def main():
    # Get the paths to the csv files
    csvFile = input("drop the app list csv here: ")
    machine_dir = input("drop the machines csv folder here: ")

    # Import appList csv
    app_list = list(csv.reader(open(csvFile, encoding = "ISO-8859-1")))

    # Get list of machine csv
    machines = os.listdir(machine_dir)

    for machine in machines:
        machine_list = csv_reader(machine_dir, machine)

        new_machine = [app for app in app_list if app not in machine_list]

        print(new_machine)



if __name__ == '__main__': main()

I'm currently testing it on a single machine CSV file, and the result is not what should be left after removing the app_list entries from machine_list.


2 answers

You're using a regular for loop and then doing a list comprehension inside it, which I don't think is what you need.

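In the list comprehension, you loop over the values in machine and add a value to the new list only if it is not in machine. Every value taken from machine is, of course, in machine, so the result is always empty, which is the [] you are seeing. So your logic is a bit off. What you actually need is to loop over the values of appList in the list comprehension and check whether they appear in the list machine: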

import csv

# applist.csv: one application name per row
with open('applist.csv', encoding="ISO-8859-1", newline='') as f:
    appList = list(csv.reader(f))

# machine.csv is semicolon-delimited
with open('machine.csv', encoding="ISO-8859-1", newline='') as f:
    machine = list(csv.reader(f, delimiter=";"))

new_machine = [app for app in appList if app not in machine]
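A minimal sketch of why the question's loop printed [], using a few hypothetical in-memory rows instead of the real files (the names app_list, machine, always_empty and remaining are illustrative only). The second comprehension filters in the direction the question asks for: rows of machine that do not appear in the app list.

```python
# Hypothetical rows, shaped like what list(csv.reader(...)) returns
app_list = [["Adobe Bridge CC"], ["App Store"]]
machine = [["Adobe Bridge CC"], ["Calculator"]]

# The question's comprehension: every row of machine is, by definition, in machine,
# so nothing ever passes the "not in machine" test
always_empty = [row for row in machine if row not in machine]
print(always_empty)  # []

# Filtering machine against app_list keeps only the rows not listed in app_list
remaining = [row for row in machine if row not in app_list]
print(remaining)  # [['Calculator']]
```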

Edit:

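If you inspect the lists after reading the files, you'll see they are nested lists (one inner list per CSV row). One solution is to flatten both lists and then use the same kind of list comprehension: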

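import csv

with open('applist.csv', encoding="ISO-8859-1", newline='') as f:
    appList = list(csv.reader(f))

# machine.csv is semicolon-delimited
with open('machine.csv', encoding="ISO-8859-1", newline='') as f:
    machine = list(csv.reader(f, delimiter=";"))

# Flatten both appList and machine into flat lists of field values
flat_appList = [item for sublist in appList for item in sublist]
flat_machine = [item for sublist in machine for item in sublist]

# Keep the machine values that do not appear in the app list
new_machine = [app for app in flat_machine if app not in flat_appList]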
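To make the "nested lists" point concrete, here is a small sketch on a hypothetical two-line stand-in for applist.csv (read from an in-memory io.StringIO rather than a file). It shows that csv.reader yields one list per row, why a plain string never matches such a row, and what flattening produces.

```python
import csv
import io

# Hypothetical contents standing in for applist.csv (one name per line)
applist_text = "Adobe Bridge CC\nApp Store\n"

rows = list(csv.reader(io.StringIO(applist_text)))
print(rows)  # [['Adobe Bridge CC'], ['App Store']]

# A bare string is never equal to a one-element list, so this membership test fails
print("App Store" in rows)  # False

# Flattening turns the rows into their individual field values
flat = [item for row in rows for item in row]
print(flat)  # ['Adobe Bridge CC', 'App Store']
print("App Store" in flat)  # True
```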

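Note: be careful. In your sample files, applist.csv contains e.g. "Adobe Creative Cloud for Enterprise", which is not the same string as "Adobe Creative Cloud for Enterprise (Mac)" in your machine.csv, so an exact comparison will never match those two entries.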
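One way to deal with that mismatch while still deleting whole rows (rather than individual fields) is to normalize the names before comparing. The sketch below assumes every machine.csv name ends in " (Mac)"; the helper normalize and the SUFFIX constant are hypothetical, and the sample data is a trimmed copy of the rows shown in the question, held in memory via io.StringIO.

```python
import csv
import io

# Hypothetical, trimmed in-memory copies of the question's two files
applist_text = "Adobe Bridge CC\nAdobe Illustrator CC 2015\n"
machine_text = (
    '"Application name";"Metric";"Last used";"Requirement";"Entitlement state";"Remark"\n'
    '"Adobe Acrobat DC Professional (Mac)";"Installations";"2018-03-22T17:08:00+00:00";"0";"No requirement";"Installation included in software bundle"\n'
    '"Adobe Bridge CC (Mac)";"No license required";"2018-03-12T13:45:00+00:00";"";"";"Installation included in software bundle"\n'
    '"Adobe Illustrator CC 2015 (Mac)";"Installations";"2018-03-12T13:41:00+00:00";"0";"No requirement";"Installation included in software bundle"\n'
)

SUFFIX = " (Mac)"  # assumption: machine.csv names carry this platform suffix


def normalize(name):
    """Hypothetical helper: trim whitespace and drop a trailing ' (Mac)'."""
    name = name.strip()
    if name.endswith(SUFFIX):
        name = name[: -len(SUFFIX)]
    return name


# Names to remove; a set gives fast membership tests
app_names = {normalize(row[0]) for row in csv.reader(io.StringIO(applist_text)) if row}

reader = csv.reader(io.StringIO(machine_text), delimiter=";")
header = next(reader)

# Keep each whole row whose normalized application name is not in the app list
kept = [row for row in reader if row and normalize(row[0]) not in app_names]

for row in kept:
    print(row[0])  # Adobe Acrobat DC Professional (Mac)
```

With the real files, open them with encoding="ISO-8859-1" and newline='' instead of using io.StringIO; to save the result, write header and kept back out with csv.writer(f, delimiter=";").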

Alternatively, you can use pandas (https://pandas.pydata.org/pandas-docs/stable/api.html), assuming neither file contains duplicate rows that you want to keep:

import pandas

# Assumes both files have the same columns, so rows can be compared as a whole.
# (machine.csv as shown is ';'-delimited and has extra columns, so it would need
# sep=";" and matching columns before this works.)
app = pandas.read_csv('applist.csv', encoding="ISO-8859-1")
machine = pandas.read_csv('machine.csv', encoding="ISO-8859-1")

# Combine both dataframes into one
# (DataFrame.append was removed in pandas 2.0; pandas.concat does the same job)
dataframe = pandas.concat([app, machine], ignore_index=True)

# Only keep the first of each set of duplicates
# This should give us the machine list (without any of the lines
# duplicated in the applist) plus the full applist
dataframe.drop_duplicates(keep='first', inplace=True)
# Now add the applist again
dataframe = pandas.concat([dataframe, app], ignore_index=True)
# Now drop all the duplicates
# (since the applist was added again, this should drop the entire applist)
dataframe.drop_duplicates(keep=False, inplace=True)
# drop=True discards the old index instead of adding it as a new column
dataframe.reset_index(drop=True, inplace=True)

# Now 'dataframe' should be the machine list without any lines from applist

If the files are relatively small, the loop approach and the pandas approach will take about the same time, but for larger files pandas should be noticeably faster.
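Part of why the plain-loop version slows down on larger files is that x not in some_list scans the list for every row. A minimal sketch on made-up names (app_names_list, app_names_set and machine_names are hypothetical) comparing a list lookup with a set lookup; both return the same result, and actual timings will vary by machine.

```python
import timeit

# Hypothetical data: 2,000 app names and 2,000 machine names, half overlapping
app_names_list = [f"App {i}" for i in range(2000)]
app_names_set = set(app_names_list)
machine_names = [f"App {i}" for i in range(1000, 3000)]


def filter_with_list():
    # 'not in' on a list compares against the elements one by one
    return [n for n in machine_names if n not in app_names_list]


def filter_with_set():
    # 'not in' on a set is a hash lookup: constant time on average
    return [n for n in machine_names if n not in app_names_set]


# Both give the same names (App 2000 ... App 2999); only the speed differs
assert filter_with_list() == filter_with_set()
print("list:", timeit.timeit(filter_with_list, number=1))
print("set: ", timeit.timeit(filter_with_set, number=1))
```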
