Python sorting script for library book call numbers (CSV file)

Posted 2024-04-23 09:26:43


I'm writing a Python script to find duplicate entries in a CSV list of call numbers and titles. Here is the format of the CSV file:

920.105,George Mueller
920.105,George Mueller
920.105,George Mueller
327.373,The Letters to the Galatians and Ephesians
327.371,Galatians and Ephesians
289,The Modern Tongues Movement
288.01,The Seduction of Christianity
288.003,Understanding Cults and New Religions
288.002,Understanding Cults and New Religions
286.061,"History of the Baptists, A"
286.044,"History of the Baptists, A"
286.003,This Day in Baptist History 3
286.003,This Day in Baptist History 3
286.003,This Day in Baptist History 3

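What I need to do is find all duplicate call numbers that have different titles. I don't care about most of the duplicate entries, because they are just copies of the same book; I'm looking for different books that share a call number. My script finishes without errors, but when I open the file it creates, the file is empty.
Here is my code: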

#!/usr/bin/python3

import csv


def readerObject(csvFileName):
    """
    Opens and returns a reader object.
    """
    libFile = open(csvFileName)
    libReader = csv.reader(libFile)
    libData = list(libReader)
    return libData


def main():

    # Initialize the state variable
    state = 0

    # Prompt the user for the CSV file name
    fileName = input('Enter the CSV file to be read (Please use the full path): \n')
    # Open readerObject and copy its contents into a list
    csvToList = readerObject(fileName)
    loopList1 = list(csvToList)

    # Create writer object to... Write to
    fileToWrite = input('Enter the name of the file to write to: \n')
    libOutputFile = open(fileToWrite, 'w', newline='')
    libOutputWriter = csv.writer(libOutputFile)

    # Loop 1:
    for a in range(len(loopList1)):
        if state == 1:
            libOutputWriter.writerow(loopList2[0])
            del loopList1[0]
        loopList2 = list(csvToList)
        state = 0
        # Loop 2:
        for b in range(len(loopList2)):
            if loopList2[0][0] == loopList2[1][0]:
                if loopList2[0][1] != loopList2[1][1]:
                    libOutputWriter.writerow(loopList2[1])
                    del loopList2[1]
                    state = 1

    libOutputFile.close()

if __name__ == "__main__":
    main()

Thanks in advance!
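The likely reason the output file stays empty: the inner loop never uses b, so every pass compares loopList2[0] with loopList2[1]. In the sample data those are two identical George Mueller rows, so writerow is never called. (The sample data also contains no call number with two different titles, so even a working script would write nothing for it.) Below is a minimal sketch of what the nested loops appear to be aiming for, comparing every pair of rows with itertools.combinations. The function name find_conflicts is hypothetical, and the pairwise check is O(n²), so the dictionary-based approaches in the answers scale better:

```python
import csv
import io
from itertools import combinations


def find_conflicts(rows):
    """Return sorted (call_number, title) pairs whose call number is shared
    with at least one row that has a different title (pairwise, O(n^2))."""
    conflicts = set()
    # Compare every row with every later row, which is what the nested
    # loops in the question appear to intend.
    for (num_a, title_a), (num_b, title_b) in combinations(rows, 2):
        if num_a == num_b and title_a != title_b:
            conflicts.add((num_a, title_a))
            conflicts.add((num_b, title_b))
    return sorted(conflicts)


# Hypothetical sample; assumes every row has exactly two fields.
sample = """286.003,This Day in Baptist History 1
920.105,George Mueller
286.003,This Day in Baptist History 2
920.105,George Mueller
"""

rows = list(csv.reader(io.StringIO(sample)))
for call_number, title in find_conflicts(rows):
    print(call_number, title)
```

Identical copies (the two George Mueller rows) are ignored; only the 286.003 rows are reported.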


2 Answers

If your input is sorted by call number, you can use itertools.groupby:

import csv
from io import StringIO
from itertools import groupby

text = '''920.105,George Mueller
920.105,George Mueller
920.105,George Mueller 1
327.373,The Letters to the Galatians and Ephesians
327.371,Galatians and Ephesians
289,The Modern Tongues Movement
288.01,The Seduction of Christianity
288.003,Understanding Cults and New Religions
288.002,Understanding Cults and New Religions
286.061,"History of the Baptists, A"
286.044,"History of the Baptists, A"
286.003,This Day in Baptist History 1
286.003,This Day in Baptist History 2
286.003,This Day in Baptist History 3'''

with StringIO(text) as in_file, StringIO() as out_file:
    reader = csv.reader(in_file)
    writer = csv.writer(out_file)

    for number, group in groupby(reader, key=lambda x: x[0]):

        titles = set(item[1] for item in group)
        if len(titles) != 1:
            writer.writerow((number, *titles))

    print(out_file.getvalue())

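It outputs the following (the order of the titles within each row can vary between runs, because they are collected in a set):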

920.105,George Mueller 1,George Mueller
286.003,This Day in Baptist History 2,This Day in Baptist History 3,This Day in Baptist History 1

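Note that I had to change your input, because as given it would not produce any output: no call number in it appears with more than one title.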

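To use this with your real data, replace StringIO(text) in the with statement with something like open('infile.txt', 'r', newline='') so the program reads your actual file, replace StringIO() with something like open('outfile.txt', 'w', newline='') for the output file, and drop the print(out_file.getvalue()) line, since regular file objects have no getvalue() method.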

Again: this only works if your input is sorted by call number, because groupby only groups adjacent rows.
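If the file is not sorted, one option (a sketch, not part of the original answer) is to sort the rows by call number before grouping, at the cost of reading the whole file into memory. The function name conflicting_groups is hypothetical, and the titles are sorted so the output order is stable rather than depending on set ordering:

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def conflicting_groups(reader):
    """Yield (call_number, sorted_titles) for call numbers with more than one title."""
    by_number = itemgetter(0)
    # groupby only merges *adjacent* rows, so sort by call number first.
    # Call numbers are compared as strings, which is enough for grouping.
    for number, group in groupby(sorted(reader, key=by_number), key=by_number):
        titles = sorted({row[1] for row in group})
        if len(titles) > 1:
            yield number, titles


# Hypothetical unsorted sample.
text = """920.105,George Mueller
286.003,This Day in Baptist History 1
920.105,George Mueller 1
286.003,This Day in Baptist History 2
"""

out_file = io.StringIO()
writer = csv.writer(out_file)
for number, titles in conflicting_groups(csv.reader(io.StringIO(text))):
    writer.writerow((number, *titles))
print(out_file.getvalue())
```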

This is based on @hiro protaginist's answer, but it also works when the duplicate call numbers are not next to each other (unsorted input).

import csv
from io import StringIO
from collections import defaultdict

text = '''286.003,This Day in Baptist History 1
920.105,George Mueller
327.373,The Letters to the Galatians and Ephesians
327.371,Galatians and Ephesians
920.105,George Mueller 1
289,The Modern Tongues Movement
288.01,The Seduction of Christianity
920.105,George Mueller
288.003,Understanding Cults and New Religions
288.002,Understanding Cults and New Religions
286.061,"History of the Baptists, A"
286.044,"History of the Baptists, A"
286.003,This Day in Baptist History 2
286.003,This Day in Baptist History 3'''

with StringIO(text) as in_file, StringIO() as out_file:
    reader = csv.reader(in_file)
    writer = csv.writer(out_file)

    grouped = defaultdict(set)
    # Maps call_numbers to a set of all book_titles under that number
    for entry in reader:
        grouped[entry[0]].add(entry[1])
    for call_number, titles in grouped.items():
        if len(titles) > 1:
            for title in titles:
                writer.writerow((call_number, title))
    print(out_file.getvalue()) # Remove this line if actually writing to a file

As with the answer above, replace StringIO(text) with open(filename, newline='') and StringIO() with open(outfilename, 'w', newline='').
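For reference, here is a file-based sketch of the defaultdict approach. The function name write_conflicts and the file names books.csv and duplicates.csv are placeholders, UTF-8 encoding is an assumption, and the files are opened with newline='' as the csv module documentation recommends. The demo at the bottom uses a temporary directory so it runs on its own:

```python
import csv
import os
import tempfile
from collections import defaultdict


def write_conflicts(in_path, out_path):
    """Write (call_number, title) rows for call numbers that have more than one title."""
    grouped = defaultdict(set)  # call number -> set of titles
    with open(in_path, newline='', encoding='utf-8') as in_file:
        for row in csv.reader(in_file):
            if len(row) >= 2:  # skip blank or malformed lines
                grouped[row[0]].add(row[1])

    with open(out_path, 'w', newline='', encoding='utf-8') as out_file:
        writer = csv.writer(out_file)
        for call_number in sorted(grouped):
            titles = grouped[call_number]
            if len(titles) > 1:
                for title in sorted(titles):
                    writer.writerow((call_number, title))


# Demo with a temporary directory; replace with your real paths.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, 'books.csv')
    out_path = os.path.join(tmp, 'duplicates.csv')
    with open(in_path, 'w', newline='', encoding='utf-8') as f:
        f.write('286.003,This Day in Baptist History 1\n'
                '920.105,George Mueller\n'
                '286.003,This Day in Baptist History 2\n')
    write_conflicts(in_path, out_path)
    with open(out_path, newline='', encoding='utf-8') as f:
        result = list(csv.reader(f))
print(result)
```

Sorting the call numbers and titles is only for readable, repeatable output; it is not required to find the conflicts.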
