How can I avoid duplicate entries in a CSV file when using Pandas?

Question

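I am trying to record the number of successes and failures in a CSV file, using Python's Pandas library.

What I expect: every record should be unique. The email column should hold each email only once, and the success count or failure count should be incremented according to the arguments passed in.

The problem: duplicate records appear in the CSV file. The output CSV contains multiple rows with the same email.

My code checks whether the email already exists, but I still get duplicate records in data.csv.

Edit: added a complete code sample that reproduces the problem.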

import pandas as pd
from datetime import datetime


def save_data(email: str, is_success: int, is_failed: int) -> None:
    csv_file_path = "data.csv"

    # Load the existing CSV, or start from an empty frame if the file does not exist yet
    try:
        df = pd.read_csv(csv_file_path)
    except FileNotFoundError:
        df = pd.DataFrame(
            columns=["email", "success_count", "failure_count", "last_updated_on"]
        )

    # Check for duplicates and update counts
    if email in df["email"].values:
        index = df[df["email"] == email].index[0]
        df.at[index, "failure_count"] += is_failed
        df.at[index, "success_count"] += is_success
        df.at[index, "last_updated_on"] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    else:
        # Append new entry
        new_entry = {
            "email": email,
            "success_count": is_success,
            "failure_count": is_failed,
            "last_updated_on": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        }
        df = df._append(new_entry, ignore_index=True)

    # Write to CSV file
    try:
        df.to_csv(csv_file_path, index=False)
    except Exception as e:
        print("Error occurred while writing to CSV file:", e)


if __name__ == "__main__":
    arr = [
        ("123456", 1, 0),
        ("456789", 0, 1),
        ("789012", 1, 0),
        #
        ("123456", 0, 1),
        ("456789", 1, 0),
        ("789012", 0, 1),
    ]

    for data in arr:
        email, is_success, is_failed = data
        save_data(email=email, is_success=is_success, is_failed=is_failed)
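
One likely cause in the reproduction above: the test "emails" are digit-only strings such as "123456". When `pd.read_csv` loads the file back, it infers the `email` column as integers, so the check `email in df["email"].values` compares a string against integers, never matches, and a new row is appended each time. The sketch below is one possible fix, not the only one. It assumes the column should always be read as text and passes `dtype={"email": str}` to `read_csv`. The names `save_data_fixed`, `COLUMNS`, and `run_demo` are hypothetical and used only for illustration. It also uses the public `pd.concat` in place of the private `_append`.

```python
import os
import tempfile
from datetime import datetime

import pandas as pd

COLUMNS = ["email", "success_count", "failure_count", "last_updated_on"]


def save_data_fixed(email: str, is_success: int, is_failed: int,
                    csv_file_path: str = "data.csv") -> None:
    """Hypothetical variant of save_data that keeps one row per email."""
    try:
        # Read the email column as text so "123456" is not turned into the integer 123456.
        df = pd.read_csv(csv_file_path, dtype={"email": str})
    except FileNotFoundError:
        df = pd.DataFrame(columns=COLUMNS)

    now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    mask = df["email"] == email  # string-to-string comparison

    if mask.any():
        # Email already recorded: update its counters in place.
        df.loc[mask, "success_count"] += is_success
        df.loc[mask, "failure_count"] += is_failed
        df.loc[mask, "last_updated_on"] = now
    else:
        # New email: build a one-row frame and append it.
        new_row = pd.DataFrame([{
            "email": email,
            "success_count": is_success,
            "failure_count": is_failed,
            "last_updated_on": now,
        }])
        df = new_row if df.empty else pd.concat([df, new_row], ignore_index=True)

    df.to_csv(csv_file_path, index=False)


def run_demo() -> pd.DataFrame:
    """Replay the inputs from the question in a temporary directory."""
    arr = [
        ("123456", 1, 0), ("456789", 0, 1), ("789012", 1, 0),
        ("123456", 0, 1), ("456789", 1, 0), ("789012", 0, 1),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.csv")
        for email, is_success, is_failed in arr:
            save_data_fixed(email, is_success, is_failed, csv_file_path=path)
        return pd.read_csv(path, dtype={"email": str})


if __name__ == "__main__":
    print(run_demo())  # three rows, one per email
```

With the inputs from the question, this should leave three rows, each with a success count of 1 and a failure count of 1. If the real data contains ordinary addresses such as name@example.com, the column would already be read as text, and the duplicates would need a different explanation.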

