How to merge multiple DB selects into a single dataset by d

Published 2024-06-11 09:09:32

I have a database from which I pull new data on every iteration, and I am trying to combine the results on the datetime column.

I am using this piece of code:

    # Iterate by days

    for row in rows:
        i += 1
        df_name = f"{row[0]}_{row[1]}"
        print(f"Getting {df_name} {i}/{len(rows)}{spaces}", end="\r")

        if predictionPoint == row[0]:
            currentDf = pd.read_sql(f"SELECT updated_at, c as '{df_name}_c', "
                                    f"v as '{df_name}_v', o FROM commons "
                                    f"WHERE cid LIKE '{predictionMeasure}%' AND s = '{row[0]}' AND cid = '{row[1]}' "
                                    "ORDER BY updated_at DESC", con=sqlite)
        else:
            currentDf = pd.read_sql(f"SELECT updated_at, c as '{df_name}_c', "
                                    f"v as '{df_name}_v' FROM commons "
                                    f"WHERE cid LIKE '{predictionMeasure}%' AND s = '{row[0]}' AND cid = '{row[1]}' "
                                    "ORDER BY updated_at DESC", con=sqlite)

        currentDf["updated_at"] = currentDf["updated_at"].apply(convertDatetime)

        if not df.empty:
            df = pd.merge(left=df, right=currentDf, on="updated_at", how="inner")
        else:
            df = currentDf

    if not os.path.exists(f"{dirName}/{datasetFilename}"):
        df.to_csv(f"{dirName}/{datasetFilename}", encoding="utf-8", index=False)
    else:
        tempDf = pd.read_csv(f"{dirName}/{datasetFilename}", parse_dates=["updated_at"])
        df = pd.concat([tempDf, df], axis=0, sort=False)
        df.to_csv(f"{dirName}/{datasetFilename}", encoding="utf-8", index=False)

    print(f"Dataset created {a}/{len(archives)}{spaces}")


df = pd.read_csv(f"{dirName}/{datasetFilename}", parse_dates=["updated_at"])
df = df.set_index("updated_at", drop=False)

print("Sorting, filling N/A, cleaning...")
df = df.sort_index(ascending=False)

df = df.ffill().bfill()  # same as fillna(method="ffill").fillna(method="bfill")

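I get an error at line 27 of this code: it returns an unmerged array with duplicated updated_at columns, or the DataFrame gets cut off. What I expect instead is: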

updated_at           one  two  three  four
2019-06-02 23:59:45    1    2      3     4
2019-06-02 23:59:30    2    3      4     5
2019-06-02 23:59:15    3    4      5     6
2019-06-02 23:59:00    4    5      6     7
2019-06-02 23:58:45    5    6      7     8

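That is, the data added by each merge should line up, with no duplicated updated_at values and no gaps. I have already tried concatenation and other types of merge.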
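For illustration, here is a minimal sketch of one way the expected shape could be produced. It is not taken from the code above: the helper name `merge_on_timestamp` and the sample frames are hypothetical, and it assumes that an outer merge followed by forward/backward filling is acceptable for this data. The idea is to drop repeated timestamps within each frame before merging (repeated keys multiply rows in a merge), and to use `how="outer"` so that timestamps missing from one frame do not cut the result down.

```python
from functools import reduce

import pandas as pd


def merge_on_timestamp(frames, key="updated_at"):
    """Outer-merge frames on `key`, keeping one row per timestamp (hypothetical helper)."""
    cleaned = []
    for frame in frames:
        frame = frame.copy()
        frame[key] = pd.to_datetime(frame[key])
        # Repeated timestamps inside one frame would multiply rows in the merge
        cleaned.append(frame.drop_duplicates(subset=key, keep="first"))

    # An outer merge keeps timestamps that are missing from some frames
    merged = reduce(
        lambda left, right: pd.merge(left, right, on=key, how="outer"),
        cleaned,
    )
    merged = merged.sort_values(key, ascending=False).reset_index(drop=True)
    # Fill the gaps left by the outer merge, newest row first
    return merged.ffill().bfill()


# Hypothetical sample data standing in for two per-row query results
one = pd.DataFrame({
    "updated_at": ["2019-06-02 23:59:45", "2019-06-02 23:59:30", "2019-06-02 23:59:30"],
    "one": [1, 2, 2],
})
two = pd.DataFrame({
    "updated_at": ["2019-06-02 23:59:45", "2019-06-02 23:59:15"],
    "two": [2, 4],
})

result = merge_on_timestamp([one, two])
print(result)
```

With an inner merge the same inputs would keep only the 23:59:45 row, which matches the "cut off" symptom described above. Whether filling the gaps is appropriate depends on what the columns mean.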

