Allowing duplicate columns in Pandas

Posted 2024-06-17 12:11:15

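I am splitting a large CSV file (containing stock financial data) into smaller chunks. The CSV has an unusual layout, something like an Excel pivot table: the first few rows of the first column contain some of the headers.

The company name, ID, and so on are repeated across the following columns, because each company has several attributes rather than a single column.

After the first few rows, the columns start to look like a typical DataFrame, with the headers in the columns rather than in the rows.

Anyway, what I am trying to do is make Pandas allow duplicate column headers instead of appending ".1", ".2", ".3", and so on to them. I know Pandas does not allow this natively; is there a workaround? I tried setting header=None on read_csv, but it throws a tokenizing error, which I think makes sense. I just can't think of an easy way to do it.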
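The renaming described above can be reproduced on a tiny in-memory file. This is only a sketch: the company names and values are made up for illustration and are not taken from the real chunk4.csv.

```python
import io
import pandas as pd

# Hypothetical miniature of the file: semicolon-separated, with the
# header "Acme" appearing twice.
raw = "Date;Acme;Acme;Beta\n2020-01-01;1;2;3\n"

df = pd.read_csv(io.StringIO(raw), sep=";", dtype=str)

# read_csv de-duplicates repeated header names by appending ".1", ".2", ...
print(list(df.columns))
```

The second "Acme" column comes back as "Acme.1", which is the suffix the question wants to avoid.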

import pandas as pd

csv_path = "C:\\Users\\ThirdHandBD\\Desktop\\Data Splitting\\pd-split\\chunk4.csv"

#df = pd.read_csv(csv_path, header=1, dtype='unicode', sep=';', low_memory=False, error_bad_lines=False)
df = pd.read_csv(csv_path, header = 1, dtype='unicode', sep=';', index_col=False)
print("I read in a dataframe with {} columns and {} rows.".format(
len(df.columns), len(df)
))

filename = 1

#column increment
x = 30 * 59

for column in df:
    loc = df.columns.get_loc(column)
    if loc == (x * filename) + 1:
        y = filename - 1
        a = (x * y) + 1
        b = (x * filename) + 1
        date_df = df.iloc[:, :1]
        out_df = df.iloc[:, a:b]
        final_df = pd.concat([date_df, out_df], axis=1, join='inner')
        out_path = "C:\\Users\\ThirdHandBD\\Desktop\\Data Splitting\\pd-split\\chunk4-part" + str(filename) + ".csv"
        final_df.to_csv(out_path, index=False)
        #out_df.to_csv(out_path)
        filename += 1

# This should be the same as df, but with only the first column.
# Check it with similar code to above.

Edit:

Following https://github.com/pandas-dev/pandas/issues/19383, I added:

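    final_df.columns = final_df.iloc[0]
    final_df = final_df.reindex(final_df.index.drop(0)).reset_index(drop=True)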

So, the full code:

import pandas as pd

csv_path = "C:\\Users\\ThirdHandBD\\Desktop\\Data Splitting\\pd-split\\chunk4.csv"

#df = pd.read_csv(csv_path, header=1, dtype='unicode', sep=';', low_memory=False, error_bad_lines=False)
df = pd.read_csv(csv_path, header = 1, dtype='unicode', sep=';', index_col=False)
print("I read in a dataframe with {} columns and {} rows.".format(
len(df.columns), len(df)
))

filename = 1

#column increment
x = 30 * 59

for column in df:
    loc = df.columns.get_loc(column)
    if loc == (x * filename) + 1:
        y = filename - 1
        a = (x * y) + 1
        b = (x * filename) + 1
        date_df = df.iloc[:, :1]
        out_df = df.iloc[:, a:b]
        final_df = pd.concat([date_df, out_df], axis=1, join='inner')
        out_path = "C:\\Users\\ThirdHandBD\\Desktop\\Data Splitting\\pd-split\\chunk4-part" + str(filename) + ".csv"
        final_df.columns = final_df.iloc[0]
        final_df = final_df.reindex(final_df.index.drop(0)).reset_index(drop=True)
        final_df.to_csv(out_path, index=False)
        #out_df.to_csv(out_path)
        filename += 1

# This should be the same as df, but with only the first column.
# Check it with similar code to above.

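Now the entire first row is gone. What I wanted was for the headers carrying a ".1" suffix to be replaced by the same headers without the ".1".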

The SimFin ID row is no longer there.
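To see why a row disappears, here is a minimal sketch of what the two added lines do. The frame below is hypothetical (the real file layout is not shown in the question); it only assumes that company names ended up as the header and that a "SimFin ID" row is the first data row.

```python
import pandas as pd

# Hypothetical frame: de-duplicated company names as the header,
# and a "SimFin ID" row as the first data row.
final_df = pd.DataFrame(
    [["SimFin ID", "101", "101"], ["2020-01-01", "5", "6"]],
    columns=["Company", "Acme", "Acme.1"],
)

# The two lines added in the edit: promote row 0 to the header,
# then drop row 0 from the data.
final_df.columns = final_df.iloc[0]
final_df = final_df.reindex(final_df.index.drop(0)).reset_index(drop=True)

print(list(final_df.columns))
print(final_df.values.tolist())
```

Under these assumptions, the first data row becomes the new header and is removed from the body, while the previous header (the company names) is discarded altogether. That is consistent with one row going missing in the output file.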


1 answer

#1 · Posted 2024-06-17 12:11:15

This is how I did it:

    final_df.columns = final_df.columns.str.split('.').str[0]

Reference: https://pandas.pydata.org/pandas-docs/stable/text.html
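A small sketch of that approach on made-up data, next to a variant that is not part of the original answer: splitting on "." also truncates any header that legitimately contains a dot, so the alternative below strips only a trailing ".<digits>" suffix with a regular expression. The header names here are hypothetical.

```python
import io
import pandas as pd

# Hypothetical file where the repeated header itself contains a dot.
raw = "Date;Acme Inc.;Acme Inc.;Beta\n2020-01-01;1;2;3\n"
df = pd.read_csv(io.StringIO(raw), sep=";", dtype=str)

# The answer's approach: keep everything before the first ".".
split_cols = df.columns.str.split(".").str[0]

# Alternative (an assumption, not from the answer): remove only a
# trailing ".<digits>" suffix such as the one read_csv appends.
regex_cols = df.columns.str.replace(r"\.\d+$", "", regex=True)

print(list(split_cols))
print(list(regex_cols))

# Assigning the cleaned names gives a frame with duplicate headers,
# which to_csv writes out unchanged.
df.columns = regex_cols
header_line = df.to_csv(index=False, sep=";").splitlines()[0]
print(header_line)
```

The regex variant has its own limitation: a genuine header that ends in a dot followed by digits would be shortened too, so either approach should be checked against the real header names.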
