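Reading and selecting items from a CSV file with a varying number of columns

2 answers

User

Answer 1 · edited 2024-05-13 23:39:14

You need to do some preprocessing. When you handle data from external systems, having to deal with these integration points is very common.

The external file contains structured data: a sequence of CSV rows in which each item has 5 header rows. The last header row contains the CSV column labels.

Read the content from the external file. Adjust the code below to your needs.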

external_file_content = r'''
"Path","File","Date Acquired","Sample","Misc"
"C:\msdchem\2\DATA\AlbertVirgili\DaniGM\","DGM_CPTIS003 1h.D","25-Mar-19, 11:55:48","DGM_CPTIS003 1h"," "
"INT FID1A.CH"
"Mon Mar 25 17:48:31 2019"
"Peak","R.T.","Start","End","PK TY","Height","Area","Pct Max","Pct Total"
1, 2.082, 2.063, 2.189,"BB ",223849319,4951058782,100.00, 46.349
2, 2.317, 2.281, 2.386,"BB ",73209942,1093871144, 22.09, 10.240
3, 3.343, 3.224, 3.403,"BB ",93165657,2220621038, 44.85, 20.788
4, 5.538, 5.409, 5.598,"BB ",51783798,1975386485, 39.90, 18.492
5, 5.744, 5.693, 5.803,"BB ",24084957,360235490, 7.28, 3.372
6, 8.716, 8.676, 8.776,"BB ",8566883, 80973220, 1.64, 0.758
"Path","File","Date Acquired","Sample","Misc"
"C:\msdchem\2\DATA\AlbertVirgili\DaniGM\","DGM_CPTIS003 2h.D","25-Mar-19, 12:15:42","DGM_CPTIS003 2h"," "
"INT FID1A.CH"
"Mon Mar 25 12:31:45 2019"
"Peak","R.T.","Start","End","PK TY","Height","Area","Pct Max","Pct Total"
1, 2.083, 2.064, 2.194,"BB ",232382153,5255486688,100.00, 59.673
2, 2.318, 2.282, 2.384,"BB ",37916041,587535474, 11.18, 6.671
3, 3.322, 3.241, 3.381,"BB ",67715293,1373898201, 26.14, 15.600
4, 5.509, 5.406, 5.569,"BB ",39502747,1227609422, 23.36, 13.939
5, 5.731, 5.689, 5.791,"BB ",17799521,230201751, 4.38, 2.614
6, 8.717, 8.674, 8.776,"BB ",12367646,132409300, 2.52, 1.503
'''

Split the sequence into separate parts, using the well-defined first header row as the delimiter

parts = external_file_content.split('"Path","File","Date Acquired","Sample","Misc"')
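It can help to look at what the split actually returns before parsing anything. The following is a minimal sketch on a shortened, hypothetical sample (the paths, file names, and the reduced column set are made up for illustration). It assumes the delimiter row appears verbatim at the start of every item, as in the data above.

```python
# Hypothetical, shortened sample with the same layout as the file above.
sample = r'''
"Path","File","Date Acquired","Sample","Misc"
"C:\data\","run_1h.D","25-Mar-19, 11:55:48","run 1h"," "
"INT FID1A.CH"
"Mon Mar 25 17:48:31 2019"
"Peak","R.T.","Area"
1, 2.082,4951058782
"Path","File","Date Acquired","Sample","Misc"
"C:\data\","run_2h.D","25-Mar-19, 12:15:42","run 2h"," "
"INT FID1A.CH"
"Mon Mar 25 12:31:45 2019"
"Peak","R.T.","Area"
1, 2.083,5255486688
'''

DELIMITER = '"Path","File","Date Acquired","Sample","Misc"'
parts = sample.split(DELIMITER)

# parts[0] is whatever precedes the first delimiter (here just a newline),
# so the real items start at index 1. Drop whitespace-only pieces.
blocks = [p for p in parts if p.strip()]

print(len(parts), len(blocks))
```

Because the text before the first delimiter is empty, the first real item is `parts[1]`, not `parts[0]`.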

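Select an individual part to process further into a DataFrame. Configure pd.read_csv to skip 4 rows: each part starts with the empty line left over from the delimiter row, followed by three metadata lines, so the column labels are on the fifth line.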

from io import StringIO
import pandas as pd

df = pd.read_csv(StringIO(parts[1]), skiprows=4)

Show the first rows of the DataFrame

df.head(5)


    Peak    R.T.    Start   End     PK TY   Height  Area    Pct Max     Pct Total
0   1   2.082   2.063   2.189   BB  223849319   4951058782  100.00  46.349
1   2   2.317   2.281   2.386   BB  73209942    1093871144  22.09   10.240
2   3   3.343   3.224   3.403   BB  93165657    2220621038  44.85   20.788
3   4   5.538   5.409   5.598   BB  51783798    1975386485  39.90   18.492
4   5   5.744   5.693   5.803   BB  24084957    360235490   7.28    3.372
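To handle every item rather than just `parts[1]`, the same idea can be put in a loop. This is only a sketch under some assumptions: the sample data is shortened and hypothetical, the "Sample" value is taken from the fourth field of the first metadata line, and each block is stripped first so that only three lines precede the column labels (hence `skiprows=3` instead of 4). `skipinitialspace=True` is added so the blanks after the commas do not end up in the values.

```python
import csv
from io import StringIO

import pandas as pd

# Hypothetical, shortened sample with the same layout as the file above.
sample = r'''
"Path","File","Date Acquired","Sample","Misc"
"C:\data\","run_1h.D","25-Mar-19, 11:55:48","run 1h"," "
"INT FID1A.CH"
"Mon Mar 25 17:48:31 2019"
"Peak","R.T.","PK TY","Area"
1, 2.082,"BB ",4951058782
2, 2.317,"BB ",1093871144
"Path","File","Date Acquired","Sample","Misc"
"C:\data\","run_2h.D","25-Mar-19, 12:15:42","run 2h"," "
"INT FID1A.CH"
"Mon Mar 25 12:31:45 2019"
"Peak","R.T.","PK TY","Area"
1, 2.083,"BB ",5255486688
'''

DELIMITER = '"Path","File","Date Acquired","Sample","Misc"'

frames = []
for part in sample.split(DELIMITER):
    block = part.strip()
    if not block:
        continue  # nothing but whitespace before the first delimiter
    # The first line of a block holds the values for Path, File,
    # Date Acquired, Sample and Misc.
    meta = next(csv.reader([block.splitlines()[0]]))
    # After strip(), three metadata lines precede the column labels.
    frame = pd.read_csv(StringIO(block), skiprows=3, skipinitialspace=True)
    frame.insert(0, "Sample", meta[3])  # remember which item the rows came from
    frames.append(frame)

combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Keeping the sample name as a column makes it possible to select one item later with an ordinary filter, for example `combined[combined["Sample"] == "run 2h"]`.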

User

Answer 2 · edited 2024-05-13 23:39:14

Filter out the non-numeric rows

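import csv
import pandas as pd

def gen_rows(stream):
    for row in csv.reader(stream):
        # keep only data rows: the first field is the integer peak number
        # (test row[0] without popping it, so all nine fields are kept)
        if row and row[0].strip().isdigit():
            yield row

with open('data.csv', newline='') as fo:
    df = pd.DataFrame.from_records(
        gen_rows(fo),
        columns=["Peak", "R.T.", "Start", "End", "PK TY",
                 "Height", "Area", "Pct Max", "Pct Total"])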
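One thing to keep in mind with this approach: `csv.reader` returns every field as a string, so the resulting DataFrame holds text, not numbers. A possible follow-up step is sketched below. It uses a variant of `gen_rows` that keeps the first field, and a small in-memory stand-in for `data.csv` (both are assumptions for the sake of a runnable example).

```python
import csv
from io import StringIO

import pandas as pd

COLUMNS = ["Peak", "R.T.", "Start", "End", "PK TY",
           "Height", "Area", "Pct Max", "Pct Total"]

def gen_rows(stream):
    """Yield only rows whose first field is an integer peak number."""
    for row in csv.reader(stream):
        if row and row[0].strip().isdigit():
            yield row

# Hypothetical in-memory stand-in for data.csv.
sample = StringIO(
    '"INT FID1A.CH"\n'
    '"Peak","R.T.","Start","End","PK TY","Height","Area","Pct Max","Pct Total"\n'
    '1, 2.082, 2.063, 2.189,"BB ",223849319,4951058782,100.00, 46.349\n'
    '2, 2.317, 2.281, 2.386,"BB ",73209942,1093871144, 22.09, 10.240\n'
)

df = pd.DataFrame.from_records(gen_rows(sample), columns=COLUMNS)

# Every value is still a str at this point; convert the numeric columns
# explicitly, stripping the blanks that follow the commas first.
numeric_cols = [c for c in COLUMNS if c != "PK TY"]
df[numeric_cols] = df[numeric_cols].apply(lambda s: pd.to_numeric(s.str.strip()))
df["PK TY"] = df["PK TY"].str.strip()

print(df.dtypes)
```

Note that this filter merges the rows of all items into one table and discards the metadata lines, so it fits best when you do not need to know which sample a peak belongs to.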
