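I'm trying to analyze a file that looks like a CSV file but isn't quite one. It is comma-separated, but every comma is followed by a space. It also has no header row, and the rows differ in length.
Here is an example. If I open the file as .txt, I get the following: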
FUD, speed, time, heading, offsets
MUD, speed, time, heading, offsets, error
CLA, head, time, speed, offset, error, errorfix
MUD, speed, time, heading, offsets, error
MUD, speed, time, heading, offsets, error
FUD, speed, time, heading, offsets
CLA, head, time, speed, offset, error, errorfix
CLA, head, time, speed, offset, error, errorfix
(Note: head, time, offset, and every other field after the first column stand for actual values.)
Here is what I have tried so far:
import pandas as pd
df = pd.read_csv('data.csv', header=None)
MUD = df[df[0]=='MUD'].values.tolist()
However, I get this error:
CParserError: Error tokenizing data. C error: Expected 10 fields in line 3, saw 18
When I googled the error, someone suggested that I should use
error_bad_lines=False
However, that still gave me an error:
expected 10 fields, saw 15.
I'm trying to use pandas to build a list of every MUD row I see, so that later I can do something like this:
newMUD = MUD[4]/100
In the end I would get a result like this:
print (MUD)
MUD, 12, 1, 5, 1, 1
MUD, 13, 2, 3, 2, 0
MUD, 12, 3, 5, -2, 0
MUD, 4, 4, 3, -3, 1
A sample of my data:
NKF1, 447526092, -3.08, 0.01, 175.83, -0.02133949, 0.03264881, -0.06251871, 0, -28.93325, 26.49632, -0.1290034, 0.07, -0.02, 0.14
NKF2, 447526092, -26, 0.00, 0.00, 0.00, 0.00, 0.00, 255, 55, 341, 0, 0, 0, 0
NKF3, 447526092, -0.01, 0.06, 0.12, -0.04, -0.08, -0.03, 0, 0, 0, -0.73, 0.00
NKF4, 447526092, 0.03, 0.01, 0.00, 0.00, 0.00, 0.0002261061, 0, 0, 0, 16, 9023, 0, 1
NKF5, 447526092, 0, 0, 0, 0, 1.14, 0.88, 0.00, 0.00, 0.50, 0.003602755, 0.01431285, 0.02802294
NKF6, 447526092, -2.66, -0.98, 187.53, -0.06789517, -0.2714562, -0.1189714, 0, -28.96132, 26.25431, -0.2784806, 0.00, 0.36, -0.49
NKF7, 447526092, 21, 0.00, 0.00, 0.00, 0.00, 0.00, 258, 55, 338, 0, 0, 0, 0
NKF8, 447526092, -0.04, -0.20, 0.07, -0.04, -0.23, -0.17, 0, 0, 0, 10.83, 0.00
NKF9, 447526092, 0.04, 0.03, 0.01, 0.12, 0.00, 0.000866859, 0, 0, 0, 16, 9023, 0, 1
AHR2, 447526241, -3.12, -0.42, 176.43, 418.84, 34.3167522, -118.4068499
POS, 447526306, 34.3167515, -118.406853, 419.03, 0.2784806
IMU, 447545009, -0.09418038, 0.1740572, -0.05483108, 0.6083156, 0.2225795, -9.380787, 0, 0, 52.99446, 1, 1
IMU2, 447545009, -0.09127176, 0.1908958, -0.06220703, 0.524766, 0.3107446, -8.754621, 0, 0, 56.125, 1, 1
SONR, 447545584, 0, 0, 0, 0
RFND, 447545593, 0.00, 0.00
IMU, 447565482, -0.08753563, 0.1228692, -0.04508965, 0.6137247, -0.01505011, -9.579732, 0, 0, 53.0831, 1, 1
IMU2, 447565482, -0.08944235, 0.139776, -0.05096832, 0.4677677, 0.03778861, -9.214079, 0, 0, 55.875, 1, 1
GPS, 447565911, 4, 246769200, 1920, 14, 0.70, 34.3167523, -118.4068497, 418.91, 0.05656854, 135, -0.16, 1
GPA, 447565911, 1.11, 0.73, 1.04, 0.29, 1, 447565
SONR, 447566084, 0, 0, 0, 0
RFND, 447566093, 0.00, 0.00
ATT, 447566114, 0.00, -2.88, 0.00, -0.62, 0.00, 187.41, 0.02, 0.01
PIDR, 447566125, 0, 0, 0, 0, 0, 0
PIDP, 447566135, 0, 0, 0, 0, 0, 0
PIDY, 447566145, 0, 0, 0, 0, 0, 0
PIDS, 447566155, 0, 0, 0, 0, 0, 0
NKF1, 447566164, -3.30, 0.35, 175.70, -0.02778457, 0.03493549, -0.04115778, 0, -28.9337, 26.49665, -0.1338468, 0.07, -0.02, 0.14
NKF2, 447566164, -26, 0.00, 0.00, 0.00, 0.00, 0.00, 255, 55, 341, 0, 0, 0, 0
NKF3, 447566164, -0.01, 0.06, 0.12, -0.04, -0.08, -0.11, 0, 0, 0, -0.73, 0.00
NKF4, 447566164, 0.03, 0.01, 0.00, 0.00, 0.00, 0.0002256641, 0, 0, 0, 16, 9023, 0, 1
NKF5, 447566164, 0, 0, 0, 0, 1.14, 0.88, 0.00, 0.00, 0.50, 0.003267812, 0.01763795, 0.02970827
NKF6, 447566164, -2.88, -0.62, 187.40, -0.07544779, -0.2697962, -0.09678251, 0, -28.96231, 26.2515, -0.2831134, 0.00, 0.36, -0.49
NKF7, 447566164, 21, 0.00, 0.00, 0.00, 0.00, 0.00, 258, 55, 338, 0, 0, 0, 0
NKF8, 447566164, -0.04, -0.20, 0.07, -0.04, -0.23, -0.25, 0, 0, 0, 10.83, 0.00
NKF9, 447566164, 0.04, 0.03, 0.01, 0.12, 0.00, 0.00086712, 0, 0, 0, 16, 9023, 0, 1
AHR2, 447566373, -3.34, -0.07, 176.32, 418.84, 34.3167522, -118.4068497
POS, 447566396, 34.3167515, -118.406853, 419.04, 0.2831134
IMU, 447587271, -0.08603665, 0.071096, -0.03380377, 0.5931511, -0.07432687, -9.615693, 0, 0, 53.0831, 1, 1
IMU2, 447587271, -0.08848803, 0.09229023, -0.04071644, 0.4688947, 0.01987415, -9.166938, 0, 0, 56.125, 1, 1
MAG, 447587700, -265, -77, 332, -115, 0, 1, 0, 0, 0, 1, 447587691
MAG2, 447587700, -273, -29, 372, 77, -135, 38, 0, 0, 0, 1, 447587693
ARSP, 447587748, 2.969838, 4.424126, 38.22, -4.424126, 110.8502, 1
BARO, 447587789, -0.09136668, 97036.14, 55.03, -0.8952343, 447587, 0
CURR, 447587949, 16.91083, 0.6012492, 60.22538
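If you really want to do calculations on the columns, then using Pandas makes sense (that isn't clear from the question). In that case, passing the expected column names is enough, so that the parser is no longer surprised by the changing number of columns.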
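The answer's original code was not preserved here, so the following is only a minimal sketch of the idea. It assumes the widest row has at most `max_cols` fields; `read_ragged`, `max_cols`, and the inline `SAMPLE` text are hypothetical names and data made up for illustration. `skipinitialspace=True` takes care of the space after each comma.

```python
import io

import pandas as pd

# Hypothetical sample that mirrors the ragged rows in the question.
SAMPLE = """\
FUD, 10, 1, 90, 2
MUD, 12, 1, 5, 1, 1
CLA, 5, 1, 12, 1, 0, 1
MUD, 13, 2, 3, 2, 0
"""


def read_ragged(source, max_cols=7):
    """Read a headerless, ragged CSV by naming max_cols columns up front.

    max_cols is an assumption: it must be at least the width of the
    widest row. Shorter rows are padded with NaN.
    """
    return pd.read_csv(
        source,
        header=None,
        names=list(range(max_cols)),  # explicit names fix the column count
        skipinitialspace=True,        # drop the space after each comma
    )


df = read_ragged(io.StringIO(SAMPLE))

# Keep only the MUD rows and drop columns that are empty for them.
mud = df[df[0] == "MUD"].dropna(axis=1, how="all")

# Column arithmetic, as in the question's MUD[4] / 100.
new_mud = mud[4] / 100
```

For the real file, `read_ragged('data.csv', max_cols=15)` would be the equivalent call, where 15 is the widest row in the data sample shown in the question.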
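You can also filter the rows while creating the dataframe with from_records. Here I use the csv module to build the rows and discard the ones that aren't needed. This is a little risky: if the non-MUD lines don't follow standard CSV rules, the reader may get confused. A more involved version restricts the csv parser to the MUD lines only.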
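The two variants described above might look roughly like this. It is a sketch rather than the answer's original code, which was not preserved; the function names and the inline `SAMPLE` text are hypothetical. Because `csv.reader` yields strings, the values need converting (here with `pd.to_numeric`) before any arithmetic.

```python
import csv
import io

import pandas as pd

# Hypothetical sample that mirrors the ragged rows in the question.
SAMPLE = """\
FUD, 10, 1, 90, 2
MUD, 12, 1, 5, 1, 1
CLA, 5, 1, 12, 1, 0, 1
MUD, 13, 2, 3, 2, 0
"""


def mud_frame_simple(lines):
    """Parse every line with csv.reader, then keep only the MUD rows."""
    reader = csv.reader(lines, skipinitialspace=True)
    rows = [row for row in reader if row and row[0] == "MUD"]
    return pd.DataFrame.from_records(rows)


def mud_frame_prefiltered(lines):
    """Pass only MUD lines to csv.reader, so other lines are never parsed."""
    mud_lines = (line for line in lines if line.startswith("MUD,"))
    rows = list(csv.reader(mud_lines, skipinitialspace=True))
    return pd.DataFrame.from_records(rows)


mud = mud_frame_prefiltered(io.StringIO(SAMPLE))

# The cells are still strings here, so convert before dividing.
new_mud = pd.to_numeric(mud[4]) / 100
```

With a real file, an open file object can be passed instead of `io.StringIO(SAMPLE)`, for example `with open('data.csv', newline='') as f: mud = mud_frame_prefiltered(f)`.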