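Task:
I need to match the strings in the first column of a CSV file against a log file. If a string appears in the log, the matched string goes in the third column; otherwise the third column should read "undetected".
Contents of my log file: trendx.log. Contents of my CSV file: sha1_vsdt.csv.
Expected output:
Code:
So far I have tried this with a pandas DataFrame and NumPy, following suggestions from a few people.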
import numpy as np
import pandas as pd
import csv
#Log data into dataframe using genfromtxt
logdata = np.genfromtxt("trendx.log", delimiter=" ",invalid_raise = False,dtype=str, comments=None,usecols=np.arange(0,24))
logframe = pd.DataFrame(logdata)
#Dataframe trimmed to use only SHA1, PRG and IP
df2=(logframe[[10,14,15]]).rename(columns={10:'SHA1', 14: 'PRG',15:'IP'})
#sha1_vsdt data into dataframe using read_csv
df1=pd.read_csv("sha1_vsdt.csv",delimiter=r"|",error_bad_lines=False,engine = 'python',quoting=3)
#Using merge to compare the two CSV
df = pd.merge(df1, df2, left_on='SHA-1', right_on='SHA1', how='left').replace(np.nan, 'undetected', regex=True)
print df[['SHA-1','VSDT','PRG','IP']]
I get this error:
Warning (from warnings module):
  File "C:\Users\Administrator\Desktop\OJT\match.py", line 6
    logdata = np.genfromtxt("trendx.log", delimiter=" ",invalid_raise = False,dtype=str, comments=None,usecols=np.arange(0,24))
ConversionWarning: Some errors were detected !
    Line #1 - #113 (got 1 columns instead of 24)
Traceback (most recent call last):
  File "C:\Users\Administrator\Desktop\OJT\match.py", line 9, in <module>
    df2=(logframe[[10,14,15]]).rename(columns={10:'SHA1', 14: 'PRG',15:'IP'})
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2682, in __getitem__
    return self._getitem_array(key)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2726, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "C:\Python27\lib\site-packages\pandas\core\indexing.py", line 1327, in _convert_to_indexer
    .format(mask=objarr[mask]))
KeyError: '[10 14 15] not in index'
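This code should work. You don't need to pass a delimiter to np.genfromtxt, because by default it splits on whitespace, which is probably what you want. Also, the delimiter for pd.read_csv should be ",", since the file is a CSV.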
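
A minimal sketch of the approach described above, not the original answer's code. It assumes Python 3, a comma-separated CSV with SHA-1 and VSDT columns as in the question, and a log whose SHA-1, PRG and IP fields sit at whitespace-separated positions 10, 14 and 15. The function name match_sha1, its parameters, and the sample data are hypothetical and exist only for illustration.

```python
import io

import numpy as np
import pandas as pd


def match_sha1(log_source, csv_source, sha1_col=10, prg_col=14, ip_col=15):
    """Left-join the CSV rows against the log rows on the SHA-1 value.

    Hypothetical helper: the column positions default to the ones used
    in the question and are assumptions about the log layout.
    """
    # No delimiter argument: genfromtxt then splits on any run of whitespace.
    logdata = np.genfromtxt(log_source, dtype=str, comments=None,
                            invalid_raise=False,
                            usecols=(sha1_col, prg_col, ip_col))
    # atleast_2d keeps a one-line log from collapsing to a 1-D array.
    df2 = pd.DataFrame(np.atleast_2d(logdata), columns=["SHA1", "PRG", "IP"])

    # Comma is read_csv's default separator; it is spelled out for clarity.
    df1 = pd.read_csv(csv_source, sep=",", dtype=str)

    # Keep every CSV row; rows without a log match get NaN in PRG and IP.
    merged = pd.merge(df1, df2, left_on="SHA-1", right_on="SHA1", how="left")
    merged[["PRG", "IP"]] = merged[["PRG", "IP"]].fillna("undetected")
    return merged[["SHA-1", "VSDT", "PRG", "IP"]]


if __name__ == "__main__":
    # Made-up sample data with three fields per log line, so the column
    # positions are overridden to 0, 1 and 2.
    sample_log = io.StringIO("aaa111 TROJ_GEN 10.0.0.1\n"
                             "bbb222   PUA_TEST\t10.0.0.2\n")
    sample_csv = io.StringIO("SHA-1,VSDT\naaa111,Win32 EXE\nccc333,PDF\n")
    print(match_sha1(sample_log, sample_csv, sha1_col=0, prg_col=1, ip_col=2))
```

With the real files, the call would be match_sha1("trendx.log", "sha1_vsdt.csv"). If the same SHA-1 appears on several log lines, the left merge repeats the CSV row once per match; dropping duplicates from the log frame first would avoid that.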