Script to parse, remove, and mask IP addresses

Posted 2024-04-27 04:51:17


I have a CSV file with 3 columns:

  • Column 1 - Overall Value - contains the ID and the IP address [5151515199.999.999.999]

  • Column 2 - Time - the timestamp [2019-02-25T19:04:59.999-0500]

  • Column 3 - IP address (IPv4 and IPv6) - IP [99.999.999.999]

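I am trying to parse the ID out of the first column by splitting it into two columns, one holding the ID and one holding the IP address, and then drop the newly created IP address column, because those IP addresses are already in column 3.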
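To make the intended split concrete, here is a minimal sketch on hypothetical sample rows (the IDs and the documentation-range IPs are made up, and it assumes the ID and IP in "Overall Value" are separated by a single space, as the split in the code below implies). It also checks that the split-off IPs really duplicate column 3 before you discard them.

```python
import pandas as pd

# Hypothetical sample rows: "Overall Value" holds "<ID> <IP>", and the IP repeats in "IP".
df = pd.DataFrame({
    "Overall Value": ["1001 192.0.2.10", "1002 2001:db8::1"],
    "Time": ["2019-02-25T19:04:59.999-0500", "2019-02-25T19:05:00.123-0500"],
    "IP": ["192.0.2.10", "2001:db8::1"],
})

# Split once on the first space: column 0 is the ID, column 1 is the IP.
parts = df["Overall Value"].str.split(" ", n=1, expand=True)

# Confirm the split-off IPs duplicate column 3 before throwing them away.
ips_match = (parts[1] == df["IP"]).all()
print(ips_match)  # True for this sample
```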

Here is the code I have so far:

import pandas as pd
from pandas import read_csv
df1= pd.read_csv('C:\\Users\\[redacted]\\Documents\\Python\\Parsing.csv')
df1.dropna(inplace = True) # dropping rows with null values to avoid errors
df1 = df1["Overall Value"].str.split(" ", n = 1, expand = True) # updating data frame with split value columns
df1["ID"]= df1[0] # making separate ID column from new data frame
df1["IP2"]= df1[1] # making separate IP column from new data frame
df1["Time"]= df1[2]
df1["IP"]= df1[3]
df1.drop(columns =["IP2"], inplace = True) # deleting column 2
df2 = pd.read_csv('C:\\Users\\[redacted]\\Documents\\Python\\Parsingcopy.csv', index_col=0)
df1 = df1.map(df2)
df1.to_csv('C:\\Users\\[redacted]\\Documents\\Python\\Parsingcopy2.csv')

Why does it give me the following error?

C:\Users\[Redacted]>C:\Python27\python.exe C:\Users\[Redacted]\Documents\Python\Parsing.py
Traceback (most recent call last):
  File "C:\Users\[Redacted]\Documents\Python\Parsing.py", line 21, in <module>
    df1["RestofData"]= df1[2]
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2139, in __getitem__
    return self._getitem_column(key)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2146, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Python27\lib\site-packages\pandas\core\generic.py", line 1842, in _get_item_cache
    values = self._data.get(item)
  File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3843, in get
    loc = self.items.get_loc(item)
  File "C:\Python27\lib\site-packages\pandas\core\indexes\base.py", line 2527, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 2

1 answer
User
Answer #1 · Posted 2024-04-27 04:51:17

By doing this:

df1 = df1["Overall Value"].str.split(...)

you are not updating the existing DataFrame; you are creating a new DataFrame and pointing the name df1 at it.

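df1 no longer refers to the original DataFrame. The new DataFrame returned by str.split(" ", n=1, expand=True) has only the columns 0 and 1, so df1[2] (and df1[3]) do not exist, which is what KeyError: 2 is telling you.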
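A minimal reproduction of the error, assuming a one-row frame with the column names from the question (the values are hypothetical): after the rebinding, df1 only has columns 0 and 1, so looking up 2 fails.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Overall Value": ["1001 192.0.2.10"],
    "Time": ["2019-02-25T19:04:59.999-0500"],
    "IP": ["192.0.2.10"],
})

# Rebinding df1 to the split result discards the original frame entirely.
df1 = df1["Overall Value"].str.split(" ", n=1, expand=True)
print(list(df1.columns))  # [0, 1]

missing_key = None
try:
    df1[2]  # there is no column labelled 2 any more
except KeyError as exc:
    missing_key = exc.args[0]
print("KeyError:", missing_key)  # KeyError: 2
```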

Instead, give the temporary DataFrame a different name, and then use it to update columns in the original DataFrame.

Also, instead of creating two new columns and then immediately dropping one of them, only create the column you actually need.
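One way to create only the ID column, sketched on hypothetical sample rows: without expand=True, str.split returns a Series of lists, and .str[0] takes the first element of each list, so the IP part never becomes a column. This is an alternative to the temporary-frame approach, not the only option.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Overall Value": ["1001 192.0.2.10", "1002 2001:db8::1"],
    "Time": ["2019-02-25T19:04:59.999-0500", "2019-02-25T19:05:00.123-0500"],
    "IP": ["192.0.2.10", "2001:db8::1"],
})

# Split into lists, then keep only element 0 (the ID) of each list.
df1["ID"] = df1["Overall Value"].str.split(" ", n=1).str[0]
print(df1.columns.tolist())  # ['Overall Value', 'Time', 'IP', 'ID']
```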

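For the remaining columns, which already exist, you would need positions 1 and 2 rather than 2 and 3 (and because the CSV has a header row, selecting by position means df1.iloc[:, 1]; df1[1] looks up a column labelled 1). But since those columns are already in df1, you don't need to "re-insert" them at all.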
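A small sketch of the label-versus-position difference, assuming the header names from the question and a hypothetical row: square brackets on a DataFrame look up column labels, while iloc selects by position.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Overall Value": ["1001 192.0.2.10"],
    "Time": ["2019-02-25T19:04:59.999-0500"],
    "IP": ["192.0.2.10"],
})

# df1[1] asks for a column *labelled* 1, which does not exist with header names.
try:
    df1[1]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# iloc selects by position: position 1 is "Time", position 2 is "IP".
time_col = df1.iloc[:, 1]
ip_col = df1.iloc[:, 2]
print(label_lookup_failed, time_col.name, ip_col.name)  # True Time IP
```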

Like this:

ids_ips = df1["Overall Value"].str.split(" ", n=1, expand=True)
df1["ID"] = ids_ips[0]
# df1["IP2"] = ids_ips[1]  <  don't do this
df1["Time"] = df1.iloc[:, 1]  # this is probably not necessary, too
df1["IP"] = df1.iloc[:, 2]    # neither is this
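Putting the answer together, here is a minimal end-to-end sketch. It reads a hypothetical in-memory CSV instead of the question's file, and it assumes (this is not stated in the question) that the combined "Overall Value" column can be dropped once the ID has been extracted. In the real script you would pass your own paths to read_csv and to_csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for Parsing.csv; header names are taken from the question.
csv_text = """Overall Value,Time,IP
1001 192.0.2.10,2019-02-25T19:04:59.999-0500,192.0.2.10
1002 2001:db8::1,2019-02-25T19:05:00.123-0500,2001:db8::1
"""

df1 = pd.read_csv(io.StringIO(csv_text))
df1.dropna(inplace=True)  # drops rows that contain any missing value

# Keep the split result under its own name so df1 still holds the original columns.
ids_ips = df1["Overall Value"].str.split(" ", n=1, expand=True)
df1["ID"] = ids_ips[0]

# Assumption: the combined column is no longer needed; reorder to ID, Time, IP.
df1 = df1.drop(columns=["Overall Value"])[["ID", "Time", "IP"]]

out = df1.to_csv(index=False)
print(out)
```

Because the split result lives in ids_ips, df1 keeps its "Time" and "IP" columns, and nothing has to be copied back into it.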
