I converted continuous features to categorical features, but pandas gives me NaN values

Posted 2024-04-16 08:11:23


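I have converted a continuous dataset into a categorical one. Wherever the original continuous value is 0.0, the converted value is NaN. Here is my code: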

import pandas as pd

# The NSL-KDD training file has no header row, so columns are addressed by position
df = pd.read_csv('NSL-KDD/KDDTrain+.txt', header=None)
data = df[33]  # the continuous feature to discretize
bins = [0.000,0.05,0.10,0.15,0.20,0.25,0.30,0.35,0.40,0.45,0.50,0.55,0.60,0.65,0.70,0.75,0.80,0.85,0.90,0.95,1.00]
category = pd.cut(data, bins)  # rows where data == 0.0 come out as NaN here
category = category.to_frame()
print(category)

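How can I convert these values without getting NaN? To show what the original data and the converted data look like, I attached two screenshots (one of the main dataset, and one of the result after applying the bins with pandas.cut()). How can the "0.00" values be binned like every other value in the dataset?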
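The NaN values can be reproduced without the NSL-KDD file. The sketch below is a minimal reproduction under stated assumptions: a small synthetic Series stands in for column 33, and a shortened bin list is used for readability. Because pd.cut builds right-closed intervals by default (right=True), the first bin is (0.0, 0.05], which does not contain 0.0 itself.

```python
import pandas as pd

# Synthetic stand-in for df[33] (assumption: values lie in [0, 1])
data = pd.Series([0.0, 0.03, 0.05, 0.5, 1.0])

# Shortened bin list, for illustration only
bins = [0.0, 0.05, 0.5, 1.0]

# Default right=True gives the intervals (0.0, 0.05], (0.05, 0.5], (0.5, 1.0]
category = pd.cut(data, bins)
print(category)

# 0.0 is the excluded left edge of the first interval, so it matches no bin
print(category.isna().tolist())
```

Only the first element is missing; 0.05 and 1.0 are fine because they sit on the closed right edges of their intervals.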


1 answer
Anonymous user
#1 · Posted 2024-04-16 08:11:23

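When calling pd.cut, you can pass include_lowest=True. By default pd.cut uses right-closed intervals such as (0.0, 0.05], which exclude their left edge, so a value of exactly 0.0 falls into no bin and becomes NaN. With include_lowest=True the first interval also includes its left edge, so the 0 values land in the first bin (your first interval starts at 0).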
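A small sketch of the effect, again assuming synthetic data and a shortened bin list rather than the real column. With include_lowest=True every value, including 0.0, receives a bin code. In the pandas versions I am familiar with, the first category is displayed with a left edge slightly below zero (for example -0.001 with the default precision=3), which is how pandas shows that 0.0 is now included; treat that display detail as version-dependent.

```python
import pandas as pd

# Synthetic stand-in for df[33] (assumption: values lie in [0, 1])
data = pd.Series([0.0, 0.03, 0.05, 0.5, 1.0])
bins = [0.0, 0.05, 0.5, 1.0]

# include_lowest=True makes the first interval include 0.0 as well
category = pd.cut(data, bins, include_lowest=True)

print(category.cat.categories)  # the first interval's label starts just below 0
print(category.cat.codes.tolist())  # integer bin index for each row
print(category.isna().any())  # no NaN left
```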

So in your case, you can adjust your code like this:

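import pandas as pd

# The NSL-KDD training file has no header row, so columns are addressed by position
df = pd.read_csv('NSL-KDD/KDDTrain+.txt', header=None)
data = df[33]  # the continuous feature to discretize
bins = [0.000,0.05,0.10,0.15,0.20,0.25,0.30,0.35,0.40,0.45,0.50,0.55,0.60,0.65,0.70,0.75,0.80,0.85,0.90,0.95,1.00]
# include_lowest=True puts values equal to bins[0] (here 0.0) into the first bin
category = pd.cut(data, bins, include_lowest=True)
category = category.to_frame()
print(category)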
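After binning, it can be worth confirming that no NaN values remain; any that do point to values outside [0, 1] or to values that were already missing. The sketch below is one way to do that, under these assumptions: synthetic data replaces the real column, and the 21 edges are generated with NumPy instead of typed by hand. Rounding to two decimals removes floating-point noise such as 0.15000000000000002 that np.linspace alone would produce, so the edges match the hand-written list exactly.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df[33] (assumption: values lie in [0, 1])
data = pd.Series([0.0, 0.0, 0.12, 0.5, 0.97, 1.0])

# Same edges as the hand-written list: 0.00, 0.05, ..., 1.00
bins = np.round(np.linspace(0.0, 1.0, 21), 2)

category = pd.cut(data, bins, include_lowest=True)

# A non-zero count means some values lie outside [0, 1] or were NaN to begin with
n_missing = category.isna().sum()
print("unbinned values:", n_missing)

# Row counts for the first few bins, in bin order
print(category.value_counts(sort=False).head())
```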

Reference: the pandas documentation for pandas.cut
