Searching across multiple columns of a CSV with Python NumPy

low low 5more more big high vgood
vhigh vhigh 2 2 small low unacc
vhigh vhigh 2 2 small med unacc
vhigh vhigh 2 2 small high unacc
vhigh vhigh 2 2 med low unacc
vhigh vhigh 2 2 med med unacc
vhigh vhigh 2 2 med high unacc

1 answer

User

#1 · Posted 2024-05-14 15:04:08

If what you have shown is the content of a file, you can load it with

In [259]: arr = np.genfromtxt('tmp.csv', names=True, dtype=None)

In [260]: arr
Out[260]: 
array([('vhigh', 'vhigh', 2, 2, 'small',  'low', 'unacc'),
       ('vhigh', 'vhigh', 2, 2, 'small',  'med', 'unacc'),
       ('vhigh', 'vhigh', 2, 2, 'small', 'high', 'unacc'),
       ('vhigh', 'vhigh', 2, 2,   'med',  'low', 'unacc'),
       ('vhigh', 'vhigh', 2, 2,   'med',  'med', 'unacc'),
       ('vhigh', 'vhigh', 2, 2,   'med', 'high', 'unacc')], 
      dtype=[('low', 'S5'), ('low_1', 'S5'), ('5more', '<i8'), ('more', '<i8'), ('big', 'S5'), ('high', 'S4'), ('vgood', 'S5')])
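The session above shows byte-string dtypes ('S5'), which suggests it was recorded under Python 2. A minimal sketch of the same load under Python 3 follows; it assumes the sample rows are held in memory through io.StringIO rather than in 'tmp.csv', and passes encoding="utf-8" (an assumption, not part of the original call) so the text columns come back as str rather than bytes.

```python
import io
import numpy as np

# The sample data from the question, kept in memory instead of 'tmp.csv'.
text = """low low 5more more big high vgood
vhigh vhigh 2 2 small low unacc
vhigh vhigh 2 2 small med unacc
vhigh vhigh 2 2 small high unacc
vhigh vhigh 2 2 med low unacc
vhigh vhigh 2 2 med med unacc
vhigh vhigh 2 2 med high unacc
"""

# names=True takes the field names from the first row;
# dtype=None lets genfromtxt infer a type for each column.
# encoding="utf-8" is an assumption so that text fields are str, not bytes.
arr = np.genfromtxt(io.StringIO(text), names=True, dtype=None, encoding="utf-8")

print(arr.dtype.names)   # field names taken from the header row
print(arr["high"])       # one column, selected by field name
```

With str fields, the comparisons in the rest of the answer can use ordinary string literals such as 'high'. If the fields are bytes instead, the literals need a b prefix, for example b'high'.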

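"Search" can be interpreted in several ways. For all of them, we want to look at one column at a time. Let's look at column 5 (the sixth from the left, labelled high in the top row, which I assume is the header for that column). It looks like this: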

arr['high']
# array(['low', 'med', 'high', 'low', 'med', 'high'], dtype='|S4')

With a direct comparison you can see which rows have the value 'high' in the 'high' column:

In [269]: arr['high'] == 'high'
Out[269]: array([False, False,  True, False, False,  True], dtype=bool)

The indices of the matching rows can be obtained with np.where:

In [270]: np.where(arr['high'] == 'high')
Out[270]: (array([2, 5]),)

Or you can select just the rows that have 'high' in the 'high' column:

In [271]: arr[arr['high'] == 'high']
Out[271]: 
array([('vhigh', 'vhigh', 2, 2, 'small', 'high', 'unacc'),
       ('vhigh', 'vhigh', 2, 2, 'med', 'high', 'unacc')], 
      dtype=[('low', 'S5'), ('low_1', 'S5'), ('5more', '<i8'), ('more', '<i8'), ('big', 'S5'), ('high', 'S4'), ('vgood', 'S5')])
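The three single-column steps above (mask, indices, selection) can be sketched together as follows. The array here is a hypothetical two-field miniature of the one above, built with str fields so that it runs on Python 3; np.flatnonzero is used as a shorthand for np.where(mask)[0].

```python
import numpy as np

# Hypothetical miniature of the array above: only two text fields.
arr = np.array(
    [("small", "low"), ("small", "med"), ("small", "high"),
     ("med", "low"), ("med", "med"), ("med", "high")],
    dtype=[("big", "U5"), ("high", "U4")],
)

mask = arr["high"] == "high"   # boolean mask, one entry per row
rows = np.flatnonzero(mask)    # indices of the True entries as a flat array
subset = arr[mask]             # only the matching records

print(mask)
print(rows)
print(subset)
```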

If you want to match both 'vhigh' and 'high', you can use np.char.endswith (or np.char.count if the match need not be at the end of the string), which matches either value:

In [272]: np.char.endswith(arr['low'], 'high')
Out[272]: array([ True,  True,  True,  True,  True,  True], dtype=bool)

In [273]: np.char.endswith(arr['high'], 'high')
Out[273]: array([False, False,  True, False, False,  True], dtype=bool)
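The difference between these options only shows up on values that contain 'high' without ending in it. A small sketch on a hypothetical column (the value 'higher' is not in the sample data and is added only for illustration) compares them, and adds np.isin as a further option for exact matches against a fixed set of values.

```python
import numpy as np

# Hypothetical column; 'higher' is added only to separate the three tests.
col = np.array(["low", "med", "high", "vhigh", "higher"])

exact = np.isin(col, ["high", "vhigh"])      # value is exactly one of the listed strings
suffix = np.char.endswith(col, "high")       # value ends with 'high'
anywhere = np.char.count(col, "high") > 0    # 'high' occurs somewhere in the value

print(exact)
print(suffix)
print(anywhere)
```

Only the np.char.count test accepts 'higher'; the other two reject it.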

To put it all together, you can check which rows satisfy all three conditions with:

In [290]: np.all([arr['low'] == 'vhigh', arr['low_1'] == 'vhigh', arr['high'] == 'high'], 0)
Out[290]: array([False, False,  True, False, False,  True], dtype=bool)
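The same combination can also be written with the & operator or with np.logical_and.reduce, which may read more naturally when the conditions are held in a mapping. This is a sketch on a hypothetical three-row array, and the conditions dictionary is an illustrative name, not something from the answer.

```python
import numpy as np

# Hypothetical three-row array with the three text fields used above.
arr = np.array(
    [("vhigh", "vhigh", "low"),
     ("vhigh", "high", "high"),
     ("vhigh", "vhigh", "high")],
    dtype=[("low", "U5"), ("low_1", "U5"), ("high", "U4")],
)

# Explicit form: parentheses are required because & binds tighter than ==.
mask_and = (arr["low"] == "vhigh") & (arr["low_1"] == "vhigh") & (arr["high"] == "high")

# Data-driven form: one equality test per (field, value) pair, reduced with logical AND.
conditions = {"low": "vhigh", "low_1": "vhigh", "high": "high"}
mask_reduce = np.logical_and.reduce(
    [arr[name] == value for name, value in conditions.items()]
)

print(mask_and)
print(mask_reduce)
```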

Since the integer columns 5more and more are no longer involved, you can build a plain string array from the remaining columns:

In [293]: b = np.column_stack([arr['low'], arr['low_1'], arr['high']])

In [294]: b
Out[294]: 
array([['vhigh', 'vhigh', 'low'],
       ['vhigh', 'vhigh', 'med'],
       ['vhigh', 'vhigh', 'high'],
       ['vhigh', 'vhigh', 'low'],
       ['vhigh', 'vhigh', 'med'],
       ['vhigh', 'vhigh', 'high']], 
      dtype='|S5')

In [295]: np.char.endswith(b, 'high')
Out[295]: 
array([[ True,  True, False],
       [ True,  True, False],
       [ True,  True,  True],
       [ True,  True, False],
       [ True,  True, False],
       [ True,  True,  True]], dtype=bool)

In [297]: np.all(np.char.endswith(b, 'high'), 1)
Out[297]: array([False, False,  True, False, False,  True], dtype=bool)
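Once the text fields are stacked into a plain 2-D array, other reductions over the same boolean matrix are available as well. The sketch below uses a hypothetical three-row array with one integer field that is left out of the stack; text_fields is an illustrative name.

```python
import numpy as np

# Hypothetical array with one integer field, which is left out of the stack.
arr = np.array(
    [("vhigh", "vhigh", 2, "low"),
     ("vhigh", "med", 2, "high"),
     ("vhigh", "vhigh", 2, "high")],
    dtype=[("low", "U5"), ("low_1", "U5"), ("more", "i8"), ("high", "U4")],
)

text_fields = ["low", "low_1", "high"]
b = np.column_stack([arr[name] for name in text_fields])  # 2-D string array, one column per field

hits = np.char.endswith(b, "high")   # element-wise test over the whole array
every = hits.all(axis=1)             # rows where every selected column matches
some = hits.any(axis=1)              # rows where at least one column matches
per_column = hits.sum(axis=0)        # number of matches in each column

print(every)
print(some)
print(per_column)
```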
