Retrieving DataFrame rows based on a list in a cell value

Posted 2024-05-17 00:01:39


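I am trying to retrieve a row from a DataFrame in which a cell value is a list. I tried isin, but it seems to perform an OR operation rather than an AND operation.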

>>> import pandas as pd
>>> df = pd.DataFrame([['100', 'RB','stacked'], [['101','102'], 'CC','tagged'], ['102', 'S+C','tagged']],
    columns=['vlan_id', 'mode' ,    'tag_mode'],index=['dinesh','vj','mani'])

>>> df
           vlan_id  mode  tag_mode
dinesh         100   RB  stacked
vj      [101, 102]   CC   tagged
mani           102  S+C   tagged

>>> df.loc[df['vlan_id'] == '102']; # Fetching string value match
      vlan_id mode tag_mode
mani     102  S+C   tagged

>>> df.loc[df['vlan_id'].isin(['100','102'])]; # Fetching if contains either 100 or 102

       vlan_id mode tag_mode
dinesh     100   RB  stacked
mani       102  S+C   tagged
>>> df.loc[df['vlan_id'] == ['101','102']]; # Fails ? 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\core\ops.py", line 1283, in wrapper
    res = na_op(values, other)
  File "C:\Python27\lib\site-packages\pandas\core\ops.py", line 1143, in na_op
    result = _comp_method_OBJECT_ARRAY(op, x, y)
  File "C:\Python27\lib\site-packages\pandas\core\ops.py", line 1120, in _comp_method_OBJECT_ARRAY
    result = libops.vec_compare(x, y, op)
  File "pandas\_libs\ops.pyx", line 128, in pandas._libs.ops.vec_compare
ValueError: Arrays were different lengths: 3 vs 2

I could pull the values out into a list and compare them there. Instead, is there a way to check the column against a list value using .loc itself?


3 Answers

To look up the list, you can iterate over the values of vlan_id and compare each one with np.array_equal:

import numpy as np

df.loc[[np.array_equal(x, ['101','102']) for x in df.vlan_id.values]]


       vlan_id mode tag_mode
vj  [101, 102]   CC   tagged

That said, it is advisable to avoid using lists as cell values in a DataFrame.
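One way to follow that advice is to normalize the column so that every cell holds a single value. The sketch below is only an illustration, assuming a pandas version that provides DataFrame.explode (0.25 or later); the names long_df, wanted and ids_per_row are hypothetical and not part of the original answer. Note that it compares sets, so the order of the IDs and any duplicates are ignored.

```python
import pandas as pd

df = pd.DataFrame(
    [['100', 'RB', 'stacked'], [['101', '102'], 'CC', 'tagged'], ['102', 'S+C', 'tagged']],
    columns=['vlan_id', 'mode', 'tag_mode'],
    index=['dinesh', 'vj', 'mani'],
)

# One row per (index label, vlan_id); scalar cells are left unchanged.
long_df = df.explode('vlan_id')

# Plain comparisons now work: every row that carries VLAN 102.
has_102 = long_df[long_df['vlan_id'] == '102']

# Rows whose full set of IDs equals the wanted set (order-insensitive).
wanted = {'101', '102'}
ids_per_row = long_df.groupby(level=0)['vlan_id'].apply(set)
is_match = ids_per_row.map(lambda ids: ids == wanted)
matching_labels = ids_per_row[is_match].index

print(df.loc[matching_labels])
```

With the long format, isin and == behave as they do for ordinary scalar columns, at the cost of repeated index labels.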

DataFrame.loc accepts either a list of labels or a boolean array to select rows and columns. The list comprehension above builds such a boolean array.
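The same boolean-mask idea can be written with Series.apply instead of a list comprehension. This is a minimal sketch rather than the answerer's code: the helper list_equals and the variable target are made up for illustration, and the isinstance check is an assumption that only cells which are real Python lists should count as matches.

```python
import pandas as pd

df = pd.DataFrame(
    [['100', 'RB', 'stacked'], [['101', '102'], 'CC', 'tagged'], ['102', 'S+C', 'tagged']],
    columns=['vlan_id', 'mode', 'tag_mode'],
    index=['dinesh', 'vj', 'mani'],
)

def list_equals(cell, target):
    """Return True only when the cell is a list equal to target."""
    return isinstance(cell, list) and cell == target

target = ['101', '102']

# Build a boolean Series aligned with df's index, then hand it to .loc.
mask = df['vlan_id'].apply(lambda cell: list_equals(cell, target))
result = df.loc[mask]

print(result)
```

Because the comparison happens one cell at a time, pandas never tries to broadcast the two-element list against the three-row column, which is what raised the ValueError in the question.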

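I am not sure this is the best way, or whether a good way exists at all, because as far as I know pandas does not really support storing lists in a Series. Still: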

l = ['101', '102']

df.loc[pd.concat([df['vlan_id'].str[i] == l[i] for i in range(len(l))], axis=1).all(axis=1)]

Output:

       vlan_id mode tag_mode
vj  [101, 102]   CC   tagged

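Another workaround is to transform the vlan_id column so that it can be queried as a string. You can do this by joining each list value in vlan_id into a comma-separated string.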

df['proxy'] = df['vlan_id'].apply(lambda x: ','.join(x) if isinstance(x, list) else x)

l = ','.join(['101', '102'])
print(df.loc[df['proxy'] == l])
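For reference, here is a self-contained variant of the proxy idea that keeps the joined strings in a separate Series instead of adding a column. It is a sketch under a stated assumption: no individual VLAN ID contains the separator, otherwise a scalar cell such as '101,102' would be indistinguishable from the list ['101', '102']. The names SEP, proxy and key are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [['100', 'RB', 'stacked'], [['101', '102'], 'CC', 'tagged'], ['102', 'S+C', 'tagged']],
    columns=['vlan_id', 'mode', 'tag_mode'],
    index=['dinesh', 'vj', 'mani'],
)

SEP = ','  # assumed never to appear inside a single VLAN ID

# Lists become 'a,b,...'; plain strings are kept as they are.
proxy = df['vlan_id'].apply(lambda x: SEP.join(x) if isinstance(x, list) else x)

# Exact match against one list value.
key = SEP.join(['101', '102'])
result = df.loc[proxy == key]

# isin now works across scalar and list cells alike.
either = df.loc[proxy.isin(['100', key])]

print(result)
print(either)
```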
