How to compare the values of two columns in a DataFrame

df.head()

ref_loc  ref_chr    REF  ALT  coverage base
9532728       21      G  [A]         1    A
9540473       21      C  [G]         2    G
9540473       21  CTATT  [C]         2    G
9540794       21      C  [T]         1    A
9542965       21      C  [A]         1    T

def grouping(row):
    if row['ALT'] == row['base']:
        val = "same_variants"
    elif row['ALT'] != row['base']:
        val = "diff_variants"
    return val

df["cate"] = df.apply(grouping,axis=0)

KeyError                                  Traceback (most recent call last)
<ipython-input-13-a265dee72ec1> in <module>
----> 1 df["group"] =df.apply(grouping,axis=0)

~/software/anaconda/lib/python3.7/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
   6911             kwds=kwds,
   6912         )
-> 6913         return op.get_result()
   6914
   6915     def applymap(self, func):

~/software/anaconda/lib/python3.7/site-packages/pandas/core/apply.py in get_result(self)
    184             return self.apply_raw()
    185
--> 186         return self.apply_standard()
    187
    188     def apply_empty_result(self):

~/software/anaconda/lib/python3.7/site-packages/pandas/core/apply.py in apply_standard(self)
    290
    291         # compute the result using the series generator
--> 292         self.apply_series_generator()
    293
    294         # wrap results

~/software/anaconda/lib/python3.7/site-packages/pandas/core/apply.py in apply_series_generator(self)
    319             try:
    320                 for i, v in enumerate(series_gen):
--> 321                     results[i] = self.f(v)
    322                     keys.append(v.name)
    323             except Exception as e:

<ipython-input-11-098066170c2f> in grouping(row)
      1 def grouping(row):
----> 2     if row['ALT'] == row['base']:
      3         val = "same_variants"
      4     elif row['ALT'] != row['base']:
      5         val= "diff_variants"

~/software/anaconda/lib/python3.7/site-packages/pandas/core/series.py in __getitem__(self, key)
   1066         key = com.apply_if_callable(key, self)
   1067         try:
-> 1068             result = self.index.get_value(self, key)
   1069
   1070             if not is_scalar(result):

~/software/anaconda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
   4728         k = self._convert_scalar_indexer(k, kind="getitem")
   4729         try:
-> 4730             return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
   4731         except KeyError as e1:
   4732             if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()

KeyError: ('ALT', 'occurred at index ref_loc')

ref_loc  ref_chr    REF  ALT  coverage base           cate
9532728       21      G  [A]         1    A  same_variants
9540473       21      C  [G]         2    G  same_variants
9540473       21  CTATT  [C]         2    G  diff_variants
9540794       21      C  [T]         1    A  diff_variants
9542965       21      C  [A]         1    T  diff_variants

3 Answers

User

#1 · Edited 2024-05-15 23:06:01

You need to apply the function to each row:

df["cate"] = df.apply(grouping, axis=1)

If I understand correctly, the ALT column contains lists. In that case you need to access the first element of each list:

def grouping(row):
    if row['ALT'][0] == row['base']:
        return "same_variants"
    else:
        return "diff_variants"
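Putting the two pieces together, here is a minimal end-to-end sketch. The sample frame is rebuilt by hand from the table in the question, and it assumes that ALT really holds one-element Python lists (if it holds strings such as "[A]", the `[0]` lookup would return "[" instead).

```python
import pandas as pd

# Sample data copied from the question; ALT as one-element lists is an assumption.
df = pd.DataFrame({
    "ref_loc": [9532728, 9540473, 9540473, 9540794, 9542965],
    "ref_chr": [21, 21, 21, 21, 21],
    "REF": ["G", "C", "CTATT", "C", "C"],
    "ALT": [["A"], ["G"], ["C"], ["T"], ["A"]],
    "coverage": [1, 2, 2, 1, 1],
    "base": ["A", "G", "G", "A", "T"],
})

def grouping(row):
    # Compare the first (here: only) ALT allele with the observed base.
    if row["ALT"][0] == row["base"]:
        return "same_variants"
    return "diff_variants"

# axis=1 passes one row at a time, so row["ALT"] and row["base"] resolve.
df["cate"] = df.apply(grouping, axis=1)
print(df[["ALT", "base", "cate"]])
```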

Alternatively, you can use NumPy's where function:

df['cate'] = np.where(df['ALT'].str[0]==df['base'], 'same_variants', 'diff_variants')
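To make the one-liner easier to follow, the sketch below splits it into its intermediate steps on a small made-up frame. It again assumes ALT holds lists; `.str[0]` then indexes into each list element-wise. The names `first_alt` and `mask` are just illustrative.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; ALT as lists is an assumption about the real data.
df = pd.DataFrame({
    "ALT": [["A"], ["G"], ["C"]],
    "base": ["A", "G", "G"],
})

first_alt = df["ALT"].str[0]      # element-wise [0] on every list in the column
mask = first_alt == df["base"]    # boolean Series, True where the values match

# np.where picks the first label where mask is True, the second otherwise.
df["cate"] = np.where(mask, "same_variants", "diff_variants")
print(df)
```

Since this avoids calling a Python function once per row, it usually scales better than `apply` on large frames.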

User

#2 · Edited 2024-05-15 23:06:01

Although this is a different approach, I think it is worth mentioning that you can do it with the following one-liner:

df['cate'] = np.where(df['ALT'] == '['+df['base']+']', 'same_variants', 'diff_variants')

I tried using format on the right-hand side of the comparison, but it did not work.
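One possible reason `format` did not help, assuming the attempt looked something like `'[{}]'.format(df['base'])`: `str.format` is not vectorized, so it turns the whole Series into a single string instead of formatting each value. A rough sketch of an element-wise alternative using `Series.map` follows; note that this whole approach assumes ALT is stored as strings such as "[A]", not as lists.

```python
import numpy as np
import pandas as pd

# Assumption: ALT holds plain strings like "[A]" rather than Python lists.
df = pd.DataFrame({
    "ALT": ["[A]", "[G]", "[C]"],
    "base": ["A", "G", "G"],
})

# str.format runs once on the whole Series and yields one big string.
single = "[{}]".format(df["base"])
print(type(single))

# Series.map calls the formatter once per value instead.
bracketed = df["base"].map("[{}]".format)

df["cate"] = np.where(df["ALT"] == bracketed, "same_variants", "diff_variants")
print(df)
```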

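User

#3 · Edited 2024-05-15 23:06:01

Note that because the values in the ALT column are wrapped in square brackets, the comparison always comes out as different. You can first extract the content inside the brackets: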
df["ALT"] = df.ALT.apply(lambda l: l[0])

You need to use axis=1 to iterate over rows; axis=0 iterates over columns.

df["cate"] = df.apply(grouping,axis=1)
print(df)

   ref_loc  ref_chr    REF ALT  coverage base           cate
0  9532728       21      G   A         1    A  same_variants
1  9540473       21      C   G         2    G  same_variants
2  9540473       21  CTATT   C         2    G  diff_variants
3  9540794       21      C   T         1    A  diff_variants
4  9542965       21      C   A         1    T  diff_variants
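To see why axis=0 leads to the KeyError in the question, it can help to look at which labels the function actually receives. The sketch below uses a tiny made-up frame and a hypothetical helper `labels` that just reports the index of whatever Series `apply` passes in.

```python
import pandas as pd

# Tiny illustrative frame, not the data from the question.
df = pd.DataFrame({"ALT": ["A", "G"], "base": ["A", "T"]})

def labels(s):
    # Join the index labels of the Series handed over by apply.
    return ", ".join(str(label) for label in s.index)

by_column = df.apply(labels, axis=0)  # each call receives one whole column
by_row = df.apply(labels, axis=1)     # each call receives one row

print(by_column)
print(by_row)
```

With axis=0 the Series is indexed by the row labels, so a lookup like row['ALT'] has nothing to find; with axis=1 it is indexed by the column names, which is what grouping expects.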
