<p>Suppose I have a data frame with the following columns:</p>
<pre><code>df.head()
ref_loc ref_chr REF ALT coverage base
9532728 21 G [A] 1 A
9540473 21 C [G] 2 G
9540473 21 CTATT [C] 2 G
9540794 21 C [T] 1 A
9542965 21 C [A] 1 T
</code></pre>
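<p>To make the example reproducible, the sample rows can be rebuilt as a small frame. This is a sketch under an assumption: each <code>ALT</code> cell is taken to be a Python list of allele strings (pandas displays a list such as <code>['A']</code> as <code>[A]</code>), while <code>base</code> holds a plain string. If <code>ALT</code> actually holds strings or other objects, comparisons against <code>base</code> will behave differently.</p>

```python
import pandas as pd

# Hypothetical reconstruction of the five rows shown by df.head().
# Assumption: ALT holds a list of allele strings per row; base holds one string.
df = pd.DataFrame({
    "ref_loc": [9532728, 9540473, 9540473, 9540794, 9542965],
    "ref_chr": [21, 21, 21, 21, 21],
    "REF": ["G", "C", "CTATT", "C", "C"],
    "ALT": [["A"], ["G"], ["C"], ["T"], ["A"]],
    "coverage": [1, 2, 2, 1, 1],
    "base": ["A", "G", "G", "A", "T"],
})

print(df)
```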
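<p>I want to compare the <code>ALT</code> column with the <code>base</code> column to see where they match and where they differ. Based on that, I want to create a new column named <code>cate</code>.</p>
<p>To do this, I tried the following function:</p>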
<pre><code>def grouping(row):
    if row['ALT'] == row['base']:
        val = "same_variants"
    elif row['ALT'] != row['base']:
        val = "diff_variants"
    return val

df["cate"] = df.apply(grouping,axis=0)
</code></pre>
<p>However, applying the function to the data frame raises this error:</p>
<pre><code> KeyError Traceback (most recent call last)
<ipython-input-13-a265dee72ec1> in <module>
----> 1 df["group"] =df.apply(grouping,axis=0)
~/software/anaconda/lib/python3.7/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
6911 kwds=kwds,
6912 )
-> 6913 return op.get_result()
6914
6915 def applymap(self, func):
~/software/anaconda/lib/python3.7/site-packages/pandas/core/apply.py in get_result(self)
184 return self.apply_raw()
185
--> 186 return self.apply_standard()
187
188 def apply_empty_result(self):
~/software/anaconda/lib/python3.7/site-packages/pandas/core/apply.py in apply_standard(self)
290
291 # compute the result using the series generator
--> 292 self.apply_series_generator()
293
294 # wrap results
~/software/anaconda/lib/python3.7/site-packages/pandas/core/apply.py in apply_series_generator(self)
319 try:
320 for i, v in enumerate(series_gen):
--> 321 results[i] = self.f(v)
322 keys.append(v.name)
323 except Exception as e:
<ipython-input-11-098066170c2f> in grouping(row)
1 def grouping(row):
----> 2 if row['ALT'] == row['base']:
3 val = "same_variants"
4 elif row['ALT'] != row['base']:
5 val= "diff_variants"
~/software/anaconda/lib/python3.7/site-packages/pandas/core/series.py in __getitem__(self, key)
1066 key = com.apply_if_callable(key, self)
1067 try:
-> 1068 result = self.index.get_value(self, key)
1069
1070 if not is_scalar(result):
~/software/anaconda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_value(self, series, key)
4728 k = self._convert_scalar_indexer(k, kind="getitem")
4729 try:
-> 4730 return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
4731 except KeyError as e1:
4732 if len(self) > 0 and (self.holds_integer() or self.is_boolean()):
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: ('ALT', 'occurred at index ref_loc')
</code></pre>
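<p>The message <code>('ALT', 'occurred at index ref_loc')</code> hints at the cause: with <code>axis=0</code>, <code>DataFrame.apply</code> calls the function once per column, so the first call received the <code>ref_loc</code> column, whose index holds row labels rather than column names, and <code>row['ALT']</code> had no such label to look up. The sketch below uses a hypothetical two-row frame with the question's column names to show what the function receives under each <code>axis</code> setting.</p>

```python
import pandas as pd

# Hypothetical two-row frame using the question's column names.
df = pd.DataFrame({"ALT": [["A"], ["C"]], "base": ["A", "G"]})

# axis=0 (the default): the function is called once per column, so each
# Series it receives is named after a column and indexed by row labels.
names_axis0 = df.apply(lambda s: s.name, axis=0).tolist()
alt_visible_axis0 = df.apply(lambda s: "ALT" in s.index, axis=0).tolist()

# axis=1: the function is called once per row, so each Series it
# receives is indexed by the column names ("ALT", "base").
names_axis1 = df.apply(lambda s: s.name, axis=1).tolist()
alt_visible_axis1 = df.apply(lambda s: "ALT" in s.index, axis=1).tolist()

print(names_axis0, alt_visible_axis0)  # column names; "ALT" is not a label inside them
print(names_axis1, alt_visible_axis1)  # row labels; "ALT" is a label inside each row
```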
<p>I would appreciate any suggestions on how to move forward.</p>
<p>In the end, the output should look like this:</p>
<pre><code>ref_loc ref_chr REF ALT coverage base cate
9532728 21 G [A] 1 A same_variants
9540473 21 C [G] 2 G same_variants
9540473 21 CTATT [C] 2 G diff_variants
9540794 21 C [T] 1 A diff_variants
9542965 21 C [A] 1 T diff_variants
</code></pre>
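<p>One possible way to get that output, offered as a sketch rather than a definitive answer: call <code>apply</code> with <code>axis=1</code> so that <code>grouping</code> receives one row at a time. It assumes each <code>ALT</code> cell is a list of allele strings. The rule "same when <code>base</code> is one of the <code>ALT</code> alleles" is also an assumption, chosen so that <code>ALT</code> lists with several alleles are handled; on the sample rows it reproduces the desired <code>cate</code> column. A list comprehension over the two columns gives the same result without <code>apply</code>.</p>

```python
import pandas as pd

# Sample rows from the question.
# Assumption: ALT cells are lists of allele strings, base cells are strings.
df = pd.DataFrame({
    "ref_loc": [9532728, 9540473, 9540473, 9540794, 9542965],
    "ref_chr": [21, 21, 21, 21, 21],
    "REF": ["G", "C", "CTATT", "C", "C"],
    "ALT": [["A"], ["G"], ["C"], ["T"], ["A"]],
    "coverage": [1, 2, 2, 1, 1],
    "base": ["A", "G", "G", "A", "T"],
})

def grouping(row):
    # With axis=1, row is a single record indexed by column name.
    # Assumed rule: "same" when base appears among the ALT alleles.
    if row["base"] in row["ALT"]:
        return "same_variants"
    return "diff_variants"

# axis=1 passes rows to grouping instead of columns.
df["cate"] = df.apply(grouping, axis=1)

# Equivalent result without apply: iterate over the two columns directly.
cate_fast = [
    "same_variants" if base in alts else "diff_variants"
    for alts, base in zip(df["ALT"], df["base"])
]

print(df)
```

<p>If <code>ALT</code> instead holds strings such as <code>"[A]"</code>, stripping the brackets first (for example with <code>df["ALT"].str.strip("[]")</code>) and comparing with <code>==</code> is safer; a substring test like <code>"C" in "[CTATT]"</code> would report a match even though the alleles differ.</p>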