Equivalent of R's match function in Python (or numpy)

19 votes

3 answers

9856 views

Asked 2025-04-16 06:38

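Is there a simple way in Python to get the functionality of R's match function?

R's match function returns a vector of the positions at which the elements of its first argument are first matched in its second argument.

For example, take the following R snippet.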

> a <- c(5,4,3,2,1)
> b <- c(2,3)
> match(a,b)
[1] NA NA  2  1 NA

I want to translate this to Python. What I am looking for is a function that does the following.

>>> a = [5,4,3,2,1]
>>> b = [2,3]
>>> match(a,b)
[None, None, 2, 1, None]

Thanks!


3 Answers

You can implement R's match in Python and return the matched index labels as a pandas Index (which is handy for subsetting afterwards) as follows:

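import numpy as np
import pandas as pd

def match(ser1, ser2):
    """
    Return the index labels of ser2 that match the elements of ser1
    (np.nan where there is no match); equivalent to R's match function.
    """
    found = ser1.isin(ser2)  # compute the membership mask once, not per element
    idx = [
        # first label in ser2 whose value equals the i-th element of ser1
        ser2.index[ser2 == ser1.iloc[i]].to_list()[0] if found.iloc[i] else np.nan
        for i in range(len(ser1))
    ]
    return pd.Index(idx)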
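
Since the question also mentions numpy, here is a rough vectorized sketch for plain arrays rather than Series. It is not part of the answer above: the name match_np and the choice of -1 as the "no match" marker are assumptions made for illustration (an integer array cannot hold None or NA), and it relies on the values being sortable.

```python
import numpy as np


def match_np(a, b):
    """For each element of a, return the 0-based position of its first
    occurrence in b, or -1 if it does not occur (hypothetical helper)."""
    a = np.asarray(a)
    b = np.asarray(b)
    if b.size == 0:
        return np.full(a.shape, -1)
    # A stable sort keeps equal values in their original order, so the
    # leftmost hit in the sorted array is the first occurrence in b.
    order = np.argsort(b, kind="stable")
    sorted_b = b[order]
    pos = np.searchsorted(sorted_b, a)         # leftmost insertion points
    pos = np.clip(pos, 0, len(sorted_b) - 1)   # keep indices in range
    found = sorted_b[pos] == a
    return np.where(found, order[pos], -1)


print(match_np([5, 4, 3, 2, 1], [2, 3]))  # [-1 -1  1  0 -1]
```

Add 1 to the matched entries if R's 1-based positions are needed.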

Answered 2025-04-16 by Python大师


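Here is a faster approach, building on Paulo Scardine's answer (the difference becomes more noticeable as the arrays grow), if you don't mind giving up the brevity of a one-liner: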

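from typing import Hashable, List, Optional


def match_list(a: List[Hashable], b: List[Hashable]) -> List[Optional[int]]:
    # Scans b for every element of a: O(len(a) * len(b))
    return [b.index(x) if x in b else None for x in a]


def match(a: List[Hashable], b: List[Hashable]) -> List[Optional[int]]:
    # Build a value -> position lookup once, then each lookup is O(1)
    b_dict = {x: i for i, x in enumerate(b)}
    return [b_dict.get(x, None) for x in a]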


import random

a = [random.randint(0, 100) for _ in range(10000)]
b = [i for i in range(100) if i % 2 == 0]


%timeit match(a, b)
>>> 580 µs ± 15.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit match_list(a, b)
>>> 6.13 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

match(a, b) == match_list(a, b)
>>> True
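
One caveat worth noting: the equality check above holds because b has no repeated values. If b contains duplicates, a dict built in forward order keeps the last position of each value, whereas b.index (like R's match) reports the first. A minimal sketch of a variant that keeps the first position follows; the name match_first is an assumption introduced here, not part of the answer.

```python
from typing import Hashable, List, Optional


def match_first(a: List[Hashable], b: List[Hashable]) -> List[Optional[int]]:
    # Walk b backwards so that earlier positions overwrite later ones,
    # leaving every value mapped to the index of its first occurrence.
    b_dict = {x: i for i, x in reversed(list(enumerate(b)))}
    return [b_dict.get(x) for x in a]


b = [2, 3, 2]
print(match_first([2, 3, 7], b))                             # [0, 1, None]
print([b.index(x) if x in b else None for x in [2, 3, 7]])   # [0, 1, None]
```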

Answered 2025-04-16 by Python大师


>>> a = [5,4,3,2,1]
>>> b = [2,3]
>>> [ b.index(x) if x in b else None for x in a ]
[None, None, 1, 0, None]

If you really need positions counted from 1 rather than 0, just add 1.

>>> [ b.index(x)+1 if x in b else None for x in a ]
[None, None, 2, 1, None]

If you plan to use this often, you can wrap it in a short reusable function.

>>> match = lambda a, b: [ b.index(x)+1 if x in b else None for x in a ]
>>> match
<function <lambda> at 0x04E77B70>
>>> match(a, b)
[None, None, 2, 1, None]
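
As a variation on the lambda above, a named function can expose the index base and the no-match value as parameters, loosely mirroring the nomatch argument of R's match. This is only a sketch: the name match_r and the start parameter are assumptions made for illustration, and unlike the list-based lambda it requires the elements to be hashable.

```python
def match_r(a, b, start=1, nomatch=None):
    """Return, for each x in a, the position of its first occurrence in b,
    counted from `start`, or `nomatch` if x is absent (hypothetical helper)."""
    first_pos = {}
    for i, x in enumerate(b, start):
        # setdefault keeps the first position seen for a repeated value
        first_pos.setdefault(x, i)
    return [first_pos.get(x, nomatch) for x in a]


a = [5, 4, 3, 2, 1]
b = [2, 3]
print(match_r(a, b))           # [None, None, 2, 1, None]
print(match_r(a, b, start=0))  # [None, None, 1, 0, None]
```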

Answered 2025-04-16 by Python大师

