Finding a regular expression within a range in a Python DataFrame

Sample of the pincode DataFrame:

Index  Name                                   FINAL_CATEGORY
68781  central board of excise and customs    cat b
68782  c a g hotels pvt ltd                   cat b
68783  avaneetha textiles pvt ltd             cat a
68784  trendy wheels pvt ltd                  cat a+
68785  wings brand activations india pvt ltd  cat b

Searching with progressively shorter versions of the name:

pincode[pincode['Compnay Name'].str.contains('wings brand activation i pvt ltd')]
Compnay Name  FINAL_CATEGORY
(no rows)

pincode[pincode['Compnay Name'].str.contains('wings brand activation i pvt')]
Compnay Name  FINAL_CATEGORY
(no rows)

pincode[pincode['Compnay Name'].str.contains('wings brand activation i')]
Compnay Name  FINAL_CATEGORY
(no rows)

pincode[pincode['Compnay Name'].str.contains('wings brand activation')]
       Name                                   FINAL_CATEGORY
68785  wings brand activations india pvt ltd  cat b

Only the shortest pattern returns a row, because the stored name is 'wings brand activations india pvt ltd', not 'wings brand activation i pvt ltd'.
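To reproduce the behaviour in isolation, here is a minimal sketch that rebuilds the sample table and counts how many rows each pattern matches. It assumes the name column is called `Name`, as in the table header (the question's code uses `'Compnay Name'`, so adjust to your real column). `str.contains` only reports a match when the whole pattern occurs somewhere in the cell, so every pattern containing `activation i` fails against `activations india`.

```python
import pandas as pd

# Sample rows copied from the question's table (column name "Name" is an assumption)
pincode = pd.DataFrame(
    {
        "Name": [
            "central board of excise and customs",
            "c a g hotels pvt ltd",
            "avaneetha textiles pvt ltd",
            "trendy wheels pvt ltd",
            "wings brand activations india pvt ltd",
        ],
        "FINAL_CATEGORY": ["cat b", "cat b", "cat a", "cat a+", "cat b"],
    },
    index=[68781, 68782, 68783, 68784, 68785],
)

patterns = [
    "wings brand activation i pvt ltd",
    "wings brand activation i pvt",
    "wings brand activation i",
    "wings brand activation",
]

# Count matching rows per pattern; regex=False treats each pattern as literal text
match_counts = {
    p: int(pincode["Name"].str.contains(p, regex=False).sum()) for p in patterns
}
for p, n in match_counts.items():
    print(f"{n} match(es): {p!r}")
```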

2 answers

User

Answer 1 · edited 2024-06-16 11:54:12

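You can use an iterative approach: try the full name first, then drop one trailing word at a time until some entry in pincode contains what is left: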

def find_substr(employer, pincode):
    # Index by employer name so each result can be written back under that name
    employer = employer.set_index("employer")
    for name in employer.index:
        words = name.split()
        # Longest prefix first: full name, then one word shorter each time
        for length in range(len(words), 0, -1):
            substr = ' '.join(words[:length])
            # regex=False matches the text literally, so '(' and '.' need no escaping
            mask = pincode['Name'].str.contains(substr, regex=False, na=False)
            if mask.any():
                # Take the category of the first matching row and stop shortening
                employer.loc[name, 'cat'] = pincode.loc[mask, 'FINAL_CATEGORY'].values[0]
                break
    return employer.reset_index()

employer = find_substr(employer, pincode)
print(employer)

                                           employer    cat
0                  wings brand activation i pvt ltd  cat b
1  hofincons infotech &industrial services pvt .ltd    NaN
2                     bharat fritz werner bangalore    NaN
3                              kludi rak indpvt ltd    NaN
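The same loop can be packaged as a small helper that also reports which prefix matched, which makes it easier to see why a row got its category. This is a sketch, not the answerer's code: the function name `longest_prefix_match` and its `min_words` parameter are hypothetical, and the sample data is the table from the question.

```python
import pandas as pd


def longest_prefix_match(name, names, min_words=1):
    """Return (prefix, row_label) for the longest leading run of words in `name`
    that appears as a literal substring of some entry in `names`.
    Returns (None, None) when no prefix of at least `min_words` words matches."""
    words = name.split()
    # Walk from the full name down to `min_words` words (never below one word)
    for length in range(len(words), max(min_words, 1) - 1, -1):
        prefix = " ".join(words[:length])
        mask = names.str.contains(prefix, regex=False, na=False)
        if mask.any():
            # Label of the first row whose name contains the prefix
            return prefix, mask[mask].index[0]
    return None, None


# Sample rows from the question's table
pincode = pd.DataFrame(
    {
        "Name": [
            "central board of excise and customs",
            "c a g hotels pvt ltd",
            "avaneetha textiles pvt ltd",
            "trendy wheels pvt ltd",
            "wings brand activations india pvt ltd",
        ],
        "FINAL_CATEGORY": ["cat b", "cat b", "cat a", "cat a+", "cat b"],
    },
    index=[68781, 68782, 68783, 68784, 68785],
)

results = {}
for emp in ["wings brand activation i pvt ltd", "bharat fritz werner bangalore"]:
    prefix, label = longest_prefix_match(emp, pincode["Name"])
    category = pincode.loc[label, "FINAL_CATEGORY"] if label is not None else None
    results[emp] = (prefix, category)
    print(f"{emp!r} -> prefix={prefix!r}, category={category!r}")
```

With the default min_words=1, a single short first word can produce a spurious hit: for example "c d e" matches on the prefix "c", because "c" occurs inside "central board of excise and customs". Passing min_words=2 rejects that match.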

User

Answer 2 · edited 2024-06-16 11:54:12

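Here's one way to do it.

First convert your pins DataFrame into a dictionary that maps each name to its category. Then build the cat column of the employer DataFrame with a nested list comprehension that collects every category whose name contains the employer's name: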

import pandas as pd

# Example df
employer = pd.DataFrame({"employer": ["wings brand activation i pvt ltd", "bharat fritz werner bangalore"]})
pins = pd.DataFrame({"Name": ["trendy wheels pvt ltd", "wings brand activation i pvt ltd"], "FINAL_CATEGORY": ["cat a+", "cat b"]})

# Map each pin name to its category
dict_pins = dict(zip(pins['Name'], pins['FINAL_CATEGORY']))
# For each employer name, collect the categories of all pin names that contain it
employer['cat'] = [[cat for key, cat in dict_pins.items() if x in key] for x in employer['employer']]

Output:

                           employer      cat
0  wings brand activation i pvt ltd  [cat b]
1     bharat fritz werner bangalore       []
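One thing to watch: `x in key` asks whether the whole employer name occurs inside a pin name. The example data above stores the pin name as "wings brand activation i pvt ltd", but the question's table stores "wings brand activations india pvt ltd". The sketch below re-runs the same comprehension against the name as it appears in the question; the second employer name, "wings brand activations", is a hypothetical value added only for contrast.

```python
import pandas as pd

# Second employer name is hypothetical, added to show a case that does match
employer = pd.DataFrame({"employer": ["wings brand activation i pvt ltd", "wings brand activations"]})
# Pin name taken from the question's table
pins = pd.DataFrame({
    "Name": ["trendy wheels pvt ltd", "wings brand activations india pvt ltd"],
    "FINAL_CATEGORY": ["cat a+", "cat b"],
})

dict_pins = dict(zip(pins["Name"], pins["FINAL_CATEGORY"]))
# x in key: is the full employer name a substring of the pin name?
employer["cat"] = [[cat for key, cat in dict_pins.items() if x in key] for x in employer["employer"]]
print(employer)
```

The first employer gets an empty list, because "wings brand activation i pvt ltd" is not a substring of the stored name; only the shorter hypothetical name matches. Shortening the employer name word by word, as in the first answer, is one way to handle such cases.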
