How to classify strings by regular expression rules in Python

2 answers

User

#1 · edited 2024-04-20 02:12:27

Without any extra fluff:

categories = [
  ('cat1', ['foo']),
  ('cat2', ['football']),
  ('cat3', ['abc', 'aba', 'bca'])
]

def classify(text):
  for category, matches in categories:
    if any(match in text for match in matches):
      return category
  return None

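In Python, you can use the in operator to test whether one string is a substring of another. You could add a check such as isinstance(match, str) to tell whether a rule is a plain string or a compiled regex object. Note that the categories are tried in order, so with the list above any text containing 'football' is already caught by 'foo' in cat1. How sophisticated you make it is up to you.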
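
The isinstance idea mentioned above could look roughly like the following. This is only a minimal sketch: the names CATEGORIES and classify_mixed and the sample pattern ab[ac] are assumptions made up for illustration, not part of the original answer.

```python
import re

# Hypothetical rules: each category lists plain substrings and/or compiled regexes.
CATEGORIES = [
    ("cat2", ["football"]),                    # listed first so "foo" does not shadow it
    ("cat1", ["foo"]),
    ("cat3", [re.compile(r"ab[ac]"), "bca"]),  # a regex and a plain string side by side
]

def classify_mixed(text):
    """Return the first category that has a matching rule, or None."""
    for category, matchers in CATEGORIES:
        for matcher in matchers:
            if isinstance(matcher, str):
                # Plain string: simple substring test.
                if matcher in text:
                    return category
            elif matcher.search(text):
                # Compiled pattern: search anywhere in the text.
                return category
    return None
```

Here 'football' is deliberately placed before 'foo'; with the opposite order the more specific rule would never be reached.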

User

#2 · edited 2024-04-20 02:12:27

How about this solution in pseudo-Python:

import re

def classify(journaltext):
    # Categories in descending priority.
    # "..." is a placeholder: you have to give the full list here.
    prio_list = ["FOO", "BAR", "UPS", ...]
    # dictionary:
    # - key is the name of the category, must match the name in the above prio_list
    # - value is the regex that identifies the category
    matchers = {"FOO": "the regex for FOO", "BAR": "the regex for BAR", "UPS":"...", ...}
    for category in prio_list:
        # re.match only matches at the start of the text; use re.search to match anywhere
        if re.match(matchers[category], journaltext):
            return category
    return "UNKNOWN" # or you can "return None"

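Features:

There is a prio_list that holds all categories in descending order of priority.
Matching is attempted in the order of that list.
The text is matched against the regex stored in the matchers dictionary, so the category names can be arbitrary.
The function returns the name of the category.
If nothing matches, you get a placeholder category name.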
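
To make the features above concrete, here is a small runnable sketch. The category names INVOICE and ORDER, their patterns, and the function name classify_by_priority are hypothetical, chosen only for illustration. It also swaps re.match for re.search so that a pattern may match anywhere in the text, which may or may not be what the original answer intended.

```python
import re

# Hypothetical categories, highest priority first.
PRIO_LIST = ["INVOICE", "ORDER"]

# Hypothetical patterns; the keys must match the names in PRIO_LIST.
MATCHERS = {
    "INVOICE": r"invoice\s+#\d+",
    "ORDER": r"order",
}

def classify_by_priority(journaltext, default="UNKNOWN"):
    """Return the first category in PRIO_LIST whose regex is found in the text."""
    for category in PRIO_LIST:
        # re.search looks anywhere in the text; re.match would only test the start.
        if re.search(MATCHERS[category], journaltext, re.IGNORECASE):
            return category
    return default
```

A text such as "Order paid, see invoice #42" matches both patterns, and the position of INVOICE at the front of PRIO_LIST decides the result.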

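You could even read the priority list of categories and the regexes from a configuration file, but that is left as an exercise for the reader...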
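
One possible way to approach that exercise is sketched below, assuming the rules are stored as a JSON list whose order doubles as the priority. The JSON layout, the key names category and pattern, and the helper names load_rules and classify_with_rules are all assumptions for illustration; the original answer does not specify a format.

```python
import json
import re

# Hypothetical config content; in practice it could be read from a file with json.load().
CONFIG_TEXT = """
[
  {"category": "FOO", "pattern": "^foo"},
  {"category": "BAR", "pattern": "bar$"}
]
"""

def load_rules(config_text):
    """Parse the JSON list into (category, compiled regex) pairs, keeping list order."""
    return [
        (entry["category"], re.compile(entry["pattern"]))
        for entry in json.loads(config_text)
    ]

def classify_with_rules(rules, text, default=None):
    """Return the category of the first rule whose pattern is found in the text."""
    for category, pattern in rules:
        if pattern.search(text):
            return category
    return default

RULES = load_rules(CONFIG_TEXT)
```

Compiling the patterns once at load time avoids recompiling them for every text that gets classified.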
