Using regular expressions from Python raw_input

Posted 2024-06-17 10:39:50


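I am trying to create a script that lets the user enter a number of regular expressions, which are then run against an input file to retrieve the matches. I am currently using ahocorasick, but I run into a problem when I try to enter regex patterns.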

I enter a regular expression at the second raw_input (colour_regex), but get the following error:

Traceback (most recent call last):
  File "PLA_Enrichment_options.py", line 189, in <module>
    main()
  File "PLA_Enrichment_options.py", line 41, in main
    tree.add(regex)
  File "build/bdist.linux-x86_64/egg/ahocorasick/__init__.py", line 29, in add

TypeError: argument 1 must be string or read-only buffer, not _sre.SRE_Pattern

file_name = raw_input("What is the filename you wish to enhance? ")
enhanced_name = file_name.replace(".csv", "")

# User regexed input
tree = ahocorasick.KeywordTree()
print ("What regex would you like to use for colour? (Enter 'exit' to move on) ")
colour_regex = raw_input()
regex = re.compile(colour_regex)
while colour_regex != "exit":
    tree.add(regex)
tree.make()

print 'Finding colour matches...'
output = open(enhanced_name + '-colour.csv', 'w')
file = open(feed_name, 'r')
for line in iter(file):
    id, title, desc, link, image = line.strip('\n').split('\t')
    offerString = '|'.join([title.lower(), desc.lower(), link.lower()])
    keywords = set()
    for match in tree.findall_long(offerString): # find colours
        indices = list(match)
        keyword = offerString[indices[0]:indices[1]]
        if re.search(r'(?<![âêîôûäëïöüàèìòùáéíóú])\b%s\b(?![âêîôûäëïöüàèìòùáéíóú])' %(keyword), offerString):
            keywords.add(keyword)                                     
    if keywords:
        output.write('\t'.join([id, '|'.join(keywords), desc, link, image])+'\n')
    else:
        output.write('\t'.join([id, title, desc, link, image])+'\n')
file.close()
output.close()

Any help or pointers in the right direction would be great.

Thanks


1 answer
User
#1 · Posted 2024-06-17 10:39:50
tree = ahocorasick.KeywordTree()
regex = re.compile(colour_regex)
tree.add(regex)

You are passing the wrong type to ahocorasick.KeywordTree.add().

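regex is a compiled regular expression object; its type is _sre.SRE_Pattern. If you pass the original string (colour_regex) instead, you will not get this error.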

tree.add(colour_regex)
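To illustrate the type difference, here is a minimal sketch. The pattern text and the sample offer string are made up for illustration, and the code is written for Python 3 (where str is the text type). Note also that Aho-Corasick matches literal keywords, so a pattern string added to a keyword tree would be searched for character by character rather than interpreted as a regex; if real regex matching is wanted, using the compiled pattern directly may be the simpler route.

```python
import re

# Hypothetical user input; in the question this comes from raw_input().
colour_regex = r"\b(?:red|blue)\b"

regex = re.compile(colour_regex)

# The compiled object is not a string, which is why KeywordTree.add() rejects it.
is_string = isinstance(regex, str)

# The original pattern text is still available as a plain string.
same_text = regex.pattern == colour_regex

# A compiled pattern can be used for matching directly, without a keyword tree.
offer = "dark blue shirt|red laces"
matches = [m.group(0) for m in regex.finditer(offer)]

print(is_string, same_text, matches)
```

Here regex.pattern returns the text that was compiled, so either colour_regex or regex.pattern could be handed to an API that expects a string.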

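Also, the loop below never terminates, because colour_regex is not updated inside it. I think you want if rather than while, or you need to move colour_regex = raw_input() into the loop.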

while colour_regex != "exit":
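One way to restructure the input loop is sketched below, assuming the goal is to collect several regexes until the user types exit and then apply them with the re module instead of a keyword tree. The helper names read_patterns and find_keywords are hypothetical, and the user input is simulated so the sketch runs without a terminal; pass raw_input (Python 2) or input (Python 3) as read_line in a real script.

```python
import re


def read_patterns(read_line):
    """Collect compiled patterns until the user types 'exit'.

    read_line is any zero-argument callable returning one line of text,
    e.g. input on Python 3 or raw_input on Python 2.
    """
    patterns = []
    while True:
        text = read_line()  # read a fresh value on every pass
        if text == "exit":
            break
        try:
            patterns.append(re.compile(text))
        except re.error as err:
            print("Skipping invalid regex %r: %s" % (text, err))
    return patterns


def find_keywords(patterns, offer_string):
    """Return the set of substrings matched by any of the patterns."""
    keywords = set()
    for pattern in patterns:
        for match in pattern.finditer(offer_string):
            keywords.add(match.group(0))
    return keywords


# Simulated user input (made-up patterns) in place of a terminal prompt.
answers = iter([r"\bred\b", r"bl(ue|ack)", "exit"])
patterns = read_patterns(lambda: next(answers))
found = find_keywords(patterns, "red shoes|black and blue laces")
print(sorted(found))
```

Because the input is read inside the loop, the exit condition is re-evaluated against a new value each time, which avoids the infinite loop described above.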
