CategorizedPlaintextCorpusReader: how do I specify categories with a regex? 'NoneType' object has no attribute 'group'

Posted 2024-04-25 09:58:24


I am trying to create a CategorizedPlaintextCorpusReader with two categories: neg and pos. The categories appear in the file names as "_neg" and "_pos". Examples:

bda_TD_2520_HD_001.pdf_neg.txt
info_Ei650_de.pdf_pos

My code:

[code block missing from the source]

I get an error:

AttributeError: 'NoneType' object has no attribute 'group'

What am I doing wrong?

Edit

I changed it, and the original error no longer occurs. But I am not sure it works, because I get no results:

len(reader.categories()) # nothing

for cat in reader.categories():
    print (cat) # nothing

reader.fileids("neg") # ValueError: Category neg not found


1 answer

User

#1 · Posted 2024-04-25 09:58:24

You need to restrict the reader to files whose names contain pos or neg:

from nltk.corpus.reader import CategorizedPlaintextCorpusReader

reader = CategorizedPlaintextCorpusReader('C:/users/s/desktop/corpus/',
                                          r'.*?_(neg|pos).*',
                                          cat_pattern=r'.*?_(neg|pos).*')

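Here .*? is a lazy match of any character, any number of times, and (neg|pos) is a capturing group that matches neg or pos (the group must be capturing for the category extractor to work).

Works for me.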
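
To see why the capturing group and the file filter both matter, here is a minimal sketch that uses only Python's re module, not NLTK itself. It assumes the reader derives each category by matching cat_pattern against the file name and reading capture group 1, which is consistent with the error message in the question. The helper category_of and the file name readme.txt are hypothetical, added only for illustration.

```python
import re

# Same pattern as in the answer; group 1 captures the category name.
CAT_PATTERN = r'.*?_(neg|pos).*'


def category_of(file_id):
    """Return 'neg' or 'pos' for a matching file name, else None."""
    match = re.match(CAT_PATTERN, file_id)
    # Calling match.group(1) when match is None raises
    # AttributeError: 'NoneType' object has no attribute 'group'.
    return match.group(1) if match else None


# File names from the question, plus a hypothetical one without a category.
file_ids = [
    'bda_TD_2520_HD_001.pdf_neg.txt',
    'info_Ei650_de.pdf_pos',
    'readme.txt',  # hypothetical: no _neg/_pos marker
]

for file_id in file_ids:
    print(file_id, '->', category_of(file_id))
```

If the file pattern passed to the reader lets through a name such as readme.txt, the category pattern has nothing to match, which is one plausible way to end up with the 'NoneType' error. Using the same regex for both arguments, as in the answer, avoids that.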
