Pandas apply: the function does not receive strings

りんご	リンゴ	りんご	名詞-一般
を	ヲ	を	助詞-格助詞-一般
食べ	タベ	食べる	動詞-自立	一段	連用形
まし	マシ	ます	助動詞	特殊・マス	連用形
た	タ	た	助動詞	特殊・タ	基本形
、	、	、	記号-読点
そして	ソシテ	そして	接続詞
、	、	、	記号-読点
みかん	ミカン	みかん	名詞-一般
も	モ	も	助詞-係助詞
食べ	タベ	食べる	動詞-自立	一段	連用形
まし	マシ	ます	助動詞	特殊・マス	連用形
た	タ	た	助動詞	特殊・タ	基本形
EOS

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-174-81a0d5d62dc4> in <module>()
     32 aa = extractKeyword(text) #working!!
     33 
---> 34 me = df.apply(lambda x: extractKeyword(x))

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   4260                                            f, axis,
   4261                                            reduce=reduce,
-> 4262                                            ignore_failures=ignore_failures)
   4263             else:
   4264                 return self._apply_broadcast(f, axis)

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
   4356             try:
   4357                 for i, v in enumerate(series_gen):
-> 4358                     results[i] = func(v)
   4359                     keys.append(v.name)
   4360             except Exception as e:

<ipython-input-174-81a0d5d62dc4> in <lambda>(x)
     32 aa = extractKeyword(text) #working!!
     33 
---> 34 me = df.apply(lambda x: extractKeyword(x))

<ipython-input-174-81a0d5d62dc4> in extractKeyword(text)
     20     """Morphological analysis of text and returning a list of only nouns"""
     21     tagger = MeCab.Tagger('-Ochasen')
---> 22     node = tagger.parseToNode(text)
     23     keywords = []
     24     while node:

~/anaconda3/lib/python3.6/site-packages/MeCab.py in parseToNode(self, *args)
    280     __repr__ = _swig_repr
    281     def parse(self, *args): return _MeCab.Tagger_parse(self, *args)
--> 282     def parseToNode(self, *args): return _MeCab.Tagger_parseToNode(self, *args)
    283     def parseNBest(self, *args): return _MeCab.Tagger_parseNBest(self, *args)
    284     def parseNBestInit(self, *args): return _MeCab.Tagger_parseNBestInit(self, *args)

TypeError: ("in method 'Tagger_parseToNode', argument 2 of type 'char const *'", 'occurred at index 0')

2 answers

User

Answer 1 · edited 2024-05-19 02:27:45

I know you got some help on the Japanese StackOverflow, but here is an answer in English:

The first thing to fix is that pandas reads the first line of sample.csv as the header. To fix this, pass the names parameter to read_csv.
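To see the header problem in isolation, here is a small sketch that reads a hypothetical two-row CSV (a stand-in for sample.csv, whose real contents aren't shown) once with the defaults and once with names:

```python
import io

import pandas as pd

# Hypothetical stand-in for sample.csv: two data rows, no header line.
csv_text = "1,りんごを食べました\n2,みかんも食べました\n"

# Default header inference: the first data row becomes the column labels.
df_default = pd.read_csv(io.StringIO(csv_text))
print(list(df_default.columns))  # ['1', 'りんごを食べました']
print(len(df_default))           # 1 row left

# Passing names=... tells pandas there is no header row, so every row is data.
df_named = pd.read_csv(io.StringIO(csv_text), names=['Number', 'String'])
print(list(df_named.columns))    # ['Number', 'String']
print(len(df_named))             # 2 rows
```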

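Next, df.apply applies the function to each column of the DataFrame by default, so extractKeyword receives a whole column (a Series) rather than a string; that is what MeCab's TypeError ("argument 2 of type 'char const *'") is complaining about. You would need something like df.apply(lambda x: extractKeyword(x['String']), axis=1), but that doesn't work either, because each sentence contains a different number of nouns and pandas complains that it cannot stack a 1x2 array on a 1x5 array. The simplest approach is to call apply on the String Series itself.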
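A minimal sketch of the difference, using a small made-up DataFrame and a hypothetical stand-in for extractKeyword (it splits on '、' instead of running MeCab, so it runs without MeCab installed). It shows what each kind of apply passes to the function, and that Series.apply is fine with lists of different lengths:

```python
import pandas as pd

# Made-up data in the same shape as sample.csv after names=['Number', 'String'].
df = pd.DataFrame({
    'Number': [1, 2],
    'String': ['りんごを食べました、みかんも食べました', 'パンを食べました'],
})

def describe(x):
    """Report the type of whatever apply hands to the function."""
    return type(x).__name__

# DataFrame.apply (axis=0 by default) calls the function once per column.
print(df.apply(describe).tolist())            # ['Series', 'Series']

# Series.apply calls the function once per element, here a str.
print(df['String'].apply(describe).tolist())  # ['str', 'str']

# Hypothetical keyword extractor: returns lists of different lengths per row.
# Series.apply just returns a Series whose values are those lists.
parts = df['String'].apply(lambda s: s.split('、'))
print(parts.tolist())
```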

The last problem is a bug in the MeCab Python 3 bindings: see https://github.com/SamuraiT/mecab-python3/issues/3. You found a workaround by calling parseToNode twice; calling parse before parseToNode also works.

Putting these three things together:

import pandas as pd
import MeCab

# names=... makes pandas treat the first line of sample.csv as data, not as a header
df = pd.read_csv('sample.csv', encoding='utf-8', names=['Number', 'String'])

def extractKeyword(text):
    """Morphological analysis of text and returning a list of only nouns"""
    tagger = MeCab.Tagger('-Ochasen')
    tagger.parse(text)  # workaround for the mecab-python3 parseToNode bug (issue #3)
    node = tagger.parseToNode(text)
    keywords = []
    while node:
        if node.feature.split(",")[0] == u"名詞": # this means noun
            keywords.append(node.surface)
        node = node.next
    return keywords

# Series.apply calls extractKeyword once per string in the String column
me = df['String'].apply(extractKeyword)
print(me)

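Running this script on the sample.csv you provided prints a Series holding, for each row, the list of nouns found in its String column.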

User

Answer 2 · edited 2024-05-19 02:27:45

parseToNode kept failing, so I had to put this line

 tagger.parseToNode('dummy')

before

 node = tagger.parseToNode(text)

and it worked!

But I don't know why; maybe the parseToNode method has a bug.

import MeCab

def extractKeyword(text):
    """Morphological analysis of text and returning a list of only nouns"""
    tagger = MeCab.Tagger('-Ochasen')
    tagger.parseToNode('ダミー')  # dummy call before the real one works around the parseToNode failure
    node = tagger.parseToNode(text)
    keywords = []
    while node:
        if node.feature.split(",")[0] == u"名詞": # this means noun
            keywords.append(node.surface)
        node = node.next
    return keywords
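If parseToNode keeps misbehaving, one option that avoids it entirely is to filter the text returned by parse(), the kind of -Ochasen output shown at the top of the question. The sketch below assumes MeCab's standard ChaSen format (one token per line, tab-separated surface, reading, base form, POS, ..., terminated by EOS); nouns_from_chasen is a hypothetical helper, not part of MeCab. It keeps every token whose POS starts with 名詞, which matches the node.feature check above (all noun subtypes included).

```python
def nouns_from_chasen(chasen_output: str) -> list:
    """Extract noun surface forms from MeCab -Ochasen text output.

    Assumption: each token line is tab-separated as
    surface, reading, base form, POS, ... and the output ends with 'EOS'.
    """
    nouns = []
    for line in chasen_output.splitlines():
        if not line or line == 'EOS':
            continue  # skip blank lines and the end-of-sentence marker
        fields = line.split('\t')
        # 4th field is the POS, e.g. '名詞-一般'; any noun subtype counts.
        if len(fields) >= 4 and fields[3].startswith('名詞'):
            nouns.append(fields[0])
    return nouns

# Abbreviated ChaSen output, copied from the question's sample sentence.
sample = (
    "りんご\tリンゴ\tりんご\t名詞-一般\n"
    "を\tヲ\tを\t助詞-格助詞-一般\n"
    "食べ\tタベ\t食べる\t動詞-自立\t一段\t連用形\n"
    "、\t、\t、\t記号-読点\n"
    "みかん\tミカン\tみかん\t名詞-一般\n"
    "も\tモ\tも\t助詞-係助詞\n"
    "EOS\n"
)
print(nouns_from_chasen(sample))  # ['りんご', 'みかん']
```

With MeCab installed, you would call it as nouns_from_chasen(MeCab.Tagger('-Ochasen').parse(text)). The trade-off is that you depend on the text output format instead of the node API.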
