AttributeError when running lower_case.translate with string.punctuation

Posted 2024-05-16 13:54:12


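I get an AttributeError when running lower_case.translate with string.punctuation on a DataFrame that contains reviews. The imported data is messy. The error is AttributeError: 'DataFrame' object has no attribute 'translate'. The full traceback is below.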

I tried several variations, shown here in the comments:

# cleaned_text = lower_case.translate(str.maketrans(string.punctuation, ' '*len(string.punctuation)))
# cleaned_text = lower_case.translator = str.maketrans('', '', string.punctuation)

cleaned_text = lower_case.translate(str.maketrans('', '', string.punctuation))

I also tried the approach from this SO post and added a fillna on top of it, hoping that would fix it.

# Check for nulls, if any are present
print("Number of rows with null values:")
print(lower_case.isnull().sum().sum())

lower_case.fillna("")

A small sample Excel file for the DataFrame: https://github.com/taylorjohn/Simple_RecSys/blob/master/sample-data.xlsx

Code

import string
from collections import Counter
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from nltk.corpus import stopwords
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# The data is a messy, unclean Excel sheet: columns are artist names, rows are reviews for that artist
df = pd.read_excel('sample-data.xlsx',encoding='utf8', errors='ignore')

lower_case = df.apply(lambda x: x.astype(str).str.lower())

# Check for nulls, if any are present
print("Number of rows with null values:")
print(lower_case.isnull().sum().sum())

lower_case.fillna("")


#cleaned_text = lower_case.translate(str.maketrans(string.punctuation, ' '*len(string.punctuation)))
# cleaned_text = lower_case.translator = str.maketrans('', '', string.punctuation)

cleaned_text = lower_case.translate(str.maketrans('', '', string.punctuation))

The error received:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-78-9f23b8a5e8e0> in <module>
      2 # cleaned_text = lower_case.translator = str.maketrans('', '', string.punctuation)
      3 
----> 4 cleaned_text = lower_case.translate(str.maketrans('', '', string.punctuation))

~\anaconda3\envs\nlp_course\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5272             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5273                 return self[name]
-> 5274             return object.__getattribute__(self, name)
   5275 
   5276     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'translate'

1 answer
Answer 1 · Posted 2024-05-16 13:54:12

A pandas DataFrame has no .translate() method, but Python strings do. For example:

import string

my_str = "hello world!"
my_str.translate(str.maketrans('', '', string.punctuation))

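To apply that transformation to every value in a DataFrame column, use .map() on that column. The .map() method takes a function that receives each column value as its argument and returns the transformed value: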

def remove_punctuation(value):
    return value.translate(str.maketrans('', '', string.punctuation))

df["my_cleaned_column"] = df["my_dirty_column"].map(remove_punctuation)
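For a version that can be run end to end, here is a minimal sketch on made-up data. The DataFrame contents and the column names "review" and "review_clean" are assumptions for illustration, not taken from the question's spreadsheet. The translation table is built once, and each value is passed through str() so that non-string cells (numbers, NaN) do not raise.

```python
import string

import pandas as pd

# Hypothetical sample data; the column name "review" is an assumption.
df = pd.DataFrame({"review": ["Great show!!!", "Loved it, truly.", "Meh..."]})

# Build the translation table once instead of on every call.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punctuation(value):
    # str() guards against non-string cells such as numbers or NaN.
    return str(value).translate(PUNCT_TABLE)


df["review_clean"] = df["review"].map(remove_punctuation)
print(df["review_clean"].tolist())
# ['Great show', 'Loved it truly', 'Meh']
```

Note that str() turns a missing value into the literal text "nan", so it may be preferable to fill missing values first if they matter for later analysis.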

You can also use a lambda function instead of defining a new function:

df["my_cleaned_column"] = df["my_dirty_column"].map(
    lambda x: x.translate(str.maketrans('', '', string.punctuation))
)

If you need to apply this to many columns, you can do the following:

for column_name in df.columns:
    df[column_name] = df[column_name].map(
        lambda x: x.translate(str.maketrans('', '', string.punctuation))
    )
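As an alternative to the loop, the whole cleaning step from the question (fill nulls, lowercase, strip punctuation) could be chained with the pandas string accessor, which offers Series.str.translate. This is only a sketch under assumptions: the small DataFrame below is a hypothetical stand-in for the Excel sheet, and the names raw, table, and cleaned are illustrative.

```python
import string

import pandas as pd

# Hypothetical stand-in for the sheet: columns are artists, rows are reviews.
raw = pd.DataFrame({
    "Artist A": ["Amazing!", None],
    "Artist B": ["So-so...", "Would NOT see again."],
})

# One translation table that deletes every punctuation character.
table = str.maketrans("", "", string.punctuation)

# fillna returns a new DataFrame, so its result is chained rather than discarded.
cleaned = raw.fillna("").apply(
    lambda col: col.astype(str).str.lower().str.translate(table)
)
print(cleaned)
```

Filling nulls before astype(str) keeps missing cells as empty strings instead of the text "nan". In the question's code, lower_case.fillna("") is called without assigning the result, so it has no effect on lower_case.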
