How can I remove non-ASCII characters but keep periods and spaces using Python?

    def onlyascii(char):
        if ord(char) < 48 or ord(char) > 127:
            return ''
        else:
            return char

    def get_my_string(file_path):
        f = open(file_path, 'r')
        data = f.read()
        f.close()
        filtered_data = filter(onlyascii, data)
        filtered_data = filtered_data.lower()
        return filtered_data

3 Answers

User

Answer 1 · edited 2024-05-13 18:33:46

According to @artfulrobot, this should be faster than filter and lambda:

import re
re.sub(r'[^\x00-\x7f]', r'', your_non_ascii_string)
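As a minimal sketch of the same idea in Python 3, the pattern can be compiled once and wrapped in a helper. The names `NON_ASCII` and `strip_non_ascii` are illustrative and do not come from the answer. Periods and spaces are ASCII, so they pass through unchanged:

```python
import re

# Matches any character outside the 7-bit ASCII range (code points 0-127).
NON_ASCII = re.compile(r'[^\x00-\x7f]')


def strip_non_ascii(text):
    """Return text with every non-ASCII character removed (hypothetical helper)."""
    return NON_ASCII.sub('', text)


print(strip_non_ascii("Hej då. Good bye!"))  # Hej d. Good bye!
```

Note that ASCII control characters such as `\x00` are inside the kept range, so this pattern does not remove them.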

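User

Answer 2 · edited 2024-05-13 18:33:46

Using encode() or decode() is a simple way to convert to another codec. In your example, you want to convert to ASCII and ignore every symbol it does not support. For example, the Swedish letter å is not an ASCII character: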

    >>> s = u'Good bye in Swedish is Hej d\xe5'
    >>> s = s.encode('ascii', errors='ignore')
    >>> print s
    Good bye in Swedish is Hej d

Edit:

Python 3: str -> bytes -> str

>>> "Hej då".encode("ascii", errors="ignore").decode()
'Hej d'

Python 2: unicode -> str -> unicode

>>> u"hej då".encode("ascii", errors="ignore").decode()
u'hej d'

Python 2: str -> unicode -> str (decode and encode in the reverse order)

>>> "hej d\xe5".decode("ascii", errors="ignore").encode()
'hej d'
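Applied to the function from the question, the Python 3 round trip might look like the sketch below. The helper name `to_ascii` and the `encoding="utf-8"` default are assumptions, since the question does not say how the file is encoded:

```python
def to_ascii(text):
    """Drop characters ASCII cannot encode, then return a str again."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def get_my_string(file_path, encoding="utf-8"):
    # The file encoding is an assumption; the question does not state it.
    with open(file_path, "r", encoding=encoding) as f:
        data = f.read()
    # Lowercase after filtering, as the original function does.
    return to_ascii(data).lower()


print(to_ascii("Hej då. Bye"))  # Hej d. Bye
```

Unlike the `ord(char) < 48` test in the question, this keeps spaces (code point 32) and periods (code point 46), because both are valid ASCII.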

User

Answer 3 · edited 2024-05-13 18:33:46

You can use string.printable to filter all non-printable characters out of a string, like this:

>>> s = "some\x00string. with\x15 funny characters"
>>> import string
>>> printable = set(string.printable)
>>> filter(lambda x: x in printable, s)
'somestring. with funny characters'

On my machine, string.printable contains:

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c

Edit: In Python 3, filter returns an iterator rather than a string. The correct way to get a string is:

''.join(filter(lambda x: x in printable, s))
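Put together for Python 3, the approach could be sketched as follows. The names `PRINTABLE` and `keep_printable` are illustrative, and a generator expression stands in for `filter` with a lambda:

```python
import string

# string.printable holds ASCII digits, letters, punctuation and whitespace.
PRINTABLE = frozenset(string.printable)


def keep_printable(text):
    """Keep only characters found in string.printable (hypothetical helper)."""
    return ''.join(ch for ch in text if ch in PRINTABLE)


s = "some\x00string. with\x15 funny characters"
print(keep_printable(s))  # somestring. with funny characters
```

Because string.printable contains only ASCII characters, this removes non-ASCII letters such as å as well as ASCII control characters, while periods and spaces are kept.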
