<p><code>chr()</code> is the way to generate a character from its Unicode code point:</p>
<pre class="lang-py prettyprint-override"><code>import re

def preprocessing(content):
    # Map each Arabic-Indic digit (U+0660..U+0669) to the
    # Persian digit at the same offset (U+06F0..U+06F9).
    for d in range(10):
        arabic_digit = chr(0x660 + d)
        persian_digit = chr(0x6f0 + d)
        content = re.sub(arabic_digit, persian_digit, content)
    return content
</code></pre>
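<p>To see which characters <code>chr(0x660 + d)</code> and <code>chr(0x6f0 + d)</code> actually produce, a quick sketch with the standard <code>unicodedata</code> module can list them. The helper name <code>describe</code> is made up for this illustration and is not part of the code above:</p>

```python
import unicodedata

def describe(ch):
    """Return 'U+XXXX NAME (digit value)' for a single digit character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.digit(ch)})"

# Walk both ranges used in preprocessing(): Arabic-Indic and Persian digits.
for base in (0x660, 0x6F0):
    for d in range(10):
        print(describe(chr(base + d)))
```

<p>Both ranges hold the digits 0 through 9 in order, which is why adding the same offset <code>d</code> to each base pairs up matching digits.</p>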
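<p>However, <code>str</code> has a built-in <code>.translate</code> method that does bulk replacements more efficiently. Give <code>str.maketrans</code> a string of the characters to replace and a string of new characters of the same length:</p>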
<pre class="lang-py prettyprint-override"><code>arabic_digits = ''.join([chr(i) for i in range(0x660,0x66a)])
persian_digits = ''.join([chr(i) for i in range(0x6f0,0x6fa)])
print('Arabic: ',arabic_digits)
print('Persian:',persian_digits)

# compute the translation table once
_xlat = str.maketrans(arabic_digits,persian_digits)

def preprocessing(content):
    return content.translate(_xlat)

test = '4\u06645\u06656\u0666'
print('before:',test)
print('after: ',preprocessing(test))
</code></pre>
<p>Output:</p>
<pre class="lang-none prettyprint-override"><code>Arabic:  ٠١٢٣٤٥٦٧٨٩
Persian: ۰۱۲۳۴۵۶۷۸۹
before: 4٤5٥6٦
after:  4۴5۵6۶
</code></pre>
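<p>As a rough way to check the efficiency claim on your own machine, the sketch below runs both approaches on the same input, confirms they agree, and times them with <code>timeit</code>. The function names <code>with_re_sub</code> and <code>with_translate</code>, the sample string, and the repetition count are assumptions chosen for illustration; actual timings depend on input size and Python version.</p>

```python
import re
import timeit

ARABIC = ''.join(chr(0x660 + d) for d in range(10))
PERSIAN = ''.join(chr(0x6F0 + d) for d in range(10))
XLAT = str.maketrans(ARABIC, PERSIAN)  # built once, reused on every call

def with_re_sub(content):
    # Ten passes over the string, one per digit.
    for a, p in zip(ARABIC, PERSIAN):
        content = re.sub(a, p, content)
    return content

def with_translate(content):
    # A single pass using the precomputed table.
    return content.translate(XLAT)

# Hypothetical sample input: mixed ASCII and Arabic-Indic digits, repeated.
sample = '4\u06645\u06656\u0666' * 1000

# Both approaches should yield the same string.
assert with_re_sub(sample) == with_translate(sample)

t_sub = timeit.timeit(lambda: with_re_sub(sample), number=50)
t_tr = timeit.timeit(lambda: with_translate(sample), number=50)
print(f"re.sub loop: {t_sub:.4f}s")
print(f"translate:   {t_tr:.4f}s")
```

<p>The <code>re.sub</code> version scans the whole string once per digit, while <code>.translate</code> handles all ten mappings in one pass.</p>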