<p>Same as @Desmond Lua's answer, but with a different tokenization function:</p>
<pre>
def tokenize(text):
    """Return the comma-separated prefixes (length >= 2) of each word in text."""
    tokens = []
    for word in text.split(' '):
        # Start at length 2 so single-character prefixes are skipped.
        for end in range(2, len(word) + 1):
            tokens.append(word[:end])
    return ",".join(tokens)
</pre>
<p>It turns <code>hello world</code> into <code>he,hel,hell,hello,wo,wor,worl,world</code>.</p>
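<p>This is useful for lightweight autocomplete: storing every prefix of each word lets a partially typed word be matched by an exact lookup.</p>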
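<p>As a rough illustration of how such prefix tokens could back an autocomplete lookup, here is a minimal sketch. It assumes an in-memory dictionary in place of whatever data store you actually use; <code>build_index</code>, <code>suggest</code>, and the sample titles are hypothetical names invented for this example, not part of the original answer.</p>

```python
from collections import defaultdict


def tokenize(text):
    """Return the comma-separated prefixes (length >= 2) of each word in text."""
    tokens = []
    for word in text.split(' '):
        for end in range(2, len(word) + 1):
            tokens.append(word[:end])
    return ",".join(tokens)


def build_index(titles):
    """Hypothetical helper: map every prefix token to the titles containing it."""
    index = defaultdict(set)
    for title in titles:
        for token in tokenize(title.lower()).split(','):
            if token:  # tokenize() yields '' when no word has 2+ characters
                index[token].add(title)
    return index


def suggest(index, typed):
    """Hypothetical helper: exact lookup of the typed prefix, sorted for stable output."""
    return sorted(index.get(typed.lower(), set()))


index = build_index(["hello world", "help wanted", "worldwide news"])
print(suggest(index, "hel"))  # ['hello world', 'help wanted']
print(suggest(index, "wor"))  # ['hello world', 'worldwide news']
```

<p>Because prefixes start at two characters, a single typed letter returns no suggestions. The trade-off is index size, since a word of length n contributes n - 1 tokens.</p>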