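I've set myself a task in Python: encode a long text file by mapping the letters of the alphabet to 1-26 and the non-alphanumeric characters to 26 and above. See the code below: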
#open the file,read the contents and print out normally
my_file = open("timemachine.txt")
my_text = my_file.read()
print (my_text)
print ""
print ""
#open the file and read each line, taking out the eol chars
with open("timemachine.txt","r") as myfile:
    clean_text = "".join(line.rstrip() for line in myfile)
#close the file to prevent memory hogging
my_file.close()
#print out the result all in lower case
clean_text_lower = clean_text.lower()
print clean_text_lower
print ""
print ""
#establish a lowercase alphabet as a list
my_alphabet_list = []
my_alphabet = """ abcdefghijklmnopqrstuvwxyz.,;:-_?!'"()[] %/1234567890"""+"\n"+"\xef"+"\xbb"+"\xbf"
#find the index for each lowercase letter or non-alphanumeric
for letter in my_alphabet:
    my_alphabet_list.append(letter)
print my_alphabet_list,
print my_alphabet_list.index
print ""
print ""
#go through the text and find the corresponding letter of the alphabet
for letter in clean_text_lower:
    posn = my_alphabet_list.index(letter)
print posn,
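When I run this I should get (1) the original text, (2) the text reduced to lowercase with the end-of-line characters removed, (3) the index used for the code, and finally (4) the converted code. However, I only get the second half of the original text, or, if I comment out (4), it prints all of the text. Why?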
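The loop at the end keeps reassigning posn without actually doing anything with it. As a result, you will only get my_alphabet_list.index(letter) for the last letter of the clean text.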
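There are a few things you can do to fix this. The first that comes to mind is to initialize a list before the loop and append each value to it.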
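The list-and-append idea could look roughly like the sketch below. It is only an illustration: the name positions and the sample string are made up here, the alphabet is a shortened version of the one in the question (the byte-order-mark characters are left out), and it uses print() as a function so it also runs on Python 3.

```python
# Stand-ins for the question's variables; the real script reads
# clean_text_lower from timemachine.txt instead.
my_alphabet = " abcdefghijklmnopqrstuvwxyz.,;:-_?!'\"()[] %/1234567890\n"
my_alphabet_list = list(my_alphabet)
clean_text_lower = "the time machine"  # hypothetical sample text

# Collect every index in a list instead of overwriting one variable.
positions = []  # hypothetical name, not in the original code
for letter in clean_text_lower:
    # list.index raises ValueError if the character is not in the alphabet
    positions.append(my_alphabet_list.index(letter))

# Print all codes on one line, separated by spaces.
print(" ".join(str(p) for p in positions))
```

With the whole result held in one list, it can be printed in a single call, written to a file, or processed further, rather than being lost on each pass through the loop.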