Using Python to join two word-category lists with my own corpus

with open('LIWC_words.txt', 'r') as document:
    answer = {}
    for line in document:
        line = line.split()
        if not line:  # empty line
            continue
        answer[line[0]] = line[1:]

with open('LIWC_categories.txt', 'r') as document1:
    categoriesLIWC = {}
    for line in document1:
        line = line.strip()
        if not line:
            continue
        key, value = line.split(':', 1)
        if key.isdigit():
            categoriesLIWC[int(key)] = value
        else:
            categoriesLIWC[key] = value
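The question's code stores numeric category keys as `int` (via `int(key)`), while the words file keeps codes as strings such as `'01'`, so any lookup has to apply the same conversion. Below is a minimal sketch of joining the two dictionaries and tallying categories over a corpus. It assumes whitespace tokenization and uses made-up sample data; the helper names `join_categories` and `count_categories` are hypothetical.

```python
from collections import Counter


def join_categories(words, categories):
    """Map each word to its category names.

    words: {'word': ['01', '02', ...]}, as built from LIWC_words.txt
    categories: {1: 'pronoun', ...}, as built from LIWC_categories.txt,
    where numeric keys were converted with int().
    """
    joined = {}
    for word, codes in words.items():
        # Convert '01' -> 1 so it matches the int keys in categories.
        joined[word] = [categories[int(c)] if c.isdigit() else categories[c]
                        for c in codes]
    return joined


def count_categories(text, joined):
    """Count how often each category occurs in a whitespace-tokenized text."""
    counts = Counter()
    for token in text.lower().split():
        counts.update(joined.get(token, []))
    return counts


# Hypothetical sample data in the shapes the question's code produces.
answer = {'i': ['01', '02'], 'you': ['01', '03']}
categoriesLIWC = {1: 'pronoun', 2: 'I', 3: 'you'}

joined = join_categories(answer, categoriesLIWC)
print(joined)
print(count_categories("I think you and I agree", joined))
```

Words that are not in the dictionary (here "think", "and", "agree") simply contribute nothing to the counts.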

2 Answers

User

Answer 1 · edited 2024-06-16 15:09:38

Here is one way to get the data into this format:

dic = {}
ref = {}
with open('dic.txt', 'r') as f:
    tempdic = f.read().split('\n')
with open('ref.txt', 'r') as f:
    tempref = f.read().split('\n')

for line in tempdic:
    if line.strip():
        line = line.split()
        dic[line[0]] = line[1:]
for line in tempref:
    if line.strip():
        key, value = line.split(':', 1)
        ref[key.strip()] = value.strip()
# Values read from the files are strings:
# dic = {'word1': ['1', '2', '3'], 'word2': ['2', '3'], ...}
# ref = {'1': 'ref1', '2': 'ref2', ...}
for word in dic:
    for indx in range(len(dic[word])):  # for each number after the word
        dic[word][indx] = ref[dic[word][indx]]

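Suppose we start with {'apple': [1, 2, 3]}. dic['apple'][0] resolves to 1, and the right-hand side, ref[1], might be 'pronoun'. That leaves us with {'apple': ['pronoun', 2, 3]}, and the remaining numbers are replaced in the following iterations.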
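To watch the step-by-step replacement from the walk-through above, here is a small trace with in-memory data. The category names `'I'` and `'you'` are made up for illustration, and the codes are strings, as they would be when read from the files.

```python
# Hypothetical sample data matching the walk-through.
dic = {'apple': ['1', '2', '3']}
ref = {'1': 'pronoun', '2': 'I', '3': 'you'}

for word in dic:
    for indx in range(len(dic[word])):
        old = dic[word][indx]
        dic[word][indx] = ref[old]
        # Show the state of dic after each single replacement.
        print(indx, old, '->', dic)
```

The same result can be built in one step, without mutating the lists in place, as `{w: [ref[c] for c in codes] for w, codes in dic.items()}`.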

User

Answer 2 · edited 2024-06-16 15:09:38

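I'm not sure exactly what end format you want to create. For example, you could build a dictionary in which dict['pronoun'] holds every line in document that contains '01'.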

# For example, starting from this format
dic = {'word1': [1, 2, 3], 'word2': [3, 2]}
ref = {1: 'pronoun', 2: 'I', 3: 'you'}

out = {}

for word in dic:
    for entry in dic[word]:
        if entry in out:
            out[entry].append(word)
        else:
            out[entry] = [word]

print(out)
# {1: ['word1'], 2: ['word1', 'word2'], 3: ['word1', 'word2']}
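The loop above keys `out` by category number and never uses `ref`. To get the `dict['pronoun']` shape described in the text, one option is to translate each number through `ref` first. Here is a sketch using the same sample data, with `dict.setdefault` replacing the if/else:

```python
dic = {'word1': [1, 2, 3], 'word2': [3, 2]}
ref = {1: 'pronoun', 2: 'I', 3: 'you'}

out = {}
for word, entries in dic.items():
    for entry in entries:
        # Look up the category name so out is keyed by 'pronoun', 'I', ...
        out.setdefault(ref[entry], []).append(word)

print(out)
```

`collections.defaultdict(list)` would work just as well here. `setdefault` simply avoids an extra import.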

Alternatively, you could replace the numbers in document with the corresponding entries from document1.
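The following is a minimal sketch of this alternative. It assumes each line of the words file is a word followed by space-separated codes and each line of the categories file has the form `code: name`. `io.StringIO` stands in for the real files, and the sample contents are hypothetical.

```python
import io

# Hypothetical stand-ins for LIWC_words.txt and LIWC_categories.txt.
words_file = io.StringIO("apple 01 02\nbanana 03\n\n")
categories_file = io.StringIO("01: pronoun\n02: I\n03: you\n")

# Build the code -> name lookup, stripping spaces around ':'.
categories = {}
for line in categories_file:
    line = line.strip()
    if line:
        key, value = line.split(':', 1)
        categories[key.strip()] = value.strip()

# Rewrite each word line with category names instead of codes.
replaced = []
for line in words_file:
    parts = line.split()
    if not parts:
        continue  # skip empty lines
    word, codes = parts[0], parts[1:]
    replaced.append(' '.join([word] + [categories[c] for c in codes]))

print('\n'.join(replaced))
```

Keeping the codes as strings on both sides (no `int()` conversion) means `'01'` in one file matches `'01'` in the other exactly.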


Otherwise, have you considered building a database?
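If you go the database route, Python's built-in `sqlite3` module is one option. The sketch below uses two tables whose names and sample rows are hypothetical, mirroring the two LIWC files. A SQL `JOIN` then answers "which words are in category X?" directly.

```python
import sqlite3

# In-memory database; a file path such as 'liwc.db' would persist it.
conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE categories (code INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE word_categories (word TEXT, code INTEGER);
""")

# Hypothetical sample rows in the shape of the two LIWC files.
conn.executemany("INSERT INTO categories VALUES (?, ?)",
                 [(1, 'pronoun'), (2, 'I'), (3, 'you')])
conn.executemany("INSERT INTO word_categories VALUES (?, ?)",
                 [('word1', 1), ('word1', 2), ('word1', 3),
                  ('word2', 3), ('word2', 2)])

# Join the two tables: which words belong to the 'you' category?
rows = conn.execute("""
    SELECT w.word FROM word_categories AS w
    JOIN categories AS c ON c.code = w.code
    WHERE c.name = ?
    ORDER BY w.word
""", ('you',)).fetchall()
print(rows)
conn.close()
```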
