<p>This becomes easier if you can first read the whole CSV file into a single DataFrame with pandas.</p>
<pre><code>import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder
df = pd.read_csv('sample.csv', index_col=None, skipinitialspace=True)
# Converting the text Term to column index
le = LabelEncoder()
df['column']=le.fit_transform(df['Term'])
# Converting the Doc to row index
df['row']=df['Doc'] - 1
# Rows will be equal to max index of document
num_rows = max(df['row'])+1
# Columns will be equal to number of distinct terms
num_cols = len(le.classes_)
# Initialize the array with all zeroes
tfidf_arr = np.zeros((num_rows, num_cols))
# Iterate the dataframe and set the appropriate values in tfidf_arr
for index, row in df.iterrows():
    tfidf_arr[row['row'], row['column']] = row['TFIDF score']
</code></pre>
<p>Be sure to read the code comments carefully, and ask if anything is unclear.</p>
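<p>For larger files the row-by-row loop can be slow. The following is a minimal sketch of a vectorized variant of the same idea, assuming the same <code>Doc</code>, <code>Term</code> and <code>TFIDF score</code> columns and 1-based document numbers. It swaps <code>LabelEncoder</code> for <code>pd.factorize(..., sort=True)</code>, which also assigns column indices in sorted term order. The small in-memory DataFrame and the helper name <code>to_dense_tfidf</code> are hypothetical stand-ins for illustration, not part of the original data.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the contents of sample.csv
df = pd.DataFrame({
    "Doc": [1, 1, 2, 3],
    "Term": ["apple", "banana", "apple", "cherry"],
    "TFIDF score": [0.5, 0.2, 0.7, 0.9],
})


def to_dense_tfidf(frame):
    """Build a dense (documents x terms) array from long-format rows."""
    # Map each distinct term to a column index, in sorted term order
    col_idx, terms = pd.factorize(frame["Term"], sort=True)
    # Documents are assumed to be numbered from 1, so shift to 0-based rows
    row_idx = frame["Doc"].to_numpy(dtype=int) - 1
    # Start from all zeros, then fill every (row, column) pair in one step
    arr = np.zeros((row_idx.max() + 1, len(terms)))
    arr[row_idx, col_idx] = frame["TFIDF score"].to_numpy()
    return arr, list(terms)


tfidf_arr, terms = to_dense_tfidf(df)
```

<p>If the same (Doc, Term) pair appears more than once, this assignment keeps only one of the values, so it assumes each pair occurs at most once in the file.</p>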