How do I remove NaN values from a CSV file? (Python)

import pandas as pd

# Reading the file
path_root = 'gdrive/My Drive/Colab Notebooks/MBTI/'
root_fn = path_root + 'mbti_datasets.csv'
df = pd.read_csv(root_fn, sep=',', quotechar='"', usecols=[0, 1])

# split the column where there are new lines and turn it into a series
serie = df['description'].str.split('\n').apply(pd.Series).stack()

# remove the second index level so the DataFrame and the series share indexes
serie.index = serie.index.droplevel(1)

# give it a name to join it to the DataFrame
serie.name = 'description'

# remove the original column
del df['description']

# join the series with the DataFrame, based on the shared index
df = df.join(serie)

# New file name and writing the new csv file
root_new_fn = path_root + 'mbti_new.csv'
df.to_csv(root_new_fn, sep=',', quotechar='"', encoding='utf-8', index=False)

new_df = pd.read_csv(root_new_fn)
print(new_df)

2 INTJ Existe soledad en la cima y-- siendo  # adds -- in random blank spaces
3 INTJ -- y las mujeres  # adds -- at the beginning
3 INTJ (...) el 0--8-- de la poblaci  # doesn't finish the word 'población'
10 INTJ icos-- un conflicto que parecer--a imposible.  # starts in the middle of a word
12 INTJ c  # adds just 1 letter

2 INTJ There is loneliness at the top and-- being  # adds -- in random blank spaces
3 INTJ -- and women  # adds -- at the beginning
3 INTJ (...) on 0--8-- of the popula--  # doesn't finish the word 'population'
10 INTJ icos-- a conflict that seems--to impossible.  # starts in the middle of a word
12 INTJ c  # adds just 1 letter

print(new_df['descripcion'].isnull())

<class 'float'>
0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     True
8     False
9     True
10    False
11    True
continue...

2 Answers

User

#1 · Edited 2024-05-14 10:38:46

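The problem comes from the description cells: some of them contain two consecutive newlines with nothing in between. Splitting on '\n' turns that gap into an empty string, and an empty field in a CSV is read back as NaN.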
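To make the cause concrete, here is a minimal sketch of the round trip. The sample rows and the `type` column name are made up for illustration and are not taken from the original dataset; an in-memory buffer stands in for the real file.

```python
import io
import pandas as pd

# Hypothetical sample: the second description has two consecutive newlines.
df = pd.DataFrame({
    "type": ["INTJ", "INTP"],
    "description": ["first line\nsecond line", "before\n\nafter"],
})

# One row per line of text; the gap between the two newlines
# becomes an empty string.
exploded = (
    df.assign(description=df["description"].str.split("\n"))
      .explode("description")
      .reset_index(drop=True)
)
print(exploded["description"].tolist())

# Write to CSV and read it back: the empty field comes back as NaN.
buffer = io.StringIO()
exploded.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(round_trip["description"].isnull().tolist())
```

In this sketch only the row produced by the empty gap is flagged as null after the file is read back.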

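I simply read the newly created CSV, called .dropna() on it, and wrote it back without the NaN values. I don't think writing and reading the file twice is the best approach, but it works as a straightforward solution.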

df.to_csv(root_new_fn, sep = ',', quotechar = '"', encoding = 'utf-8', index = False)
new_df = pd.read_csv(root_new_fn).dropna()

new_df.to_csv(root_new_fn, sep = ',', quotechar = '"', encoding = 'utf-8', index = False)
new_df = pd.read_csv(root_new_fn)

print(type(new_df.iloc[7, 1]))  # position that previously held a NaN value
print(new_df['descripcion'].isnull())

<class 'str'>
0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
and continues...
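If the extra write/read cycle is a concern, one possible alternative is to drop the empty pieces in memory before writing the file once. This is only a sketch under assumptions: the sample data and the `type` column name are invented, and it uses `explode` rather than the `apply(pd.Series).stack()` approach from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the real file.
df = pd.DataFrame({
    "type": ["INTJ", "INTP"],
    "description": ["first line\nsecond line", "before\n\nafter"],
})

# One row per line of text.
df = (
    df.assign(description=df["description"].str.split("\n"))
      .explode("description")
      .reset_index(drop=True)
)

# Strip surrounding whitespace, then keep only rows that are
# neither missing nor empty.
df["description"] = df["description"].str.strip()
clean = df[df["description"].notna() & df["description"].ne("")]
clean = clean.reset_index(drop=True)
print(clean)
```

The resulting `clean` frame can then be written with a single `to_csv` call, so no NaN values appear when the file is read later.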

User

#2 · Edited 2024-05-14 10:38:46

Here is one approach. I had to find a workaround for replacing the \n character, because for some reason replacing it directly did not work:

df['DESCRIPTION'] = df['DESCRIPTION'].str.replace(r'[^a-zA-Z0-9\s.]', ' ', regex=True).str.split(' n')

df = df.explode('DESCRIPTION')

print(df)

           TYPE                               DESCRIPTION
0   a             This personality likes to eat apples...
0   a                           They look like monkeys...
0   a                      In fact  are strong people...
1   b                                       b.description
2   c                                       c.description
3   d                                       d.description
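One possible reason the direct replacement failed is that the text contains the two-character sequence backslash + n rather than a real newline; that is an assumption, not something the answer confirms. Under that assumption, a sketch that splits on either form avoids splitting on ' n', which would also cut words that start with "n". The sample rows below are invented for illustration.

```python
import pandas as pd

# Hypothetical data: row 0 uses a literal backslash followed by "n"
# as the separator (note the raw string).
df = pd.DataFrame({
    "TYPE": ["a", "b"],
    "DESCRIPTION": [r"likes apples\nnot new things", "b.description"],
})

# Split on a literal backslash-n or on a real newline character.
df["DESCRIPTION"] = df["DESCRIPTION"].str.split(r"\\n|\n", regex=True)

# One row per piece, repeating the TYPE value.
df = df.explode("DESCRIPTION").reset_index(drop=True)
print(df)
```

Here the word "new" stays intact because only the separator itself is matched.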
