Pandas read_sql_query returns None for all values in some columns

from sqlalchemy import create_engine
import pandas as pd
import math

querystr = "SELECT * FROM dbname.mytable"
engine = create_engine('mysql+pymysql://username:password@localhost/' + "dbname")
df = pd.read_sql_query(querystr, engine)
df.head()

     sys    dias   pef   fer
0    NaN     NaN  None  None
1  159.0  92.666  None  None
2    NaN     NaN  None  None
3    NaN     NaN  None  None
4  102.0  63.333  None  None

2 Answers

User

Answer 1 · edited 2024-05-29 11:17:27

This is an open issue, explained here: https://github.com/pandas-dev/pandas/issues/14314

read_sql_query just gets result sets back, without any column type information. If you use the read_sql_table functions, there it uses the column type information through SQLAlchemy.

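read_sql_query seems to check only the first 3 values returned in a column to determine the column's type. So if the first 3 values are NULL, it cannot determine the type and returns None.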

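So a partial workaround is to use read_sql_table. I changed my code to use read_sql_table, and it returns NaN values as expected, even for all-NULL columns. But in my real application I need to use read_sql_query. So I now replace any None values with NaN as soon as the results are returned: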

import numpy as np
df.replace([None], np.nan, inplace=True)
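To try the replacement without a database, here is a minimal sketch that builds a small frame by hand. The frame is a hypothetical stand-in for the query result in the question (same column names, made-up rows), and `remaining_none` is a helper name introduced only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the query result: "pef" and "fer" came back as
# all-NULL columns, so they are object columns that hold None.
df = pd.DataFrame(
    {
        "sys": [np.nan, 159.0, np.nan],
        "dias": [np.nan, 92.666, np.nan],
        "pef": pd.Series([None, None, None], dtype=object),
        "fer": pd.Series([None, None, None], dtype=object),
    }
)

# Replace every None with NaN, as in the answer above.
df.replace([None], np.nan, inplace=True)

# Count the None objects that are still present in each column.
remaining_none = {
    col: sum(v is None for v in df[col].tolist()) for col in df.columns
}
print(remaining_none)
print(df.dtypes)
```

The dtype that the replaced columns end up with may differ between pandas versions, so it is worth printing `df.dtypes` after the call rather than assuming a result.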

User

Answer 2 · edited 2024-05-29 11:17:27

I tried using read_sql_table, but it did not solve my problem. I also found that the accepted answer actually creates other problems.

For my data, only the columns that pandas treats as object have None instead of NaN. For datetime columns, missing values are NaT; for float columns, missing values are NaN.
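One way to check this on your own frame is to count the literal None objects per column. The sketch below uses a hand-built frame with made-up values; the column names "seen" and "note" and the name `none_counts` are assumptions for illustration, not part of the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one float, one datetime and one object column.
df = pd.DataFrame(
    {
        "sys": [np.nan, 159.0, np.nan],                       # missing -> NaN
        "seen": pd.to_datetime(["2024-01-01", None, None]),   # missing -> NaT
        "note": pd.Series([None, "ok", None], dtype=object),  # missing -> None
    }
)

# Count values that are the Python None object, column by column.
none_counts = {
    col: sum(v is None for v in df[col].tolist()) for col in df.columns
}
print(df.dtypes)
print(none_counts)
```

Only the object column should report a non-zero count here, since NaN and NaT are distinct objects from None.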

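read_sql_table did not work for me and gave the same problem as read_sql. So I tried the accepted answer and ran df.replace([None], np.nan, inplace=True). This actually changed all of my datetime columns with missing data to object dtype. So now I would have to convert them back to datetime, which can be costly depending on the size of the data.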

Instead, I suggest that you first identify the object dtype columns in df and then replace None only in those:

import numpy as np
obj_columns = list(df.select_dtypes(include=['object']).columns.values)
df[obj_columns] = df[obj_columns].replace([None], np.nan)
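As a rough check that this leaves the other columns alone, the sketch below applies the same two lines to a hand-built frame. The data and the column names "seen" and "note" are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: a datetime column with NaT, a float column with NaN,
# and an object column that holds None.
df = pd.DataFrame(
    {
        "seen": pd.to_datetime(["2024-01-01", None, "2024-01-03"]),
        "sys": [np.nan, 159.0, 102.0],
        "note": pd.Series([None, "ok", None], dtype=object),
    }
)

# Restrict the replacement to object dtype columns only.
obj_columns = list(df.select_dtypes(include=["object"]).columns.values)
df[obj_columns] = df[obj_columns].replace([None], np.nan)

print(obj_columns)
print(df.dtypes)
```

Because the datetime and float columns are never passed to replace, their dtypes are not touched, and no conversion back to datetime is needed afterwards.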
