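Why am I getting negative temperature values?
I ran into a problem while writing my code. I want to get the average temperature for the city of Kiev, but I get negative temperatures for every season and I don't know why. The Fahrenheit-to-Celsius conversion itself works fine.
Here is my code: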
# Selecting temperature data from Kiev
kiev_df = df[df["City"] == "Kiev"].copy()
# Converting fahr to celsius
kiev_df = fahr_to_celsius(kiev_df)
# Converting "dt" column to datetime format
kiev_df.loc[:, "dt"] = pd.to_datetime(kiev_df["dt"], format="%Y%m%d")
def get_season(month):
    """
    Dividing months into seasons.
    Parameters: month (int): Month number (1 for January, 2 for February, etc.)
    Returns:
    str: The season corresponding to the input month.
    """
    if month in [12, 1, 2]:
        return "Winter"
    elif month in [3, 4, 5]:
        return "Spring"
    elif month in [6, 7, 8]:
        return "Summer"
    else:
        return "Autumn"
# Map get_season function to the month of each date in "dt" column
kiev_df.loc[:, "Season"] = kiev_df["dt"].dt.month.map(get_season)
# Group data by year and season, calculate the mean of temp data
seasonal_avg = kiev_df.groupby([kiev_df["dt"].dt.year, "Season"]).agg({
    "AverageTemperature": "mean",
    "Tuncertainty": "mean"
}).reset_index()
seasonal_avg.columns = ["Year", "Season", "AvgTemperature", "AvgUncertainty"]
print(seasonal_avg)
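I tried to compute the average temperature for each season, but every value comes out negative, which is clearly wrong.
Here is the conversion code:
# Creating a function to convert Fahrenheit to Celsius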
def fahr_to_celsius(df):
    """
    Converts Fahrenheit temperature to Celsius (excluding the "Tuncertainty" column)
    Parameters:
    - df (DataFrame): DataFrame containing temperature data.
    Returns:
    - DataFrame: A modified DataFrame with temperature columns converted from Fahrenheit to Celsius values.
    """
    # Converting these columns from fahr to celsius
    celsius_conv = ["TMAX", "TMIN", "AverageTemperature"]
    for col in celsius_conv:
        df[col] = (df[col] - 32) / 1.8
    return df
Here is a sample of the data:
        dt        AverageTemperature  Tuncertainty  City  Country  TMAX         TMIN
114929  17440401  49.676              4.4964        Kiev  Ukraine  54.1724      45.1796
114930  17440501  55.6556             3.321         Kiev  Ukraine  58.9766      52.3346
114931  17440601  63.3074             3.0654        Kiev  Ukraine  66.3728      60.242
114932  17440701  66.9002             2.8656        Kiev  Ukraine  69.7658      64.0346
114933  17440801  -9999               2.9466        Kiev  Ukraine  68.28430418  63.60738403
1 Answer
Here is an example similar to your data:
s = pd.Series([49.0, 55.5, 63.0, 66.9, -999])
What do you think the mean() of data like s would be? Negative, of course: the magnitude of -999 is far larger than that of the other values, so it dominates the average.
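In any ETL (extract, transform, load) task you need to clean the data before you can extract meaningful information from it. Inspect your data by hand, decide which values should not be counted, and filter them out before running any statistics.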
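As a rough illustration of that inspection step, the sketch below looks for implausible readings in a small frame. The frame is a hypothetical stand-in built from the rows shown in the question, and `PLAUSIBLE_MIN_F` is an assumed threshold chosen for this example, not something from the original data set.

```python
import pandas as pd

# Hypothetical sample mirroring the question's rows; -9999 is the
# placeholder visible in the last row of the question's sample.
sample = pd.DataFrame({
    "dt": [17440401, 17440501, 17440601, 17440701, 17440801],
    "AverageTemperature": [49.676, 55.6556, 63.3074, 66.9002, -9999.0],
})

# The minimum quickly exposes values far outside any realistic range.
lowest = sample["AverageTemperature"].min()

# Assumed plausibility bound in Fahrenheit, well below any air
# temperature one would expect in a city record.
PLAUSIBLE_MIN_F = -150.0
suspect = sample[sample["AverageTemperature"] < PLAUSIBLE_MIN_F]

print(lowest)
print(suspect)
```

Running the same check on TMAX and TMIN would show whether those columns carry the placeholder too.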
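Do you think -999 is a plausible temperature for Ukraine, in either Fahrenheit or Celsius? Assuming the laws of thermodynamics still hold, I think it is safe to filter those values out.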
>>> s = pd.Series([49.0, 55.5, 63.0, 66.9, -999])
>>> s.mean()
-152.92000000000002
>>> s[s > -999].mean()
58.6
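Applied to the question's pipeline, the same idea might look like the sketch below. It assumes the placeholder is exactly -9999, as in the sample row of the question, and masks it before the unit conversion: once (x - 32) / 1.8 has been applied, the placeholder is no longer equal to -9999 and is harder to spot. `fahr_to_celsius_clean` and `SENTINEL` are hypothetical names introduced here, not part of the original code.

```python
import numpy as np
import pandas as pd

SENTINEL = -9999  # assumption: the placeholder seen in the question's sample row

def fahr_to_celsius_clean(df, cols=("AverageTemperature",)):
    """Hypothetical variant of fahr_to_celsius: mask the sentinel, then convert."""
    out = df.copy()
    for col in cols:
        # Turn placeholders into NaN so that mean() skips them.
        out[col] = out[col].replace(SENTINEL, np.nan)
        out[col] = (out[col] - 32) / 1.8
    return out

# Hypothetical sample built from the AverageTemperature values in the question.
sample = pd.DataFrame(
    {"AverageTemperature": [49.676, 55.6556, 63.3074, 66.9002, -9999.0]}
)

# Converting without cleaning drags the mean far below zero.
naive_mean = ((sample["AverageTemperature"] - 32) / 1.8).mean()

# Masking first gives a mean based only on the four real readings.
clean_mean = fahr_to_celsius_clean(sample)["AverageTemperature"].mean()

print(naive_mean, clean_mean)
```

Using NaN instead of dropping rows keeps the other columns of each row available, and pandas aggregations such as mean() skip NaN by default.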