Why am I getting negative temperature values?

-1 votes
1 answer
74 views
Asked 2025-04-14 16:15

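I ran into a problem while writing my code. I want to compute the average temperature for the city of Kiev, but I get negative values for every season and I don't know why. The Fahrenheit-to-Celsius conversion itself works correctly.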

Here is my code:

# Selecting temperature data from Kiev

kiev_df = df[df["City"] == "Kiev"].copy()

# Converting fahr to celsius
kiev_df = fahr_to_celsius(kiev_df)

# Converting "dt" column to datetime format
kiev_df["dt"] = pd.to_datetime(kiev_df["dt"], format="%Y%m%d")

def get_season(month):
    """
    
    Dividing months into seasons.
    
    Parameters: month (int): Month number (1 for January, 2 for February, etc.)
    
    Returns: 
    str: The season corresponding to the input month.
    
    """
    if month in [12, 1, 2]:
        return "Winter"
    elif month in [3, 4, 5]:
        return "Spring"
    elif month in [6, 7, 8]:
        return "Summer"
    else: 
        return "Autumn"

# Map get_season function to the month of each date in "dt" column
kiev_df.loc[:, "Season"] = kiev_df["dt"].dt.month.map(get_season)

# Group data by year and season, calculate the mean of temp data
seasonal_avg = kiev_df.groupby([kiev_df["dt"].dt.year, "Season"]).agg({
    "AverageTemperature": "mean",
    "Tuncertainty": "mean"
}).reset_index()

seasonal_avg.columns = ["Year", "Season", "AvgTemperature", "AvgUncertainty"]

print(seasonal_avg)

I tried to calculate the average temperature for each season, but every result is negative, which is clearly wrong.

Here is the conversion code:

# Creating a function for converting Fahrenheit to Celsius

def fahr_to_celsius(df):
    """
    Converts Fahrenheit temperatures to Celsius (excluding the "Tuncertainty" column).

    Parameters:
    - df (DataFrame): DataFrame containing temperature data.

    Returns:
    - DataFrame: A modified DataFrame with temperature columns converted from Fahrenheit to Celsius values.
    """

    # Converting these columns from fahr to celsius
    celsius_conv = ["TMAX", "TMIN", "AverageTemperature"]

    for col in celsius_conv:
        df[col] = (df[col] - 32) / 1.8

    return df

Here is a sample of the data:

        dt        AverageTemperature  Tuncertainty  City  Country  TMAX         TMIN
114929  17440401  49.676              4.4964        Kiev  Ukraine  54.1724      45.1796
114930  17440501  55.6556             3.321         Kiev  Ukraine  58.9766      52.3346
114931  17440601  63.3074             3.0654        Kiev  Ukraine  66.3728      60.242
114932  17440701  66.9002             2.8656        Kiev  Ukraine  69.7658      64.0346
114933  17440801  -9999               2.9466        Kiev  Ukraine  68.28430418  63.60738403

1 Answer

2

Here is an example similar to your data:

s = pd.Series([49.0, 55.5, 63.0, 66.9, -999])

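What do you think s.mean() (the average) of data like this will be? Negative, of course: the absolute value of -999 is far larger than the other values, so it dominates the mean.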

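In any ETL (extract, transform, load) task, you need to clean the data before you can extract meaningful information from it. Inspect your data manually, decide which values should not be considered, and filter them out before running any statistical analysis.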
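One way to do that inspection is to look at the minimum of each temperature column and count the rows that fall below some sanity threshold. The following is only a sketch: the rows are the five sample rows from the question, and the name `SUSPECT_BELOW_F` and its value of -100 °F are assumptions chosen for illustration, not anything defined by the dataset.

```python
import pandas as pd

# The five sample rows from the question (temperatures in Fahrenheit).
sample = pd.DataFrame({
    "dt": [17440401, 17440501, 17440601, 17440701, 17440801],
    "AverageTemperature": [49.676, 55.6556, 63.3074, 66.9002, -9999.0],
    "TMAX": [54.1724, 58.9766, 66.3728, 69.7658, 68.28430418],
    "TMIN": [45.1796, 52.3346, 60.242, 64.0346, 63.60738403],
})

temp_cols = ["AverageTemperature", "TMAX", "TMIN"]

# Hypothetical sanity threshold: anything below this is treated as suspect.
SUSPECT_BELOW_F = -100

# The per-column minimum exposes placeholder values such as -9999 immediately.
col_min = sample[temp_cols].min()

# Count how many rows in each column fall below the threshold.
suspect_counts = (sample[temp_cols] < SUSPECT_BELOW_F).sum()

print(col_min)
print(suspect_counts)
```

If one column has a minimum that is wildly out of line with the others, as AverageTemperature does here, that value is most likely a missing-data marker rather than a measurement.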

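Do you think -999 is a plausible temperature in Ukraine, in either Fahrenheit or Celsius? It is below absolute zero (-459.67 °F, or -273.15 °C), so assuming the laws of thermodynamics still hold, I think these values can safely be filtered out.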
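The physical argument can be turned into a generic check: anything at or below absolute zero cannot be a real reading. A minimal sketch, assuming the values are in Fahrenheit (the constant name `ABSOLUTE_ZERO_F` is mine):

```python
import pandas as pd

ABSOLUTE_ZERO_F = -459.67  # absolute zero expressed in degrees Fahrenheit

s = pd.Series([49.0, 55.5, 63.0, 66.9, -999])

# Series.where keeps values where the condition holds and sets the rest to NaN,
# so the Series keeps its original length and index.
plausible = s.where(s > ABSOLUTE_ZERO_F)

# mean() skips NaN by default.
plausible_mean = plausible.mean()
print(plausible_mean)
```

Turning impossible values into NaN instead of dropping the rows keeps the other columns of a DataFrame row usable, which may matter when TMAX and TMIN are valid for the same date.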

>>> s = pd.Series([49.0, 55.5, 63.0, 66.9, -999])
>>> s.mean()
-152.92000000000002
>>> s[s > -999].mean()
58.6
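Applied to the question's pipeline, the cleaning step has to happen before the conversion, because (-9999 - 32) / 1.8 is still a huge negative number. The sketch below uses only the five sample rows from the question and assumes -9999 is the missing-value marker; the dictionary `season_by_month` is a stand-in for the question's get_season function.

```python
import numpy as np
import pandas as pd

# The five sample rows from the question (temperatures in Fahrenheit).
kiev_df = pd.DataFrame({
    "dt": [17440401, 17440501, 17440601, 17440701, 17440801],
    "AverageTemperature": [49.676, 55.6556, 63.3074, 66.9002, -9999.0],
    "Tuncertainty": [4.4964, 3.321, 3.0654, 2.8656, 2.9466],
})

# 1. Mask the assumed sentinel first, so it never reaches the conversion.
kiev_df["AverageTemperature"] = kiev_df["AverageTemperature"].replace(-9999, np.nan)

# 2. Convert Fahrenheit to Celsius; NaN stays NaN.
kiev_df["AverageTemperature"] = (kiev_df["AverageTemperature"] - 32) / 1.8

# 3. Parse the integer dates (YYYYMMDD) into datetimes.
kiev_df["dt"] = pd.to_datetime(kiev_df["dt"].astype(str), format="%Y%m%d")

# 4. Month-to-season lookup (stand-in for get_season).
season_by_month = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}
kiev_df["Season"] = kiev_df["dt"].dt.month.map(season_by_month)
kiev_df["Year"] = kiev_df["dt"].dt.year

# 5. Seasonal mean; NaN rows are ignored by mean().
seasonal_avg = kiev_df.groupby(["Year", "Season"], as_index=False)["AverageTemperature"].mean()
print(seasonal_avg)
```

With the marker masked, both seasons present in the sample (Spring and Summer of 1744) come out positive in Celsius.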
