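I'm exploring a dataset, but the calculations need to be restricted to a smaller, specified time range. Is the code below correct?
I've just started learning Python.
Here is my code: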
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
url = "https://storage.googleapis.com/courses_data/Assignment%20CSV/finance_liquor_sales.csv"
df = pd.read_csv(url)
print("Missing Data: \n", df.isna().sum())
df.dropna(inplace=True)
time_period = pd.date_range(start="2016-01-01", end="2019-12-31")
print(df[df["date"].isin(time_period)])
while df[df["date"]] in time_period:
popular_item = df.groupby("zip_code")["bottles_sold"].sum().sort_values(ascending=False)
print(popular_item)
popular_item = plt.scatter(df["zip_code"], df["bottles_sold"])
plt.title("Bottles Sold per region in 2016-2019")
plt.xlabel("Zip Code")
plt.ylabel("Bottles Sold")
plt.show()
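I want to show the number of bottles sold per zip code between 2016 and 2019, so I tried writing this code
time_period = pd.date_range(start="2016-01-01", end="2019-12-31")
print(df[df["date"].isin(time_period)])
while df[df["date"]] in time_period:
to select that time range from my data, so that the calculations are based only on that period.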
1 Answer
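You can modify your code as shown below. The key step is converting the `date` column to datetime with `pd.to_datetime` before comparing, since `read_csv` loads it as text unless told otherwise. No `while` loop is needed: a boolean mask selects the rows in range, whereas `df[df["date"]]` would look up the date values as column names rather than filter rows.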
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
# Load the data
url = "https://storage.googleapis.com/courses_data/Assignment%20CSV/finance_liquor_sales.csv"
df = pd.read_csv(url)
# Check for missing data
print("Missing Data: \n", df.isna().sum())
# Drop rows with missing values
df.dropna(inplace=True)
# Convert 'date' column to datetime type
df['date'] = pd.to_datetime(df['date'])
# Filter data for the specified time period
start_date = "2016-01-01"
end_date = "2019-12-31"
filtered_df = df[(df['date'] >= start_date) & (df['date'] <= end_date)]
# Calculate total bottles sold per zip code
bottles_sold_per_zip = filtered_df.groupby("zip_code")["bottles_sold"].sum().sort_values(ascending=False)
# Plotting
plt.figure(figsize=(10, 6))
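# Pass zip codes as strings so they are treated as categories and the bars are evenly spaced
plt.bar(bottles_sold_per_zip.index.astype(str), bottles_sold_per_zip.values)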
plt.title("Total Bottles Sold per Zip Code (2016-2019)")
plt.xlabel("Zip Code")
plt.ylabel("Total Bottles Sold")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
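If you want to check the filtering logic without downloading the CSV, here is a minimal sketch on a few made-up rows. The `sample` DataFrame and its values are hypothetical stand-ins for the real data, and `Series.between` is used here as an alternative to the two comparisons above, not as the only way to do it.

```python
import pandas as pd

# Hypothetical sample rows standing in for the liquor sales CSV
sample = pd.DataFrame({
    "date": ["2015-06-30", "2016-01-01", "2017-08-15", "2019-12-31", "2020-01-01"],
    "zip_code": [50314, 50314, 52402, 50314, 52402],
    "bottles_sold": [5, 12, 7, 3, 9],
})

# Dates start out as strings here, so parse them first
sample["date"] = pd.to_datetime(sample["date"])

# Series.between includes both endpoints by default
in_range = sample["date"].between("2016-01-01", "2019-12-31")

# Sum bottles per zip code using only the rows inside the range
totals = (
    sample.loc[in_range]
    .groupby("zip_code")["bottles_sold"]
    .sum()
    .sort_values(ascending=False)
)
print(totals)
```

One thing to keep in mind: an end date of "2019-12-31" is interpreted as midnight at the start of that day, so if your real `date` values carry a time of day, rows later on December 31 would be left out. In that case comparing with `< "2020-01-01"` is a safer upper bound.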