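<p>I don't think this particular dataset is well suited to the long data format that plotly.express prefers, especially because of the many missing observations for <code>Province/State</code>. And since your aim is to</p>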
<blockquote>
<p>plot the COVID-19 evolution as lines for all countries, day by day</p>
</blockquote>
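<p>…there is no need for <code>Province/State</code>, <code>Lat</code> or <code>Long</code>. So I would simply sum the data for each country and use one <code>go.Scatter</code> trace per country. And no, it won't get too messy: because we're using the power of plotly here, you can easily select individual traces or zoom in on different parts of the chart. Either way, I hope the setup is to your liking. Don't hesitate to let me know if you need anything else.</p>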
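<p>As a minimal sketch of the per-country aggregation, here is the same idea on a made-up three-row table. The column names mirror the dataset, but the numbers and the two date columns are invented for illustration only:</p>

```python
import pandas as pd

# Toy data in the same wide layout as the dataset; all values are invented.
toy = pd.DataFrame({
    'Province/State': ['Hubei', 'Beijing', None],
    'Country/Region': ['China', 'China', 'Italy'],
    'Lat': [30.97, 40.18, 43.0],
    'Long': [112.27, 116.41, 12.0],
    '1/22/20': [444, 14, 0],
    '1/23/20': [444, 22, 0],
})

# Drop the columns that are not needed, then sum the provinces per country.
per_country = (
    toy.drop(['Province/State', 'Lat', 'Long'], axis=1)
       .groupby('Country/Region')
       .sum()
)
print(per_country)
```

<p>Each row of <code>per_country</code> is then one country's time series, which is what ends up as a single <code>go.Scatter</code> trace.</p>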
<p><strong>Plot:</strong></p>
<p><a href="https://i.stack.imgur.com/NN0dO.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/NN0dO.png" alt="enter image description here"/></a></p>
<p><strong>Plot, zoomed:</strong></p>
<p><a href="https://i.stack.imgur.com/snjDl.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/snjDl.png" alt="enter image description here"/></a></p>
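<p><strong>Edit - Version 2:</strong> Development by days since first occurrence</p>
<p>One way to make the plot less messy is to measure the development for each region from the day of its first occurrence, like this:</p>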
<p><a href="https://i.stack.imgur.com/N52U7.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/N52U7.png" alt="enter image description here"/></a></p>
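<p>The alignment behind this plot can be sketched on a small invented example. The countries <code>A</code> and <code>B</code> and their counts are hypothetical, and the approach assumes zeros only occur at the start of a series, as they do for cumulative counts:</p>

```python
import numpy as np
import pandas as pd

# Toy cumulative counts, one column per country; all values are invented.
toy = pd.DataFrame(
    {'A': [1, 3, 7, 12], 'B': [0, 0, 2, 5]},
    index=pd.to_datetime(['2020-01-22', '2020-01-23',
                          '2020-01-24', '2020-01-25']),
)

# Turn zeros into NaN, then shift each column up by its NaN count so that
# every series starts at its first non-zero day.
aligned = toy.replace(0, np.nan)
aligned = aligned.apply(lambda s: s.shift(-s.isna().sum()))

# Replace the dates with a running count of days since the first case.
aligned = aligned.reset_index(drop=True)
print(aligned)
```

<p>Country <code>B</code> starts two days later than <code>A</code>, so after the shift its series begins at row 0 and its last two rows are left as NaN.</p>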
<p>To produce the first plot, just copy the data from the link and store it as <code>covid.csv</code> in a folder named <code>C:\data</code>.</p>
<p><strong>Complete code for the first plot:</strong></p>
<pre><code>import pandas as pd
import plotly.graph_objects as go

# read the wide-format data (one column per date)
dfi = pd.read_csv(r'C:\data\covid.csv', sep=",", header=0)
# drop province, latitude and longitude
df = dfi.drop(['Province/State', 'Lat', 'Long'], axis = 1)
# group by countries
df_gr = df.groupby('Country/Region').sum()
time = df_gr.columns.tolist()
df_gr.columns = pd.to_datetime(time)
df_gr.reset_index(inplace = True)
# transpose df to get dates as a row index
df = df_gr.T
# set first row as header
new_header = df.iloc[0] #grab the first row for the header
df = df[1:] #take the data less the header row
df.columns = new_header #set the header row as the df header
# order df columns descending by country with most cases
df_current = df.iloc[-1].to_frame().reset_index()
df_sort = df_current.sort_values(df_current.columns[-1], ascending=False)
order = df_sort['Country/Region'].tolist()
df = df[order]

# plotly setup
fig = go.Figure()
# add trace for each country
for col in df.columns:
    fig.add_trace(go.Scatter(x=df.index, y=df[col].values, name=col))

fig.show()
</code></pre>
<p><strong>Code for the last plot:</strong></p>
<p><em>This builds on the df from snippet 1:</em></p>
<pre><code>import numpy as np

# replace leading zeros with nans
df2 = df.replace({'0': np.nan, 0: np.nan})
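# shift each column up by its number of leading nans so that every
# region starts at its first recorded case; regions that started
# later end up with nans in their last rows
df2 = df2.apply(lambda x: x.shift(-x.isna().sum()))
# replace the date index with a running day count
df2 = df2.reset_index(drop=True)
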
fig2 = go.Figure()
# add trace for each country
for col in df2.columns:
    fig2.add_trace(go.Scatter(x=df2.index, y=df2[col].values, name=col))

fig2.update_layout(showlegend=True,
                   xaxis=dict(title='Days from first occurrence'))
fig2.show()
</code></pre>