<ul>
<li>This approach also uses <code>zfill</code></li>
<li>Cast <code>'time'</code> to a string, convert it to a datetime, and extract the hour component</li>
</ul>
<pre class="lang-py prettyprint-override"><code>df['hour'] = pd.to_datetime(df.time.astype('str').str.zfill(4), format='%H%M').dt.hour
# display(df)
  invoiceNo  time  invoiceValue  hour
0         A     6             2     0
1         B    12             3     0
2         C   356             5     3
3         D  2145             6    21
</code></pre>
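<p>To see what the one-liner does to each value, here is a minimal standard-library sketch of the same steps for a single integer. The helper name <code>hour_from_hhmm</code> is hypothetical and only used for illustration; the sample values are the ones from the <code>'time'</code> column above.</p>

```python
from datetime import datetime

# Sample values taken from the 'time' column shown above
raw_times = [6, 12, 356, 2145]

def hour_from_hhmm(value):
    """Hypothetical helper: pad to four digits (HHMM), parse, return the hour."""
    padded = str(value).zfill(4)  # 6 -> '0006', 356 -> '0356'
    return datetime.strptime(padded, "%H%M").hour

hours = [hour_from_hhmm(t) for t in raw_times]
print(hours)  # [0, 0, 3, 21]
```

<p>Padding with <code>zfill(4)</code> turns every value into a four-character <code>HHMM</code> string, so <code>6</code> is read as 00:06 rather than as an hour.</p>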
<h2>Reading from a CSV</h2>
<ul>
<li>Set the dtype of the <code>'time'</code> column when reading in the data, so that <code>.astype('str')</code> is not needed</li>
</ul>
<pre class="lang-py prettyprint-override"><code>df = pd.read_csv('test.csv', dtype={'time': str})
df['hour'] = pd.to_datetime(df.time.str.zfill(4), format='%H%M').dt.hour
</code></pre>
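<p>A self-contained variant that can be run without a file on disk, assuming the CSV has the same columns as the frame shown earlier. The in-memory text stands in for <code>test.csv</code> and is an assumption, not the original file.</p>

```python
import io

import pandas as pd

# Assumed stand-in for the contents of test.csv
csv_text = """invoiceNo,time,invoiceValue
A,6,2
B,12,3
C,356,5
D,2145,6
"""

# Read 'time' as a string so no .astype('str') is needed afterwards
df = pd.read_csv(io.StringIO(csv_text), dtype={"time": str})

# Pad to HHMM, parse as a time, and keep only the hour
df["hour"] = pd.to_datetime(df["time"].str.zfill(4), format="%H%M").dt.hour
print(df)
```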
<h2><code>timeit</code> test</h2>
<pre class="lang-py prettyprint-override"><code># 2M rows of data
df = pd.DataFrame({'time':[6,12,356,2145]})
dft = pd.concat([df] * 500000).reset_index(drop=True)
%%timeit
pd.to_datetime(dft.time.astype('str').str.zfill(4), format='%H%M').dt.hour
[out]:
1.51 s ± 23.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
pd.to_numeric(dft.time.astype(str).str.zfill(4).str[0:2])
[out]:
2.6 s ± 41.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
</code></pre>
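<p>The <code>%%timeit</code> cells above only run in IPython or Jupyter. As a rough sketch, the same comparison can be reproduced in a plain script with the standard <code>timeit</code> module. The function names <code>hour_via_datetime</code> and <code>hour_via_slice</code> are hypothetical, and the frame here is smaller (100,000 rows) so that it runs quickly; timings will differ by machine and pandas version.</p>

```python
import timeit

import pandas as pd

df = pd.DataFrame({"time": [6, 12, 356, 2145]})
# 100,000 rows; raise the multiplier to 500000 to match the 2M-row test above
dft = pd.concat([df] * 25_000).reset_index(drop=True)

def hour_via_datetime(s):
    # Pad to HHMM, parse as a datetime, extract the hour
    return pd.to_datetime(s.astype("str").str.zfill(4), format="%H%M").dt.hour

def hour_via_slice(s):
    # Pad to HHMM, take the first two characters, convert to a number
    return pd.to_numeric(s.astype(str).str.zfill(4).str[0:2])

# Both approaches should produce the same hours
same = hour_via_datetime(dft["time"]).tolist() == hour_via_slice(dft["time"]).tolist()
print("results match:", same)

for fn in (hour_via_datetime, hour_via_slice):
    # Best of 3 runs, one call per run
    best = min(timeit.repeat(lambda: fn(dft["time"]), number=1, repeat=3))
    print(f"{fn.__name__}: {best:.3f} s")
```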