<p>Use <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.astype.html#pandas.Series.astype" rel="noreferrer"><code>astype</code></a> to convert the dtype to <code>str</code>, slice the strings with the vectorised <a href="http://pandas.pydata.org/pandas-docs/stable/api.html#string-handling" rel="noreferrer"><code>str</code></a> accessor, and then convert back to the <code>int64</code> dtype:</p>
<pre><code>In [184]:
df['DATE'] = df['DATE'].astype(str).str[:-2].astype(np.int64)
df
Out[184]:
DATE
0 201107
1 201107
2 201107
3 201107
4 201107
5 201107
6 201107
7 201108
8 201108
9 201108
In [185]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 1 columns):
DATE 10 non-null int64
dtypes: int64(1)
memory usage: 160.0 bytes
</code></pre>
<p>Hmm...</p>
<p>It turns out there is a built-in method, <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.floordiv.html#pandas.Series.floordiv" rel="noreferrer"><code>floordiv</code></a>:</p>
<pre><code>In [191]:
df['DATE'].floordiv(100)
Out[191]:
0 201107
1 201107
2 201107
3 201107
4 201107
5 201107
6 201107
7 201108
8 201108
9 201108
Name: DATE, dtype: int64
</code></pre>
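<p>As a self-contained sketch of the two approaches side by side: the original input frame is not shown, so the sample values below are hypothetical and assume <code>DATE</code> holds non-negative integers in <code>YYYYMMDD</code> form.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: integers in YYYYMMDD form
# (an assumption; the original input is not shown in the answer).
df = pd.DataFrame({"DATE": [20110706, 20110707, 20110815]})

# Approach 1: convert to str, drop the last two characters, convert back to int64.
via_str = df["DATE"].astype(str).str[:-2].astype(np.int64)

# Approach 2: integer floor division by 100 drops the last two decimal digits.
via_floordiv = df["DATE"].floordiv(100)

print(via_floordiv.tolist())  # [201107, 201107, 201108]

# For non-negative integers both approaches give the same values and dtype.
assert via_str.equals(via_floordiv)
```

<p>Note that the two approaches only agree for non-negative values: floor division rounds towards negative infinity, so <code>-20110706 // 100</code> gives <code>-201108</code>, whereas string slicing would give <code>-201107</code>.</p>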
<p><strong>Update</strong></p>
<p>On a 1000-row df, the <code>floordiv</code> method is considerably faster:</p>
<pre><code>%timeit df['DATE'].astype(str).str[:-2].astype(np.int64)
100 loops, best of 3: 2.92 ms per loop
%timeit df['DATE'].floordiv(100)
1000 loops, best of 3: 203 µs per loop
</code></pre>
<p>Here we observe a speed-up of more than 10x (roughly 14x, going by the timings above).</p>
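<p>To reproduce the comparison outside IPython, something along these lines should work. It is only a rough sketch using the standard-library <code>timeit</code> module; the 1000-row frame is filled with a made-up constant value, and the absolute numbers will vary with the machine and the pandas version.</p>

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical 1000-row frame; the constant value is made up for illustration.
df = pd.DataFrame({"DATE": np.full(1000, 20110706, dtype=np.int64)})

n_calls = 100  # number of repetitions per measurement

# Time the string-slicing round trip.
t_str = timeit.timeit(
    lambda: df["DATE"].astype(str).str[:-2].astype(np.int64),
    number=n_calls,
)

# Time the integer floor division.
t_div = timeit.timeit(lambda: df["DATE"].floordiv(100), number=n_calls)

print(f"str slicing: {t_str / n_calls * 1e6:.0f} µs per call")
print(f"floordiv:    {t_div / n_calls * 1e6:.0f} µs per call")
```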