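<p><code>pd.to_datetime</code> can be an order of magnitude slower when you let it try to infer the format. With mixed formats, you can instead parse the column several times, once per explicit format, and merge the results:</p>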
<pre><code>import pandas as pd
from functools import reduce
# dd-mm-yy dd-mm-YYYY YYYY-mm-dd
df = pd.DataFrame({'date': ['12-01-01', '12-01-2001', '2001-07-05',
'Jan 19', 'January 2019', '1 January 2019']})
</code></pre>
<h3>Code:</h3>
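<pre><code>formats = ['%d-%m-%y', '%d-%m-%Y', '%Y-%m-%d']  # the three formats listed in the comment above

# Parse once per format; combine_first keeps the first successful parse for each row
reduce(lambda l, r: l.combine_first(r),
       [pd.to_datetime(df.date, format=fmt, errors='coerce') for fmt in formats])
</code></pre>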
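<p>A variant of the same idea, as a minimal sketch: the helper <code>parse_with_formats</code> below is hypothetical (not a pandas function). It tries the formats in order like the <code>combine_first</code> chain, so earlier formats still take priority, but it passes only the rows that are still <code>NaT</code> to each later format instead of re-parsing the whole column every time.</p>

```python
from typing import Iterable

import pandas as pd


def parse_with_formats(s: pd.Series, formats: Iterable[str]) -> pd.Series:
    """Hypothetical helper: parse `s` with each explicit format in turn.

    Earlier formats take priority: once a row has been parsed it is not
    handed to later formats. Rows that match no format stay NaT.
    """
    # Start with an all-NaT datetime column aligned with the input.
    result = pd.Series(pd.NaT, index=s.index, dtype="datetime64[ns]")
    for fmt in formats:
        missing = result.isna()
        if not missing.any():
            break  # every row is parsed; skip the remaining formats
        # Only the rows that are still NaT are parsed with this format.
        parsed = pd.to_datetime(s[missing], format=fmt, errors="coerce")
        # Fill the gaps; values already in `result` are kept.
        result = result.combine_first(parsed)
    return result


df = pd.DataFrame({'date': ['12-01-01', '12-01-2001', '2001-07-05',
                            'Jan 19', 'January 2019', '1 January 2019']})
parsed = parse_with_formats(df.date, ['%d-%m-%y', '%d-%m-%Y', '%Y-%m-%d'])
print(parsed)
```

<p>Whether skipping already-parsed rows actually saves time depends on how many rows each format matches, since the boolean indexing has a cost of its own; it is worth timing both versions on your own data.</p>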
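<p>In general, if you specify <code>dayfirst</code>, <code>pd.to_datetime</code> can parse most of these formats on its own. It is still much slower than parsing the column a few times with explicit formats, though. (The output and timings below reflect pandas versions before 2.0; since 2.0, <code>pd.to_datetime</code> infers a single format from the first non-null value and applies it to every row unless you pass <code>format='mixed'</code>.)</p>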
<pre><code>pd.to_datetime(df.date, errors='coerce', dayfirst=True)
#0 2001-01-12
#1 2001-01-12
#2 2001-07-05
#3 NaT
#4 2019-01-01
#5 2019-01-01
#Name: date, dtype: datetime64[ns]
# Scale the frame up to 60,000 rows for timing (%timeit is an IPython/Jupyter magic)
df = pd.concat([df]*10000, ignore_index=True)
# Explicit formats, parsed one after another and merged
%timeit reduce(lambda l,r: l.combine_first(r), [pd.to_datetime(df.date, format=fmt, errors='coerce') for fmt in formats])
#287 ms ± 2.35 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Format inference with dayfirst=True
%timeit pd.to_datetime(df.date, errors='coerce', dayfirst=True)
#5.79 s ± 36.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
</code></pre>
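<p>So even though the column is parsed several times, you still come out far ahead (about 0.29 s versus 5.8 s on 60,000 rows in the timings above), and you don't miss non-standard formats, as long as each one appears in the list.</p>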
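<p>To cover the remaining rows of the example frame, the list of formats can simply be extended. This is a minimal sketch under two assumptions that are not from the original answer: that <code>'Jan 19'</code> means January 2019 (abbreviated month plus two-digit year), and that the process runs under an English/C locale, because <code>%b</code> and <code>%B</code> match locale-dependent month names. The three extra format strings were chosen for this sample data only.</p>

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({'date': ['12-01-01', '12-01-2001', '2001-07-05',
                            'Jan 19', 'January 2019', '1 January 2019']})

formats = ['%d-%m-%y', '%d-%m-%Y', '%Y-%m-%d',  # the original three
           '%b %y',     # assumption: 'Jan 19' is January 2019
           '%B %Y',     # 'January 2019'
           '%d %B %Y']  # '1 January 2019'

# Same combine_first chain as above: the first format that parses a row wins.
parsed = reduce(lambda l, r: l.combine_first(r),
                [pd.to_datetime(df.date, format=fmt, errors='coerce') for fmt in formats])
print(parsed)
print(parsed.isna().sum())  # number of rows that no format matched
```

<p>Listing the formats explicitly also makes ambiguous inputs deterministic: each row is interpreted by the first format in the list that matches it, so the order of the list is part of the specification.</p>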