<p>I am trying to do the following, but it seems this is not supported as a vectorized operation.</p>
<pre><code>import pandas as pd
df=pd.DataFrame([[2017,1,15,1],
[2017,1,15,2],
[2017,1,15,3],
[2017,1,15,4],
[2017,1,15,5],
[2017,1,15,6],
[2017,1,15,7]],
columns=['year','month','day','month_offset'])
# pd.datetime was removed in pandas 2.0; pd.to_datetime assembles dates from the year/month/day columns
df['date']=pd.to_datetime(df[['year','month','day']])
df['offset']=df.apply(lambda g: pd.offsets.MonthEnd(g.month_offset),axis=1)
df['date_offset']=df.date+df.offset
</code></pre>
<p>This is the warning returned for the last statement in the snippet:</p>
<blockquote>
<p>C:\Python3.5.2.3\WinPython-64bit-3.5.2.3\python-3.5.2.amd64\lib\site-packages\pandas\core\ops.py:533: PerformanceWarning: Adding/subtracting array of DateOffsets to Series not vectorized
"Series not vectorized", PerformanceWarning)</p>
</blockquote>
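<p>I would like this to be a vectorized operation, for the performance benefit.</p>
<p>Thanks.</p>
<h2>Edit</h2>
<p>Finally, I compared my approach with the following method from @john zwinck:</p>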
^{pr2}$
<p>The results were:</p>
<pre><code>index  year  month  day  month_offset      mydate     offset1       final
0      2017      1    1             1  2017-01-01  2017-01-31  2017-01-31
1      2017      1    1             2  2017-01-01  2017-02-28  2017-02-28
2      2017      1    1             3  2017-01-01  2017-03-31  2017-03-31
3      2017      1    1             4  2017-01-01  2017-04-30  2017-04-30
4      2017      1    1             5  2017-01-01  2017-05-31  2017-05-31
5      2017      1    1             6  2017-01-01  2017-06-30  2017-06-30
6      2017      1    1             7  2017-01-01  2017-07-31  2017-07-31
runfile('C:/bitbucket/test/vector_dates.py', wdir='C:/bitbucket/test')
Method 1 0.003999948501586914 seconds
Method 2 with numpy vectorization 0.0009999275207519531 seconds
</code></pre>
<p>Clearly, the numpy approach is much faster: roughly 1 ms versus 4 ms on this seven-row frame.</p>
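<p>For reference, here is a minimal sketch of what a numpy-vectorized month-end shift can look like. It is not necessarily the exact code that was timed above. It assumes every <code>month_offset</code> is positive and that no input date already falls on the last day of its month (in that case <code>pd.offsets.MonthEnd</code> rolls to the following month end, which this sketch does not reproduce).</p>

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "year": [2017] * 7,
    "month": [1] * 7,
    "day": [15] * 7,
    "month_offset": list(range(1, 8)),
})
df["date"] = pd.to_datetime(df[["year", "month", "day"]])

# Truncate each date to month precision, e.g. 2017-01-15 becomes 2017-01.
months = df["date"].to_numpy().astype("datetime64[M]")

# Move forward by month_offset whole months in a single array operation.
shifted = months + df["month_offset"].to_numpy().astype("timedelta64[M]")

# Casting back to day precision gives the first day of the shifted month;
# subtracting one day lands on the last day of the month before it.
df["date_offset"] = shifted.astype("datetime64[D]") - np.timedelta64(1, "D")

print(df[["date", "month_offset", "date_offset"]])
```

<p>Because everything happens on whole arrays, no Python-level loop over <code>DateOffset</code> objects is needed, which is what the <code>PerformanceWarning</code> in the question is about.</p>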