<p>You can use a regex pattern to <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.extract.html">extract</a> the different parts very flexibly:</p>
<pre><code>In [11]: df.row.str.extract('(?P<fips>\d{5})((?P<state>[A-Z ]*$)|(?P<county>.*?), (?P<state_code>[A-Z]{2}$))')
Out[11]:
    fips                   1          state          county state_code
0  00000       UNITED STATES  UNITED STATES             NaN        NaN
1  01000             ALABAMA        ALABAMA             NaN        NaN
2  01001  Autauga County, AL            NaN  Autauga County         AL
3  01003  Baldwin County, AL            NaN  Baldwin County         AL
4  01005  Barbour County, AL            NaN  Barbour County         AL

[5 rows x 5 columns]
</code></pre>
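<p>For anyone who wants to reproduce this, here is a minimal self-contained sketch. The sample rows are rebuilt from the output above, so their exact spacing (five-digit code, one space, then the name) is an assumption, as are the variable names <code>pattern</code> and <code>parts</code>. The pattern is written as a raw string so that <code>\d</code> reaches the regex engine unchanged.</p>

```python
import pandas as pd

# Hypothetical sample rebuilt from the output shown above; the exact
# spacing of the original data is an assumption.
df = pd.DataFrame({"row": [
    "00000 UNITED STATES",
    "01000 ALABAMA",
    "01001 Autauga County, AL",
    "01003 Baldwin County, AL",
    "01005 Barbour County, AL",
]})

# Same pattern as in the answer, written as a raw string.
pattern = (
    r"(?P<fips>\d{5})"
    r"((?P<state>[A-Z ]*$)|(?P<county>.*?), (?P<state_code>[A-Z]{2}$))"
)

# One column per capture group; the unnamed outer group gets a numeric label.
parts = df["row"].str.extract(pattern, expand=True)

# With this sample the separating space is captured together with the name,
# so strip it from the text columns.
for col in ["state", "county"]:
    parts[col] = parts[col].str.strip()

print(parts[["fips", "state", "county", "state_code"]])
```

<p>Selecting only the named columns at the end drops the helper column produced by the unnamed outer group.</p>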
<hr/>
<p>To explain the somewhat long regex:</p>
<pre><code>(?P<fips>\d{5})
</code></pre>
<ul>
<li>Matches five digits (<code>\d</code>) and names the group <code>"fips"</code>.</li>
</ul>
<p>The next part:</p>
<pre><code>((?P<state>[A-Z ]*$)|(?P<county>.*?), (?P<state_code>[A-Z]{2}$))
</code></pre>
<p>does one of two things:</p>
<pre><code>(?P<state>[A-Z ]*$)
</code></pre>
<ul>
<li>Matches any number (<code>*</code>) of capital letters or spaces (<code>[A-Z ]</code>) up to the end of the string (<code>$</code>) and names the group <code>"state"</code>.</li>
</ul>
<p>or</p>
<pre><code>(?P<county>.*?), (?P<state_code>[A-Z]{2}$)
</code></pre>
<ul>
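<li>Matches anything else (<code>.*?</code>, as few characters as possible) and names the group <code>"county"</code>, then</li>
<li>a comma and a space, then</li>
<li>a two-letter <code>state_code</code> (<code>[A-Z]{2}</code>) right before the end of the string (<code>$</code>).</li>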
</ul>
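<p>To see which branch of the alternation fires for a given string, the same pattern can be tried with the standard <code>re</code> module. This is only a sketch: the helper name <code>split_row</code> and the two sample strings are made up for illustration. Groups that belong to the branch that did not match come back as <code>None</code>, which is what shows up as <code>NaN</code> in pandas.</p>

```python
import re

PATTERN = re.compile(
    r"(?P<fips>\d{5})"                               # five digits
    r"((?P<state>[A-Z ]*$)"                          # branch 1: capitals/spaces to the end
    r"|(?P<county>.*?), (?P<state_code>[A-Z]{2}$))"  # branch 2: county, comma, two letters
)

def split_row(row):
    """Hypothetical helper: return the named groups as a dict, or None if no match."""
    m = PATTERN.search(row)
    return m.groupdict() if m else None

# Sample strings are assumptions modelled on the data in the question.
state_row = split_row("01000 ALABAMA")
county_row = split_row("01001 Autauga County, AL")

print(state_row)   # only 'fips' and 'state' are filled in
print(county_row)  # 'fips', 'county' and 'state_code' are filled in
```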
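<p><em>In the example:</em><br/>
<em>Note that the first two rows hit "state" (leaving NaN in the county and state_code columns), whereas the last three rows hit county and state_code (leaving NaN in the state column).</em></p>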