<p>Say your DataFrame looks like this:</p>
<pre><code>>>> import pandas as pd
>>> survey = pd.DataFrame(
... ["Virginia", "VA", "VA", "Penns.", "PA", "Pennsylvania"],
... columns=["State"]
... )
>>> survey
State
0 Virginia
1 VA
2 VA
3 Penns.
4 PA
5 Pennsylvania
</code></pre>
<p>The initial mapping you build can go from the longer-form names to the canonical abbreviations:</p>
<pre><code>>>> to_abbrev = {
... "Virginia": "VA",
... "Pennsylvania": "PA",
... "Penns.": "PA",
... }
</code></pre>
<p>Then update it so that each abbreviation also maps to itself:</p>
<pre><code>>>> to_abbrev.update({v: v for v in to_abbrev.values()})
>>> to_abbrev
{'Virginia': 'VA',
'Pennsylvania': 'PA',
'Penns.': 'PA',
'VA': 'VA',
'PA': 'PA'}
</code></pre>
<p>Finally, call <code>.map()</code> to get the result:</p>
<pre><code>>>> survey["State"].map(to_abbrev)
0 VA
1 VA
2 VA
3 PA
4 PA
5 PA
Name: State, dtype: object
</code></pre>
<p>One caveat: your <code>to_abbrev</code> must be a <em>complete</em> mapping; any value that is missing from it comes back as NaN:</p>
<pre><code>>>> survey.append({"State": "Wisconsin"}, ignore_index=True)["State"].map(to_abbrev)
0 VA
1 VA
2 VA
3 PA
4 PA
5 PA
6 NaN
Name: State, dtype: object
</code></pre>
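<p>If an incomplete mapping is a realistic possibility, one option is to list the unmapped values for review and fall back to the original text instead of NaN. The following is only a minimal sketch: the shortened sample data and the names <code>mapped</code>, <code>unmapped</code>, and <code>filled</code> are assumptions made for illustration.</p>

```python
import pandas as pd

# Shortened sample data (an assumption for this sketch), including one
# state that the mapping does not cover.
survey = pd.DataFrame(
    ["Virginia", "VA", "Penns.", "Wisconsin"],
    columns=["State"],
)
to_abbrev = {
    "Virginia": "VA",
    "Pennsylvania": "PA",
    "Penns.": "PA",
    "VA": "VA",
    "PA": "PA",
}

mapped = survey["State"].map(to_abbrev)

# Values absent from the mapping come back as NaN; collect them for review.
unmapped = survey.loc[mapped.isna(), "State"].unique().tolist()

# Fall back to the original text wherever the lookup failed.
filled = mapped.fillna(survey["State"])

print(unmapped)
print(filled.tolist())
```

<p>Whether falling back to the raw value is acceptable depends on what happens downstream; in some pipelines it may be safer to raise an error when <code>unmapped</code> is non-empty.</p>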
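<p>As suggested in the comments, there are undoubtedly libraries built specifically to construct a more complete mapping for you, one that accounts for common misspellings and small syntactic differences such as "D.C." versus "DC".</p>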
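<p>Short of adopting such a library, small punctuation and casing differences can often be absorbed by normalizing the keys before the lookup. This is a rough sketch rather than a complete solution: the helper <code>normalize_key</code> is hypothetical, and the "District of Columbia" entry is added here only to illustrate the "D.C." case.</p>

```python
import pandas as pd


def normalize_key(name: str) -> str:
    """Hypothetical helper: drop periods, trim whitespace, ignore case."""
    return name.replace(".", "").strip().casefold()


# "District of Columbia" is an assumed extra entry for this illustration.
to_abbrev = {
    "Virginia": "VA",
    "Pennsylvania": "PA",
    "Penns.": "PA",
    "District of Columbia": "DC",
}

# Build the lookup on normalized keys, then add each abbreviation itself.
normalized = {normalize_key(k): v for k, v in to_abbrev.items()}
normalized.update({normalize_key(v): v for v in to_abbrev.values()})

raw = pd.Series(["D.C.", "DC", " virginia ", "PENNS", "pa"], name="State")

# Normalize the raw values the same way before looking them up.
cleaned = raw.map(normalize_key).map(normalized)
print(cleaned.tolist())
```

<p>Normalization of this kind handles formatting noise only; genuine misspellings would still need fuzzy matching or a curated alias list.</p>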