<p>I want to adjust Wes's answer slightly, because version 0.16.2 requires <code>as_index=False</code>. If you don't set it, you get an empty DataFrame.</p>
<p><a href="http://pandas.pydata.org/pandas-docs/stable/groupby.html#aggregation" rel="noreferrer">Source</a>:</p>
<blockquote>
<p>Aggregation functions will not return the groups that you are aggregating over if they are named columns, when <code>as_index=True</code>, the default. The grouped columns will be the indices of the returned object.</p>
<p>Passing <code>as_index=False</code> will return the groups that you are aggregating over, if they are named columns.</p>
<p>Aggregating functions are ones that reduce the dimension of the returned objects, for example: <code>mean</code>, <code>sum</code>, <code>size</code>, <code>count</code>, <code>std</code>, <code>var</code>, <code>sem</code>, <code>describe</code>, <code>first</code>, <code>last</code>, <code>nth</code>, <code>min</code>, <code>max</code>. This is what happens when you do for example <code>DataFrame.sum()</code> and get back a <code>Series</code>. </p>
<p>nth can act as a reducer or a filter, see <a href="http://pandas.pydata.org/pandas-docs/stable/groupby.html#groupby-nth" rel="noreferrer">here</a>.</p>
</blockquote>
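<p>To make the quoted behaviour concrete, here is a minimal sketch. The <code>Visits</code> column and its values are hypothetical, added only so the aggregation has something to sum; the exact result may vary between pandas versions.</p>

```python
import pandas as pd

# Hypothetical data: "Visits" is an illustrative value column, not from the question.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob"],
    "City": ["Seattle", "Seattle", "Seattle"],
    "Visits": [1, 2, 3],
})

# Default (as_index=True): the group keys become the index of the result.
indexed = df.groupby(["Name", "City"]).sum()

# as_index=False: the group keys are returned as ordinary columns.
flat = df.groupby(["Name", "City"], as_index=False).sum()

print(indexed)
print(flat)
```

<p>With the default, <code>Name</code> and <code>City</code> end up in the index, so only <code>Visits</code> remains as a column; with <code>as_index=False</code> all three appear as columns.</p>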
<pre><code>import pandas as pd

df1 = pd.DataFrame({"Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
                    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]})
print(df1)
#
#       City     Name
#0   Seattle    Alice
#1   Seattle      Bob
#2  Portland  Mallory
#3   Seattle  Mallory
#4   Seattle      Bob
#5  Portland  Mallory
#
# Count the rows for each (Name, City) pair
g1 = df1.groupby(["Name", "City"], as_index=False).count()
print(g1)
#
#                  City  Name
#Name    City
#Alice   Seattle      1     1
#Bob     Seattle      2     2
#Mallory Portland     2     2
#        Seattle      1     1
#
</code></pre>
<p>Edit:</p>
<p>In version <code>0.17.1</code> and later, you can select a subset of columns before calling <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.count.html" rel="noreferrer"><code>count</code></a>, or call <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.size.html" rel="noreferrer"><code>size</code></a> followed by <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.reset_index.html" rel="noreferrer"><code>reset_index</code></a> with the <code>name</code> parameter:</p>
<pre><code>print(df1.groupby(["Name", "City"], as_index=False).count())
#IndexError: list index out of range

print(df1.groupby(["Name", "City"]).count())
#Empty DataFrame
#Columns: []
#Index: [(Alice, Seattle), (Bob, Seattle), (Mallory, Portland), (Mallory, Seattle)]

# Select the columns explicitly so count has something to count
print(df1.groupby(["Name", "City"])[['Name', 'City']].count())
#                  Name  City
#Name    City
#Alice   Seattle      1     1
#Bob     Seattle      2     2
#Mallory Portland     2     2
#        Seattle      1     1

# size returns a Series; reset_index(name=...) turns it into a flat DataFrame
print(df1.groupby(["Name", "City"]).size().reset_index(name='count'))
#      Name      City  count
#0    Alice   Seattle      1
#1      Bob   Seattle      2
#2  Mallory  Portland      2
#3  Mallory   Seattle      1
</code></pre>
<p>The difference between <code>count</code> and <code>size</code> is that <code>size</code> counts NaN values, while <code>count</code> does not.</p>
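<p>A small sketch of that difference, using a hypothetical <code>Score</code> column with missing values (the column and its data are assumptions made up for illustration):</p>

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Score" contains NaN so the two methods disagree.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob", "Mallory"],
    "Score": [1.0, np.nan, 3.0, np.nan],
})

grouped = df.groupby("Name")["Score"]
sizes = grouped.size()    # number of rows per group, NaN included
counts = grouped.count()  # number of non-NaN values per group

print(pd.DataFrame({"size": sizes, "count": counts}))
```

<p>Here <code>Bob</code> has two rows but only one non-missing score, and <code>Mallory</code> has one row with no score at all, so <code>size</code> and <code>count</code> differ for both groups.</p>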