Answers to the question: Converting a Pandas GroupBy output from Series to DataFrame

Anonymous, 1 day ago

I'd like to tweak the answer given by Wes slightly, because version 0.16.2 requires <code>as_index=False</code>; without it, you get an empty DataFrame. <a href="http://pandas.pydata.org/pandas-docs/stable/groupby.html#aggregation" rel="noreferrer">Source</a>: <blockquote> Aggregation functions will not return the groups that you are aggregating over if they are named columns, when <code>as_index=True</code>, the default. The grouped columns will be the indices of the returned object. Passing <code>as_index=False</code> will return the groups that you are aggregating over, if they are named columns. Aggregating functions are ones that reduce the dimension of the returned objects, for example: <code>mean</code>, <code>sum</code>, <code>size</code>, <code>count</code>, <code>std</code>, <code>var</code>, <code>sem</code>, <code>describe</code>, <code>first</code>, <code>last</code>, <code>nth</code>, <code>min</code>, <code>max</code>. This is what happens when you do for example <code>DataFrame.sum()</code> and get back a <code>Series</code>. nth can act as a reducer or a filter, see <a href="http://pandas.pydata.org/pandas-docs/stable/groupby.html#groupby-nth" rel="noreferrer">here</a>. 
</blockquote> <pre><code>import pandas as pd
# Outputs in the comments are as reported for pandas 0.16.2

df1 = pd.DataFrame({"Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
                    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"]})
print(df1)
#
#       City     Name
#0   Seattle    Alice
#1   Seattle      Bob
#2  Portland  Mallory
#3   Seattle  Mallory
#4   Seattle      Bob
#5  Portland  Mallory
#
g1 = df1.groupby(["Name", "City"], as_index=False).count()
print(g1)
#
#                  City  Name
#Name    City
#Alice   Seattle      1     1
#Bob     Seattle      2     2
#Mallory Portland     2     2
#        Seattle      1     1
#
</code></pre>
Edit: in version <code>0.17.1</code> and later, you can select a subset of columns before calling <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.count.html" rel="noreferrer"><code>count</code></a>, or call <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.size.html" rel="noreferrer"><code>size</code></a> and then <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.reset_index.html" rel="noreferrer"><code>reset_index</code></a> with its <code>name</code> parameter:
<pre><code># Outputs in the comments are as reported for pandas 0.17.1
print(df1.groupby(["Name", "City"], as_index=False).count())
#IndexError: list index out of range

print(df1.groupby(["Name", "City"]).count())
#Empty DataFrame
#Columns: []
#Index: [(Alice, Seattle), (Bob, Seattle), (Mallory, Portland), (Mallory, Seattle)]

print(df1.groupby(["Name", "City"])[['Name','City']].count())
#                  Name  City
#Name    City
#Alice   Seattle      1     1
#Bob     Seattle      2     2
#Mallory Portland     2     2
#        Seattle      1     1

print(df1.groupby(["Name", "City"]).size().reset_index(name='count'))
#      Name      City  count
#0    Alice   Seattle      1
#1      Bob   Seattle      2
#2  Mallory  Portland      2
#3  Mallory   Seattle      1
</code></pre>
The difference between <code>count</code> and <code>size</code> is that <code>size</code> counts NaN values, while <code>count</code> does not.
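A minimal sketch of the distinction drawn in the quoted documentation, assuming a reasonably recent pandas release (it has not been checked against 0.16/0.17, whose outputs are shown above). It reuses the Name/City data from df1 and adds a hypothetical numeric Sales column, which is not part of the original answer, so that there is something to aggregate besides the grouping keys. It contrasts the default as_index=True (keys in the index, Series result) with as_index=False (keys as columns, DataFrame result), and ends with the size().reset_index(name=...) idiom from the edit.

```python
import pandas as pd

# Same Name/City data as df1 in the answer, plus a hypothetical numeric
# "Sales" column so there is a non-key column to aggregate.
df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "Sales": [10, 20, 30, 40, 50, 60],
})

# Default as_index=True: Name/City become a MultiIndex and
# aggregating a single selected column returns a Series.
sales_series = df1.groupby(["Name", "City"])["Sales"].sum()

# as_index=False: Name/City come back as ordinary columns,
# so the aggregated result is already a DataFrame.
sales_frame = df1.groupby(["Name", "City"], as_index=False)["Sales"].sum()

# Row counts per group: size() returns a Series indexed by the keys;
# reset_index(name="count") turns it into a DataFrame with a "count" column.
counts = df1.groupby(["Name", "City"]).size().reset_index(name="count")

print(type(sales_series).__name__)  # Series
print(sales_frame)
print(counts)
```

Here sales_frame and sales_series.reset_index() should contain the same rows; as_index=False simply skips the separate reset_index step.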

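To illustrate the difference between count and size, here is a small sketch with hypothetical data (a Score column with missing values, not taken from the original answer). It assumes a pandas version in which SeriesGroupBy.agg accepts the aggregation names "count" and "size" in a list.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Bob has one missing score, Mallory's only score is missing.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob", "Mallory"],
    "Score": [90.0, np.nan, 75.0, np.nan],
})

# count: non-NaN values per group; size: all rows per group, NaN included.
summary = df.groupby("Name")["Score"].agg(["count", "size"]).reset_index()
print(summary)
```

Bob's row should show count 1 but size 2, and Mallory's count 0 but size 1, because only the non-NaN scores are counted.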