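<p>I would like to know whether it is possible to create a Seaborn count plot that shows on the y-axis not the actual counts but the relative frequency (percentage) within each group, where the groups are specified with the <code>hue</code> parameter.</p>
<p>I sort of solved this with the following approach, but I can't imagine it is the simplest one:</p>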
<pre><code>import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Plot percentage of occupation per income class
grouped = df.groupby(['income'], sort=False)
# normalize=True gives fractions that sum to 1 within each income group
occupation_counts = grouped['occupation'].value_counts(normalize=True, sort=False)
occupation_data = [
    {'occupation': occupation, 'income': income, 'percentage': percentage * 100}
    for (income, occupation), percentage in dict(occupation_counts).items()
]
df_occupation = pd.DataFrame(occupation_data)
p = sns.barplot(x="occupation", y="percentage", hue="income", data=df_occupation)
_ = plt.setp(p.get_xticklabels(), rotation=90)  # Rotate labels
</code></pre>
<p>Result:</p>
<p><a href="https://i.stack.imgur.com/feVbB.png" rel="noreferrer"><img src="https://i.stack.imgur.com/feVbB.png" alt="Percentage plot with seaborn"/></a></p>
<p>I am using the well-known Adult data set from the <a href="http://archive.ics.uci.edu/ml/datasets/Adult" rel="noreferrer">UCI machine learning repository</a>. The pandas DataFrame is created like this:</p>
<pre><code>import pandas as pd

# Read the adult dataset
df = pd.read_csv(
    "data/adult.data",
    engine='c',
    lineterminator='\n',
    names=['age', 'workclass', 'fnlwgt', 'education', 'education_num',
           'marital_status', 'occupation', 'relationship', 'race', 'sex',
           'capital_gain', 'capital_loss', 'hours_per_week',
           'native_country', 'income'],
    header=None,
    skipinitialspace=True,
    na_values="?"  # the data set marks missing values with "?"
)
</code></pre>
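<p><a href="https://stackoverflow.com/questions/33179122/seaborn-countplot-with-frequencies">This question</a> is somewhat related, but it does not use the <code>hue</code> parameter. In my case I cannot simply change the labels on the y-axis, because the height of each bar must depend on its group.</p>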
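<p>To make the requirement concrete, here is a minimal sketch of the within-group normalization on a tiny made-up frame. The helper name <code>hue_percentages</code>, the toy rows, and the <code>low</code>/<code>high</code> labels are assumptions for illustration only and are not part of the Adult data. It is just a more compact form of the reshaping shown above, not necessarily the simplest route Seaborn offers.</p>

```python
import pandas as pd


def hue_percentages(df, x, hue):
    """Percentage of each `x` value within each `hue` group, in long form.

    Hypothetical helper for illustration only.
    """
    return (
        df.groupby(hue)[x]
        .value_counts(normalize=True)  # fractions sum to 1 within each hue group
        .mul(100)                      # convert fractions to percentages
        .rename("percentage")          # avoid a name clash in reset_index
        .reset_index()
    )


# Tiny made-up sample (not the Adult data)
toy = pd.DataFrame({
    "income": ["low", "low", "low", "low", "high", "high"],
    "occupation": ["Sales", "Sales", "Sales", "Tech", "Sales", "Tech"],
})
pct = hue_percentages(toy, x="occupation", hue="income")
print(pct)

# The resulting frame can be passed to the same barplot call as in the question:
# sns.barplot(x="occupation", y="percentage", hue="income", data=pct)
```

<p>Within each income group the percentages add up to 100, so bar heights are comparable across groups even when the groups differ in size.</p>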