Python babe package - PyPI

Acquiring and analyzing baby name statistics

Detailed description of the babe Python project

babe

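Note that the first time you import the package you need Internet access, and downloading the required data takes a few seconds (depending on bandwidth).

The data is then saved automatically to a local file, so later imports are faster.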
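
The download-once behaviour described above can be pictured with a small caching sketch. This is not babe's actual code: `load_with_cache`, `fake_download` and the JSON cache file are hypothetical names chosen for illustration, and the real package may store its data in a different format and location.

```python
import json
import tempfile
from pathlib import Path


def load_with_cache(cache_path, download):
    """Return the cached data if the file exists, else download and save it."""
    cache_path = Path(cache_path)
    if cache_path.exists():
        # Fast path: reuse the local copy written by an earlier run.
        return json.loads(cache_path.read_text(encoding="utf-8"))
    data = download()  # slow path: this would hit the network in practice
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(data), encoding="utf-8")
    return data


# Hypothetical stand-in for the real download; records each call.
calls = []


def fake_download():
    calls.append(1)
    return [["AK", "F", 1910, "Mary", 14]]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "names.json"
    first = load_with_cache(cache_file, fake_download)   # downloads and saves
    second = load_with_cache(cache_file, fake_download)  # reads the local file
```

Only the first call reaches `fake_download`; the second is served from the file, which is why a second import would be faster under this scheme.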

To install:

pip install babe

Then, in a Python console or notebook...

from babe import UsNames
d = UsNames()

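Data overview

The basic data gives a popularity score (the number of babies recorded) for each (state, gender, name, year) tuple. It covers the names of babies born in the United States between 1910 and 2019.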

^{pr2}$

	state	gender	year	name	popularity	name_g
0	AK	F	1910	Mary	14	Mary_F
1	AK	F	1910	Annie	12	Annie_F
2	AK	F	1910	Anna	10	Anna_F
3	AK	F	1910	Margaret	8	Margaret_F
4	AK	F	1910	Helen	7	Helen_F
...	...	...	...	...	...	...
28277	WY	M	2019	Theo	5	Theo_M
28278	WY	M	2019	Tristan	5	Tristan_M
28279	WY	M	2019	Vincent	5	Vincent_M
28280	WY	M	2019	Warren	5	Warren_M
28281	WY	M	2019	Waylon	5	Waylon_M

6122890 rows × 6 columns

print(f"{len(d.names)} unique names")

31862 unique names

Some names are used for both genders, so most of the internals work with name_g, which is the name joined to its gender (_F or _M):

print(f"{len(d.name_gs)} unique names_g (gendered names)")

34952 unique names_g (gendered names)

You can use resolve_name_g to get the name_g corresponding to a name, as long as that name is not used for more than one gender.

d.resolve_name_g('Cora')

'Cora_F'

try:
    d.resolve_name_g('Vanessa')
except AssertionError as e:
    print(e)

The Vanessa can be used for both genders. Specify Vanessa_F or Vanessa_M
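
The resolution rule above can be sketched in a few lines of plain Python. `toy_resolve_name_g` and the small `name_gs` set are hypothetical stand-ins, not the library's implementation; the sketch only mirrors the behaviour shown: a unique match is returned, and a name used for both genders raises an AssertionError.

```python
def toy_resolve_name_g(name, name_gs):
    """Return the single gendered form of `name` found in `name_gs`."""
    candidates = [f"{name}_{g}" for g in ("F", "M") if f"{name}_{g}" in name_gs]
    if len(candidates) > 1:
        # Same error type and message as in the session shown above.
        raise AssertionError(
            f"The {name} can be used for both genders. "
            f"Specify {name}_F or {name}_M"
        )
    if not candidates:
        raise KeyError(name)  # assumption: unknown names are reported as missing
    return candidates[0]


# Toy set of gendered names (the real data has tens of thousands).
name_gs = {"Cora_F", "Vanessa_F", "Vanessa_M"}

resolved = toy_resolve_name_g("Cora", name_gs)

try:
    toy_resolve_name_g("Vanessa", name_gs)
    ambiguous_error = None
except AssertionError as e:
    ambiguous_error = str(e)
```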

Data by state

In some cases it is more convenient to work with a view indexed by (state, name_g, year). The by_state attribute provides this.

d.by_state

state  name_g      year
AK     Mary_F      1910    14
       Annie_F     1910    12
       Anna_F      1910    10
       Margaret_F  1910     8
       Helen_F     1910     7
                           ..
WY     Theo_M      2019     5
       Tristan_M   2019     5
       Vincent_M   2019     5
       Warren_M    2019     5
       Waylon_M    2019     5
Name: popularity, Length: 6122890, dtype: int64

This lets you do things such as getting the data for a single state:

d.by_state['CA']

name_g      year
Mary_F      1910    295
Helen_F     1910    239
Dorothy_F   1910    220
Margaret_F  1910    163
Frances_F   1910    134
                   ... 
Zayvion_M   2019      5
Zeek_M      2019      5
Zhaire_M    2019      5
Zian_M      2019      5
Ziyad_M     2019      5
Name: popularity, Length: 387781, dtype: int64

... and, within a state, getting the popularity of a name by year:

d.by_state['CA']['Cora_F']  # or d.by_state['CA', 'Cora_F']

year
1911      8
1912      9
1913     15
1914     15
1915     17
       ... 
2015    269
2016    244
2017    284
2018    282
2019    256
Name: popularity, Length: 109, dtype: int64

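... and if you want the data for a given name (really a name_g) across all states, you can get it by slicing.

For example, suppose you want to know how many little boys were named "Vanessa" and, more precisely, where and when:

d.by_state[:, 'Vanessa_M']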

state  year
AZ     1988     8
CA     1980     7
       1981     5
       1982    20
       1983    19
       1984    14
       1985    12
       1986    13
       1987    13
       1988    26
       1989    17
       1990    16
       1991    18
       1992    17
       1993    17
       1994    10
       1995     9
       1996    10
       1997    11
       1998     7
DC     1989    11
NY     1982     5
       1983     9
       1986     6
       1988     6
       1989     6
TX     1981     5
       1982     7
       1983    12
       1984     9
       1985    10
       1986     8
       1987     9
       1988     8
       1989     5
       1990     6
       1991     5
       1992     5
       1994     5
Name: popularity, dtype: int64
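
To make the (state, name_g, year) indexing concrete, here is a plain-Python sketch that imitates the selections above with a dictionary keyed by tuples. The `select` helper is hypothetical and only mimics what the indexing returns; the handful of rows is copied from the outputs shown in this section.

```python
# A few (state, name_g, year) -> popularity rows copied from the outputs above.
popularity = {
    ("AZ", "Vanessa_M", 1988): 8,
    ("CA", "Vanessa_M", 1980): 7,
    ("CA", "Vanessa_M", 1981): 5,
    ("CA", "Cora_F", 1911): 8,
    ("CA", "Cora_F", 1912): 9,
    ("DC", "Vanessa_M", 1989): 11,
}


def select(data, state=None, name_g=None):
    """Filter on the given levels; None plays the role of the ':' slice.

    Levels that were fixed are dropped from the resulting keys.
    """
    out = {}
    for (s, n, y), count in data.items():
        if state is not None and s != state:
            continue
        if name_g is not None and n != name_g:
            continue
        kept = tuple(v for v, fixed in ((s, state), (n, name_g)) if fixed is None)
        key = kept + (y,)
        out[key if len(key) > 1 else key[0]] = count
    return out


cora_in_ca = select(popularity, state="CA", name_g="Cora_F")
vanessa_m_everywhere = select(popularity, name_g="Vanessa_M")
```

Here `cora_in_ca` is keyed by year only, while `vanessa_m_everywhere` keeps (state, year) keys, matching the shape of the two outputs above.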

National data

National aggregates are available through the national attribute.

d.national

name_g      year
Aaban_M     2013     6
            2014     6
Aadam_M     2019     6
Aadan_M     2008    12
            2009     6
                    ..
Zyriah_F    2013     7
            2014     6
            2016     5
Zyron_M     2015     5
Zyshonne_M  1998     5
Name: popularity, Length: 633239, dtype: int64

The interface is the same as for the by_state attribute, but without the state level.

^{pr21}$

year
1935       5
1947      24
1948      32
1949      16
1950      41
        ... 
2015    1687
2016    1633
2017    1486
2018    1345
2019    1188
Name: popularity, Length: 74, dtype: int64
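
One way to picture a national aggregate is as the by-state counts summed over states. The sketch below does this in plain Python for the 1988 and 1989 Vanessa_M rows from the by-state output. Treating the national view as an exact sum of state rows is an assumption made for illustration; the package may build or source its national data differently.

```python
from collections import Counter

# (state, name_g, year) -> popularity, copied from the by-state output.
by_state = {
    ("AZ", "Vanessa_M", 1988): 8,
    ("CA", "Vanessa_M", 1988): 26,
    ("NY", "Vanessa_M", 1988): 6,
    ("TX", "Vanessa_M", 1988): 8,
    ("CA", "Vanessa_M", 1989): 17,
    ("DC", "Vanessa_M", 1989): 11,
    ("NY", "Vanessa_M", 1989): 6,
    ("TX", "Vanessa_M", 1989): 5,
}

# Drop the state level and add up the counts for each (name_g, year).
national = Counter()
for (state, name_g, year), count in by_state.items():
    national[(name_g, year)] += count
```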

Plotting

d.plot_popularity('Cora');

d.plot_popularity('Cora','GA');

d.plot_popularity(['Cora','Vanessa_F']);

d.plot_popularity('Cora',['CA','GA']);

d.plot_popularity(['Cora','Vanessa_F'],['CA','GA']);

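Miscellaneous

Gender-ambiguous names

We define the "femininity" of a name as the proportion of times the name was given to a girl (all states, all years). The "masculinity" of a name is defined correspondingly.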

d.femininity_of_name.iloc[12000:12010]

Lemmie      0.161290
Kashmere    0.161290
Clary       0.162162
Sung        0.162393
Kyrie       0.163527
Cedar       0.163686
Masyn       0.163895
Naveen      0.165605
Chai        0.166667
Atlee       0.167382
dtype: float64

d.femininity_of_name.plot(figsize=(17,5),ylabel='femininity');

^{pr31}$

Berkley     0.108889
Dasani      0.110092
Sharone     0.111111
Ifeoluwa    0.111111
Rama        0.111111
Scout       0.111486
Brownie     0.111732
Lashon      0.113158
Indigo      0.113364
Argie       0.113636
dtype: float64

d.masculinity_of_name.plot(figsize=(17,5),ylabel='masculinity');

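The (gender) "ambiguity" of a name can therefore be defined as twice the minimum of its femininity and its masculinity.

Defined this way, ambiguity is a score between 0 and 1. It is maximal (1) when the name is given to boys and girls in equal proportion, and minimal (0) when it is given to only one gender.

Note that this score is raw (or "unsmoothed"). It is computed from raw counts, so extreme scores often belong to names with very low counts.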
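
The three definitions above translate directly into arithmetic. The sketch below is not the package's code: `gender_scores` is a hypothetical helper and the counts are invented, chosen only to show the formula and the low-count caveat.

```python
def gender_scores(girls, boys):
    """Return (femininity, masculinity, ambiguity) from raw counts."""
    total = girls + boys
    femininity = girls / total    # share of uses that were for girls
    masculinity = boys / total    # share of uses that were for boys
    ambiguity = 2 * min(femininity, masculinity)
    return femininity, masculinity, ambiguity


balanced = gender_scores(girls=500, boys=500)   # evenly split name
one_sided = gender_scores(girls=1000, boys=0)   # used for one gender only
skewed = gender_scores(girls=1, boys=3)         # mostly, but not only, boys
tiny = gender_scores(girls=1, boys=1)           # only two babies in total
```

`balanced` and `tiny` both reach the maximal ambiguity of 1.0, although `tiny` rests on just two babies. This is the sense in which the raw score tends to hand extreme values to names with very low counts.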

d.ambiguity_of_name

^{pr35}$

t = d.ambiguity_of_name
print(f"There are {len(t[t > 0])} (gender-)ambiguous names")

There are 3090 (gender-)ambiguous names

t = d.ambiguity_of_name
t[t > 0].plot(figsize=(17, 5), ylabel='gender-ambiguity');

t = list(d.ambiguous_names)
print(f"{len(t)} (gender-)ambiguous names:")
print(*t[:9], '...', sep=', ')

3090 (gender-)ambiguous names:
Nolie, Tyrese, Linn, Savannah, Bryn, Rei, Abby, Shilo, Tracy, ...

babe 0.0.7
