Data analysis: problem mapping a column and correcting spelling errors

Question

I recently used Jupyter to look at a project's data, and I want to sort some of the records into specific company categories.

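In the end I wrote one long chain of if statements, but the problem is that I can't apply it to each individual cell of a column, so I'm wondering whether there is a better way to do this. My original code didn't work well either, so I tried to improve it with the little Python I know.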
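A common pandas pattern for running a function on every cell of one column is `Series.apply`. The sketch below is only an illustration under stated assumptions: `classify` is a hypothetical, cut-down stand-in for `to_code_range` with just two of the ranges, and the small DataFrame is made-up sample data standing in for the real `report`.

```python
import math

import pandas as pd


def classify(value):
    """Hypothetical, cut-down stand-in for to_code_range: classifies ONE cell."""
    try:
        code = int(value)
    except (TypeError, ValueError):
        return math.nan  # non-numeric cells such as "None Supplied"
    if 1000 <= code <= 9990:
        return "Agriculture"
    if 10000 <= code <= 14990:
        return "Mining"
    return math.nan  # anything outside the two sample ranges


# Made-up sample standing in for the real `report` DataFrame
report = pd.DataFrame({"SicCodes": [1500, 12000, "None Supplied", 500]})
# Series.apply calls classify once per cell and collects the results in a new Series
report["SICCode.SicText_1"] = report["SicCodes"].apply(classify)
print(report)
```

The key point is that the function receives a single value, never the whole column, so it can use plain `int()` and comparisons.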

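I want to take a value from the SicCodes column, compare it against the mappings, and get a category name as output. My first idea was to use if statements to parse the data in a simple way and then improve it gradually. In practice, though, I couldn't pass the DataFrame into my little to_code_range function, so I thought about using a for loop instead, but that hasn't worked yet either.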
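The lookup described here (take one SicCodes value, find the range that contains it, return that range's name) can also be driven directly by the `mappings` table instead of a hand-written if chain. A minimal sketch, assuming the ranges are sorted by start and never overlap (true for the table in this question), using the standard-library `bisect` module; `sic_to_sector` is a hypothetical name.

```python
import bisect
import math

# (start, end, name) ranges from the question, sorted by start; both ends inclusive
MAPPINGS = [
    (1000, 9990, "Agriculture"),
    (10000, 14990, "Mining"),
    (15000, 17990, "Construction"),
    (18000, 19990, "not used"),
    (20000, 39990, "Manufacturing"),
    (40000, 49990, "Utility Services"),
    (50000, 51990, "Wholesale Trade"),
    (52000, 59990, "Retail Trade"),
    (60000, 69200, "Financials"),
    (70000, 90040, "Services"),
    (91000, 97290, "Public Administration"),
    (98000, 99990, "Nonclassifiable"),
]
STARTS = [start for start, _, _ in MAPPINGS]


def sic_to_sector(value):
    """Return the sector name for one SicCodes cell, or math.nan if no range matches."""
    try:
        code = int(value)
    except (TypeError, ValueError):
        return math.nan  # non-numeric cells such as "None Supplied", or NaN
    # Position of the last range whose start is <= code (-1 if code is below every start)
    i = bisect.bisect_right(STARTS, code) - 1
    if i >= 0:
        _, end, name = MAPPINGS[i]
        if code <= end:
            return name
    return math.nan  # below the first range or inside a gap between ranges


for v in [1500, "15000", 62012, 9995, "None Supplied"]:
    print(v, "->", sic_to_sector(v))
```

Because the ranges are sorted, `bisect` finds the only candidate range in O(log n) steps, and adding or editing a category means changing one row of the table rather than the function body.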

Does anyone have good suggestions for improving this?

import numpy as np

mappings = [
    (1000, 9990, 'Agriculture'),
    (10000, 14990, 'Mining'),
    (15000, 17990, 'Construction'),
    (18000, 19990, 'not used'),
    (20000, 39990, 'Manufacturing'),
    (40000, 49990, 'Utility Services'),
    (50000, 51990, 'Wholesale Trade'),
    (52000, 59990, 'Retail Trade'),
    (60000, 69200, 'Financials'),
    (70000, 90040, 'Services'),
    (91000, 97290, 'Public Administration'),
    (98000, 99990, 'Nonclassifiable'),
]

"""errors = set()
def to_code_range(i):
    if not isinstance(i, int):
        print("Not an int")
    if i == "None Supplied":
        return np.nan
    code = int(i)
    for code_from, code_to, name in mappings:
        if code_from <= code <= code_to:
            return name
    # Only give up after every range has been checked, not after the first one
    errors.add(code)
    return np.nan"""

def to_code_range(valeur):
    # Non-numeric cells such as "None Supplied" cannot be converted, so map them to NaN
    try:
        code = int(valeur)
    except (TypeError, ValueError):
        return np.nan
    if 1000 <= code <= 9990: return "Agriculture"
    if 10000 <= code <= 14990: return "Mining"
    if 15000 <= code <= 17990: return "Construction"
    if 18000 <= code <= 19990: return "not used"
    if 20000 <= code <= 39990: return "Manufacturing"
    if 40000 <= code <= 49990: return "Utility Services"
    if 50000 <= code <= 51990: return "Wholesale Trade"
    if 52000 <= code <= 59990: return "Retail Trade"
    if 60000 <= code <= 69200: return "Financials"
    if 70000 <= code <= 90040: return "Services"
    if 91000 <= code <= 97290: return "Public Administration"
    if 98000 <= code <= 99990: return "Nonclassifiable"
    # Codes below 1000 or in the gaps between ranges
    return np.nan
        
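# Passing the whole column does not work: to_code_range handles one value at a time, not a Series
#report['SICCode.SicText_1'] = to_code_range(report["SicCodes"])
# Convert each cell and assign the full list of results as the new column in one step
# (the old loop used each cell's value as a row label, which is why it failed)
report['SICCode.SicText_1'] = [to_code_range(v) for v in report['SicCodes']]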
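For a large DataFrame, a vectorized variant avoids calling a Python function once per row. This is a sketch under the same assumption that the ranges are sorted and non-overlapping: it uses `pd.to_numeric(..., errors="coerce")` to turn non-numeric cells such as "None Supplied" into NaN and `np.searchsorted` to find the candidate range for every code at once. `sic_column_to_sector` is a hypothetical helper name and the DataFrame is made-up sample data.

```python
import numpy as np
import pandas as pd

# (start, end, name) ranges from the question, sorted by start; both ends inclusive
MAPPINGS = [
    (1000, 9990, "Agriculture"), (10000, 14990, "Mining"),
    (15000, 17990, "Construction"), (18000, 19990, "not used"),
    (20000, 39990, "Manufacturing"), (40000, 49990, "Utility Services"),
    (50000, 51990, "Wholesale Trade"), (52000, 59990, "Retail Trade"),
    (60000, 69200, "Financials"), (70000, 90040, "Services"),
    (91000, 97290, "Public Administration"), (98000, 99990, "Nonclassifiable"),
]
STARTS = np.array([s for s, _, _ in MAPPINGS])
ENDS = np.array([e for _, e, _ in MAPPINGS])
NAMES = np.array([n for _, _, n in MAPPINGS], dtype=object)


def sic_column_to_sector(col: pd.Series) -> pd.Series:
    """Map a whole SicCodes column to sector names; unmatched cells become NaN."""
    # Non-numeric cells ("None Supplied", blanks) become NaN instead of raising
    values = pd.to_numeric(col, errors="coerce").to_numpy(dtype=float)
    # Index of the last range whose start is <= each value (-1 if below every start)
    idx = np.searchsorted(STARTS, values, side="right") - 1
    safe_idx = np.clip(idx, 0, None)  # keeps indexing in bounds; bad rows are masked below
    matched = (idx >= 0) & ~np.isnan(values) & (values <= ENDS[safe_idx])
    return pd.Series(np.where(matched, NAMES[safe_idx], np.nan), index=col.index, dtype=object)


report = pd.DataFrame({"SicCodes": [1500, "None Supplied", 62012, 9995, "15000"]})
report["SICCode.SicText_1"] = sic_column_to_sector(report["SicCodes"])
print(report)
```

The result keeps the original row index, so it can be assigned straight back as a new column.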


I'm using if statements and a for loop, but I get an error in the output.
