如何在读取CSV时使用Pandas编写更干净、性能更好的代码

样本数据：

Status Server Server Name Bay # Model Processor Proc. Count Memory Serial Number State Power State iLO FW Firmware Appliance Name Critical enc2010, bay 1 tdm2066.example.com 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz 2 262144 2M272101N9 Unmanaged On 2.53 May 03 2017 I36 v2.40 (02/17/2017) OV C7000 enclosures 1 OK enc1011, bay 1 tdm1068.example.com 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz 2 262144 2M272101P6 Monitored On 2.55 Aug 16 2017 I36 v2.74 (07/21/2019) OV C7000 enclosures 1 OK enc1012, bay 1 tdm1083.example.com 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz 2 262144 2M272101NX Monitored On 2.61 Jul 27 2018 I36 v2.60 (05/21/2018) OV C7000 enclosures 1 OK ENC2004, bay 1 tdm2033.example.com 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz 2 524288 2M262602L2 Monitored On 2.55 Aug 16 2017 I36 v2.52 (10/25/2017) OV C7000 enclosures 1 OK ENC2006, bay 1 vds2009 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz 2 524288 2M263604ZZ Monitored On 2.40 Dec 02 2015 I36 v2.20 (05/05/2016) OV C7000 enclosures 1 OK ENC2011, bay 1 tdm2081.example.com 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz 2 524288 2M2708027Z Monitored On 2.55 Aug 16 2017 I36 v2.52 (10/25/2017) OV C7000 enclosures 1 OK ENC1003, bay 1 tdm1024.example.com 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz 2 524288 2M262602KW Monitored On 2.73 Feb 11 2020 I36 v2.52 (10/25/2017) OV C7000 enclosures 1 OK ENC1006, bay 1 vds1009 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz 2 524288 2M262505V5 Monitored On 2.40 Dec 02 2015 I36 v2.00 (12/28/2015) OV C7000 enclosures 1 OK ENC1007, bay 1 vds1023 1 ProLiant BL460c Gen9 Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz 2 524288 2M264800TR Monitored On 2.50 Sep 23 2016 I36 v2.30 (09/12/2016) OV C7000 enclosures 1

数据帧：

df = pd.read_csv("testcreate.csv", sep="\t") df = df[[ 'Server', 'Server Name', 'Bay #', 'Appliance Name']] df['Bay'] = df['Server'].str.split(',').str[1].str.lower() df['Enclosure'] = df['Server'].str.split(',').str[0].str.upper() df['Server Name'] = df['Server Name'].str.split('.').str[0] df = df.drop(['Server', 'Bay #'], axis=1) df = df[df['Appliance Name'].str.contains('C7000')]

数据帧输出：

Server Name Appliance Name Bay Enclosure 0 tdm2066 OV C7000 enclosures 1 bay 1 ENC2010 1 tdm1068 OV C7000 enclosures 1 bay 1 ENC1011 2 tdm1083 OV C7000 enclosures 1 bay 1 ENC1012 3 tdm2033 OV C7000 enclosures 1 bay 1 ENC2004 4 vds2009 OV C7000 enclosures 1 bay 1 ENC2006 5 tdm2081 OV C7000 enclosures 1 bay 1 ENC2011 6 tdm1024 OV C7000 enclosures 1 bay 1 ENC1003 7 vds1009 OV C7000 enclosures 1 bay 1 ENC1006 8 vds1023 OV C7000 enclosures 1 bay 1 ENC1007 9 vds0003 OV C7000 enclosures 1 bay 1 ENT0003 10 tdm7123 OV C7000 enclosures 1 bay 1 ENC7003 11 tdm2231 OV C7000 enclosures 1 bay 1 ENC2022 12 tdm2186 OV C7000 enclosures 1 bay 1 ENC2018 13 tdm1098 OV C7000 enclosures 1 bay 1 ENC1013 14 tdm1158 OV C7000 enclosures 1 bay 1 ENC1017 15 tdm2096 OV C7000 enclosures 1 bay 1 ENC2012 16 tdm1012 OV C7000 enclosures 1 bay 1 ENC1002 17 tdm1062 OV C7000 enclosures 1 bay 1 ENC1009 18 vds1041 OV C7000 enclosures 1 bay 1 ENC1010 19 vds1001 OV C7000 enclosures 1 bay 1 ENC1005 20 vds7025 OV C7000 enclosures 1 bay 1 ENC7009 21 vds2023 OV C7000 enclosures 1 bay 1 ENC2007 22 tdm7068 OV C7000 enclosures 1 bay 1 ENC7005 23 vds7006 OV C7000 enclosures 1 bay 1 ENC7006 24 tdm2126 OV C7000 enclosures 1 bay 1 ENC2014 25 vds2001 OV C7000 enclosures 1 bay 1 ENC2005 26 tdm1173 OV C7000 enclosures 1 bay 1 ENC1018 27 tdm1250 OV C7000 enclosures 1 bay 1 ENC1025

我尝试的是：

我借了

df1 = pd.concat( [g.set_index('Bay').add_suffix(f'_{n}') for n, g in df.groupby('Enclosure')], axis=1, sort=False).filter( like='Server Name').dropna(how='all', axis=1)

<；-这来自于提到的SO帖子，但我没有完全理解它，而且我不想添加后缀ieServer Name，因此它应该只有likeEnc1002`等

df1 = pd.concat( [g.set_index('Bay').add_suffix(f'_{n}') for n, g in df.groupby('Enclosure')], axis=1, sort=False).filter( like='Server Name').dropna(how='all', axis=1) print(df1)

结果:

编辑：

我从@Scott那里得到了desired解决方案

df = pd.concat([g.set_index('Bay')['Server Name'].rename(f'{n}') for n, g in df.groupby('Enclosure')], axis=1, sort=False)

我的代码可能有点凌乱，如Dataframe下所示，有没有更好的编码方法，只是在这里询问它，以获得更好的建议和代码写作

3条回答

网友

1楼 · 编辑于 2024-04-29 14:52:37

在阅读了您的文章和代码块之后，我看不到操作的太多范围。我想改变它如下，这将提供理想的结果。然而，我在代码审查上看到了类似的帖子

1-您应该选择所需的列，这将减少处理负担和灵活性，您可以使用usecols

2-您可以将df.assign与Dict一起使用，Dict将基于Keys, values从两个集合创建一个Dict，这将通过拆分,创建两个不同的列，因此您可以执行split，rename&drop一次性行动

它将如下所示，这应该是可行的

import pandas as pd
##### Pandas setting in case you want to visualize them on the screen. ####
#pd.set_option('display.height', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('expand_frame_repr', True)
##################### END OF THE Display Settings ###################
# usecols is supposed to provide a filter before reading the whole DataFrame
# into memory; if used properly, there should never be a need to delete columns
# after reading.
df = pd.read_csv("testcreate.csv",
                  usecols=['Server',
                           'Server Name',
                           'Appliance Name'
                          ]
                )
df1  = df.assign(**dict
              (zip
              ('xy',
              df.Server.str.split(',')
              .str
              )
              )
              ).rename(columns=
              {'x': 'Enclosure',
               'y': 'Bay'
              }
              ).drop(['Server'], axis=1)
df1 = df1[
          df1['Appliance Name']
          .str.contains('C7000')
         ]
df1['Server Name'] = df1['Server Name'].str.split('.').str[0].str.lower()

df1['Enclosure']  = df1['Enclosure'].str.upper()

df1 = pd.pivot(df1,
               values='Server Name',
               index='Bay',
               columns='Enclosure'
               ).rename_axis(None)

df1.to_csv("YourCsvFileName.csv")
# Print(df1)

网友

2楼 · 编辑于 2024-04-29 14:52:37

如果我理解正确，您希望旋转您的表：

df1 = pd.pivot(df, values='Server Name', index='Bay', columns='Enclosure')


Enclosure  ENC1003  ENC1006  ENC1007  ENC1011  ENC1012  ENC2004  ENC2006  ENC2010  ENC2011  ENT0003
Bay                                                                                                
bay 1      tdm1024  vds1009  vds1023  tdm1068  tdm1083  tdm2033  vds2009  tdm2066  tdm2081  vds0003

网友

3楼 · 编辑于 2024-04-29 14:52:37

我希望我没弄错

# Read the CSV & assign it to `text`
with open('estcreate.csv', 'r') as fh:
    text = fh.read()

enc = dict()

for line in text.splitlines()[1:]:
    status, enclosure, bay, bay_no, vds, *na = line.split()
    
    enclosure = enclosure.replace(',','').upper()
    vds = vds.lower().split('.')[0]
    
    if enclosure not in enc:
        enc[enclosure] = dict()
    if bay_no not in enc[enclosure]:
        enc[enclosure][bay_no] = vds
    
    
>>> df = pd.DataFrame.from_dict(enc)
>>> df 

    ENC2010 ENC1011 ENC1012 ENC2004 ENC2006 ENC2011 ENC1003 ENC1006 ENC1007 ENT0003 ... ENC1010 ENC1005 ENC7009 ENC2007 ENC7005 ENC7006 ENC2014 ENC2005 ENC1018 ENC1025
1   tdm2066 tdm1068 tdm1083 tdm2033 vds2009 tdm2081 tdm1024 vds1009 vds1023 vds0003 ... vds1041 vds1001 vds7025 vds2023 tdm7068 vds7006 tdm2126 vds2001 tdm1173 tdm1250

>>> df.T
    1
ENC2010 tdm2066
ENC1011 tdm1068
ENC1012 tdm1083
ENC2004 tdm2033
ENC2006 vds2009
ENC2011 tdm2081
...

样本数据：

数据帧：

数据帧输出：

我尝试的是：

期望的：

编辑：

相关问题更多 >

编程相关推荐

热门问题

热门文章