Check the data format of multiple columns and append the results to one column of the table

   id    room   area           situation
0   1   A-102  world  under construction
1   2     NaN     24  under construction
2   3    B309    NaN                 NaN
3   4   C·102     25    under decoration
4   5  E_1089  hello    under decoration
5   6      27    NaN          under plan
6   7      27    NaN                 NaN
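To try the answers below, the sample table can be rebuilt as a DataFrame. This is a minimal sketch rather than the asker's own code: it assumes `room` and `area` hold strings, with NaN for the empty cells, which is what the `.str` calls in the answers require.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above; room and area are assumed to be
# strings (object dtype), with NaN marking the empty cells.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "room": ["A-102", np.nan, "B309", "C·102", "E_1089", "27", "27"],
    "area": ["world", "24", np.nan, "25", "hello", np.nan, np.nan],
    "situation": ["under construction", "under construction", np.nan,
                  "under decoration", "under decoration", "under plan", np.nan],
})

print(df)
```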

   id    room   area           situation  check
0   1   A-102  world  under construction  area is not numbers
1   2     NaN     24  under construction  incorrect room name
2   3    B309    NaN                 NaN  NaN
3   4   C·102     25    under decoration  incorrect room name; decoration is in the content
4   5  E_1089  hello    under decoration  incorrect room name; area is not numbers; decoration is in the content
5   6      27    NaN          under plan  NaN
6   7      27    NaN                 NaN  NaN

   id    room   area           situation  check
0   1   A-102  world  under construction  nan
1   2     NaN     24  under construction  nan
2   3    B309    NaN                 NaN  nan
3   4   C·102     25    under decoration  incorrect room name
4   5  E_1089  hello    under decoration  incorrect room name
5   6      27    NaN          under plan  nan
6   7      27    NaN                 NaN  nan

   id    room   area           situation  check
0   1   A-102  world  under construction  area is not a numbers
1   2     NaN     24  under construction  nan
2   3    B309    NaN                 NaN  nan
3   4   C·102     25    under decoration  incorrect room name
4   5  E_1089  hello    under decoration  area is not a numbers
5   6      27    NaN          under plan  nan
6   7      27    NaN                 NaN  nan

   id    room   area           situation  check
0   1   A-102  world  under construction  decoration is in the content
1   2     NaN     24  under construction  decoration is in the content
2   3    B309    NaN                 NaN  nan
3   4   C·102     25    under decoration  incorrect room name
4   5  E_1089  hello    under decoration  area is not a numbers
5   6      27    NaN          under plan  decoration is in the content
6   7      27    NaN                 NaN  nan

3 Answers

User

Answer 1 · edited 2024-05-13 02:54:36

I slightly modified your conditions so that the result is closer to your expected output:

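import numpy as np
import pandas as pd

# .notnull() only tests whether a match result exists, so this flags missing room names
a = np.where(df.room.str.match(r'^[a-zA-Z\d\-]*$').notnull(), pd.NA, 'incorrect room name')
# area must be present and consist solely of digits
b = np.where(df["area"].str.isnumeric() & df["area"].notnull(), pd.NA, 'area is not a numbers')
# flag rows whose situation contains 'under decoration' (missing values count as no match)
c = np.where(df.situation.str.contains('under decoration', na = False), 'decoration is in the content', pd.NA)

# put the three message arrays side by side, drop the NA entries,
# then join whatever messages remain in each row
s = (pd.concat([pd.Series(i, index=df.index) for i in (a, b, c)], axis = 1)
       .stack().dropna().groupby(level = 0).agg("; ".join))

print(df.assign(check=s))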

   id    room   area           situation                                              check
0   1   A-102  world  under construction                              area is not a numbers
1   2     NaN     24  under construction                                incorrect room name
2   3    B309    NaN                 NaN                              area is not a numbers
3   4   C·102     25    under decoration                       decoration is in the content
4   5  E_1089  hello    under decoration  area is not a numbers; decoration is in the co...
5   6      27    NaN          under plan                              area is not a numbers
6   7      27    NaN                 NaN                              area is not a numbers
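The `.notnull()` call in `a` only reports whether a room name is present, so a name such as C·102 is not flagged. One possible variant of the same stack-and-join idea tests the match result itself. It is a sketch under two assumptions taken from the expected output in the question: a missing room counts as incorrect, while a missing area or situation is not reported. The sample frame is rebuilt inline so the block runs on its own, and `str.fullmatch` is swapped in for `isnumeric`.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt for illustration; room and area are assumed to be strings.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "room": ["A-102", np.nan, "B309", "C·102", "E_1089", "27", "27"],
    "area": ["world", "24", np.nan, "25", "hello", np.nan, np.nan],
    "situation": ["under construction", "under construction", np.nan,
                  "under decoration", "under decoration", "under plan", np.nan],
})

# None means "nothing to report" for that test.
# A missing room is treated as incorrect (na=False).
a = np.where(df["room"].str.match(r"^[a-zA-Z\d\-]*$", na=False),
             None, "incorrect room name")
# A missing area is accepted (na=True); otherwise it must be all digits.
b = np.where(df["area"].str.fullmatch(r"\d+", na=True),
             None, "area is not a numbers")
# A missing situation is not flagged (na=False).
c = np.where(df["situation"].str.contains("under decoration", na=False),
             "decoration is in the content", None)

# Place the message arrays side by side, reshape to long form, drop the
# empty slots, then join the messages that remain for each row.
s = (pd.concat([pd.Series(i, index=df.index) for i in (a, b, c)], axis=1)
       .stack().dropna().groupby(level=0).agg("; ".join))

# Rows without any message are absent from s and become NaN on assignment.
result = df.assign(check=s)
print(result)
```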

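User

Answer 2 · edited 2024-05-13 02:54:36

First change each test so that it returns None when there is nothing to report, then zip the arrays together and apply a custom function that joins the non-missing values, or returns NaN when every value is missing: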

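import numpy as np
import pandas as pd

# None marks "nothing to report"; a missing room counts as incorrect (na=False)
a = np.where(df.room.str.match(r'^[a-zA-Z\d\-]*$', na = False), None,
                               'incorrect room name')
# a missing area is accepted (na=True)
b = np.where(df.area.str.contains(r'^\d+$', na = True), None,
                                 'area is not a numbers')
# a missing situation is not flagged (na=False)
c = np.where(df.situation.str.contains('under decoration', na = False),
                                      'decoration is in the content', None)


# join the messages that are present; return NaN when a row has none
f = (lambda x: ';'.join(y for y in x if pd.notna(y))
                if any(pd.notna(np.array(x))) else np.nan )
df['check'] = [f(x) for x in zip(a,b,c)]
print(df)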
   id    room   area           situation  \
0   1   A-102  world  under construction   
1   2     NaN     24  under construction   
2   3    B309    NaN                 NaN   
3   4   C·102     25    under decoration   
4   5  E_1089  hello    under decoration   
5   6      27    NaN          under plan   
6   7      27    NaN                 NaN   

                                               check  
0                              area is not a numbers  
1                                incorrect room name  
2                                                NaN  
3   incorrect room name;decoration is in the content  
4  incorrect room name;area is not a numbers;deco...  
5                                                NaN  
6                                                NaN
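The `na=` arguments in the tests above decide how a missing cell is treated, which matters because `np.where` counts NaN as true. A short sketch on made-up values (the array and Series below are illustrative, not taken from the post):

```python
import numpy as np
import pandas as pd

# An object array mixing booleans with NaN, as a string test can produce
# for missing cells when na= is not given.
mask_with_nan = np.array([True, False, np.nan], dtype=object)
# NaN is truthy, so the third entry is picked from the "true" branch.
print(np.where(mask_with_nan, "flagged", "ok"))

# Hypothetical column with one missing cell.
situation = pd.Series(["under decoration", "under plan", np.nan], dtype=object)

# na=False: a missing cell counts as "no match".
treat_missing_as_no_match = situation.str.contains("under decoration", na=False)
# na=True: a missing cell counts as "match".
treat_missing_as_match = situation.str.contains("under decoration", na=True)

print(treat_missing_as_no_match.tolist())
print(treat_missing_as_match.tolist())
```

Choosing `na=False` or `na=True` per test therefore fixes whether an empty cell ends up with a message or not.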

User

Answer 3 · edited 2024-05-13 02:54:36

You can try the following:

import os
import glob
import pandas as pd

# switch to the folder that holds the workbooks
# (a raw string literal cannot end with a backslash)
os.chdir(r"C:\Users\Rameez PC\Desktop\python data files 2")

extension = 'xlsx'
all_filenames = glob.glob('*.{}'.format(extension))

# combine all files in the list
combined_xlsx1 = pd.concat([pd.read_excel(f) for f in all_filenames])
# export to Excel (to_excel in pandas 2.x does not accept an encoding argument)
combined_xlsx1.to_excel("combined.xlsx", index=False)
