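Hi, I want to extract some values from one column into another, but I'm having trouble with the regular expression. I want to take two values, (61-150) and (1,1-800 GQ), and extract them into a new column called BOXES. I don't know much about regex, though, and I only seem to be able to grab purely numeric values. How can I get both (61-150) and (1,1-800 GQ) into the BOXES column, and then change the values in BOX DESCRIPTION so they no longer contain those numbers?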
```python
import re
import pandas as pd

df = pd.read_csv('boxstore.csv')
df['BOXES'] = None
# Defining indexes for desired columns
index_description = df.columns.get_loc('BOX DESCRIPTION')
index_boxing = df.columns.get_loc('BOXES')
# Creating a pattern to be extracted
boxing_pattern = r'\((\d+-\d+)\)'
# Loop over the rows, find the pattern and store the matches in the 'BOXES' column
for row in range(0, len(df)):
    store = re.findall(boxing_pattern, df.iat[row, index_description])
    df.iat[row, index_boxing] = store
df.loc[df['BOX DESCRIPTION'] == 'BOXES (1-1,800 GQ) NEW STORE', 'BOX DESCRIPTION'] = 'BOXES NEW STORE'
df.loc[df['BOX DESCRIPTION'] == 'BOXES (1-1,999 SF) NEW STORE', 'BOXES'] = '(1-1,800 GQ)'
df.loc[df['BOX DESCRIPTION'] == 'BOXES (61-150) OLD STORE', 'BOX DESCRIPTION'] = 'BOXES OLD STORE'
print(df.head(265))
```
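A quick check of why the loop only picks up some rows: the pattern `\((\d+-\d+)\)` requires digits, a hyphen and more digits followed directly by `)`, so a thousands separator or a unit such as ` SF` inside the parentheses breaks the match. The sketch below compares it with a broader range pattern. That second pattern is an assumption about the data format (commas allowed in the upper bound, optional ` SF` suffix), not something taken from the question.

```python
import re

# A few values copied from the question's BOX DESCRIPTION column
descriptions = [
    "BOXES STORE (1-1,999 SF) LOW RISK",
    "BOXES (61-150) HIGH RISK",
    "BOXES (151 + ) LOW RISK",
]

# The question's pattern: digits, hyphen, digits, directly inside parentheses
narrow = r"\((\d+-\d+)\)"

# Assumed broader range pattern: the upper bound may contain commas and
# may be followed by an optional " SF" unit before the closing parenthesis
ranged = r"\((\d+-[\d,]+(?: SF)?)\)"

for text in descriptions:
    print(re.findall(narrow, text), re.findall(ranged, text))
```

This prints `[] ['1-1,999 SF']`, then `['61-150'] ['61-150']`, then `[] []`: the original pattern misses the value with a comma, while neither pattern treats `(151 + )` as a range.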
I just want to extract the following information: BOXES (1-1,999 SF) LOW RISK and BOXES (61-150) LOW RISK.
Sample dataframe:

```
    BOX DESCRIPTION
0   NEW STORE
1   BOXES STORE (1-1,999 SF) LOW RISK
2   BOXES (61-150) HIGH RISK
3   BOXES (0-30) MODERATE RISK
4   BOXES (151 + ) HIGH RISK
5   BOXES (151 + ) LOW RISK
6   BOXES (151 + ) MODERATE RISK
7   BOXES (31-60) LOW RISK
8   BOXES (0-30) HIGH RISK
9   BOXES (31-60) HIGH RISK
10  BOXES (0-30) LOW RISK
11  BOXES (2,000+ SF) MODERATE RISK
12  BOXES (2,000+ SF) LOW RISK
13  BOXES (2,000+ SF) HIGH RISK
14  BOXES STORE (1-1,999 SF) MODERATE
15  BOXES STORE (1-1,999 SF) HIGH RISK
16  BOXES (61-150) LOW RISK
17  BOXES (61-150) MODERATE RISK
18  BOXES (31-60) MODERATE RISK
```
Expected output:

```
    BOX DESCRIPTION
0   NEW STORE BOXES
1   BOXES STORE LOW RISK (1,1-999 SF)
2   BOXES LOW RISK (61 - 150)
3   BOXES (0-30) MODERATE RISK
4   BOXES (151 + ) HIGH RISK
5   BOXES (151 + ) LOW RISK
6   BOXES (151 + ) MODERATE RISK
7   BOXES (31-60) LOW RISK
```
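You can use `str.extract` to pull the desired pattern out of `BOX DESCRIPTION`.

Alternatively, you can first extract the desired pattern from `BOX DESCRIPTION` and assign it to `BOXES`, then `replace` that pattern and assign the result back to `BOX DESCRIPTION`.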
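The answer's own code is not reproduced on this page. As a rough sketch of the second approach (extract, then replace), assuming the ranges always look like `(61-150)` or `(1-1,999 SF)`, something like the following should work. The sample rows are copied from the question; the regex is an assumption, not the answerer's pattern.

```python
import pandas as pd

# Small stand-in for boxstore.csv, using rows from the question's sample
df = pd.DataFrame({"BOX DESCRIPTION": [
    "NEW STORE",
    "BOXES STORE (1-1,999 SF) LOW RISK",
    "BOXES (61-150) HIGH RISK",
    "BOXES (151 + ) LOW RISK",
]})

# Assumed pattern: a parenthesised range such as "(61-150)" or "(1-1,999 SF)".
# The outer parentheses are inside the capture group so they are kept in BOXES.
pattern = r"(\(\d+-[\d,]+(?: SF)?\))"

# str.extract returns the first capture group (NaN where nothing matches);
# expand=False gives a Series instead of a one-column DataFrame
df["BOXES"] = df["BOX DESCRIPTION"].str.extract(pattern, expand=False)

# Remove the matched text, then collapse the double space it leaves behind
df["BOX DESCRIPTION"] = (
    df["BOX DESCRIPTION"]
    .str.replace(pattern, "", regex=True)
    .str.replace(r"\s{2,}", " ", regex=True)
    .str.strip()
)

print(df)
```

Rows without a matching range, such as `NEW STORE` or `(151 + )`, get `NaN` in `BOXES` and keep their description unchanged; widen the pattern if those should be moved as well.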