Extract digits from a text string and move them into a separate column of a dataframe

import pandas as pd

df = pd.read_csv(r'df.txt', header=None)
df.columns = ['Test']
df = df.Test.str.split(expand=True)

            0         1         2        3     4        5     6
0        well         1  20060201  3623.23  0.00  1300.00  None
1        well         1  20060202  3627.07  0.00  1305.00  None
2        well         1  20060203  3576.48  0.00  1305.00  None
...       ...        ..       ...      ...   ...      ...   ...
42089  well14  20201114      0.00     0.00  0.00     None
42090  well14  20201115      0.00     0.00  0.00     None
...       ...        ..       ...      ...   ...      ...   ...
51000    well         7  20201116     0.00  0.00     0.00  None
51001    well         7  20201117     0.00  0.00     0.00  None

Desired result:

             0         1        2     3        4
0       well 1  20060201  3623.23  0.00  1300.00
1       well 1  20060202  3627.07  0.00  1305.00
2       well 1  20060203  3576.48  0.00  1305.00
...        ...       ...      ...   ...      ...
42089  well 14  20201114     0.00  0.00     0.00
42090  well 14  20201115     0.00  0.00     0.00
...        ...       ...      ...   ...      ...
51000   well 7  20201116     0.00  0.00     0.00
51001   well 7  20201117     0.00  0.00     0.00
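To see why the columns end up misaligned, here is a minimal sketch on two made-up lines that mimic the rows above (the sample strings and the names `raw`, `parts`, and `n_fields` are assumptions for illustration, not taken from the original file). Splitting on whitespace gives one field fewer whenever the number is glued to the name:

```python
import pandas as pd

# Hypothetical sample lines: one with "well 1", one with "well14"
raw = pd.DataFrame({
    'Test': [
        "well 1 20060201 3623.23 0.00 1300.00",
        "well14 20201114 0.00 0.00 0.00",
    ]
})

# Split each line on whitespace into separate columns
parts = raw['Test'].str.split(expand=True)

# Count the non-missing fields in each row
n_fields = parts.notna().sum(axis=1)
print(parts)
print(n_fields.tolist())
```

The second row has fewer fields, so everything after the name sits one column too far to the left.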

1 Answer

User

#1 · Posted on 2024-04-27 05:17:02

Let's try:

# extract the name prefix and the optional trailing digits
tmp = df[0].str.extract(r'^(.*\D)(\d+)?$')

# where the names are attached to digits
name_with_digits = tmp[1].notna()

# shift these values horizontally
df.loc[name_with_digits, 1:] = df.loc[name_with_digits, 1:].shift(axis=1)

# update the names
df.loc[name_with_digits,[0,1]] = tmp

# concatenate the names
df[0] = df[0] + ' ' + df[1].astype(str)

# drop unnecessary columns
df = df.drop([1,6], axis=1)

Output:

             0           2        3    4        5
0       well 1  20060201.0  3623.23  0.0  1300.00
1       well 1  20060202.0  3627.07  0.0  1305.00
2       well 1  20060203.0  3576.48  0.0  1305.00
42089  well 14    20201114      0.0  0.0      0.0
42090  well 14    20201115      0.0  0.0      0.0
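As an alternative to shifting columns afterwards, one could normalize the raw lines before splitting. The following is only a sketch of that different technique, not the approach from the answer above. It assumes each line starts with an alphabetic name followed by its number, with or without a space in between; the sample lines and the names `raw`, `normalized`, `parts`, and `result` are hypothetical.

```python
import pandas as pd

# Hypothetical sample lines mimicking the rows shown in the question
raw = pd.Series([
    "well 1 20060201 3623.23 0.00 1300.00",
    "well14 20201114 0.00 0.00 0.00",
    "well 7 20201116 0.00 0.00 0.00",
])

# Force exactly one space between the leading name and its number,
# so that every line splits into the same number of fields
normalized = raw.str.replace(r'^([A-Za-z]+)\s*(\d+)', r'\1 \2', regex=True)

# Split on whitespace: column 0 is the name, column 1 is the number
parts = normalized.str.split(expand=True)

# Join name and number into a single label and drop the helper column
parts[0] = parts[0] + ' ' + parts[1]
result = parts.drop(columns=1)
print(result)
```

Because the text is fixed before the split, every row has the same number of fields and no trailing empty column needs to be dropped. The values remain strings, so a conversion such as `pd.to_numeric` would still be needed for the numeric columns.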
