How to replace the last four digits 9999 with 0101 in a Pandas DataFrame (Python)
I have a DataFrame that looks like this:
OrdNo year
1 20059999
2 20070830
3 20070719
4 20030719
5 20039999
6 20070911
7 20050918
8 20070816
9 20069999
Wherever the last four digits of year are 9999, I want to replace them with 0101. How can I do that?
Thanks!
2 Answers
0
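I wrote a script that walks through how to handle this. Note that this version is deliberately verbose and could be simplified, but I tried to keep it easy to follow step by step.
If you are a beginner, a good exercise is to first think through (or write down) the steps needed to solve the problem, and then look through the documentation of the relevant library to see whether it offers a good solution.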
import pandas as pd

# Creating dataframe
data = [[1, 20059999], [2, 20070830], [3, 20070719], [4, 20030719], [5, 20039999],
        [6, 20070911], [7, 20050918], [8, 20070816], [9, 20069999]]
df = pd.DataFrame(data, columns=['OrdNo', 'year'])

# Iterating through dataframe
for index, row in df.iterrows():
    # Here we take the columns from the row we are in right now
    OrdNo = row['OrdNo']
    year = row['year']

    # Taking last four digits from year int. We need to convert the year int to string to do this.
    # -4: basically tells the code to start at the end (-), move 4 characters back (4) and return
    # everything from that point to the end (:)
    lastfour = str(year)[-4:]

    # Check if last four digits are 9999 (as string, because lastfour is a string)
    if lastfour == "9999":
        # If true, replace the 9999 with 0101
        # First we take the year but remove the last four digits (the 9999)
        year = str(year)[:-4]

        # Then we add 0101 to the year
        newyear = year + "0101"

        # Now convert it back to int
        newyear = int(newyear)

        # And put it back in the dataframe
        # We use loc to find based on the OrdNo and then we replace the year column by our new value
        df.loc[df['OrdNo'] == OrdNo, 'year'] = newyear

# Let's print the result
print(df.to_string(index=False))
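As one possible simplification of the loop above, the same replacement can be sketched without iterating, using integer arithmetic on the whole column. This assumes year is an integer column of 8-digit values, as in the question; the name mask is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "OrdNo": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "year": [20059999, 20070830, 20070719, 20030719, 20039999,
             20070911, 20050918, 20070816, 20069999],
})

# True for rows whose last four digits are 9999
mask = df["year"] % 10000 == 9999

# Zero out the last four digits, then add 101 (the integer form of "0101")
df.loc[mask, "year"] = df.loc[mask, "year"] // 10000 * 10000 + 101

print(df.to_string(index=False))
```

This avoids the int-to-string round trip and touches only the matching rows, which tends to matter on larger frames.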
2
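Assuming your year column is of string type (i.e. text):
# regex=True is needed in pandas 2.0+, where str.replace treats the pattern as a literal string by default
df["year"] = df["year"].str.replace(r"9999$", "0101", regex=True)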
If it is a numeric type instead:
df["year"] = pd.to_numeric(df["year"].astype(str).str.replace(r"9999$", "0101", regex=True), errors="coerce")
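To see what the `$` anchor does, here is a small self-contained sketch on a sample Series. The values and the names years, as_text and as_int are made up for illustration; the last value starts with 9999 rather than ending with it, so it should be left alone. Note that in pandas 2.0 and later `str.replace` does a literal match unless `regex=True` is passed.

```python
import pandas as pd

# Sample values; the last one has 9999 at the start, not at the end
years = pd.Series([20059999, 20070830, 20039999, 99992005])

# "$" anchors the pattern to the end of the string, so only a trailing 9999 matches
as_text = years.astype(str).str.replace(r"9999$", "0101", regex=True)

# Convert back to integers once the text replacement is done
as_int = as_text.astype("int64")

print(as_int.tolist())
```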