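I want to sample rows from a pandas DataFrame without replacement. Here is what I mean: in each iteration of a for loop, I draw a certain number of rows from COMBINED without replacement. I want to make sure that, across all 50,000 iterations, the same row is never sampled more than once. My code below is an attempt to solve this sampling problem, but I get an error.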
COMBINED, TEMP, MERGED, SAMPLE, SAMPLE_2, and PROBABILITY_GENERATED_POISSON are DataFrames; lst is a list.
Please see my code below:
#FOR LOOP TO SAMPLE FROM COMBINED BASED ON NUMBER OF EVENTS PER YEAR
#AVOIDING REPEATED SAMPLING OF SAME EVENTS
for i in range(50000):
    #IF THERE ARE NO EVENTS FOR THAT PARTICULAR YEAR, THERE WILL BE NO EVENT NUMBER AND NO LOSS
    if PROBABILITY_GENERATED_POISSON.iloc[i,:].item == 0:
        lst.append(0)
    #IF THERE ARE MORE THAN 0 EVENTS FOR THAT YEAR, FOLLOW THE BELOW PROCESS
    else:
        SAMPLE = COMBINED.sample(n = PROBABILITY_GENERATED_POISSON.iloc[i,:],
                                 replace = False,
                                 weights = LOSS_EVENT_SAMPLE_PROBABILITY,
                                 axis = 0)
        SAMPLE['Sample'] = i
        #CREATE TEMP DATA FRAME WHICH CONSISTS OF ALL ROWS SAMPLED IN PREVIOUS ITERATIONS
        #except FUNCTION IS FOR ERROR HANDLING - IT PREVENTS THE LOOP FROM STOPPING MIDWAY
        try:
            TEMP = pd.DataFrame(lst)
            #PERFORM AN INNER JOIN - SELECTING COMMON ROWS FROM TEMP AND SAMPLE
            MERGED = TEMP.merge(SAMPLE, how = "inner")
            #AVOIDING DUPLICATION WITHIN LIST
            #IF THERE ARE NO COMMON ROWS (nrow(MERGED) == 0), THEN INPUT SAMPLE INTO lst
            if MERGED.shape[0] == 0:
                lst.append(SAMPLE)
            else:
                #IF THERE ARE COMMON ROWS (nrow(MERGED) > 0), THEN SAMPLE AGAIN, BUT AFTER EXCLUDING THE COMMON ROWS FROM
                #THE COMBINED DATA FRAME. BY EXCLUDING THE COMMON ROWS, WE ENSURE THAT WE ARE NOT SAMPLING ROWS WHICH
                #WERE SAMPLED IN PREVIOUS ITERATIONS.
                COMBINED_2 = COMBINED.subtract(SAMPLE)
                SAMPLE_2 = COMBINED_2.sample(n = PROBABILITY_GENERATED_POISSON.iloc[i,:],
                                             replace = False,
                                             weights = LOSS_EVENT_SAMPLE_PROBABILITY,
                                             axis = 0)
                SAMPLE_2['Sample'] = i
                lst.append(SAMPLE_2)
        except:
            continue
    print(i)
The error I get is attached as an image.
I would appreciate some feedback on my problem.
Thanks.
Here are two solutions: one that uses the .sample function, and one that uses a simple algorithm equivalent to .sample().
I also fixed an error: PROBABILITY_GENERATED_POISSON needs to be a list.
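The two ideas could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code from the original answer: it assumes COMBINED has a unique index, that the sampling weights are strictly positive and stored in a column (hypothetically named `prob` in the usage note), and that the per-year event counts are a plain list of integers. The function names and the `seed` parameter are made up for this example. Both versions remove the rows already drawn from the pool before the next draw, so there is no need to merge against earlier samples or to resample.

```python
import numpy as np
import pandas as pd


def sample_with_dataframe_sample(combined, counts, weights_col, seed=0):
    """Variant 1: DataFrame.sample on a pool that shrinks after every draw."""
    rng = np.random.RandomState(seed)
    remaining = combined.copy()
    samples = []
    for i, n in enumerate(counts):
        n = int(n)
        if n == 0:
            continue  # no events in this year, so nothing to draw
        if n > len(remaining):
            raise ValueError(f"iteration {i}: only {len(remaining)} rows left")
        sample = remaining.sample(n=n, replace=False,
                                  weights=weights_col, random_state=rng)
        # Drop the drawn rows so later iterations cannot pick them again
        # (this relies on the index labels being unique).
        remaining = remaining.drop(sample.index)
        samples.append(sample.assign(Sample=i))
    if not samples:
        return combined.iloc[0:0].assign(Sample=[])
    return pd.concat(samples)


def sample_with_numpy_choice(combined, counts, weights_col, seed=0):
    """Variant 2: the same idea with a boolean mask and a weighted choice."""
    rng = np.random.RandomState(seed)
    weights = combined[weights_col].to_numpy(dtype=float)
    available = np.ones(len(combined), dtype=bool)
    picked_pos, picked_iter = [], []
    for i, n in enumerate(counts):
        n = int(n)
        if n == 0:
            continue
        candidates = np.flatnonzero(available)  # positions not yet used
        if n > len(candidates):
            raise ValueError(f"iteration {i}: only {len(candidates)} rows left")
        p = weights[candidates]
        chosen = rng.choice(candidates, size=n, replace=False, p=p / p.sum())
        available[chosen] = False  # mark as used for all later iterations
        picked_pos.extend(chosen.tolist())
        picked_iter.extend([i] * n)
    return combined.iloc[picked_pos].assign(Sample=picked_iter)
```

With the names from the question, a call might look like `sample_with_dataframe_sample(COMBINED, counts, "prob")`, where `counts` is PROBABILITY_GENERATED_POISSON converted to a list. The total of the counts must not exceed the number of rows in COMBINED, otherwise no assignment without repeats exists and both functions raise a ValueError.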