Merging two DataFrames (same columns) with different timestamps

Posted 2024-04-25 04:00:57


I have two DataFrames, as shown below:

DataFrame 1

timestamp_read  base
1508025600009   A
1508025600088   G
1508025600156   C
1508025600200   T
1508025600257   T
1508025600307   C
1508025600403   G
1508025600476   G
1508025600550   D
1508025600596   G
1508025600606   D
1508025600658   G

DataFrame 2

timestamp_read  base    
1508025600009   A   
1508025600101   G   
1508025600104   C   
1508025600174   T   
1508025600233   T   
1508025600233   T   Additional T
1508025600238   C   
1508025600266   G   
1508025600268   G   Missing D
1508025600285   G   
1508025600393   D   
1508025600455   G   
1508025600460   A   Additional A

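timestamp_read is an epoch timestamp. DataFrame 1 and DataFrame 2 should be identical, but they are not: the diagnostics run on two different machines, so there is some latency between them. Sometimes a result is missing on one machine but present on the other, and vice versa. Given the difference in latency, what is the best way to merge these two DataFrames? I suspect the solution may involve massively parallel signature sequencing, but I am keen to hear suggestions.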

Desired output:

timestamp_read  base    
1508025600009   A   
1508025600101   G   
1508025600104   C   
1508025600174   T   
1508025600233   T   
1508025600233   T   Additional T
1508025600238   C   
1508025600266   G   
1508025600268   G   Missing D
1508025600272   D   Synthetically generated timestamp based on 
                    distance from other points in original timeseries 
                    is optional.
1508025600285   G   
1508025600393   D   
1508025600455   G   
1508025600460   A   Additional A
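
One possible starting point, offered as a sketch and not a complete answer: pandas' merge_asof can pair each row of one frame with the nearest-in-time row of the other that has the same base, within a tolerance. The example uses only a few rows from the tables above. The tolerance of 100 and the helper column name timestamp_other are assumptions made for illustration and would need tuning against the real latency. It does not enforce one-to-one matching and it drops rows that exist only in DataFrame 2, so it will not by itself produce the desired output.

```python
import pandas as pd

# A few rows taken from DataFrame 1 and DataFrame 2 above (not the full data).
df1 = pd.DataFrame({
    "timestamp_read": [1508025600009, 1508025600088, 1508025600156,
                       1508025600200, 1508025600550],
    "base": ["A", "G", "C", "T", "D"],
})
df2 = pd.DataFrame({
    "timestamp_read": [1508025600009, 1508025600101, 1508025600104,
                       1508025600174],
    "base": ["A", "G", "C", "T"],
})

# merge_asof keeps only the left frame's key column, so copy the right-hand
# timestamp into a separate (hypothetical) column to see what was matched.
right = df2.assign(timestamp_other=df2["timestamp_read"])

matched = pd.merge_asof(
    df1.sort_values("timestamp_read"),    # both inputs must be sorted on the key
    right.sort_values("timestamp_read"),
    on="timestamp_read",
    by="base",                # only pair rows that carry the same base
    direction="nearest",      # look both backward and forward in time
    tolerance=100,            # assumed maximum drift, in the timestamp's units
)

# Rows with no partner within the tolerance get NaN in timestamp_other.
matched["drift"] = matched["timestamp_other"] - matched["timestamp_read"]
print(matched)
```

Rows of DataFrame 1 whose timestamp_other is NaN are candidates for reads missing on the second machine. Rows of DataFrame 2 that are never matched would have to be found with a second pass in the opposite direction.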
