I have two DataFrames, shown below:
DataFrame 1
timestamp_read base
1508025600009 A
1508025600088 G
1508025600156 C
1508025600200 T
1508025600257 T
1508025600307 C
1508025600403 G
1508025600476 G
1508025600550 D
1508025600596 G
1508025600606 D
1508025600658 G
DataFrame 2
timestamp_read base
1508025600009 A
1508025600101 G
1508025600104 C
1508025600174 T
1508025600233 T
1508025600233 T Additional T
1508025600238 C
1508025600266 G
1508025600268 G Missing D
1508025600285 G
1508025600393 D
1508025600455 G
1508025600460 A Additional A
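timestamp_read is an epoch timestamp. DataFrame 1 and DataFrame 2 should be identical, but they are not: the diagnostics run on two different machines, so there is some latency between them. Sometimes a result is missing on one machine but present on the other, and vice versa. What is the best way to merge these two DataFrames while accounting for the latency difference? I suspect the solution may involve massively parallel signature sequencing, but I am keen to hear suggestions.
Expected output: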
timestamp_read base
1508025600009 A
1508025600101 G
1508025600104 C
1508025600174 T
1508025600233 T
1508025600233 T Additional T
1508025600238 C
1508025600266 G
1508025600268 G Missing D
1508025600272 D Synthetically generated timestamp based on
distance from other points in original timeseries
is optional.
1508025600285 G
1508025600393 D
1508025600455 G
1508025600460 A Additional A
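
One possible way to get this output, offered as a rough sketch rather than a definitive answer: ignore the timestamps for matching, align the two `base` columns with a longest-common-subsequence (LCS) alignment, keep every row of DataFrame 2, and insert the rows that appear only in DataFrame 1. The function names `lcs_pairs` and `merge_by_alignment` and the `source` column are made up for this illustration. The timestamp of an inserted row is filled by plain linear interpolation between its neighbours, which is an assumption and gives a different value than the 1508025600272 shown above.

```python
import pandas as pd


def lcs_pairs(a, b):
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one optimal alignment.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1  # a[i] has no partner in b
        else:
            j += 1  # b[j] has no partner in a
    return pairs


def merge_by_alignment(primary, secondary):
    """Keep all rows of `primary` and insert rows found only in `secondary`."""
    a = primary["base"].tolist()
    b = secondary["base"].tolist()
    ts = primary["timestamp_read"].tolist()
    rows = []
    pi = pj = 0
    # The sentinel pair flushes whatever is left after the last match.
    for i, j in lcs_pairs(a, b) + [(len(a), len(b))]:
        rows += [(ts[k], a[k], "primary") for k in range(pi, i)]
        # Rows only in `secondary` get no timestamp yet (NaN placeholder).
        rows += [(float("nan"), b[k], "secondary") for k in range(pj, j)]
        if i < len(a):
            rows.append((ts[i], a[i], "both"))
        pi, pj = i + 1, j + 1
    out = pd.DataFrame(rows, columns=["timestamp_read", "base", "source"])
    # Assumption: synthetic timestamps come from linear interpolation.
    out["timestamp_read"] = (
        out["timestamp_read"]
        .astype("float64")
        .interpolate(limit_direction="both")
        .round()
        .astype("int64")
    )
    return out


df1 = pd.DataFrame({
    "timestamp_read": [
        1508025600009, 1508025600088, 1508025600156, 1508025600200,
        1508025600257, 1508025600307, 1508025600403, 1508025600476,
        1508025600550, 1508025600596, 1508025600606, 1508025600658,
    ],
    "base": list("AGCTTCGGDGDG"),
})
df2 = pd.DataFrame({
    "timestamp_read": [
        1508025600009, 1508025600101, 1508025600104, 1508025600174,
        1508025600233, 1508025600233, 1508025600238, 1508025600266,
        1508025600268, 1508025600285, 1508025600393, 1508025600455,
        1508025600460,
    ],
    "base": list("AGCTTTCGGGDGA"),
})

# DataFrame 2 is treated as the reference; DataFrame 1 only fills gaps.
merged = merge_by_alignment(df2, df1)
print(merged)
```

An LCS alignment is not unique in general (for example, which of the three T rows counts as the extra one), so the `source` labels should be read as one plausible assignment. Swapping the arguments makes DataFrame 1 the reference instead.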