Matching overlapping intervals in Python (dictionary)

from datetime import datetime

# same_streams_old = {
#   "Stream_1": "0:24:08.925167",
#   "Stream_2": "0:24:08.990644",
#   "Stream_3": "0:24:08.990644",
#   "Stream_4": "0:24:12.118778",
#   "Stream_5": "0:24:12.118778",
#   "stream_6": "0:24:10.075066"
# }
same_streams = {
  "Stream_1": "0:24:08.925167",
  "Stream_2": "0:24:12.118778",
  "Stream_3": "0:23:11.057711",
  "Stream_4": "0:24:12.118778",
  "Stream_5": "0:24:10.075066",
  "Stream_6": "0:24:08.990644"
}
keys = []
values = []
final_synced_video_files = []
final_non_synced_video_files = []

def get_time_diff(episode_run_time, episode_time):
    prev_episode_time = datetime.strptime(episode_run_time, '%H:%M:%S.%f')
    current_episode_time = datetime.strptime(episode_time, '%H:%M:%S.%f')
    time_diff = prev_episode_time - current_episode_time
    if current_episode_time > prev_episode_time:
        time_diff = current_episode_time - prev_episode_time
    return float(time_diff.seconds)

for key, value in same_streams.items():
    keys.append(key)
    values.append(value)

for key in keys:
    for _key in keys:
        if key != _key:
            diff = get_time_diff(same_streams[key], same_streams[_key])
            if diff <= 1.5:
                final_synced_video_files.append(key)
            else:
                pass

final_synced_video_files = list(set(final_synced_video_files))
final_non_synced_video_files = list(set(keys) - set(final_synced_video_files))
print("Synced Files : {0}".format(final_synced_video_files))
print("Non Synced Files : {0}".format(final_non_synced_video_files))

1 answer

User

#1 · Posted 2024-04-25 23:09:30

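OK, the print routines below may look a bit messy, but keep in mind that they are only for debugging; you probably won't need them once you're done. I know this answer is long, but please read it carefully.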

Short answer:

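Your problem most likely comes from reading the time difference through the seconds attribute of the timedelta: that attribute holds only the whole-second component, so the fractional part is lost. A timedelta object, however, has a method called total_seconds() that gives sub-second resolution. See the timedelta documentation for details. Simply change the return statement of get_time_diff() to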

 return float(time_diff.total_seconds())
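
To see the difference between the two accessors in isolation, here is a minimal sketch using two of the timestamps from the question. The helper name `time_diff_seconds` and the constant `FMT` are made up for this illustration and are not part of the original code.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"  # same format string as in the question

def time_diff_seconds(a, b):
    """Absolute difference between two H:M:S.f strings, in fractional seconds.

    Hypothetical helper, shown only to illustrate total_seconds().
    """
    delta = datetime.strptime(a, FMT) - datetime.strptime(b, FMT)
    return abs(delta).total_seconds()

# Stream_4 minus Stream_1 from the question's data
delta = datetime.strptime("0:24:12.118778", FMT) - datetime.strptime("0:24:08.925167", FMT)

print(delta.seconds)          # 3  (whole-second component only)
print(delta.total_seconds())  # 3.193611  (fraction preserved)
print(time_diff_seconds("0:24:08.925167", "0:24:12.118778"))
```

Using `abs()` on the timedelta also removes the need to swap the operands by hand when the second timestamp is the later one.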

Motivation for the long answer:

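I'm not sure what you are trying to achieve with the synced and non-synced lists: you could end up with stream a in sync with stream b, and stream c in sync with stream d, while c and d are not in sync with a and b. Should they all be in your synced list? Depending on what you want to do with the lists, I would consider using the sync matrix described below instead, since the lists lose a lot of information.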
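
The ambiguity can be reproduced with a small sketch. The stream names and offsets below are invented purely for illustration (plain float seconds instead of timestamp strings); only the 1.5 second threshold comes from the question.

```python
from itertools import combinations

# Hypothetical offsets in seconds, chosen only to illustrate the ambiguity.
offsets = {"a": 0.0, "b": 0.5, "c": 10.0, "d": 10.2}
MAX_DESYNC = 1.5  # threshold used in the question

# Same idea as the question's loop: a stream counts as "synced"
# as soon as it has at least one partner within the threshold.
synced = set()
for k1, k2 in combinations(offsets, 2):
    if abs(offsets[k1] - offsets[k2]) <= MAX_DESYNC:
        synced.update((k1, k2))

print(sorted(synced))  # ['a', 'b', 'c', 'd']
```

All four streams land in the same list even though the pair a/b is almost ten seconds away from the pair c/d, so the flat list cannot tell you which streams actually belong together.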

Long answer:

Let me introduce the concept of a sync matrix. It gives a complete description of which of your streams are in sync with each other:

THE SYNC MATRIX: A symmetric matrix; Cell (i,j) in the matrix is TRUE if, and only if, stream 'i' and 'j' are in sync. Else, the cell value is FALSE. Hence, the diagonal (.) is entirely TRUE because a stream is always in sync with itself.
     1 2 3 4
    ________
 1 | . T T F
 2 |   . T F
 3 |     . F
 4 |       .
"T" is true, and "F" is false: obviously from the example drawing above, stream 1 is in sync with stream 2, but not in sync with stream 4.

For your example, creating such a sync matrix is straightforward:

def is_synced(key_1, key_2):    
    max_allowed_desync = 1.5
    return max_allowed_desync > get_time_diff(same_streams[key_1], same_streams[key_2])

# sorted() returns a list; in Python 3, dict.keys() is a view without .sort().
# Sorting is VERY IMPORTANT for the sync matrix to be constructed correctly; also make 's' uppercase for "stream_6" in OP.
keys = sorted(same_streams)

# The complete matrix ..
full_sync_matrix = [[is_synced(k1,k2) for k2 in keys] for k1 in keys]

# We can optimize (memory usage) to only get the half matrix, since it's symmetric anyway; also excluding the diagonal.
half_sync_matrix = [[is_synced(k1,k2) for k2 in keys[curr+1:]] for curr,k1 in enumerate(keys)]

Now let's implement two functions that print/display the sync matrix:

# Print a HALFED sync matrix
def print_sync_half_matrix(sm):
    string = ""
    for i,row in enumerate(sm):
        string += "\n" + " "*i*2
        for col in row:
            string += " " + ("T" if col else "F")
    print(string)

# Print a COMPLETE sync_matrix
def print_sync_full_matrix(sm):
    string = ""
    for row in sm:
        string += "\n"
        for col in row:
            string += " " + ("T" if col else "F")
    print(string)

Then, for the data set you provided, I get:

same_streams = {
  "Stream_1": "0:24:08.925167",
  "Stream_2": "0:24:08.990644",
  "Stream_3": "0:24:08.990644",
  "Stream_4": "0:24:12.118778",
  "Stream_5": "0:24:12.118778",
  "Stream_6": "0:24:10.075066"
} # note that "Stream_6" previously had a lower case 's'!

print_sync_half_matrix(half_sync_matrix)
#   1 2 3 4 5 6
# 1   T T F F T
# 2     T F F T
# 3       F F T
# 4         T F
# 5           F

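Keep in mind that the diagonal is not included in the matrix/printout! The result here is correct, as expected for this input. Let's print some of the time differences for more insight: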

for stream_key in same_streams:
    print("Stream_1 ~ "+stream_key+": "+str(get_time_diff(same_streams["Stream_1"], same_streams[stream_key])))

.. and it quickly becomes apparent that your timestamps have lost their decimal precision:

Stream_1 ~ Stream_5: 3.0
Stream_1 ~ Stream_4: 3.0
# ...

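If we look at the datetime documentation, we see that a timedelta stores its seconds as an integer, with the microseconds held in a separate field. So when you ask for seconds from the timedelta object in your get_time_diff function, the microsecond precision is lost. Requesting the seconds through the timedelta method .total_seconds() instead solves it.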
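
Putting the pieces together, the following sketch applies the corrected difference to the data from the question and lists, for each stream, the other streams it is in sync with (one row of the sync matrix per stream). The names `FMT`, `MAX_DESYNC` and `partners` are assumptions introduced for this example, and the `<=` comparison follows the question's code rather than any fixed rule.

```python
from datetime import datetime

FMT = "%H:%M:%S.%f"
MAX_DESYNC = 1.5  # seconds, same threshold as in the question

same_streams = {
    "Stream_1": "0:24:08.925167",
    "Stream_2": "0:24:12.118778",
    "Stream_3": "0:23:11.057711",
    "Stream_4": "0:24:12.118778",
    "Stream_5": "0:24:10.075066",
    "Stream_6": "0:24:08.990644",
}

def get_time_diff(t1, t2):
    """Absolute difference in fractional seconds (uses total_seconds())."""
    delta = datetime.strptime(t1, FMT) - datetime.strptime(t2, FMT)
    return abs(delta.total_seconds())

keys = sorted(same_streams)

# For every stream, collect the other streams within the allowed desync.
partners = {
    k1: [
        k2 for k2 in keys
        if k2 != k1
        and get_time_diff(same_streams[k1], same_streams[k2]) <= MAX_DESYNC
    ]
    for k1 in keys
}

for key, others in partners.items():
    print(key, "->", others)
```

With this data, Stream_3 ends up with an empty partner list, while the remaining streams fall into two separate groups, which is exactly the information a single synced list would hide.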
