使用Pandas合并Python中的两个电子表格,按“时间”列中最近的“时间”,值为XX:XX:XX格式

2024-04-27 22:54:28 发布

您现在位置:Python中文网/ 问答频道 /正文

编写我的第一个程序来组织一些数据,我已经练习python好几个月了。这个小程序将在不久的将来与我正在开发的RFID阅读器原型一起使用。我已经成功地获取了.txt和.xls,提取了相关信息,现在我想根据XX:XX:XX格式的时间戳对它们进行匹配。txt是我的GPS的读数,.xls包含RFID标签的标签信息

我只需要根据最近的时间戳将GPS位置与标签ID配对

请看一看我到目前为止所做的一切,并为您提供任何指导或建议

import numpy as np 
import pandas as pd
import xlrd

filename_xls = '11_4_TAG.xls'
df = pd.read_excel(filename_xls)
tag_data=(df[['Time', 'TagID']])
#print(tag_data)

filename = '11_4_GPS.txt'
df_gps = pd.read_csv(filename, delimiter=r"\s+", skiprows=17, skipfooter=3, engine='python', encoding="unicode_escape")
gps_data=(df_gps[['Latitude', 'Longitude', 'Time']])
#print(gps_data)

pd.merge_asof(tag_data, gps_data, on='Time', direction='nearest')
#print(pd.merge_asof)

我试过许多“on”、“direction”和“by”的变体

以下是我的两个新数据集的输出:

标签数据

WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
        Time                     TagID
0   10:18:32  E280116060000207A633DAB6
1   10:18:57  A15427AABB00112233445566
2   10:19:07  E280116060000207A6336C96
3   10:19:09  E280116060000207A6341969
4   10:19:34  E280116060000207A633E5B9
5   10:19:40  E280116060000207A633A846
6   10:19:56  A94439112233445566778899
7   10:20:01  E200001D52120136069068C0
8   10:20:05  E280116060000207A633DA16
9   10:20:07  A63367112233445566778899
10  10:20:12  E280116060000207A633A836
11  10:20:15  E280116060000207A633CBD9
12  10:20:18  E200001D5212006106702126
13  10:20:20  A39223112233445566778899
14  10:20:28  E280116060000207A633DCC6
15  10:20:50  A02257AABB00112233445566
16  10:22:24  E280116060000207A633DA26
17  10:22:44  E280116060000207A6336AC6
18  10:23:43  E280116060000207A633DA46
19  10:24:03  E280116060000207A6336CA6
20  10:24:22  E280116060000207A633DC96
21  10:28:01  C10002AABB00112233445566
22  10:28:05  013193AABB00112233445566
23  10:28:12  017072AABB00112233445566
24  10:28:22  023764AABB00112233445566
25  10:28:42  A15800AABB00112233445566
26  10:28:49  E280116060000207A6336CC6
27  10:28:51  E280116060000207A6344236
28  10:29:00  E280116060000207A6336CB6
29  10:29:01  E280116060000207A633CBB9
30  10:29:08  E280116060000207A6341959
31  10:29:11  A72546AABB00112233445566
32  10:29:15  A93853112233445566778899
33  10:29:15  A93853AABB00112233445566
34  10:30:46  A13832AABB00112233445566
35  10:30:52  A02533AABB00112233445566
36  10:30:58  00111160600002078899CBA9
37  10:31:23  A83503AABB00112206906A73
[Finished in 0.8s]

全球定位系统数据

WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
         Latitude      Longitude      Time
0    N43°03.6205'  W085°57.5513'  10:17:46
1    N43°03.6205'  W085°57.5512'  10:17:49
2    N43°03.6203'  W085°57.5514'  10:17:51
3    N43°03.6202'  W085°57.5511'  10:17:54
4    N43°03.6199'  W085°57.5518'  10:17:57
..            ...            ...       ...
342  N43°03.6162'  W085°57.5477'  10:33:03
343  N43°03.6163'  W085°57.5472'  10:33:06
344  N43°03.6168'  W085°57.5477'  10:33:09
345  N43°03.6167'  W085°57.5477'  10:33:11
346  N43°03.6163'  W085°57.5486'  10:33:14

[347 rows x 3 columns]
[Finished in 0.8s]

错误代码:

WARNING *** OLE2 inconsistency: SSCS size is 0 but SSAT size is non-zero
Traceback (most recent call last):
  File "C:\Users\Owner\Documents\Software\Python Programs\Data_parse\data_parse_xls.py", line 15, in <module>
    pd.merge_asof(tag_data, gps_data, on='Time', direction='nearest')
  File "C:\Users\Owner\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\core\reshape\merge.py", line 563, in merge_asof
    return op.get_result()
  File "C:\Users\Owner\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\core\reshape\merge.py", line 1483, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
  File "C:\Users\Owner\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\core\reshape\merge.py", line 884, in _get_join_info
    (left_indexer, right_indexer) = self._get_join_indexers()
  File "C:\Users\Owner\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\core\reshape\merge.py", line 1789, in _get_join_indexers
    return func(left_values, right_values, self.allow_exact_matches, tolerance)
  File "pandas\_libs\join.pyx", line 966, in pandas._libs.join.__pyx_fused_cpdef
TypeError: No matching signature found
[Finished in 0.8s]

Tags: inpandasdatasizetimeispackagesmerge
1条回答
网友
1楼 · 发布于 2024-04-27 22:54:28

正如@MhDG7所说,您只能在完全匹配的值上进行合并。因此,您的第一个任务是为gps_数据中的每个时间戳查找标记_数据中时间戳之间最接近的匹配。下面的关键行是这一行

tag_secs_for_gps_secs = [min(tag_data['Secs'], key = lambda tag_secs: abs(tag_secs - gps_secs))  for gps_secs in gps_data['Secs'] ]

正是这样。这里,对于以秒表示的每个gps时间(见下文),我们在tag_数据时间戳(也以秒表示)中找到一个元素,该元素最小化了两个时间戳之间的绝对距离

这是完整的脚本 首先,我们创建玩具数据

import sys
import pandas as pd
from io import StringIO
from datetime import datetime

gps_data_raw = StringIO(
    """
Latitude,Longitude,Time
N43°03.6162',W085°57.5477',10:33:03
N43°03.6163',W085°57.5472',10:33:06
N43°03.6168',W085°57.5477',10:33:09
N43°03.6167',W085°57.5477',10:33:11
N43°03.6163',W085°57.5486',10:33:14
    """)
gps_data = pd.read_csv(gps_data_raw)

tag_data_raw = StringIO(
    """
Time,TagID
10:33:01,C10002AABB00112233445566
10:33:05,013193AABB00112233445566
10:33:12,017072AABB00112233445566
10:33:22,023764AABB00112233445566
10:33:42,A15800AABB00112233445566
10:33:49,E280116060000207A6336CC6
10:33:51,E280116060000207A6344236
    """)
tag_data = pd.read_csv(tag_data_raw)

然后,我们将时间戳转换为datetime对象,并最终转换为(整数)秒

base_date = datetime(1900, 1, 1, 0, 0, 0)
gps_data['Time'] = pd.to_datetime(gps_data['Time'], format='%H:%M:%S')
tag_data['Time'] = pd.to_datetime(tag_data['Time'], format='%H:%M:%S')
gps_data['Secs'] = gps_data['Time'].apply(lambda t: int((t-base_date).total_seconds()))
tag_data['Secs'] = tag_data['Time'].apply(lambda t: int((t-base_date).total_seconds()))

然后我们匹配时间戳,将它们粘贴到数据帧中,并在该列上合并

tag_secs_for_gps_secs = [min(tag_data['Secs'], key = lambda tag_secs: abs(tag_secs - gps_secs))  for gps_secs in gps_data['Secs'] ]

gps_data['Nearest_tag_secs'] = tag_secs_for_gps_secs
merged_data = gps_data.merge(tag_data, left_on = 'Nearest_tag_secs', right_on = 'Secs')

resultimg dataframemerged_data可以使用一些清理,但在这里完全复制,以便您可以看到发生了什么

|    | Latitude     | Longitude     | Time_x              |   Secs_x |   Nearest_tag_secs | Time_y              | TagID                    |   Secs_y |
| -:|:      -|:       |:          |    -:|         -:|:          |:            -|    -:|
|  0 | N43°03.6162' | W085°57.5477' | 1900-01-01 10:33:03 |    37983 |              37981 | 1900-01-01 10:33:01 | C10002AABB00112233445566 |    37981 |
|  1 | N43°03.6163' | W085°57.5472' | 1900-01-01 10:33:06 |    37986 |              37985 | 1900-01-01 10:33:05 | 013193AABB00112233445566 |    37985 |
|  2 | N43°03.6168' | W085°57.5477' | 1900-01-01 10:33:09 |    37989 |              37992 | 1900-01-01 10:33:12 | 017072AABB00112233445566 |    37992 |
|  3 | N43°03.6167' | W085°57.5477' | 1900-01-01 10:33:11 |    37991 |              37992 | 1900-01-01 10:33:12 | 017072AABB00112233445566 |    37992 |
|  4 | N43°03.6163' | W085°57.5486' | 1900-01-01 10:33:14 |    37994 |              37992 | 1900-01-01 10:33:12 | 017072AABB00112233445566 |    37992 |

相关问题 更多 >