Best way to identify and extract dates from text in Python?

User
#1 · edited 2024-05-14 11:14:49

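If you can identify the segments that actually contain date information, parsing them with parsedatetime is fairly simple. There are two things to keep in mind, though: your dates have no year, and you should pick a locale.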
>>> import parsedatetime
>>> p = parsedatetime.Calendar()
>>> p.parse("December 15th")
((2013, 12, 15, 0, 13, 30, 4, 319, 0), 1)
>>> p.parse("9/18 11:59 pm")
((2014, 9, 18, 23, 59, 0, 4, 319, 0), 3)
>>> # It chooses 2014 since that's the *next* occurrence of 9/18
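The year rollover seen in that session can be reproduced with the standard library alone. This is only a minimal sketch, not how parsedatetime does it internally: the helper name `next_occurrence` and the explicit `today` argument are assumptions made for illustration, and the reference date 2013-11-15 is inferred from the day-of-year value 319 in the output above.

```python
from datetime import date


def next_occurrence(month, day, today):
    """Return the first date with this month/day that falls on or after `today`."""
    # Hypothetical helper: try the current year first, then later years.
    # Scanning nine years (instead of just adding 1) covers Feb 29,
    # which can be up to eight years away.
    for year in range(today.year, today.year + 9):
        try:
            candidate = date(year, month, day)
        except ValueError:
            continue  # e.g. Feb 29 in a non-leap year
        if candidate >= today:
            return candidate
    raise ValueError("no valid date for month=%r, day=%r" % (month, day))


reference = date(2013, 11, 15)  # assumed "today" for the session above
print(next_occurrence(12, 15, reference))  # still ahead in 2013
print(next_occurrence(9, 18, reference))   # already past, so rolls to 2014
```

Passing `today` explicitly keeps the function deterministic, which makes it easier to test than one that reads the system clock.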
parsedatetime does not always work perfectly when the input contains extraneous text.
>>> p.parse("9/19 LAB: Serial encoding")
((2014, 9, 19, 0, 15, 30, 4, 319, 0), 1)
>>> p.parse("9/19 LAB: Serial encoding (Section 2.2)")
((2014, 2, 2, 0, 15, 32, 4, 319, 0), 1)
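Honestly, this seems like a problem simple enough to solve by parsing for a few specific formats and picking the most likely match out of each sentence. Beyond that, it would make a decent machine learning problem.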
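As a rough sketch of that format-specific approach: match only the date shapes you expect with regular expressions, then take the earliest valid match in each line as the most likely date. The two patterns, the helper name `extract_date`, the explicit `year` argument, and the earliest-match heuristic are all assumptions for illustration; real data would need its own patterns.

```python
import re
from datetime import date

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

# Numeric month/day such as "9/19".
NUMERIC = re.compile(r"\b(\d{1,2})/(\d{1,2})\b")
# Month name plus day such as "December 15th".
NAMED = re.compile(
    r"\b(" + "|".join(MONTHS) + r")\s+(\d{1,2})(?:st|nd|rd|th)?\b",
    re.IGNORECASE,
)


def extract_date(text, year):
    """Return the earliest-positioned valid date in `text`, or None."""
    candidates = []  # (position, month, day)
    for m in NUMERIC.finditer(text):
        candidates.append((m.start(), int(m.group(1)), int(m.group(2))))
    for m in NAMED.finditer(text):
        month = MONTHS.index(m.group(1).lower()) + 1
        candidates.append((m.start(), month, int(m.group(2))))
    for _, month, day in sorted(candidates):
        try:
            return date(year, month, day)
        except ValueError:
            continue  # matched the shape but is not a real date, e.g. "13/45"
    return None


print(extract_date("9/19 LAB: Serial encoding (Section 2.2)", 2014))
```

Because "2.2" matches neither pattern, the trailing section number is ignored and the line resolves to September 19.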

User
#2 · edited 2024-05-14 11:14:49

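I was looking for a solution to this too and could not find one, so a friend and I built a tool for it. I thought I would come back and share it in case others find it useful.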
datefinder -- find and extract dates inside text

User
#3 · edited 2024-05-14 11:14:49

import datefinder
string_with_dates = """
                    entries are due by January 4th, 2017 at 8:00pm
                    created 01/15/2005 by ACME Inc. and associates.
                    """
# find_dates yields a datetime object for each date it finds in the text
matches = datefinder.find_dates(string_with_dates)
for match in matches:
    print(match)
