Perl or Python: Convert a date from dd/mm/yyyy to yyyy-mm-dd


数据工程师

Asked 2025-04-16 06:26

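I have a lot of dates in a CSV file in day/month/year format, for example 17/01/2010, and I want to convert them to year-month-day format, i.e. 2010-01-17.

How can I do this conversion in Perl or Python?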

Tags: string manipulation, data cleaning, CSV processing, date format conversion, perl

8 Answers

Use Time::Piece (in core since Perl 5.9.5). It is very similar to the Python solution, because it provides both strptime and strftime:

use Time::Piece;
my $dt_str = Time::Piece->strptime('13/10/1979', '%d/%m/%Y')->strftime('%Y-%m-%d');

Or, from the command line:

$ perl -MTime::Piece
print Time::Piece->strptime('13/10/1979', '%d/%m/%Y')->strftime('%Y-%m-%d');
1979-10-13
$
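Since this answer points at the Python counterpart, here is a minimal Python sketch of the same strptime/strftime round trip. The helper name `to_iso` is hypothetical; returning None for an impossible date is just one possible way to surface the validation that strptime performs.

```python
from datetime import datetime
from typing import Optional


def to_iso(date_text: str) -> Optional[str]:
    """Convert 'dd/mm/yyyy' to 'yyyy-mm-dd'; return None if it is not a real date."""
    try:
        parsed = datetime.strptime(date_text, "%d/%m/%Y")
    except ValueError:
        # strptime rejects malformed input and impossible dates such as 31 February.
        return None
    return parsed.strftime("%Y-%m-%d")


print(to_iso("13/10/1979"))  # 1979-10-13
print(to_iso("31/02/2010"))  # None
```

Unlike a plain split-and-reverse, this route checks that the day and month are actually valid.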

Answered 2025-04-16 by Python大师

If your data is well behaved and contains nothing but a single date in DD/MM/YYYY format, this will do:

# FIRST METHOD
my $ndate = join("-" => reverse split(m[/], $date));

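This works when $date holds something like "07/04/1776", but it fails on "this 17/01/2010 and that 01/17/2010 there", because everything between the slashes gets reversed, not just the date fields. To avoid that, use: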

# SECOND METHOD
($ndate = $date) =~ s{
    \b
      ( \d \d   )
    / ( \d \d   )
    / ( \d {4}  )
    \b
}{$3-$2-$1}gx;
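For comparison, the same idea can be sketched with Python's standard re module. This is only a rough counterpart of the substitution above, not a drop-in translation; the names `DATE_RE` and `rewrite_dates` are made up for the example.

```python
import re

# dd/mm/yyyy delimited by word boundaries, mirroring the Perl pattern above.
DATE_RE = re.compile(r"\b(\d\d)/(\d\d)/(\d{4})\b")


def rewrite_dates(text: str) -> str:
    """Rewrite every dd/mm/yyyy found in text as yyyy-mm-dd."""
    return DATE_RE.sub(r"\3-\2-\1", text)


print(rewrite_dates("this 17/01/2010 and that 17/01/2010 there"))
# this 2010-01-17 and that 2010-01-17 there
```

Like the Perl version, this only rearranges the fields; it does not check that the day and month are valid.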

If you want a more "grammatical" regex, one that is easier to maintain and update, you can use this instead:

# THIRD METHOD
($ndate = $date) =~ s{
    (?&break)

              (?<DAY>    (?&day)    )
    (?&slash) (?<MONTH>  (?&month)  )
    (?&slash) (?<YEAR>   (?&year)   )

    (?&break)

    (?(DEFINE)
        (?<break> \b     )
        (?<slash> /      )
        (?<year>  \d {4} )
        (?<month> \d {2} )
        (?<day>   \d {2} )
    )
}{
    join "-" => @+{qw<YEAR MONTH DAY>}
}gxe;
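Python's standard re module has no (?(DEFINE)...) block, but a loosely similar layout can be approximated by naming the fragments in a dictionary and formatting them into a verbose pattern. The sketch below is a workaround under that assumption, not an equivalent feature; `PARTS`, `DATE_RE` and `rewrite_dates` are hypothetical names.

```python
import re

# Reusable fragments, standing in for the (?(DEFINE)...) block.
PARTS = {
    "brk": r"\b",
    "slash": r"/",
    "day": r"\d{2}",
    "month": r"\d{2}",
    "year": r"\d{4}",
}

# re.VERBOSE lets the pattern be laid out with whitespace, like Perl's /x.
DATE_RE = re.compile(
    r"""
    {brk}
              (?P<DAY>   {day}   )
    {slash}   (?P<MONTH> {month} )
    {slash}   (?P<YEAR>  {year}  )
    {brk}
    """.format(**PARTS),
    re.VERBOSE,
)


def rewrite_dates(text: str) -> str:
    """Rewrite dd/mm/yyyy as yyyy-mm-dd using the named groups."""
    return DATE_RE.sub(
        lambda m: "-".join(m.group("YEAR", "MONTH", "DAY")), text
    )


print(rewrite_dates("from 17/01/2010 on"))  # from 2010-01-17 on
```

The fragments are pasted in textually, so unlike (?&name) calls they cannot be recursive.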

Finally, if you have Unicode data, you may need to be more careful:

# FOURTH METHOD
($ndate = $date) =~ s{
    (?&break_before)
              (?<DAY>    (?&day)    )
    (?&slash) (?<MONTH>  (?&month)  )
    (?&slash) (?<YEAR>   (?&year)   )
    (?&break_after)

    (?(DEFINE)
        (?<slash>     /                  )
        (?<start>     \A                 )
        (?<finish>    \z                 )

        # don't really want to use \D or [^0-9] here:
        (?<break_before>
           (?<= [\pC\pP\pS\p{Space}] )
         | (?<= \A                )
        )
        (?<break_after>
            (?= [\pC\pP\pS\p{Space}]
              | \z
            )
        )
        (?<digit> \d            )
        (?<year>  (?&digit) {4} )
        (?<month> (?&digit) {2} )
        (?<day>   (?&digit) {2} )
    )
}{
    join "-" => @+{qw<YEAR MONTH DAY>}
}gxe;

You can see how the four methods behave on these sample input strings:

my $sample  = q(17/01/2010);
my @strings =  (
    $sample,  # trivial case

    # multiple case
    "this $sample and that $sample there",

    # multiple case with non-ASCII BMP code points
    # U+201C and U+201D are LEFT and RIGHT DOUBLE QUOTATION MARK
    "from \x{201c}$sample\x{201d} through\xA0$sample",

    # multiple case with non-ASCII code points
    #   from both the BMP and the SMP 
    # code point U+02013 is EN DASH, props \pP \p{Pd}
    # code point U+10179 is GREEK YEAR SIGN, props \pS \p{So}
    # code point U+110BD is KAITHI NUMBER SIGN, props \pC \p{Cf}
    "\x{10179}$sample\x{2013}\x{110BD}$sample",
);

Now, letting $date be a foreach iterator over that array, we get this output:

Original is:   17/01/2010
First method:  2010-01-17
Second method: 2010-01-17
Third method:  2010-01-17
Fourth method: 2010-01-17

Original is:   this 17/01/2010 and that 17/01/2010 there
First method:  2010 there-01-2010 and that 17-01-this 17
Second method: this 2010-01-17 and that 2010-01-17 there
Third method:  this 2010-01-17 and that 2010-01-17 there
Fourth method: this 2010-01-17 and that 2010-01-17 there

Original is:   from “17/01/2010” through 17/01/2010
First method:  2010-01-2010” through 17-01-from “17
Second method: from “2010-01-17” through 2010-01-17
Third method:  from “2010-01-17” through 2010-01-17
Fourth method: from “2010-01-17” through 2010-01-17

Original is:   𐅹17/01/2010–𑂽17/01/2010
First method:  2010-01-2010–𑂽17-01-𐅹17
Second method: 𐅹2010-01-17–𑂽2010-01-17
Third method:  𐅹2010-01-17–𑂽2010-01-17
Fourth method: 𐅹2010-01-17–𑂽2010-01-17

Now suppose you really do want to match non-ASCII digits, such as:

   U+660  ARABIC-INDIC DIGIT ZERO
   U+661  ARABIC-INDIC DIGIT ONE
   U+662  ARABIC-INDIC DIGIT TWO
   U+663  ARABIC-INDIC DIGIT THREE
   U+664  ARABIC-INDIC DIGIT FOUR
   U+665  ARABIC-INDIC DIGIT FIVE
   U+666  ARABIC-INDIC DIGIT SIX
   U+667  ARABIC-INDIC DIGIT SEVEN
   U+668  ARABIC-INDIC DIGIT EIGHT
   U+669  ARABIC-INDIC DIGIT NINE

or even

 U+1D7F6  MATHEMATICAL MONOSPACE DIGIT ZERO
 U+1D7F7  MATHEMATICAL MONOSPACE DIGIT ONE
 U+1D7F8  MATHEMATICAL MONOSPACE DIGIT TWO
 U+1D7F9  MATHEMATICAL MONOSPACE DIGIT THREE
 U+1D7FA  MATHEMATICAL MONOSPACE DIGIT FOUR
 U+1D7FB  MATHEMATICAL MONOSPACE DIGIT FIVE
 U+1D7FC  MATHEMATICAL MONOSPACE DIGIT SIX
 U+1D7FD  MATHEMATICAL MONOSPACE DIGIT SEVEN
 U+1D7FE  MATHEMATICAL MONOSPACE DIGIT EIGHT
 U+1D7FF  MATHEMATICAL MONOSPACE DIGIT NINE

Imagine you have a date written in mathematical monospace digits, like this:

$date = "\x{1D7F7}\x{1D7FD}/\x{1D7F7}\x{1D7F6}/\x{1D7F8}\x{1D7F6}\x{1D7F7}\x{1D7F6}";

The Perl code works just fine on that:

Original is:   𝟷𝟽/𝟷𝟶/𝟸𝟶𝟷𝟶
First method:  𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
Second method: 𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
Third method:  𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
Fourth method: 𝟸𝟶𝟷𝟶-𝟷𝟶-𝟷𝟽
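As a point of comparison, here is a small Python 3 sketch run against the same mathematical monospace date. It relies on `\d` matching any Unicode decimal digit in a str pattern, and it adds an optional step (not part of the Perl code above) that folds the digits to ASCII; `ascii_digits` is a hypothetical helper name.

```python
import re
import unicodedata

DATE_RE = re.compile(r"\b(\d\d)/(\d\d)/(\d{4})\b")


def ascii_digits(text: str) -> str:
    """Replace every Unicode decimal digit with its ASCII counterpart."""
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch
        for ch in text
    )


# 17/10/2010 written with MATHEMATICAL MONOSPACE DIGIT code points.
date = (
    "\U0001D7F7\U0001D7FD/\U0001D7F7\U0001D7F6/"
    "\U0001D7F8\U0001D7F6\U0001D7F7\U0001D7F6"
)

converted = DATE_RE.sub(r"\3-\2-\1", date)  # fields reordered, digits unchanged
print(ascii_digits(converted))  # 2010-10-17
```

Whether to keep the original digits or normalize them depends on what consumes the output; the folding step is only an illustration.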

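I think you will find that Python has a rather poor Unicode model: its lack of support for abstract characters and strings makes writing code like this very hard.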

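It is also hard to write readable regexes in Python, because you cannot separate the declaration of subexpressions from their execution: (?(DEFINE)...) blocks are not supported by the standard re module. In fact, re does not even support Unicode properties such as \pP or \p{Space}. For that reason, it is a poor fit for Unicode regex work.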
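To make the missing-properties point concrete, here is a rough sketch of how the boundary test from the fourth method might be approximated in Python by consulting unicodedata instead of \pC, \pP and \pS. It is an approximation under stated assumptions, not an equivalent: `is_break` and `rewrite_dates` are hypothetical names, and because rejected candidates are skipped whole, overlapping candidates are not retried.

```python
import re
import unicodedata

# Candidate dates; the boundary check is done in Python, not in the pattern.
CANDIDATE = re.compile(r"(\d{2})/(\d{2})/(\d{4})")


def is_break(ch: str) -> bool:
    """True for general categories C*, P*, S*, or for whitespace."""
    return unicodedata.category(ch)[0] in "CPS" or ch.isspace()


def rewrite_dates(text: str) -> str:
    """Rewrite dd/mm/yyyy as yyyy-mm-dd when delimited by break characters."""

    def swap(m):
        before = text[m.start() - 1] if m.start() > 0 else ""
        after = text[m.end()] if m.end() < len(text) else ""
        # Start and end of string count as breaks, like \A and \z above.
        if (before and not is_break(before)) or (after and not is_break(after)):
            return m.group(0)
        return "-".join(m.group(3, 2, 1))

    return CANDIDATE.sub(swap, text)


print(rewrite_dates("from \u201c17/01/2010\u201d through\xa017/01/2010"))
```

The check lives outside the pattern, which is exactly the kind of split between regex and host code that the Perl version avoids.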

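Still, if you think Python is bad compared with Perl here (and it is), try any other language. I have yet to find one that does better than Python at this sort of work.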

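As you can see, you run into real problems when you ask for regex solutions in more than one language. First, the solutions are hard to compare because the regex flavors differ. Second, no other language matches Perl for power, expressiveness, and maintainability in its regexes, and the gap becomes even more obvious once arbitrary Unicode is involved.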

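So if you only wanted a Python solution, you should have asked only for that. Otherwise it is a terribly unfair contest that Python will nearly always lose: getting this kind of thing right in Python is simply too messy, let alone getting it both right and clean. That is asking too much of it.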

Perl's regexes, by contrast, do well on both counts.

Answered 2025-04-16 by Python大师


>>> from datetime import datetime
>>> datetime.strptime('02/11/2010', '%d/%m/%Y').strftime('%Y-%m-%d')
'2010-11-02'

Or a more "hackish" way (which does not check that the values are valid):

>>> '-'.join('02/11/2010'.split('/')[::-1])
'2010-11-02'
>>> '-'.join(reversed('02/11/2010'.split('/')))
'2010-11-02'
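Since the question is about dates stored in a CSV file, here is a minimal sketch of applying the strptime approach to one column with the standard csv module. The column name `date`, the helper `convert_csv`, and the in-memory sample data are all assumptions made for the example.

```python
import csv
import io
from datetime import datetime


def convert_csv(src, dst, column="date"):
    """Copy CSV rows from src to dst, rewriting `column` to yyyy-mm-dd."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # strptime raises ValueError if the cell is not a valid dd/mm/yyyy date.
        parsed = datetime.strptime(row[column], "%d/%m/%Y")
        row[column] = parsed.strftime("%Y-%m-%d")
        writer.writerow(row)


# In-memory stand-ins for real files, to keep the sketch self-contained.
source = io.StringIO("id,date\n1,17/01/2010\n2,02/11/2010\n")
target = io.StringIO()
convert_csv(source, target)
print(target.getvalue(), end="")
```

With real files, the same function can be given handles opened with `open(path, newline="")`.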

Answered 2025-04-16 by Python大师
