Reading a SubRip file in Python and getting a replacement character where "é" is expected

2 Answers

User

Answer 1 · edited 2024-05-28 18:43:04

The SubRip .srt file format really only supports the Microsoft Windows text encoding default of CP-1252 (commonly, but incorrectly, referred to as ANSI). A Unicode byte order mark can be added to support any Unicode encoding with UTF-8 being preferred for its compatibility with CP-1252. However, a number of embedded hardware-based players only have support for non-Unicode fonts due to the licensing costs associated with the commercial fonts used.

However, decoding your bytes as CP-1252 gives:

>>> # Python 3 form of the original Python 2 one-liner
>>> print(b'\x41\x6d\xef\xbf\xbd\x72\x69\x63\x61'.decode('cp1252'))
Amï¿½rica

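So, presumably, you excerpted this from a larger file that begins with a BOM (byte order mark) indicating its encoding, but discarded that BOM (and with it the encoding information). As an alternative theory, the original file (and the software that produced and read it) may simply not conform to the SubRip standard, or may have been produced for use with hardware that only has non-Unicode font sets.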
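If the BOM theory holds, the encoding can be recovered by checking the first bytes of the complete file before decoding. The following is a minimal Python 3 sketch of that idea, not a definitive reader: the helper name `decode_srt_bytes` is hypothetical, and the fallback to CP-1252 when no BOM is found is an assumption based on the format description above.

```python
import codecs

# Hypothetical helper for illustration; the name and the CP-1252
# fallback are assumptions, not part of any SubRip library.
def decode_srt_bytes(raw: bytes) -> str:
    """Decode SubRip bytes using the BOM if present, else CP-1252."""
    # UTF-32 BOMs are checked before UTF-16 because the UTF-32-LE BOM
    # (ff fe 00 00) begins with the UTF-16-LE BOM (ff fe).
    boms = [
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, encoding in boms:
        if raw.startswith(bom):
            # Strip the BOM, then decode the rest with the encoding it names.
            return raw[len(bom):].decode(encoding)
    # No BOM: assume the CP-1252 default. A few byte values are undefined
    # in Python's cp1252 codec and would raise UnicodeDecodeError here.
    return raw.decode("cp1252")
```

To use it, read the file in binary mode (`open(path, "rb").read()`) and pass the bytes in, so that the BOM is still there to inspect.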

User

Answer 2 · edited 2024-05-28 18:43:04

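The sequence "ef-bf-bd" is the UTF-8 encoding of U+FFFD (REPLACEMENT CHARACTER), a special code point that is displayed as "�", as described in your question. So something (Python?) must have replaced the original character with this code point. Your terminal therefore appears to be fine.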

In UTF-8, the character "é", U+00E9 (LATIN SMALL LETTER E WITH ACUTE), would instead be encoded as "c3 a9".
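These byte values are easy to check yourself. Here is a small Python 3 sketch (the variable names are only illustrative, and `bytes.hex` with a separator needs Python 3.8 or later):

```python
# U+FFFD REPLACEMENT CHARACTER encoded as UTF-8
replacement_utf8 = "\ufffd".encode("utf-8").hex(" ")

# U+00E9 (é) encoded as UTF-8 and as CP-1252
e_acute_utf8 = "\u00e9".encode("utf-8").hex(" ")
e_acute_cp1252 = "\u00e9".encode("cp1252").hex(" ")

print(replacement_utf8)   # ef bf bd
print(e_acute_utf8)       # c3 a9
print(e_acute_cp1252)     # e9
```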

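Conceivably, your original subtitles were encoded in CP1252, where "é" is represented by the byte 0xe9. Since the next byte is 0x72 ('r'), which is not a valid UTF-8 continuation byte, a parser may have treated 0xe9 as an incomplete UTF-8 sequence and therefore replaced "e9" with "ef bf bd" (the replacement character).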
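That hypothesis can be reproduced in Python 3. The sketch below assumes the word was "América" stored as CP-1252 and that some tool decoded it as UTF-8 with replacement of invalid bytes. Whether that is what actually happened to your file is a guess.

```python
# Assumed original: "América" stored as CP-1252 bytes (é is the single byte 0xe9).
cp1252_bytes = "Am\u00e9rica".encode("cp1252")

# A lenient UTF-8 decoder sees 0xe9 (a 3-byte lead byte) followed by 'r',
# which is not a continuation byte, and substitutes U+FFFD.
mangled = cp1252_bytes.decode("utf-8", errors="replace")

# Writing the mangled text back out as UTF-8 yields the bytes from the question.
reencoded = mangled.encode("utf-8")

# Decoding the original bytes with the right codec recovers the text.
recovered = cp1252_bytes.decode("cp1252")

print(reencoded)   # b'Am\xef\xbf\xbdrica'
print(recovered)   # América
```

Note that once the substitution has been written to disk, the original 0xe9 byte is gone, so the fix has to be applied when reading the original file, not the already converted one.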
