Regex pattern for characters in a bytes "string", excluding specific punctuation (Python 3)

Posted 2024-05-26 22:57:52


I have a text file that has to be read in as binary to work. I am trying to extract some data from it and put it into a CSV file.

A sample of the text looks like this:

b' "Title;""Date"";""Abstract"";""Patent Number"";""id"""\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t"The object of my invention is to lessen the rapidity or amount of this diminution \t which I do by the addition of a new step in the process of making the lamp \t as follows : After the lamphas been exhausted of air and hermetically sealed by the fusion of the exhaust-tube in the usual manner \t I connect the lamp"";""12234"";"";1.0" '

I want to extract the pieces between the ";" characters, and tried the following:

contentRegex = re.compile(b'\s{4,}"([\w+\s]+);(\d{4})\.\d;""([\w+\s+]+)"(.+[^;])')

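It seems to work well except for the last part, which keeps grabbing text after it sees the first ";". So the following part of the regex pattern appears to be wrong: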

(.+[^;])
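A minimal sketch of why that last group overruns, using a short made-up bytes value (`sample` is an assumption for illustration, not the real file content). The `.+` is greedy and `.` also matches ";", so the group only has to end on a byte that is not a semicolon; a negated class such as `[^;]+` cannot cross one.

```python
import re

# Hypothetical input, much shorter than the real data.
sample = b'first;second;third'

# Greedy: `.+` runs to the end of the input, then backtracks just far enough
# for `[^;]` to match one non-semicolon byte, so the semicolons are swallowed.
greedy = re.match(rb'(.+[^;])', sample)

# Negated class: `[^;]+` can never match a semicolon, so it stops at the first one.
bounded = re.match(rb'([^;]+)', sample)

print(greedy.group(1))   # the whole input, semicolons included
print(bounded.group(1))  # only the text before the first semicolon
```

Note the `rb'...'` prefix: the pattern must be a bytes object to match bytes input, and the raw prefix keeps escapes such as `\s` from being interpreted by Python before `re` sees them.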

I would appreciate any help!

Thanks


1 Answer

Answer 1 · Posted 2024-05-26 22:57:52

If you really need to use a regexp, try one of the following: ;(.*?); or ;([a-zA-Z"]*?);
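As a rough illustration of the first suggested pattern on bytes input, here is a sketch that assumes a simplified, made-up line (`line` is hypothetical and leaves out the quoting and tabs of the real data). One thing it suggests: because `;(.*?);` consumes both delimiters, `re.findall` skips every other field, so a plain `bytes.split` may be the simpler choice when every field is wanted.

```python
import re

# Hypothetical, simplified record; the real file has extra quotes and tabs.
line = b'lamp;1880;The object of my invention;12234;;1.0'

# Lazy pattern from the answer, written as a raw bytes literal.
# Each match consumes its closing ";", so the next field has no opening ";".
lazy = re.findall(rb';(.*?);', line)

# Without a regexp: split on the delimiter, which keeps every field,
# including empty ones.
fields = line.split(b';')

print(lazy)
print(fields)
```

The second suggested pattern, `;([a-zA-Z"]*?);`, only accepts letters and double quotes inside the group, so it would not match fields that contain digits or spaces.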
