How to apply string methods to Python regular expression matches

Posted 2024-06-11 18:41:20


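I have a Markdown file that is somewhat broken: links and images that are too long contain line breaks. I want to remove the line breaks inside them.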

Example:

From:

See for example the
[installation process for Ubuntu
Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

![https://diasporafoundation.org/assets/pages/about/network-
distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
distributed-e941dd3e345d022ceae909beccccbacd.png)

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_

To:

See for example the
[installation process for Ubuntu Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The
project offers a Vagrant installation too, but the documentation only admits
that you know what you do, that you are a developer. If it is difficult to

![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)

_A pretty decentralized network (Source: <https://diasporafoundation.org/>)_

As you can see in this regex101 demo, I managed to match all the links and images with the right pattern: https://regex101.com/r/uL8pO4/2

But now, in Python, what is the syntax for applying a string method such as string.trim() to what the regex captured?

For now, I'm stuck at this:

import re

fix_newlines = re.compile(r'\[([\w\s*:/]*)\]\(([^()]+)\)')
# Capture the links and remove line breaks from their URLs
# Something like r'[\1](\2)'.trim() ??
post['content'] = fix_newlines.sub(r'[\1](\2)', post['content'])
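Why the commented idea cannot work as written: r'[\1](\2)'.trim() (in Python, .strip()) would run once on the template string itself, before sub fills in any groups, so the captured text never passes through the method. A quick check on the broken link from the example above:

```python
import re

fix_newlines = re.compile(r'\[([\w\s*:/]*)\]\(([^()]+)\)')

# A link whose text is broken across two lines, as in the example above.
text = "[installation process for Ubuntu\nTrusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty)"

# .strip() acts on the template '[\1](\2)', which has no surrounding
# whitespace, so it comes back unchanged.
template = r'[\1](\2)'.strip()

# The substitution re-inserts the groups verbatim, newline included.
result = fix_newlines.sub(template, text)
print(result == text)  # True
```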

Edit: I updated the example to make my problem clearer.

Thanks for your answers.


3 answers

This also works:

>>> s = """
...    ![https://diasporafoundation.org/assets/pages/about/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-
... distributed-e941dd3e345d022ceae909beccccbacd.png)
... """

>>> new_s = "".join(s.strip().split('\n'))
>>> new_s
'![https://diasporafoundation.org/assets/pages/about/network-distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png)'
>>> 

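Built-in string functions are often enough, and they are easier to read than a regular expression. Here, strip removes leading and trailing whitespace, split returns the list of pieces between the newlines, and join puts them back together into a single string.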
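The same strip/split/join pipeline, one step at a time, on just the image path from the session above (the variable names are illustrative). As a side note, "".join(x.split('\n')) gives the same result as x.replace('\n', ''):

```python
s = "\n   data/images/network-\ndistributed-e941dd3e345d022ceae909beccccbacd.png\n"

stripped = s.strip()           # remove leading/trailing whitespace
pieces = stripped.split('\n')  # list of the pieces between newlines
joined = "".join(pieces)       # glue the pieces back into one string

print(pieces)
# ['data/images/network-', 'distributed-e941dd3e345d022ceae909beccccbacd.png']
print(joined)
# data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png

# Same result with a single call:
print(joined == stripped.replace('\n', ''))  # True
```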

strip works like trim. Since you need to trim newlines, use strip('\n'):

fin.readline().strip('\n')
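One caveat worth checking: strip('\n') only removes newlines at the start and end of a string, which suits a line read with readline() but not a newline in the middle of a broken link. replace('\n', '') removes every newline. A quick comparison on an illustrative string built from the example's image path:

```python
broken = "data/images/network-\ndistributed-e941dd3e345d022ceae909beccccbacd.png\n"

# strip('\n') trims only the trailing newline; the inner one survives.
print(repr(broken.strip('\n')))
# 'data/images/network-\ndistributed-e941dd3e345d022ceae909beccccbacd.png'

# replace('\n', '') removes every newline, wherever it is.
print(broken.replace('\n', ''))
# data/images/network-distributed-e941dd3e345d022ceae909beccccbacd.png
```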

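OK, I finally found what I was looking for. With the snippet below, re.sub calls a function for each regex match, so I can capture a string with the regex and then process each match individually: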

import re

def remove_newlines(match):
    return "".join(match.group().strip().split('\n'))

links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')
post['content'] = links_pattern.sub(remove_newlines, post['content'])
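One thing to watch for: because remove_newlines deletes every newline in the whole match, the link text "installation process for Ubuntu" + newline + "Trusty" comes out as "UbuntuTrusty", while the desired output has a space. Below is a sketch of a per-group variant. It assumes that a newline right after a hyphen (as in the broken image names) should simply be dropped, that any other newline in the bracketed text should become a space, and that newlines in the URL should always be dropped. The helper name fix_match is hypothetical:

```python
import re

links_pattern = re.compile(r'\[([\w\s*:/\-\.]*)\]\(([^()]+)\)')


def remove_newlines(match):
    # The callback above, kept for comparison.
    return "".join(match.group().strip().split('\n'))


def fix_match(match):
    # Hypothetical helper: clean the bracketed text and the URL separately.
    text, url = match.group(1), match.group(2)
    # Assumption: "-\n" marks a name broken at a hyphen, so join it directly.
    text = text.replace('-\n', '-')
    # Assumption: any other newline in the text stood for a space.
    text = text.replace('\n', ' ')
    # Newlines are never meaningful inside the URL, so drop them all.
    url = url.replace('\n', '')
    return '[' + text + '](' + url + ')'


content = (
    "See for example the\n"
    "[installation process for Ubuntu\n"
    "Trusty](https://wiki.diasporafoundation.org/Installation/Ubuntu/Trusty). The\n"
    "\n"
    "![https://diasporafoundation.org/assets/pages/about/network-\n"
    "distributed-e941dd3e345d022ceae909beccccbacd.png](data/images/network-\n"
    "distributed-e941dd3e345d022ceae909beccccbacd.png)\n"
)

whole_match = links_pattern.sub(remove_newlines, content)
per_group = links_pattern.sub(fix_match, content)

print(whole_match)  # link text reads "...for UbuntuTrusty"
print(per_group)    # link text reads "...for Ubuntu Trusty"
```

The leading "!" of the image sits outside the match, so it is preserved as-is. The hyphen rule is only a heuristic for this sample; a link text that really contains a hyphen at a line end would be joined without a space.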

Thanks for your answers, and sorry if my question was not clear enough.
