How do I remove one of two duplicate blocks in a file?

Posted 2024-06-16 12:41:45


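I have a tricky problem, and I know there are plenty of Python experts here, so please help me out. I have a huge log file. Its format looks like this: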

[text hello world yadda

          lines lines lines

          exceptions]

[something i'm not interested in]

[text hello world yadda

          lines lines lines

          exceptions]

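And so on... So blocks 1 and 3 are identical, and there are many more cases like this. My question is: how can I read this file and write only the unique blocks to an output file? If a block has duplicates, it should be written only once. Sometimes there are several other blocks between two duplicates. I'm actually doing pattern matching, and this is my code so far. It only matches the pattern and does nothing about duplicates.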

^{pr2}$

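I don't care whether the duplicate removal has to happen in this code. It can also go in a separate .py file; that doesn't matter. Here is a raw snippet of the log file: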

javax.xml.ws.soap.SOAPFaultException: Uncaught BPEL fault http://schemas.xmlsoap.org/soap/envelope/:Server     
    at org.apache.axis2.jaxws.marshaller.impl.alt.MethodMarshallerUtils.createSystemException(MethodMarshallerUtils.java:1326) ~[org.apache.axis2.jar:na]
    at org.apache.axis2.jaxws.marshaller.impl.alt.MethodMarshallerUtils.demarshalFaultResponse(MethodMarshallerUtils.java:1052) ~[org.apache.axis2.jar:na]
    at org.apache.axis2.jaxws.marshaller.impl.alt.DocLitBareMethodMarshaller.demarshalFaultResponse(DocLitBareMethodMarshaller.java:415) ~[org.apache.axis2.jar:na]
    at org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.getFaultResponse(JAXWSProxyHandler.java:597) ~[org.apache.axis2.jar:na]
    at org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.createResponse(JAXWSProxyHandler.java:537) ~[org.apache.axis2.jar:na]
    at org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.invokeSEIMethod(JAXWSProxyHandler.java:403) ~[org.apache.axis2.jar:na]
    at org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.invoke(JAXWSProxyHandler.java:188) ~[org.apache.axis2.jar:na]
com.hcentive.utils.exception.HCRuntimeException: Unable to Find User Profile:null
    at com.hcentive.agent.service.AgentServiceImpl.getAgentByUserProfile(AgentServiceImpl.java:275) ~[agent-service-core-4.0.0.jar:na]
    at com.hcentive.agent.service.AgentServiceImpl$$FastClassByCGLIB$$e3caddab.invoke(<generated>) ~[cglib-2.2.jar:na]
    at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:191) ~[cglib-2.2.jar:na]
    at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:689) ~[spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150) ~[spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:110) ~[spring-tx-3.1.2.RELEASE.jar:3.1.2.RELEASE]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172) ~[spring-aop-3.1.2.RELEASE.jar:3.1.2.RELEASE]
    at org.springframework.security.access.intercept.aopalliance.MethodSecurityInterceptor.invoke(MethodSecurityInterceptor.java:64) ~[spring-security-core-3.1.2.RELEASE.jar:3.1.2.RELEASE]
javax.xml.ws.soap.SOAPFaultException: Uncaught BPEL fault http://schemas.xmlsoap.org/soap/envelope/:Server      
    at org.apache.axis2.jaxws.marshaller.impl.alt.MethodMarshallerUtils.createSystemException(MethodMarshallerUtils.java:1326) ~[org.apache.axis2.jar:na]
    at org.apache.axis2.jaxws.marshaller.impl.alt.MethodMarshallerUtils.demarshalFaultResponse(MethodMarshallerUtils.java:1052) ~[org.apache.axis2.jar:na]
    at org.apache.axis2.jaxws.marshaller.impl.alt.DocLitBareMethodMarshaller.demarshalFaultResponse(DocLitBareMethodMarshaller.java:415) ~[org.apache.axis2.jar:na]
    at org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.getFaultResponse(JAXWSProxyHandler.java:597) ~[org.apache.axis2.jar:na]
    at org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.createResponse(JAXWSProxyHandler.java:537) ~[org.apache.axis2.jar:na]
    at org.apache.axis2.jaxws.client.proxy.JAXWSProxyHandler.invokeSEIMethod(JAXWSProxyHandler.java:403) ~[org.apache.axis2.jar:na]  



And so on and on....

2 answers

You can remove the duplicate blocks with the following:

import re
yourstr = r'''
[text hello world yadda

      lines lines lines

      exceptions]

[something i'm not interested in]

[text hello world yadda

      lines lines lines

      exceptions]
'''
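# Match a bracketed block only if an identical block appears later in the string;
# re.DOTALL lets '.' in the lookahead cross newlines.
pat = re.compile(r'\[([^]]+])(?=.*\[\1)', re.DOTALL)
# Delete every such earlier copy, so the last occurrence of each block survives.
result = pat.sub('', yourstr)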

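Note that this keeps only the last occurrence of each block. If you need the first one instead, you have to reverse the string and use the following pattern: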

^{pr2}$

Then reverse the string again.

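You can use a hashing algorithm from hashlib together with a dictionary like this: {123456789: True}. The value doesn't matter, but for a large file a dict lookup is much faster than searching a list.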

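Either way, you hash each block and, if its hash is not yet in the dictionary, store it there. If it is already in the dictionary, you ignore that block. This assumes the duplicate blocks are exactly identical.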
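The hashing idea can be sketched roughly as follows. This is only a minimal illustration, not the answerer's code: the names `BLOCK_RE` and `unique_blocks` are made up here, blocks are assumed to be delimited by square brackets as in the simplified sample from the question, and a set stands in for the dictionary whose values don't matter.

```python
import hashlib
import re

# Hypothetical block pattern: "[" up to the next "]", as in the simplified sample.
# The real log would need its own rule for where a block starts and ends.
BLOCK_RE = re.compile(r'\[[^\]]*\]')


def unique_blocks(text):
    """Return the blocks found in text, keeping only the first copy of each."""
    seen = set()  # digests of blocks that were already kept
    kept = []
    for block in BLOCK_RE.findall(text):
        digest = hashlib.sha256(block.encode('utf-8')).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(block)
    return kept


sample = "[a\n  lines]\n[b]\n[a\n  lines]\n"
print('\n\n'.join(unique_blocks(sample)))
```

Unlike the regex substitution, this keeps the first occurrence of each block, and it only stores one digest per distinct block. Because the comparison is exact, two blocks that differ only in trailing whitespace count as different; stripping each line before hashing would be one way to handle that, if it fits your data.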
