Extracting text from a file with Python

0 votes

4 answers

1877 views

Asked 2025-04-17 05:00

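I have a text file that I open in my Python code. I want to search the file and extract the text that follows a particular symbol. For example, my text file is named File.txt and contains: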

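Hello, this is just a dummy file that has information with no substance at all and I want to pull the information between the dollar sign symbols. So all of this $ in between here should be pulled out so I can do what ever I want to with it $ and the rest of this will be a second group.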

Here is a sample of my code:

class FileExtract(object):
    def __init__(self):
        pass

    def extractFile(self):
        file = open("File.txt")
        wholeFile = file.read()
        file.close()
        symCount = wholeFile.count("$")
        count = 0  # counts each $ as it is found
        begin = False  # True once an opening $ has been found and copying has started
        myWant = []  # collects the portion I want
        for word in wholeFile.split():
            while(count != symCount):
                if word != "$" and begin == False:
                    break
                if word == "$" and begin == False:
                    myWant.append(word)
                    begin = True
                    count = count + 1  # found one of the total symbols
                    break
                elif word != "$" and begin == True:
                    myWant.append(word)
                    break
                elif word == "$" and begin == True:
                    begin = False
                    break
        print(myWant)

I would like it to print:

"$ in between here should be pulled out so I can do what ever I want to with it" 
"$ and the rest of this will be a second group."

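This is the only way I could think of to extract the text (I know it's terrible, please go easy on me, I'm just starting to learn). The problem is that my approach puts the text into a list, and I would like it to print the string directly, preserving spaces, line breaks, and so on. Any suggestions, or other built-in functions/methods I have overlooked that could help?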


4 Answers

This is actually quite simple. There is no need to split the text or store the results in a list:

def extractFile(self):
    file = open("File.txt")
    wholeFile = file.read()
    file.close()

    pos = wholeFile.find("$")
    while pos != -1:  # find() returns -1 when there is no further "$"
        pos2 = wholeFile.find("$", pos + 1)  # search after the current "$"

        if pos2 != -1:
            print(wholeFile[pos:pos2])
        else:
            print(wholeFile[pos:])
        pos = pos2

Answered 2025-04-17 by Python大师


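You can use wholeFile.split('$') to process the file contents. That gives you a list of three parts: the text before the first $, the text between the two $ symbols, and the text after the second $. (The $ symbols themselves are not included in the list.)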
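As a minimal sketch of what that split returns, assuming the sample text from the question is already held in a string (the names `text`, `parts`, `before`, `between`, and `after` are illustrative and not from the original post):

```python
# Sample text from the question; in practice this would come from file.read().
text = (
    "Hello, this is just a dummy file that has information with no substance "
    "at all and I want to pull the information between the dollar sign symbols. "
    "So all of this $ in between here should be pulled out so I can do what ever "
    "I want to with it $ and the rest of this will be a second group."
)

parts = text.split("$")

# Unpacking like this only works when the text has exactly two "$" symbols.
before, between, after = parts

print(len(parts))     # number of pieces produced by the split
print(repr(between))  # repr() makes the surrounding spaces visible
```

Note that the spaces around each `$` stay attached to the pieces, so nothing in the original text is lost apart from the `$` symbols themselves.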

Alternatively, print('\n$'.join(wholeFile.split('$'))) prints the result with a line break inserted before every $, keeping the symbol itself.

If you want a simple function, you can use something like this:

def extract_file(filename):
    # Read the whole file, then start a new line at every "$"
    with open(filename) as f:
        return '\n$'.join(f.read().split('$'))
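One way to check that this keeps spaces and line breaks intact is to run it on a small temporary file. The sketch below restates the function so it runs on its own; the sample contents and the use of `tempfile` are assumptions made purely for illustration.

```python
import os
import tempfile


def extract_file(filename):
    # Read the whole file, then start a new line at every "$"
    with open(filename) as f:
        return '\n$'.join(f.read().split('$'))


# Hypothetical sample content with a line break inside the "$ ... $" group.
sample = "intro $ first\ngroup $ second group."

# Write the sample to a temporary file so the function has something to read.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    result = extract_file(path)
finally:
    os.remove(path)  # clean up the temporary file

print(result)
```

The line break inside the first group survives unchanged; the function only adds a new line in front of each `$`. The text before the first `$` is still part of the returned string, so it would need to be dropped separately if only the groups are wanted.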

Answered 2025-04-17 by Python大师


s = "Hello, this is just a dummy file that has information with no substance at all and I want to pull the information between the dollar sign symbols. So all of this $ in between here should be pulled out so I can do what ever I want to with it $ and the rest of this will be a second group."

a = s.split("$")[1:]  # drop everything before the first "$"
print(a)

http://ideone.com/tt9np

Of course, the delimiter does not appear in the result, but adding it back yourself is trivial.
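A minimal sketch of putting the delimiter back, assuming the same sample string as in the question (the name `groups` is illustrative):

```python
s = (
    "Hello, this is just a dummy file that has information with no substance "
    "at all and I want to pull the information between the dollar sign symbols. "
    "So all of this $ in between here should be pulled out so I can do what ever "
    "I want to with it $ and the rest of this will be a second group."
)

# Skip the text before the first "$", then prefix each remaining piece with "$".
groups = ["$" + part for part in s.split("$")[1:]]

for group in groups:
    print(group)
```

Each group is printed as a plain string rather than as a list, with its internal whitespace unchanged. The first group keeps the trailing space that preceded the second `$`; call `.strip()` on each group if that is unwanted.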

Answered 2025-04-17 by Python大师

