Python: read a whole paragraph of text instead of reading line by line

Asked 2025-04-19 08:20

Here is my code:

with open(root_dir+"/trials/classify/training_queries.txt","r") as f:
    queries = f.readlines()
    #queries = f.read()

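The code above reads the file line by line and processes each line separately to produce a result.

I want to display the entire file content at once (that is, read a whole paragraph in one go). Is there a function that does this?

I thought queries = f.read() would help, but it seems to read the text one character at a time.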

Update

Sample input:

Hell, the Orioles' Opening Day game could easily be the largest in history
if we had a stadium with 80,000 seats. But unfortunely the Yards (a
definitely excellent ballpark) only holds like 45,000 with 275 SRO spots.
Ticket sales for the entire year is moving fast. Bleacher seats are almost
gone for every game this year. Athist does not believe in any religion whether hinduis islam or chirstianism

Observed behavior:

With readlines(), the text is processed one line at a time.

What I want is to treat the whole file content as a single unit.

Code snippet:

import os

if __name__ == '__main__':
    #CallDomainDetection().callDomainDetection(sys.argv[1])
    root_dir = os.getcwd()
    query_no = 1
    with open(root_dir+"/trials/classify/training_queries.txt","r") as f:
        #queries = f.readlines()  # one list item per line, so each line is processed separately
        queries = f.read()    # now each character seems to be processed separately
    for qu in queries:
        CallDomainDetection().callDomainDetection(qu)
        if query_no == 40:
            break
        query_no += 1
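A minimal reproduction of the behavior described above, using an in-memory file (io.StringIO) as a stand-in for training_queries.txt; the sample content is made up. Iterating over the list from readlines() yields whole lines, while iterating over the single string from read() yields one character per step.

```python
import io

# Hypothetical stand-in for training_queries.txt (not the real file)
SAMPLE = "first line\nsecond line\n"

# readlines() returns a list of strings, one per line (newlines kept)
lines = io.StringIO(SAMPLE).readlines()

# read() returns a single str holding the whole file
text = io.StringIO(SAMPLE).read()

# Looping over the list yields whole lines...
per_line = [item for item in lines]
# ...but looping over a str yields one character at a time
per_char = [item for item in text]

print(per_line)      # ['first line\n', 'second line\n']
print(per_char[:5])  # ['f', 'i', 'r', 's', 't']
```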

3 Answers

Answer (score 2)

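You need to define what a "paragraph" is. Here, a paragraph is a run of consecutive non-separator lines, and paragraphs are separated from each other by one or more separator lines (by default, lines containing only whitespace, i.e. blank lines).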

import os

def paragraphs(lines, is_separator=str.isspace, joiner=''.join):
    paragraph = []
    for line in lines:
        if is_separator(line):
            # a separator line closes the current paragraph, if any
            if paragraph:
                yield joiner(paragraph)
                paragraph = []
        else:
            paragraph.append(line)
    # flush the last paragraph when the file does not end with a separator
    if paragraph:
        yield joiner(paragraph)

if __name__ == '__main__':
    root_dir = os.getcwd()
    with open(root_dir+"/trials/classify/training_queries.txt","r") as f:
        queries = f.readlines()
    for p in paragraphs(queries):
        print(repr(p))
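To check the generator without a file, here is a self-contained copy of paragraphs() fed an in-memory list of lines (the sample lines are made up). Both "\n" and "   \n" count as separators because str.isspace() returns True for them, so several blank lines in a row still produce only one paragraph break.

```python
def paragraphs(lines, is_separator=str.isspace, joiner=''.join):
    """Group consecutive non-separator lines into paragraphs."""
    paragraph = []
    for line in lines:
        if is_separator(line):
            # A separator line ends the current paragraph, if any
            if paragraph:
                yield joiner(paragraph)
                paragraph = []
        else:
            paragraph.append(line)
    # Emit the last paragraph if the input did not end with a separator
    if paragraph:
        yield joiner(paragraph)


# Hypothetical input: two paragraphs separated by two blank-ish lines
lines = [
    "Opening Day could be huge\n",
    "if the stadium were bigger.\n",
    "\n",
    "   \n",
    "Ticket sales are moving fast.\n",
]

result = list(paragraphs(lines))
for p in result:
    print(repr(p))
# 'Opening Day could be huge\nif the stadium were bigger.\n'
# 'Ticket sales are moving fast.\n'
```

Because is_separator and joiner are parameters, the same generator can be reused with a different notion of "separator" or a different way of gluing lines together.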

Answer (score 2)

queries = f.read() reads the entire file into a single string named queries. You only get one character at a time when you loop over that string (for example, for c in queries:).

Try this:

with open(root_dir+"/trials/classify/training_queries.txt","r") as f:
    queries = f.read()
    print(queries)

and you will see that queries is one complete string.
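Applied to the loop in the question, this means handing the whole string over once instead of looping over it. In this sketch, process() is a hypothetical stand-in for CallDomainDetection().callDomainDetection, and io.StringIO replaces the real file.

```python
import io

calls = []

def process(query):
    """Hypothetical stand-in for CallDomainDetection().callDomainDetection."""
    calls.append(query)

# Stand-in for open(root_dir + "/trials/classify/training_queries.txt", "r")
with io.StringIO("line one\nline two\n") as f:
    queries = f.read()  # one str containing the entire file

# `for qu in queries:` would call process() once per character.
# Passing the string directly processes the whole text in a single call.
process(queries)

print(len(calls))  # 1
```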

Answer (score 3)

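f.read() is what you need. To break the text into paragraphs, you can then split it on double newlines with split('\n\n'). What you describe sounds like you are iterating over the string itself, which processes it one character at a time.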
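A sketch of the split('\n\n') suggestion on made-up text. A plain split('\n\n') does not treat a "blank" line that contains spaces as a separator, so the second variant swaps in re.split with the pattern r'\n\s*\n' (an addition for illustration, not part of the answer) and drops empty pieces.

```python
import re

# Made-up sample: a true blank line after "line two",
# and a whitespace-only line after "para three"
text = "para one\nline two\n\npara three\n   \npara four\n"

# Literal split: only an exact "\n\n" separates paragraphs
simple = text.split("\n\n")

# Regex split: a separator line may contain spaces, tabs, or '\r'
robust = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

print(simple)  # ['para one\nline two', 'para three\n   \npara four\n']
print(robust)  # ['para one\nline two', 'para three', 'para four']
```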
