How to truncate strings read from a file path in Python

Published 2024-04-25 14:12:49


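In Python 2, how can I limit the length of the string built from all the txt files imported from a directory? For example, a limit of 6000 words.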

import glob

raw_text = ""
path = "/workspace/simple/*.txt"

for filename in glob.glob(path):
    with open(filename, 'r') as f:
        for line in f:
            raw_text += line

words = raw_text.split()
print(words)

This code just reads in all the txt files and prints the words to the screen. How can I limit it to 6000 words and print only those 6000 words?


3 answers

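It depends on how you define a word. If a word is just text separated by whitespace, it is fairly simple: count the words as they go by, and stop once you have enough. For example: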

    word_limit = 6000
    word_count = 0
    for line in f:
        word_count += len(line.split())
        if word_count > word_limit:
            break
        raw_text += line

If you want exactly 6000 words, you can modify the loop to take just enough words from the last line to reach exactly 6000.
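One way to sketch that modification is shown here. The function name `truncate_to_words` and its parameters are hypothetical, chosen for illustration; it assumes that a word is any run of non-whitespace characters, as in the answer above.

```python
def truncate_to_words(lines, word_limit):
    """Join whole lines, then part of one more line, up to word_limit words."""
    pieces = []
    remaining = word_limit
    for line in lines:
        if remaining <= 0:
            break
        line_words = line.split()
        if len(line_words) <= remaining:
            # The whole line still fits under the limit.
            pieces.append(line)
            remaining -= len(line_words)
        else:
            # Take only as many words as are still needed from this line.
            pieces.append(" ".join(line_words[:remaining]))
            remaining = 0
    return "".join(pieces)
```

Any iterable of lines works, so an open file object can be passed directly, for example `raw_text = truncate_to_words(f, 6000)`.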

If you want to make it a little more efficient, drop raw_text and build the word list inside the loop, one line at a time:

        line_words = line.split()
        words.extend(line_words)

In that case you would use len(line_words) for the check.
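Put together, the line-by-line approach might look like the following minimal sketch. The helper name `collect_words` is an assumption made for this example. It checks the length of the accumulated list and trims any overshoot from the final line.

```python
def collect_words(lines, word_limit):
    """Build the word list one line at a time and stop once the limit is reached."""
    words = []
    for line in lines:
        line_words = line.split()
        words.extend(line_words)
        if len(words) >= word_limit:
            break
    # The last line may have pushed the list past the limit, so trim it.
    return words[:word_limit]
```

Because the loop breaks as soon as enough words have been collected, the rest of the input is never read.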

Suppose you want 6000 words or fewer from each file?

import glob, sys

path = sys.argv[1]
count = int(sys.argv[2]) if len(sys.argv) > 2 else 60
words = []

for file in glob.glob(path):
    with open(file) as f: 
        words += f.read().split()[:count]

print(words)

$ python test.py "/workspace/simple/*.txt" 6000
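The script above reads each file completely before slicing. For large files, a lazier variant could stop reading once enough words have been seen. The sketch below uses `itertools.islice` over a generator. The helper names `iter_words` and `first_words_per_file` are hypothetical, and the `sorted()` call is an addition made here so that files are processed in a predictable order.

```python
import glob
from itertools import islice


def iter_words(path):
    """Yield the whitespace-separated words of a file one at a time."""
    with open(path) as f:
        for line in f:
            for word in line.split():
                yield word


def first_words_per_file(pattern, count):
    """Collect at most `count` words from each file matching `pattern`."""
    words = []
    for filename in sorted(glob.glob(pattern)):
        # islice stops pulling from the generator after `count` words.
        words.extend(islice(iter_words(filename), count))
    return words
```

A call such as `first_words_per_file("/workspace/simple/*.txt", 6000)` would then return the same kind of list as the script above, apart from the file ordering.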

You can also build a dictionary that maps each file to its words:

import glob, sys

path = sys.argv[1]
count = int(sys.argv[2]) if len(sys.argv) > 2 else 60
fwords = {}

for file in glob.glob(path):
    with open(file) as f: 
        fwords[file] = f.read().split()[:count]

print(fwords)

If you only want the files that contain exactly that number of words:

for file in glob.glob(path):
    with open(file) as f: 
        tmp = f.read().split()
        if len(tmp) == count :  # only the count 
            fwords[file] = tmp

import glob

N = 6000  # word limit; put your own number here
raw_text = ""
path = "/workspace/simple/*.txt"

for filename in glob.glob(path):
    with open(filename, 'r') as f:
        for line in f:
            # keep adding lines until N words have been collected
            if len(raw_text.split()) < N:
                raw_text += line
            else:
                break
words = raw_text.split()
print(words)
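The loop above splits the whole of `raw_text` again for every line, and its `break` only leaves the inner loop, so each remaining file is still opened. A possible variant keeps a running word count and returns as soon as the limit is reached. The function name `read_text_up_to` is hypothetical, and `sorted()` is added here only to make the file order predictable. Like the code above, it adds whole lines, so the result can exceed the limit by part of one line.

```python
import glob


def read_text_up_to(pattern, limit):
    """Concatenate lines from matching files until `limit` words are reached."""
    chunks = []
    word_count = 0
    for filename in sorted(glob.glob(pattern)):
        with open(filename) as f:
            for line in f:
                if word_count >= limit:
                    # Enough words collected: leave both loops at once.
                    return "".join(chunks)
                chunks.append(line)
                word_count += len(line.split())
    return "".join(chunks)
```

To print exactly the limit, slice the split result, for example `print(read_text_up_to(path, 6000).split()[:6000])`.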
