Can't add multi-line text to a list as a single item

Posted 2024-06-08 23:35:17


For debugging purposes, I'm trying to split the output of a scrapy crawl into a list, one item per post.

Here is my code:

import bisect
import re

post_list = []

with open('last_crawl_output.txt', 'r') as f:
    crawl_output = f.read()

# Positions of every 'referer'; each one marks the start of a post in the crawl log.
referer_list = [m.start(0) for m in re.finditer("referer", crawl_output)]

# Positions of every 'scrapy'; these indicate where a crawled item is finished.
closing_list = [m.start(0) for m in re.finditer("scrapy", crawl_output)]

# Drop the first 'referer', which belongs to the initial crawl of the search results page.
del referer_list[0]

for pos1 in referer_list:
    # Index of the first 'scrapy' position that comes after this 'referer' position.
    pos2_index = bisect.bisect(closing_list, pos1)
    # Get the post from the positions (this uses the second 'scrapy' after the referer).
    pos2 = closing_list[pos2_index+1]
    post = crawl_output[pos1:pos2-21]

I also tried post_list.append(post), but that didn't work either.
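For reference, here is a minimal, self-contained sketch of the same extraction loop with the append in place. It assumes a simplified end rule (each post ends at the first "scrapy" after its "referer", with no fixed trim offset) and uses a made-up sample string; `extract_posts` is a hypothetical helper, not part of Scrapy.

```python
import bisect
import re


def extract_posts(text, start_marker="referer", end_marker="scrapy"):
    """Hypothetical helper: slice text from each start marker to the next end marker."""
    starts = [m.start() for m in re.finditer(re.escape(start_marker), text)]
    ends = [m.start() for m in re.finditer(re.escape(end_marker), text)]
    posts = []
    for start in starts:
        # Index of the first end-marker position strictly after this start position.
        i = bisect.bisect_right(ends, start)
        if i == len(ends):
            break  # no end marker after this start (or any later one); stop
        # A multi-line slice is still one str, so append adds exactly one item.
        posts.append(text[start:ends[i]])
    return posts


sample = "referer A\nline 2\nscrapy done\nreferer B\nline 2\nscrapy done\n"
post_list = extract_posts(sample)
print(len(post_list))  # 2
for post in post_list:
    print(post)
```

Each element of `post_list` here is a whole multi-line post, which suggests the append itself is not the problem.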

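[Edit]

Here is some sample output.

The string I want to add to post_list: here

This is what I get instead, with the post added to post_list: output

When I use insert, the text comes out separated by \n.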
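The \n is most likely a display effect rather than a problem with append or insert. print(post_list) shows each element's repr(), which writes a newline as the escape \n, while printing an element directly renders the real line break. A small illustration with a made-up post:

```python
post_list = []
post = "First line\nSecond line"  # made-up multi-line post
post_list.append(post)

# The list holds exactly one item, and that item still contains a real newline.
print(len(post_list))        # 1
print("\n" in post_list[0])  # True

# Printing the whole list uses repr() of each element, so the newline shows as \n.
print(post_list)             # ['First line\nSecond line']

# Printing the element itself renders the line break.
for item in post_list:
    print(item)
```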


2 answers

I decided to work around the list problem with my own approach, like this:

# Split the post on newlines and collect the lines in a list.
post_lines = post.split('\n')

# Add the words "Next Post" to separate each post.
post_lines.append('Next Post')

# Print each line, which gives the correct formatting.
for line in post_lines:
    print(line)
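Splitting is not strictly needed to get this layout; joining the stored posts with a separator line prints the same thing while each post keeps its own newlines. A sketch assuming `post_list` already holds the extracted posts (the sample values are made up):

```python
post_list = ["referer A\nline 2", "referer B\nline 2"]  # made-up posts

# Join the posts with a "Next Post" separator line; each post keeps its own newlines.
report = "\nNext Post\n".join(post_list)
print(report)
```

This prints the two posts on separate lines with "Next Post" between them.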

A better solution is to store the posts in a dictionary. This keeps the formatting and takes less code.

# Number each post and store it in a dict keyed by that number.
post_count = 0
post_dict = {}

for pos1 in referer_list:
    post_count += 1

    # Same slicing as in the question.
    pos2_index = bisect.bisect(closing_list, pos1)
    pos2 = closing_list[pos2_index+1]
    post = crawl_output[pos1:pos2-21]

    post_dict[post_count] = post
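A more compact way to build the same numbering is enumerate(..., start=1) instead of a manual counter. Note that the formatting is only visible when individual values are printed; print(post_dict) also shows \n escapes, just like printing a list. A sketch with made-up posts:

```python
posts = ["referer A\nline 2", "referer B\nline 2"]  # made-up posts

# Number posts from 1, the same keys the post_count counter produces.
post_dict = {n: post for n, post in enumerate(posts, start=1)}

# Printing each value renders the line breaks.
for n, post in post_dict.items():
    print(f"Post {n}:")
    print(post)
```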
