What is the easiest scripting method to merge two text files: Ruby, Python, JavaScript, or Java?

<li><a href="/article/"><button class="showarticle"/><span class="author">Thomas Friedman</span> - <span class="title">The World Is Flat</span></a></li>
<li><a href="/article/"><button class="showarticle"/><span class="author">Michael Dagleish</span> - <span class="title">Scotland In Wartime</span></a></li>
<li><a href="/article/"><button class="showarticle"/><span class="author">Dr. Raymond Kinsella</span> - <span class="title">Progress In Cancer Treatments</span></a></li>
...

<li><a href="/article/thomas-friedman-the-world-is-flat"><button class="showarticle"/><span class="author">Thomas Friedman</span> - <span class="title">The World Is Flat</span></a></li>
<li><a href="/article/michael-dagleish-scotland-in-wartime"><button class="showarticle"/><span class="author">Michael Dagleish</span> - <span class="title">Scotland In Wartime</span></a></li>
<li><a href="/article/dr-raymond-kinsella-progress-in-cancer-treatments"><button class="showarticle"/><span class="author">Dr. Raymond Kinsella</span> - <span class="title">Progress In Cancer Treatments</span></a></li>

3 Answers

User

#1 · Edited 2024-04-25 22:46:46

A Ruby one-liner (it concatenates file1.txt and file2.txt into joined.txt):

File.open("joined.txt","w") { |f| f.puts ['file1.txt', 'file2.txt'].map{ |s| IO.read(s) }}

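User

#2 · Edited 2024-04-25 22:46:46

This is easy in any language. Here is some pseudo-Python; I've left out the lxml bits because I don't have access to them and can't quite remember the syntax. They aren't difficult, though.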

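import lxml.etree

with open(...) as htmls, open(...) as slugs, open(...) as output:
    # walk both files in lockstep, one line from each per iteration
    for html, slug in zip(htmls, slugs):
        root = lxml.etree.fromstring(html)
        # do some fiddling with lxml to get the name

        # drop as many leading slug parts as there are words in name
        slug = slug.strip().split("-")[len(name.split()):]
        # add in the extra child in lxml

        output.write(lxml.etree.tostring(root, encoding="unicode") + "\n")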
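
For anyone who wants to try the parsing route without installing lxml, here is a minimal sketch that swaps in the standard library's xml.etree.ElementTree. It assumes each input line is a single well-formed <li> element like the ones in the question; add_slug and its prefix parameter are hypothetical names made up for this illustration, and the slug is simply appended to the existing href rather than trimmed as in the pseudocode above.

```python
import xml.etree.ElementTree as ET


def add_slug(li_line, slug, prefix="/article/"):
    """Parse one <li> line, append slug to its link's href, and re-serialize."""
    root = ET.fromstring(li_line.strip())
    link = root.find("a")  # the <a> is a direct child of the <li>
    if link is not None and link.get("href") == prefix:
        # strip() removes the trailing newline that comes with a line read from a file
        link.set("href", prefix + slug.strip())
    return ET.tostring(root, encoding="unicode")


# Sample line and slug taken from the question.
line = (
    '<li><a href="/article/"><button class="showarticle"/>'
    '<span class="author">Thomas Friedman</span> - '
    '<span class="title">The World Is Flat</span></a></li>\n'
)
result = add_slug(line, "thomas-friedman-the-world-is-flat\n")
print(result)
```

Note that an XML parser only works here because the lines happen to be well-formed XML; looser real-world HTML would need an HTML-aware parser instead.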

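Points of interest:

This doesn't read the whole file at once; it reads it piece by piece (well, line by line, but Python buffers the reads). That is useful if the files are huge, though it probably doesn't matter here.
lxml may be overkill, depending on how rigid the format of the HTML strings is. If they are guaranteed to be identical in structure and all well-formed, simple string manipulation may be easier. On the other hand, lxml is very fast and offers a lot more flexibility.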
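
If the lines really are that regular, the plain string-manipulation route might look like the sketch below. It assumes one <li> per line in the first file and one slug per line in the second; merge_lines is a hypothetical helper name, and io.StringIO objects stand in for the real files so the example runs on its own (the sample lines come from the question).

```python
import io


def merge_lines(html_lines, slug_lines, prefix="/article/"):
    """Yield each HTML line with its slug appended to the first prefix."""
    for html, slug in zip(html_lines, slug_lines):
        # strip() drops the slug's trailing newline so it does not land inside the href
        yield html.replace(prefix, prefix + slug.strip(), 1)


# In-memory stand-ins for the two input files.
htmls = io.StringIO(
    '<li><a href="/article/"><button class="showarticle"/><span class="author">Thomas Friedman</span> - <span class="title">The World Is Flat</span></a></li>\n'
    '<li><a href="/article/"><button class="showarticle"/><span class="author">Michael Dagleish</span> - <span class="title">Scotland In Wartime</span></a></li>\n'
)
slugs = io.StringIO(
    "thomas-friedman-the-world-is-flat\n"
    "michael-dagleish-scotland-in-wartime\n"
)

merged = list(merge_lines(htmls, slugs))
print("".join(merged), end="")
```

Because merge_lines is a generator over two iterables, passing real file objects instead of the StringIO stand-ins keeps the line-by-line behaviour described above.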

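User

#3 · Edited 2024-04-25 22:46:46

You want the zip function, which is available in most languages. Its purpose is to process two or more arrays in parallel.
In Ruby, it would look like this: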

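# read each file into an array of lines
f1 = File.readlines('file1.txt')
f2 = File.readlines('file2.txt')

File.open('file3.txt','w') do |output_file|
    # pair line N of file1 with line N of file2
    f1.zip(f2) do |a,b|
        # chomp the slug so its trailing newline does not end up inside the href
        output_file.puts a.sub('/article/','/article/'+b.chomp)
    end
end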

To zip more than two arrays, you can do f1.zip(f2,f3,...) do |a,b,c,...|
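
For comparison, here is a small Python sketch of zipping three sequences in parallel. The lists are sample values taken from the question, and the names authors, titles, slugs and rows are just illustrative.

```python
authors = ["Thomas Friedman", "Michael Dagleish"]
titles = ["The World Is Flat", "Scotland In Wartime"]
slugs = [
    "thomas-friedman-the-world-is-flat",
    "michael-dagleish-scotland-in-wartime",
]

# zip yields one tuple per position, taking the Nth item from every sequence
rows = [
    f"{author} - {title} -> /article/{slug}"
    for author, title, slug in zip(authors, titles, slugs)
]

for row in rows:
    print(row)
```

Python's zip stops silently at the shortest input, so a missing line in one file would go unnoticed; from Python 3.10 onward, zip(..., strict=True) raises an error on a length mismatch instead.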
