python脚本将文件分为两部分,分别命名

2024-06-09 06:31:01 发布

您现在位置:Python中文网/ 问答频道 /正文

我正试图用d3.js和crossfilter编写一个地图可视化代码,现在我有一个大文件和一些有害的行,破坏了整个过程。你知道吗

我想创建一个文件,将我的输入数据分成两半,这样我就可以缩小问题的来源,从而消除它,同时保持我的理智。你知道吗

输入数据如下所示:

http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI2ODE3ODgifQ.1u6YvzMuu_HbWqRaMwFd8zYNP43w7wYFnRbl5r2qSoY,C# Developer,Connectus,Chesterton,52.202499,0.131237,United Kingdom,statistics,1
http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI2ODk1ODIifQ.jxcx56YcDm-4nmB8VvoIGQKew4yquszeaPon60hcDKs,Senior Java Developer,Redhill,Godstow,51.784375,-1.308003,United Kingdom,java|metadata,1
http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI2OTEyMjIifQ.qK3xtYQDxRpKJkNargPu6Jef4njm2fSZnNIVulRHoqA,Software Development Manager,Spring Technology ,Woolstone,52.042198,-0.7047,United Kingdom,software development|sdlc|data analysis,1
http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI4NDM1MzgifQ.pYnBX-APPdB3edTRC_M8x6usmBq_GfIxcdZOXSLJN04,Data Scientists Python R Scala Java or Matlab,Aspire Data Recruitment,East Boldon,54.94452,-1.42815,United Kingdom,data science|java|python|scala|matlab|analysis,1
http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI4NzM4NTMifQ.mgRKEZh-0GLUXQmZ9Bp6H10haZNAieIKAH1uoWV63YU,Data Analyst - Programmatic Tech Company,Ultimate Asset Limited,London,51.50853,-0.12574,United Kingdom,data analysis|analysis|statistics,1

因此,在我的想法中,我会平均分配它,这样我就可以:

http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI2ODE3ODgifQ.1u6YvzMuu_HbWqRaMwFd8zYNP43w7wYFnRbl5r2qSoY,C# Developer,Connectus,Chesterton,52.202499,0.131237,United Kingdom,statistics,1
http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI2ODk1ODIifQ.jxcx56YcDm-4nmB8VvoIGQKew4yquszeaPon60hcDKs,Senior Java Developer,Redhill,Godstow,51.784375,-1.308003,United Kingdom,java|metadata,1
http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI2OTEyMjIifQ.qK3xtYQDxRpKJkNargPu6Jef4njm2fSZnNIVulRHoqA,Software Development Manager,Spring Technology ,Woolstone,52.042198,-0.7047,United Kingdom,software development|sdlc|data analysis,1

还有这个:

http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI2OTEyMjIifQ.qK3xtYQDxRpKJkNargPu6Jef4njm2fSZnNIVulRHoqA,Software Development Manager,Spring Technology ,Woolstone,52.042198,-0.7047,United Kingdom,software development|sdlc|data analysis,1
http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI4NDM1MzgifQ.pYnBX-APPdB3edTRC_M8x6usmBq_GfIxcdZOXSLJN04,Data Scientists Python R Scala Java or Matlab,Aspire Data Recruitment,East Boldon,54.94452,-1.42815,United Kingdom,data science|java|python|scala|matlab|analysis,1
http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI4NzM4NTMifQ.mgRKEZh-0GLUXQmZ9Bp6H10haZNAieIKAH1uoWV63YU,Data Analyst - Programmatic Tech Company,Ultimate Asset Limited,London,51.50853,-0.12574,United Kingdom,data analysis|analysis|statistics,1

例如。你知道吗

starting_input.csv这样的约定命名它们就变成了:

starting_input_a.csv

以及

starting_input_b.csv

然后当我想再次运行它时:

starting_input_aa.csv

以及

starting_input_ab.csv

等等。你知道吗

你能理解我的想法吗?你知道吗

我试过这个:

splitLen = 20         # 20 lines per file
outputBase = 'output' # output.1.txt, output.2.txt, etc.

# This is shorthand and not friendly with memory
# on very large files, but it works.
input = open('input.txt', 'r').read().split('\n')

at = 1
for lines in range(0, len(input), splitLen):
    # First, get the list slice
    outputData = input[lines:lines+splitLen]

    # Now open the output file, join the new slice with newlines
    # and write it out. Then close the file.
    output = open(outputBase + str(at) + '.txt', 'w')
    output.write('\n'.join(outputData))
    output.close()

    # Increment the counter
    at += 1

但没用


Tags: csvprojecthttpinputdatawwwanalysisunited
1条回答
网友
1楼 · 发布于 2024-06-09 06:31:01

这里有一个提示。你知道吗

把文件读两遍就行了。一次得到线数,然后再得到上半部分和下半部分。你知道吗

简单的例子。给出5行示例输入:

$ cat /tmp/f1.txt
http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI2ODE3ODgifQ.1u6YvzMuu_HbWqRaMwFd8zYNP43w7wYFnRbl5r2qSoY,C# Developer,Connectus,Chesterton,52.202499,0.131237,United Kingdom,statistics,1
http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI2ODk1ODIifQ.jxcx56YcDm-4nmB8VvoIGQKew4yquszeaPon60hcDKs,Senior Java Developer,Redhill,Godstow,51.784375,-1.308003,United Kingdom,java|metadata,1
http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI2OTEyMjIifQ.qK3xtYQDxRpKJkNargPu6Jef4njm2fSZnNIVulRHoqA,Software Development Manager,Spring Technology ,Woolstone,52.042198,-0.7047,United Kingdom,software development|sdlc|data analysis,1
http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI4NDM1MzgifQ.pYnBX-APPdB3edTRC_M8x6usmBq_GfIxcdZOXSLJN04,Data Scientists Python R Scala Java or Matlab,Aspire Data Recruitment,East Boldon,54.94452,-1.42815,United Kingdom,data science|java|python|scala|matlab|analysis,1
http://www.edsa-project.eu/adzuna/eyJhbGciOiJIUzI1NiJ9.eyJzIjoia0EtLWlpVHhUMUNtSFM0SzE4TUVzUSIsImkiOiIzMzI4NzM4NTMifQ.mgRKEZh-0GLUXQmZ9Bp6H10haZNAieIKAH1uoWV63YU,Data Analyst - Programmatic Tech Company,Ultimate Asset Limited,London,51.50853,-0.12574,United Kingdom,data analysis|analysis|statistics,1

你可以这样做:

def divide(fn):
    # get total lines in file
    with open(fn) as f:
        lc=sum(1 for _ in f)

    with open(fn) as fin:
        # top half of file:
        for i, line in enumerate(fin):
            print line
            if i>=lc/2:
                break
        # middle
        print "======="
        # remainder
        for line in fin:
            print line

这将打印文件顶部的3行,然后是“==========”分隔符,然后是示例的最后2行。你知道吗

不用打印,您可以将“a”和“b”添加到两个文件的基本名称中。重新应用到生成的文件,直到完成。你知道吗

相关问题 更多 >