Splitting a text file into two separate parts

Posted 2024-05-15 13:52:29

I wrote a simple script that collects a list of titles from a JSON file and writes that list to a text file.

The result looks like this:

Animal geography
Autobiogeography
Chorography
Economic geography
Footloose industry
Geomorphometry
Health geography
Human geography
Military geography
Philosophy of geography
Physical geography
Political geography
Regional geography
Satirical cartography
Settlement geography
Transport geography
Vernacular geography
Visual geography
Category:Cartography
Category:Economic geography
Category:Geodemography
Category:Human geography
Category:Military geography
Category:Physical geography
Category:Political geography
Category:Regional geography
Category:Settlement geography
Category:Topography
Category:Toponymy
Category:Transportation geography
Category:Vernacular geography
Category:Geography by place  

Question:

The problem I'm facing is how to split this text file into two parts.

The first part is a text file containing:

Animal geography
Autobiogeography
Chorography
Economic geography
Footloose industry
Geomorphometry
Health geography
Human geography
Military geography
Philosophy of geography
Physical geography
Political geography
Regional geography
Satirical cartography
Settlement geography
Transport geography
Vernacular geography
Visual geography

and a second text file containing the lines that begin with the word Category:

Category:Cartography
Category:Economic geography
Category:Geodemography
Category:Human geography
Category:Military geography
Category:Physical geography
Category:Political geography
Category:Regional geography
Category:Settlement geography
Category:Topography
Category:Toponymy
Category:Transportation geography
Category:Vernacular geography
Category:Geography by place  

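I have no idea how to do this. Please give me some advice.

Sorry that the title is so confusing; I didn't know how to describe my problem.

Thank you.

Edit

For example, I extracted all of the titles from this API (https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3ABranches%20of%20geography&cmlimit=100):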

{  
   "batchcomplete":"",
   "query":{  
      "categorymembers":[  
         {  
            "pageid":5259784,
            "ns":0,
            "title":"Animal geography"
         },
         {  
            "pageid":8670379,
            "ns":0,
            "title":"Autobiogeography"
         },
         {  
            "pageid":4254743,
            "ns":0,
            "title":"Chorography"
         },
         {  
            "pageid":177512,
            "ns":0,
            "title":"Economic geography"
         },
         {  
            "pageid":7907104,
            "ns":0,
            "title":"Footloose industry"
         },
         {  
            "pageid":5155886,
            "ns":0,
            "title":"Geomorphometry"
         },
         {  
            "pageid":2596739,
            "ns":0,
            "title":"Health geography"
         },
         {  
            "pageid":13372,
            "ns":0,
            "title":"Human geography"
         },
         {  
            "pageid":1794929,
            "ns":0,
            "title":"Military geography"
         },
         {  
            "pageid":5886597,
            "ns":0,
            "title":"Philosophy of geography"
         },
         {  
            "pageid":23263,
            "ns":0,
            "title":"Physical geography"
         },
         {  
            "pageid":1845092,
            "ns":0,
            "title":"Political geography"
         },
         {  
            "pageid":711230,
            "ns":0,
            "title":"Regional geography"
         },
         {  
            "pageid":42099944,
            "ns":0,
            "title":"Satirical cartography"
         },
         {  
            "pageid":33566568,
            "ns":0,
            "title":"Settlement geography"
         },
         {  
            "pageid":9710174,
            "ns":0,
            "title":"Transport geography"
         },
         {  
            "pageid":24644075,
            "ns":0,
            "title":"Vernacular geography"
         },
         {  
            "pageid":5329197,
            "ns":0,
            "title":"Visual geography"
         },
         {  
            "pageid":716309,
            "ns":14,
            "title":"Category:Cartography"
         },
         {  
            "pageid":2021084,
            "ns":14,
            "title":"Category:Economic geography"
         },
         {  
            "pageid":2245786,
            "ns":14,
            "title":"Category:Geodemography"
         },
         {  
            "pageid":1111700,
            "ns":14,
            "title":"Category:Human geography"
         },
         {  
            "pageid":7774333,
            "ns":14,
            "title":"Category:Military geography"
         },
         {  
            "pageid":2153059,
            "ns":14,
            "title":"Category:Physical geography"
         },
         {  
            "pageid":1898464,
            "ns":14,
            "title":"Category:Political geography"
         },
         {  
            "pageid":6645804,
            "ns":14,
            "title":"Category:Regional geography"
         },
         {  
            "pageid":44706236,
            "ns":14,
            "title":"Category:Settlement geography"
         },
         {  
            "pageid":6517504,
            "ns":14,
            "title":"Category:Topography"
         },
         {  
            "pageid":1086902,
            "ns":14,
            "title":"Category:Toponymy"
         },
         {  
            "pageid":41335672,
            "ns":14,
            "title":"Category:Transportation geography"
         },
         {  
            "pageid":24727902,
            "ns":14,
            "title":"Category:Vernacular geography"
         }
      ]
   }
}
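
Since every entry in the response already carries a namespace number, the split could also be done before any text file is written. The following is only a minimal sketch, assuming the response has the shape shown above and that `ns` 14 marks the "Category:" entries, as it does in this sample. The function name `split_by_namespace` and the abbreviated `SAMPLE` string are illustrative and not part of the original script.

```python
import json

# Abbreviated copy of the API response shown above (two entries only).
SAMPLE = """
{
  "batchcomplete": "",
  "query": {
    "categorymembers": [
      {"pageid": 5259784, "ns": 0, "title": "Animal geography"},
      {"pageid": 716309, "ns": 14, "title": "Category:Cartography"}
    ]
  }
}
"""

CATEGORY_NS = 14  # assumption: ns 14 marks "Category:" entries, as in the sample


def split_by_namespace(response):
    """Return (page_titles, category_titles) from a categorymembers response."""
    pages, categories = [], []
    for member in response["query"]["categorymembers"]:
        target = categories if member["ns"] == CATEGORY_NS else pages
        target.append(member["title"])
    return pages, categories


pages, categories = split_by_namespace(json.loads(SAMPLE))
print(pages)
print(categories)
```

Each of the two lists can then be written to its own file with `"\n".join(...)`, which avoids re-parsing the combined text file afterwards.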

I'd really appreciate it if you could point me in the right direction for solving this.

Thank you all for your help and guidance.


3 answers

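To test whether a line in the file starts with "Category:", you can do something like this:

# Split file.txt into two files, depending on whether a line starts with "Category:".
# The two output file names are placeholders; use whatever names you need.
with open("file.txt", "r") as f, \
     open("categories.txt", "w") as cat_out, \
     open("others.txt", "w") as other_out:
    for line in f.read().splitlines():
        if line.startswith("Category:"):
            cat_out.write(line + "\n")
        else:
            other_out.write(line + "\n")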

Thanks to 李凯因斯基 for suggesting that I use "in":

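query = 'Category:'

# Context managers close all three files, even if an error occurs.
with open('List.text', 'r') as f1, \
     open('WordWithCat.text', 'w') as f2, \
     open('WordwithoutCat.text', 'w') as f3:
    for line in f1.read().splitlines():
        # Note: `in` matches 'Category:' anywhere in the line, not only at the start.
        if query in line:
            f2.write(line + '\n')
        else:
            f3.write(line + '\n')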

It turned out to be less complicated than I thought. Thank you all for your help and guidance.
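
One thing worth checking with the `in` test is that it matches the query anywhere in a line, whereas `str.startswith` only matches at the beginning. For the titles in this question both give the same result. The short sketch below uses a made-up title (the second one is hypothetical, invented only to show the edge case) to illustrate where they would differ:

```python
query = "Category:"

# Hypothetical titles; the second one is invented to show the edge case.
titles = ["Category:Toponymy", "Glossary of Category: terms", "Chorography"]

contains = [t for t in titles if query in t]           # substring test
prefixed = [t for t in titles if t.startswith(query)]  # prefix test

print(contains)
print(prefixed)
```

If a title could ever contain "Category:" in the middle, `startswith` is probably the safer choice for this split.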

You can try this:

data = []
category = []

# Read the input line by line and sort each line into one of two lists.
with open('file.txt', 'r') as f:
    for line in f:
        if line.startswith('Category'):
            category.append(line)
        else:
            data.append(line)

# Lines keep their trailing newlines, so they can be joined as-is.
with open('category.txt', 'w') as cat_file, open('data.txt', 'w') as data_file:
    cat_file.write(''.join(category))
    data_file.write(''.join(data))

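This reads file.txt line by line and tests whether each line starts with "Category". If it does, the line is appended to the category list; otherwise it is appended to the data list.

After the whole file has been processed, the program joins the lines in each list and writes them to category.txt and data.txt.

Hope this helps.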
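
The read, partition, write idea described above can also be factored into a small helper, so that the splitting logic can be tried out without touching any files. This is only a sketch: `partition_lines` and the sample list are hypothetical names introduced for illustration, and the default prefix is assumed to be "Category:".

```python
def partition_lines(lines, prefix="Category:"):
    """Split lines into (matching, others), keeping their original order."""
    matching, others = [], []
    for line in lines:
        # Route each line to one of the two lists based on its prefix.
        (matching if line.startswith(prefix) else others).append(line)
    return matching, others


# Lines as they would come from iterating over an open file (newlines kept).
sample = ["Animal geography\n", "Category:Cartography\n", "Chorography\n"]
category, data = partition_lines(sample)
print(category)
print(data)
```

Because the helper accepts any iterable of strings, an open file object can be passed to it directly in place of the sample list.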
