How do I convert a text file to a JSON file?

Posted 2024-04-27 16:38:15


I'm new to Python and I want to convert a text file to a JSON file. Here is what the text file looks like:

#Q Three of these animals hibernate. Which one does not?
^ Sloth
A Mouse
B Sloth
C Frog
D Snake

#Q What is the literal translation of the Greek word Embioptera, which denotes an order of insects, also known as webspinners?
^ Lively wings
A Small wings
B None of these
C Yarn knitter
D Lively wings

#Q There is a separate species of scorpions which have two tails, with a venomous sting on each tail.
^ False
A True
B False

Contd....

^ marks the answer.

I'd like it in JSON format, like this. For example:

{
    "questionBank": [
        {
            "question": "Grand Central Terminal, Park Avenue, New York is the world's",
            "a": "largest railway station",
            "b": "Longest railway station",
            "c": "highest railway station",
            "d": "busiest railway station",
            "answer": "largest railway station"
        },
        {
            "question": "Eritrea, which became the 182nd member of the UN in 1993, is in the continent of",
            "a": "Asia",
            "b": "Africa",
            "c": "Europe",
            "d": "Oceania",
            "answer": "Africa"
        },
        ...
    ]
}

I came across some similar posts. Here is what I tried:

dataset = "file.txt"
data = []
with open(dataset) as ds:
    for line in ds:
        line = line.strip().split(",")
        print(line)

The output is:

['']
['#Q What part of their body do the insects from order Archaeognatha use to spring up into the air?']
['^ Tail']
['A Antennae']
['B Front legs']
['C Hind legs']
['D Tail']
['']
['#Q What is the literal translation of the Greek word Embioptera', ' which denotes an order of insects', ' also known as webspinners?']
['^ Lively wings']
['A Small wings']
['B None of these']
['C Yarn knitter']
['D Lively wings']
['']

Contd.... 

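Sentences that contain commas get split into separate items of the Python list. I tried using .join, but did not get the expected result.
Please let me know how to handle this.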
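The commas cause the split because `.split(",")` cuts the line at every comma. Each line in the file is a short tag (`#Q`, `^`, or `A` to `D`) followed by one space and then the text, so splitting only at the first space keeps the text intact. A minimal sketch, assuming every non-blank line follows that `tag text` shape; `split_tag` is a hypothetical helper name:

```python
def split_tag(line):
    """Split a line such as 'A Mouse' into ('A', 'Mouse').

    maxsplit=1 stops after the first space, so commas and further
    spaces inside the text are left untouched.
    """
    tag, text = line.strip().split(" ", 1)
    return tag, text


line = ("#Q What is the literal translation of the Greek word Embioptera, "
        "which denotes an order of insects, also known as webspinners?")
print(split_tag(line))
print(split_tag("^ Lively wings"))  # → ('^', 'Lively wings')
```

`str.partition(" ")` would work just as well here; the point is to split once, not at every separator.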


2 answers

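Answer 1

Rather than processing one line at a time, I used a regular-expression pattern.

This is also more robust: if the input data is badly formatted, it fails with an error instead of silently ignoring groups with missing fields.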
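Two behaviours of Python's `re` module matter for this approach: an optional named group that did not take part in the match returns `None` from `Match.group()` (it does not raise), and `re.fullmatch()` returns `None` when a block does not fit the pattern, which can be turned into an explicit error. A minimal sketch, assuming each block has 2 to 4 options; `BLOCK` and `parse_block` are hypothetical names:

```python
import re

# One question block: question, answer, options A and B, optional C and D.
BLOCK = re.compile(
    r"#Q (?P<question>.+)\n"
    r"\^ (?P<answer>.+)\n"
    r"A (?P<a>.+)\n"
    r"B (?P<b>.+)"
    r"(?:\nC (?P<c>.+))?"
    r"(?:\nD (?P<d>.+))?"
)


def parse_block(block):
    """Return a dict for one block, or raise ValueError if it is malformed."""
    m = BLOCK.fullmatch(block.strip())
    if m is None:
        raise ValueError(f"Unrecognised block: {block!r}")
    # Options C and D that are absent come back as None; drop them.
    return {key: value for key, value in m.groupdict().items() if value is not None}


print(parse_block("#Q There is a separate species of scorpions which have two tails, "
                  "with a venomous sting on each tail.\n^ False\nA True\nB False"))
```

Using `fullmatch` rather than `search` means stray or missing lines are rejected instead of being partially matched.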

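import json
import re

# Compile the pattern so that .search() can be called on it.
# Options C and D are optional; each optional line carries its own leading "\n",
# so a group does not need a trailing newline after its last option.
PATTERN = re.compile(
    r"[#]Q (?P<question>.+)\n\^ (?P<answer>.+)\nA (?P<option_a>.+)\nB (?P<option_b>.+)"
    r"(?:\nC (?P<option_c>.+))?(?:\nD (?P<option_d>.+))?"
)


def parse_qa_group(qa_group):
    """
    Extract the question, answer and 2 to 4 options from the input string and return them as a dict.
    """
    # "group" here is a set of question, answer and options.
    matches = PATTERN.search(qa_group)
    if matches is None:
        # Fail loudly on a badly formatted group instead of skipping it.
        raise ValueError(f"Could not parse group: {qa_group!r}")

    # "group" here is a regex group.
    question = matches.group('question')
    answer = matches.group('answer')

    # An optional group that did not take part in the match returns None
    # (it does not raise IndexError).
    c = matches.group('option_c')
    d = matches.group('option_d')

    results = {
        "question": question,
        "answer": answer,
        "a": matches.group('option_a'),
        "b": matches.group('option_b')
    }
    if c:
        results['c'] = c
    if d:
        results['d'] = d

    return results


# question_answer_str holds the contents of the text file (reading it is omitted here).
# Split into groups using the blank line.
qa_groups = question_answer_str.strip().split('\n\n')

# Process each group, building up a list of all results.
all_results = [parse_qa_group(qa_group) for qa_group in qa_groups]

print(json.dumps(all_results, indent=4))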

See my gist for details, and read more about regex grouping.

I've left out reading the text file and writing the JSON file.
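One way to fill in the omitted parts: read the text file, split it into blocks on blank lines, parse each block with a regular expression, and write the result under a `questionBank` key in the key order shown in the question. A minimal sketch, assuming UTF-8 input with blocks separated by one or more blank lines; `convert` and `BLOCK` are hypothetical names, and any block that does not fit the pattern raises `ValueError`:

```python
import json
import re
from pathlib import Path

# Question, answer, options A and B, then optional C and D lines.
BLOCK = re.compile(
    r"#Q (?P<question>.+)\n"
    r"\^ (?P<answer>.+)\n"
    r"A (?P<a>.+)\n"
    r"B (?P<b>.+)"
    r"(?:\nC (?P<c>.+))?"
    r"(?:\nD (?P<d>.+))?"
)


def convert(txt_path, json_path):
    """Parse the question file at txt_path and write JSON to json_path."""
    text = Path(txt_path).read_text(encoding="utf-8")
    # Normalise Windows line endings, then split on one or more blank lines.
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip())

    question_bank = []
    for block in blocks:
        m = BLOCK.fullmatch(block.strip())
        if m is None:
            raise ValueError(f"Unrecognised block: {block!r}")
        fields = m.groupdict()
        # Build the entry in the order used in the question: question, a-d, answer.
        entry = {"question": fields["question"]}
        for key in "abcd":
            if fields[key] is not None:
                entry[key] = fields[key]
        entry["answer"] = fields["answer"]
        question_bank.append(entry)

    with open(json_path, "w", encoding="utf-8") as f:
        json.dump({"questionBank": question_bank}, f, indent=4, ensure_ascii=False)
    return question_bank
```

Usage: `convert("file.txt", "output.json")`. JSON objects are unordered, so the key order only affects how the file looks, not how it is read back.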

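Answer 2

import json

dataset = "text.txt"
question_bank = []
question = {}

with open(dataset, encoding="utf-8") as ds:
    for line in ds:
        line = line.strip()
        if not line:
            # A blank line ends the current question (repeated blank lines are skipped).
            if question:
                question_bank.append(question)
            question = {}
        elif line.startswith("#Q"):
            # Drop the "#Q" prefix and keep the full question text.
            question = {"question": line[2:].strip()}
        elif line.startswith("^"):
            # Keep the whole answer, not just its first word.
            question['answer'] = line[1:].strip()
        else:
            # "A Mouse" -> key "a", value "Mouse"; split only at the first space.
            key, val = line.split(" ", 1)
            question[key.lower()] = val
    # The last question may not be followed by a blank line.
    if question:
        question_bank.append(question)

final_output = {"questionBank": question_bank}
print(final_output)

# Store the JSON file in the local directory.
with open("output.json", "w", encoding="utf-8") as outfile:
    json.dump(final_output, outfile, indent=4)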
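A variation on this line-based approach, swapped in here rather than taken from either answer: `itertools.groupby` can group runs of consecutive non-blank lines, so there is no need to track the current question and append it when a blank line appears. A minimal sketch, assuming the same `#Q` / `^` / `A` to `D` line prefixes; `parse_lines` is a hypothetical name, and it accepts any iterable of lines (an open file works):

```python
import itertools
import json


def parse_lines(lines):
    """Group consecutive non-blank lines into question dicts."""
    question_bank = []
    # groupby yields runs of lines; the key is True for runs of non-blank lines.
    for non_blank, run in itertools.groupby(lines, key=lambda line: bool(line.strip())):
        if not non_blank:
            continue  # skip runs of blank lines
        entry = {}
        for line in run:
            # Split only at the first space so commas in the text survive.
            tag, text = line.strip().split(" ", 1)
            if tag == "#Q":
                entry["question"] = text
            elif tag == "^":
                entry["answer"] = text
            else:
                entry[tag.lower()] = text  # "A" -> "a", and so on
        question_bank.append(entry)
    return question_bank


sample = """
#Q There is a separate species of scorpions which have two tails, with a venomous sting on each tail.
^ False
A True
B False
""".splitlines()
print(json.dumps({"questionBank": parse_lines(sample)}, indent=4))
```

With a file, `parse_lines(open("text.txt", encoding="utf-8"))` works the same way, since a file object yields one line at a time.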
