Regular expression to move a character to the end of a string using Python

Posted 2024-05-19 03:40:33


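I'm trying to convert a strange text file format (I'm fairly sure it is a Clojure hash map) to JSON so that I can load it into a pandas DataFrame. I've written a function that does this, but it solves this specific case with 5 lines of code, and for the life of me I can't work out how to do it in a single line with a regular expression.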

Here is the version information for the Python environment and packages I'm using: Python version: 3.7.7, pandas version: 1.1.0, json version: 2.0.9, re version: 2.2.1

Here is some sample data and the function I wrote:

import pandas as pd
import json
import re

data = '[{:lat 38.43222, :lon 27.146801, :name "Izmir", :source "Biraben, as digitized by Buntgen and 
by Atanasiu", :year 1837} {:lat 36.80083, :lon 10.1799965, :name "Tunis", :source "Biraben, as 
digitized by Buntgen and by Atanasiu", :year 1837} {:lat 30.076834, :lon 31.251078, :name "Kairo", 
:source "Biraben, as digitized by Buntgen and by Atanasiu", :year 1841} {:lat 32.116657, :lon 
20.066666, :name "Benghazi", :source "Biraben, as digitized by Buntgen and by Atanasiu", :year 1856} 
{:lat 32.116657, :lon 20.066666, :name "Benghazi", :source "Biraben, as digitized by Buntgen and by 
Atanasiu", :year 1857} {:lat 33.88694, :lon 35.513046, :name "Beyrouth", :source "Biraben, as 
digitized by Buntgen and by Atanasiu", :year 1859} {:lat 41.14995, :lon -8.6102295, :name "Porto", 
:source "Biraben, as digitized by Buntgen and by Atanasiu", :year 1899} {:lat 41.14995, :lon 
-8.6102295, :name "Porto", :source "Biraben, as digitized by Buntgen and by Atanasiu", :year 1900}]'

def clojure_to_json(clojure_text):
    # The pattern: wrap each keyword in quotes and move the colon to after the closing quotation mark
    clojure_text = clojure_text.replace('{:lat', '{"lat":')
    clojure_text = clojure_text.replace(':lon', '"lon":')
    clojure_text = clojure_text.replace(':name', '"name":')
    clojure_text = clojure_text.replace(':year', '"year":')
    clojure_text = clojure_text.replace(':source', '"source":')
    clojure_text = clojure_text.replace('} {', '} , {')
    return clojure_text



json_data = json.loads(clojure_to_json(data))
df = pd.DataFrame(json_data)
print(df)

Thanks for your help.


1 answer
User
#1 · Posted 2024-05-19 03:40:33

A single regexp, though not as concise as @Justin's in the comments on the question:

import re

data = """\
[{:lat 38.43222, :lon 27.146801, :name "Izmir", :source "Biraben, as digitized by Buntgen and \
by Atanasiu", :year 1837} {:lat 36.80083, :lon 10.1799965, :name "Tunis", :source "Biraben, as \
digitized by Buntgen and by Atanasiu", :year 1837} {:lat 30.076834, :lon 31.251078, :name "Kairo", \
:source "Biraben, as digitized by Buntgen and by Atanasiu", :year 1841} {:lat 32.116657, :lon \
20.066666, :name "Benghazi", :source "Biraben, as digitized by Buntgen and by Atanasiu", :year 1856} \
{:lat 32.116657, :lon 20.066666, :name "Benghazi", :source "Biraben, as digitized by Buntgen and by \
Atanasiu", :year 1857} {:lat 33.88694, :lon 35.513046, :name "Beyrouth", :source "Biraben, as \
digitized by Buntgen and by Atanasiu", :year 1859} {:lat 41.14995, :lon -8.6102295, :name "Porto", \
:source "Biraben, as digitized by Buntgen and by Atanasiu", :year 1899} {:lat 41.14995, :lon \
-8.6102295, :name "Porto", :source "Biraben, as digitized by Buntgen and by Atanasiu", :year 1900}]\
"""
# I copy-pasted your code from SO and it added unwanted newlines, my version is paste-safe

pattern = re.compile(
    r"""
    \{
    :lat\ (?P<lat>-?\d+\.\d+),
    \ :lon\ (?P<lon>-?\d+\.\d+),
    \ :name\ "(?P<name>[^"]+)",
    \ :source\ "(?P<source>[^"]+)",
    \ :year\ (?P<year>\d+)
    \}
    """
    , re.VERBOSE)


def clojure_to_json(clojure_text):
    # One dict per matched map; groupdict() maps each named group to its matched text
    return [match.groupdict() for match in pattern.finditer(clojure_text)]


result = clojure_to_json(data)
print(result)

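This gives the desired records as a list of dicts (note that every value, including the numbers, is a string):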

[{'lat': '38.43222', 'lon': '27.146801', 'name': 'Izmir', 'source': 'Biraben, as digitized by Buntgen and by Atanasiu', 'year': '1837'}, {'lat': '36.80083', 'lon': '10.1799965', 'name': 'Tunis', 'source': 'Biraben, as digitized by Buntgen and by Atanasiu', 'year': '1837'}, {'lat': '30.076834', 'lon': '31.251078', 'name': 'Kairo', 'source': 'Biraben, as digitized by Buntgen and by Atanasiu', 'year': '1841'}, {'lat': '32.116657', 'lon': '20.066666', 'name': 'Benghazi', 'source': 'Biraben, as digitized by Buntgen and by Atanasiu', 'year': '1856'}, {'lat': '32.116657', 'lon': '20.066666', 'name': 'Benghazi', 'source': 'Biraben, as digitized by Buntgen and by Atanasiu', 'year': '1857'}, {'lat': '33.88694', 'lon': '35.513046', 'name': 'Beyrouth', 'source': 'Biraben, as digitized by Buntgen and by Atanasiu', 'year': '1859'}, {'lat': '41.14995', 'lon': '-8.6102295', 'name': 'Porto', 'source': 'Biraben, as digitized by Buntgen and by Atanasiu', 'year': '1899'}, {'lat': '41.14995', 'lon': '-8.6102295', 'name': 'Porto', 'source': 'Biraben, as digitized by Buntgen and by Atanasiu', 'year': '1900'}]
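
Since the captured values above are all strings, a conversion step is probably needed before the numbers are usable. The following is a minimal sketch of one way to do that, not part of the original answer: the names `FIELD_TYPES` and `convert_fields` are hypothetical, and the two records are a shortened sample of the output above.

```python
# Hypothetical mapping from field name to converter; not part of the original answer.
FIELD_TYPES = {"lat": float, "lon": float, "year": int}


def convert_fields(records):
    # Apply the converter registered for each key; keys without one stay strings.
    return [
        {key: FIELD_TYPES.get(key, str)(value) for key, value in record.items()}
        for record in records
    ]


# Shortened sample of the regexp output shown above.
raw = [
    {"lat": "38.43222", "lon": "27.146801", "name": "Izmir", "year": "1837"},
    {"lat": "41.14995", "lon": "-8.6102295", "name": "Porto", "year": "1900"},
]
typed = convert_fields(raw)
print(typed)
```

The typed list can then be passed to `pd.DataFrame(...)` as in the question, or serialized with `json.dumps(...)` if actual JSON text is wanted.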

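The rule stated in the question (wrap each keyword in quotes and move the colon after it) can also be written as a single substitution. The sketch below is only one possible reading of that idea and is not the regexp from the comments, which this page does not show. The helper name `clojure_maps_to_json` is hypothetical, and the pattern assumes that every keyword follows `{` or whitespace, that no quoted value contains text resembling ` :word` or `} {`, and that maps are separated only by whitespace.

```python
import json
import re


# Hypothetical helper name; a sketch under the assumptions stated above.
def clojure_maps_to_json(clojure_text):
    # Quote each :keyword and move the colon after it, e.g. ':lat' -> '"lat":'.
    quoted = re.sub(r'(?<=[{\s]):([A-Za-z_][\w-]*)', r'"\1":', clojure_text)
    # Insert the comma that JSON requires between adjacent maps.
    return re.sub(r'\}\s*\{', '}, {', quoted)


# Shortened sample of the data from the question.
sample = (
    '[{:lat 38.43222, :lon 27.146801, :name "Izmir", :year 1837} '
    '{:lat 41.14995, :lon -8.6102295, :name "Porto", :year 1900}]'
)
records = json.loads(clojure_maps_to_json(sample))
print(records)
```

Because the result goes through `json.loads`, the numbers come back as `float` and `int` rather than strings. A substitution like this is fragile on real EDN data (nested keywords, colons inside strings), so the explicit pattern in the answer above is the safer choice when the record layout is fixed.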