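I'm trying to convert a strange text file format (I'm fairly sure it's a Clojure hash map) to JSON so that I can load it into a pandas DataFrame. I've written a function that does this, but it solves this specific problem with 5 lines of code, and for the life of me I can't get it done in a single line with a regular expression.
Here is the version information for the Python environment and packages I'm using: Python version: 3.7.7, pandas version: 1.1.0, json version: 2.0.9, re version: 2.2.1
Here is some sample data and the function I wrote: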
import pandas as pd
import json
import re
data = (
    '[{:lat 38.43222, :lon 27.146801, :name "Izmir", :source "Biraben, as digitized by Buntgen and by Atanasiu", :year 1837} '
    '{:lat 36.80083, :lon 10.1799965, :name "Tunis", :source "Biraben, as digitized by Buntgen and by Atanasiu", :year 1837} '
    '{:lat 30.076834, :lon 31.251078, :name "Kairo", :source "Biraben, as digitized by Buntgen and by Atanasiu", :year 1841} '
    '{:lat 32.116657, :lon 20.066666, :name "Benghazi", :source "Biraben, as digitized by Buntgen and by Atanasiu", :year 1856} '
    '{:lat 32.116657, :lon 20.066666, :name "Benghazi", :source "Biraben, as digitized by Buntgen and by Atanasiu", :year 1857} '
    '{:lat 33.88694, :lon 35.513046, :name "Beyrouth", :source "Biraben, as digitized by Buntgen and by Atanasiu", :year 1859} '
    '{:lat 41.14995, :lon -8.6102295, :name "Porto", :source "Biraben, as digitized by Buntgen and by Atanasiu", :year 1899} '
    '{:lat 41.14995, :lon -8.6102295, :name "Porto", :source "Biraben, as digitized by Buntgen and by Atanasiu", :year 1900}]'
)
def clojure_to_json(clojure_text):
    # Wrap each keyword in double quotes and move the colon to after the closing quote
    clojure_text = clojure_text.replace('{:lat', '{"lat":')
    clojure_text = clojure_text.replace(':lon', '"lon":')
    clojure_text = clojure_text.replace(':name', '"name":')
    clojure_text = clojure_text.replace(':year', '"year":')
    clojure_text = clojure_text.replace(':source', '"source":')
    # Clojure separates the maps with a space only; JSON needs a comma between them
    clojure_text = clojure_text.replace('} {', '} , {')
    return clojure_text
json_data = json.loads(clojure_to_json(data))
df = pd.DataFrame(json_data)
print(df)
Thanks for your help.
A regexp, though not as concise as @Justin's in the comments on the question:
This gives the required JSON:
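For illustration, here is one way the regex idea could be sketched. This is not the answerer's code or @Justin's comment; it is a minimal sketch that assumes no string value contains a colon immediately followed by a word character (something like "12:30" inside a value would be mangled), and that the maps are separated only by whitespace. The name `clojure_to_json_regex` and the shortened `sample` string are made up for the example.

```python
import json
import re


def clojure_to_json_regex(clojure_text):
    # Hypothetical helper: turn each keyword such as :lat into a quoted
    # JSON key followed by a colon, i.e. :lat -> "lat":
    # Assumption: no string value contains ":" directly followed by a word character.
    text = re.sub(r':(\w+)', r'"\1":', clojure_text)
    # Clojure separates the maps with whitespace only; JSON needs a comma.
    return re.sub(r'\}\s*\{', '}, {', text)


# Shortened sample in the same shape as the data in the question.
sample = (
    '[{:lat 38.43222, :lon 27.146801, :name "Izmir", :year 1837} '
    '{:lat 41.14995, :lon -8.6102295, :name "Porto", :year 1900}]'
)

records = json.loads(clojure_to_json_regex(sample))
```

The resulting `records` list can be passed to `pd.DataFrame` exactly as in the question. The two substitutions can also be nested into a single expression if a one-liner is the goal, at some cost to readability.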