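I am trying to convert a CSV file containing my website's URLs into a JSON tree structure based on the directories in each URL. The complication is that the depth of a URL (its number of directories) can differ from one URL to the next, so I think I need a recursive function to handle every case.
For example, here is my list of URLs: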
https://example.com/
https://example.com/page1.html
https://example.com/cocktails/receipe/page1.html
https://example.com/cocktails/receipe/page2.html
https://example.com/cocktails/page3.html
https://example.com/article/magazine
https://example.com/article/mood/page1.html
What I want is a JSON tree like this:
{
    "name": "/",
    "children": [{
            "name": "page1.html"
        },
        {
            "name": "cocktails",
            "children": [{
                    "name": "receipe",
                    "children": [{
                            "name": "page1.html"
                        },
                        {
                            "name": "page2.html"
                        }
                    ]
                },
                {
                    "name": "page3.html"
                }
            ]
        },
        {
            "name": "article",
            "children": [{
                    "name": "mood",
                    "children": [{
                        "name": "page1.html"
                    }]
                },
                {
                    "name": "magazine"
                }
            ]
        }
    ]
}
I started writing some code in Python, but I am stuck on how to handle the children recursively:
import csv
import json
import re
from collections import OrderedDict


def run():
    root = OrderedDict({
        "name": "/",
        "children": [],
    })
    with open("test.csv") as f:
        for row in csv.DictReader(f):
            link = row['url']
            # Drop the scheme and host, keep only the path
            suffix = re.sub(r"https?://[^/]*", "", link)
            parts = [x for x in re.split(r"[/?]", suffix) if x != ""]
            if len(parts) == 0:
                continue
            if len(parts) == 1:
                p = parts[0]
                # Top-level page: attach it directly under the root
                if not key_exists(root["children"], p):
                    root["children"].append(create_row(p, row))
            else:
                page = parts[-1]
                parts = parts[:-1]
                """
                SOME CODE HERE
                """
    data = json.dumps(root, indent=4, sort_keys=False)
    with open("readme.json", "w") as f:
        f.write(data)


def create_row(key, row):
    return {"name": key,
            "url": row['url'].strip()
            }


def key_exists(folders, key):
    # True if a node named `key` is already in the list of children
    return len([x for x in folders if x['name'] == key]) > 0


if __name__ == "__main__":
    run()
No recursion is needed here. You can build the tree by walking each path and appending children as you go.
Pseudocode:
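The walk-and-append idea can be sketched in Python roughly as follows. This is a minimal illustration, not the answer's original pseudocode: the helper name `insert_path` is hypothetical, and splitting the path with `urllib.parse.urlparse` is an assumption that replaces the regex from the question.

```python
from urllib.parse import urlparse


def insert_path(root, parts):
    """Walk `parts` down from the root, creating missing child nodes on the way."""
    node = root
    for part in parts:
        children = node.setdefault("children", [])
        # Reuse the child with this name if it already exists ...
        for child in children:
            if child["name"] == part:
                node = child
                break
        else:
            # ... otherwise append a new one and descend into it.
            child = {"name": part}
            children.append(child)
            node = child


root = {"name": "/", "children": []}
for url in (
    "https://example.com/cocktails/receipe/page1.html",
    "https://example.com/cocktails/page3.html",
):
    # Path segments only; empty strings from leading slashes are dropped.
    parts = [p for p in urlparse(url).path.split("/") if p]
    insert_path(root, parts)
```

Each URL costs one pass over its segments, and a node only gets a "children" key once something is inserted below it, so leaf pages stay as plain {"name": ...} objects.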
The following program gives your expected output; I hope it is not too complicated for you.
Output:
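The answer's program is not reproduced in this copy, so what follows is only a sketch of how such a program might look. It assumes the CSV has a header column named url (as in the question's code) and uses an in-memory string in place of test.csv so that it runs on its own. The function name `build_tree` and the path-tuple lookup table are illustrative choices, not the original author's code.

```python
import csv
import io
import json
from urllib.parse import urlparse


def build_tree(urls):
    """Build a {"name", "children"} tree from an iterable of URLs, without recursion."""
    root = {"name": "/", "children": []}
    # Maps every path seen so far (as a tuple of segments) to its node.
    nodes = {(): root}
    for url in urls:
        parts = [p for p in urlparse(url.strip()).path.split("/") if p]
        for depth in range(1, len(parts) + 1):
            key = tuple(parts[:depth])
            if key not in nodes:
                node = {"name": parts[depth - 1]}
                # The parent is always present: it was added one step earlier.
                nodes[key[:-1]].setdefault("children", []).append(node)
                nodes[key] = node
    return root


# Stand-in for open("test.csv"); the "url" header is an assumption.
CSV_TEXT = """url
https://example.com/
https://example.com/page1.html
https://example.com/cocktails/receipe/page1.html
https://example.com/cocktails/receipe/page2.html
https://example.com/cocktails/page3.html
https://example.com/article/magazine
https://example.com/article/mood/page1.html
"""

rows = csv.DictReader(io.StringIO(CSV_TEXT))
tree = build_tree(row["url"] for row in rows)
print(json.dumps(tree, indent=4))
```

To run it against a real file, replace io.StringIO(CSV_TEXT) with an open file handle and write the json.dumps result to readme.json. Children appear in the order their URLs are first seen, and query strings are ignored because only the path component is split.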