Is there a better way to convert a string into a data set in Python?

Asked 2025-04-16 11:54

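I just finished a coursework assignment in Python. It works well and I'm happy with it, but the code looks really ugly! I've already submitted it, since we are graded only on whether it runs, not on how it looks. Still, I'd appreciate some advice on converting a string into a data set, for use in future projects.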

The input is a grid made up of nodes and edges, for example:

"4:(1,2;4),(2,6;3),(3,7;15),(4,8;1),(5,6;1),(6,7;1),(5,9;9),(6,10;2),(7,11;1),(8,12;23),(9,10;5),(9,13;7),(10,14;6),(11,15;3),(12,16;3),(13,14;4),(15,16;7)"

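In this example, the first number, before the colon, is the size of the grid (4x4), and (1,2;4) denotes an edge from node 1 to node 2 with a cost of 4. The code below converts this input into a list in which array[0] is the grid size and array[1] is a dictionary of the form (node1,node2) = cost.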

def partitionData(line):
    finalDic = dict()
    #partition the data around the formating
    line = line.split(":")
    line[1] = line[1].split("),(")
    #clean up data some more
    line[1][0] = line[1][0][1:]
    end = len(line[1])-1
    line[1][end] = line[1][end][:len(line[1][end])-2]
    #simplify data and organize into a list
    for i in range(len(line[1])):
        line[1][i] = line[1][i].split(",")
        line[1][i][1] = line[1][i][1].split(";")
        #clean up list
        for j in range(len(line[1][i])):
            line[1][i].append(line[1][i][1][j])
        del line[1][i][1]
    #convert everything to integer to simplify algorithm
    for i in range(len(line[1])):
        for j in range(len(line[1][i])):
            line[1][i][j] = int(line[1][i][j])
    line[0] = int(line[0])
    newData = dict()
    for i in range(len(line[1])):
        newData[(line[1][i][0],line[1][i][1])] = line[1][i][2]
    line[1] = newData
    for i in line[1]:
        if not ((min(i),max(i)) in finalDic):
            finalDic[(min(i),max(i))] = line[1][i]
        else:
            print "There is a edge referenced twice!"
            exit()
    line[1] = finalDic
    return line

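My first version was more concise, but it didn't allow for numbers greater than 9. I find the current code very ugly, and there must be a nicer way to do this.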
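That earlier version isn't shown here, so the following is only a guess at the kind of shortcut that breaks: reading digits by fixed character position works while every number has one digit and goes wrong as soon as one has two. Both function names are made up for illustration, and the code uses Python 3 syntax.

```python
def edge_by_position(item):
    # Hypothetical shortcut: assumes the layout "a,b;c" with exactly
    # one character per number.
    return int(item[0]), int(item[2]), int(item[4])


def edge_by_delimiter(item):
    # Splitting on the delimiters works for numbers of any length.
    nodes, cost = item.split(";")
    a, b = nodes.split(",")
    return int(a), int(b), int(cost)


print(edge_by_position("1,2;4"))     # fine for single digits
print(edge_by_position("3,7;15"))    # silently drops the "5" of the cost
print(edge_by_delimiter("3,7;15"))   # reads the full cost
```

Splitting on the delimiters instead of counting characters is what all the answers in this thread end up doing in one form or another.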


5 Answers

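Several solutions have already been proposed:

using a parser: too complicated
using a regular expression: I like this one, but you need to know regexes
using the ast module: interesting, but you need to know it too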
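For reference, the ast idea is not spelled out anywhere on this page. A minimal sketch of what it could look like, assuming the input always follows the format in the question; the name parse_with_ast is invented for this example, and the code uses Python 3 syntax:

```python
import ast


def parse_with_ast(text):
    # Split "4:(1,2;4),..." into the grid size and the edge list.
    size, _, edges = text.partition(":")
    # Replacing ";" with "," turns each edge into a tuple literal such as
    # (1,2,4); the surrounding brackets make the whole string a list literal.
    triples = ast.literal_eval("[" + edges.replace(";", ",") + "]")
    return [int(size), {(a, b): cost for a, b, cost in triples}]


print(parse_with_ast("4:(1,2;4),(2,6;3),(3,7;15)"))
```

ast.literal_eval only evaluates literals, so unlike eval it does not run arbitrary code found in the input.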

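I wanted the simplest approach possible, one that a beginner can follow. My solution also shows that Python's built-in features are enough for this job.

First, WhiteDawn, here is a modified version of your code, so you can see a few very basic points that Python's features make simpler.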

For example, if seq is a sequence, seq[len(seq)-1] is its last element, but so is seq[-1]. By the way, there is an error in your code; I think it should be

line[1][end] = line[1][end][:len(line[1][end])-1]
# not:
line[1][end] = line[1][end][:len(line[1][end])-2]

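Otherwise execution fails, because the last item loses its final digit as well as the closing parenthesis.

Also take note of a very handy function, enumerate().

You should also learn about list slicing: if li = [45, 12, 78, 96], then li[2:3] = [2, 5, 8] turns li into li = [45, 12, 2, 5, 8, 96]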
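These three points can be tried out in isolation. This is just a small demo with arbitrary variable names, written with Python 3 style print() calls rather than the Python 2 print statement used in the rest of this answer:

```python
seq = [45, 12, 78, 96]

# Negative indexes count from the end: seq[-1] is the last element.
print(seq[-1] == seq[len(seq) - 1])

# enumerate() yields (index, item) pairs, so range(len(seq)) is not needed.
for i, x in enumerate(seq):
    print(i, x)

# Assigning to a slice replaces that slice with all items on the right.
li = [45, 12, 78, 96]
li[2:3] = [2, 5, 8]
print(li)

# The same trick applied to one edge of the input string.
edge = "3,7;15".split(",")       # ['3', '7;15']
edge[1:] = edge[1].split(";")    # ['3', '7', '15']
print(edge)
```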

y = "4:(1,2;4),(2,6;3),(3,7;15),(4,8;1),(5,6;1),(6,7;1),(5,9;9),(6,10;2),(7,11;1),(8,12;23),(9,10;5),(9,13;7),(10,14;6),(11,15;3),(12,16;3),(13,14;4),(15,16;7)"


def partitionData(line):
    finalDic = dict()

    #partition the data around the formating
    print 'line==',line
    line = line.split(":")
    print '\ninstruction :  line = line.split(":")'
    print 'line==',line
    print 'len of line==',len(line),'  (2 strings)'

    print '---------------------'
    line[1] = line[1].split("),(")
    print '\ninstruction :  line[1] = line[1].split("),(")'
    print 'line[1]==',line[1]

    #clean up data some more
    line[1][0] = line[1][0][1:]
    print 'instruction :  line[1][0] = line[1][0][1:]'
    line[1][-1] = line[1][-1][0:-1]
    print 'instruction :  line[1][-1] = line[1][-1][0:-1]'
    print 'line[1]==',line[1]

    print '---------------------'
    #simplify data and organize into a list
    for i,x in enumerate(line[1]):
        line[1][i] = x.split(",")
        line[1][i][1:] = line[1][i][1].split(";")
    print 'loop to clean the data in line[1]'
    print 'line[1]==',line[1]
    print '---------------------'
    #convert everything to integer to simplify algorithm
    print 'convert everything to integer to simplify algorithm'
    for i,x in enumerate(line[1]):
        line[1][i] = map(int,x)

    line[0] = int(line[0])
    print 'line==',line
    print '---------------------'
    newData = dict()
    for a,b,c in line[1]:
        newData[(a,b)] = c
    line[1] = newData
    print 'line==',line



    print '---------------------'
    for i in line[1]:
        print 'i==',i,'  (min(i),max(i))==',(min(i),max(i))
        if not ((min(i),max(i)) in finalDic):
            finalDic[(min(i),max(i))] = line[1][i]
        else:
            print "There is a edge referenced twice!"
            exit()
    line[1] = finalDic
    print '\nline==',line
    return line


print partitionData(y)

Second, my solution:

y = "4:(1,2;4),(2,6;3),(3,7;15),(4,8;1),(5,6;1),(6,7;1),(5,9;9),(6,10;2),(7,11;1),(8,12;23),(9,10;5),(9,13;7),(10,14;6),(11,15;3),(12,16;3),(13,14;4),(15,16;7)"


# line[1]== {(1, 2): 4, (5, 9): 9, (2, 6): 3, (6, 7): 1, (4, 8): 1, (5, 6): 1, (6, 10): 2, (9, 10): 5, (13, 14): 4, (11, 15): 3, (10, 14): 6, (9, 13): 7, (12, 16): 3, (7, 11): 1, (3, 7): 15, (8, 12): 23, (15, 16): 7}

def partitionData(line):
    finalDic = dict()
    #partition the data around the formating
    print '\nline==',line

    line = line.split(":")
    print '\ninstruction:\n   line = line.split(":")'
    print 'result:\n   line==',line
    print '\n----------------------------------------------------'

    print '\nline[1]==',line[1]

    line[1] = line[1][1:-1].replace(";",",")
    print '\ninstruction:\n   line[1] = line[1][1:-1].replace(";",",")'
    print 'result:\n   line[1]==',line[1]

    line[1] = [ x.split(",") for x in line[1].split("),(") ]
    print '\ninstruction:\n   line[1] = [ x.split(",") for x in line[1].split("),(") ]'
    print 'result:\n   line[1]==',line[1]

    line = [int(line[0]),dict( ((int(a),int(b)),int(c)) for (a,b,c) in line[1] ) ]
    print '\ninstruction:\n   line = [int(line[0]),dict( ((int(a),int(b)),int(c)) for (a,b,c) in line[1] ) ]'
    print 'result:\n   line[1]==',line[1]         


    for i in line[1]:
        if not ((min(i),max(i)) in finalDic):
            finalDic[(min(i),max(i))] = line[1][i]
        else:
            print "There is a edge referenced twice!"
            exit()
    line[1] = finalDic
    print '\nline[1]==',line[1]


    return line


print partitionData(y)

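I left the final finalDic loop unchanged because I don't understand what it is for. If i is a pair of integers already in ascending order, as every pair in the sample input is, then (min(i),max(i)) is just that pair.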
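One plausible reading of that loop, which is an assumption and not something the question confirms: the edges are undirected, so (2,1) and (1,2) name the same edge, and the loop stores every edge under the key (smaller, larger) while rejecting repeats. A small sketch of that idea in Python 3 syntax follows. The function name normalize_edges is invented here, and it raises ValueError where the original prints a message and calls exit().

```python
def normalize_edges(edges):
    """Store each undirected edge under the key (smaller node, larger node)."""
    result = {}
    for pair, cost in edges.items():
        key = (min(pair), max(pair))
        if key in result:
            # Swapped in for the original print + exit().
            raise ValueError("edge %r is referenced twice" % (key,))
        result[key] = cost
    return result


# (2, 1) is stored as (1, 2); (2, 6) is already in order and stays as it is.
print(normalize_edges({(2, 1): 4, (2, 6): 3}))
```

With input such as {(1, 2): 4, (2, 1): 5} the function raises, because both keys normalize to (1, 2).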

Answered 2025-04-16 by Python大师


import re
data = "4:(1,2;4),(2,6;3),(3,7;15),(4,8;1),(5,6;1),(6,7;1),(5,9;9),(6,10;2),(7,11;1),(8,12;23),(9,10;5),(9,13;7),(10,14;6),(11,15;3),(12,16;3),(13,14;4),(15,16;7)"
temp = data.split(":")    # split into grid size and rest
array = [int(temp[0]),{}] # first item: grid size
# split the rest of the string (from the second to the second-to-last characters)
# along the delimiters "),("
for item in temp[1][1:-1].split("),("):
    numbers = re.split("[,;]", item)          # split item along delimiters , or ;
    k1, k2, v = (int(num) for num in numbers) # and convert to int
    array[1][(k1,k2)] = v                     # populate the array
print array

The result is:

[4, {(1, 2): 4, (5, 9): 9, (2, 6): 3, (6, 7): 1, (4, 8): 1, (5, 6): 1, (6, 10): 2, (9, 10): 5, (13, 14): 4, (11, 15): 3, (10, 14): 6, (9, 13): 7, (12, 16): 3, (7, 11): 1, (3, 7): 15, (8, 12): 23, (15, 16): 7}]

Answered 2025-04-16 by Python大师


import re

# regular expression for matching a (node1,node2;cost)
EDGE = re.compile(r'\((\d+),(\d+);(\d+)\)')

def parse(s):
    # Separate size from the list of edges
    size, edges = s.split(':')

    # Build a dictionary
    edges = dict(
        # ...where key is (node1,node2) and value is (cost)
        # (all converted to integers)
        ((int(node1),int(node2)),int(cost))

        # ...by iterating the edges using the regular expression
        for node1,node2,cost in EDGE.findall(edges))

    return int(size),edges

Example:

>>> test = "4:(1,2;4),(2,6;3),(3,7;15),(4,8;1),(5,6;1),(6,7;1),(5,9;9),(6,10;2),(7,11;1),(8,12;23),(9,10;5),(9,13;7),(10,14;6),(11,15;3),(12,16;3),(13,14;4),(15,16;7)"
>>> parse(test)
(4, {(1, 2): 4, (5, 9): 9, (2, 6): 3, (6, 7): 1, (4, 8): 1, (5, 6): 1, (6, 10): 2, (9, 10): 5, (13, 14): 4, (11, 15): 3, (10, 14): 6, (9, 13): 7, (12, 16): 3, (7, 11): 1, (3, 7): 15, (8, 12): 23, (15, 16): 7})
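One thing to keep in mind with findall: it returns only the pieces that match and silently skips everything else, so a malformed edge such as (2,6,3) would simply be missing from the result. If that matters, a stricter variant could check the whole string first. This is a sketch, not part of the answer above; WHOLE and parse_strict are names made up here, and re.fullmatch needs Python 3.4 or later.

```python
import re

# Same pattern as above: one (node1,node2;cost) edge.
EDGE = re.compile(r'\((\d+),(\d+);(\d+)\)')
# Whole input: size, colon, then one or more comma-separated edges.
WHOLE = re.compile(r'\d+:\(\d+,\d+;\d+\)(?:,\(\d+,\d+;\d+\))*')


def parse_strict(s):
    s = s.strip()  # tolerate a trailing newline, e.g. when read from a file
    if WHOLE.fullmatch(s) is None:
        raise ValueError("malformed input: %r" % s)
    size, edges = s.split(':')
    return int(size), {(int(a), int(b)): int(c)
                       for a, b, c in EDGE.findall(edges)}


print(parse_strict("4:(1,2;4),(2,6;3),(3,7;15)\n"))
```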

Answered 2025-04-16 by Python大师
