Parsing the string representation of a numpy array

Posted 2024-04-24 13:37:01


Suppose I only have the string representation of a numpy.array:

>>> import numpy as np
>>> arr = np.random.randint(0, 10, (4, 4))
>>> print(arr)  # this one!
[[9 4 7 3]
 [1 6 4 2]
 [6 7 6 0]
 [0 5 6 7]]

How can I convert this back into a numpy array? Inserting the commas by hand isn't complicated, but I'm looking for a programmatic approach.

A simple regex that replaces whitespace with a comma actually works for single-digit integers:

[code block missing from the source]
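
The snippet that belonged at this point did not survive extraction. Judging from the ast.literal_eval(sub)[0] call and the 3x6 result that follow, it was probably close to the sketch below. The name text and the exact input rows are assumptions inferred from that output, not recovered text.

```python
import ast
import re

import numpy as np

# Assumed input: the printed form of a 3x6 array of single-digit integers.
text = """[[8 6 2 4 0 2]
 [3 5 8 4 5 6]
 [4 6 3 3 0 3]]
"""

# Every run of whitespace (including the final newline) becomes one comma.
sub = re.sub(r'\s+', ',', text)

# The trailing comma makes the literal a 1-tuple, hence the [0].
arr = np.array(ast.literal_eval(sub)[0])
```

The trailing comma is what turns the evaluated literal into a tuple, which is why the next snippet indexes with [0].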

It can be converted to an almost identical array (the dtype may be lost, but that's fine):

>>> import ast
>>> np.array(ast.literal_eval(sub)[0])
array([[8, 6, 2, 4, 0, 2],
       [3, 5, 8, 4, 5, 6],
       [4, 6, 3, 3, 0, 3]])

But it fails for multi-digit integers and floats:

>>> re.sub(r'\s+', ',', """[[ 0.  1.  6.  9.  1.  4.]
... [ 4.  8.  2.  3.  6.  1.]]
... """)
'[[,0.,1.,6.,9.,1.,4.],[,4.,8.,2.,3.,6.,1.]],'

because these have an extra , at the beginning of each row.
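
To make the failure concrete, here is a small sketch that reproduces it and then tries one possible workaround: removing the whitespace that follows each [ before inserting commas. The workaround is my own assumption, not part of the original question, and it is only checked against the input shown here.

```python
import ast
import re

text = """[[ 0.  1.  6.  9.  1.  4.]
[ 4.  8.  2.  3.  6.  1.]]
"""

# Naive substitution: the space after each "[" also becomes a comma.
naive = re.sub(r'\s+', ',', text)

try:
    ast.literal_eval(naive)
    naive_parses = True
except SyntaxError:
    # "[[,0." is not a valid Python literal.
    naive_parses = False

# Possible workaround: strip the ends and drop whitespace right after "["
# before turning the remaining whitespace into commas.
trimmed = re.sub(r'\[\s+', '[', text.strip())
fixed = re.sub(r'\s+', ',', trimmed)
rows = ast.literal_eval(fixed)
```

Stripping the string first also removes the trailing comma, so no [0] indexing is needed afterwards.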

The solution doesn't have to be regex-based; any other approach that works for unabridged (no ...) bool/int/float/complex arrays with 1 to 4 dimensions would be fine.


2 answers

Update:

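import ast
import re
import numpy as np

# a holds the printed array text; the lookahead lets "1 2 3" become "1,2,3"
np.array(ast.literal_eval(re.sub(r'\]\s*\[',
                                 r'],[',
                                 re.sub(r'(\d\.?)\s+(?=[-+.\d])',
                                        r'\1,',
                                        a.replace('\n', '')))))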

Test:

[test output missing from the source]
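
The original test output is not available, so the following is only a stand-in check. It wraps the two substitutions in a hypothetical function, parse_printed_array, with the inner pattern written as a lookahead so that every gap in a run like 1 2 3 gets a comma. The function name and the test arrays are my assumptions. It only targets int and float arrays without ellipsis.

```python
import ast
import re

import numpy as np


def parse_printed_array(a):
    """Parse the text produced by print(arr) back into an array (sketch)."""
    flat = a.replace('\n', '')
    # Comma after every number followed by whitespace and another number.
    with_commas = re.sub(r'(\d\.?)\s+(?=[-+.\d])', r'\1,', flat)
    # Comma between adjacent rows or blocks: "] [" becomes "],[".
    with_commas = re.sub(r'\]\s*\[', '],[', with_commas)
    return np.array(ast.literal_eval(with_commas))


# Round trip: print-style text in, equal array out.
floats = np.arange(12).reshape(3, 4) / 8   # exact binary fractions
ints = np.arange(24).reshape(2, 3, 4)      # 3D, includes multi-digit values

floats_back = parse_printed_array(str(floats))
ints_back = parse_printed_array(str(ints))
```

The float values are multiples of 1/8 so that the printed text is exact; arbitrary floats are rounded to the print precision and would only compare approximately equal.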

Old answer:

We can try using pandas:

import io
import pandas as pd

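# a holds the printed array text; sep=r'\s+' splits on runs of whitespace
# (delim_whitespace is deprecated in recent pandas releases)
In [294]: pd.read_csv(io.StringIO(a.replace('\n', '').replace(']', '\n').replace('[','')),
                      sep=r'\s+', header=None).values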
Out[294]:
array([[ 0.96725219,  0.01808783,  0.63087793,  0.45407222,  0.30586779,  0.04848813,  0.01797095],
       [ 0.87762897,  0.07705762,  0.33049588,  0.91429797,  0.5776607 ,  0.18207652,  0.2355932 ],
       [ 0.68803166,  0.31540537,  0.92606902,  0.83542726,  0.43457601,  0.44952604,  0.35121332],
       [ 0.14366487,  0.23486924,  0.16421432,  0.27709387,  0.19646975,  0.8243488 ,  0.37708642],
       [ 0.07594925,  0.36608386,  0.02087877,  0.07507932,  0.40005067,  0.84625563,  0.62827931],
       [ 0.63662663,  0.41408688,  0.43447501,  0.22135816,  0.58944708,  0.66456168,  0.5871466 ],
       [ 0.16807584,  0.70981667,  0.18597074,  0.02034372,  0.94706437,  0.61333699,  0.8444439 ]])

Note: this probably only works for 2D arrays without ... (ellipsis).
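
The same idea (treat each ] as the end of a row and discard the [) also works without pandas. The sketch below is an assumption-laden variant, not the answerer's code: parse_2d and its convert parameter are hypothetical names, and it shares the 2D-only, no-ellipsis limitation.

```python
import numpy as np


def parse_2d(text, convert=float):
    """Rebuild a 2D array from its printed form (sketch, no '...' support)."""
    # Join wrapped lines, drop "[", and let each "]" mark the end of a row.
    cleaned = text.replace('\n', ' ').replace('[', ' ')
    rows = [chunk.split() for chunk in cleaned.split(']')]
    # Empty chunks come from the final "]]" and are skipped.
    return np.array([[convert(tok) for tok in row] for row in rows if row])


arr = parse_2d("[[ 10   2]\n [  3 100]]", convert=int)
```

Passing convert=int keeps integer input as integers; the default of float mirrors what a whitespace-delimited reader would typically infer for decimal data.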

Here's a very manual solution:

import re
import numpy

def parse_array_str(array_string):
    tokens = re.findall(r'''             # Find all...
                            \[         | # opening brackets,
                            \]         | # closing brackets, or
                            [^\[\]\s]+   # sequences of other non-whitespace characters''',
                        array_string,
                        flags = re.VERBOSE)
    tokens = iter(tokens)

    # Chomp first [, handle case where it's not a [
    first_token = next(tokens)
    if first_token != '[':
        # Input must represent a scalar
        if next(tokens, None) is not None:
            raise ValueError("Can't parse input.")
        return float(first_token)  # or int(token), but not bool(token) for bools

    list_form = []
    stack = [list_form]

    for token in tokens:
        if token == '[':
            # enter a new list
            stack.append([])
            stack[-2].append(stack[-1])
        elif token == ']':
            # close a list
            stack.pop()
        else:
            stack[-1].append(float(token))  # or int(token), but not bool(token) for bools

    if stack:
        raise ValueError("Can't parse input - it might be missing text at the end.")

    return numpy.array(list_form)
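
The comments above warn that bool(token) is not usable for boolean arrays, since any non-empty string is truthy. One way to cover bool, int, float, and complex tokens in a single place is a small converter that could stand in for the float(token) calls. This is a sketch: convert_token is a hypothetical helper, not part of the answer.

```python
def convert_token(token):
    """Turn one printed array element into a Python value (sketch)."""
    # Booleans must be matched by name: bool('False') would be True.
    if token == 'True':
        return True
    if token == 'False':
        return False
    # Try the narrowest numeric type first, then widen.
    for cast in (int, float, complex):
        try:
            return cast(token)
        except ValueError:
            continue
    raise ValueError("Can't parse token: %r" % (token,))
```

Trying int before float means that integer arrays come back with an integer dtype, while tokens such as 4. fall through to float and tokens such as 1.+2.j fall through to complex.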

Or a less manual solution, based on detecting where commas need to be inserted:

[code block missing from the source]
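
The code for this variant was lost in extraction. A sketch of the idea as described, under my own assumptions: a comma is needed wherever whitespace sits between ] and [, or between two characters that are neither brackets nor whitespace. The names COMMA_GAP and parse_by_inserting_commas are hypothetical, and the pattern is a reconstruction rather than the answerer's exact regex.

```python
import ast
import re

import numpy as np

# Whitespace that should become a comma sits either between "]" and "[",
# or between two characters that are neither brackets nor whitespace.
COMMA_GAP = re.compile(r'''
    (?<=\])          \s+  (?=\[)          # closing bracket, then opening bracket
    |
    (?<=[^\[\]\s])   \s+  (?=[^\[\]\s])   # two neighbouring value characters
''', re.VERBOSE)


def parse_by_inserting_commas(array_string):
    """Insert the missing commas, then let ast.literal_eval do the parsing."""
    fixed = COMMA_GAP.sub(',', array_string.strip())
    return np.array(ast.literal_eval(fixed))


floats = parse_by_inserting_commas("[[ 0.  1.  6.]\n [ 4.  8.  2.]]")
bools = parse_by_inserting_commas("[ True False]")
cube = parse_by_inserting_commas(str(np.arange(8).reshape(2, 2, 2)))
```

Because the whitespace right after a [ is never replaced, the leading-comma problem from the question does not arise, and ast.literal_eval takes care of bool, int, float, and complex literals without per-type handling.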
