I have a very long list of sequences (say each has length 16) made up of 0s and 1s, e.g.
s = ['0100100000010111', '1100100010010101', '1100100000010000', '0111100011110111', '1111100011010111']
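Now I want to treat every bit as a feature, so I need to convert the list into a numpy array or a pandas DataFrame. To do that I would have to comma-separate all the bits in each sequence, which is not feasible by hand for a large dataset.
So I tried generating all the positions in the string: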
slices = []
for j in range(len(s[0])):
    slices.append((j, j+1))
print(slices)
[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9), (9, 10), (10, 11), (11, 12), (12, 13), (13, 14), (14, 15), (15, 16)]
new = []
for i in range(len(s)):
    seq = s[i]
    for j in range(len(s[i])):
        ## I have tried both of these LOC but couldn't figure out
        ## how it could be done
        new.append([s[slice(*slc)] for slc in slices])
        new.append(s[j:j+1])
print(new)
Expected output:
new = [[0,1,0,0,1,0,0,0,0,0,0,1,0,1,1,1], [1,1,0,0,1,0,0,0,1,0,0,1,0,1,0,1], [1,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0], [0,1,1,1,1,0,0,0,1,1,1,1,0,1,1,1], [1,1,1,1,1,0,0,0,1,1,0,1,0,1,1,1]]
Thanks in advance!!
This can be done in one line, without a for loop, although it is still a bit slower than a list comprehension.
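The answer's original one-liner is not preserved here, so the following is only a sketch of one way to avoid a Python-level for loop, not necessarily what the answerer wrote. It assumes every string has the same length and contains only the characters '0' and '1'; the names n_bits and bits are illustrative.

```python
import numpy as np

s = ['0100100000010111', '1100100010010101', '1100100000010000',
     '0111100011110111', '1111100011010111']

# Assumption: all sequences share the length of the first one.
n_bits = len(s[0])

# Join the strings, view their ASCII bytes as uint8, subtract the code of '0'
# so that '0' -> 0 and '1' -> 1, then reshape to one row per sequence.
bits = (np.frombuffer(''.join(s).encode('ascii'), dtype=np.uint8) - ord('0')).reshape(len(s), n_bits)

print(bits.shape)
print(bits[0])
```

The result is a 2-D uint8 array with one column per bit; calling bits.astype(int) would give ordinary integers if that matters downstream.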
Alternatively, use the np.array constructor with a list comprehension.
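The code for this answer is also missing from the page, so here is a minimal sketch of what an np.array call combined with a list comprehension could look like. The variable name arr is illustrative, and the inner int() conversion is one possible choice rather than the answerer's confirmed code.

```python
import numpy as np

s = ['0100100000010111', '1100100010010101', '1100100000010000',
     '0111100011110111', '1111100011010111']

# Each string becomes a list of ints; np.array stacks the lists into a 2-D array.
arr = np.array([[int(ch) for ch in seq] for seq in s])

print(arr.shape)
print(arr.tolist()[0])
```

If a pandas DataFrame is preferred, passing this array to pd.DataFrame should give one column per bit.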