How to factorize a list of tuples?


Definition

Factorize: map each unique object to a unique integer. Typically the integers range from zero to n - 1, where n is the number of unique objects. Two variations are common. Type 1 numbers the unique objects in the order in which they are first encountered. Type 2 sorts the unique objects first and then applies the same process as Type 1.

The setup

Consider the list of tuples <code>tups</code>

<pre><code>tups = [(1, 2), ('a', 'b'), (3, 4), ('c', 5), (6, 'd'), ('a', 'b'), (3, 4)]
</code></pre>

I want to factorize it into one integer label per tuple, with repeated tuples sharing a label. I know there are many ways to do this. However, I want to do it as efficiently as possible.

<hr/>

What I've tried

<code>pandas.factorize</code> raises an error, because the list of tuples is treated as a 2-dimensional buffer:

<pre><code>pd.factorize(tups)[0]

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
&lt;ipython-input-84-c84947ac948c&gt; in &lt;module&gt;()
----> 1 pd.factorize(tups)[0]

//anaconda/envs/3.6/lib/python3.6/site-packages/pandas/core/algorithms.py in factorize(values, sort, order, na_sentinel, size_hint)
    553     uniques = vec_klass()
    554     check_nulls = not is_integer_dtype(original)
--> 555     labels = table.get_labels(values, uniques, 0, na_sentinel, check_nulls)
    556
    557     labels = _ensure_platform_int(labels)

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_labels (pandas/_libs/hashtable.c:21804)()

ValueError: Buffer has wrong number of dimensions (expected 1, got 2)
</code></pre>

<hr/>

<code>numpy.unique</code> gives the wrong result. It labels the 14 individual elements rather than the 7 tuples:

<pre><code>np.unique(tups, return_inverse=1)[1]

array([0, 1, 6, 7, 2, 3, 8, 4, 5, 9, 6, 7, 2, 3])
</code></pre>

<hr/>

I could use either of them on the hashes of the tuples:

<pre><code>pd.factorize([hash(t) for t in tups])[0]

array([0, 1, 2, 3, 4, 1, 2])
</code></pre>

<hr/>

Yay! That's what I wanted. So what's the problem?

First issue

Look at the performance drop with this technique. On a list of plain integers, hashing each element in a Python-level loop first nearly doubles the run time:

<pre><code>lst = [10, 7, 4, 33, 1005, 7, 4]

%timeit pd.factorize(lst * 1000)[0]
1000 loops, best of 3: 506 µs per loop

%timeit pd.factorize([hash(i) for i in lst * 1000])[0]
1000 loops, best of 3: 937 µs per loop
</code></pre>

Second issue

Hashes are not guaranteed to be unique, so two different tuples could end up with the same label.

<hr/>

Question

What is a super fast way to factorize a list of tuples?

<hr/>

Timing

Both axes are in log space.

<a href="https://i.stack.imgur.com/zIlh0.png" rel="noreferrer"><img src="https://i.stack.imgur.com/zIlh0.png" alt="enter image description here"/></a>

Code

<pre><code>from itertools import count
from string import ascii_letters
from timeit import timeit

import numpy as np
import pandas as pd


def champ(tups):
    # Dict lookup keyed on the tuples themselves; new tuples get the next counter value.
    d = {}
    c = count()
    return np.array(
        [d[tup] if tup in d else d.setdefault(tup, next(c)) for tup in tups]
    )


def root(tups):
    return pd.Series(tups).factorize()[0]


def iobe(tups):
    return np.unique(tups, return_inverse=True, axis=0)[1]


def get_row_view(a):
    # View each row of a 2-D array as a single opaque (void) element.
    void_dt = np.dtype((np.void, a.dtype.itemsize * np.prod(a.shape[1:])))
    a = np.ascontiguousarray(a)
    return a.reshape(a.shape[0], -1).view(void_dt).ravel()


def diva(tups):
    return np.unique(get_row_view(np.array(tups)), return_inverse=1)[1]


def gdib(tups):
    return pd.factorize([str(t) for t in tups])[0]


def tups_creator_1(size, len_of_str=3, num_ints_to_choose_from=1000, seed=None):
    c = len_of_str
    n = num_ints_to_choose_from
    np.random.seed(seed)
    d = pd.DataFrame(np.random.choice(list(ascii_letters), (size, c))).sum(1).tolist()
    i = np.random.randint(n, size=size)
    return list(zip(d, i))


results = pd.DataFrame(
    index=pd.Index([100, 1000, 5000, 10000, 20000, 30000, 40000, 50000], name='Size'),
    columns=pd.Index('champ root iobe diva gdib'.split(), name='Method'),
    dtype=float,  # numeric dtype so the frame can be plotted
)

for i in results.index:
    tups = tups_creator_1(i, max(1, int(np.log10(i))), max(10, i // 10))
    for j in results.columns:
        stmt = '{}(tups)'.format(j)
        setup = 'from __main__ import {}, tups'.format(j)
        # .at replaces DataFrame.set_value, which was removed in pandas 1.0
        results.at[i, j] = timeit(stmt, setup, number=100) / 100

results.plot(title='Avg Seconds', logx=True, logy=True)
</code></pre>
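For reference, the two factorization variants from the definition can be sketched in plain Python. This is a minimal sketch that keys a dict on the tuples themselves, so two distinct tuples that happen to share a hash value still receive distinct labels. The names `factorize_first_seen` and `factorize_sorted` are illustrative assumptions, not part of pandas or NumPy, and the sketch is not tuned for speed.

```python
def factorize_first_seen(items):
    """Type 1: label unique items in the order they are first encountered."""
    codes = {}
    # len(codes) is evaluated before setdefault runs, so a new item is
    # stored with the next unused integer; a known item returns its old label.
    return [codes.setdefault(item, len(codes)) for item in items]


def factorize_sorted(items):
    """Type 2: sort the unique items first, then label them in that order.

    Assumes the items are mutually comparable; mixed str/int tuples such as
    the question's `tups` raise TypeError when sorted in Python 3.
    """
    codes = {item: i for i, item in enumerate(sorted(set(items)))}
    return [codes[item] for item in items]


tups = [(1, 2), ('a', 'b'), (3, 4), ('c', 5), (6, 'd'), ('a', 'b'), (3, 4)]
print(factorize_first_seen(tups))

sortable = [(3, 4), (1, 2), (3, 4), (0, 9)]
print(factorize_first_seen(sortable))
print(factorize_sorted(sortable))

# On CPython, hash(-1) == hash(-2), so these two tuples hash alike.
# Dict lookup falls back to equality, so they still get separate labels.
colliding = [(-1,), (-2,), (-1,)]
print(factorize_first_seen(colliding))
```

Because the dict compares keys by equality after the hash lookup, this avoids the uniqueness problem of factorizing `hash(t)` values directly, at the cost of a Python-level loop.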
