How to use different combinations of groups while trying to get the most views



1 Answer


<p><a href="https://pandas.pydata.org/" rel="nofollow noreferrer"><code>pandas</code></a> is definitely the go-to library for handling detailed tabular data. For those seeking a non-<code>pandas</code> option, you can build your own mapping and reduction functions. I use these terms to mean the following:</p>
<ul>
<li>mapping: reorganize the data, grouped by the desired query</li>
<li>reduction: an aggregation function that tallies or condenses many values into one</li>
</ul>
<p>These are analogous to the groupby/aggregation concepts in <code>pandas</code>.</p>
<p><strong>Given</strong></p>
<p>Cleaned data in which runs of spaces have been replaced with a single delimiter, e.g. <code>","</code>.</p>
<pre><code>%%file "test.txt"
status,gender,age_range,occ,rating
ma,M,young,student,PG
ma,F,adult,teacher,R
sin,M,young,student,PG
sin,M,adult,teacher,R
ma,M,young,student,PG
sin,F,adult,teacher,R
</code></pre>
<p><strong>Code</strong></p>
<pre><code>import csv
import collections as ct
</code></pre>
<p>Step 1: Read the data</p>
<pre><code>def read_file(fname):
    """Yield each row of a CSV file as a dict keyed by the header names."""
    with open(fname, "r", newline="") as f:
        reader = csv.DictReader(f)
        for line in reader:
            yield line


iterable = [line for line in read_file("test.txt")]
iterable
</code></pre>
<p>Output (on Python 3.8 and later, <code>DictReader</code> yields plain <code>dict</code> rows rather than <code>OrderedDict</code>)</p>
<pre><code>[OrderedDict([('status', 'ma'), ('gender', 'M'), ('age_range', 'young'), ('occ', 'student'), ('rating', 'PG')]),
 OrderedDict([('status', 'ma'), ('gender', 'F'), ('age_range', 'adult'), ...]
 ...
]
</code></pre>
<p>Step 2: Remap the data</p>
<pre><code>def mapping(data, column):
    """Return a dict of regrouped data."""
    dd = ct.defaultdict(list)
    for d in data:
        key = d[column]
        # Keep every column except the one used as the grouping key
        value = {k: v for k, v in d.items() if k != column}
        dd[key].append(value)
    return dict(dd)


mapping(iterable, "gender")
</code></pre>
<p>Output</p>
<pre><code>{'M': [
   {'age_range': 'young', 'occ': 'student', 'rating': 'PG', ...},
   ...],
 'F': [
   {'status': 'ma', 'age_range': 'adult', ...},
   ...]
}
</code></pre>
<p>Step 3: Reduce the data</p>
<pre><code>def reduction(data):
    """Return a reduced mapping of Counters."""
    final = {}
    for key, val in data.items():
        # One Counter per remaining column, created on first use
        agg = ct.defaultdict(ct.Counter)
        for d in val:
            for k, v in d.items():
                agg[k][v] += 1
        final[key] = dict(agg)
    return final


reduction(mapping(iterable, "gender"))
</code></pre>
<p>Output</p>
<pre><code>{'F': {
   'age_range': Counter({'adult': 2}),
   'occ': Counter({'teacher': 2}),
   'rating': Counter({'R': 2}),
   'status': Counter({'ma': 1, 'sin': 1})},
 'M': {
   'age_range': Counter({'adult': 1, 'young': 3}),
   'occ': Counter({'student': 3, 'teacher': 1}),
   'rating': Counter({'PG': 3, 'R': 1}),
   'status': Counter({'ma': 2, 'sin': 2})}
}
</code></pre>
<p><strong>Demo</strong></p>
<p>With these tools you can build a data pipeline and query the data, feeding the result of one function into the next:</p>
<pre><code># Find the top age range among males
pipeline = reduction(mapping(iterable, "gender"))
pipeline["M"]["age_range"].most_common(1)
# [('young', 3)]

# Find the top ratings among teachers
pipeline = reduction(mapping(iterable, "occ"))
pipeline["teacher"]["rating"].most_common()
# [('R', 3)]

# Find the number of married people
pipeline = reduction(mapping(iterable, "gender"))
sum(v["status"]["ma"] for k, v in pipeline.items())
# 3
</code></pre>
<p>Overall, you tailor the output by how you define the reduction function.</p>
<p>Note that the code for this generalized process is more verbose than the <a href="https://stackoverflow.com/questions/48680608/function-to-return-the-highest-count-value-using-a-rule">former example</a>, although it applies well to data with many columns. <code>pandas</code> encapsulates these concepts succinctly. Its learning curve may be steeper at first, but it can greatly speed up data analysis.</p>
<hr/>
<p><strong>Details</strong></p>
<ol>
<li>Read data: we parse each line of the cleaned file with <a href="https://docs.python.org/3/library/csv.html#csv.DictReader" rel="nofollow noreferrer"><code>csv.DictReader</code></a>, which keeps the header names as the keys of a dictionary. This structure makes it easy to access columns by name.</li>
<li>Remap data: we group the data into a dictionary.
<ul>
<li>The keys are the items in the selected/queried column, e.g. <code>"M"</code>, <code>"F"</code>.</li>
<li>Each value is a list of dictionaries. Each dictionary represents one row with all of its remaining columns (the key column is excluded).</li>
</ul></li>
<li>Reduce data: we aggregate the values of the remapped data by tabulating the matching entries across all of the listed dictionaries. Combining <a href="https://docs.python.org/3/library/collections.html#collections.defaultdict" rel="nofollow noreferrer"><code>defaultdict</code></a> and <a href="https://docs.python.org/3/library/collections.html#collections.Counter" rel="nofollow noreferrer"><code>Counter</code></a> builds an excellent reducing data structure: a new entry in the <code>defaultdict</code> initializes a <code>Counter</code>, and repeated entries simply tally observations.</li>
</ol>
<p><strong>Application</strong></p>
<p>Pipelines are optional. Here we build a function that processes serial queries:</p>
<pre><code>def serial_reduction(iterable, val_queries):
    """Return a `Counter` that is reduced after serial queries."""
    # Map each value to its column (assumes a value appears in only one column)
    val_to_key = {v: k for row in iterable for k, v in row.items()}

    # Narrow the rows one query at a time
    rows = iterable
    for q in val_queries:
        rows = [row for row in rows if row.get(val_to_key.get(q)) == q]
        if not rows:
            raise ValueError("'{}' not found. Try a new query.".format(q))

    # Tally the values of the columns that were not queried
    queried = {val_to_key[q] for q in val_queries}
    counter = ct.Counter()
    for row in rows:
        for k, v in row.items():
            if k not in queried:
                counter[v] += 1
    return counter


c = serial_reduction(iterable, "ma M young".split())
c.most_common()
# [('student', 2), ('PG', 2)]

serial_reduction(iterable, "ma M young teacher".split())
# ValueError: 'teacher' not found. Try a new query.
</code></pre>
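<p>Since the question asks about combinations of groups, the mapping and reduction steps above can also be collapsed into a single pass keyed on a tuple of columns. The following is only a minimal sketch under that assumption: <code>group_counts</code> is a hypothetical helper that is not part of the answer above, and the sample rows are inlined so the block runs on its own.</p>

```python
import collections as ct
import csv
import io

# Same sample rows as test.txt, inlined so the sketch is self-contained
DATA = """\
status,gender,age_range,occ,rating
ma,M,young,student,PG
ma,F,adult,teacher,R
sin,M,young,student,PG
sin,M,adult,teacher,R
ma,M,young,student,PG
sin,F,adult,teacher,R
"""


def group_counts(rows, group_cols, target_col):
    """Count target_col values for each combination of group_cols.

    Hypothetical helper: it maps and reduces in one pass, using a
    tuple of column values as the group key.
    """
    counts = ct.defaultdict(ct.Counter)
    for row in rows:
        key = tuple(row[col] for col in group_cols)
        counts[key][row[target_col]] += 1
    return dict(counts)


rows = list(csv.DictReader(io.StringIO(DATA)))

# Tally ratings for every (gender, age_range) combination
by_group = group_counts(rows, ["gender", "age_range"], "rating")

# Most common rating, with its count, for each combination
top = {key: counter.most_common(1)[0] for key, counter in by_group.items()}
```

<p>Here <code>by_group</code> is keyed by tuples such as <code>("M", "young")</code>, and <code>top</code> keeps only the most frequent rating for each key. Passing a different list of columns changes which combination is grouped, without any change to the helper.</p>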
