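<p>I have a set of sentences that I want to group so that all rows in a group share a particular word. A sentence can belong to many groups, however, because it contains many words.</p>
<p>So in the example below, there should be groups such as:</p>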
<ul>
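<li>a "temperature" group containing every row (0, 1, 2, 3 and 4)</li>
<li>a "freezes" group containing rows 2 and 4</li>
<li>a "the" group containing rows 0, 1, 2 and 3</li>
<li>a "metal" group containing only row 0</li>
<li>a group for every other word in the data set</li>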
</ul>
<pre><code>import pandas as pd
# An example data set
df = pd.DataFrame({"sentences": [
"two long pieces of metal fixed together, each of which bends a different amount when they are both heated to the same temperature",
"the temperature at which a liquid boils",
"a system for measuring temperature that is part of the metric system, in which water freezes at 0 degrees and boils at 100 degrees",
"a unit for measuring temperature. Measurements are often expressed as a number followed by the symbol °",
"a system for measuring temperature in which water freezes at 32º and boils at 212º"
]})
# Create a new column holding the list of words in each sentence
df['words'] = df['sentences'].apply(lambda sentence: sentence.split(" "))
# Try to group by this new column
df.groupby('words').count()
# TypeError: unhashable type: 'list'
</code></pre>
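<p><strike>But my code throws an error, as shown.</strike> (see below)
Since my task is somewhat complex, I realise it may involve more than just calling groupby(). Can anyone help me build these word groups with pandas?</p>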
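<p>For reference, one possible way to get such word groups is sketched below. It is not a confirmed solution, only an illustration under some assumptions: the sentences are shortened stand-ins for the data set above, words are taken to be lowercase <code>\w+</code> tokens (so that "temperature." and "temperature" count as the same word), and the names <code>pairs</code> and <code>groups</code> are made up for this example.</p>

```python
import pandas as pd

# Shortened, hypothetical sentences standing in for the data set above
df = pd.DataFrame({"sentences": [
    "heated to the same temperature",
    "the temperature at which a liquid boils",
    "water freezes at 0 degrees",
]})

# Assumption: a "word" is a lowercase run of word characters,
# so trailing punctuation does not create a separate group
words = df["sentences"].str.lower().str.findall(r"\w+")

# explode() yields one row per (sentence index, word) pair;
# drop_duplicates() removes words repeated within one sentence
pairs = words.explode().rename("word").reset_index().drop_duplicates()

# Map each word to the list of row indices whose sentence contains it
groups = pairs.groupby("word")["index"].apply(list)

print(groups["temperature"])  # [0, 1]
print(groups["the"])          # [0, 1]
print(groups["freezes"])      # [2]
```

<p>Here a sentence appears in as many groups as it has distinct words, which is the overlap that a plain <code>groupby()</code> on the sentence rows cannot express.</p>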
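<p><em>Edit</em>: After fixing the error by returning <code>tuple(sentence.split())</code>, I tried printing the result, but it did not seem to do anything useful. I think it may just be putting each row in its own group:</p>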
^{pr2}$
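<p>The suspicion in the edit can be checked with a small sketch. The two sentences are shortened, hypothetical stand-ins, and <code>n_groups</code> is a name invented for this example. Because the group key is the whole tuple of words rather than any single word, two rows only land in the same group if their sentences are identical.</p>

```python
import pandas as pd

# Two shortened, hypothetical sentences that share the word "temperature"
df = pd.DataFrame({"sentences": [
    "the temperature at which a liquid boils",
    "a unit for measuring temperature",
]})

# Tuples are hashable, so groupby no longer raises TypeError
df["words"] = df["sentences"].apply(lambda s: tuple(s.split()))

# Each group key is the entire tuple, not an individual word,
# so distinct sentences always end up in distinct groups
n_groups = df.groupby("words").ngroups

print(n_groups == len(df))  # True: one group per row
```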