擅长:python、mysql、java
<blockquote>
<p>1.Should I perform 'one-hot encoding' on these variables before building a regression model?</p>
</blockquote>
<p>是的,你应该对分类变量进行热编码。您可以像下面这样使用:</p>
<pre><code>columns_to_category = ['sex', 'smoking','DEATH_EVENT']
df[columns_to_category] = df[columns_to_category].astype('category') # change datetypes to category
df = pd.get_dummies(df, columns=columns_to_category) # One hot encoding the categories
</code></pre>
<blockquote>
<p>2.If so, only one hot encoding is sufficient or should I perform even
label encoding?</p>
</blockquote>
<p>我想一个热编码就足够了</p>
<blockquote>
<p>3.Also, I observe the values are in various ranges, so should I even scale the data set before applying the regression model?</p>
</blockquote>
<p>是的,您可以使用<code>StandardScaler()</code>或<code>MinMaxScaler()</code>获得更好的结果,然后反向缩放预测。此外,请确保您单独缩放测试和训练,而不是合并,因为在现实生活中,您的测试将无法实现,因此您需要相应地缩放以避免此类错误</p>