<h3><code>np.array_split</code></h3>
<p>If you want to generalise to <code>n</code> splits, <code>np.array_split</code> is your friend (it works well with DataFrames).</p>
<pre><code>import numpy as np

fractions = np.array([0.6, 0.2, 0.2])
# shuffle the rows of the input DataFrame
df = df.sample(frac=1)
# cut at the cumulative fractions (60% and 80% of the rows) to get 3 parts
train, val, test = np.array_split(
    df, (fractions[:-1].cumsum() * len(df)).astype(int))
</code></pre>
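<p>To see where the cut points come from, here is a minimal sketch on a made-up 10-row frame. The column name <code>x</code>, the fixed <code>random_state</code>, and the <code>iloc</code> slicing are assumptions for illustration only; the slicing is a stand-in that produces the same index boundaries you would pass to <code>np.array_split</code>.</p>
<pre><code>import numpy as np
import pandas as pd

# Hypothetical 10-row frame, used only for illustration
df = pd.DataFrame({"x": range(10)})
fractions = np.array([0.6, 0.2, 0.2])

# Shuffle; random_state is fixed here only to make the run repeatable
shuffled = df.sample(frac=1, random_state=0)

# Cumulative fractions (without the last one) times the row count
# give the positions at which to cut
cuts = (fractions[:-1].cumsum() * len(shuffled)).astype(int).tolist()

# Slice between consecutive cut points
bounds = [0, *cuts, len(shuffled)]
parts = [shuffled.iloc[a:b] for a, b in zip(bounds[:-1], bounds[1:])]
train, val, test = parts

print(cuts)                     # [6, 8]
print([len(p) for p in parts])  # [6, 2, 2]
</code></pre>
<p>Because <code>astype(int)</code> truncates, part sizes can be off by a row when the fractions do not divide the row count evenly.</p>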
<hr/>
<h3><code>train_test_split</code></h3>
<p>A more long-winded solution that uses <a href="http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn.model_selection.train_test_split" rel="noreferrer"><code>train_test_split</code></a> for stratified splitting.</p>
<pre><code>y = df.pop('diagnosis').to_frame()
X = df
</code></pre>
<p/>
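<pre><code>from sklearn.model_selection import train_test_split

# 60% train, 40% held out, preserving the label proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.4)
# split the held-out 40% evenly into test and validation sets
X_test, X_val, y_test, y_val = train_test_split(
    X_test, y_test, stratify=y_test, test_size=0.5)
</code></pre>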
<p>Here <code>X</code> is a DataFrame of features and <code>y</code> is a single-column DataFrame of labels.</p>
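<p>As a rough end-to-end check, the sketch below runs the same two-step stratified split on made-up data and prints the size and class counts of each part. The <code>feature</code> column, the 70/30 class balance, the <code>random_state</code> values, and the <code>X_rest</code>/<code>y_rest</code> names are assumptions for illustration, not part of the original data.</p>
<pre><code>import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows, 'diagnosis' label with a 70/30 class balance
df = pd.DataFrame({
    "feature": range(100),
    "diagnosis": ["B"] * 70 + ["M"] * 30,
})
y = df.pop("diagnosis").to_frame()
X = df

# First split: 60% train, 40% held out
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, stratify=y, test_size=0.4, random_state=0)
# Second split: halve the held-out 40% into validation and test
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, stratify=y_rest, test_size=0.5, random_state=0)

# Report the size and class counts of each part
for name, part in [("train", y_train), ("val", y_val), ("test", y_test)]:
    print(name, len(part), part["diagnosis"].value_counts().to_dict())
</code></pre>
<p>Each part should keep roughly the 70/30 label ratio of the full data set, which is what <code>stratify</code> is for.</p>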