Principal component analysis with a pandas dataframe

Question

How can I compute a principal component analysis from the data in a pandas dataframe?

Answer 1

import pandas
from sklearn.decomposition import PCA
import numpy
import matplotlib.pyplot as plot

df = pandas.DataFrame(data=numpy.random.normal(0, 1, (20, 10)))

# Standardize each column (zero mean, unit variance) before calling fit.
# PCA centers the data itself but does not rescale it.
df_normalized = (df - df.mean()) / df.std()
pca = PCA(n_components=df.shape[1])
pca.fit(df_normalized)

# Reformat and view results: one row per original column, one column per component
loadings = pandas.DataFrame(
    pca.components_.T,
    columns=['PC%s' % _ for _ in range(len(df_normalized.columns))],
    index=df.columns,
)
print(loadings)

plot.plot(pca.explained_variance_ratio_)
plot.ylabel('Explained Variance')
plot.xlabel('Components')
plot.show()
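
As a follow-up to the explained-variance plot, here is a minimal sketch of picking how many components to keep from the cumulative explained variance. The 90% `threshold` and the names `cumulative` and `n_keep` are assumptions made for illustration, not part of the answer above, and the data is random, so the chosen number carries no meaning beyond showing the mechanics.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Same shape of toy data as above; the seeded generator is an assumption for repeatability
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(0, 1, (20, 10)))
df_normalized = (df - df.mean()) / df.std()

# Fit with all components so the ratios cover the full variance
pca = PCA().fit(df_normalized)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Hypothetical cutoff: keep the smallest number of components
# whose cumulative explained variance reaches the threshold
threshold = 0.90
n_keep = int(np.searchsorted(cumulative, threshold) + 1)
print(n_keep, cumulative[n_keep - 1])
```

The fitted PCA can then be rebuilt with `PCA(n_components=n_keep)` if only those components are needed.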

Answer 2

Most sklearn objects work fine with pandas dataframes. Would something like this work for you?

import pandas as pd
import numpy as np
from sklearn.decomposition import PCA

df = pd.DataFrame(data=np.random.normal(0, 1, (20, 10)))

pca = PCA(n_components=5)
pca.fit(df)

You can then access the components directly with

pca.components_
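
Since `pca.components_` is a plain NumPy array, it may help to wrap it, along with the projected data, back into labeled dataframes. The sketch below assumes the same kind of random data as above; the names `pc_names`, `components`, and `scores_df` are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Seeded random data, an assumption so the example is repeatable
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(0, 1, (20, 10)))

pca = PCA(n_components=5)
scores = pca.fit_transform(df)  # rows of df projected onto the components

pc_names = [f"PC{i + 1}" for i in range(pca.n_components_)]

# One row per component, one column per original dataframe column
components = pd.DataFrame(pca.components_, index=pc_names, columns=df.columns)

# One row per original observation, one column per component
scores_df = pd.DataFrame(scores, index=df.index, columns=pc_names)

print(components.shape, scores_df.shape)
```

Keeping `df.index` on `scores_df` makes it straightforward to join the scores back onto the original dataframe.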
