Loading the sklearn 20 Newsgroups dataset into a pandas DataFrame

Posted 2024-04-25 16:49:56


I am trying to load the sklearn 20 Newsgroups dataset, which comes back as a sklearn.utils.Bunch, into a pandas DataFrame.

I downloaded the dataset and loaded it as follows:

import pandas as pd
import sklearn.datasets

# Newsgroup folders to load
categories = [
    "alt.atheism", "comp.os.ms-windows.misc", "comp.sys.ibm.pc.hardware",
    "comp.sys.mac.hardware", "comp.windows.x", "misc.forsale", "rec.autos",
    "rec.motorcycles", "rec.sport.baseball", "rec.sport.hockey", "sci.crypt",
    "sci.electronics", "sci.med", "sci.space", "soc.religion.christian",
    "talk.politics.guns", "talk.politics.mideast", "talk.politics.misc",
    "talk.religion.misc",
]

# One document per file; the sub-folder names become the labels
docs_to_train = sklearn.datasets.load_files(
    "/home/Documents03-04-2019/dataset/20_newsgroups",
    description=None,
    categories=categories,
    load_content=True,
    encoding="ISO-8859-1",
    shuffle=True,
    random_state=42,
)
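
For orientation, load_files returns a Bunch whose fields are parallel lists and arrays rather than a table. The sketch below builds a tiny made-up corpus in a temporary directory (the folder names and texts are illustrative assumptions, not the real dataset) and prints what comes back.

```python
import tempfile
from pathlib import Path

from sklearn.datasets import load_files

# Build a tiny stand-in corpus: one sub-folder per category, one file per document.
# The category names and texts are made up for illustration only.
with tempfile.TemporaryDirectory() as root:
    samples = {
        "rec.autos": ["My car will not start.", "Which tyres should I buy?"],
        "sci.space": ["The launch was delayed."],
    }
    for category, texts in samples.items():
        folder = Path(root) / category
        folder.mkdir()
        for i, text in enumerate(texts):
            (folder / f"{i}.txt").write_text(text, encoding="ISO-8859-1")

    # Same style of call as in the question, pointed at the temporary folder.
    bunch = load_files(root, encoding="ISO-8859-1", shuffle=True, random_state=42)

print(sorted(bunch.keys()))                 # field names of the Bunch
print(bunch.target_names)                   # category folder names
print(len(bunch.data), len(bunch.target))   # one text and one label per document
```

Here data holds one string per document and target holds one integer per document, while target_names holds one entry per category, so the three do not all have the same length.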

Here is the code I tried:

docs_to_train.keys()
data1           = pd.DataFrame(docs_to_train.data, columns=docs_to_train.target_names])
data1['Target'] = pd.Series(data1=docs_to_train.target, index=data1.index)

Expected output: I ran similar code elsewhere and it gave me the DataFrame format I need; I want the newsgroups data in the same format.

1 answer
User
#1 · Posted 2024-04-25 16:49:56

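Some names are carried over from unrelated code (such as cancer), pd.Series expects the keyword data rather than data1, and there is an unmatched ] in the first line.

Try this: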

data1 = pd.DataFrame(docs_to_train.data, columns=[docs_to_train.target_names])
data1['Target'] = pd.Series(data=docs_to_train.target, index=data1.index)

If that does not work, try a different form of the second line (the replacement snippet is not preserved here).
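
One thing worth checking if the DataFrame constructor complains about column counts: data is a flat list with one string per document, so it fills a single column, whereas target_names has one entry per category. A minimal sketch of one way to build the frame, assuming a hand-built Bunch as a stand-in for docs_to_train and the hypothetical column names text and TargetName:

```python
import numpy as np
import pandas as pd
from sklearn.utils import Bunch

# Hypothetical stand-in for docs_to_train, with the fields used below.
docs = Bunch(
    data=[
        "My car will not start.",
        "The launch was delayed.",
        "Which tyres should I buy?",
    ],
    target=np.array([0, 1, 0]),
    target_names=["rec.autos", "sci.space"],
)

# One row per document: raw text in one column, integer label in another.
df = pd.DataFrame({"text": docs.data, "Target": docs.target})

# Look up the category name for each integer label.
df["TargetName"] = np.array(docs.target_names)[docs.target]

print(df)
```

Assigning the target array directly as a column avoids the pd.Series keyword altogether, and the name lookup keeps the readable newsgroup label next to its integer code.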
