Python: How to read a URL that ends in ".data"

Posted 2024-05-23 19:34:24


I am trying to read the data at this URL, "https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data", into a DataFrame.

This is what I tried:

 park_df = pd.read_html('https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data', header=0, flavor='bs4')

But I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-804373f977ab> in <module>()
----> 1 park_df = pd.read_html('https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data', header=0, flavor='bs4')

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding, decimal, converters, na_values, keep_default_na, displayed_only)
    985                   decimal=decimal, converters=converters, na_values=na_values,
    986                   keep_default_na=keep_default_na,
--> 987                   displayed_only=displayed_only)

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\html.py in _parse(flavor, io, match, attrs, encoding, displayed_only, **kwargs)
    813             break
    814     else:
--> 815         raise_with_traceback(retained)
    816 
    817     ret = []

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\compat\__init__.py in raise_with_traceback(exc, traceback)
    402         if traceback == Ellipsis:
    403             _, _, traceback = sys.exc_info()
--> 404         raise exc.with_traceback(traceback)
    405 else:
    406     # this version of raise is a syntax error in Python 3

ValueError: No tables found

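Can you tell me what I am doing wrong, and whether there is a better alternative? Please open the URL to see what the data looks like: the header (containing the column names) is on the first line, and the data follows.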


1 answer
User
#1 · Posted 2024-05-23 19:34:24

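The function read_html converts HTML tables into DataFrames. This URL serves plain CSV-formatted text rather than an HTML page, so read_html finds no table to parse. To read CSV-formatted data, use read_csv: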

import pandas as pd

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/parkinsons/parkinsons.data'
df = pd.read_csv(url)

print(df.head())

             name  MDVP:Fo(Hz)  MDVP:Fhi(Hz)  MDVP:Flo(Hz)  MDVP:Jitter(%)  \
0  phon_R01_S01_1      119.992       157.302        74.997         0.00784   
1  phon_R01_S01_2      122.400       148.650       113.819         0.00968   
2  phon_R01_S01_3      116.682       131.111       111.555         0.01050   
3  phon_R01_S01_4      116.676       137.871       111.366         0.00997   
4  phon_R01_S01_5      116.014       141.781       110.655         0.01284   

   MDVP:Jitter(Abs)  MDVP:RAP  MDVP:PPQ  Jitter:DDP  MDVP:Shimmer  ...  \
0           0.00007   0.00370   0.00554     0.01109       0.04374  ...   
1           0.00008   0.00465   0.00696     0.01394       0.06134  ...   
2           0.00009   0.00544   0.00781     0.01633       0.05233  ...   
3           0.00009   0.00502   0.00698     0.01505       0.05492  ...   
4           0.00011   0.00655   0.00908     0.01966       0.06425  ...   

   Shimmer:DDA      NHR     HNR  status      RPDE       DFA   spread1  \
0      0.06545  0.02211  21.033       1  0.414783  0.815285 -4.813031   
1      0.09403  0.01929  19.085       1  0.458359  0.819521 -4.075192   
2      0.08270  0.01309  20.651       1  0.429895  0.825288 -4.443179   
3      0.08771  0.01353  20.644       1  0.434969  0.819235 -4.117501   
4      0.10470  0.01767  19.649       1  0.417356  0.823484 -3.747787   

    spread2        D2       PPE  
0  0.266482  2.301442  0.284654  
1  0.335590  2.486855  0.368674  
2  0.311173  2.342259  0.332634  
3  0.334147  2.405554  0.368975  
4  0.234513  2.332180  0.410335  

[5 rows x 24 columns]
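
The ".data" suffix itself does not matter to read_csv; what matters is that the content is delimited text with the column names on the first line. A minimal sketch of that point, runnable without network access: SAMPLE is a hypothetical three-column excerpt built from the first two rows of the output above (not the full file), and load_delimited is an illustrative helper name, not part of pandas.

```python
import io

import pandas as pd

# Hypothetical excerpt: three columns and two rows copied from the
# output shown above, written as comma-separated text. The real file
# has 24 columns.
SAMPLE = """name,MDVP:Fo(Hz),MDVP:Fhi(Hz)
phon_R01_S01_1,119.992,157.302
phon_R01_S01_2,122.400,148.650
"""


def load_delimited(source, sep=","):
    """Read delimited text from a URL, a local path, or a file-like object.

    header=0 tells pandas that the first line holds the column names,
    which is also the default behaviour of read_csv.
    """
    return pd.read_csv(source, sep=sep, header=0)


# An in-memory buffer stands in for the remote file here.
sample_df = load_delimited(io.StringIO(SAMPLE))
print(sample_df.dtypes)
```

The same helper should accept the URL from the question in place of the StringIO buffer, since read_csv takes URLs directly. If a file turned out to use a different separator, passing sep (for example sep="\t") would be the adjustment to try.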
