How do I read and flatten/normalize a series of tables from pandas read_html?

Posted 2024-05-21 00:26:32

I have been reading about the pandas read_html function because I am extracting some tables from the web. When I do this:

import pandas as pd
url_mcc = 'link.com.html'
dfs = pd.read_html(url_mcc)
dfs

I get the following list:

[                                        Presentation  \
 0  0.4 mg/mL, 1 mL single-dose vial, package of 2...   
 1  1 mg/mL, 1 mL single-dose vial, package of 25 ...   

   Availability and Estimated Shortage Duration  \
 0             Available for NDC 00517-0401-25.   
 1                                    Available   

                                  Related Information  \
 0  American Regent is currently releasing the 0.4...   
 1  American Regent is currently releasing the 1mg...   

    Shortage Reason (per FDASIA)  
 0  Demand increase for the drug  
 1                         Other  ,
                                         Presentation  \
 0  0.1 mg/mL; 10 mL Luer-Jet Prefilled Syringe (N...   

   Availability and Estimated Shortage Duration  Related Information  \
 0                            Product available                  NaN   

    Shortage Reason (per FDASIA)  
 0  Demand increase for the drug  ,
                                         Presentation  \
 0  0.1 mg/mL; 10 mL Ansyr syringe (NDC 0409-1630-10)   
 1  0.05 mg/mL; 5 mL Ansyr syringe (NDC 0409-9630-05)   
 2  0.1 mg/mL; 5 mL Lifeshield syringe (NDC 0409-4...   
 3  0.1 mg/mL; 10 mL Lifeshield syringe (NDC 0409-...   

         Availability and Estimated Shortage Duration  \
 0  Next delivery: Late October. Estimated recover...   
 1         Next delivery: TBD Estimated recovery: TBD   
 2                                          Available   
 3                                          Available   

                                  Related Information  \
 0  Please check with your wholesaler for availabl...   
 1  Please check with your wholesaler for availabl...   
 2               Shortage per Manufacturer: Available   
 3               Shortage per Manufacturer: Available   

   Shortage Reason (per FDASIA)  
 0                        Other  
 1                        Other  
 2                        Other  
 3                        Other  ,
                                Presentation  \
 0  0.4 mg/mL, 20 mL vial (NDC 0641-6006-10)   

   Availability and Estimated Shortage Duration  \
 0           West-Ward has available inventory.   

                                  Related Information  \
 0  Additional lots are scheduled to be manufactur...   

    Shortage Reason (per FDASIA)  
 0  Demand increase for the drug  ]

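As you can see, the list (of tables?) has repeated columns: Presentation, Availability and Estimated Shortage Duration, Related Information, and Shortage Reason (per FDASIA), because the website has 3 different tables with the same columns. So my question is: how can I flatten or normalize all the different tables into a single table, roughly like this: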

[                                        Presentation  \
 0  0.4 mg/mL, 1 mL single-dose vial, package of 2...   
 1  1 mg/mL, 1 mL single-dose vial, package of 25 ...   
 2  1 mg/mL; 10 mL Luer-Jet Prefilled Syringe (N... 
 3  0.1 mg/mL; 10 mL Ansyr syringe (NDC 0409-1630-10)   
 4  0.05 mg/mL; 5 mL Ansyr syringe (NDC 0409-9630-05)   
 5  0.1 mg/mL; 5 mL Lifeshield syringe (NDC 0409-4...   
 6  0.1 mg/mL; 10 mL Lifeshield syringe (NDC 0409-...   



   Availability and Estimated Shortage Duration  \
 0             Available for NDC 00517-0401-25.   
 1                                    Available  
 2                            Product available                  NaN   
 0  Next delivery: Late October. Estimated recover...   
 1         Next delivery: TBD Estimated recovery: TBD   
 2                                          Available   
 3                                          Available  
 0  0.4 mg/mL, 20 mL vial (NDC 0641-6006-10)   

   Availability and Estimated Shortage Duration  \
 0           West-Ward has available inventory.   


    Shortage Reason (per FDASIA)  
 0  Demand increase for the drug  


                                  Related Information  \
 0  American Regent is currently releasing the 0.4...   
 1  American Regent is currently releasing the 1mg...   
 0  Please check with your wholesaler for availabl...   
 1  Please check with your wholesaler for availabl...   
 2               Shortage per Manufacturer: Available   
 3               Shortage per Manufacturer: Available   
 0  Additional lots are scheduled to be manufactur...   


    Shortage Reason (per FDASIA)  
 0  Demand increase for the drug  
 1                         Other  ,



    Shortage Reason (per FDASIA)  
 0  Demand increase for the drug  ,
 0                        Other  
 1                        Other  
 2                        Other  
 3                        Other  ,

2 Answers

If dfs is a list of DataFrames, I think you need concat:

df = pd.concat(dfs)

You can also pass the parameter ignore_index=True to avoid duplicate values in the index:

df = pd.concat(dfs, ignore_index=True) 

Sample:

import pandas as pd

df1 = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9]})

#print (df1)

df2 = pd.DataFrame({'A':[3,4,6],
                   'B':[2,3,4],
                   'C':[3,6,0]})

#print (df2)

df3 = pd.DataFrame({'A':[4,7,9],
                   'B':[3,4,5],
                   'C':[5,1,9]})

#print (df3)

dfs = [df1,df2,df3]
print (dfs)
[   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9,    A  B  C
0  3  2  3
1  4  3  6
2  6  4  0,    A  B  C
0  4  3  5
1  7  4  1
2  9  5  9]
df = pd.concat(dfs)
print (df)
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
0  3  2  3
1  4  3  6
2  6  4  0
0  4  3  5
1  7  4  1
2  9  5  9

df1 = pd.concat(dfs, ignore_index=True) 
print (df1)
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
3  3  2  3
4  4  3  6
5  6  4  0
6  4  3  5
7  7  4  1
8  9  5  9
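
Applied to the tables in the question, the same idea might look like the sketch below. It is only an illustration under stated assumptions: the column names are copied from the question, but the two small frames (`table_a`, `table_b`) are hypothetical stand-ins for what `pd.read_html` returned, with shortened rows. The second variant uses the `keys` argument of `pd.concat` to record which source table each row came from, which the question did not ask for but which can help when the tables belong to different manufacturers.

```python
import pandas as pd

# Column names copied from the question.
cols = [
    "Presentation",
    "Availability and Estimated Shortage Duration",
    "Related Information",
    "Shortage Reason (per FDASIA)",
]

# Hypothetical stand-ins for the frames returned by pd.read_html (rows shortened).
table_a = pd.DataFrame(
    [
        ["0.4 mg/mL vial", "Available", "Releasing", "Demand increase for the drug"],
        ["1 mg/mL vial", "Available", "Releasing", "Other"],
    ],
    columns=cols,
)
table_b = pd.DataFrame(
    [["0.1 mg/mL syringe", "Product available", None, "Demand increase for the drug"]],
    columns=cols,
)

dfs = [table_a, table_b]

# One flat table with a fresh 0..n-1 index.
flat = pd.concat(dfs, ignore_index=True)

# Variant: label each row with the position of its source table in dfs,
# then move that label into an ordinary column named "table".
labelled = pd.concat(dfs, keys=list(range(len(dfs))), names=["table", "row"])
labelled = labelled.reset_index(level="table").reset_index(drop=True)

print(flat)
print(labelled)
```

Because every table shares the same four column names, `concat` stacks them row-wise without creating extra columns; if a table had a differently spelled header, its values would land in a separate column filled with NaN for the other rows.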
