Pandas: accessing a cell nested inside another cell

Posted 2024-05-19 02:53:51

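I am working with the following link: https://www.bu.edu/phpbin/course-search/section/?t=casma124

To focus on Fall 2020, I indexed the DataFrame. On the page you can see numbers showing how many "open seats" there are. If you inspect those numbers, you will find that each one sits in a smaller cell nested under the main cell. My Python code outputs the following: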

Section  Open Seats        Instructor Type Location              Schedule  \
0       A1         NaN  Enrique Jariwala  LEC  SCI B23  TR 11:00 am-12:15 pm   
1       A1         NaN  Enrique Jariwala  NaN     ROOM     M 8:00 pm-9:45 pm   
2       B1         NaN  Enrique Jariwala  LEC  SCI B23    TR 5:00 pm-6:15 pm   
3       B1         NaN  Enrique Jariwala  NaN     ROOM     M 8:00 pm-9:45 pm   
4       D1         NaN  Enrique Jariwala  DIS  PSY B39   W 11:15 am-12:05 pm   
5       D2         NaN  Enrique Jariwala  DIS  PSY B39    W 12:20 pm-1:10 pm   
6       D3         NaN  Enrique Jariwala  DIS  PSY B39     W 1:25 pm-2:15 pm   
7       D4         NaN  Enrique Jariwala  DIS  PSY B39     W 2:30 pm-3:20 pm   
8       D5         NaN  Enrique Jariwala  DIS  CAS 218    R 12:30 pm-1:20 pm   
9       D6         NaN  Enrique Jariwala  DIS  CGS 421     R 2:00 pm-2:50 pm   
10      D7         NaN  Enrique Jariwala  DIS  PRB 146     R 3:35 pm-4:25 pm   
11      D8         NaN  Enrique Jariwala  DIS  PRB 150     R 6:30 pm-7:20 pm   
12      DX         NaN  Enrique Jariwala  DIS      NaN             ARR 0: am   
13      L1         NaN  Enrique Jariwala  LAB  SCI 134    M 11:15 am-2:00 pm   
14      L2         NaN  Enrique Jariwala  LAB  SCI 134     T 6:30 pm-9:15 pm   
15      L3         NaN  Enrique Jariwala  LAB  SCI 134    W 8:00 am-10:45 am   
16      L4         NaN  Enrique Jariwala  LAB  SCI 134    W 11:15 am-2:00 pm   
17      L5         NaN  Enrique Jariwala  LAB  SCI 134     W 2:30 pm-5:15 pm   
18      L6         NaN  Enrique Jariwala  LAB  SCI 134     W 6:30 pm-9:15 pm   
19      L7         NaN  Enrique Jariwala  LAB  SCI 134    R 12:30 pm-3:15 pm   
20      L8         NaN  Enrique Jariwala  LAB  SCI 134     R 6:30 pm-9:15 pm   
21      LX         NaN  Enrique Jariwala  LAB      NaN             ARR 0: am   

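As you can see, every "Open Seats" value shows up as NaN. Is there a function I can use to access the numbers? I want the actual number instead of NaN. Here is my code for context: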

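import pandas as pd

def init_dataframe():
    # wanted_class_url and course_input are defined elsewhere in my script.
    # pd.read_html returns a list with one DataFrame per <table> on the page.
    html_dataframe = pd.read_html(wanted_class_url(course_input))
    # All tables stacked together (not used below).
    dataframe_concatenate = pd.concat(html_dataframe)
    # The last table on the page is the semester I want.
    dataframe_semester = html_dataframe[-1]
    # Select every row and every column.
    dataframe_locate_class = dataframe_semester.loc[:, :]

    return dataframe_locate_class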

Thanks for your help.


1 answer
User
Answer #1 · Posted 2024-05-19 02:53:51

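This is an interesting problem: the DataFrame shows NaN instead of numbers because, right after the page loads, those cells really are empty in the raw HTML. The values are filled in only after the script view-section.js runs (in the local browser). So to get the same data from a script, you have to retrieve the same data the website does. A sketch:

Retrieve the open seats for each "section". Luckily, the endpoint openseats.php accepts an array of course codes, like this: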

https://www.bu.edu/phpbin/summer/rpc/openseats.php?sections[]=2020SPRGCASMA124%20B7

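(Apparently it returns the open seats for all courses no matter which code you ask for, so for now a single query is enough.)

The result is the following JSON object: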

{"time_secs":0.20295810699463,"results":{"2020SPRGCASMA124 A1":"133","2020SPRGCASMA124 A2":"133","2020SPRGCASMA124 A3":"134","2020SPRGCASMA124 B1":"60","2020SPRGCASMA124 B2":"60","2020SPRGCASMA124 B3":"60","2020SPRGCASMA124 B4":"40","2020SPRGCASMA124 B5":"60","2020SPRGCASMA124 B6":"60","2020SPRGCASMA124 B7":"60","2020SPRGCASMA193 A1":"100","2020SPRGCASMA213 A1":"112","2020SPRGCASMA213 B1":"23","2020SPRGCASMA213 B2":"23","2020SPRGCASMA213 B3":"22","2020SPRGCASMA213 B4":"22","2020SPRGCASMA213 B5":"22","2020SPRGCASMA213 C1":"37","2020SPRGCASMA213 C2":"37","2020SPRGCASMA213 C3":"38"}}
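A response like the one above could be turned into a DataFrame roughly as follows. This is only a sketch: the function name `open_seats_frame`, the index name `code`, and the abbreviated sample string are illustrative assumptions, and it assumes the payload keeps the `"results"` layout shown above.

```python
import json

import pandas as pd


def open_seats_frame(payload: str) -> pd.DataFrame:
    """Turn the openseats.php JSON text into a DataFrame indexed by section code."""
    results = json.loads(payload)["results"]
    # The seat counts arrive as strings, so convert them to integers.
    seats = pd.Series(results, name="Open Seats").astype(int)
    seats.index.name = "code"  # hypothetical name for the section-code index
    return seats.to_frame()


# Abbreviated sample in the same shape as the response (two entries only).
sample = (
    '{"time_secs":0.2,"results":'
    '{"2020SPRGCASMA124 A1":"133","2020SPRGCASMA124 B4":"40"}}'
)
seats = open_seats_frame(sample)
print(seats)
```

In a real script the payload would come from an HTTP request to the URL above; that part is left out here so the example runs offline.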

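Convert that into a DataFrame, and then just .join(..) the two DataFrames. But wait: your original table is missing the cryptic course codes. Unfortunately, these appear only in the data-section="..." attribute of certain table cells.

Even more unfortunately, the best way to get at them right now is to do the HTML parsing yourself. Entry point: from bs4 import BeautifulSoup (plus there are many existing questions about it here).

I hope this gets you started.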
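Once each row of the scraped table carries its section code, the join itself might look like the sketch below. The `code` column, the tiny hand-built frames, and the variable names are assumptions made for illustration; the all-NaN "Open Seats" column is dropped first because `DataFrame.join` raises an error when both frames share a column name and no suffix is given.

```python
import pandas as pd

# Hypothetical slice of the scraped table. The "code" column is assumed to
# have been recovered from the data-section attributes already.
sections = pd.DataFrame(
    {
        "Section": ["A1", "B4"],
        "Open Seats": [float("nan"), float("nan")],
        "code": ["2020SPRGCASMA124 A1", "2020SPRGCASMA124 B4"],
    }
)

# Seat counts indexed by section code, as built from the JSON response.
seats = pd.DataFrame(
    {"Open Seats": [133, 40]},
    index=pd.Index(["2020SPRGCASMA124 A1", "2020SPRGCASMA124 B4"], name="code"),
)

# Drop the empty column, then match the "code" column against the index of seats.
merged = sections.drop(columns="Open Seats").join(seats, on="code")
print(merged)
```

`join` keeps every row of the left frame by default, so a section without a matching code would simply keep NaN in "Open Seats".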
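To give an idea of what that parsing involves, here is a minimal sketch that collects every data-section value from a piece of HTML. It swaps in the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages; the class name `SectionCodeParser` and the sample markup are hypothetical, and the real page's tags and nesting may well differ.

```python
from html.parser import HTMLParser


class SectionCodeParser(HTMLParser):
    """Collect the value of every data-section attribute, in document order."""

    def __init__(self):
        super().__init__()
        self.codes = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current start tag.
        for name, value in attrs:
            if name == "data-section" and value:
                self.codes.append(value)


# Hypothetical markup for illustration only.
sample_html = """
<table>
  <tr><td>A1</td><td data-section="2020SPRGCASMA124 A1"></td></tr>
  <tr><td>B1</td><td data-section="2020SPRGCASMA124 B1"></td></tr>
</table>
"""

parser = SectionCodeParser()
parser.feed(sample_html)
codes = parser.codes
print(codes)
```

With BeautifulSoup, the equivalent lookup would be something along the lines of `soup.find_all(attrs={"data-section": True})`. If the codes come out in the same order as the table rows, they can be assigned as a new column before joining, but that ordering is an assumption worth checking against the actual page.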
