我不太会写代码。使用slenium和beautifulsoup,我在网页上的几十个脚本中找到了我想要的脚本。我在找剧本[17]。当执行这些代码时,脚本[17]给出如下结果
我的代码的最后一部分
html=driver.page_source
soup=BeautifulSoup(html, "html.parser")
scripts=soup.find_all("script")
x=scripts[17]
print(x)
结果,输出 注意:日期列表在脚本前面[17]。滑动杆。虚拟数据
虚拟数据
<script language="JavaScript"> var theHlp='/yardim/matris.asp';var theTitle = 'Piyasa Değeri';var theCaption='Cam (TL)';var lastmod = '';var h='<a class=hisselink href=../Hisse/HisseAnaliz.aspx?HNO=';var e='<a class=hisselink href=../endeks/endeksAnaliz.aspx?HNO=';var d='<center><font face=symbol size=1 color=#FF0000><b>ß</b></font></center>';var u='<center><font face=symbol size=1 color=#008000><b>İ</b></font></center>';var n='<center><font face=symbol size=1 color=#00A000><b>=</b></font></center>';var fr='<font color=#FF0000>';var fg='<font color=#008000>';var theFooter=new Array();var theCols = new Array();theCols[0] = new Array('Hisse',4,50);theCols[1] = new Array('2012.12',1,60);theCols[2] = new Array('2013.03',1,60);theCols[3] = new Array('2013.06',1,60);theCols[4] = new Array('2013.09',1,60);theCols[5] = new Array('2013.12',1,60);theCols[6] = new Array('2014.03',1,60);theCols[7] = new Array('2014.06',1,60);theCols[8] = new Array('2014.09',1,60);theCols[9] = new Array('2014.12',1,60);theCols[10] = new Array('2015.03',1,60);theCols[11] = new Array('2015.06',1,60);theCols[12] = new Array('2015.09',1,60);theCols[13] = new Array('2015.12',1,60);theCols[14] = new Array('2016.03',1,60);theCols[15] = new Array('2016.06',1,60);theCols[16] = new Array('2016.09',1,60);theCols[17] = new Array('2016.12',1,60);theCols[18] = new Array('2017.03',1,60);theCols[19] = new Array('2017.06',1,60);theCols[20] = new Array('2017.09',1,60);theCols[21] = new Array('2017.12',1,60);theCols[22] = new Array('2018.03',1,60);theCols[23] = new Array('2018.06',1,60);theCols[24] = new Array('2018.09',1,60);theCols[25] = new Array('2018.12',1,60);theCols[26] = new Array('2019.03',1,60);theCols[27] = new Array('2019.06',1,60);theCols[28] = new Array('2019.09',1,60);theCols[29] = new Array('2019.12',1,60);theCols[30] = new Array('2020.03',1,60);var theRows = new Array();
theRows[0] = new Array ('<b>'+h+'30>ANA</B></a>','1,114,919,783.60','1,142,792,778.19','1,091,028,645.38','991,850,000.48','796,800,000.38','697,200,000.34','751,150,000.36','723,720,000.33','888,000,000.40','790,320,000.36','883,560,000.40','927,960,000.42','737,040,000.33','879,120,000.40','914,640,000.41','927,960,000.42','1,172,160,000.53','1,416,360,000.64','1,589,520,000.72','1,552,500,000.41','1,972,500,000.53','2,520,000,000.67','2,160,000,000.58','2,475,000,000.66','2,010,000,000.54','2,250,000,000.60','2,077,500,000.55','2,332,500,000.62','3,270,000,000.87','2,347,500,000.63');
theRows[1] = new Array ('<b>'+h+'89>DEN</B></a>','55,200,000.00','55,920,000.00','45,960,000.00','42,600,000.00','35,760,000.00','39,600,000.00','40,200,000.00','47,700,000.00','50,460,000.00','45,300,000.00','41,760,000.00','59,340,000.00','66,600,000.00','97,020,000.00','81,060,000.00','69,300,000.00','79,800,000.00','68,400,000.00','66,900,000.00','66,960,000.00','71,220,000.00','71,520,000.00','71,880,000.00','60,600,000.00','69,120,000.00','62,640,000.00','57,180,000.00','89,850,000.00','125,100,000.00','85,350,000.00');
theRows[2] = new Array ('<b>'+h+'269>SIS</B></a>','4,425,000,000.00','4,695,000,000.00','4,050,000,000.00','4,367,380,000.00','4,273,120,000.00','3,644,720,000.00','4,681,580,000.00','4,913,000,000.00','6,188,000,000.00','5,457,000,000.00','6,137,000,000.00','5,453,000,000.00','6,061,000,000.00','6,954,000,000.00','6,745,000,000.00','6,519,000,000.00','7,851,500,000.00','8,548,500,000.00','9,430,000,000.00','9,225,000,000.00','10,575,000,000.00','11,610,000,000.00','9,517,500,000.00','13,140,000,000.00','12,757,500,000.00','13,117,500,000.00','11,677,500,000.00','10,507,500,000.00','11,857,500,000.00','9,315,000,000.00');
theRows[3] = new Array ('<b>'+h+'297>TRK</B></a>','1,692,579,200.00','1,983,924,800.00','1,831,315,200.00','1,704,000,000.00','1,803,400,000.00','1,498,100,000.00','1,803,400,000.00','1,884,450,000.00','2,542,160,000.00','2,180,050,000.00','2,069,200,000.00','1,682,600,000.00','1,619,950,000.00','1,852,650,000.00','2,040,600,000.00','2,315,700,000.00','2,641,200,000.00','2,938,800,000.00','3,599,100,000.00','4,101,900,000.00','5,220,600,000.00','5,808,200,000.00','4,689,500,000.00','5,375,000,000.00','3,787,500,000.00','4,150,000,000.00','3,662,500,000.00','3,712,500,000.00','4,375,000,000.00','3,587,500,000.00');
var thetable=new mytable();thetable.tableWidth=650;thetable.shownum=false;thetable.controlaccess=true;thetable.visCols=new Array(true,true,true,true,true);thetable.initsort=new Array(0,-1);thetable.inittable();thetable.refreshTable();</script>
我的目的是将此输出提取到表中,并将其保存为csv文件。如何提取我想要的脚本
所有日期都应位于顶部,所有名称都应位于最右侧,所有值都应介于两者之间
Hisse 2012.12 2013.3 2013.4 ...
ANA 1,114,919,783.60 1,142,792,778.19 1,091,028,645.38 ...
DEN 55,200,000.00 55,920,000.00 45,960,000.00 ....
.
.
.
解决方案
自定义函数
process_scripts()
将生成您要查找的内容。我使用下面给出的虚拟数据(最后)。首先,我们检查代码是否符合预期,因此我们创建一个数据帧来查看输出一,。处理单个脚本
输出:
二,。正在处理所有脚本
您可以使用自定义定义函数
process_scripts()
处理所有脚本。代码如下所示我在GoogleColab上做了这件事,它按预期工作
三,。以操作系统不可知的方式创建路径
为基于windows或unix的系统创建路径可能非常不同。下面向您展示了一种实现这一点的方法,而不必担心将运行代码的操作系统。我在这里用了^{} library 。不过,我建议你也看看^{} library
四,。代码:自定义函数
process_scripts()
在这里,我们使用
regex
(正则表达式)库和pandas
以表格格式组织数据,然后写入csv文件。tqdm
库用于在处理多个脚本时为您提供一个漂亮的progressbar。请查看代码中的注释,了解如果不是从jupyter notebook
运行它,该怎么办。os
库用于路径操作和创建输出目录五,。虚拟数据
根据您问题中提供的
<script>
,您可以执行类似于下面代码的操作,以获得每个名称ANA, DEN ..
的日期列表:这只是一个示例代码,所以没有那么漂亮:)
希望这对你有帮助
相关问题 更多 >
编程相关推荐