requests.get returns HTML different from what Chrome Developer Tools shows

Posted 2024-04-25 05:29:25


I'm using Python (specifically a Jupyter notebook) to build a web scraper that goes through some real estate pages and saves data such as price and address.

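It works fine for one page I picked, but when I try to scrape this one: sreality.cz (sorry, the page is in Czech, but the actual content isn't that important right now) using requests.get(), I get this result: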

<!doctype html> <html lang="{{ html.lang }}" ng-app="sreality" ng-controller="MainCtrl"> <head> <meta charset="utf-8"> <meta http-equiv="X-UA-Compatible" content="IE=edge"> <meta name="viewport" content="width=device-width,initial-scale=1,minimal-ui"> <!--- Nastaveni meta pres JS a ne pres Angular, aby byla nastavena default hodnota pro agenty co nezvladaji PhantomJS ---> <title ng:bind-template="{{metaSeo.title}}">Sreality.cz • reality a nemovitosti z celé ČR</title> <meta name="description" content="Největší nabídka nemovitostí v ČR. Nabízíme byty, domy, novostavby, nebytové prostory, pozemky a další reality k prodeji i pronájmu. Sreality.cz"> <meta property="og:title" content="Sreality.cz • reality a nemovitosti z celé ČR"> <meta property="og:type" content="website"> <meta property="og:image" content="https://www.sreality.cz/img/sreality-logo-og.png"> <meta property="og:description" content="Největší nabídka nemovitostí v ČR. Nabízíme byty, domy, novostavby, nebytové prostory, pozemky a další reality k prodeji i pronájmu. 
Sreality.cz"> <meta property="og:url" content="https://www.sreality.cz/"> <meta ng-if="metaStatus.value" name="szn:status" content="{{metaStatus.value}}"> <meta http-equiv="imagetoolbar" content="no"> <link rel="icon" sizes="16x16 32x32 48x48 64x64" href="/img/icons/favicon.ico"> <link rel="apple-touch-icon" sizes="57x57" href="/img/icons/apple-touch-icon-57x57.png?3"> <link rel="apple-touch-icon" sizes="60x60" href="/img/icons/apple-touch-icon-60x60.png?3"> <link rel="apple-touch-icon" sizes="72x72" href="/img/icons/apple-touch-icon-72x72.png?3"> <link rel="apple-touch-icon" sizes="76x76" href="/img/icons/apple-touch-icon-76x76.png?3"> <link rel="apple-touch-icon" sizes="114x114" href="/img/icons/apple-touch-icon-114x114.png?3"> <link rel="apple-touch-icon" sizes="120x120" href="/img/icons/apple-touch-icon-120x120.png?3"> <link rel="apple-touch-icon" sizes="144x144" href="/img/icons/apple-touch-icon-144x144.png?3"> <link rel="apple-touch-icon" sizes="152x152" href="/img/icons/apple-touch-icon-152x152.png?3"> <link rel="apple-touch-icon" sizes="180x180" href="/img/icons/apple-touch-icon-180x180.png?3"> <link rel="icon" type="image/png" sizes="192x192" href="/img/icons/android-chrome-192x192.png"> <link rel="icon" type="image/png" sizes="32x32" href="/img/icons/favicon-32x32.png"> <link rel="icon" type="image/png" sizes="96x96" href="/img/icons/favicon-96x96.png"> <link rel="icon" type="image/png" sizes="16x16" href="/img/icons/favicon-16x16.png"> <link rel="manifest" href="/img/icons/android-chrome-manifest.json"> <meta name="msapplication-TileColor" content="#2b5797"> <meta name="msapplication-TileImage" content="/img/icons/ms-icon-144x144.png"> <meta name="msapplication-config" content="/img/icons/browserconfig.xml" /> <link rel="alternate" type="application/rss+xml" ng-href="{{ rss.url }}" ng-if="rss.url"> <link ng-repeat="lang in metaSeo.languages" rel="alternate" hreflang="{{lang.code}}" ng-href="{{lang.url}}"> <link rel="stylesheet" 
href="/css/all.css?2e96626"> <!-- Begin Inspectlet Embed Code --> <script type="text/javascript" id="inspectletjs"> window.__insp = window.__insp || []; __insp.push(['wid', 821249485]); __insp.push(["virtualPage"]); (function() { function ldinsp(){if(typeof window.__inspld != "undefined") return; window.__inspld = 1; var insp = document.createElement('script'); insp.type = 'text/javascript'; insp.async = true; insp.id = "inspsync"; insp.src = ('https:' == document.location.protocol ? 'https' : 'http') + '://cdn.inspectlet.com/inspectlet.js'; var x = document.getElementsByTagName('script')[0]; x.parentNode.insertBefore(insp, x); }; setTimeout(ldinsp, 500); document.readyState != "complete" ? (window.attachEvent ? window.attachEvent('onload', ldinsp) : window.addEventListener('load', ldinsp, false)) : ldinsp(); })(); </script> <!-- End Inspectlet Embed Code --> <!--[if lte IE 8]> <script> document.createElement('popover'); document.createElement('mortgage'); document.createElement('vendor'); document.createElement('hp-signpost'); document.createElement('category-switcher'); document.createElement('feedback'); document.createElement('bottom'); document.createElement('panorama'); document.createElement('panorama-prev'); document.createElement('sphere-viewer'); document.createElement('sphere-viewer-prev'); document.createElement('save-filter'); </script> <![endif]--> <!-- Statistiky --> <script src="https://h.imedia.cz/js/dot-small.js" type="text/javascript"></script> <script type="text/javascript"> (function() { try { // Při přesměrování na hashbang URL (IE8-9) ztrácíme referrer, // který je potřeba pro správné počítání statistik. 
if (window.sessionStorage) { // někdo může mít DOM storage zakázaný var l = document.createElement('a'); l.href = document.referrer; var referrerHostname = l.hostname; if (window.location.hostname != referrerHostname) { window.sessionStorage.setItem('referrer', l.href); } } // Starý android (< 4.0) v kombinaci s angularem špatně pracuje s hashem v URL. // Považuje ho za součást query případně path. // Na takových zařízech se budeme tvářit, že žádný hash nebyl. if (parseInt((/android (\d+)/.exec(window.navigator.userAgent.toLowerCase()) || [])[1], 10) < 4) { var hrefWithoutHashbang = window.location.href.replace('/#!', ''); var hashIndex = hrefWithoutHashbang.indexOf('#'); if (hashIndex != -1) { window.location.replace(hrefWithoutHashbang.substring(0, hashIndex)); } } } catch (e) {} })(); </script> <!-- API mapy.cz --> <script type="text/javascript" src="https://api4.mapy.cz/loader.js"></script> <script type="text/javascript">Loader.load(null, {poi: true, pano: true})</script> <!-- Login reklama --> <script src="https://i.imedia.cz/js/im3.js" type="text/javascript"></script> <script src="https://1.im.cz/software/promo/promo-sbrowser.js"></script> <!-- Rozkopírování SID cookie --> <script src="https://h.imedia.cz/js/sid.js"></script> <!-- Login --> <script src="https://login.szn.cz/js/api/login.js"></script> <script> login.cfg({ serviceId: "sreality" }); </script> <!-- KONFIGURACE --> <script src="/js/conf/config.js?2e96626"></script> <script src="/js/advert.js"></script> <script src="/js/all.js?2e96626"></script> <script type="text/javascript"> if (window.DOT) { var dotCfg = { service: 'sreality' }; if (window.SrealityABTest && window.SrealityABTest.getVariant()) { dotCfg.abtest = window.SrealityABTest.getVariant(); } DOT.cfg(dotCfg); } </script> <noscript> <meta http-equiv="refresh" content="0;url=?_escaped_fragment_="/> </noscript> <meta name="fragment" content="!" 
ng-if="metaSeo.showMetaFragment" /> </head> <!--[if IE 8]> <body class="ie8"> <![endif]--> <!--[if IE 9]> <body class="notie8 ie9"> <![endif]--> <!--[if gt IE 9]><!--> <body class="notie8 notie9 lang-{{html.lang}}"> <!--<![endif]--> <div loading-line></div> <div page-layout> <div ng-view></div> </div> </body> </html>

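This is different from the page I see in Chrome's developer tools, though. Here is part of that code (the whole thing doesn't fit here, and uploadtext isn't working for some reason):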

From the first HTML I can see that the page downloaded by requests.get runs some scripts, which is probably what makes the HTML different.
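One way to confirm that suspicion is to look for AngularJS template markers in the raw response. The following is a minimal sketch, not part of the original post: the helper name `find_template_markers` and the choice of markers are assumptions, based only on what is visible in the HTML above (`{{ html.lang }}`, `ng-app`, and the empty `<div ng-view>`).

```python
import re

# Patterns suggesting the HTML is an AngularJS template that has not been
# rendered yet (assumed markers, taken from the response shown above).
TEMPLATE_MARKERS = {
    "interpolation": re.compile(r"\{\{.*?\}\}"),   # e.g. {{ html.lang }}
    "ng-view": re.compile(r"<[^>]*\bng-view\b"),   # empty placeholder for the view
    "ng-app": re.compile(r"<[^>]*\bng-app\b"),     # AngularJS application root
}


def find_template_markers(html: str) -> list[str]:
    """Return the names of all template markers found in the given HTML."""
    return [name for name, pattern in TEMPLATE_MARKERS.items() if pattern.search(html)]
```

Passing the text returned by `requests.get(url).text` to this helper and getting a non-empty list back would suggest that the listings are filled in by JavaScript in the browser, which requests and urllib never execute.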

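I've already tried urllib, but the resulting HTML document is the same.

Is there a way to download the HTML I see when I open the page in Chrome's developer tools, so that I can scrape it?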


1 answer
User
#1 · Posted 2024-04-25 05:29:25

If the data you're after ultimately comes from that page, you can get it quite easily with selenium and BeautifulSoup. This gives you the links to all the apartments.

^{1}$
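A minimal sketch of the selenium plus BeautifulSoup approach, under stated assumptions rather than as the answerer's actual code: the function names, the listing URL, and the `/detail/` link pattern are all hypothetical, and a local Chrome installation that Selenium 4 can drive is assumed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def fetch_rendered_html(url: str, wait_selector: str, timeout: float = 15.0) -> str:
    """Load url in headless Chrome and return the DOM after JavaScript has run."""
    # Selenium is imported here so extract_links() can be used without it.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one matching element exists, i.e. the
        # client-side code has filled in the page.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


def extract_links(html: str, base_url: str, href_part: str) -> list[str]:
    """Return absolute, de-duplicated URLs of anchors whose href contains href_part."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        if href_part in href:
            absolute = urljoin(base_url, href)
            if absolute not in links:
                links.append(absolute)
    return links


def collect_apartment_links(listing_url: str) -> list[str]:
    """Render a listing page and collect links that look like detail pages."""
    # Assumption: detail pages contain "/detail/" in their path.
    html = fetch_rendered_html(listing_url, "a[href*='/detail/']")
    return extract_links(html, listing_url, "/detail/")
```

Calling `collect_apartment_links("https://www.sreality.cz/hledani/prodej/byty")` (an assumed listing URL) would open the page in a real browser, wait for the listings to appear, and only then hand the HTML to BeautifulSoup. That waiting step is the part that requests.get cannot provide.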
