How can I read this stringified JavaScript variable into Python?

Posted 2024-05-19 02:11:02

I am trying to read _pageData from https://www.simpliowebstudio.com/wp-content/uploads/2014/07/aWfyh1 into Python 2.7.11 so that I can process it, using the following code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
""" Testing _pageData processing. """

import urllib2
import re
import ast
import json
import yaml

BASE_URL = 'https://www.simpliowebstudio.com/wp-content/uploads/2014/07/aWfyh1'

def main():
    """ Do the business. """
    response = urllib2.urlopen(BASE_URL, None)
    results = re.findall('var _pageData = \\"(.*?)\\";</script>', response.read())
    first_result = results[0]
    # These all fail
    data = ast.literal_eval(first_result)
    # data = yaml.load(first_result)
    # data = json.loads(first_result)

if __name__ == '__main__':
    main()

However, all three parsing attempts (ast.literal_eval, yaml.load and json.loads) fail with an error.

var _pageData has the following format:

"[[1,true,true,true,true,true,true,true,true,,\"at\",\"\",\"\",1450364255674,\"\",\"en_US\",false,[]\n,\"https://www.google.com/maps/d/viewer?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",\"https://www.google.com/maps/d/embed?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",\"https://www.google.com/maps/d/edit?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",\"https://www.google.com/maps/d/thumbnail?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",,,true,\"https://www.google.com/maps/d/print?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",\"https://www.google.com/maps/d/pdf?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",\"https://www.google.com/maps/d/viewer?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",false,false,\"/maps/d\",\"maps/sharing\",\"//www.google.com/intl/en_US/help/terms_maps.html\",true,\"https://docs.google.com/picker\",[]\n,false,true,[[[\"//www.gstatic.com/mapspro/images/google-my-maps-logo-regular-001.png\",143,25]\n,[\"//www.gstatic.com/mapspro/images/google-my-maps-logo-regular-2x-001.png\",286,50]\n]\n,[[\"//www.gstatic.com/mapspro/images/google-my-maps-logo-small-001.png\",113,20]\n,[\"//www.gstatic.com/mapspro/images/google-my-maps-logo-small-2x-001.png\",226,40]\n]\n]\n,1,\"https://www.gstatic.com/mapspro/_/js/k\\u003dmapspro.gmeviewer.en_US.8b9lQX3ifcs.O/m\\u003dgmeviewer_base/rt\\u003dj/d\\u003d0/rs\\u003dABjfnFWonctWGGtD63MaO3UZxCxF6UPKJQ\",true,true,false,true,\"US\",false,true,true,5,false]\n,[\"mf.map\",\"zBghbRiSwHlg.k2ATNtn6BCk0\",\"Hollywood, FL\",\"\",[-80.16005,26.01043,-80.16005,26.01043]\n,[-80.16005,26.01043,-80.16005,26.01043]\n,[[,\"zBghbRiSwHlg.kq4rrF9BNRIg\",\"Untitled layer\",\"\",[[[\"https://mt.googleapis.com/vt/icon/name\\u003dicons/onion/22-blue-dot.png\\u0026scale\\u003d1.0\"]\n,[]\n,1,1,[[,[26.01043,-80.16005]\n]\n,\"MDZBMzJCQjRBOTAwMDAwMQ~CjISKmdlby1tYXBzcHJvLm1hcHNob3AtbGF5ZXItNDUyOWUwMTc0YzhkNmI2ZBgAKAAwABIZACBawIJBU4Fe8v7vNSoAg0dtnhhVotEBLg\",\"vdb:\",\"zBghbRiSwHlg.kq4rrF9BNRIg\",[26.01043,-80.16005]\n,[0,-32]\n,\"06A32BB4A9000001\"]\n,[[\"Hollywood, 
FL\"]\n]\n,[]\n]\n]\n,,1.0,true,true,,,,[[\"zBghbRiSwHlg.kq4rrF9BNRIg\",1,,,,\"https://mapsengine.google.com/map/kml?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\\u0026lid\\u003dzBghbRiSwHlg.kq4rrF9BNRIg\",,,,,0,2,true,[[[\"06A32BB4A9000001\",[[[26.01043,-80.16005]\n]\n]\n,[]\n,[]\n,0,[[\"name\",[\"Hollywood, FL\"]\n,1]\n,,[]\n,[]\n]\n,,0]\n]\n,[[[\"https://mt.googleapis.com/vt/icon/name\\u003dicons/onion/22-blue-dot.png\\u0026filter\\u003dff\\u0026scale\\u003d1.0\",[16,32]\n,1.0]\n,[[\"0000FF\",0.45098039215686275]\n,5000]\n,[[\"0000FF\",0.45098039215686275]\n,[\"000000\",0.25098039215686274]\n,3000]\n]\n]\n]\n]\n]\n,[]\n,,,,,1]\n]\n,[2]\n,,,\"mapspro\",\"zBghbRiSwHlg.k2ATNtn6BCk0\",,true,false,false,\"\",2,false,\"https://mapsengine.google.com/map/kml?mid\\u003dzBghbRiSwHlg.k2ATNtn6BCk0\",3807]\n]\n"

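I tried replacing the \" and \n sequences and decoding the \uXXXX escapes before parsing, but without success. A further attempt was also unsuccessful.

Thank you.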


2 Answers

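I started out with ast.literal_eval(...) because I was under the (mistaken?) impression that JavaScript arrays and Python lists were mutually compatible, so all I had to do was destringify _pageData.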

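However, I had not noticed that Python does not accept empty elements such as ,, or the literals true and false. Fixing those does the trick (thanks to Two-Bit Alchemist and tobias_k).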
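The rejections described above are easy to reproduce on tiny inputs. The following is only a minimal sketch; the helper name parses is made up for illustration and does not come from the original post.

```python
import ast
import json

def parses(parser, text):
    """Return True if parser(text) succeeds, False if it rejects the input."""
    try:
        parser(text)
    except (ValueError, SyntaxError):
        # literal_eval raises SyntaxError or ValueError; json.loads raises
        # ValueError (JSONDecodeError is a subclass of it on Python 3).
        return False
    return True

for snippet in ('[1, 2]', '[1,,2]', '[,1]', '[true, false]'):
    print('%-14s literal_eval=%-5s json.loads=%s' % (
        snippet, parses(ast.literal_eval, snippet), parses(json.loads, snippet)))
```

Neither parser accepts an empty slot, and only json.loads understands the lowercase true and false.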

So the following seems to work:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
""" Testing _pageData processing. """

import urllib2
import re
import ast

BASE_URL = 'https://www.simpliowebstudio.com/wp-content/uploads/2014/07/aWfyh1'

def main():
    """ Do the business. """
    response = urllib2.urlopen(BASE_URL, None)
    results = re.findall('var _pageData = \\"(.*?)\\";</script>', response.read())
    first_result = results[0]
    # Fill runs of empty array slots with None, longest run first.
    first_result = first_result.replace(',,,,,,', ',None,None,None,None,None,')
    first_result = first_result.replace(',,,,,', ',None,None,None,None,')
    first_result = first_result.replace(',,,,', ',None,None,None,')
    first_result = first_result.replace(',,,', ',None,None,')
    first_result = first_result.replace(',,', ',None,')
    first_result = first_result.replace('[,', '[None,')
    # Turn the escaped double quotes into Python string delimiters.
    first_result = first_result.replace('\\"', '\'')
    # Drop the escaped newlines that follow closing brackets.
    first_result = first_result.replace('\\n', '')
    # Map JavaScript booleans to Python ones (this also rewrites any
    # "true"/"false" text that happens to sit inside a string value).
    first_result = first_result.replace('true', 'True')
    first_result = first_result.replace('false', 'False')
    data = ast.literal_eval(first_result)
    for entry in data:
        print entry

if __name__ == '__main__':
    main()
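The ladder of replace calls for runs of commas could probably be collapsed into a single regular-expression substitution with a lookahead. This is only a sketch under assumptions: the function name js_array_to_python and the SAMPLE string are invented for illustration, and the input is assumed to have had its escaped quotes and newlines cleaned up already, as in the code above.

```python
import ast
import re

def js_array_to_python(text):
    """Turn a JavaScript array literal with empty slots into a Python object."""
    # A comma directly followed by another comma marks an empty slot. The
    # lookahead leaves the second comma in place, so runs of any length work.
    text = re.sub(r',(?=,)', ',None', text)
    # An empty slot at the very start of an array.
    text = text.replace('[,', '[None,')
    # Same caveat as the replace-based version: this also rewrites
    # "true"/"false" inside string values.
    text = text.replace('true', 'True').replace('false', 'False')
    return ast.literal_eval(text)

# Hypothetical miniature input, not taken from the real page.
SAMPLE = "[1,true,,'at',,,,false,[,2],[]]"
print(js_array_to_python(SAMPLE))
```

Because each match consumes only one comma, the substitution does not depend on how long the longest run of commas in the page happens to be.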

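There seem to be three kinds of syntax error in the string:

  • , followed by ,
  • [ followed by ,
  • , followed by ]

Assuming those are meant to be null elements (or perhaps ''?), you can simply replace them in the raw string, just as the OP did for the ,, case while missing the others. Note that the ,, substitution has to be applied twice: regular-expression matches do not overlap, so in a run of three or more commas a single pass fills only every other gap. After that, the JSON string can be loaded with json.loads.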

>>> import json, re
>>> s = "your messed up json string"
>>> s = re.sub(r",\s*,",  ", null,", s)
>>> s = re.sub(r",\s*,",  ", null,", s)
>>> s = re.sub(r"\[\s*,", "[ null,", s)
>>> s = re.sub(r",\s*\]", ", null]", s)
>>> json.loads(s)
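The substitutions above also fire on commas that happen to sit inside string values. A string-aware variant might look like the sketch below. It rests on assumptions that the thread does not confirm: that the text captured by the regular expression is the body of a JavaScript string literal whose escapes (\", \n, \\) are also valid JSON string escapes, and that the names fill_holes, parse_page_data and SAMPLE are purely illustrative.

```python
import json
import re

# Matches either a complete JSON string literal (group 1) or an empty slot:
# right after "[" and before ",", or after "," and before "," or "]".
_HOLE_OR_STRING = re.compile(
    r'("(?:[^"\\]|\\.)*")'      # group 1: a quoted string, copied unchanged
    r'|(?<=\[)\s*(?=,)'         # empty slot at the start of an array
    r'|(?<=,)\s*(?=[,\]])'      # empty slot between commas or before "]"
)

def fill_holes(text):
    """Insert null into empty array slots without touching string contents."""
    return _HOLE_OR_STRING.sub(lambda m: m.group(1) or 'null', text)

def parse_page_data(js_literal_body):
    """Decode the body of the JavaScript string literal, then parse the result."""
    # Step 1: undo the string-literal escaping by treating it as a JSON string.
    inner = json.loads('"' + js_literal_body + '"')
    # Step 2: repair the empty slots and parse the array itself.
    return json.loads(fill_holes(inner))

# Hypothetical miniature input in the same style as _pageData.
SAMPLE = r'[[1,true,,\"at\",\"a,,b\",[]\n,[,\"x\"]\n,\"mid\\u003dz\",,,false]\n]\n'
print(parse_page_data(SAMPLE))
```

As in the answer above, a comma directly before ] is treated as a trailing null element. Because strings are matched first and copied through unchanged, a value such as "a,,b" survives intact, and a single pass handles comma runs of any length.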
