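<p>I have an HTML file that I can fetch and download with Python. However, I don't know how to get the data I want out of it. I have used BeautifulSoup to pull values from XML files, but never from anything like this. Here is the part of the file I'm trying to read:</p>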
<pre><code><script>
var AC = {};
AC.org_json =
{
"id": "manager",
"children": [
{
"id": "employee1",
"children": [],
"data": {
"direct_reports": 0,
"badge_color": "F",
"badge_url": "https://someurl",
"full_name": "Employee1 Name",
"job_title": "Employee Job Title",
"department_name": "IT",
"building": "SITE1",
"phone": null,
"expanded": false
}
},
{
"id": "employee2",
"children": [],
"data": {
"direct_reports": 0,
"badge_color": "F",
"badge_url": "https://someurl",
"full_name": "Employee2 Name",
"job_title": "Employee Job Title",
"department_name": "IT",
"building": "SITE1",
"phone": null,
"expanded": false
}
},
......continues for however many entries there are.
</script>
</code></pre>
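<p>The goal is to get the "id" and "job_title" of each entry. I just need some help getting started in the right direction. Any help is appreciated. Thank you very much.</p>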
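<p>To make the goal concrete, here is a minimal sketch of the traversal, assuming the object shown above is already available as a Python dict. The helper name <code>iter_entries</code> is hypothetical, the sample dict is trimmed to the fields of interest, and Python 3 is assumed. The top-level "manager" node shows no "data" key in the excerpt, so the sketch tolerates a missing one.</p>

```python
def iter_entries(node):
    """Yield (id, job_title) for a node and all of its descendants."""
    # "data" may be absent (as on the top-level node in the excerpt).
    data = node.get("data") or {}
    yield node.get("id"), data.get("job_title")
    for child in node.get("children", []):
        yield from iter_entries(child)


# Trimmed copy of the structure from the question (illustrative only).
org = {
    "id": "manager",
    "children": [
        {"id": "employee1", "children": [],
         "data": {"full_name": "Employee1 Name", "job_title": "Employee Job Title"}},
        {"id": "employee2", "children": [],
         "data": {"full_name": "Employee2 Name", "job_title": "Employee Job Title"}},
    ],
}

entries = list(iter_entries(org))
for entry_id, job_title in entries:
    print(entry_id, job_title)
```

<p>The recursion is only there because "children" can nest; if the tree is known to be one level deep, a plain loop over <code>org["children"]</code> would do.</p>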
<p>Edit:
I was able to isolate the data inside the tag from the HTML file.</p>
<pre><code>from bs4 import BeautifulSoup
# Open the data file (html holds the path to the downloaded page)
get_data = open(html,'r').read()
soup = BeautifulSoup(get_data)
title = soup.find("div", id="content")
json_data = title.find_next("script")
print json_data
</code></pre>
<p>This gives exactly the output shown above. The next question is how to get the values out of this data.
If I do this:</p>
<pre><code>import json

data = json.loads(json_data)
print data
</code></pre>
<p>Then I get: <code>ValueError: No JSON object could be decoded</code></p>
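<p>A likely reason for the error is that the script content is JavaScript rather than pure JSON: it starts with <code>var AC = {}; AC.org_json =</code> before the object literal, and <code>json_data</code> is a tag object rather than a string. A minimal sketch of one way around this, assuming Python 3 and that the text of the tag is passed in as a string (for example via <code>json_data.string</code>), is to locate the assignment and decode only the object that follows it. The function name <code>extract_org_json</code> and the trimmed sample are hypothetical.</p>

```python
import json


def extract_org_json(script_text, marker="AC.org_json"):
    """Return the JSON object assigned to `marker` inside a JavaScript snippet."""
    # Find the assignment, then the first "{" after it.
    marker_pos = script_text.index(marker)
    start = script_text.index("{", marker_pos + len(marker))
    # raw_decode parses one JSON value starting at `start` and ignores
    # anything after it (such as a trailing ";" or more JavaScript).
    obj, _end = json.JSONDecoder().raw_decode(script_text, start)
    return obj


# Trimmed stand-in for the text inside the script tag (illustrative only).
sample = """
var AC = {};
AC.org_json =
{
  "id": "manager",
  "children": [
    {"id": "employee1", "children": [],
     "data": {"full_name": "Employee1 Name",
              "job_title": "Employee Job Title",
              "phone": null}}
  ]
};
"""

org = extract_org_json(sample)
first = org["children"][0]
print(first["id"], first["data"]["job_title"])
```

<p>This only works if the assigned value is strict JSON (double-quoted keys, no trailing commas); a looser JavaScript object literal would need a different approach.</p>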