Web scraping with urllib (and possibly BeautifulSoup)

Posted 2024-05-08 00:05:33


The page I'm scraping: link

The markup I want to extract starts at <p id="p-1"> and ends at </p>.

My code:

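from urllib.request import urlopen  # Python 3; in Python 2 this was: from urllib import urlopen
from bs4 import BeautifulSoup
import re

# read() returns bytes in Python 3, so decode before matching with a str pattern
# (UTF-8 is an assumption about the page's encoding)
html = urlopen('http://mansci.journal.informs.org/gca?gca=mansci%3B6%2F2%2F141&gca=mansci%3B6%2F2%2F149&gca=mansci%3B6%2F2%2F165&gca=mansci%3B6%2F2%2F172&gca=mansci%3B6%2F2%2F187&gca=mansci%3B6%2F2%2F191&gca=mansci%3B6%2F2%2F197&gca=mansci%3B6%2F2%2F205&gca=mansci%3B6%2F2%2F215&submit=Get+All+Checked+Abstracts').read().decode('utf-8')

# By default '.' does not match newlines, so this only finds a
# <p id="p-1">...</p> pair that sits entirely on one line
a = re.compile('<p id="p-1">(.*)</p>')
b = a.findall(html)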

The problem is that my code seems to work line by line, and I don't know how to capture the whole paragraph. Any ideas?
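The line-by-line behavior comes from the regex rather than from urllib: by default `.` matches any character except a newline, so `(.*)` cannot cross a line break. A minimal sketch of a regex-only fix, assuming the paragraph contains no nested `</p>`: pass `re.DOTALL` so `.` also matches newlines, and use the non-greedy `.*?` so the match stops at the first `</p>`. The helper name `extract_p1` and the sample markup are hypothetical, used only for illustration.

```python
import re

# re.DOTALL lets '.' match newlines too; '.*?' is non-greedy, so the match
# ends at the first '</p>' after the opening tag instead of the last one.
P1_PATTERN = re.compile(r'<p id="p-1">(.*?)</p>', re.DOTALL)


def extract_p1(html):
    """Return the raw inner HTML of every <p id="p-1"> element (hypothetical helper)."""
    return P1_PATTERN.findall(html)


# Hypothetical sample page where the paragraph spans several lines
sample = '''<div>
<p id="p-1">The possibility of measuring
does not necessarily lead to
relevant information.</p>
<p id="p-2">Another paragraph.</p>
</div>'''

print(extract_p1(sample))
```

Without `re.DOTALL`, the same pattern returns an empty list for this sample. The captured text still contains any inner tags and the original line breaks, which is one reason an HTML parser is usually more robust than a regex for this job.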


Answer (posted 2024-05-08 00:05:33)

Use BeautifulSoup and do something like this:

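from urllib.request import urlopen  # Python 3; in Python 2 this was: from urllib2 import urlopen
from bs4 import BeautifulSoup

# your_url is the address of the page to scrape
soup = BeautifulSoup(urlopen(your_url).read(), 'html.parser')
# .text returns all the text inside the first <p id="p-1">, however many lines it spans
print(soup.find('p', {'id': 'p-1'}).text)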

This gives:

The possibility of measuring does not necessarily lead to the presentation of relevant information for decision-making in business. This is demonstrated by reference to accounting methods and to profit computation in particular. Accounting processes have become formalized to the point where they misrepresent financial results and position; the probability that resources will be used efficiently and that equity between parties of interest will be served is materially reduced by lack of care in the definition of significant concepts and the concurrent acceptance of procedures which have directly opposite justifications and consequences. As the speed of information processing increases and computational refinements develop, a corresponding effort is necessary to redefine in operationally relevant terms, or to sharpen the definition of, such key concepts as profit, capital, cost. The history of the development of accounting and auxiliary calculations illustrates the consequences of permitting a measuring and communicating system to become institutionalized. Some suggestions for improving the relevance of accounting and similar information are made.
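If installing `bs4` is not an option, a similar extraction can be approximated with Python's standard-library `html.parser.HTMLParser`. This is a swapped-in technique, not what the answer above uses: a minimal sketch that collects all text inside the first `<p>` with the given `id`, including text inside inline tags such as `<i>`, until the closing `</p>`. The names `ParagraphTextExtractor` and `paragraph_text`, and the sample markup, are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphTextExtractor(HTMLParser):
    """Collect the text of the first <p> whose id equals target_id (hypothetical helper)."""

    def __init__(self, target_id):
        super().__init__()          # convert_charrefs=True: entities like &amp; are decoded
        self.target_id = target_id
        self.inside = False         # currently inside the target paragraph
        self.done = False           # target paragraph already closed; ignore later ones
        self.parts = []             # text fragments collected from the paragraph

    def handle_starttag(self, tag, attrs):
        if tag == 'p' and not self.done and dict(attrs).get('id') == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == 'p' and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def paragraph_text(html, target_id='p-1'):
    """Return the text of the first <p id=target_id>, or '' if there is none."""
    parser = ParagraphTextExtractor(target_id)
    parser.feed(html)
    parser.close()                  # flush any buffered text
    return ''.join(parser.parts)


sample = '''<div>
<p id="p-1">Accounting processes have become
<i>formalized</i> to the point &amp; beyond.</p>
<p id="p-2">Not this one.</p>
</div>'''

print(paragraph_text(sample))
```

Like `.text` in BeautifulSoup, this keeps the paragraph's internal line breaks; `' '.join(text.split())` collapses them if you want the paragraph on a single line.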
