"TypeError: expected string or buffer" with BeautifulSoup

Posted 2024-05-20 02:04:16


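I am fairly new to Python and BeautifulSoup. I simply need to parse the response to a request. I am using urllib2 for the request and response, and BeautifulSoup4 for the parsing.

I have used both before without any problems. In this particular project, however, I am getting a strange error.

Here is part of the code I wrote: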

import cookielib
import urllib2

from bs4 import BeautifulSoup


class WebLogin(object):
    def __init__(self, username, password, targetSite, loginUrl):
        # url for website we want to log in to
        self.base_url = targetSite
        self.loginUrl = self.base_url + loginUrl

        # user supplied username and password
        self.username = username
        self.password = password

        # file for storing cookies
        self.cookie_file = 'login.cookies'

        # set up a cookie jar to store cookies
        self.cj = cookielib.MozillaCookieJar(self.cookie_file)
        # set up opener to handle cookies, redirects etc
        self.opener = urllib2.build_opener(
            urllib2.HTTPRedirectHandler(),
            urllib2.HTTPHandler(debuglevel=0),
            urllib2.HTTPSHandler(debuglevel=0),
            urllib2.HTTPCookieProcessor(self.cj)
        )

        # pretend we're a web browser and not a python script
        self.opener.addheaders = [
            ('User-agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:35.0) Gecko/20100101 Firefox/35.0')
        ]

        # open the front page of the website to set and save initial cookies and to retrieve the csrf token (__FK is the csrf token here) required for login
        response = self.opener.open(self.base_url)
        self.cj.save()
        print response.info().get('Content-Encoding')
        initialResponse = response.read()
        print "\nResponse received for \n\n", initialResponse

        # parsing the response for csrf token
        soup = BeautifulSoup(initialResponse)
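
As an aside on the stated goal (reading the `__FK` CSRF token out of the front page), here is a minimal sketch that uses only the standard library's HTML parser. It assumes the token is carried in an `<input name="__FK" value="...">` element, which the post does not confirm. `InputFieldFinder`, `find_field`, and the sample markup are hypothetical names and data made up for illustration.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7, as used in the post


class InputFieldFinder(HTMLParser):
    """Hypothetical helper: maps each <input> name to its value."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if name is not None:
            # A missing or valueless "value" attribute is stored as "".
            self.fields[name] = attrs.get("value") or ""


def find_field(markup, field_name):
    """Return the value of the named <input>, or None if it is absent."""
    finder = InputFieldFinder()
    finder.feed(markup)
    finder.close()
    return finder.fields.get(field_name)


# Made-up markup; the real page layout is not shown in the post.
sample = ('<form><input type="hidden" name="__FK" value="abc123">'
          '<input name="email"></form>')
print(find_field(sample, "__FK"))
```

In the class above, `find_field(initialResponse, "__FK")` would be the corresponding call, provided the page really exposes the token this way.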

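When I print the response with print "\nResponse received for \n\n", initialResponse (see above), I get a proper HTML response. However, when I call soup = BeautifulSoup(initialResponse) (also above), I get the following error: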

[Screenshot of the error]

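Please tell me what is going wrong here. What am I missing? Why can't I use the result of response.read()?

I tried calling .decode('utf-8') on it in case the encoding was causing trouble, but that did not improve things.

In case the screenshot above is not clear, here is the error again: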
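
One way to rule out transport-level encoding problems is to check the `Content-Encoding` header that the code already prints, and to undo any compression before decoding. The sketch below assumes a gzip-compressed body and a UTF-8 charset. The post does not show the actual header value, so both are assumptions, and `decode_body` is a hypothetical helper name.

```python
import gzip
import io


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Hypothetical helper: gunzip if the header says so, then decode."""
    if content_encoding and content_encoding.lower() == "gzip":
        raw = gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    # Undecodable bytes are replaced rather than raising an exception.
    return raw.decode(charset, "replace")


# Build a small gzip payload in memory to stand in for a server response.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    f.write(b"<html><body>ok</body></html>")
compressed = buf.getvalue()

print(decode_body(compressed, "gzip"))
```

With the code from the post, the call would look like `decode_body(initialResponse, response.info().get('Content-Encoding'))`. Since the printed response already looks like proper HTML, this check may well come back clean.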

Traceback (most recent call last):
  File "flipkartLogin.py", line 64, in <module>
    WebLogin(username, password, targetSite, loginUrl);
  File "flipkartLogin.py", line 43, in __init__
    soup = BeautifulSoup(initialResponse);
  File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 172, in __init__
    self._feed()
  File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 185, in _feed
    self.builder.feed(self.markup)
  File "/Library/Python/2.7/site-packages/bs4/builder/_htmlparser.py", line 146, in feed
    parser.feed(markup)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/HTMLParser.py", line 114, in feed
    self.goahead(0)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/HTMLParser.py", line 158, in goahead
    k = self.parse_starttag(i)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/HTMLParser.py", line 324, in parse_starttag
    self.handle_starttag(tag, attrs)
  File "/Library/Python/2.7/site-packages/bs4/builder/_htmlparser.py", line 48, in handle_starttag
    self.soup.handle_starttag(name, None, None, dict(attrs))
  File "/Library/Python/2.7/site-packages/bs4/__init__.py", line 298, in handle_starttag
    self.currentTag, self.previous_element)
  File "/Library/Python/2.7/site-packages/bs4/element.py", line 749, in __init__
    self.name, attrs)
  File "/Library/Python/2.7/site-packages/bs4/builder/__init__.py", line 160, in _replace_cdata_list_attribute_values
    values = whitespace_re.split(value)
TypeError: expected string or buffer
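
The last frame fails inside a regular-expression split on an attribute value, so that value was evidently not a string. One possible explanation, which nothing in the post confirms, is a tag with a valueless attribute such as `<div class>`: the standard-library parser reports such attributes with the value `None`. The sketch below shows only that mechanism, using the standard library. `whitespace_re` here is a stand-in pattern and not bs4's actual definition, and `AttrCollector`, `collect_attrs`, and `split_fails` are hypothetical helper names.

```python
import re

try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7, as in the traceback

# Stand-in for the pattern named in the traceback (assumed, not bs4's code).
whitespace_re = re.compile(r"\s+")


class AttrCollector(HTMLParser):
    """Hypothetical helper: records every (name, value) pair it is given."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


def collect_attrs(markup):
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.attrs


def split_fails(value):
    """Return True if splitting value on whitespace raises TypeError."""
    try:
        whitespace_re.split(value)
    except TypeError:
        return True
    return False


# Made-up markup: the first "class" attribute has no value at all.
attrs = collect_attrs('<div class><p class="a b">text</p></div>')
for name, value in attrs:
    print("%s=%r split fails: %s" % (name, value, split_fails(value)))
```

If this is what happens on the real page, searching the printed HTML for attributes without a value would be one way to check. Trying a newer bs4 release or a different parser backend may also be worth a look.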


