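No Content-Disposition header in mechanize response
I need to download a CSV file from the AdWords billing page inside a Celery task, but I have no idea what is wrong with my implementation, so I need your help.
First, I log in: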
browser = mechanize.Browser()
browser.open('https://accounts.google.com/ServiceLogin')
browser.select_form(nr=0)
browser['Email'] = g_email
browser['Passwd'] = g_password
browser.submit()
browser.set_handle_robots(False)
billing_resp = browser.open('https://adwords.google.com/')
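Now I am on the billing page. Next, I extracted the token and IDs from the resulting page, inspected the request headers and the action URL in Chrome's developer tools, and now I want to send a POST request to get my CSV file. The response headers in Chrome are: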
content-disposition:attachment; filename="myclientcenter.csv.gz"
content-length:307479
content-type:application/x-gzip; charset=UTF-8
Using the mechanize library:
data = {
    '__u': effectiveUserId,
    '__c': customerId,
    'token': token,
}
browser.addheaders = [
    ('accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('content-type', 'application/x-www-form-urlencoded'),
    ("accept-encoding", "gzip,deflate,sdch"),
    ('user-agent', "Mozilla/5.0"),
    ('referer', "https://adwords.google.com/mcm/Mcm?__u=8183865359&__c=3069937889"),
    ('origin', "https://adwords.google.com"),
]
browser.set_handle_refresh(True)
browser.set_debug_responses(True)
browser.set_debug_redirects(True)
browser.set_handle_referer(True)
browser.set_debug_http(True)
browser.set_handle_equiv(True)
browser.set_handle_gzip(True)
response = browser.open(
    'https://adwords.google.com/mcm/file/ClientSummary/',
    data='&'.join(['='.join(pair) for pair in data.items()]),
)
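But the Content-Length header in the response is 0, and there is no Content-Disposition header. Why is that, and what should I do to make it work?
I also tried the Requests library, but could not even get past the login stage.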
1 Answer
I have now found the answer to my own question (thanks to my team lead).
The main mistake was this incorrect request data:
data = {
    '__u': effectiveUserId,
    '__c': customerId,
    'token': token,
}
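Judging from the corrected script that follows, the piece missing from this data was the mcsSelector field. A secondary point is that the question joined the pairs by hand, which does no escaping, while the corrected script uses urlencode. Here is a minimal sketch of the difference; the field values are made up for illustration, and the import fallback is only there so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2, as used in this answer

# Hypothetical values; a real mcsSelector is a much longer GWT-RPC string.
fields = [
    ("token", "example-token"),
    ("mcsSelector", "7|0|49|https://adwords.google.com/mcm/gwt/|ClientName"),
]

# Naive join: reserved characters such as '|', '/' and ':' are sent unescaped.
naive_body = "&".join("=".join(pair) for pair in fields)

# urlencode percent-encodes every value before joining the pairs.
encoded_body = urlencode(fields)

print(naive_body)
print(encoded_body)
```

With a selector full of `|` and `/` characters, the hand-built body is not a valid `application/x-www-form-urlencoded` payload, so letting urlencode do the escaping is the safer choice.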
Let's try again, this time with the correct solution.
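import datetime
import gzip
import re
import StringIO  # Python 2
import urllib

import mechanize
from mechanize import HTTPError

# AdWordsException is assumed to be a custom Exception subclass defined elsewhere.
# Open Google login page and log in.
browser = mechanize.Browser()
try:
    browser.open('https://accounts.google.com/ServiceLogin')
    browser.select_form(nr=0)
    browser['Email'] = 'email@adwords.login'
    browser['Passwd'] = 'password'
    browser.submit()
except HTTPError:
    raise AdWordsException("Can't find the Google login form")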
Now we are logged in and can go deeper.
try:
    browser.set_handle_robots(False)
    billing_resp = browser.open('https://adwords.google.com/')
except HTTPError:
    raise AdWordsException("Can't open AdWords dashboard page")
# Welcome to the AdWords billing dashboard. This page contains a
# session-unique token that we need for the subsequent POST request.
token_re = re.search(r"token:\'(.{41})\'", billing_resp.read())
if token_re is None:
    raise AdWordsException("Can't parse the token")
# It's time for some magic now. We have to construct a proper serialized
# mcsSelector data structure. This is GWT-RPC wire protocol hell.
# Paste your own version, copied from the web debugger.
MCS_TEMPLATE = (
    "7|0|49|https://adwords.google.com/mcm/gwt/|18FBB090A5C26E56AC16C9DF0689E720|"
    "com.google.ads.api.services.common.selector.Selector/1054041135|"
    "com.google.ads.api.services.common.date.DateRange/1118087507|"
    "com.google.ads.api.services.common.date.Date/373224763|"
    "java.util.ArrayList/4159755760|java.lang.String/2004016611|ClientName|"
    "ExternalCustomerId|PrimaryUserLogin|PrimaryCompanyName|IsManager|"
    "SalesChannel|Tier|AccountSettingTypes|Labels|Alerts|CostWithCurrency|"
    "CostUsd|Clicks|Impressions|Ctr|Conversions|ConversionRate|SearchCtr|"
    "ContentCtr|BudgetAmount|BudgetStartDate|BudgetEndDate|BudgetPercentSpent|"
    "BudgetType|RemainingBudget|ClientDateTimeZoneId|"
    "com.google.ads.api.services.common.selector.OrderBy/524388450|"
    "SearchableData|"
    "com.google.ads.api.services.common.sorting.SortOrder/2037387810|"
    "com.google.ads.api.services.common.pagination.Paging/363399854|"
    "com.google.ads.api.services.common.selector.Predicate/451365360|"
    "SeedObfuscatedCustomerId|"
    "com.google.ads.api.services.common.selector.Predicate$Operator/2293561107|"
    "java.util.Arrays$ArrayList/2507071751|[Ljava.lang.String;/2600011424|"
    "3069937889|ExcludeSeeds|true|ClientTraversal|DIRECT|"
    "com.google.ads.api.services.common.selector.Summary/3224078220|included|1|"
    "2|3|4|5|"
    "{report_date}|5|{report_date}"  # take a note of this
    "|6|26|7|8|7|9|7|10|7|11|7|12|7|13|7|14|7|15|7|16|7|17|7|18|7|19|7|20|7|21|"
    "7|22|7|23|7|24|7|25|7|26|7|27|7|28|7|29|7|30|7|31|7|32|7|33|6|0|0|0|6|2|34|"
    "35|36|0|34|9|-35|37|100|0|6|0|6|3|38|39|40|2|41|42|1|43|38|44|40|0|41|42|1|"
    "45|38|46|-45|41|42|1|47|0|0|6|0|6|1|48|6|0|49|6|0|0|"
)
# To take stats for today
report_date = datetime.date.today()
mcs_selector = MCS_TEMPLATE.format(
    report_date='%s|%s|%s' % (
        report_date.day,
        report_date.month,
        report_date.year
    ),
)
data = urllib.urlencode({
    'token': token_re.group(1),
    'mcsSelector': mcs_selector,
})
# And... it finally works! The token and a proper mcsSelector are all we need.
# A POST request with this data returns a gzipped CSV file with the
# current balance and other info that's not available via the AdWords API.
zipped_csv = browser.open(
    'https://adwords.google.com/mcm/file/ClientSummary',
    data=data
)
# Unpack it and use as you wish.
with gzip.GzipFile(mode='r', fileobj=zipped_csv) as csv_io:
    try:
        csv = StringIO.StringIO(csv_io.read())
    except IOError:
        raise AdWordsException("Can't get CSV file from response")
    finally:
        browser.close()
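Two steps of the script above, token extraction and decompression, can be tried offline without a Google account. The following is a minimal sketch in Python 3 (the script itself targets Python 2). The page fragment, the CSV content, the UTF-8 encoding, and the helper names `extract_token` and `gunzip_csv_rows` are assumptions made for illustration and are not part of the original solution.

```python
import csv
import gzip
import io
import re

# Same pattern as in the script above: a 41-character token in single quotes.
TOKEN_RE = re.compile(r"token:'(.{41})'")


def extract_token(page_source):
    """Return the 41-character token, or None if the page has no match."""
    match = TOKEN_RE.search(page_source)
    return match.group(1) if match else None


def gunzip_csv_rows(gzipped_bytes, encoding="utf-8"):
    """Decompress a gzip payload and parse it as CSV rows.

    The encoding is an assumption; check what the real export uses.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(gzipped_bytes), mode="rb") as fh:
        text = fh.read().decode(encoding)
    return list(csv.reader(io.StringIO(text)))


# Made-up stand-ins for the real page source and the real response body.
sample_page = "var cfg = {token:'" + "A" * 41 + "'};"
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as fh:
    fh.write(b"Client,Cost\nExample,12.50\n")
sample_body = buf.getvalue()

token = extract_token(sample_page)
rows = gunzip_csv_rows(sample_body)
print(token)
print(rows)
```

Keeping these helpers free of network calls makes it easier to tell a parsing problem apart from a failed login or a rejected POST request.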