Downloading a file from a URL in Azure Pipelines on a Microsoft-hosted agent

Posted 2024-04-24 08:27:51


I am running a Python script task in an Azure YAML pipeline. When the following URL is opened in a browser, a JSON file is downloaded. URL: https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519

What I have done so far:

- task: PythonScript@0
  name: pythonTask
  inputs:
    scriptSource: 'inline'
    script: |

      url = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"

      import webbrowser
      webbrowser.open(url)
      print("The web browser opened and the file is downloaded")

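Once the browser opens the URL, the file should be downloaded locally and automatically. However, after running the pipeline above, I cannot find the file anywhere on the agent machine, and I do not get any error either.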

I am using a windows-2019 Microsoft-hosted agent.

How can I find the path of the downloaded file on the agent machine?

Or is there another way to download the file from the URL without opening a browser?


1 answer

How can I find the downloaded file-path inside the agent machine?

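Try either of the following Python scripts. The first uses urllib.request from the standard library; the second uses the wget package: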

steps:
- task: PythonScript@0
  displayName: 'Run a Python script'
  inputs:
    scriptSource: inline
    script: |
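     import urllib.request

     # Replace with the direct URL of the file to download
     url = 'https://www.some_url.com/downloads'

     # $(Build.ArtifactStagingDirectory) is expanded by the pipeline before the script runs
     path = r"$(Build.ArtifactStagingDirectory)/filename.xx"
     urllib.request.urlretrieve(url, path)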

steps:
- script: 'pip install wget'
  displayName: 'Command Line Script'

- task: PythonScript@0
  displayName: 'Run a Python script'
  inputs:
    scriptSource: inline
    script: |
     import wget
     
     print('Beginning file download with wget module')
     
     url = 'https://www.some_url.com/downloads'
     path = r"$(Build.ArtifactStagingDirectory)"
     wget.download(url, path)

The Python script then downloads the file to the target path.

Here is a blog post about using Python to download files from a URL.

Update:

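The URL microsoft.com/en-us/download/confirmation.aspx?id=56519 is a web page, not the file itself: the page has to be opened, and the download then starts automatically from it.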

So when you request that URL with wget or urllib.request, you get a 403 error.
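
To make that 403 show up as a readable line in the pipeline log rather than an unhandled traceback, the download call can be wrapped. This is only a minimal sketch: the helper name try_download and its retrieve parameter are hypothetical, introduced here so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def try_download(url, path, retrieve=urllib.request.urlretrieve):
    """Download url to path; return None on success or the HTTP status on failure.

    `retrieve` is a hypothetical injection point (not part of the original
    answer) so the function can be tried without touching the network.
    """
    try:
        retrieve(url, path)
    except urllib.error.HTTPError as err:
        # err.code holds the HTTP status code, for example 403
        print(f"Download of {url} failed with HTTP {err.code}")
        return err.code
    return None
```

A script could call try_download(url, path) and exit with a non-zero code when the return value is not None, so the pipeline step fails visibly.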

Instead, you can use the direct URL of the JSON file, which the page offers as its manual download link.


For example, URL: https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20210329.json

import urllib.request

url = 'https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20210329.json'

path = r"$(Build.ArtifactStagingDirectory)\agent.json"
urllib.request.urlretrieve(url, path)
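
On the question of where the file ends up: $(Build.ArtifactStagingDirectory) is replaced with an absolute path before the script runs, and Azure Pipelines also exposes predefined variables to scripts as environment variables (here BUILD_ARTIFACTSTAGINGDIRECTORY). A minimal sketch, assuming you want the same script to work outside a pipeline as well; the helper staging_path and its fallback to the current directory are illustrative choices, not part of the original answer.

```python
import os


def staging_path(filename, env=None):
    """Return an absolute path for filename inside the artifact staging directory.

    Falls back to the current working directory when the variable is not set,
    for example when the script runs outside Azure Pipelines.
    """
    env = os.environ if env is None else env
    base = env.get("BUILD_ARTIFACTSTAGINGDIRECTORY") or os.getcwd()
    return os.path.abspath(os.path.join(base, filename))


if __name__ == "__main__":
    # Printing the resolved path makes the download location visible in the log.
    print(staging_path("agent.json"))
```

Passing the result as the second argument of urllib.request.urlretrieve and printing it gives a log line that states exactly where the file was written.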

Update 2:

You can also use a Python script to read the download link from the web page and then download the file.

Example:

steps:
- script: |
   pip install bs4
   
   pip install lxml
  workingDirectory: '$(build.sourcesdirectory)'
  displayName: 'Command Line Script'

- task: PythonScript@0
  displayName: 'Run a Python script'
  inputs:
    scriptSource: inline
    script: |
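     from bs4 import BeautifulSoup
     from urllib.request import Request, urlopen
     import urllib.request

     # Send a browser-like User-Agent header with the request
     req = Request("https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519", headers={'User-Agent': 'Mozilla/5.0'})
     html_page = urlopen(req).read()

     soup = BeautifulSoup(html_page, "lxml")

     # Look up the manual download link by its element id and read its href
     a = ""
     for link in soup.find_all('a', id="c50ef285-c6ea-c240-3cc4-6c9d27067d6c"):
         a = link.get('href')
         print(a)

     # Stop with a clear message if no link was found (the id may change)
     if not a:
         raise SystemExit("Download link not found on the page")

     path = r"$(Build.sourcesdirectory)\agent.json"
     urllib.request.urlretrieve(a, path)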
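
The script above depends on one specific element id, which may not stay the same if the page is regenerated. As a hedged alternative, the sketch below collects every link that points at a .json file using only html.parser from the standard library. The names JsonLinkParser and find_json_links and the sample HTML are made up for illustration, and the real page's markup may differ.

```python
from html.parser import HTMLParser


class JsonLinkParser(HTMLParser):
    """Collect href values of <a> tags whose target ends in .json."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".json"):
            self.links.append(href)


def find_json_links(html_text):
    """Return all .json link targets found in html_text, in document order."""
    parser = JsonLinkParser()
    parser.feed(html_text)
    return parser.links


if __name__ == "__main__":
    # Hypothetical markup, not copied from the real download page
    sample = (
        '<a href="/en-us/download">Home</a>'
        '<a href="https://download.example.com/files/ServiceTags_Public.json">'
        "download manually</a>"
    )
    print(find_json_links(sample))
```

In the pipeline, html_text would be the decoded response body, and the first entry of the returned list would be passed to urllib.request.urlretrieve.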


Update 3:

Another way to get the download URL:

steps:
- script: 'pip install requests'
  displayName: 'Command Line Script'

- task: PythonScript@0
  displayName: 'Run a Python script'
  inputs:
    scriptSource: inline
    script: |
     import requests
     import re
     import urllib.request

     rq = requests.get("https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519")

     # Raw string with escaped dots; .*? stops at the first ".json"
     t = re.search(r"https://download\.microsoft\.com/download/.*?\.json", rq.text)
     if t is None:
         raise SystemExit("No direct download URL found in the page")

     a = t.group()
     print(a)

     path = r"$(Build.sourcesdirectory)\agent.json"
     urllib.request.urlretrieve(a, path)
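
The regular-expression step can be tried on a plain string before it goes into the pipeline. A minimal sketch, assuming the page text contains the direct link verbatim; the helper name extract_json_url and the sample text are hypothetical.

```python
import re

# Dots are escaped so they match literally; the non-greedy .*? stops at
# the first ".json" after the download prefix.
JSON_URL = re.compile(r"https://download\.microsoft\.com/download/.*?\.json")


def extract_json_url(page_text):
    """Return the first direct .json download URL in page_text, or None."""
    match = JSON_URL.search(page_text)
    return match.group() if match else None


if __name__ == "__main__":
    # Hypothetical page fragment, used only to exercise the pattern
    sample = 'href="https://download.microsoft.com/download/x/y/Sample.json" rel="nofollow"'
    print(extract_json_url(sample))
```

Returning None instead of calling group() on a failed search lets the caller decide how to fail when the page layout changes.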
     
