How can I fetch or download PDFs from a collection of shtml links?

2 Answers

User

#1 · Edited 2024-04-23 07:35:44

This was very helpful!

install.packages("rvest")
install.packages("httr")
install.packages("readxl")
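# update.packages("tibble") would treat "tibble" as a library path;
# reinstalling is the simplest way to update a single package
install.packages("tibble")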

library(rvest)
library(httr)
library(readxl)

setwd("C:/Users/Andreas/Desktop/481064 A.F. - Master Thesis - Election Outcome Prediction/Full Repository Austrian Bundestag")
my_data <- read_excel("StenographischeProto.xlsx")
View(my_data)

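# html_session() is deprecated since rvest 1.0.0; session() replaces it
pdf_session <- session("https://www.uscis.gov/sites/default/files/files/form/i-765.pdf")

# save the raw response body (the PDF bytes) to test.pdf
writeBin(pdf_session$response$content, "test.pdf")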

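User

#2 · Edited 2024-04-23 07:35:44

The basic workflow is:

Locate the PDF download button (or link) on the page with a CSS selector or an XPath expression.
Either use RSelenium to simulate the download action, or read the element's href attribute, request that link with rvest, and write the binary response to disk with writeBin().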
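The first step and the href route can be sketched as follows. This is a minimal illustration, not the answerer's code: the page URL and the inline HTML are hypothetical stand-ins for one of the shtml pages, and the selector assumes the PDF links are plain anchors whose href ends in ".pdf". A real page may need a different selector.

```r
library(rvest)
library(xml2)

# Hypothetical page address; used here only to resolve relative links
page_url <- "https://example.org/protokolle/index.shtml"

# Hypothetical page content; for a real page use read_html(page_url) instead
page <- read_html('<html><body>
  <a href="/docs/session_001.pdf">Protocol 1</a>
  <a href="session_002.pdf">Protocol 2</a>
  <a href="about.shtml">About</a>
</body></html>')

# Select anchors whose href ends in ".pdf" and read the href attribute
pdf_hrefs <- html_attr(html_elements(page, "a[href$='.pdf']"), "href")

# Turn relative hrefs into absolute URLs that can be requested directly
pdf_urls <- url_absolute(pdf_hrefs, page_url)
print(pdf_urls)
```

Each entry of pdf_urls can then be requested and written to disk with writeBin(), in the same way as the single-file example in this answer.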

To download a PDF file, I'll use a government form as an example:

PDF URL: https://www.uscis.gov/sites/default/files/files/form/i-765.pdf

library(rvest)
library(httr)

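# html_session() is deprecated since rvest 1.0.0; session() replaces it
pdf_session <- session("https://www.uscis.gov/sites/default/files/files/form/i-765.pdf")

# save the raw response body (the PDF bytes) to test.pdf
writeBin(pdf_session$response$content, "test.pdf")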
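For a whole collection of links, the same idea can be wrapped in a loop. The sketch below is one possible approach under stated assumptions: download_pdfs() is a hypothetical helper name, it swaps in httr's GET() with write_disk() instead of writeBin(), it names each file after the last path segment of its URL, and it pauses between requests to avoid hammering the server.

```r
library(httr)

# Hypothetical helper: download every URL in `urls` into `dest_dir`
# and return a data frame recording which downloads succeeded.
download_pdfs <- function(urls, dest_dir = "pdfs", pause = 1) {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  results <- lapply(urls, function(u) {
    # Assumption: the last path segment is a usable, unique file name
    dest <- file.path(dest_dir, basename(u))
    ok <- tryCatch({
      resp <- GET(u, write_disk(dest, overwrite = TRUE), timeout(60))
      !http_error(resp)  # TRUE only for non-error HTTP status codes
    }, error = function(e) FALSE)  # connection failures count as not ok
    # Remove partial files or saved error pages
    if (!ok && file.exists(dest)) unlink(dest)
    Sys.sleep(pause)  # pause between requests
    data.frame(url = u, file = dest, ok = ok, stringsAsFactors = FALSE)
  })
  do.call(rbind, results)
}

# Example call, using the URL from the answer above:
# download_pdfs("https://www.uscis.gov/sites/default/files/files/form/i-765.pdf")
```

If the links live in a spreadsheet, as in the first post, the vector passed in would be the relevant column of that data frame; the column name depends on the file and is not shown in the thread.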
