A client for collecting and parsing the web pages that the Periodic Review Secretariat publishes for Guantanamo detainees.

nyt-prb-scraper: detailed Python project description


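> From their website: "The Periodic Review Secretariat is responsible for developing and administering the periodic review process for eligible Guantanamo Bay detainees, including providing personal representatives for detainees."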

Usage

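The PRB conducts three different kinds of reviews of Guantanamo detainees' files: initial reviews, file reviews and full reviews. Technically, a fourth type, the subsequent full review, is also available. So far, no subsequent full reviews have been published.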

Initial review

initial_review --csv > initial_review.csv
initial_review --json > initial_review.json
initial_review --tsv > initial_review.tsv

File review

file_review --csv > file_review.csv
file_review --json > file_review.json
file_review --tsv > file_review.tsv

Full review

full_review --csv > full_review.csv
full_review --json > full_review.json
full_review --tsv > full_review.tsv
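
The nine commands above follow one pattern, so they can be driven from a short script. The following is a minimal sketch, assuming the three console commands are installed on the PATH and accept exactly the flags shown above. The helper names build_command and run_all are hypothetical and are not part of nyt-prb-scraper.

```python
import subprocess
from pathlib import Path

# Command and flag names are copied from the usage examples above.
REVIEW_COMMANDS = ("initial_review", "file_review", "full_review")
OUTPUT_FORMATS = ("csv", "json", "tsv")


def build_command(review, fmt):
    """Return the argument list for one scraper call (hypothetical helper)."""
    if review not in REVIEW_COMMANDS:
        raise ValueError(f"unknown review command: {review}")
    if fmt not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {fmt}")
    return [review, f"--{fmt}"]


def run_all(output_dir="."):
    """Run every command/format pair and save stdout as <command>.<format>."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for review in REVIEW_COMMANDS:
        for fmt in OUTPUT_FORMATS:
            # Same effect as: initial_review --csv > initial_review.csv
            with open(out / f"{review}.{fmt}", "wb") as handle:
                subprocess.run(build_command(review, fmt), stdout=handle, check=True)
```

Calling run_all("data") would write the nine files into a data directory. Because check=True is set, a failing scraper command raises an error instead of leaving a silently empty file.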

Schema

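One row or one object is returned per document. Each document contains document-specific fields such as type_name, type_id and url, as well as name and isn. A unique ID is generated for each document from isn-type_id-hearing_or_review_date; in the sample output below the ID also carries the review type, as in 037-initial-review-1-2014-11-05.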

[
{"review_type":"full-review","review_url":"http://www.prs.mil/Review-Information/Initial-Review/","hearing_or_review_date":"2014-11-05","denial":null,"name":"Abdel Malik Ahmed Abdel Wahab Al Rahabi","type_id":"1","url":"http:\/\/www.prs.mil\/Portals\/60\/Documents\/ISN037\/141105_U_ISN037_GOVERNMENT'S_UNCLASSIFIED_SUMMARY_PUBLIC.pdf","type_name":"Government's Unclassified Summary","id":"037-initial-review-1-2014-11-05","isn":"037","denied":false,"notification_date":"2014-08-26","final_determination_date":"2014-12-05"},
{"review_type":"full-review","review_url":"http://www.prs.mil/Review-Information/Initial-Review/","hearing_or_review_date":"2014-11-05","denial":null,"name":"Abdel Malik Ahmed Abdel Wahab Al Rahabi","type_id":"2","url":"http:\/\/www.prs.mil\/Portals\/60\/Documents\/ISN037\/141105_U_ISN037_PR_STATEMENT_PRB.pdf","type_name":"Opening Statements of Detainee's Representatives","id":"037-initial-review-2-2014-11-05","isn":"037","denied":false,"notification_date":"2014-08-26","final_determination_date":"2014-12-05"},
{"review_type":"full-review","review_url":"http://www.prs.mil/Review-Information/Initial-Review/","hearing_or_review_date":"2014-11-05","denial":null,"name":"Abdel Malik Ahmed Abdel Wahab Al Rahabi","type_id":"3","url":"http:\/\/www.prs.mil\/Portals\/60\/Documents\/ISN037\/141216_U_ISN037_DETAINEE_WRITTEN_SUBMISSION_PUBLIC.pdf","type_name":"Detainee's Written Submission","id":"037-initial-review-3-2014-11-05","isn":"037","denied":false,"notification_date":"2014-08-26","final_determination_date":"2014-12-05"},
{"review_type":"full-review","review_url":"http://www.prs.mil/Review-Information/Initial-Review/","hearing_or_review_date":"2014-11-05","denial":null,"name":"Abdel Malik Ahmed Abdel Wahab Al Rahabi","type_id":"4","url":"http:\/\/www.prs.mil\/LinkClick.aspx?fileticket=RFOMdQD69k4%3d&tabid=8447&portalid=60&mid=20067","type_name":"Transcript of Public Session","id":"037-initial-review-4-2014-11-05","isn":"037","denied":false,"notification_date":"2014-08-26","final_determination_date":"2014-12-05"},
{"review_type":"full-review","review_url":"http://www.prs.mil/Review-Information/Initial-Review/","hearing_or_review_date":"2014-11-05","denial":null,"name":"Abdel Malik Ahmed Abdel Wahab Al Rahabi","type_id":"5","url":"http:\/\/www.prs.mil\/Portals\/60\/Documents\/ISN037\/141105_U_ISN037_TRANSCRIPT_OF_DETAINEE_SESSION_PUBLIC.pdf","type_name":"Transcript of Detainee Session","id":"037-initial-review-5-2014-11-05","isn":"037","denied":false,"notification_date":"2014-08-26","final_determination_date":"2014-12-05"},
{"review_type":"full-review","review_url":"http://www.prs.mil/Review-Information/Initial-Review/","hearing_or_review_date":"2014-11-05","denial":null,"name":"Abdel Malik Ahmed Abdel Wahab Al Rahabi","type_id":"6","url":"http:\/\/www.prs.mil\/LinkClick.aspx?fileticket=s0XT-7qYc94%3d&tabid=8447&portalid=60&mid=20067","type_name":"Unclassified Summary of Final Determination","id":"037-initial-review-6-2014-11-05","isn":"037","denied":false,"notification_date":"2014-08-26","final_determination_date":"2014-12-05"}
]
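
The IDs in the sample output have four dash-separated parts: the ISN, a review type slug, the type_id and the hearing or review date (for example 037-initial-review-1-2014-11-05). The sketch below builds and splits IDs under that assumed layout. It is inferred from the sample only and may not match how the scraper composes IDs internally; the function names are hypothetical.

```python
def build_document_id(isn, review_slug, type_id, hearing_or_review_date):
    """Join the four parts seen in the sample IDs with dashes (assumed layout)."""
    return "-".join([isn, review_slug, str(type_id), hearing_or_review_date])


def split_document_id(document_id):
    """Split an ID back into its parts.

    Assumes the date is the trailing 10 characters (YYYY-MM-DD) and that the
    ISN and type_id contain no dashes; the review slug may contain dashes.
    """
    head, date = document_id[:-11], document_id[-10:]
    isn, rest = head.split("-", 1)
    review_slug, type_id = rest.rsplit("-", 1)
    return {
        "isn": isn,
        "review_slug": review_slug,
        "type_id": type_id,
        "hearing_or_review_date": date,
    }


document_id = build_document_id("037", "initial-review", "1", "2014-11-05")
print(document_id)
print(split_document_id(document_id))
```

Splitting from both ends keeps the parser correct even though both the review slug and the date contain dashes themselves.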

Output

The scraper can return csv, json or tsv. If no option is passed, the default is csv.
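
Any of the three formats can be read back with the Python standard library. The sketch below is one way to do so, assuming the csv and tsv outputs start with a header row whose column names match the JSON keys shown in the schema sample (the source does not confirm this). The helpers load_documents and group_by_isn are hypothetical.

```python
import csv
import io
import json
from collections import defaultdict


def load_documents(text, fmt="csv"):
    """Parse scraper output text into a list of dicts, one per document."""
    if fmt == "json":
        return json.loads(text)
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        # Assumes the first row is a header naming the columns.
        return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
    raise ValueError(f"unknown output format: {fmt}")


def group_by_isn(documents):
    """Collect the documents belonging to each detainee, keyed by ISN."""
    grouped = defaultdict(list)
    for document in documents:
        grouped[document["isn"]].append(document)
    return dict(grouped)


# Small inline sample using two fields from the schema above.
sample = '[{"isn": "037", "type_id": "1"}, {"isn": "037", "type_id": "2"}]'
documents = load_documents(sample, fmt="json")
print(len(group_by_isn(documents)["037"]))
```

For a saved file, pass the file contents instead of the inline sample, for example the text read from initial_review.json. Note that csv and tsv values are always read as strings, while json preserves null and boolean values such as denial and denied.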
