Split PDF files on S3
s3pdfsplitter
Python AWS S3 PDF splitter
Usage
Basic usage:
from PdfSplitter import Splitter
# data is a dict describing the input and output PDFs (see the example data below)
splitter = Splitter("config.json")
splitter.split(data)
Example config.json:
{
  "aws": {
    "access_key_id": "aws-access-key",
    "secret_access_key": "aws secret"
  },
  "s3": {
    "bucket": "bucket"
  }
}
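Note that the configuration is managed with ConfigEnv, so you can also provide an .ini file, or override the configuration with environment variables (aws_s3_bucket, aws_access_key_id and aws_secret_access_key).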
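To illustrate the override idea, here is a minimal standard-library sketch. It is not how ConfigEnv is implemented; the helpers apply_env_overrides and load_config are hypothetical, and the naming rule (config[section][key] is overridden by the upper-case variable SECTION_KEY) is an assumption chosen to match the variable names used in the test command below (S3_BUCKET, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY).

```python
import json
import os


def apply_env_overrides(config, environ):
    """Override config[section][key] with the environment variable SECTION_KEY.

    Hypothetical naming rule: s3.bucket -> S3_BUCKET,
    aws.access_key_id -> AWS_ACCESS_KEY_ID, and so on.
    """
    for section, values in config.items():
        for key in values:
            env_name = f"{section}_{key}".upper()
            if env_name in environ:
                # Replacing an existing key's value is safe while iterating.
                values[key] = environ[env_name]
    return config


def load_config(path):
    """Read a JSON config file, then apply overrides from os.environ."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    return apply_env_overrides(config, os.environ)
```

With S3_BUCKET set in the environment, load_config("config.json") would return the file's values with only s3.bucket replaced.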
Example data:
{
  "input": ["firstFile.pdf", "secondFile.pdf"],
  "output": [
    {
      "s3Key": "output1.pdf",
      "pages": [
        {"index": 0, "pages": [0, 1]},
        {"index": 1, "pages": [0, 1]}
      ]
    },
    {
      "s3Key": "output2.pdf",
      "pages": [
        {"index": 0, "pages": [0]},
        {"index": 1, "pages": [0]},
        {"index": 0, "pages": [1]},
        {"index": 1, "pages": [1]}
      ]
    }
  ]
}
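In this format, each "index" refers to a position in the "input" list, and "pages" lists page numbers within that file. A minimal pre-flight check might look like the sketch below; validate_split_request is a hypothetical helper, not part of the library, and it assumes zero-based page numbers. It cannot check the upper page bound, since that requires opening the source PDF.

```python
def validate_split_request(data):
    """Raise ValueError if a chunk points at a missing input or a negative page."""
    n_inputs = len(data["input"])
    for output in data["output"]:
        for chunk in output["pages"]:
            index = chunk["index"]
            # "index" must select an existing entry of data["input"].
            if not 0 <= index < n_inputs:
                raise ValueError(
                    f"{output['s3Key']}: input index {index} is out of range"
                )
            # Page numbers are assumed to be zero-based integers.
            for page in chunk["pages"]:
                if not isinstance(page, int) or page < 0:
                    raise ValueError(f"{output['s3Key']}: invalid page {page!r}")
```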
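This produces two PDFs in your S3 bucket:
- the first, output1.pdf, contains pages 0 and 1 from firstFile.pdf, followed by pages 0 and 1 from secondFile.pdf
- the second, output2.pdf, contains page 0 from firstFile.pdf, page 0 from secondFile.pdf, page 1 from firstFile.pdf, then page 1 from secondFile.pdf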
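The mapping from the example data to the resulting page order can be sketched in plain Python. The helper page_order is hypothetical and only interprets the data format; it does not read or write any PDF. It assumes chunks are appended in the order they are listed and that page numbers are zero-based.

```python
def page_order(data):
    """Map each output s3Key to the (input file, page) pairs it receives, in order."""
    result = {}
    for output in data["output"]:
        sequence = []
        for chunk in output["pages"]:
            # "index" selects which input file this chunk's pages come from.
            source = data["input"][chunk["index"]]
            sequence.extend((source, page) for page in chunk["pages"])
        result[output["s3Key"]] = sequence
    return result


example = {
    "input": ["firstFile.pdf", "secondFile.pdf"],
    "output": [
        {"s3Key": "output1.pdf",
         "pages": [{"index": 0, "pages": [0, 1]}, {"index": 1, "pages": [0, 1]}]},
        {"s3Key": "output2.pdf",
         "pages": [{"index": 0, "pages": [0]}, {"index": 1, "pages": [0]},
                   {"index": 0, "pages": [1]}, {"index": 1, "pages": [1]}]},
    ],
}

for key, sequence in page_order(example).items():
    print(key, sequence)
```

The printed sequences follow the same order as the two bullet points above: output1.pdf interleaves nothing, while output2.pdf alternates between the two inputs.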
Development guide
Installation
Using virtualenv:
# create virtualenv
virtualenv -p python3 .venv
# activate venv
source .venv/bin/activate
# install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt
Testing
Using unittest:
# if your test config is set up:
python -m unittest
# if you want to override your test config:
S3_BUCKET=<your bucket> AWS_ACCESS_KEY_ID=<your key id> AWS_SECRET_ACCESS_KEY=<your key secret> python -m unittest