How to avoid reinstalling packages when building a Docker image for a Python project?
My Dockerfile looks something like this:
FROM my/base
ADD . /srv
RUN pip install -r requirements.txt
RUN python setup.py install
ENTRYPOINT ["run_server"]
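Every time I build a new image, the dependencies have to be reinstalled, which can be very slow where I live.
One approach I came up with is to cache the already-installed packages by overwriting the my/base image with a newer image, like this: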
docker build -t new_image_1 .
docker tag new_image_1 my/base
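That way, the next time I build with this Dockerfile, my/base already has some of the packages installed.
But this approach has two problems:
- It is not always possible to overwrite a base image
- The base image keeps growing as new images are layered on top of it
So what is a better way to solve this?
Edit:
Some information about Docker on my machine: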
☁ test docker version
Client version: 1.1.2
Client API version: 1.13
Go version (client): go1.2.1
Git commit (client): d84a070
Server version: 1.1.2
Server API version: 1.13
Go version (server): go1.2.1
Git commit (server): d84a070
☁ test docker info
Containers: 0
Images: 56
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Dirs: 56
Execution Driver: native-0.2
Kernel Version: 3.13.0-29-generic
WARNING: No swap limit support
6 Answers
If you want to do this with GitLab CI/CD, see this documentation: https://docs.gitlab.com/ee/ci/docker/docker_layer_caching.html
If that doesn't work, try setting DOCKER_BUILDKIT and BUILDKIT_INLINE_CACHE explicitly.
variables:
  DOCKER_BUILDKIT: '1'
  ....
script:
  - docker build ........ --build-arg BUILDKIT_INLINE_CACHE=1 .
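For reference, here is a minimal Python sketch of the same idea outside of GitLab: it assembles a `docker build` command that writes inline cache metadata into the image and reuses a previously pushed image as a cache source via `--cache-from`. The image name `registry.example.com/group/app` and both helper names are placeholders, and it assumes the image was pushed after an earlier build that also used `BUILDKIT_INLINE_CACHE=1`.

```python
from __future__ import annotations

import os


def buildkit_build_command(image: str, tag: str = "latest", context: str = ".") -> list[str]:
    """Assemble a `docker build` argv that writes and reuses inline BuildKit cache."""
    ref = f"{image}:{tag}"
    return [
        "docker", "build",
        # Embed cache metadata in the image so the next build can reuse its layers.
        "--build-arg", "BUILDKIT_INLINE_CACHE=1",
        # Use the previously pushed image (placeholder name) as a cache source.
        "--cache-from", ref,
        "-t", ref,
        context,
    ]


def buildkit_env(base: dict[str, str] | None = None) -> dict[str, str]:
    """Copy of the environment with BuildKit switched on for the docker CLI."""
    env = dict(os.environ if base is None else base)
    env["DOCKER_BUILDKIT"] = "1"
    return env
```

Running it would look like `subprocess.run(buildkit_build_command("registry.example.com/group/app"), env=buildkit_env(), check=True)`, followed by a `docker push` of the same tag so the next pipeline run has something to pull cache from.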
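By default, pipenv install tries to re-lock. When it does, the cached layer of the Docker build is not used, because Pipfile.lock has changed. See the docs.
The fix is to keep Pipfile.lock under version control and use
RUN pipenv sync
instead.
Thanks to JFG Piñeiro.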
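The reason a re-lock costs you the cache is that Docker compares the contents of the files a COPY/ADD step copies; if Pipfile.lock differs byte for byte, that layer and every layer after it are rebuilt. A toy sketch of that comparison, using plain SHA-256 over the file bytes as a simplified stand-in for Docker's own checksum (the lock file contents are made up for illustration):

```python
import hashlib


def copy_checksum(content: bytes) -> str:
    """Toy stand-in for the content checksum Docker compares for a COPY/ADD source."""
    return hashlib.sha256(content).hexdigest()


def lock_layer_reused(previous_lock: bytes, current_lock: bytes) -> bool:
    """The COPY Pipfile.lock layer (and everything after it) is reused only if the bytes match."""
    return copy_checksum(previous_lock) == copy_checksum(current_lock)


# Made-up lock file contents, for illustration only.
committed = b'{"default": {"django": {"version": "==2.2.3"}}}\n'
relocked = b'{"default": {"django": {"version": "==2.2.4"}}}\n'
```

`pipenv sync` installs from the committed lock file without rewriting it, so between builds the comparison is `lock_layer_reused(committed, committed)`, which is `True`; a re-lock that bumps a version turns it into `lock_layer_reused(committed, relocked)`, which is `False`.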
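To reduce network traffic, you can point pip at a cache directory on your host machine.
When you run your Docker container, mount the host's pip cache directory onto the pip cache directory inside the container, using a docker run command like this: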
docker run -v $HOME/.cache/pip-docker/:/root/.cache/pip image_1
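The -v target only helps if it is the directory pip actually uses inside the container. On Linux, pip's default cache is `$XDG_CACHE_HOME/pip`, falling back to `~/.cache/pip`, so for a container running as root it is `/root/.cache/pip`. A small sketch that derives the mount argument, assuming a root user and no `PIP_CACHE_DIR` or `--cache-dir` override in the container (if in doubt, `pip cache dir` on pip 20.1+ prints the real location):

```python
from __future__ import annotations

import os
from pathlib import PurePosixPath


def linux_pip_cache_dir(home: str, env: dict[str, str]) -> str:
    """pip's default cache dir on Linux: $XDG_CACHE_HOME/pip, else ~/.cache/pip."""
    xdg = env.get("XDG_CACHE_HOME", "").strip()
    base = PurePosixPath(xdg) if xdg else PurePosixPath(home) / ".cache"
    return str(base / "pip")


def pip_cache_volume(host_dir: str, container_home: str = "/root") -> str:
    """Build the value for `docker run -v`, assuming no cache overrides in the container."""
    host = os.path.expanduser(host_dir)
    return f"{host}:{linux_pip_cache_dir(container_home, {})}"
```

For example, `pip_cache_volume("~/.cache/pip-docker/")` expands to the same host path as `$HOME/.cache/pip-docker/` and maps it onto `/root/.cache/pip`, matching the command above.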
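Then, in your Dockerfile, install the dependencies as part of the ENTRYPOINT (or CMD) instruction instead of in a RUN instruction. This matters because, as mentioned in the comments, volume mounts are not available while the image is being built (that is, when RUN instructions execute). Your Dockerfile should look something like this: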
FROM my/base
ADD . /srv
ENTRYPOINT ["sh", "-c", "pip install -r requirements.txt && python setup.py install && run_server"]
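If the `sh -c` one-liner gets unwieldy, the same chain could be written as a Python entrypoint module instead. This is a sketch, assuming `requirements.txt` and `setup.py` sit in the working directory and `run_server` is on `PATH`; the `SETUP_STEPS` list is hypothetical and simply mirrors the command above. Like `&&`, the chain stops at the first step that exits non-zero, and on success the server replaces the Python process via `os.execvp`, similar to `exec` in a shell script.

```python
from __future__ import annotations

import os
import subprocess
import sys

# Hypothetical step list mirroring the `sh -c "... && ... && run_server"` chain.
SETUP_STEPS = [
    [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
    [sys.executable, "setup.py", "install"],
]


def run_setup(steps: list[list[str]]) -> int:
    """Run steps in order; return the first non-zero exit code, or 0 if all succeed.

    Like `&&`, later steps are skipped once one of them fails.
    """
    for argv in steps:
        code = subprocess.run(argv).returncode
        if code != 0:
            return code
    return 0


def main() -> None:
    code = run_setup(SETUP_STEPS)
    if code != 0:
        sys.exit(code)
    # Replace this process with the server so it receives container signals directly.
    os.execvp("run_server", ["run_server"])
```

Adding the usual `if __name__ == "__main__": main()` guard and saving it as, say, `/srv/entrypoint.py` (a hypothetical name) would let you use `ENTRYPOINT ["python", "/srv/entrypoint.py"]`. The trade-off is the same as with the shell version: dependencies are installed on every container start, just mostly from the mounted cache.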
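I know this question already has some popular answers, but there is a newer way to cache files for package managers. I think it could become a good answer in the future, once BuildKit is more widely adopted.
As of Docker 18.09, there is experimental support for BuildKit. BuildKit adds some new features to Dockerfiles, including experimental support for mounting external volumes into RUN steps. This lets us create caches for locations such as $HOME/.cache/pip/.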
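To make the difference concrete, here is a toy Python model (not BuildKit's implementation) of why a cache mount helps: a RUN layer that has to be rebuilt starts from scratch, but the mounted cache directory survives between builds, so pip only has to fetch what it has not seen before. The `CacheMount` class and `pip_install` function are purely illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CacheMount:
    """Toy model of a BuildKit cache mount: its contents persist across builds."""
    packages: set[str] = field(default_factory=set)


def pip_install(requirements: list[str], mount: CacheMount | None) -> list[str]:
    """Simulate one execution of the RUN pip install step.

    Returns the requirements that had to be fetched from the package index.
    Without a mount, every re-run of the step starts with an empty pip cache.
    """
    fetched = []
    for req in requirements:
        if mount is not None and req in mount.packages:
            continue  # already in the mounted /root/.cache/pip
        fetched.append(req)
        if mount is not None:
            mount.packages.add(req)
    return fetched
```

With a shared `CacheMount`, re-running the step after a small edit to requirements.txt fetches only newly added requirements; with `mount=None`, every re-run fetches everything again.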
We will use the following requirements.txt file as an example:
Click==7.0
Django==2.2.3
django-appconf==1.0.3
django-compressor==2.3
django-debug-toolbar==2.0
django-filter==2.2.0
django-reversion==3.0.4
django-rq==2.1.0
pytz==2019.1
rcssmin==1.0.6
redis==3.3.4
rjsmin==1.1.0
rq==1.1.0
six==1.12.0
sqlparse==0.3.0
A typical Python Dockerfile might look like this:
FROM python:3.7
WORKDIR /usr/src/app
COPY requirements.txt /usr/src/app/
RUN pip install -r requirements.txt
COPY . /usr/src/app
With BuildKit enabled through the DOCKER_BUILDKIT environment variable, the uncached build, including the pip step, takes about 65 seconds:
$ export DOCKER_BUILDKIT=1
$ docker build -t test .
[+] Building 65.6s (10/10) FINISHED
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 120B 0.0s
=> [internal] load metadata for docker.io/library/python:3.7 0.5s
=> CACHED [1/4] FROM docker.io/library/python:3.7@sha256:6eaf19442c358afc24834a6b17a3728a45c129de7703d8583392a138ecbdb092 0.0s
=> [internal] load build context 0.6s
=> => transferring context: 899.99kB 0.6s
=> CACHED [internal] helper image for file operations 0.0s
=> [2/4] COPY requirements.txt /usr/src/app/ 0.5s
=> [3/4] RUN pip install -r requirements.txt 61.3s
=> [4/4] COPY . /usr/src/app 1.3s
=> exporting to image 1.2s
=> => exporting layers 1.2s
=> => writing image sha256:d66a2720e81530029bf1c2cb98fb3aee0cffc2f4ea2aa2a0760a30fb718d7f83 0.0s
=> => naming to docker.io/library/test 0.0s
Now let's add the experimental syntax header and modify the RUN step to cache the Python packages:
# syntax=docker/dockerfile:experimental
FROM python:3.7
WORKDIR /usr/src/app
COPY requirements.txt /usr/src/app/
RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt
COPY . /usr/src/app
Now run another build. It should take about the same amount of time, but this time the Python packages are cached in our new cache mount:
$ docker build -t pythontest .
[+] Building 60.3s (14/14) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 120B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> resolve image config for docker.io/docker/dockerfile:experimental 0.5s
=> CACHED docker-image://docker.io/docker/dockerfile:experimental@sha256:9022e911101f01b2854c7a4b2c77f524b998891941da55208e71c0335e6e82c3 0.0s
=> [internal] load .dockerignore 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 120B 0.0s
=> [internal] load metadata for docker.io/library/python:3.7 0.5s
=> CACHED [1/4] FROM docker.io/library/python:3.7@sha256:6eaf19442c358afc24834a6b17a3728a45c129de7703d8583392a138ecbdb092 0.0s
=> [internal] load build context 0.7s
=> => transferring context: 899.99kB 0.6s
=> CACHED [internal] helper image for file operations 0.0s
=> [2/4] COPY requirements.txt /usr/src/app/ 0.6s
=> [3/4] RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt 53.3s
=> [4/4] COPY . /usr/src/app 2.6s
=> exporting to image 1.2s
=> => exporting layers 1.2s
=> => writing image sha256:0b035548712c1c9e1c80d4a86169c5c1f9e94437e124ea09e90aea82f45c2afc 0.0s
=> => naming to docker.io/library/test 0.0s
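About 60 seconds, similar to the first build.
Make a small change to requirements.txt (such as adding a blank line between two packages) to force a cache invalidation, then build again: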
$ docker build -t pythontest .
[+] Building 15.9s (14/14) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 120B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> resolve image config for docker.io/docker/dockerfile:experimental 1.1s
=> CACHED docker-image://docker.io/docker/dockerfile:experimental@sha256:9022e911101f01b2854c7a4b2c77f524b998891941da55208e71c0335e6e82c3 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 120B 0.0s
=> [internal] load .dockerignore 0.0s
=> [internal] load metadata for docker.io/library/python:3.7 0.5s
=> CACHED [1/4] FROM docker.io/library/python:3.7@sha256:6eaf19442c358afc24834a6b17a3728a45c129de7703d8583392a138ecbdb092 0.0s
=> CACHED [internal] helper image for file operations 0.0s
=> [internal] load build context 0.7s
=> => transferring context: 899.99kB 0.7s
=> [2/4] COPY requirements.txt /usr/src/app/ 0.6s
=> [3/4] RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt 8.8s
=> [4/4] COPY . /usr/src/app 2.1s
=> exporting to image 1.1s
=> => exporting layers 1.1s
=> => writing image sha256:fc84cd45482a70e8de48bfd6489e5421532c2dd02aaa3e1e49a290a3dfb9df7c 0.0s
=> => naming to docker.io/library/test 0.0s
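Only about 16 seconds!
We get this speedup because we are no longer downloading all the Python packages. They were cached by the package manager (pip in this case) and stored in a cache volume mount. The mount is made available to the RUN step, so pip can reuse the packages we have already downloaded. This happens outside of Docker's layer caching.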
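Looking only at the pip step rather than the whole build: it went from 53.3 s on the second build to 8.8 s on the third, roughly 6 times faster. A small helper for pulling those per-step timings out of plain BuildKit progress output, assuming lines shaped like the ones in the logs above (real output may pad the columns with extra spaces, which the pattern tolerates):

```python
from __future__ import annotations

import re

# Numbered Dockerfile steps in plain BuildKit progress output,
# e.g. "=> [3/4] RUN pip install -r requirements.txt 61.3s"
STEP_LINE = re.compile(r"^=> (?:CACHED )?\[(\d+/\d+)\] .+?\s+(\d+(?:\.\d+)?)s$")


def step_durations(log: str) -> dict[str, float]:
    """Map step numbers such as "3/4" to their reported duration in seconds."""
    durations: dict[str, float] = {}
    for line in log.splitlines():
        match = STEP_LINE.match(line.strip())
        if match:
            durations[match.group(1)] = float(match.group(2))
    return durations
```

Lines such as `=> [internal] load .dockerignore` and the nested `=> =>` lines are skipped, so only the numbered Dockerfile steps end up in the result.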
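The gains should be even larger with bigger requirements.txt files.
Caveats:
- This is experimental Dockerfile syntax and should be treated as such. You may not want to build with it in production at the moment.
- BuildKit features are currently not supported under Docker Compose or other tools that use the Docker API directly. Update: Docker Compose supports this as of version 1.25.0. See "How do you enable BuildKit with docker-compose?"
- There is currently no direct interface for managing the cache. It is purged when you run docker system prune -a.
Hopefully these features will make it into Docker builds and BuildKit will become the default. If and when that happens, I will try to update this answer.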
Try building with a Dockerfile that looks something like this:
FROM my/base
WORKDIR /srv
ADD ./requirements.txt /srv/requirements.txt
RUN pip install -r requirements.txt
ADD . /srv
RUN python setup.py install
ENTRYPOINT ["run_server"]
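As long as you don't change requirements.txt, Docker will use the cache for the pip install step, regardless of whether other code files in the . directory have changed. Here is an example.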
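Why the ordering matters can be modeled in a few lines of Python. This is a simplified sketch, not Docker's actual algorithm: each layer's cache key is assumed to depend on the parent layer's key, the instruction text and, for ADD/COPY, the contents of the copied files. Run against the example Dockerfile and the edits that follow, it reproduces the cache hits in the build logs: after editing run.py, everything from FROM through RUN pip install is reused; after editing requirements.txt, only FROM and WORKDIR are.

```python
from __future__ import annotations

import hashlib

# The example Dockerfile from this answer, one instruction per entry.
DOCKERFILE = [
    "FROM dockerfile/python",
    "WORKDIR /srv",
    "ADD ./requirements.txt /srv/requirements.txt",
    "RUN pip install -r requirements.txt",
    "ADD . /srv",
    "CMD python /srv/run.py",
]


def _copied_files(src: str, context: dict[str, bytes]) -> list[str]:
    """Files from the build context that an ADD/COPY source refers to (toy version)."""
    if src in (".", "./"):
        return sorted(context)
    name = src[2:] if src.startswith("./") else src
    return [name] if name in context else []


def build(instructions: list[str], context: dict[str, bytes], cache: set[str]) -> list[bool]:
    """Return, per instruction, whether its layer could be reused from `cache`."""
    parent_key = ""
    hits = []
    for instr in instructions:
        digest = hashlib.sha256(f"{parent_key}\n{instr}".encode())
        op, *args = instr.split()
        if op in ("ADD", "COPY"):
            # Only names and contents count here; modification times are ignored.
            for name in _copied_files(args[0], context):
                digest.update(name.encode() + b"\0" + context[name])
        key = digest.hexdigest()
        hits.append(key in cache)
        cache.add(key)
        parent_key = key
    return hits
```

Because each key chains off its parent, one miss invalidates every later step, which is why copying requirements.txt on its own before `ADD . /srv` keeps the expensive pip layer cached.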
Here is a simple Hello, World! program:
$ tree
.
├── Dockerfile
├── requirements.txt
└── run.py
0 directories, 3 files
# Dockerfile
FROM dockerfile/python
WORKDIR /srv
ADD ./requirements.txt /srv/requirements.txt
RUN pip install -r requirements.txt
ADD . /srv
CMD python /srv/run.py
# requirements.txt
pytest==2.3.4
# run.py
print("Hello, World")
Output of docker build:
Step 1 : WORKDIR /srv
---> Running in 22d725d22e10
---> 55768a00fd94
Removing intermediate container 22d725d22e10
Step 2 : ADD ./requirements.txt /srv/requirements.txt
---> 968a7c3a4483
Removing intermediate container 5f4e01f290fd
Step 3 : RUN pip install -r requirements.txt
---> Running in 08188205e92b
Downloading/unpacking pytest==2.3.4 (from -r requirements.txt (line 1))
Running setup.py (path:/tmp/pip_build_root/pytest/setup.py) egg_info for package pytest
....
Cleaning up...
---> bf5c154b87c9
Removing intermediate container 08188205e92b
Step 4 : ADD . /srv
---> 3002a3a67e72
Removing intermediate container 83defd1851d0
Step 5 : CMD python /srv/run.py
---> Running in 11e69b887341
---> 5c0e7e3726d6
Removing intermediate container 11e69b887341
Successfully built 5c0e7e3726d6
Now let's modify run.py:
# run.py
print("Hello, Python")
Build again; here is the output:
Sending build context to Docker daemon 5.12 kB
Sending build context to Docker daemon
Step 0 : FROM dockerfile/python
---> f86d6993fc7b
Step 1 : WORKDIR /srv
---> Using cache
---> 55768a00fd94
Step 2 : ADD ./requirements.txt /srv/requirements.txt
---> Using cache
---> 968a7c3a4483
Step 3 : RUN pip install -r requirements.txt
---> Using cache
---> bf5c154b87c9
Step 4 : ADD . /srv
---> 9cc7508034d6
Removing intermediate container 0d7cf71eb05e
Step 5 : CMD python /srv/run.py
---> Running in f25c21135010
---> 4ffab7bc66c7
Removing intermediate container f25c21135010
Successfully built 4ffab7bc66c7
As you can see, this time Docker used the cache during the build. Now let's update requirements.txt:
# requirements.txt
pytest==2.3.4
ipython
Here is the docker build output:
Sending build context to Docker daemon 5.12 kB
Sending build context to Docker daemon
Step 0 : FROM dockerfile/python
---> f86d6993fc7b
Step 1 : WORKDIR /srv
---> Using cache
---> 55768a00fd94
Step 2 : ADD ./requirements.txt /srv/requirements.txt
---> b6c19f0643b5
Removing intermediate container a4d9cb37dff0
Step 3 : RUN pip install -r requirements.txt
---> Running in 4b7a85a64c33
Downloading/unpacking pytest==2.3.4 (from -r requirements.txt (line 1))
Running setup.py (path:/tmp/pip_build_root/pytest/setup.py) egg_info for package pytest
Downloading/unpacking ipython (from -r requirements.txt (line 2))
Downloading/unpacking py>=1.4.12 (from pytest==2.3.4->-r requirements.txt (line 1))
Running setup.py (path:/tmp/pip_build_root/py/setup.py) egg_info for package py
Installing collected packages: pytest, ipython, py
Running setup.py install for pytest
Installing py.test script to /usr/local/bin
Installing py.test-2.7 script to /usr/local/bin
Running setup.py install for py
Successfully installed pytest ipython py
Cleaning up...
---> 23a1af3df8ed
Removing intermediate container 4b7a85a64c33
Step 4 : ADD . /srv
---> d8ae270eca35
Removing intermediate container 7f003ebc3179
Step 5 : CMD python /srv/run.py
---> Running in 510359cf9e12
---> e42fc9121a77
Removing intermediate container 510359cf9e12
Successfully built e42fc9121a77
Notice that this time Docker did not use the cache for the pip install step. If this doesn't work for you, check your Docker version.
Client version: 1.1.2
Client API version: 1.13
Go version (client): go1.2.1
Git commit (client): d84a070
Server version: 1.1.2
Server API version: 1.13
Go version (server): go1.2.1
Git commit (server): d84a070