How to use Scrapy with Python and Tor through Privoxy in Docker Compose

Posted 2024-06-07 18:28:47

I am trying to run Scrapy with Python, Tor, and Privoxy. I am using the scraper from khpeek/privoxy-tor-scraper (https://github.com/khpeek/privoxy-tor-scraper). Here is my directory structure:

- docker-compose.yml
- privoxy
   - config
   - Dockerfile
- scraper
   - Dockerfile
   - newnym.py
   - requirements.txt
- tor
   - Dockerfile

I am trying to run the following docker-compose.yml:

version: '3'

services:
  privoxy:
    build: ./privoxy
    ports:
      - "8118:8118"
    links:
      - tor

  tor:
    build:
      context: ./tor
      args:
        password: "1234"
    ports:
      - "9050:9050"
      - "9051:9051"

  scraper:
    build: ./scraper
    links:
      - tor
      - privoxy

where the tor Dockerfile is:

FROM alpine:3.7
EXPOSE 9050 9051
ARG password
RUN apk --update add tor
RUN echo "ControlPort 9051" >> /etc/tor/torrc
RUN echo "CookieAuthentication 1" >> /etc/tor/torrc
RUN echo "HashedControlPassword $(tor --quiet --hash-password $password)" >> /etc/tor/torrc
CMD ["tor"]

The privoxy Dockerfile is:

FROM alpine:latest
EXPOSE 8118
RUN apk --update add privoxy
COPY config /etc/privoxy/
#CMD ["privoxy", "--no-daemon"]
CMD ["privoxy", "--no-daemon", "/etc/privoxy/config"]

where config consists of two lines:

listen-address 0.0.0.0:8118
forward-socks5 / tor:9050 .

The scraper's Dockerfile is:

FROM python:3.6-alpine
ADD . /scraper
WORKDIR /scraper
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
CMD ["python", "newnym.py"]

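where requirements.txt contains the single line requests. Finally, the program newnym.py is meant simply to test whether changing the IP address through Tor works: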

from time import sleep, time

import requests as req
import telnetlib

def get_ip():
    IPECHO_ENDPOINT = 'http://ipecho.net/plain'
    HTTP_PROXY = 'http://privoxy:8118'
    return req.get(IPECHO_ENDPOINT, proxies={'http': HTTP_PROXY}).text

def request_ip_change():
    #tn = telnetlib.Telnet('privoxy',8118)
    tn = telnetlib.Telnet('tor',9051)
    # telnetlib works with bytes in Python 3, so every literal is a bytes object
    tn.read_until(b"Escape character is '^]'.", 2)
    tn.write(b'AUTHENTICATE ""\r\n')
    tn.read_until(b"250 OK", 2)
    tn.write(b"signal NEWNYM\r\n")
    tn.read_until(b"250 OK", 2)

if __name__ == '__main__':
    dts = []
    #isOpen('tor',9051)
    #isOpen('privoxy',8118)
    try:
        while True:
            ip = get_ip()
            t0 = time()
            request_ip_change()
            while True:
                new_ip = get_ip()
                if new_ip == ip:
                    sleep(1)
                else:
                    break
            dt = time() - t0
            dts.append(dt)
            print("{} -> {} in ~{}s".format(ip, new_ip, int(dt)))
    except KeyboardInterrupt:
        print("Stopping...")
        # Avoid dividing by zero when no IP change was measured
        if dts:
            print("Average: {}".format(sum(dts) / len(dts)))
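
For comparison, the control-port exchange in request_ip_change() can also be written with a plain socket instead of telnetlib (which is deprecated since Python 3.11 and removed in 3.13). The following is only a minimal sketch, not the repository's code: the names read_reply and request_new_identity are hypothetical, the password is assumed to contain no quote or backslash characters, and only simple single-status replies such as "250 OK" are handled.

```python
import socket


def read_reply(reader):
    """Read one control-port reply and return (status_code, lines).

    The final line of a reply has a space after the 3-digit code
    (e.g. "250 OK"); earlier lines use "-" in that position.
    Multi-line data replies ("+") are not handled in this sketch.
    """
    lines = []
    while True:
        raw = reader.readline()
        if not raw:
            raise ConnectionError("control connection closed by peer")
        line = raw.decode("ascii", "replace").rstrip("\r\n")
        lines.append(line)
        if line[3:4] == " ":
            return int(line[:3]), lines


def request_new_identity(host, port, password="", timeout=5.0):
    """Authenticate to a Tor control port, then send SIGNAL NEWNYM."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("rb")
        commands = ('AUTHENTICATE "{}"'.format(password), "SIGNAL NEWNYM")
        for command in commands:
            sock.sendall(command.encode("ascii") + b"\r\n")
            code, lines = read_reply(reader)
            if code != 250:
                raise RuntimeError("{!r} failed: {}".format(command, lines[-1]))
```

With the compose file above, a call would look like request_new_identity('tor', 9051, password='1234'), assuming the password passed as the build argument is the one Tor expects. Unlike read_until with a timeout, this version raises if Tor answers with anything other than 250, so an authentication failure is not silently ignored.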

docker-compose build completes successfully, but when I try docker-compose up, I get the following error message:

scraper_1_651fd6690a2d | Traceback (most recent call last):
scraper_1_651fd6690a2d |   File "newnym.py", line 45, in <module>
scraper_1_651fd6690a2d |     request_ip_change()
scraper_1_651fd6690a2d |   File "newnym.py", line 27, in request_ip_change
scraper_1_651fd6690a2d |     tn = telnetlib.Telnet('tor',9051)
scraper_1_651fd6690a2d |   File "/usr/local/lib/python3.6/telnetlib.py", line 218, in __init__
scraper_1_651fd6690a2d |     self.open(host, port, timeout)
scraper_1_651fd6690a2d |   File "/usr/local/lib/python3.6/telnetlib.py", line 234, in open
scraper_1_651fd6690a2d |     self.sock = socket.create_connection((host, port), timeout)
scraper_1_651fd6690a2d |   File "/usr/local/lib/python3.6/socket.py", line 724, in create_connection
scraper_1_651fd6690a2d |     raise err
scraper_1_651fd6690a2d |   File "/usr/local/lib/python3.6/socket.py", line 713, in create_connection
scraper_1_651fd6690a2d |     sock.connect(sa)
scraper_1_651fd6690a2d | ConnectionRefusedError: [Errno 111] Connection refused
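
The ConnectionRefusedError above means that nothing accepted a TCP connection on tor:9051 at the moment the scraper tried. One way to narrow this down is to poll the port before talking to it, in the spirit of the commented-out isOpen() calls in newnym.py. The helper below is a sketch under assumptions: wait_for_port is a hypothetical name, and the timeout values are arbitrary.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect is enough; close the socket right away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers refused connections, timeouts, and unresolved hostnames.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling wait_for_port('tor', 9051) at the start of the main block separates two cases. If it returns True after a few seconds, the scraper simply started before Tor was ready, since links only orders container startup and does not wait for the service inside. If it keeps returning False while tor:9050 is reachable, it is worth checking which address the control port is bound to: as far as I know, a bare "ControlPort 9051" makes Tor listen on 127.0.0.1 inside its own container, which other containers cannot reach.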
