# Answer 1
Just checked the crawler4j source code. The `CrawlController.start` method has a lot of fixed 10-second "pauses" to make sure that the threads have finished and are ready to be cleaned up:

    // Make sure again that none of the threads are alive.
    logger.info("It looks like no thread is working, waiting for 10 seconds to make sure...");
    sleep(10);
    // ... more code ...
    logger.info("No thread is working and no more URLs are in queue waiting for another 10 seconds to make sure...");
    sleep(10);
    // ... more code ...
    logger.info("Waiting for 10 seconds before final clean up...");
    sleep(10);

In addition, the main loop checks every 10 seconds whether the crawling threads have finished:

    while (true) {
        sleep(10);
        // code to check if some thread is still working
    }

where `sleep(seconds)` is a helper that wraps `Thread.sleep`:

    protected void sleep(int seconds) {
        try {
            Thread.sleep(seconds * 1000);
        } catch (Exception ignored) {
        }
    }
So it may be worth fine-tuning these calls and reducing the sleep times.
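As a rough illustration of "reducing the sleep times", here is a minimal sketch, assuming the crawler threads are kept in a plain `List<Thread>`: instead of sleeping a hard-coded 10 seconds, it polls thread liveness at a short, configurable interval. The names `PollingWaitSketch`, `waitForThreads` and the 100 ms interval are hypothetical choices for this example, not part of crawler4j.

```java
import java.util.ArrayList;
import java.util.List;

public class PollingWaitSketch {

    // Hypothetical helper: returns once no thread in the list is alive,
    // re-checking every pollMillis instead of a fixed 10 seconds.
    static void waitForThreads(List<Thread> threads, long pollMillis) throws InterruptedException {
        while (true) {
            boolean someoneAlive = false;
            for (Thread t : threads) {
                if (t.isAlive()) {
                    someoneAlive = true;
                    break;
                }
            }
            if (!someoneAlive) {
                return;
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Thread t = new Thread(() -> {
                try {
                    Thread.sleep(200); // stand-in for crawling work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        long start = System.nanoTime();
        waitForThreads(threads, 100);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("all threads finished after ~" + elapsedMs + " ms");
    }
}
```

A shorter poll interval makes shutdown more responsive at the cost of a few extra wake-ups; it still waits up to one interval after the last thread ends, which is the limitation the next suggestion avoids.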
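If you can spare some time, a better solution would be to rewrite this method. I would replace `List<Thread> threads` with an `ExecutorService`, whose `awaitTermination` method would be especially handy here. Unlike `sleep`, `awaitTermination(10, TimeUnit.SECONDS)` returns as soon as all tasks have completed (once `shutdown()` has been called), instead of always waiting the full 10 seconds.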
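A minimal sketch of that idea, assuming each crawler thread's work can be submitted as a `Runnable` task. `ExecutorShutdownSketch`, `fakeCrawlTask` and `runAndAwait` are hypothetical names for illustration, not crawler4j API; only `ExecutorService`, `Executors` and `TimeUnit` come from `java.util.concurrent`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExecutorShutdownSketch {

    // Hypothetical stand-in for a crawler's work; not crawler4j's WebCrawler.
    static Runnable fakeCrawlTask(long workMillis) {
        return () -> {
            try {
                Thread.sleep(workMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    // Submits the tasks, then waits for them without a fixed sleep.
    // Returns true if every task finished within the timeout.
    static boolean runAndAwait(int numberOfCrawlers, long workMillis, long timeoutSeconds)
            throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(numberOfCrawlers);
        for (int i = 0; i < numberOfCrawlers; i++) {
            executor.submit(fakeCrawlTask(workMillis));
        }
        // shutdown() rejects new tasks; awaitTermination can only return early
        // after shutdown has been requested AND all submitted tasks are done.
        executor.shutdown();
        boolean finished = executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        if (!finished) {
            executor.shutdownNow(); // interrupt stragglers before final clean-up
        }
        return finished;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        boolean finished = runAndAwait(4, 300, 10);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Expect roughly 300 ms here rather than 10 s.
        System.out.println("finished=" + finished + " after ~" + elapsedMs + " ms");
    }
}
```

Calling `shutdown()` before `awaitTermination` matters: without it, `awaitTermination` would block for the whole timeout even if every task had already finished.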