videocr Python package - PyPI

Extract hardcoded subtitles from videos using machine learning

videocr project description

videocr

Extract hardcoded (burned-in) subtitles from videos using the Tesseract OCR engine with Python.

Input a video with hardcoded subtitles:


# print_sub.py
import videocr

if __name__ == '__main__':
    print(videocr.get_subtitles('video.mp4', lang='chi_sim+eng',
                                sim_threshold=70, conf_threshold=65))

$ python3 print_sub.py

Output:

0
00:00:01,042 --> 00:00:02,877
喝 点 什么 ? 
What can I get you?

1
00:00:03,044 --> 00:00:05,463
我 不 知道
Um, I'm not sure.

2
00:00:08,091 --> 00:00:10,635
休闲 时 光 …
For relaxing times, make it...

3
00:00:10,677 --> 00:00:12,595
三 得 利 时 光
Bartender, Bob Suntory time.

4
00:00:14,472 --> 00:00:17,142
我 要 一 杯 伏特 加
Un, I'll have a vodka tonic.

5
00:00:18,059 --> 00:00:19,019
谢谢
Laughs Thanks.

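Performance

The OCR process is CPU-intensive. Extracting subtitles from a 20-second video takes about 3 minutes on my dual-core laptop. More CPU cores will make it faster.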
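
As a rough back-of-the-envelope illustration of the figure above: 3 minutes for a 20-second clip is about 9 seconds of processing per second of video on two cores. The sketch below extrapolates from that single data point and assumes, optimistically, that throughput scales linearly with the number of cores. `estimate_minutes` is a hypothetical helper, not part of videocr.

```python
def estimate_minutes(video_seconds: float, cores: int = 2) -> float:
    """Extrapolate OCR time from the single measurement quoted above.

    Assumptions: 20 s of video took 180 s on 2 cores, and speed scales
    linearly with the core count (real scaling is usually worse).
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    core_seconds_per_video_second = (180 / 20) * 2  # 18 core-seconds
    return video_seconds * core_seconds_per_video_second / cores / 60


if __name__ == "__main__":
    # A 10-minute video on an 8-core machine, under the assumptions above.
    print(f"{estimate_minutes(600, cores=8):.1f} minutes")
```

Treat the result as an order-of-magnitude guide only; resolution, frame rate, and the chosen languages all affect the real running time.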

Installation

Install Tesseract and make sure it is in your $PATH
$ pip install videocr

API

get_subtitles(video_path: str, lang='eng', time_start='0:00', time_end='', conf_threshold=65, sim_threshold=90, use_fullframe=False)

Returns the subtitles as a string in SRT format.
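
Because the return value is one SRT-formatted string, it is often convenient to split it into individual cues before further processing. The following is a minimal sketch that assumes the layout shown in the sample output (an index line, a timestamp line, then one or more text lines, with blank lines between cues). `Cue` and `parse_srt` are hypothetical names and are not provided by videocr.

```python
import re
from typing import List, NamedTuple


class Cue(NamedTuple):
    index: int
    start: str
    end: str
    text: str


def parse_srt(srt: str) -> List[Cue]:
    """Split an SRT string into cues (index, start, end, text)."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2 or "-->" not in lines[1]:
            continue  # skip anything that does not look like a cue
        if not lines[0].strip().isdigit():
            continue  # the first line of a cue must be its index
        start, end = (part.strip() for part in lines[1].split("-->"))
        cues.append(Cue(int(lines[0]), start, end, "\n".join(lines[2:])))
    return cues


sample = """0
00:00:01,042 --> 00:00:02,877
What can I get you?

1
00:00:03,044 --> 00:00:05,463
Um, I'm not sure.
"""

for cue in parse_srt(sample):
    print(cue.index, cue.start, cue.end, cue.text)
```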

save_subtitles_to_file(video_path: str, file_path='subtitle.srt', lang='eng', time_start='0:00', time_end='', conf_threshold=65, sim_threshold=90, use_fullframe=False)

Writes the subtitles to file_path. If the file does not exist, it is created automatically.
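
A small usage sketch for this function, using only the parameters from the signature above with their documented defaults. The wrapper `extract_to_srt`, its checks, and the file names are illustrative assumptions. The call is kept behind an `if __name__ == "__main__":` guard, mirroring the example at the top of this page.

```python
import importlib.util
import os


def extract_to_srt(video_path: str, srt_path: str = "subtitle.srt") -> bool:
    """Write subtitles for video_path to srt_path; return True on success."""
    if importlib.util.find_spec("videocr") is None:
        print("videocr is not installed; run: pip install videocr")
        return False
    if not os.path.isfile(video_path):
        print(f"video not found: {video_path}")
        return False

    import videocr  # imported lazily so the checks above work without it

    videocr.save_subtitles_to_file(
        video_path,
        file_path=srt_path,
        lang="eng",
        time_start="0:00",
        time_end="",
        conf_threshold=65,
        sim_threshold=90,
        use_fullframe=False,
    )
    return True


if __name__ == "__main__":
    extract_to_srt("video.mp4", "video.srt")
```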

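Parameters

lang
The language of the subtitles. You can extract subtitles in almost any language. All Tesseract language codes (e.g. 'eng' for English) and all Tesseract script names (e.g. 'HanS' for Simplified Chinese) are supported.
Note that you can use more than one language at the same time, e.g. lang='hin+eng' for Hindi and English.
Language files are downloaded automatically to your ~/tessdata. You can read more about Tesseract language data files on the Tesseract wiki page.
conf_threshold
Confidence threshold for word predictions. Words with a confidence lower than this value are discarded. The default value of 65 is fine for most cases.
Move it closer to 0 if you get too few words in each line, or closer to 100 if you get too many spurious words in each line.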
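
To make the effect of conf_threshold concrete, here is a minimal sketch of confidence filtering on a single frame. It illustrates the idea described above and is not videocr's actual implementation; the `(word, confidence)` pairs are made-up data and `filter_words` is a hypothetical helper.

```python
from typing import List, Tuple


def filter_words(words: List[Tuple[str, float]], conf_threshold: float = 65) -> str:
    """Keep the words whose confidence is at least conf_threshold."""
    return " ".join(text for text, conf in words if conf >= conf_threshold)


# Made-up OCR output for one frame: (word, confidence on a 0-100 scale).
frame_words = [
    ("Um,", 91.0),
    ("I'm", 88.5),
    ("|", 12.0),      # noise picked up from the picture
    ("not", 95.2),
    ("sure.", 64.0),  # a real word that falls just under the default
]

print(filter_words(frame_words))                     # default threshold
print(filter_words(frame_words, conf_threshold=50))  # more permissive
```

With the default threshold the real word "sure." is lost along with the noise; lowering the threshold to 50 keeps it while still dropping the stray "|".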
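sim_threshold
Similarity threshold for subtitle lines. Subtitle lines with a Levenshtein ratio greater than this threshold are merged together. The default value of 90 is fine for most cases.
Move it closer to 0 if you get too many duplicated subtitle lines, or closer to 100 if distinct lines are being merged and you get too few subtitle lines.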
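
The merging step can be sketched as follows. Note the substitution: this sketch uses `difflib.SequenceMatcher` from the standard library as a stand-in for the Levenshtein ratio, so the scores will not match the library's exactly, and the merge policy (keep the first line of each run) is an assumption made for illustration.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Similarity of two strings on a 0-100 scale (difflib stand-in)."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def merge_similar(lines: List[str], sim_threshold: float = 90) -> List[str]:
    """Collapse runs of consecutive lines whose similarity exceeds the threshold."""
    merged: List[str] = []
    for line in lines:
        if merged and similarity(merged[-1], line) > sim_threshold:
            continue  # treat it as a repeat of the previous subtitle
        merged.append(line)
    return merged


# Text recognized on four consecutive frames (made-up data).
frames = [
    "What can I get you?",
    "What can I get you7",  # same subtitle with one OCR error
    "Um, I'm not sure.",
    "Um, I'm not sure.",
]

print(merge_similar(frames))
```

Raising the threshold makes merging stricter: at 95 the two variants of the first subtitle are no longer similar enough and both are kept.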
time_start and time_end
Extract subtitles from only a segment of the video. The subtitle timestamps are still calculated relative to the full video.
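
The point about timestamps can be illustrated with a short sketch: a frame's timestamp depends on its position in the whole video, not on where the extracted segment begins. The accepted time formats ('M:SS' and 'H:MM:SS'), the frame rate, and both helper functions are assumptions made for this example.

```python
def time_to_seconds(time_str: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' to whole seconds."""
    seconds = 0
    for part in time_str.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def frame_timestamp(frame_index: int, fps: float) -> float:
    """Seconds from the start of the whole video for a given frame."""
    return frame_index / fps


fps = 24.0  # assumed frame rate
first_frame = int(time_to_seconds("1:30") * fps)  # first frame of the segment

# The first subtitle found in the segment is stamped at 90 s, not at 0 s.
print(first_frame, frame_timestamp(first_frame, fps))
```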
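use_fullframe
By default, only the bottom half of each frame is used for OCR. If the subtitles are not in the bottom half of the frame, you can explicitly use the full frame instead.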
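
The cropping behaviour described for use_fullframe can be pictured with a tiny sketch. A frame is modelled here as a plain list of pixel rows to keep the example dependency-free; `ocr_region` is a hypothetical helper and not the library's code.

```python
from typing import List, Sequence


def ocr_region(frame: Sequence[Sequence[int]], use_fullframe: bool = False) -> List[Sequence[int]]:
    """Return the rows of the frame that would be passed to OCR."""
    rows = list(frame)
    if use_fullframe:
        return rows
    return rows[len(rows) // 2:]  # bottom half only


# A toy 4-row "frame"; each row holds two pixel values.
frame = [
    [0, 0],
    [0, 0],
    [9, 9],  # subtitles usually sit in these lower rows
    [9, 9],
]

print(ocr_region(frame))
print(ocr_region(frame, use_fullframe=True))
```

Cropping to the bottom half roughly halves the pixels Tesseract has to examine, which is presumably why it is the default.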


videocr 0.1.5
