A program that extracts metadata with the hachoir library
hachoir-metadata: detailed description of the Python project
hachoir-metadata extracts metadata from multimedia files: music, pictures, video, and archives. It supports the most common file formats:
- Archives: bzip2, gzip, zip, tar
- Audio: MPEG audio (“MP3”), WAV, Sun/NeXT audio, Ogg/Vorbis (OGG), MIDI, AIFF, AIFC, Real audio (RA)
- Image: BMP, CUR, EMF, ICO, GIF, JPEG, PCX, PNG, TGA, TIFF, WMF, XCF
- Misc: Torrent
- Program: EXE
- Video: ASF format (WMV video), AVI, Matroska (MKV), Quicktime (MOV), Ogg/Theora, Real media (RM)
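It tries to give as much information as possible. For some file formats it gives more information than libextractor. For example, the RIFF parser can extract the creation date, the software used to generate the file, and so on. However, hachoir-metadata does not guess information: the most complex operation it performs is computing the duration of a music file from the frame size and the file size.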
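The duration computation mentioned above can be sketched for a constant-bitrate MPEG-1 Layer III ("MP3") stream. Each frame carries 1152 samples and is about 144 × bitrate / sample_rate bytes long, plus an optional padding byte, so the frame count follows from the audio size. This is a simplified illustration, not hachoir-metadata's code. It assumes constant bitrate and ignores ID3 tags and VBR headers. The function names are hypothetical.

```python
SAMPLES_PER_FRAME = 1152  # fixed for MPEG-1 Layer III


def frame_length(bitrate_bps: int, sample_rate: int, padding: int = 0) -> int:
    """Size in bytes of one MPEG-1 Layer III frame (integer division, plus padding)."""
    return 144 * bitrate_bps // sample_rate + padding


def estimate_cbr_duration(audio_bytes: int, bitrate_bps: int, sample_rate: int) -> float:
    """Estimate the duration in seconds of a constant-bitrate MPEG-1 Layer III stream.

    Assumption: every frame uses the same bitrate and sample rate; tags and
    VBR headers are not subtracted from audio_bytes.
    """
    # Padding bytes are spread over the frames, so use the exact (fractional)
    # average frame length instead of the truncated integer size.
    avg_frame_len = 144 * bitrate_bps / sample_rate
    n_frames = audio_bytes / avg_frame_len
    return n_frames * SAMPLES_PER_FRAME / sample_rate
```

Because 1152 / 144 = 8, the estimate reduces to `audio_bytes * 8 / bitrate_bps`. For example, 1,600,000 bytes at 128 kbit/s and 44.1 kHz gives about 100 seconds.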
hachoir-metadata has three modes:
- classic mode: extract metadata; you can use --level=LEVEL to limit the quantity of information to display (but not the quantity extracted)
- --type: show the file format and the most important information on one line
- --mime: just display the file MIME type
The “hachoir-metadata --mime” command works like “file --mime”, and “hachoir-metadata --type” works like “file”. However, the file command currently supports more file formats than hachoir-metadata.
Website: http://bitbucket.org/haypo/hachoir/wiki/hachoir-metadata
Examples
AVI video example (RIFF file format):
$ hachoir-metadata pacte_des_gnous.avi
Common:
- Duration: 4 min 25 sec
- Comment: Has audio/video index (248.9 KB)
- MIME type: video/x-msvideo
- Endian: Little endian
Video stream:
- Image width: 600
- Image height: 480
- Bits/pixel: 24
- Compression: DivX v4 (fourcc:"divx")
- Frame rate: 30.0
Audio stream:
- Channel: stereo
- Sample rate: 22.1 KHz
- Compression: MPEG Layer 3
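The AVI file above is a RIFF container, which is why the output reports "Endian: Little endian". The following sketch shows how the top level of a RIFF file can be read with Python's struct module. It is a minimal illustration and not hachoir's RIFF parser. The helper names are hypothetical, and the sketch does not descend into nested LIST chunks.

```python
import struct


def read_riff_header(data: bytes):
    """Return (form_type, declared_size) for a RIFF file such as AVI or WAV."""
    if len(data) < 12 or data[:4] != b"RIFF":
        raise ValueError("not a RIFF file")
    # RIFF integers are little endian: "<I" is an unsigned 32-bit little-endian value
    (size,) = struct.unpack("<I", data[4:8])
    form_type = data[8:12].decode("ascii")  # e.g. "AVI " or "WAVE"
    return form_type, size


def list_chunks(data: bytes):
    """Yield (chunk_id, chunk_size) for the top-level chunks after the RIFF header."""
    offset = 12
    while offset + 8 <= len(data):
        chunk_id = data[offset:offset + 4].decode("ascii")
        (size,) = struct.unpack("<I", data[offset + 4:offset + 8])
        yield chunk_id, size
        # Chunk data is padded to an even number of bytes
        offset += 8 + size + (size & 1)
```

The declared size in the header counts everything after the first 8 bytes, so for a well-formed file it equals the file size minus 8.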
Modes --mime and --type
The --mime option displays only the file MIME type (it works like the UNIX “file --mime” program):
$ hachoir-metadata --mime logo-Kubuntu.png sheep_on_drugs.mp3 wormux_32x32_16c.ico
logo-Kubuntu.png: image/png
sheep_on_drugs.mp3: audio/mpeg
wormux_32x32_16c.ico: image/x-ico
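A program like “file --mime” can identify many formats from a few signature bytes at the start of the file. The sketch below illustrates that idea for the formats in the example above, plus AVI. It is not how hachoir-metadata is implemented, and the small signature table is an illustrative assumption rather than a complete list. The MIME strings are the ones shown in this document's examples.

```python
from typing import Optional

# Illustrative signature table (not exhaustive): (leading bytes, MIME type)
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"ID3", "audio/mpeg"),              # MP3 starting with an ID3v2 tag
    (b"\x00\x00\x01\x00", "image/x-ico"),  # Windows icon directory header
]


def guess_mime(header: bytes) -> Optional[str]:
    """Guess a MIME type from the first bytes of a file, or return None."""
    for magic, mime in _SIGNATURES:
        if header.startswith(magic):
            return mime
    # Bare MPEG audio frame: 11-bit frame sync, all bits set
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "audio/mpeg"
    # RIFF container: the form type at offset 8 distinguishes AVI from other RIFF files
    if header.startswith(b"RIFF") and header[8:12] == b"AVI ":
        return "video/x-msvideo"
    return None
```

Checking the ID3 signature before the frame-sync test matters, because an MP3 that starts with a tag does not begin with an MPEG frame.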
The --type option displays a short description of the file type (it works like the UNIX “file” program):
$ hachoir-metadata --type logo-Kubuntu.png sheep_on_drugs.mp3 wormux_32x32_16c.ico
logo-Kubuntu.png: PNG picture: 331x90x8 (alpha layer)
sheep_on_drugs.mp3: MPEG v1 layer III, 128.0 Kbit/sec, 44.1 KHz, Joint stereo
wormux_32x32_16c.ico: Microsoft Windows icon: 16x16x32
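A one-line PNG description like the one above can be built from the IHDR chunk, which follows the 8-byte PNG signature. IHDR holds the width, height, bit depth, and color type as big-endian fields. This sketch reports the IHDR bit depth as the third number and flags color types 4 and 6, which carry an alpha channel. Whether hachoir-metadata derives its "331x90x8 (alpha layer)" text exactly this way is an assumption. The function name is hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def describe_png(data: bytes) -> str:
    """Build a short 'PNG picture: WxHxD' description from the IHDR chunk."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # First chunk: 4-byte big-endian length, 4-byte type; IHDR data is 13 bytes long
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing IHDR chunk")
    # IHDR starts with width (4 bytes), height (4), bit depth (1), color type (1)
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    text = f"PNG picture: {width}x{height}x{bit_depth}"
    if color_type in (4, 6):  # grayscale + alpha, RGB + alpha
        text += " (alpha layer)"
    return text
```

Only the first 26 bytes of the file are needed, so a caller can read a small header instead of the whole image.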
Similar projects
- Kaa - http://freevo.sourceforge.net/cgi-bin/freevo-2.0/Kaa (written in Python)
- libextractor: http://gnunet.org/libextractor/ (written in C)
Many other libraries have been written to read and/or write metadata in MP3 music and/or Exif photos.
hachoir-metadata 1.3.3 (2010-07-26)
- Support WebM video (update Matroska extractor)
- Matroska parser extracts audio bits per sample
hachoir-metadata 1.3.2 (2010-02-04)
- Include hachoir_metadata/qt/dialog_ui.py in MANIFEST.in
- setup.py ignores pyuic4 error if dialog_ui.py is present
- setup.py installs hachoir_metadata.qt module
hachoir-metadata 1.3.1 (2010-01-28)
- setup.py compiles dialog.ui to dialog_ui.py and installs hachoir-metadata-qt. Create --disable-qt option to skip hachoir-metadata-qt installation.
- Create a MANIFEST.in file to include extra files like ChangeLog, AUTHORS, gnome and kde subdirectories, test_doc.py, etc.
hachoir-metadata 1.3 (2010-01-20)
- Create hachoir-metadata-qt: a graphical interface (Qt toolkit) to display file metadata
- Create ISO9660 extractor
- Hide Hachoir warnings by default (use --verbose to show them)
- hachoir-metadata program: create --force-parser option to choose the parser
hachoir-metadata 1.2.1 (2008-10-16)
- Using --raw, strings are not normalized (don't strip trailing spaces, newlines, nul bytes, etc.)
- Extract much more information from Microsoft Office documents (.doc, .xls, .pps, etc.)
- Improve OLE2 (Word) extractor
- Fix ASF extractor for hachoir-parser 1.2.1
hachoir-metadata 1.2 (2008-09-03)
- Create --maxlen option for hachoir-metadata program: --maxlen=0 disables the arbitrary string length limit
- Create FLAC metadata extractor
- Create hachoir_metadata.config, especially MAX_STR_LENGTH option (maximum string length)
- GIF image may contain multiple comments
hachoir-metadata 1.1 (2008-04-01)
- More extractors are more stable and fault tolerant
- Create basic Gtk+ GUI: hachoir-metadata-gtk
- Catch error on data conversion
- Read width and height DPI for most image formats
- JPEG (EXIF): read GPS information
- Each data item can have its own “setter”
- Add more ID3 keys (TCOP, TDAT, TRDA, TORY, TIT1)
- Create datetime filter supporting timezone
- Add “meters”, “pixels”, “DPI” suffix for human display
- Create SWF extractor
- RIFF: also read information from header fields, compute audio compression rate
- MOV: read width and height
- ASF: read album artist
hachoir-metadata 1.0.1 (???)
- Only use hachoir_core.profiler with --profiler command line option so ‘profiler’ Python module is now optional
- Set shebang to “#!/usr/bin/python”
hachoir-metadata 1.0 (2007-07-11)
- Real audio: read number of channels, bit rate, sample rate and compute compression rate
- JPEG: Read user comment
- Windows ANI: Read frame rate
- Use Language from hachoir_core to store language from ID3 and MKV
- OLE2 and FLV: Extractors are now fault tolerant