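patent-parsing-tools is a library that provides tools for generating training and test sets from Google's USPTO data, which is useful for testing machine learning algorithms.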
## System requirements:
```bash
sudo yum install python-devel libxslt-devel libxml2-devel
```
## Python requirements:
```bash
pip install -r requirements.txt
```
## Running:
Collect and serialize the data:
```bash
python -m patent_parsing_tools.supervisor [working_directory] [train_destination] [test_destination] [year_from] [year_to]
```
For example:
```bash
python -m patent_parsing_tools.supervisor patents/working_directory patents/train_destination patents/test_destination 2014 2015
```
Generate a dictionary from the training set:
```bash
python -m patent_parsing_tools.bow.dictionary_maker [train_directory] [max_parsed_patents] [dict_max_size] [dictionary_name]
```
For example:
```bash
python -m patent_parsing_tools.bow.dictionary_maker patents/train_destination 1000000000 4096 dictionary.txt
```
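To illustrate what the `max_parsed_patents` and `dict_max_size` arguments control, here is a minimal sketch of frequency-based dictionary building. It assumes a simple lowercase alphabetic tokenizer and a one-word-per-line dictionary file; `tokenize`, `make_dictionary`, and `write_dictionary` are hypothetical names, not functions exported by patent_parsing_tools, and the library's actual tokenization and selection rules may differ.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase alphabetic runs only (hypothetical, not the library's)
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def make_dictionary(documents, max_parsed_patents, dict_max_size):
    """Count words over at most max_parsed_patents documents and keep
    the dict_max_size most frequent ones (hypothetical helper)."""
    counts = Counter()
    for i, text in enumerate(documents):
        if i >= max_parsed_patents:
            break  # stop once the parsing limit is reached
        counts.update(tokenize(text))
    # most_common sorts by descending count; ties keep first-seen order
    return [word for word, _ in counts.most_common(dict_max_size)]


def write_dictionary(words, path):
    """Write one word per line (an assumed format for dictionary.txt)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(words) + "\n")
```

Capping the vocabulary at `dict_max_size` keeps every bag-of-words vector the same fixed length, which is what most learning algorithms expect as input.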
Generate bags of words from the training and test sets:
```bash
python -m patent_parsing_tools.bow.bag_of_words [directory_with_serialized_patents] [destination_directory] [dictionary.txt] [package_size > 1024]
```
For example:
```bash
python -m patent_parsing_tools.bow.bag_of_words patents/train_destination patents/final_dataset_train dictionary.txt 1048576
python -m patent_parsing_tools.bow.bag_of_words patents/test_destination patents/final_dataset_test dictionary.txt 1048576
```
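Conceptually, a bag of words turns each patent's text into a vector of word counts whose columns are the dictionary entries. The sketch below assumes `dictionary.txt` holds one word per line and uses a simple lowercase alphabetic tokenizer. `load_dictionary` and `bag_of_words` are hypothetical names, not the library's API, and the sketch does not reproduce the tool's on-disk output or how `package_size` splits results into packages.

```python
import re

# Assumed tokenizer: lowercase alphabetic runs only (hypothetical, not the library's)
TOKEN_RE = re.compile(r"[a-z]+")


def load_dictionary(lines):
    """Map each dictionary word (one per line, assumed) to its column index."""
    words = [line.strip() for line in lines if line.strip()]
    return {word: idx for idx, word in enumerate(words)}


def bag_of_words(text, index):
    """Return a count vector of length len(index).

    Tokens that are not in the dictionary are ignored, so train and test
    vectors share the same columns as long as they use the same dictionary.
    """
    vector = [0] * len(index)
    for token in TOKEN_RE.findall(text.lower()):
        col = index.get(token)
        if col is not None:
            vector[col] += 1
    return vector
```

For example, with the dictionary `a`, `gear`, `shaft`, the text "A gear meshes with a second gear" maps to `[2, 2, 0]`. Building both sets from the same `dictionary.txt`, as in the commands above, is what keeps their columns aligned.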
## Running tests
```bash
python -m unittest discover .
```