A package for creating and using Zipf distributions
Detailed description of the zipf Python project
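What does it do?
The zipf package simplifies creating Zipf distributions and operating on them: sum, subtraction, multiplication, and division, as well as statistical operations such as mean and variance.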
How do I get it?
Just type into your terminal:

    pip install zipf
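Calculating distances and divergences
I have written another package called dictances, which calculates various distances and divergences between discrete distributions such as Zipf. Here is an example: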

    from zipf import Zipf
    from dictances import *

    # Load two previously saved distributions.
    a = Zipf.load("my_first_zipf.json")
    b = Zipf.load("my_second_zipf.json")

    euclidean(a, b)
    chebyshev(a, b)
    hamming(a, b)
    kullback_leibler(a, b)
    jensen_shannon(a, b)

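For intuition about what two of these measures compute, here is a minimal sketch in plain Python on ordinary dictionaries. It is not the dictances implementation: the names `euclidean_sketch` and `kl_sketch` are hypothetical, and the choice to skip keys missing from the second distribution in the divergence is an assumption made for illustration.

```python
import math


def euclidean_sketch(a, b):
    """Euclidean distance between two dict distributions.

    Keys missing from one side are treated as probability 0.
    """
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def kl_sketch(a, b):
    """Kullback-Leibler divergence of a from b.

    Assumption: keys absent from b (or with zero mass) are skipped,
    because log(p / 0) is undefined.
    """
    return sum(
        p * math.log(p / b[k])
        for k, p in a.items()
        if p > 0 and b.get(k, 0.0) > 0
    )


a = {"one": 0.5, "two": 0.5}
b = {"one": 0.25, "two": 0.25, "three": 0.5}

print(euclidean_sketch(a, b))
print(kl_sketch(a, b))
```

The divergence is not symmetric, so `kl_sketch(a, b)` and `kl_sketch(b, a)` generally differ, while the Euclidean distance gives the same value in both directions.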
Creating a Zipf using a zipf factory
Here are a few examples:
Zipf from a list

    from zipf.factories import ZipfFromList

    my_factory = ZipfFromList()
    my_zipf = my_factory.run(["one", "one", "two", "my", "oh", "my", 1, 2, 3])

    print(my_zipf)

    '''
    {
      "one": 0.22222222222222215,
      "my": 0.22222222222222215,
      "two": 0.11111111111111108,
      "oh": 0.11111111111111108,
      "1": 0.11111111111111108,
      "2": 0.11111111111111108,
      "3": 0.11111111111111108
    }
    '''

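The values in that output are relative frequencies: each element's count divided by the total number of elements. A minimal sketch of that calculation in plain Python, without the zipf package, might look like the following. The helper name `frequencies_from_list` is hypothetical, and converting every element to a string is an assumption based on the string keys shown in the printed output.

```python
from collections import Counter


def frequencies_from_list(items):
    """Map each item (as a string) to its relative frequency, most frequent first."""
    counts = Counter(str(item) for item in items)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.most_common()}


freqs = frequencies_from_list(["one", "one", "two", "my", "oh", "my", 1, 2, 3])
print(freqs)
```

With nine elements, "one" and "my" each get 2/9 and every other element gets 1/9, which agrees with the factory output up to floating-point rounding.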
Zipf from a text

    from zipf.factories import ZipfFromText

    my_factory = ZipfFromText()
    # Keep only words longer than three characters.
    my_factory.set_word_filter(lambda w: len(w) > 3)
    my_zipf = my_factory.run("""You've got to find what you love.
    And that is as true for your work as it is for your lovers.
    Keep looking. Don't settle.""")

    print(my_zipf)

    '''
    {
      "your": 0.16666666666666666,
      "find": 0.08333333333333333,
      "what": 0.08333333333333333,
      "love": 0.08333333333333333,
      "that": 0.08333333333333333,
      "true": 0.08333333333333333,
      "work": 0.08333333333333333,
      "lovers": 0.08333333333333333,
      "Keep": 0.08333333333333333,
      "looking": 0.08333333333333333,
      "settle": 0.08333333333333333
    }
    '''

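To see how the word filter shapes the result, the following plain-Python sketch splits the text into words, applies a filter, and normalizes the counts. It does not use the zipf package. The helper name `frequencies_from_text` is hypothetical, and tokenizing with the regular expression `\w+` is an assumption chosen because it is consistent with the output above, where "You've" and "Don't" do not appear.

```python
import re
from collections import Counter


def frequencies_from_text(text, word_filter=None):
    """Split text into words, optionally filter them, and return relative frequencies."""
    # Assumption: a word is a maximal run of letters, digits, or underscores,
    # so "You've" becomes "You" and "ve".
    words = re.findall(r"\w+", text)
    if word_filter is not None:
        words = [w for w in words if word_filter(w)]
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.most_common()}


text = (
    "You've got to find what you love. "
    "And that is as true for your work as it is for your lovers. "
    "Keep looking. Don't settle."
)
freqs = frequencies_from_text(text, lambda w: len(w) > 3)
print(freqs)
```

Twelve words pass the filter and "your" occurs twice, so it gets 2/12 and each of the other ten words gets 1/12.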
Zipf from k-sequences

    from zipf.factories import ZipfFromKSequence

    sequence_fraction_len = 5
    my_factory = ZipfFromKSequence(sequence_fraction_len)
    my_zipf = my_factory.run(
        "ACTGGAAATGATGGDTGATDGATGAGTDGATGGGGGAAAGDTGATDGATDGATGDTGGGGADDDGATAGDTAGTDGAGAGAGDTGATDGAAAGDTG")

    print(my_zipf)

    '''
    {
      "TGGGG": 0.1,
      "ACTGG": 0.05,
      "AAATG": 0.05,
      "ATGGD": 0.05,
      "TGATD": 0.05,
      "GATGA": 0.05,
      "GTDGA": 0.05,
      "GAAAG": 0.05,
      "DTGAT": 0.05,
      "DGATD": 0.05,
      "GATGD": 0.05,
      "ADDDG": 0.05,
      "ATAGD": 0.05,
      "TAGTD": 0.05,
      "GAGAG": 0.05,
      "AGDTG": 0.05,
      "ATDGA": 0.05,
      "AAGDT": 0.05,
      "G": 0.05
    }
    '''

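The keys in that output correspond to consecutive, non-overlapping chunks of five characters, with a shorter final chunk ("G") for the leftover character. A minimal sketch of that splitting in plain Python follows. The helper name `frequencies_from_k_sequences` is hypothetical, and the chunking rule is inferred from the output rather than taken from the package source.

```python
from collections import Counter


def frequencies_from_k_sequences(sequence, k):
    """Cut sequence into non-overlapping chunks of length k and return their relative frequencies."""
    # The last chunk may be shorter than k when len(sequence) is not a multiple of k.
    chunks = [sequence[i:i + k] for i in range(0, len(sequence), k)]
    counts = Counter(chunks)
    total = len(chunks)
    return {chunk: count / total for chunk, count in counts.most_common()}


# A short 16-character sequence: "ACTGG", "ACTGG", "AAATG", "G".
freqs = frequencies_from_k_sequences("ACTGGACTGGAAATGG", 5)
print(freqs)
```

Here "ACTGG" appears in two of the four chunks, so it gets 0.5, while "AAATG" and the trailing "G" each get 0.25.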
Zipf from a text file

    from zipf.factories import ZipfFromFile

    my_factory = ZipfFromFile()
    # Drop the word "brown".
    my_factory.set_word_filter(lambda w: w != "brown")
    my_zipf = my_factory.run()

    print(my_zipf)

    '''
    {
      "The": 0.125,
      "quick": 0.125,
      "fox": 0.125,
      "jumps": 0.125,
      "over": 0.125,
      "the": 0.125,
      "lazy": 0.125,
      "dog": 0.125
    }
    '''

Zipf from a web page

    from zipf.factories import ZipfFromUrl
    import json

    my_factory = ZipfFromUrl()
    # Keep only numeric tokens greater than 100.
    my_factory.set_word_filter(lambda w: int(w) > 100)
    # Extract the "ip" field from the JSON response before building the Zipf.
    my_factory.set_interface(lambda r: json.loads(r.text)["ip"])
    my_zipf = my_factory.run("https://api.ipify.org/?format=json")

    print(my_zipf)

    '''
    {
      "134": 0.5,
      "165": 0.5
    }
    '''

Zipf from a directory

    from zipf.factories import ZipfFromDir
    import json

    my_factory = ZipfFromDir(use_cli=True)
    # Keep only words longer than four characters.
    my_factory.set_word_filter(lambda w: len(w) > 4)
    my_zipf = my_factory.run("path/to/my/directory", ["txt"])

    '''
    My directory contains 2 files with the following texts:

    - You must not lose faith in humanity. Humanity is an ocean; if a few drops of the ocean are dirty, the ocean does not become dirty.
    - Try not to become a man of success, but rather try to become a man of value.
    '''

    print(my_zipf)

    '''
    {
      "ocean": 0.20000000000000004,
      "become": 0.20000000000000004,
      "dirty": 0.13333333333333336,
      "faith": 0.06666666666666668,
      "humanity": 0.06666666666666668,
      "Humanity": 0.06666666666666668,
      "drops": 0.06666666666666668,
      "success": 0.06666666666666668,
      "rather": 0.06666666666666668,
      "value": 0.06666666666666668
    }
    '''

Options for creating a Zipf
Some built-in options are available. You can read the options of any factory object by printing it:

    from zipf.factories import ZipfFromList

    print(ZipfFromList())

    '''
    {
      "remove_stop_words": false,   # Removes stop words (currently only Italian's)
      "minimum_count": 0,           # Removes words that appear less than 'minimum_count'
      "chain_min_len": 1,           # Chains up words, starting by a min of 'chain_min_len'
      "chain_max_len": 1,           # and ending to a maximum of 'chain_max_len'
      "chaining_character": " ",    # The character to interpose between words
      "chain_after_filter": false,  # The chaining is done after filtering
      "chain_after_clean": false    # The chaining is done after cleaning
    }
    '''

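The chaining options are described only briefly in those comments. One plausible reading, sketched below in plain Python, is that consecutive words are joined into groups of every length from `chain_min_len` to `chain_max_len`, separated by `chaining_character`. This interpretation is an assumption and has not been checked against the package source; the helper name `chain_words` and its parameter names are hypothetical.

```python
def chain_words(words, min_len=1, max_len=1, joiner=" "):
    """Join consecutive words into chains of every length from min_len to max_len.

    Assumed reading of the chain_min_len, chain_max_len, and
    chaining_character options; not the package's own implementation.
    """
    chains = []
    for n in range(min_len, max_len + 1):
        # Slide a window of n words across the list.
        for start in range(len(words) - n + 1):
            chains.append(joiner.join(words[start:start + n]))
    return chains


print(chain_words(["the", "quick", "fox"], min_len=1, max_len=2))
# ['the', 'quick', 'fox', 'the quick', 'quick fox']
```

With the defaults of 1 and 1 the list comes back unchanged, which matches the default option values shown in the printed configuration.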
License
This library is released under the MIT license.