Generate realistic raw datasets with optional data quality (DQ) issues
Detailed description of the rawdata Python project
To install, run:
    pip install rawdata
Basic usage
Create a random table:
    import rawdata.generate
    colLabel = ['Year', 'Name', 'Born', 'Details', 'Amount']
    colTypes = ['DATE', 'PEOPLE', 'PLACE', 'WORD', 'CURRENCY']
    tbl = rawdata.generate.TableGenerator(3, colTypes, colLabel)
    print(tbl)
    > Year,name,Age,Born,Details,Amount
    > 2013,Douglas,34,Scandinavia,Bowling Ball,$34.95
    > 1999,Hunter,65,Sierra Leone,Fish,12.00
    > 2005,Shubha,18,Madagascar,screenplay,-$231.00
Add errors to the table:
    import rawdata.errors
    t = rawdata.errors.TableWithErrors(tbl, 'BAD_STRING')
    t.add_errors(3)
    print(t.tbl)
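After adding 3 random errors, the name Douglas has extra whitespace, the Born column for Douglas contains the bad string, and Hunter is missing a value in the Born column: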
    Year  Name     Born
    ------------------------
    2013  Douglas  BAD_STRING
    1999  Hunter
    2005  Shubha   Madagascar
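To confirm that injected errors are actually present, it can help to scan the output for the three kinds of problems described above. The following is a minimal standard-library sketch, not part of rawdata: the function name `find_dq_issues`, its `bad_marker` parameter, and the CSV sample (including the two trailing spaces after Douglas) are assumptions made for illustration.

```python
import csv
import io


def find_dq_issues(csv_text, bad_marker='BAD_STRING'):
    """Return (row_number, column, issue) tuples found in a CSV string.

    Hypothetical helper for illustration; it is not provided by rawdata.
    """
    issues = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        for column, value in row.items():
            if column is None:
                # DictReader stores surplus fields under the key None.
                issues.append((row_number, None, 'extra fields'))
            elif value is None or value == '':
                issues.append((row_number, column, 'missing'))
            elif value != value.strip():
                issues.append((row_number, column, 'whitespace'))
            elif value == bad_marker:
                issues.append((row_number, column, 'bad string'))
    return issues


# Assumed sample that mirrors the table with errors shown above.
sample = (
    "Year,Name,Born\n"
    "2013,Douglas  ,BAD_STRING\n"
    "1999,Hunter,\n"
    "2005,Shubha,Madagascar\n"
)

for issue in find_dq_issues(sample):
    print(issue)
```

Each tuple names the data row (counted from 1, excluding the header), the column, and the kind of issue, so the count of tuples can be compared with the number passed to add_errors().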
You can also generate a column from a custom list:
    import rawdata.generate
    custom_list = ['Carved Statue', '1984 Volvo', '2 metre Ball of string']
    tbl = rawdata.generate.TableGenerator(5, ['PEOPLE', 'INT', custom_list], ['Name', 'Age', 'Fav Possession'])
    print(tbl)
    > Name,Age,Fav Possession
    > Inez,58,Carved Statue
    > Zane,50,2 metre Ball of string
    > Jered,49,1984 Volvo
    > Tameron,55,2 metre Ball of string
    > Wyatt,68,Carved Statue
Other functions
    import rawdata.generate
    n = rawdata.generate.NumberGenerator
    s = rawdata.generate.StringGenerator
    print('Random Number = ', n.random_int(1,100))
    > Random Number = 84
    print('Random Letters = ', s.random_letters(40))
    > Random Letters = T1CElkRAGPAmWSavbDItDbFmQIvUh26SyJE58x49
    print('Random Password = ', s.generate_password())
    > Random Password = peujlsmbf19966YKCX
    words = rawdata.generate.get_list_words()
    print(len(words), ' words : ', words[500:502])
    > 10739 words : ['architeuthis', 'arcsine']
    places = rawdata.generate.get_list_places()
    print(len(places), ' places : ', places[58:60])
    > 262 places : ['Brazil', 'British Virgin Islands']
List of column types (table generator)
    'INT' - returns a number
    'CURRENCY' - returns a currency that may have strings $ / pounds
    'STRING' - returns a random string
    'WORD' - returns a word from nouns.csv
    'DATE' - returns a date
    'YEAR' - returns a year. Both year and date can have ranges set via set_range()
    'PLACE' - returns a location from country.csv
    'PEOPLE' - returns a name from names.csv
    [list] - pass any list to return a random choice from it (e.g. my_colours = ['Blue', 'Green', 'Orange'])
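The way a column type selects a generator, including the case where a plain list stands in for a type name, can be pictured with a small dispatch function. This is only a sketch of the idea using the standard library, not rawdata's implementation: `make_value`, `make_table`, the numeric ranges, and the `seed` parameter are all hypothetical, and only a subset of the types above is covered.

```python
import random


def make_value(col_type, rng):
    """Return one random value for a column type (illustrative subset only)."""
    if isinstance(col_type, list):
        # A custom list: pick one of its items at random.
        return rng.choice(col_type)
    if col_type == 'INT':
        return rng.randint(0, 100)          # assumed range
    if col_type == 'YEAR':
        return rng.randint(1990, 2020)      # assumed range
    if col_type == 'CURRENCY':
        amount = rng.uniform(-500, 500)     # assumed range
        sign = '-' if amount < 0 else ''
        return f"{sign}${abs(amount):.2f}"
    raise ValueError(f"unsupported column type: {col_type!r}")


def make_table(num_rows, col_types, col_labels, seed=None):
    """Build a header row followed by num_rows rows of random values."""
    rng = random.Random(seed)  # a seed makes the table reproducible
    rows = [list(col_labels)]
    for _ in range(num_rows):
        rows.append([make_value(col_type, rng) for col_type in col_types])
    return rows


my_colours = ['Blue', 'Green', 'Orange']
table = make_table(3, ['INT', my_colours, 'CURRENCY'], ['Age', 'Colour', 'Amount'], seed=42)
for row in table:
    print(','.join(str(cell) for cell in row))
```

Checking for a list before comparing type names keeps custom lists and built-in types on the same code path, which is why a list can be mixed freely with strings such as 'INT' in the column type argument.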