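How do I search for a string in a JSON file and return the most similar occurrence?
I came across a piece of Python code that generates a file containing vector representations (embeddings) of strings.
The file was generated with the model "all-MiniLM-L6-v2" and looks like this: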
[
{
"codigo":1,
"descricao":"Alain Prost",
"embedding":[
-0.04376700147986412,
0.08378474414348602,
-0.044959407299757004,
-0.06955558061599731,
-0.0011182611342519522,
0.10521695017814636,
0.11189017444849014,
0.1651790291070938,
0.07515741139650345,
0.05490146577358246,
0.02417689561843872,
-0.016437038779258728,
0.010290289297699928,
0.017122231423854828,
-0.05169348418712616,
-0.016834666952490807,
-0.01511311624199152,
0.007502275053411722,
0.03960637003183365,
0.013815234415233135,
-0.05070938542485237,
-0.056177735328674316,
0.015933101996779442,
-0.007893730886280537,
1.4036894754099194e-05,
-0.01063060574233532,
0.05427253618836403,
0.016765154898166656,
0.04841822385787964,
-0.02379232831299305,
0.025293899700045586,
-0.06888816505670547,
-0.03624174743890762,
-0.040663089603185654,
-0.004510633181780577,
-0.03612743690609932,
-0.08588571101427078,
-0.03383230045437813,
-0.03971630707383156,
0.0925847589969635,
0.06980527937412262,
0.011318318545818329,
-0.14096367359161377,
0.029876230284571648,
-0.01633320190012455,
-0.010737375356256962,
0.04669718071818352,
-0.014320306479930878,
-0.05380765348672867,
-0.01826721429824829,
-0.0775720626115799,
0.007413752842694521,
0.010430709458887577,
-0.07329824566841125,
-0.038187265396118164,
-0.02384389564394951,
0.07746574282646179,
0.02492334321141243,
0.002449194435030222,
-0.05240411311388016,
0.020897606387734413,
-0.01624673791229725,
-0.06399786472320557,
-0.03406109660863876,
0.05889088287949562,
0.045756977051496506,
-0.08131976425647736,
0.0538562573492527,
-0.06892945617437363,
0.04350525140762329,
-0.05869260057806969,
0.024457629770040512,
0.0017231887904927135,
0.041741617023944855,
0.06515597552061081,
-0.08843974024057388,
-0.036975421011447906,
-0.04383429139852524,
-0.04289741814136505,
-0.03480835258960724,
0.04213075712323189,
-0.0947691947221756,
-0.10631424933671951,
-0.05164273455739021,
0.0527079738676548,
-0.0026282896287739277,
0.11123877763748169,
-0.010186375118792057,
0.004350247327238321,
-0.09234373271465302,
0.00022207570145837963,
-0.036559659987688065,
-0.05228490009903908,
0.03234873339533806,
-0.005511161405593157,
0.04750655218958855,
-0.08976765722036362,
-0.005845387000590563,
-0.02803802862763405,
0.14588715136051178,
-0.0012976604048162699,
0.04080767557024956,
0.04338463768362999,
0.015407223254442215,
-0.08320754021406174,
0.037945766001939774,
-0.017297346144914627,
0.024563206359744072,
0.04263288155198097,
0.025433938950300217,
-0.03403696045279503,
-0.05286381393671036,
-0.01756090484559536,
-0.002016932936385274,
0.0027279567439109087,
0.047004375606775284,
-0.04959726706147194,
-0.015475046820938587,
0.0725177600979805,
-0.04801830276846886,
0.048273105174303055,
-0.029613768681883812,
-0.05410566180944443,
0.05482526868581772,
0.0076617104932665825,
0.073040671646595,
-0.03162190690636635,
-8.039190253239277e-34,
-0.013159706257283688,
-0.016090840101242065,
0.07397063821554184,
0.07282368093729019,
-0.005004068370908499,
0.0062707713805139065,
-0.05940960720181465,
-0.07829747349023819,
-0.017122328281402588,
-0.07634077966213226,
-0.02839534729719162,
-0.07541434466838837,
0.011743525043129921,
-0.026070842519402504,
0.021514642983675003,
0.03044724091887474,
0.037806976586580276,
0.03549019619822502,
0.013167202472686768,
-0.018708810210227966,
0.007411877159029245,
0.04208431392908096,
-0.0017672213725745678,
0.016767306253314018,
0.042273279279470444,
0.00972240325063467,
0.09876655787229538,
-0.013753202743828297,
-0.039335619658231735,
-0.030701594427227974,
-0.006173287518322468,
0.025760365650057793,
-0.04054010286927223,
0.056439004838466644,
0.023311946541070938,
-0.022928737103939056,
-0.007852778770029545,
-0.04520851746201515,
0.045798882842063904,
0.008332950063049793,
0.005317758768796921,
-0.021758222952485085,
0.08777586370706558,
-0.001095705316402018,
0.008322017267346382,
-0.047873519361019135,
0.023781653493642807,
0.05791536718606949,
0.1103583350777626,
-0.03695837780833244,
0.03424883633852005,
-0.0043442994356155396,
-0.045328013598918915,
-0.006399083416908979,
-0.0022741626016795635,
0.026356521993875504,
-0.06595919281244278,
0.01489550806581974,
-0.00993384514003992,
-0.004256079904735088,
0.05318630486726761,
0.03500215709209442,
-0.030282488092780113,
0.06818058341741562,
-0.03611261025071144,
-0.00042665813816711307,
-0.03958318755030632,
0.054165199398994446,
0.03490123152732849,
-0.027355331927537918,
-0.1218971237540245,
0.059496473520994186,
0.11048189550638199,
-0.044817615300416946,
-0.045876920223236084,
0.05318649485707283,
-0.019234681501984596,
0.025589890778064728,
-0.09075476229190826,
0.006619459483772516,
-0.07048900425434113,
0.002478431211784482,
0.014732835814356804,
0.015378294512629509,
-0.010561746545135975,
-0.044879332184791565,
-0.0440324991941452,
0.000804506300482899,
0.04663644731044769,
0.12025374174118042,
0.02576148509979248,
-0.006950514391064644,
-0.008816791698336601,
0.01322726346552372,
-0.10207735002040863,
6.758107581531859e-35,
-0.04895230382680893,
0.00044889742275699973,
0.06258796155452728,
0.05086054280400276,
0.10057681798934937,
-0.03941198065876961,
0.021326975896954536,
0.08152614533901215,
-0.0004993032780475914,
0.019457058981060982,
0.09902072697877884,
-0.06066109240055084,
0.10520972311496735,
-0.1180957779288292,
-0.04043348878622055,
0.13587746024131775,
-0.011231197975575924,
0.005684691481292248,
-0.05967259034514427,
-0.08215924352407455,
0.024332145228981972,
0.024530921131372452,
0.031302567571401596,
-0.04070316627621651,
-0.12310207635164261,
0.03254634514451027,
0.11270913481712341,
0.060394853353500366,
-0.08383730798959732,
-0.01133598294109106,
-0.03808245062828064,
-0.023190151900053024,
-0.06691887974739075,
0.013513924553990364,
-0.05324095860123634,
0.09535984694957733,
-0.021769806742668152,
0.06808806955814362,
-0.0018341721734032035,
0.08443459868431091,
-0.04012518748641014,
-0.009696738794445992,
0.037875086069107056,
-0.026477433741092682,
0.07446243613958359,
-0.06514057517051697,
0.015685996040701866,
-0.06705299019813538,
0.024632146582007408,
-0.014661968685686588,
-0.018442410975694656,
0.05574002489447594,
-0.02014113776385784,
-0.047132350504398346,
0.0496378056704998,
0.0052811079658567905,
-0.03336593508720398,
-0.002416495466604829,
0.008500812575221062,
0.07484209537506104,
0.07398315519094467,
0.056250426918268204,
0.03129546344280243,
0.0264076329767704,
0.030829958617687225,
-0.06896060705184937,
-0.11525331437587738,
-0.02287617139518261,
0.014295394532382488,
0.06505643576383591,
0.08990739285945892,
0.05023878812789917,
-0.1306740790605545,
0.005228940863162279,
-0.02513446845114231,
0.09248469024896622,
-0.04951559379696846,
0.07476413995027542,
-0.02717839926481247,
0.008030343800783157,
-0.03858125954866409,
-0.09855242073535919,
-0.04341096431016922,
0.01543387770652771,
-0.024819210171699524,
0.036512166261672974,
-0.03962823003530502,
-0.09858094900846481,
0.0702538713812828,
-0.04758270084857941,
-0.0056264870800077915,
-0.025418918579816818,
0.04300766438245773,
-0.05326545983552933,
0.02151181921362877,
-1.2410082739222617e-08,
-0.022358816117048264,
0.015648063272237778,
-0.0415060892701149,
-0.00010502521035959944,
-0.0314381904900074,
-0.06952173262834549,
0.030622998252511024,
-0.09376975148916245,
-0.04358035698533058,
0.004702138714492321,
-0.04107971489429474,
-0.015522287227213383,
0.04647141695022583,
-0.03630853071808815,
0.07640153914690018,
0.015367956832051277,
0.0003513091360218823,
0.07410185784101486,
-0.024652114138007164,
0.04225892946124077,
0.005745219066739082,
0.03425384312868118,
-0.017282333225011826,
-0.028105905279517174,
-0.019109562039375305,
-0.022345177829265594,
0.04238805174827576,
0.01908213645219803,
0.004253830295056105,
-0.004323870409280062,
-0.00828507263213396,
0.04277166351675987,
0.01263809110969305,
-0.08606499433517456,
0.06635372340679169,
0.09709060937166214,
0.03835307061672211,
0.05318101495504379,
-0.0021448535844683647,
0.0766974613070488,
0.024480514228343964,
-0.03913270682096481,
0.004100404679775238,
0.029588110744953156,
0.006501220166683197,
0.03766942396759987,
0.0055293552577495575,
-0.05407750979065895,
0.003028532490134239,
-0.004140743054449558,
-0.0023235157132148743,
0.05007375031709671,
-0.01090778037905693,
0.012557691894471645,
0.018586203455924988,
0.053417790681123734,
-0.03843330964446068,
0.003068356541916728,
-0.07908729463815689,
-0.01524473074823618,
0.04108268767595291,
-0.02860739268362522,
0.06565400958061218,
0.023170659318566322
]
},
{
"codigo":2,
"descricao":"Ayrton Senna",
"embedding":[
-0.11275111883878708,
-0.04252505674958229,
-0.009049834683537483,
0.011212156154215336,
-0.047949858009815216,
0.030582023784518242,
0.13628773391246796,
-0.008150441572070122,
-0.0001293766836170107,
0.03802379593253136,
0.072489432990551,
-0.08784235268831253,
-0.0781305655837059,
0.06677593290805817,
-0.06298733502626419,
0.087885282933712,
-0.053338438272476196,
-0.013437110930681229,
0.02285934053361416,
-0.03463083133101463,
-0.1208895593881607,
0.035654135048389435,
-0.0034052329137921333,
0.02075120247900486,
0.01327497884631157,
-0.032590851187705994,
0.004454594571143389,
0.05418514460325241,
-0.06094468757510185,
-0.05599478632211685,
-0.004106787499040365,
-0.07678581774234772,
0.04340159147977829,
0.017842937260866165,
0.02949387952685356,
-0.007257427088916302,
-0.0644332766532898,
0.012047283351421356,
0.014177532866597176,
0.015570977702736855,
0.007476386614143848,
-0.01021003257483244,
-0.024430135264992714,
0.01893731951713562,
-0.03585066273808479,
-0.040841732174158096,
0.02237538993358612,
-0.06412603706121445,
0.03432679921388626,
0.0031201448291540146,
-0.026181157678365707,
-0.04635085165500641,
-0.059544868767261505,
-0.005927531514316797,
-0.0033280153293162584,
0.021542759612202644,
-0.01260500680655241,
0.033978041261434555,
-0.03178206831216812,
-0.025371814146637917,
0.07174889743328094,
-0.0024521711748093367,
-0.09167266637086868,
-0.046929117292165756,
0.022732241079211235,
0.02222401276230812,
-0.024650216102600098,
-0.04264489933848381,
0.024509301409125328,
-0.026767950505018234,
0.09544091671705246,
-0.06721024960279465,
0.018102342262864113,
-0.018531465902924538,
-0.02721196413040161,
0.005214688368141651,
0.03094632364809513,
-0.08467657119035721,
0.006663993466645479,
0.06828898191452026,
-0.009517649188637733,
-0.08511777967214584,
-0.03374364972114563,
-0.027803972363471985,
0.023442445322871208,
-0.0266878679394722,
0.006919735576957464,
0.010021806694567204,
-0.036597177386283875,
-0.00617715111002326,
0.014031169936060905,
0.0701993927359581,
-0.0393521748483181,
-0.007316326256841421,
0.014301341958343983,
0.02702433057129383,
0.03956086188554764,
0.060301244258880615,
-0.055976178497076035,
0.1338510662317276,
0.001156043028458953,
0.041097491979599,
-0.14731338620185852,
-0.0029199898708611727,
-0.00013599869271274656,
-0.0736226737499237,
0.03325321152806282,
-0.14085189998149872,
0.03928329795598984,
-0.011393381282687187,
0.008337186649441719,
0.022270601242780685,
-0.06819078326225281,
0.010874142870306969,
-0.049424681812524796,
0.019682565703988075,
-0.010403553955256939,
0.09375917166471481,
0.02362806536257267,
0.07171869277954102,
0.020774055272340775,
0.042299773544073105,
-0.06543327867984772,
0.11427047103643417,
0.05618273466825485,
-0.03619793802499771,
-0.07144389301538467,
5.301082792865114e-34,
0.014501710422337055,
-0.03433850780129433,
0.008394746109843254,
0.07597401738166809,
0.10349489003419876,
0.015405677258968353,
-0.032848604023456573,
-0.06884612143039703,
-0.046885162591934204,
-0.09671584516763687,
-0.011314226314425468,
-0.01856561005115509,
-0.06512365490198135,
-0.07238120585680008,
-0.02506783977150917,
-0.009671981446444988,
-0.0677078366279602,
-0.05653739720582962,
-0.06995690613985062,
-0.008146820589900017,
-0.01214279793202877,
0.059145353734493256,
-0.00256781792268157,
0.08436328917741776,
-0.0045662252232432365,
-0.07445189356803894,
0.01798633486032486,
0.060066550970077515,
0.017383728176355362,
0.04766349866986275,
-0.015692079439759254,
-0.04757498577237129,
-0.02762548439204693,
0.047303322702646255,
0.07723086327314377,
-0.07400372624397278,
0.011420260183513165,
-0.04891768470406532,
-0.016991885378956795,
0.026902154088020325,
-0.04760833457112312,
0.018312858417630196,
-0.02989778108894825,
0.0897020772099495,
-0.04281701147556305,
0.013710093684494495,
0.0396006740629673,
0.06410706043243408,
0.08556067198514938,
-0.04379606246948242,
-0.07834725081920624,
-0.06623218953609467,
-0.030430499464273453,
-0.005324682220816612,
-0.034603726118803024,
-0.062134772539138794,
0.008219441398978233,
0.04189149662852287,
0.10299007594585419,
0.021307796239852905,
0.0607219822704792,
-0.04500466585159302,
-0.0028528186958283186,
-0.06410374492406845,
-0.0048947567120194435,
0.028550991788506508,
-0.021970335394144058,
-0.006687256507575512,
0.09578950703144073,
-0.08069927245378494,
0.002758170710876584,
-0.026523113250732422,
0.08033037930727005,
0.013537789694964886,
-0.03719128668308258,
0.05603921413421631,
0.020577840507030487,
0.02021518349647522,
-0.10423598438501358,
-0.059956539422273636,
-0.0928533598780632,
-0.019149193540215492,
0.008638947270810604,
0.07607108354568481,
0.023537373170256615,
-0.03286019340157509,
-0.029357632622122765,
-0.06599190086126328,
0.08896324038505554,
-0.011197819374501705,
0.019649725407361984,
0.0985945537686348,
0.006205311976373196,
-0.13322098553180695,
-0.015043631196022034,
-1.1596729315441888e-34,
-0.02202794700860977,
0.022142373025417328,
-0.0908736065030098,
0.06232170760631561,
0.02226484753191471,
-0.03699196130037308,
0.025422628968954086,
0.03936171904206276,
0.051816947758197784,
0.01941952295601368,
0.04169097915291786,
-0.0668347030878067,
0.028993966057896614,
-0.04779044911265373,
0.016057901084423065,
0.11099212616682053,
0.13915076851844788,
0.04464653879404068,
0.01808364875614643,
0.0003248233115300536,
-0.027428222820162773,
0.03427209332585335,
-0.11964283138513565,
0.020802685990929604,
-0.024637149646878242,
0.04913446679711342,
-0.03343263268470764,
0.0007999022491276264,
-0.0363985113799572,
0.015618329867720604,
-0.03916076198220253,
-0.027130674570798874,
0.030908452346920967,
0.00839168019592762,
-0.019726410508155823,
0.06671995669603348,
0.06294506788253784,
-0.00662987632676959,
-0.048772092908620834,
0.10865209251642227,
0.077969029545784,
-0.03438835218548775,
-0.016370991244912148,
0.08795364946126938,
-0.007750320713967085,
-0.09498050808906555,
-0.07556591928005219,
0.10646194964647293,
-0.0030609527602791786,
-0.012251066043972969,
0.05219857394695282,
-0.03321979194879532,
0.057967476546764374,
-0.10663087666034698,
0.032691169530153275,
-0.009770980104804039,
0.047311775386333466,
-0.02411728724837303,
0.05368872731924057,
0.06182878091931343,
0.07617446780204773,
-0.05318167805671692,
-0.033945482224226,
0.03228505700826645,
-0.007170077878981829,
0.05959790572524071,
-0.056909944862127304,
-0.02985152043402195,
0.006446316838264465,
0.03801654651761055,
0.012191289104521275,
0.029834797605872154,
-0.006095391698181629,
-0.029733596369624138,
-0.09887736290693283,
0.009565076790750027,
0.04332743212580681,
0.042507629841566086,
0.06287199258804321,
-0.01998593844473362,
-0.03811212256550789,
-0.014080194756388664,
0.039666227996349335,
0.03266460821032524,
0.07517889142036438,
0.04624589905142784,
0.05244888737797737,
0.019929179921746254,
0.02101832628250122,
0.007519490085542202,
0.06198029965162277,
0.023592155426740646,
0.04938758164644241,
0.027339544147253036,
-0.01008431427180767,
-1.4285190808038806e-08,
0.030376587063074112,
-0.02963241934776306,
-0.035167571157217026,
0.02413598634302616,
0.0570375956594944,
0.007684706710278988,
0.12187618762254715,
-0.007570839952677488,
0.029319867491722107,
0.06720910966396332,
0.024405328556895256,
-0.011419138871133327,
0.03922741860151291,
0.024336550384759903,
0.04098387807607651,
0.03207016363739967,
-0.008450492285192013,
0.1041002869606018,
-0.03652212396264076,
0.010552185587584972,
-0.049762122333049774,
0.06643325090408325,
-0.04128921404480934,
-0.05123789608478546,
-0.029389763250947,
0.0248995590955019,
-0.04405771195888519,
0.1402818262577057,
0.014684601686894894,
-0.009909572079777718,
0.010877342894673347,
0.005315002519637346,
0.00048737594624981284,
-0.04477892816066742,
-0.06588546186685562,
0.005400381051003933,
-0.02504221349954605,
-0.010384864173829556,
-0.02279285155236721,
0.006243698764592409,
-0.059665076434612274,
0.024622157216072083,
0.08627490699291229,
0.044212888926267624,
-0.02827167697250843,
-0.019425155594944954,
-0.022057976573705673,
-0.03141951560974121,
0.043426185846328735,
0.018655214458703995,
0.07349660992622375,
0.028337983414530754,
0.018872670829296112,
0.07257463783025742,
0.003528063651174307,
-0.010571202263236046,
-0.01876663975417614,
0.02528848499059677,
-0.13014712929725647,
-0.061667099595069885,
0.013025691732764244,
0.00994929950684309,
-0.007341751828789711,
-0.06776775419712067
]
}
]
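I would like ChatGPT to understand this content and generate C# code for me that takes a string as a parameter and uses these vectors to find similar strings.
For example:
var drive = searchInEmbeddings("sena");
which would then return "Ayrton Senna".
Has anyone here done something similar who could help me?
The Python code I mentioned is: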
import json

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the precomputed embeddings (one record per description).
with open('embeddings.json', 'r') as f:
    data = json.load(f)

# The query has to be encoded with the same model that produced the file.
model = SentenceTransformer('all-MiniLM-L6-v2')


def find_most_similar(descriptions, query, top_n=3):
    query_embedding = model.encode([query])
    description_embeddings = np.array([desc['embedding'] for desc in descriptions])
    similarities = cosine_similarity(query_embedding, description_embeddings)[0]
    # Indices of the top_n highest similarities, best match first.
    top_indices = np.argsort(similarities)[-top_n:][::-1]
    return [{"codigo": descriptions[i]['codigo'], "descricao": descriptions[i]['descricao']} for i in top_indices]


query = "sena"
top_matches = find_most_similar(data, query, top_n=3)
print("Question:", query, "\nAnswers: \nNearest occurrences:\n", top_matches)
It returns:
$ python search.py
Question: sena
Answers:
Nearest occurrences:
[{'codigo': 3, 'descricao': 'Ayrton Senna'}, {'codigo': 31, 'descricao': 'Niki Lauda'}, {'codigo': 21, 'descricao': 'Kimi Räikkönen'}]
Thanks, everyone, for your help.
1 Answer
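If your question is "How can I get ChatGPT to write the C# code for me...", you could ask it this:
Please write a small function in C# that takes a mapping from strings to word vectors and a string 'a', and returns the string nearest to 'a'.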
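To make concrete what that prompt is asking for, here is a minimal sketch. It is written in Python (the language of the question's script) rather than C#, so treat it as a description of the logic to port, not as the code ChatGPT would return. The names `cosine_similarity` and `nearest_description` are hypothetical. The sketch also assumes the query has already been turned into a vector by the same model that produced the stored embeddings, because a raw string such as "sena" cannot be compared with a vector directly.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def nearest_description(embeddings, query_vector):
    """Return the key of `embeddings` whose vector is most similar to `query_vector`.

    `embeddings` maps a string to its vector and is assumed to be non-empty.
    """
    return max(
        embeddings,
        key=lambda name: cosine_similarity(embeddings[name], query_vector),
    )


# Toy 3-dimensional vectors, invented for illustration only (not model output).
toy = {
    "Alain Prost": [1.0, 0.0, 0.0],
    "Ayrton Senna": [0.0, 1.0, 0.0],
}
print(nearest_description(toy, [0.1, 0.9, 0.0]))
```

The only arithmetic a C# port needs is a dot product and two norms. The harder part is producing the query vector, which still requires running the "all-MiniLM-L6-v2" model on the input string.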
Then you could follow up with:
OK, please also write a C# function that reads the string-to-word-vector mapping from an input file.
For the first question, the code ChatGPT 3.5 generates may not be entirely correct, but it is easy to fix.
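For the second prompt, reading the mapping from a file, the logic could look roughly like the sketch below. It is again Python rather than C#, and `load_embeddings` is a hypothetical helper. The field names `descricao` and `embedding` come from the JSON shown in the question. The length check is an extra safeguard added here, not something the question's script does. In C# the result would presumably be something like a `Dictionary<string, float[]>`.

```python
import json
import os
import tempfile


def load_embeddings(path):
    """Read the JSON file and return a dict mapping description -> vector."""
    with open(path, "r", encoding="utf-8") as f:
        records = json.load(f)

    mapping = {}
    expected_dim = None
    for record in records:
        vector = [float(x) for x in record["embedding"]]
        # All vectors from one model should have the same length.
        if expected_dim is None:
            expected_dim = len(vector)
        elif len(vector) != expected_dim:
            raise ValueError(
                f"inconsistent embedding length for codigo {record['codigo']}"
            )
        mapping[record["descricao"]] = vector
    return mapping


# Demo with a tiny made-up file (2-dimensional vectors, not real model output).
sample = [
    {"codigo": 1, "descricao": "Alain Prost", "embedding": [0.5, -0.25]},
    {"codigo": 2, "descricao": "Ayrton Senna", "embedding": [-0.1, 0.75]},
]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "embeddings.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    demo = load_embeddings(demo_path)

print(sorted(demo))
```

Note that two records with the same `descricao` would overwrite each other in this mapping. Keying on `codigo` instead would avoid that if duplicates are possible.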