I have a DataFrame with one column of subreddits and another column containing the authors who commented in each subreddit. Here is a snapshot:
subreddit user
0xProject [7878ayush, Mr_Yukon_C, NomChompsky92, PM_ME_Y...
100sexiest [T10rock]
100yearsago [PM_ME_MII, Quisnam]
1022 [MikuWaifuForLaifu, ghrshow, johnnymn1]
1200isjerky [Rhiann0n, Throwaway412160987]
1200isplenty [18hourbruh, Bambi726, Cosmiicao, Gronky_Kongg...
1200isplentyketo [yanqi83]
12ozmouse [ChBass]
12thMan [8064r7, TxAg09, brb1515]
12winArenaLog [fnayr]
13ReasonsWhy [SawRub, _mw8, morbs4]
13or30 [BOTS_RISE_UP, mmcjjc]
14ers [BuccoFan8]
1500isplenty [nnowak]
15SecondStories [DANKY-CHAN, NORMIESDIE]
18650masterrace [Airazz]
18_19 [-888-, 3mb3r89, FuriousBiCurious, FusRohDoing...
1911 [EuphoricaI, Frankshungry, SpicyMagnum23, cnw4...
195 [RobDawg344, ooi_]
19KidsandCounting [Kmw134, Lvzv, mpr1011, runjanarun]
1P_LSD [420jazz, A1M8E7, A_FABULOUS_PLUM, BS_work, EL...
2007oneclan [J_D_I]
2007scape [-GrayMan-, -J-a-y-, -Maxy-, 07_Tank, 0ipopo, ...
2010sMusic [Vranak]
21savage [Uyghur1]
22lr [microphohn]
23andme [Nimushiru, Pinuzzo, Pugmas, Sav1025, TOK715, ...
240sx [I_am_a_Dan, SmackSmackk, jimmyjimmyjimmy_, pr...
24CarrotCraft [pikaras]
24hoursupport [GTALionKing, Hashi856, Moroax, SpankN, fuck_u...
...
youtubetv [ComLaw, P1_1310, kcamacho11]
yoyhammer [Emicrania, Jbugman, RoninXiC, Sprionk, jonow83]
ypp [Loxcam]
ypsi [FLoaf]
ytp [Profsano]
yugijerk [4sham, Exos_VII]
yugioh [1001puppys, 6000j, 8512332158, A_fiSHy_fish, ...
yumenikki [ripa9]
yuri [COMMENTS_ON_NSFW_PIC, MikuxLuka401, Pikushibu...
yuri_jp [Pikushibu]
yuruyuri [ACG_Yuri, KirinoNakano, OSPFv3, SarahLia]
zagreb [jocus985]
zcoin [Fugazi007]
zec [Corm, GSXP, JASH_DOADELESS_, PSYKO_Inc, infinis]
zedmains [BTZx2, EggyGG, Ryan_A121, ShacObama, Tryxi, m...
zelda [01110111011000010111, Aura64, AzaraAybara, BA...
zen [ASAMANNAMMEDNIGEL, Cranky_Kong, Dhammakayaram...
zerocarb [BigBrain007, Manga-san, vicinius]
zetime [xrnzrx]
zfs [Emachina, bqq100, fryfrog, michio_kakus_hair,...
ziftrCOIN [GT712]
zoemains [DrahaKka, OJSaucy, hahAAsuo, nysra, x3noPLEB,...
zombies [carbon107, rjksn]
zomby [jwccs46]
zootopia [BCRE8TVE, Bocaj1000, BunnyMakingAMark, Far414...
zumba [GabyArcoiris]
zyramains [Dragonasaur, Shaiaan]
zyzz [Xayv]
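I am trying to iterate over each subreddit, and then over every subreddit below it, to find the commenters they share. The end goal is a DataFrame containing subreddit 1, subreddit 2, and the number of shared commenters.
I can't even picture how to do this with apply, and I don't know how to write a double for loop over a pandas DataFrame.
Is this the right idea?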
Here is an example of the input and the expected output:
import pandas as pd
df = pd.DataFrame({'subreddit': ['sub1', 'sub2', 'sub3', 'sub4'],
                   'user': [['A', 'B', 'C'], ['A', 'F', 'C'], ['F', 'E', 'D'], ['X', 'Y', 'Z']]})
Output for the first subreddit:
subreddit_1 subreddit_2 shared_users
sub1 sub2 2
sub1 sub3 0
sub1 sub4 0
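I don't know if you can get around using loops. This looks very similar to how a correlation matrix is calculated, which uses loops in the pandas documentation. At least the result is symmetric, so you only have to compare half of the pairs.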
Instead of calculating a correlation, you want the number of elements shared between two lists lst1 and lst2, i.e. len(set(lst1) & set(lst2)).
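The symmetric comparison described above can be sketched roughly as follows. This is a minimal sketch, not the original answer's code: it uses the example input from the question, and the names user_sets, counts and shared_matrix are made up for illustration. It fills only the upper triangle and mirrors each value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'subreddit': ['sub1', 'sub2', 'sub3', 'sub4'],
                   'user': [['A', 'B', 'C'], ['A', 'F', 'C'], ['F', 'E', 'D'], ['X', 'Y', 'Z']]})

# Convert each user list to a set once, so every comparison is a plain set intersection.
user_sets = [set(users) for users in df['user']]
names = df['subreddit'].tolist()
n = len(names)

# Square matrix of shared-user counts (hypothetical layout: rows and columns are subreddits).
# The diagonal stays 0 because a subreddit is never compared with itself here.
counts = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(i + 1, n):      # upper triangle only: each pair is compared once
        shared = len(user_sets[i] & user_sets[j])
        counts[i, j] = shared
        counts[j, i] = shared      # mirror the value, since the measure is symmetric

shared_matrix = pd.DataFrame(counts, index=names, columns=names)
print(shared_matrix)
```

The double loop is still quadratic in the number of subreddits, so on the full data it may be slow; the sketch only halves the work by exploiting symmetry.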
If you want the smaller DataFrames (for example, only the rows for a single subreddit), you just need a bit of manipulation.
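One possible way to get the long format from the question, and then a smaller DataFrame for a single subreddit, is sketched below. It is an illustration under assumptions rather than the answer's original code: it uses itertools.combinations in place of an explicit double loop, assumes subreddit names are unique, and the names user_sets, pairs and first are hypothetical.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({'subreddit': ['sub1', 'sub2', 'sub3', 'sub4'],
                   'user': [['A', 'B', 'C'], ['A', 'F', 'C'], ['F', 'E', 'D'], ['X', 'Y', 'Z']]})

# Map each subreddit to the set of its commenters (assumes unique subreddit names).
user_sets = {name: set(users) for name, users in zip(df['subreddit'], df['user'])}

# combinations() yields each unordered pair once, in the order the subreddits appear.
rows = [(a, b, len(user_sets[a] & user_sets[b]))
        for a, b in combinations(user_sets, 2)]
pairs = pd.DataFrame(rows, columns=['subreddit_1', 'subreddit_2', 'shared_users'])

# Smaller DataFrame: only the pairs whose first member is 'sub1'.
first = pairs[pairs['subreddit_1'] == 'sub1'].reset_index(drop=True)
print(first)
```

Because each pair is stored only once, the rows for any subreddit other than the first are split across both columns; filtering on subreddit_1 or subreddit_2 would be needed to collect all of them.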