Analysis tools for the GMANE email list database

gmaneLegacy: detailed description of the Python project


This project provides helper classes for analyzing the GMANE email database. Install with:

$ pip install gmaneLegacy

or

$ python setup.py install

For greater control over customization (and debugging), clone the repo and install it with pip using -e:

$ git clone https://github.com/ttm/gmaneLegacy.git

$ pip install -e <path_to_repo>

This installation method is useful together with the reload functions from IPython.lib.deepreload and the standard importlib.
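To illustrate why an editable install pairs well with reloading, here is a minimal sketch using the standard importlib.reload. The standard json module stands in for the package, an assumption made only so that the snippet runs anywhere. With an editable install, the same call on the package module would pick up edits made in the cloned repo without restarting the interpreter.

```python
import importlib
import json  # stand-in for the package under development (assumption)

# Re-execute the module's source in place; names already bound to the
# module object keep pointing at the same, now refreshed, module.
reloaded = importlib.reload(json)
print(reloaded is json)
```

Note that importlib.reload refreshes only the module it is given; submodules have to be reloaded separately, which is the gap that a deep reload helper is meant to cover.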

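The functionality is based on physics articles about interaction networks:
[1] Stability in human interaction networks: primitive typology of vertex, prominence of measures and activity statistics: http://arxiv.org/abs/1310.7769
[2] Connective differentiation of textual production in interaction networks: http://arxiv.org/abs/1412.7309
[3] Versinus: a visualization method for graphs in evolution: http://arxiv.org/abs/1412.7311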

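The core concepts are: 1) analysis of topological structure; 2) analysis of textual production; 3) visualization of evolving structures. The distribution of activity along time and among participants is also addressed, both through dedicated routines and indirectly through 1), 2) and 3).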

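Ideally, this package should enable:
- downloading GMANE email list data.
- building basic data structures from the downloaded data.
- analyzing the data through complex network and NLP criteria.
- visualization through multiple layout methods.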
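As a rough illustration of the "basic data structures" step, the sketch below turns a handful of messages into weighted directed edges between authors. The record layout, the function name and the edge direction (from the author of the replied message to the replier) are assumptions made for this example; they are not taken from the package.

```python
from collections import Counter

# Hypothetical message records: (message_id, author, in_reply_to).
messages = [
    ("m1", "alice", None),
    ("m2", "bob", "m1"),
    ("m3", "carol", "m1"),
    ("m4", "alice", "m2"),
    ("m5", "bob", "m4"),
]


def build_interaction_edges(msgs):
    """Count directed (replied_author, replier) pairs over all replies."""
    author_of = {mid: author for mid, author, _ in msgs}
    edges = Counter()
    for _, replier, parent in msgs:
        # Skip thread starters and replies to messages we do not have.
        if parent is None or parent not in author_of:
            continue
        edges[(author_of[parent], replier)] += 1
    return edges


edges = build_interaction_edges(messages)
for (source, target), weight in sorted(edges.items()):
    print(source, "->", target, weight)
```

A weighted edge list like this can be handed to a graph library for the topological measures, while the per-author message counts feed the activity statistics.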

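P.S. Following [1], the symmetry measures of agent activity in the network were implemented by hand (they were not found in the networkx and numpy packages).
P.S. 2. Ongoing research is in tests/newtexttables.py and tests/makeOverallTextAnalysis.py.
P.S. Also check the gmane Python package: https://github.com/ttm/gmane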
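To give an idea of what a hand-written symmetry measure can look like, the sketch below computes a simple degree asymmetry per agent, (k_in - k_out) / (k_in + k_out). Both the formula and the function name are assumptions chosen for illustration; the measures actually implemented in the package follow [1] and may differ.

```python
from collections import defaultdict


def degree_asymmetry(edges):
    """Return (k_in - k_out) / (k_in + k_out) for every node.

    `edges` is an iterable of distinct directed (source, target) pairs.
    """
    k_in = defaultdict(int)
    k_out = defaultdict(int)
    for source, target in edges:
        k_out[source] += 1
        k_in[target] += 1
    # Every node seen here has at least one edge, so the sum is never zero.
    nodes = set(k_in) | set(k_out)
    return {n: (k_in[n] - k_out[n]) / (k_in[n] + k_out[n]) for n in nodes}


edges = [("alice", "bob"), ("alice", "carol"), ("bob", "alice")]
asymmetry = degree_asymmetry(edges)
for agent in sorted(asymmetry):
    print(agent, round(asymmetry[agent], 3))
```

Values near -1 mark agents with mostly outgoing edges, values near +1 agents with mostly incoming edges, and 0 a balanced agent.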

Usage example

Download messages from one GMANE list:

import gmane as g

dl = g.DownloadGmaneData()  # saves into ~/.gmane/
dl.downloadListsIDS()  # acquires all GMANE list_ids
dl.downloadListMessages(dl.list_ids[100])
dl.cleanDownloadedLists()  # remove empty messages for coherence
dl.downloadedStats()  # creates ~/.gmane/stats.txt

# to load message contents to Python objects:
# load 10 messages from list with list_id gmane.ietf.rfc822
lm = g.LoadMessages("gmane.ietf.rfc822", 10)

# or access the structures downloaded to your filesystem
dl = g.DownloadGmaneData()
dl.getDownloadedLists()
lms = []
# and load all messages from 5 lists
for list_id in dl.downloaded_lists[:5]:
    lms.append(g.LoadMessages(list_id))

# to load the first three lists with the greatest number
# of downloaded messages:
dl.downloadedStats()  # might take a while
load_msgs = []
for list_stat in dl.lists[:3]:
    list_id = list_stat[0]
    load_msgs.append(g.LoadMessages(list_id))

# to make basic data structures of the list with
# the greatest number of messages:
ds = g.MessageDataStructures(load_msgs[0])
mm = ds.messages
ids = ds.message_ids
print("first: ", mm[ids[0]][2], "last:", mm[ids[-1]][2])

# circular (directional) statistics for activity along time
# (hours of the day, days of the week, days of the month, etc):
# mean_vec, mean_angle, size_mean_vec, circular_mean,
# circular_variance, circular dispersion
# and histograms
ts = g.TimeStatistics(ds)
print("made overall circular activity statistics along time")

# make latex tables to observe distributions within bins of interest
hi = 100 * ts.hours["histogram"] / ts.hours["histogram"].sum()
row_labels = list(range(24))
tstring = g.parcialSums(row_labels, data=[hi],
                        partials=[1, 2, 3, 4, 6, 12],
                        partial_labels=["h", "2h", "3h", "4h", "6h", "12h"],
                        datarow_labels=["APACHE"])
g.writeTex(tstring, "here.tex")

ps = g.AgentStatistics(ds)
print("made overall activity statistics among participants")

# build the interaction network of the messages:
nw = g.InteractionNetwork(ds)
print("number of nodes: {}, number of edges: {}".format(
    nw.g.number_of_nodes(), nw.g.number_of_edges()))
nm = g.NetworkMeasures(nw)  # take measures, including symmetry related measures
np = g.NetworkPartitioning(nm)  # partition in primitive typology
sa = np.sectorialized_agents  # get members of each sector
print("{} agents in periphery, {} are intermediary and {} hubs".format(sa[0], sa[1], sa[2]))
sa = np.sectorialized_agents__  # smoothed histogram for classification
print("{} agents in periphery, {} are intermediary and {} hubs".format(sa[0], sa[1], sa[2]))

# draw
nd = g.NetworkDrawer()
print("drawer started")
nd.makeLayout(nm)
print("gave (x,y) for each author with 5-15-80")
nd2 = g.NetworkDrawer()
print("drawer two started")
nd2.makeLayout(nm, np)
print("gave (x,y) for each author with sectors by comparison with Erdos-Renyi")
nd.drawNetwork(nw, nm, "test.png")
nd2.drawNetwork(nw, nm, "test2.png")

# make basic PCA plots of network measures:
npca = g.NetworkPCA(nm)
# plot PCA with colored primitive sectors
npca = g.NetworkPCA(nm, np)

# Evolves network with measures, partitions,
# PCA, principal components and Versinus plots saved to disk
lm = load_msgs[0]  # loaded messages from list with most messages
ne = g.NetworkEvolution(step_size=10)
ne.evolveRaw(lm.messages, imagerate=4, erdos_sectors=True)
# ne.makeVideo()  # use this to avoid evolving again just to make video
# see testDrawer.py or g.NetworkEvolution to make movies:
# https://www.youtube.com/watch?v=iS8NwEy291g

# after making network evolution measurements and video,
# you can also make music:
em = g.EvolutionMusic()
print("music is done")
# avconv -i mixY.wav -i evo[..<depends on the evolution done>..].avi final.avi
# delivers you the final.avi animation with a soundtrack relative to network measures
# currently it is the 'four hubs dance' by default:
# https://www.youtube.com/watch?v=YxDiwzAUPeU

# and further analysis of measures and Erdos sectors:
et = g.EvolutionTimelines()
print("Written png files with network measures along evolution timeline")
# Enjoy!
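The comments in the usage example mention circular (directional) statistics for activity along time. The sketch below shows how such quantities can be computed for hours of the day using only the standard library. The function name is hypothetical and the dictionary keys merely borrow the names listed in the comments; this is not the package's TimeStatistics implementation, and circular dispersion is left out.

```python
import cmath
import math


def circular_hour_stats(hours, period=24):
    """Directional statistics for timestamps reduced to hours of the day."""
    # Map each hour to a unit vector on the circle.
    vectors = [cmath.exp(2j * math.pi * h / period) for h in hours]
    mean_vec = sum(vectors) / len(vectors)
    size_mean_vec = abs(mean_vec)
    # Angle of the mean vector, folded into [0, 2*pi).
    mean_angle = cmath.phase(mean_vec) % (2 * math.pi)
    return {
        "mean_vec": mean_vec,
        "mean_angle": mean_angle,
        "size_mean_vec": size_mean_vec,
        "circular_mean": mean_angle * period / (2 * math.pi),
        "circular_variance": 1 - size_mean_vec,
    }


stats = circular_hour_stats([23, 0, 1])
print(stats["circular_mean"], stats["circular_variance"])
```

The arithmetic mean of 23, 0 and 1 is 8, which is misleading for activity clustered around midnight. The circular mean lands at (or numerically right next to) midnight, and the small circular variance reflects how tightly the hours are concentrated.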

Further documentation is in the tests/ folder and in the object docstrings.
