Converting a TSV file into nodes and edges usable in Python

1 vote

1 answer

2322 views

Asked 2025-04-18 01:29

I have a TSV file. I want to read it and count the number of nodes in each path.

Part of the TSV file looks like this:

  6a3701d319fc3754  1297740409  166  14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;Accra;Africa;Atlantic_slave_trade;African_slave_trade  NULL
  3824310e536af032  1344753412  88  14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade 3

Each path is a list of nodes separated by semicolons (';'), for example 14th_century;15th_century.

This is the code I have so far:

import networkx as nx

# read the file as an edge list into a directed graph
G = nx.read_edgelist("test.tsv", create_using=nx.DiGraph())
print(G.nodes())
print(G.edges())

So my question is: how do I count the number of nodes that each path passes through?

Tags: data processing, graph theory, node counting, TSV files, path analysis

1 Answer

I'm using the pandas library here for speed. You can install it with pip install pandas; more information is available at http://pandas.pydata.org/

First, build our dataframe from your sample data:

In [39]:

temp = """6a3701d319fc3754  1297740409  166  14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;Accra;Africa;Atlantic_slave_trade;African_slave_trade  NULL

  3824310e536af032  1344753412  88  14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade 3"""

import io
import pandas as pd

# construct the dataframe
# in your case, replace io.StringIO(temp) with the path to your tsv file
df = pd.read_csv(io.StringIO(temp), sep=r'\s+', header=None, names=['a','b','c','d','e'])

df
Out[39]:

                  a           b    c  \
0  6a3701d319fc3754  1297740409  166   
1  3824310e536af032  1344753412   88   

                                                   d   e  
0  14th_century;15th_century;16th_century;Pacific... NaN  
1  14th_century;Europe;Africa;Atlantic_slave_trad...   3  

[2 rows x 5 columns]
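Since the question asks for the number of nodes in each path, that count can be read straight off the path column without building a graph at all. Here is a minimal sketch using only the standard library. It assumes the fields really are tab-separated and that the path is the fourth field; count_path_nodes and the sample string are hypothetical names made up for illustration.

```python
import csv
import io

def count_path_nodes(fileobj, path_column=3):
    """Return the number of ';'-separated nodes in each row's path."""
    reader = csv.reader(fileobj, delimiter='\t')
    counts = []
    for row in reader:
        if len(row) <= path_column:
            continue  # skip blank or malformed lines
        counts.append(len(row[path_column].split(';')))
    return counts

# Hypothetical two-row sample with explicit tabs between the fields
sample = (
    "6a3701d319fc3754\t1297740409\t166\t"
    "14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;"
    "Accra;Africa;Atlantic_slave_trade;African_slave_trade\tNULL\n"
    "3824310e536af032\t1344753412\t88\t"
    "14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade\t3\n"
)

print(count_path_nodes(io.StringIO(sample)))  # [9, 5]
```

With the dataframe above, the same count should be available as df['d'].str.split(';').str.len(). For a real file, pass an open file handle instead of io.StringIO(sample).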

In [65]:

# use itertools to flatten our list of lists
import itertools

def to_edge_list(path):
    # split on semi-colon
    split_list = path.split(';')
    # get our main node (the first node in the path)
    primary_node = split_list[0]
    # construct our edge list: one edge from the main node to every other node
    edge_list = [(primary_node, node) for node in split_list[1:]]
    return edge_list

# now use itertools to flatten the list of lists into a single list
combined_edge_list = list(itertools.chain.from_iterable(df['d'].apply(to_edge_list)))

print(combined_edge_list)

[('14th_century', '15th_century'), ('14th_century', '16th_century'), ('14th_century', 'Pacific_Ocean'), ('14th_century', 'Atlantic_Ocean'), ('14th_century', 'Accra'), ('14th_century', 'Africa'), ('14th_century', 'Atlantic_slave_trade'), ('14th_century', 'African_slave_trade'), ('14th_century', 'Europe'), ('14th_century', 'Africa'), ('14th_century', 'Atlantic_slave_trade'), ('14th_century', 'African_slave_trade')]
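Note that the edge list above links the first node of each path to every later node. If the edges are instead meant to follow the path step by step (an assumption about the intent, not something the question states), each node can be paired with its successor. A minimal sketch, where path_to_edges is a hypothetical helper:

```python
def path_to_edges(path):
    """Turn 'a;b;c' into consecutive edges [('a', 'b'), ('b', 'c')]."""
    nodes = path.split(';')
    # zip pairs each node with the one that follows it in the path
    return list(zip(nodes, nodes[1:]))

path = "14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade"
edges = path_to_edges(path)
print(edges)
# A non-empty path with k edges visits k + 1 nodes
print(len(edges) + 1)  # 5
```

A list built this way can be passed to G.add_edges_from() exactly like combined_edge_list.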

# Now construct our networkx graph from the edge list
In [66]:

import networkx as nx

G = nx.MultiDiGraph()
G.add_edges_from(combined_edge_list)
G.edges()


Out[66]:

[('14th_century', '15th_century'),
 ('14th_century', 'Africa'),
 ('14th_century', 'Africa'),
 ('14th_century', 'Atlantic_slave_trade'),
 ('14th_century', 'Atlantic_slave_trade'),
 ('14th_century', 'African_slave_trade'),
 ('14th_century', 'African_slave_trade'),
 ('14th_century', '16th_century'),
 ('14th_century', 'Accra'),
 ('14th_century', 'Europe'),
 ('14th_century', 'Atlantic_Ocean'),
 ('14th_century', 'Pacific_Ocean')]
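Some edges appear twice in this output because a MultiDiGraph keeps parallel edges; a plain nx.DiGraph would store each ordered pair only once. To see which edges repeat and how many distinct nodes there are without going through networkx, here is a rough sketch using collections.Counter. It rebuilds the same first-node-to-every-other-node edge list from the two sample paths; the variable names are made up for illustration.

```python
from collections import Counter

paths = [
    "14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;"
    "Accra;Africa;Atlantic_slave_trade;African_slave_trade",
    "14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade",
]

# Same star-shaped edges as to_edge_list: first node -> every later node
star_edges = [
    (nodes[0], node)
    for nodes in (p.split(';') for p in paths)
    for node in nodes[1:]
]

edge_counts = Counter(star_edges)  # how many times each edge occurs
repeated = {edge: n for edge, n in edge_counts.items() if n > 1}
unique_nodes = {node for edge in star_edges for node in edge}

print(len(star_edges), len(edge_counts), len(unique_nodes))  # 12 9 10
print(sorted(repeated))
```

The three repeated edges are the ones whose target occurs in both paths (Africa, Atlantic_slave_trade and African_slave_trade).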

Then plot the graph (it doesn't look great, but that's fine):

[Image: plot of the resulting graph]

Answered 2025-04-18 by Python大师
