True subgraph of a PySpark graph

from graphframes.examples import Graphs
import graphframes
g = Graphs(sqlContext).friends()  # Get example graph
# Select subgraph of users older than 30
v2 = g.vertices.filter("age > 30")
g2 = graphframes.GraphFrame(v2, g.edges)

2 answers

User

#1 · edited 2024-05-14 07:43:06

The way I get a subgraph out of a GraphFrame is with motifs:

from graphframes import GraphFrame
motifs = g.find("(a)-[e]->(b)").filter(<conditions for a,b or e>)
# Both endpoints of every matching edge form the new vertex set
new_vertices = motifs.select("a.*").union(motifs.select("b.*")).distinct()
# The matching edges themselves form the new edge set
new_edges = motifs.select("e.*").distinct()
new_graph = GraphFrame(new_vertices, new_edges)

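This looks more complicated and may take longer to run, but for more complex graph queries it works well, because you interact with the GraphFrame as a single entity rather than as separate vertices and edges. As a result, filtering on vertices also affects which edges are left in the GraphFrame.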

User

#2 · edited 2024-05-14 07:43:06

Interesting. I don't get the expected result:

>>> from graphframes.examples import Graphs
>>> import graphframes
>>> g = Graphs(sqlContext).friends()  # Get example graph
>>> # Select subgraph of users older than 30
... v2 = g.vertices.filter("age > 30")
>>> g2 = graphframes.GraphFrame(v2, g.edges)
>>> print(g.vertices.count(), g.edges.count())
(6, 7)
>>> print(g2.vertices.count(), g2.edges.count())
(4, 7)

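As of now, GraphFrames does not check whether a graph is valid at construction time, for example that every edge connects to existing vertices. The vertex count after the filter does seem to be correct, though: only the edges are left unfiltered.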
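To make the expected behaviour concrete, here is a minimal plain-Python sketch of what a "true" (induced) subgraph should contain: the filtered vertices, plus only those edges whose source and destination both survive the filter. It does not use Spark or GraphFrames. The toy data and the helper name `induced_subgraph` are made up for illustration and are not part of any library.

```python
# Toy data, invented for this sketch (not the GraphFrames "friends" graph).
vertices = [("a", 34), ("b", 36), ("c", 30), ("d", 29)]   # (id, age)
edges = [("a", "b"), ("b", "c"), ("c", "b"), ("d", "a")]  # (src, dst)


def induced_subgraph(vertices, edges, keep):
    """Hypothetical helper: keep vertices passing `keep`, and only the
    edges whose two endpoints are both among the kept vertices."""
    kept_vertices = [v for v in vertices if keep(v)]
    kept_ids = {v[0] for v in kept_vertices}
    kept_edges = [(s, d) for (s, d) in edges if s in kept_ids and d in kept_ids]
    return kept_vertices, kept_edges


# Same condition as in the question: users older than 30.
v2, e2 = induced_subgraph(vertices, edges, lambda v: v[1] > 30)
print(len(v2), len(e2))
```

With DataFrames the same idea would presumably be expressed by keeping only the rows of g.edges whose src and dst both appear in v2.id before building the new GraphFrame; that step is exactly what the code in the question skips.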
