'Graph learning in R, igraph, tidygraph

I have a graph with each node having a value (value in red).

enter image description here

I would like to do the following two things (I guess 1 is a special case of 2):

  1. Each node should be assigned the mean of the value of the direct peers directing to it. For example node #5 (1+2)/2=1.5 or node #3 (0+2+0)/3=2/3.

  2. Instead of direct neighbors, include all connected nodes but with a diffusion of times 1/n with n being the distance to the node. The further away the information is coming from the weaker signal we'd have.

I looked into functions of igraph, but could not find anything that is doing this (I might have overseen though). How could I do this computation?

Below is the code for a sample network with random values.

library(tidyverse)
library(tidygraph)
library(ggraph)

set.seed(6)
q <- tidygraph::play_erdos_renyi(6, p = 0.2) %>% 
  mutate(id = row_number(),
         value = sample(0:3, size = 6, replace = T))
q %>% 
  ggraph(layout = "with_fr") +
  geom_edge_link(arrow = arrow(length = unit(0.2, "inches"), 
                               type = "closed")) +
  geom_node_label(aes(label = id)) +
  geom_node_text(aes(label = value), color = "red", size = 7, 
                 nudge_x = 0.2, nudge_y = 0.2)

Edit, found a solution to 1

q %>% 
  mutate(value_smooth = map_local_dbl(order = 1, mindist = 1, mode = "in", 
                                      .f = function(neighborhood, ...) {
    mean(as_tibble(neighborhood, active = 'nodes')$value)
  }))

Edit 2, solution to 2, not the most elegant I guess

q %>% 
  mutate(value_smooth = map_local_dbl(order = 1, mindist = 0, mode = "in", 
                                      .f = function(neighborhood, node, ...) {
    ne <- neighborhood
    
    ne <- ne %>%
      mutate(d = node_distance_to(which(as_tibble(ne, 
                                                  active = "nodes")$id == node)))
    
    as_tibble(ne, active = 'nodes') %>% 
      filter(d != 0) %>% 
      mutate(helper = value/d) %>% 
      summarise(m = mean(value)) %>% 
      pull(m)
    }))

Edit 3, a faster alternative to map_local_dbl

map_local loops through all nodes of the graph. For large graphs, this takes very long. For just computing the means, this is not needed. A much faster alternative is to use the adjacency matrix and some matrix multiplication.

q_adj <- q %>% 
  igraph::as_adjacency_matrix()

# out
(q_adj %*% as_tibble(q)$value) / Matrix::rowSums(q_adj)

# in
(t(q_adj) %*% as_tibble(q)$value) / Matrix::colSums(q_adj)

The square of the adjacency matrix is the second order adjacency matrix, and so forth. So a solution to problem 2 could also be created.



Solution 1:[1]

Probably you can try the code below

q %>%
    set_vertex_attr(
        name = "value",
        value = sapply(
            ego(., mode = "in", mindist = 1),
            function(x) mean(x$value)
        )
    )

which gives

# A tbl_graph: 6 nodes and 7 edges
#
# A directed simple graph with 1 component
#
# Node Data: 6 x 2 (active)
     id   value
  <int>   <dbl>
1     1   0.5
2     2 NaN
3     3   0.667
4     4 NaN
5     5   1.5
6     6 NaN
#
# Edge Data: 7 x 2
   from    to
  <int> <int>
1     3     1
2     6     1
3     1     3
# ... with 4 more rows

Solution 2:[2]

Each node should be assigned the mean of the value of the direct peers directing to it.

Guessing that you really mean

Each node should be assigned the mean of the values of the direct peers directing to it, before any node values were changed

This seems trivial - maybe I am missing something?

Loop over nodes
    Sum values of adjacent nodes
    Calculate mean and store in vector by node index
Loop over nodes
    Set node value to mean stored in previous loop

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ThomasIsCoding
Solution 2 ravenspoint