'How to create an origin-destination matrix from two vectors (in R or Stata)

I have a dataset of commuting by individuals, like this:

id    Origin Destination
 1         1           3   
 2         1           4   
 3         2           1
 4         2           3 
 5         2           1
 6         3           4

where "id" are the individuals, "origin" and "destinations" are cities.

I want to construct an origin-destination matrix, like:

        City1 City2 City3 City4
  City1     0     0     1     1
  City2     2     0     1     0
  City3     0     0     0     1

etc

Someone know how to do this in R or Stata?

Additionally:

I may need to weight the flows by the population of the city of origin. Example:

id    Origin Destination OriginPop
 1         1           3         5
 2         1           4         5
 3         2           1        10
 4         2           3        10
 5         2           1        10
 6         3           4         7

where "OriginPop" is the total population of the city of origin.

An then I want a weighted origin-destination matrix, like:

        City1 City2 City3 City4
  City1     0     0   1/5   1/5
  City2  2/10     0  1/10     0
  City3     0     0     0   1/7

etc



Solution 1:[1]

I just wanted to add an answer to the second part of your question, and add a few things to what @Tensibai already suggested Here is your data:

cities<-data.frame(id=c(1:6),Origin=c(1,1,2,2,2,3),Destination=c(3,4,1,3,1,4),OriginPop=c(5,5,10,10,10,7))

If I were you I would convert the cities to factors, that way, even if a city doesn't appear on your Origin or Destination (such as your example where 2 doesn't appear in Destination and 4 doesn't show in Origin) it will still appear on your table, so you can just do this:

cities[,2:3]<-lapply(cities[,2:3],factor,levels=c(1:4))

Then as @Tensibai suggested you can simply do :

mytable<-table(cities$Origin,cities$Destination)
mytable
  1 2 3 4
1 0 0 1 1
2 2 0 1 0
3 0 0 0 1
4 0 0 0 0

As for the second part, I would create another data frame containing each unique city and its population:

citiespop<-data.frame(cities=unique(cities$Origin),pop=unique(cities$OriginPop))

And finally to get the results you want, you can just divide the rows of mytable by the population of that city:

mytable[citiespop$cities,]<-mytable[citiespop$cities,]/citiespop$pop
round(mytable,2) # round to make it more readable
    1    2    3    4
1 0.00 0.00 0.20 0.20
2 0.20 0.00 0.10 0.00
3 0.00 0.00 0.00 0.14
4 0.00 0.00 0.00 0.00

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Andrelrms