'Deadlock in book <The Go Programming Language>, how it would happen and why it happen?

There are several times in this book talk about deadlock regarding to incorrect usage of goroutine and channel, and I failed to grasp why the deadlock happen.

First thing I want to say is that, I know channel send&receive will block until there is items to be read or rooms to send items into, and maybe that's the deep reason of some deadlock. But please enlighten me with some explanation according to the following excerpt from the book:

Page 240

This code is to concurrently crawl urls, BFS style:

func main() {
    worklist := make(chan []string)

    // Start with the command-line arguments.
    go func() { worklist <- os.Args[1:] }()

    // Crawl the web concurrently.
    seen := make(map[string]bool)
    for list := range worklist {
        for _, link := range list {
            if !seen[link] {
                seen[link] = true
                go func(link string) {
                    worklist <- crawl(link)
                }(link)
            }
        }
    }
}

quoting book's second paragraph:

...the initial send of the command-line arguments to the worklist must run in its own goroutine to avoid deadlock, a stuck situation in which both the main goroutine and a crawler goroutine attempt to send to each other while neither is receiving...

suppose the initial send to worklist is not in its own goroutin, I imagine it works like this:

  1. main goroutine send initial to worklist, block until received
  2. the for range receive the initial item, so unblock the worklist channel
  3. the crawler goroutine send its items into worklist, loop...

So to my understanding, it won't block and deadlock. Where am I wrong?

UPDATE: @mkopriva helped me realized since step 1 is blocked, step 2,3 is unreachable. So I'm clear on this one.

Page 243

This deadlock situation might be the same as in page 240:

func main() {
    worklist := make(chan []string) // list of URLs, may have duplicates
    unseenLinks := make(chan string) // de-duplicated URLs

    // Add command-lin arguments to worklist.
    go func() { worklist <- os.Args[1:] }()

    // Create 20 crawler goroutines to fetch each unseen link.
    for i := 0; i < 20; i++ {
        go func() {
            for link := range unseenLinks {
                foundLinks := crawl(link)
                go func() { worklist <- foundLinks }()
            }
        }()
    }

    // The main goroutine de-duplicates worklist items
    // and sends the unseen ones to the crawlers.
    seen := make(map[string]bool)
    for list := range worklist {
        for _, link := range list {
            if !seen[link] {
                seen[link] = true
                unseenLinks <- link
            }
        }
    }
}

So if I put omit go inside the for-range loop, how the deadlock happen?



Solution 1:[1]

In the first snippet, the initial channel send needs to be done in a goroutine because without a goroutine the statement would block indefinitely and the execution would never reach the loop that receives from that channel. i.e. To get from 1. to 2., 1. needs to be done in a goroutine. If 1. blocks however, then 2. is never reached.

Where the comments start is where the execution stops:

func main() {
    worklist := make(chan []string)

    worklist <- os.Args[1:]
// 
//     seen := make(map[string]bool)
//     for list := range worklist {
//         for _, link := range list {
//             if !seen[link] {
//                 seen[link] = true
//                 go func(link string) {
//                     worklist <- crawl(link)
//                 }(link)
//             }
//         }
//     }
// }

In the second snippet you have a for-range loop over a channel, such a loop will NOT exit until the ranged-over channel is closed. That means that, while the body of such a loop may continue to get executed, the code after the loop-with-unclosed-channel will never be reached.

https://golang.org/ref/spec#For_range

  1. For channels, the iteration values produced are the successive values sent on the channel until the channel is closed. If the channel is nil, the range expression blocks forever.

Where the comments start is where the execution stops:

func main() {
    worklist := make(chan []string)
    unseenLinks := make(chan string)

    go func() { worklist <- os.Args[1:] }()

    for i := 0; i < 20; i++ {
        for link := range unseenLinks {
//            foundLinks := crawl(link)
//            go func() { worklist <- foundLinks }()
//        }
//     }
// 
//     seen := make(map[string]bool)
//     for list := range worklist {
//         for _, link := range list {
//             if !seen[link] {
//                 seen[link] = true
//                 unseenLinks <- link
//             }
//         }
//     }
// }

Solution 2:[2]

In the second snippet, I think the author is talking about the go routine in below:

go func() { worklist <- foundLinks }()

Links found by crawl are sent to the worklist from a dedicated goroutine to avoid deadlock.

Why must this go exist?

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Yahya