Parallelizing multiple long-running tasks with async/await

I have a helper method that returns IEnumerable<string>. As the collection grows, it's slowing down dramatically. My current approach is to do essentially the following:

var results = new List<string>();
foreach (var item in items)
{
    results.Add(await item.Fetch());
}

I'm not actually sure whether this asynchronicity gives me any benefit (it sure doesn't seem like it), but all methods up the stack, including my controller's actions, are asynchronous:

public async Task<IHttpActionResult> FetchAllItems()

As this code is ultimately used by my API, I'd really like to run these fetches in parallel in the hope of a significant speedup. I've tried .AsParallel:

var results = items
    .AsParallel()
    .Select(i => i.Fetch().Result)
    .ToList();
return results;

And .WhenAll (returning a string[]):

var tasks = items.Select(i => i.Fetch());
return Task<string>.WhenAll<string>(tasks).Result;

And, as a last-ditch effort, firing off all the long-running jobs first and then waiting on them one at a time (hoping that they were all running in parallel, so that by the time one finished, the others would be nearly done):

var tasks = new LinkedList<Task<string>>();
foreach (var item in items)
    tasks.AddLast(item.Fetch());

var results = new LinkedList<string>();
foreach (var task in tasks)
    results.AddLast(task.Result);

In every test case, the time it takes to run is directly proportional to the number of items. There's no discernible speedup from any of these approaches. What am I missing in using Tasks and async/await?



Solution 1:[1]

There's a difference between parallel and concurrent. Concurrency just means doing more than one thing at a time, whereas parallelism means doing more than one thing on multiple threads. async is great for concurrency, but doesn't (directly) help you with parallelism.
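To make the distinction concrete, here is a minimal sketch written as a C# 9+ top-level program. FetchAsync is a hypothetical stand-in for item.Fetch() that simulates I/O latency with Task.Delay. All five operations are started before any of them is awaited, so their waits overlap and no thread sits blocked while they are pending. This is also the question's last attempt with await in place of .Result.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical stand-in for item.Fetch(): simulates ~300 ms of I/O with Task.Delay,
// which does not tie up a thread while it waits.
async Task<string> FetchAsync(int id)
{
    await Task.Delay(300);
    return $"item {id}";
}

var sw = Stopwatch.StartNew();

// Start every operation first: each call runs until its first await and then
// returns an in-flight task.
var pending = new List<Task<string>>();
for (int id = 0; id < 5; id++)
    pending.Add(FetchAsync(id));

// Then await the tasks one by one. Their delays overlap, so the total is
// roughly one delay rather than five.
var results = new List<string>();
foreach (var task in pending)
    results.Add(await task);

sw.Stop();
Console.WriteLine($"{results.Count} results in {sw.ElapsedMilliseconds} ms");
```

The concurrency comes from starting the operations before awaiting them, not from extra threads.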

As a general rule, parallelism on ASP.NET should be avoided. This is because any parallel work you do (e.g., AsParallel, Parallel.ForEach, etc.) shares the same thread pool as ASP.NET, which reduces ASP.NET's capacity to handle other requests and therefore hurts the scalability of your web service. It's best to leave the thread pool to ASP.NET.

However, concurrency is just fine - specifically, asynchronous concurrency. This is where Task.WhenAll comes in. Code like this is what you should be looking for (note that there is no call to Task<T>.Result):

var tasks = items.Select(i => i.Fetch());
return await Task.WhenAll(tasks);
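The WhenAll pattern above can be tried outside ASP.NET with a sketch like this C# 9+ top-level program. Fetch is a hypothetical stand-in that simulates about 200 ms of I/O with Task.Delay, and each item is reduced to an int id. The program times the question's sequential loop against the Task.WhenAll version.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical Fetch: simulated I/O that takes ~200 ms without blocking a thread.
async Task<string> Fetch(int id)
{
    await Task.Delay(200);
    return $"item {id}";
}

// Sequential version from the question: each await completes before the next Fetch starts.
async Task<List<string>> FetchAllSequential(IEnumerable<int> itemIds)
{
    var list = new List<string>();
    foreach (var id in itemIds)
        list.Add(await Fetch(id));
    return list;
}

// Concurrent version: start all fetches, then asynchronously wait for all of them.
// Task.WhenAll returns the results in the same order as the input tasks.
async Task<string[]> FetchAllConcurrent(IEnumerable<int> itemIds)
{
    var tasks = itemIds.Select(id => Fetch(id));
    return await Task.WhenAll(tasks);
}

var ids = Enumerable.Range(0, 5).ToList();

var sw = Stopwatch.StartNew();
var sequential = await FetchAllSequential(ids);
long sequentialMs = sw.ElapsedMilliseconds;

sw.Restart();
var concurrent = await FetchAllConcurrent(ids);
long concurrentMs = sw.ElapsedMilliseconds;

Console.WriteLine($"sequential: {sequentialMs} ms, concurrent: {concurrentMs} ms");
```

Expect the sequential time to grow with the number of items while the concurrent time stays near a single delay, provided Fetch really is asynchronous all the way down.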

Given your other code samples, it would be good to run through your call tree starting at Fetch and replace all Result calls with await. This may be (part of) your problem, because Result blocks the calling thread until the task completes, which forces synchronous execution.
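The effect of a blocking call somewhere under Fetch can be reproduced with two hypothetical variants in a C# 9+ top-level program. FetchBlocking waits with .Wait(), which has the same effect as .Result here, and FetchAwaiting uses await. Both simulate a 200 ms request with Task.Delay. Because FetchBlocking finishes its waiting before it even returns a task, wrapping it in Task.WhenAll gives no speedup. This matches the symptom described in the question.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

const int DelayMs = 200; // simulated request latency

// Hypothetical Fetch that blocks internally, e.g. because somewhere down its call
// tree an async call is consumed with .Result or .Wait().
Task<string> FetchBlocking(int id)
{
    Task.Delay(DelayMs).Wait(); // blocks the calling thread for the whole delay
    return Task.FromResult($"item {id}");
}

// The same Fetch with the blocking call replaced by await.
async Task<string> FetchAwaiting(int id)
{
    await Task.Delay(DelayMs); // returns an incomplete task to the caller while waiting
    return $"item {id}";
}

var ids = Enumerable.Range(0, 5);

var sw = Stopwatch.StartNew();
// Each FetchBlocking call completes its delay before returning, so the
// five delays run back to back even though Task.WhenAll is used.
var blocked = await Task.WhenAll(ids.Select(id => FetchBlocking(id)));
long blockingMs = sw.ElapsedMilliseconds;

sw.Restart();
// Here all five delays are in flight at the same time.
var awaited = await Task.WhenAll(ids.Select(id => FetchAwaiting(id)));
long awaitingMs = sw.ElapsedMilliseconds;

Console.WriteLine($"blocking: {blockingMs} ms, awaiting: {awaitingMs} ms");
```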

Another possible problem is that the underlying resource being fetched does not support concurrent access, or there may be throttling that you're not aware of. E.g., if Fetch retrieves data from another web service, check out System.Net.ServicePointManager.DefaultConnectionLimit.
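A hidden limit like that is easy to simulate. In this sketch (a C# 9+ top-level program with hypothetical names), a SemaphoreSlim stands in for the unknown cap by allowing only two fetches at a time. SemaphoreSlim is just a convenient way to model the limit here, not something the answer prescribes. Even though all six fetches are started together with Task.WhenAll, the total time comes out at roughly three request latencies instead of one.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int Limit = 2;     // hypothetical cap on simultaneous requests
const int DelayMs = 200; // simulated latency per request

// Stand-in for a hidden limit (connection cap, server-side throttle, lock on a
// shared resource): no more than Limit fetches may be in progress at once.
var gate = new SemaphoreSlim(Limit);

async Task<string> FetchThrottled(int id)
{
    await gate.WaitAsync(); // waits asynchronously while Limit fetches are active
    try
    {
        await Task.Delay(DelayMs);
        return $"item {id}";
    }
    finally
    {
        gate.Release(); // let the next queued fetch proceed
    }
}

var sw = Stopwatch.StartNew();
// All six fetches start together, but only two can make progress at a time,
// so they finish in three "waves" and the total grows with the item count.
var results = await Task.WhenAll(Enumerable.Range(0, 6).Select(id => FetchThrottled(id)));
sw.Stop();

Console.WriteLine($"{results.Length} results in {sw.ElapsedMilliseconds} ms");
```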

Solution 2:[2]

There is also a configurable limit on the maximum number of connections to a single server, which can make download performance independent of the number of client threads.

To change the connection limit, use ServicePointManager.DefaultConnectionLimit.
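A minimal sketch of raising the limit, with 20 as a purely illustrative value (C# 9+ top-level program). Changing the property affects only ServicePoint objects created after the change, so in an ASP.NET application on .NET Framework it would normally be set once at startup (for example in Application_Start) before any outgoing request is made. It should not be relied on for HttpClient on .NET Core and .NET 5+, where the per-server cap is HttpClientHandler.MaxConnectionsPerServer.

```csharp
using System;
using System.Net;

// Newer .NET versions may flag ServicePointManager as obsolete (SYSLIB0014).
#pragma warning disable SYSLIB0014

// Illustrative value only: choose a limit the remote service can actually handle.
// Existing ServicePoint objects keep the limit they were created with, so set this
// before the first outgoing request.
ServicePointManager.DefaultConnectionLimit = 20;

Console.WriteLine($"Default connection limit: {ServicePointManager.DefaultConnectionLimit}");
```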

See also: Maximum concurrent requests for WebClient, HttpWebRequest, and HttpClient

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution sources:
Solution 1: Stephen Cleary
Solution 2: Matt Rose