'Why should I avoid using the increase assignment operator (+=) to create a collection
The increase assignment operator (+=
) is often used in [PowerShell]
questions and answers at the StackOverflow site to construct a collection objects, e.g.:
$Collection = @()
1..$Size | ForEach-Object {
$Collection += [PSCustomObject]@{Index = $_; Name = "Name$_"}
}
Yet it appears an very inefficient operation.
Is it Ok to generally state that the increase assignment operator (+=
) should be avoided for building an object collection in PowerShell?
Solution 1:[1]
Yes, the increase assignment operator (+=
) should be avoided for building an object collection, see also: PowerShell scripting performance considerations.
Apart from the fact that using the +=
operator usually requires more statements (because of the array initialization = @()
) and it encourages to store the whole collection in memory rather then push it intermediately into the pipeline, it is inefficient.
The reason it is inefficient is because every time you use the +=
operator, it will just do:
$Collection = $Collection + $NewObject
Because arrays are immutable in terms of element count, the whole collection will be recreated with every iteration.
The correct PowerShell syntax is:
$Collection = 1..$Size | ForEach-Object {
[PSCustomObject]@{Index = $_; Name = "Name$_"}
}
Note: as with other cmdlets; if there is just one item (iteration), the output will be a scalar and not an array, to force it to an array, you might either us the [Array]
type: [Array]$Collection = 1..$Size | ForEach-Object { ... }
or use the Array subexpression operator @( )
: $Collection = @(1..$Size | ForEach-Object { ... })
Where it is recommended to not even store the results in a variable ($a = ...
) but immediately pass it into the pipeline to save memory, e.g.:
1..$Size | ForEach-Object {
[PSCustomObject]@{Index = $_; Name = "Name$_"}
} | ConvertTo-Csv .\Outfile.csv
Note: Using the System.Collections.ArrayList
class could also be considered, this is generally almost as fast as the PowerShell pipeline but the disadvantage is that it consumes a lot more memory than (properly) using the PowerShell pipeline.
see also: Fastest Way to get a uniquely index item from the property of an array and Array causing 'system.outofmemoryexception'
Performance measurement
To show the relation with the collection size and the decrease of performance you might check the following test results:
1..20 | ForEach-Object {
$size = 1000 * $_
$Performance = @{Size = $Size}
$Performance.Pipeline = (Measure-Command {
$Collection = 1..$Size | ForEach-Object {
[PSCustomObject]@{Index = $_; Name = "Name$_"}
}
}).Ticks
$Performance.Increase = (Measure-Command {
$Collection = @()
1..$Size | ForEach-Object {
$Collection += [PSCustomObject]@{Index = $_; Name = "Name$_"}
}
}).Ticks
[pscustomobject]$Performance
} | Format-Table *,@{n='Factor'; e={$_.Increase / $_.Pipeline}; f='0.00'} -AutoSize
Size Increase Pipeline Factor
---- -------- -------- ------
1000 1554066 780590 1.99
2000 4673757 1084784 4.31
3000 10419550 1381980 7.54
4000 14475594 1904888 7.60
5000 23334748 2752994 8.48
6000 39117141 4202091 9.31
7000 52893014 3683966 14.36
8000 64109493 6253385 10.25
9000 88694413 4604167 19.26
10000 104747469 5158362 20.31
11000 126997771 6232390 20.38
12000 148529243 6317454 23.51
13000 190501251 6929375 27.49
14000 209396947 9121921 22.96
15000 244751222 8598125 28.47
16000 286846454 8936873 32.10
17000 323833173 9278078 34.90
18000 376521440 12602889 29.88
19000 422228695 16610650 25.42
20000 475496288 11516165 41.29
Meaning that with a collection size of 20,000
objects using the +=
operator is about 40x
slower than using the PowerShell pipeline for this.
Steps to correct a script
Apparently some people struggle with correcting a script that already uses the increase assignment operator (+=
). Therefore, I have created a little instruction to do so:
- Remove all the
<variable> +=
assignments from the concerned iteration, just leave only the object item. By not assigning the object, the object will simply be put on the pipeline.
It doesn't matter if there are multiple increase assignments in the iteration or if there are embedded iterations or function, the end result will be the same.
Meaning, this:
ForEach ( ... ) {
$Array += $Object1
$Array += $Object2
ForEach ( ... ) {
$Array += $Object3
$Array += Get-Object
}
}
Is essentially the same as:
ForEach ( ... ) {
$Object1
$Object2
ForEach ( ... ) {
$Object3
Get-Object
}
}
Note: if there is no iteration, there is probably no reason to change your script as likely only concerns a few additions
- Assign the output of the iteration (everything that is put on the pipeline) to the concerned a variable. This is usually at the same level as where the array was initialized (
$Array = @()
). e.g.:
$Array = ForEach ( ... ) { ...
Note 1: Again, if you want single object to act as an array, you probably want to use the Array subexpression operator @( )
but you might also consider to do this at the moment you use the array, like: @($Array).Count
or ForEach ($Item in @($Array))
Note 2: Again, you're better off not assigning the output at all. Instead, pass the pipeline output directly to the next cmdlet to free up memory: ... | ForEach-Object {...} | Export-Csv .\File.csv
.
- Remove the array initialization
<Variable> = @()
For a full example, see: Comparing Arrays within Powershell
Note that the same applies to using +=
to build strings,
see: Is there a string concatenation shortcut in PowerShell?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |