I am parsing RoboCopy logs from millions of files. How can I make my code run faster?

New to StackOverflow, I'll do my best to post correctly :)

Hoping someone can help me to get my code running faster.

The code is run against RoboCopy Migration logs from a massive DFS server migration (20 DFS servers being migrated).

The code first captures the source/destination of the log in question and then looks for the 'Newer', 'Older', 'New File' and 'Extra File' entries/rows. It then checks to see if these files exist at each side, what attributes they have and does a DFSR hash check against both sides (as the files are now being replicated via DFSR).
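To make the row parsing concrete, here is a minimal sketch of pulling the entry type and the UNC path out of a log row. The sample rows are copied from the example log data at the end of this post; the pattern and the variable names (`$rowPattern`, `$parsed`) are illustrative and are not the ones used in the full script.

```powershell
# Sample rows taken from the example log data in this post
$sampleRows = @(
    '        Newer              30720 2021/07/20 14:49:36    \\Test01\Test2121.xls'
    '      *EXTRA File          1.7 m 2020/09/17 10:36:57    \\Test02\months.jpg'
    '        New File            6421 2020/10/26 10:32:43    \\Test01\26-10-20.pdf'
)

# Illustrative pattern: optional leading '*', the entry type, then the UNC path.
# -match is case-insensitive, so 'Extra File' also matches 'EXTRA File'.
$rowPattern = '^\s*\*?(?<Type>Newer|Older|New File|Extra File)\s.*?(?<Path>\\\\.+)$'

$parsed = foreach ($row in $sampleRows) {
    if ($row -match $rowPattern) {
        [PSCustomObject]@{
            Type = $Matches.Type
            Path = $Matches.Path.Trim()
        }
    }
}

$parsed | Format-Table -AutoSize
```

A single match per row like this would avoid running `-replace` and `Split` separately for the type and the path, although I have not measured whether it is faster on real logs.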

The main concern is whether the hashes match for source and destination and whether the temporary attribute is set.
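As an illustration of the temporary-attribute check, the sketch below tests the flag on a `[System.IO.FileAttributes]` value directly rather than on a real file. The sample attribute combination is made up; `0x100` (256) is the numeric value of `FileAttributes.Temporary`, which is what the script masks against.

```powershell
# Hypothetical attribute value standing in for (Get-Item $path).Attributes
$attributes = [System.IO.FileAttributes]'Archive, Temporary'

# 0x100 (256) is the numeric value of the Temporary flag
$temporaryFlag = [System.IO.FileAttributes]::Temporary
$flagValue     = [int]$temporaryFlag

# Bitwise test, equivalent to ($attributes -band 0x100) -eq 0x100
$hasTempByMask = ($attributes -band $temporaryFlag) -eq $temporaryFlag

# HasFlag() expresses the same check more readably
$hasTempByFlag = $attributes.HasFlag($temporaryFlag)

"Flag value: $flagValue, mask test: $hasTempByMask, HasFlag test: $hasTempByFlag"
```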

The problem is that millions of files are logged under these types (the migration was gargantuan), so the script takes forever to run. To add to this, the client will not open the ports needed for PSRemoting/Invoke-Command.

At present I am running my code without multi-threading, with a copy on each of the DFS servers processing its own logs, but it is still slow.

I have been looking at running a foreach -parallel over each log row (not over the loop of log files), but:

  1. With so much data in each log/loop, my understanding is that I have to write results out as I go rather than keep them in a PSCustomObject collection; otherwise I would run out of RAM?
  2. I don't really understand how to use mutexes to coordinate multiple writers to the same CSV.

Can someone please advise me on the above 2 points? And maybe give me some more ideas on what I can do to optimise things?
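For context on point 2, the kind of pattern being asked about might look like the minimal sketch below. It assumes PowerShell 7+ (for `ForEach-Object -Parallel`, which is not available in Windows PowerShell 5.1 outside workflows). The mutex name `RoboCsvLock`, the temp-folder output path, and the dummy rows are all hypothetical and are not part of the full script.

```powershell
# Assumes PowerShell 7+; the path, mutex name and rows are illustrative only
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) "MutexDemo_$PID.csv"
Remove-Item $csvPath -ErrorAction SilentlyContinue

1..20 | ForEach-Object -Parallel {
    # Each runspace builds its own row (stand-in for the per-file checks)
    $row = [PSCustomObject]@{ Id = $_; Status = 'Checked' }

    # A named mutex is shared by every runspace that opens the same name
    $mutex = [System.Threading.Mutex]::new($false, 'RoboCsvLock')
    [void]$mutex.WaitOne()        # block until this runspace owns the lock
    try {
        # Only one runspace at a time reaches this append
        $row | Export-Csv -Path $using:csvPath -NoTypeInformation -Append
    }
    finally {
        $mutex.ReleaseMutex()     # always release, even if the write fails
        $mutex.Dispose()
    }
} -ThrottleLimit 4

$rowCount = @(Import-Csv $csvPath).Count
"Rows written: $rowCount"
```

If the parallel block simply emits its result objects, piping the whole `ForEach-Object -Parallel` output into a single `Export-Csv` streams rows to disk as they arrive, so neither a lock nor a large in-memory collection should be needed. That may be the simpler route for point 1 as well.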

My full code is below.

#Get Start Time
$ReportStartTime = (Get-Date).ToString('yyyy-MM-dd_HH-mm-ss')

If(!(test-path "C:\Temp\MasterReport_$ReportStartTime\")){
                    new-item -type directory -path "C:\Temp\MasterReport_$ReportStartTime\" | Out-Null
                    }

"Script Started:$ReportStartTime" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"

#Get Logs from folder (Recursive)
$Logs = Try{
            Get-ChildItem -path 'C:\Temp\RoboCopyLogs\*\*.log' -Recurse -ErrorAction Stop | Select FullName 
            }
            catch{
                $_.Exception >> "C:\Temp\MasterReport_$ReportStartTime\Errors_$ReportStartTime.log"
                }
            
#Initialise Log Counters
$NumberOfFiles = 0
$DesktopFile = 0
$ProcessedFiles = 0
$Totalsize = 0

#Count Logs
$Logs | foreach {

        $SourceLog = $_
        #Get Logfile
        $Log = Get-Content $SourceLog.FullName


        #Get Log rows for required Error Types and begin loop
         $Log | Select-String -Pattern '(^\t+ +(Newer|Older|New File|Extra File))' `
        |foreach {    
                $NumberOfFiles=$NumberOfFiles+1

                If($_ | Select-String -pattern 'Desktop.ini' -SimpleMatch){
                    $DesktopFile=$DesktopFile+1  
                }           
        }
}

$Expected = $NumberOfFiles - $DesktopFile

"Total Files To Check  = $NumberOfFiles" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
"Total Files Excluded  = $DesktopFile" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
"Total Files To Ingest = $Expected" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"

$Main = (Get-Date).ToString('yyyy-MM-dd_HH-mm-ss')
"Main Script:$Main" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"


$Logs | foreach {

        $SourceLog = $_
        #Get Logfile
        $Log = Get-Content $SourceLog.FullName

        #Collect Source and Destination
        $S = $Log | Select-String -Pattern 'Source :'
        $D = $Log | Select-String -Pattern 'Dest :' 

        $SourceLocation = $S -replace '\s+Source : ',''
        $DestLocation = $D -replace '\s+Dest : ',''


        #Get Log rows for required Error Types and begin loop
        $Log | Select-String -Pattern '(^\t+ +(Newer|Older|New File|Extra File))' | Select-String -pattern 'Desktop.ini' -SimpleMatch -NotMatch `
        |foreach {
#This loop could be a foreach -parallel???

                #Check Percent Completed                
                # -gt is the comparison operator; '>' would redirect output to a file named 0
                If($ProcessedFiles -gt 0){
                $PercentComplete=[Math]::Ceiling(($ProcessedFiles/$Expected)*100)
               
                    If($PercentComplete -match ('([0-9]0)')){
                        "$($PercentComplete)% Completed" > "C:\Temp\MasterReport_$ReportStartTime\PercentComplete.Log" 
                        ($ProcessedFiles/$Expected)*100               
                    }
                }

                #Count Logs Processed
                $ProcessedFiles=$ProcessedFiles+1  
                
                #Populate FilePath
                $FilePath = $_ -Replace '.*(?=\\\\)', ''

                #Populate Error type
                $RoboErrorRaw = $_ -replace '\s+','|'
                $RoboError = $RoboErrorRaw.split("|")[1]

                #Check if file path relates to Source or the Destination and set path variables
                if($FilePath -like "$SourceLocation*"){

                    $SourceFilePath = $FilePath
                    $DestFilePath = $FilePath.replace($SourceLocation,$DestLocation)
                    
                    }
                    Elseif($FilePath -like "$DestLocation*"){
                        $DestFilePath = $FilePath
                        $SourceFilePath = $FilePath.replace($DestLocation,$SourceLocation)
                        $IsAtPartner = Test-Path $SourceFilePath
                        }
                        Else{
                            $DestFilepath = "Could Not Resolve UNC to Source or Destination"
                        }
                   
                #Check if file exists at source and destination
                Try{
                    $IsAtPartner = Test-Path $DestFilePath -ErrorAction Stop
                    }
                        catch{
                        $IsAtPartner = $_.Exception
                        }

                Try{
                    $IsAtSource = Test-path $SourceFilePath -ErrorAction Stop
                    }
                        catch{
                        $IsAtSource = $_.Exception
                        }
                    
                
                If($IsAtSource){   
                        #Get the file details
                        Try{
                        # -Force includes hidden files; -Hidden alone returns only hidden items
                        $SourceFileDetails = Get-Item -LiteralPath $SourceFilePath -Force -ErrorAction Stop
                        }
                        catch{ 
                            $SourceFileDetails = 'Failed'   
                            }

                        if($SourceFileDetails -ne 'Failed'){
                            #Check has temp attribute
                            if((($SourceFileDetails).Attributes -band 0x100) -eq 0x100){
                                $TempAttribute = "Yes"
                                }
                                Else{
                                    $TempAttribute = "No"
                                    } 
                                #Get attributes and last modified
                                Try{
                                        $AllAttributes = ($SourceFileDetails).Attributes
                                    }
                                    catch{
                                        $AllAttributes = $_.Exception
                                    }
                    
                                Try{
                                    $Modified = ($SourceFileDetails).LastWriteTime.ToString()  
                                    }
                                    catch{
                                        $Modified = $_.Exception
                                    } 
                        }
                     }
  
                #Check if .bak file
                if($filePath -match '\.bak$'){
                $Bakfile = "Yes"  
                }
                    Else{
                        $Bakfile = "No"
                    }
       
                #Get Hashes
                If($IsAtPartner -and $IsAtSource){
                       $HashSource = (Get-DfsrFileHash -Path $SourceFilepath).FileHash
                       $HashDest = (Get-DfsrFileHash -Path $DestFilepath).FileHash
                    }
                      ElseIf(!$IsAtSource -and !$IsAtPartner){
                                $HashSource = 'File Does not Exist at Source'
                                $HashDest = 'File Does not Exist At Partner'
                                }  
                                ElseIf(!$IsAtPartner){
                                        $HashSource = (Get-DfsrFileHash -Path $SourceFilepath).FileHash
                                        $HashDest = 'File Does not Exist At Partner'
                                    }
                                    ElseIf(!$IsAtSource){
                                            $HashSource = 'File Does not Exist at Source'
                                            $HashDest = (Get-DfsrFileHash -Path $DestFilepath).FileHash
                                            }
                                            Else{
                                                    $HashSource = 'ERROR'
                                                    $HashDest = 'ERROR'
                                                }

                #Compare Valid Hashes
                If($HashSource -eq $HashDest){
                    $HashMatch = 'Yes'                    
                    }
                    Else{
                        $HashMatch = 'No'
                    }
                       
                #Check Filesize where hashes do not match
                # -eq compares; '=' would assign 'No' and always evaluate as true
                If($HashMatch -eq 'No'){
                $FileSizeMB = ($SourceFileDetails).length/1MB
                }

                #Create output object
                $Obj = [PSCustomObject]@{
                                        ErrorType = $RoboError
                                        FilePath = $SourceFilePath
                                        PartnerUNC = $DestFilePath
                                        IsAtSource = $IsAtSource
                                        IsAtDestination = $IsAtPartner
                                        BakFile = $Bakfile 
                                        TempAttribute = $TempAttribute
                                        LastModified = $Modified
                                        AllAttributes = $AllAttributes                                                
                                        HashSource = $HashSource
                                        HashDest = $HashDest
                                        HashMatch = $HashMatch                                        
                                        RoboSource = $SourceLocation
                                        RoboDest = $DestLocation
                                        FileSizeMB = $FileSizeMB
                                        SourceLog = $SourceLog.FullName
                                        } 

                $Source = $SourceLocation.split('\\')[2]
                $Destination = $DestLocation.split('\\')[2]

                if(!(test-path "C:\Temp\$($Source)-$($Destination)_$($ReportStartTime)")){
                    new-item -type directory -path "C:\Temp\$($Source)-$($Destination)_$($ReportStartTime)"  | Out-Null
                    }              

                #export to csv
                $obj |  Export-Csv -Path "C:\Temp\$($Source)-$($Destination)_$($ReportStartTime)\RoboCopyLogChecks_$ReportStartTime.csv" -NoTypeInformation -Append
                $obj |  Export-Csv -Path "C:\Temp\MasterReport_$ReportStartTime\RoboCopyLogChecks_$ReportStartTime.csv" -NoTypeInformation -Append
                
                #Increment total size of data
                If($HashMatch -eq "No"){
                    $Totalsize = $Totalsize + $SourceFileDetails.Length
                }
                
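                # Some of these may not have been set on this pass, so suppress "variable not found" errors
                clear-variable -name RoboError,SourceFilePath,DestFilePath,IsAtSource,IsAtPartner,Bakfile,TempAttribute,Modified,AllAttributes,HashSource,HashDest,HashMatch,FileSizeMB,Source,Destination -ErrorAction SilentlyContinue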
                
                if($SourceFileDetails){
                    Remove-Variable -name SourceFileDetails
                }                
            }
}
$Completion = (Get-Date).ToString('yyyy-MM-dd_HH-mm-ss')
"Script Completed:$Completion Excluded Processed = $DesktopFile ,Total Processed = $ProcessedFiles" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
"Files without Matching Hashes amount to $($Totalsize/1GB)GB" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"

Here is some example log data (could be put in C:\Temp\RoboCopyLogs\Logs\ to run with above code)

-------------------------------------------------------------------------------
   ROBOCOPY     ::     Robust File Copy for Windows                              
-------------------------------------------------------------------------------

  Started : 24 April 2022 17:29:57
   Source : \\Test01\
     Dest : \\Test02\

    Files : *.*
        
Exc Files : ~*.*
        *.TMP
        
 Exc Dirs : \\Test01\DfsrPrivate
        
  Options : *.* /FFT /TS /L /S /E /DCOPY:DA /COPY:DAT /PURGE /MIR /B /NP /XJD /MT:8 /R:0 /W:0 

------------------------------------------------------------------------------

        Newer              30720 2021/07/20 14:49:36    \\Test01\Test2121.xls
        Older             651776 2020/10/25 21:49:32    \\Test01\testppt.ppt
        Older              94720 2019/06/10 11:46:03    \\Test01\Thumbs.db
      *EXTRA File          1.7 m 2020/09/17 10:36:57    \\Test02\months.jpg
      *EXTRA File          1.8 m 2020/09/17 10:36:57    \\Test02\happy.jpg
        New File            6421 2020/10/26 10:32:43    \\Test01\26-10-20.pdf
        New File            6321 2020/10/26 10:32:43    \\Test01\Testing20.pdf


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
