I am parsing RoboCopy logs from millions of files. How can I make my code run faster?
New to StackOverflow, I'll do my best to post correctly :)
Hoping someone can help me to get my code running faster.
The code is run against RoboCopy Migration logs from a massive DFS server migration (20 DFS servers being migrated).
The code first captures the source and destination from the log in question and then looks for the 'Newer', 'Older', 'New File' and 'Extra File' rows. For each of those files it checks whether the file exists on each side and which attributes it has, and runs a DFSR hash check against both sides (as the files are now being replicated via DFSR).
The main concern is if the hashes match for source and destination and if the temporary attribute is in place.
The problem is that millions of files are logged under these types (the migration was gargantuan), so the script takes a very long time to run. On top of this, the client will not open the ports needed for PSRemoting/Invoke-Command.
At present I am running my code without multi-threading, with a copy on each of the DFS servers looking at their respective logs but it is still slow.
I have been looking at running a foreach -parallel over the log rows (not over the log files), but:
- With so much data in each log/loop, my understanding is that I have to write the results out rather than keep them in a PSCustomObject. Otherwise I would run out of RAM?
- I don't really understand how to use mutexes to allow multiple writers to append to the same CSV.
Can someone please advise me on the above 2 points? And maybe give me some more ideas on what I can do to optimise things?
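To make the second point concrete, one pattern sometimes used for concurrent writers is a named mutex around each append. The following is only a minimal sketch under stated assumptions, not a tested part of the script: `Add-LineWithMutex` and the mutex name `RoboCopyLogChecksCsv` are hypothetical, the helper appends pre-formatted text lines rather than calling Export-Csv, and all writers are assumed to run on the same machine (a named mutex is not shared between servers). Writing each row out as soon as it is produced also means results do not accumulate in memory.

```powershell
# Hypothetical helper (not part of the original script): append one line of
# text to a shared file while holding a named mutex, so that only one writer
# touches the file at a time.
function Add-LineWithMutex {
    param(
        [Parameter(Mandatory)] [string] $Path,            # full path to the output file
        [Parameter(Mandatory)] [string] $Line,            # one pre-formatted CSV row
        [string] $MutexName = 'RoboCopyLogChecksCsv'      # assumed name; any shared name works
    )

    # Every writer that opens a mutex with the same name gets the same lock.
    $mutex = New-Object System.Threading.Mutex($false, $MutexName)
    try {
        # Block until no other writer holds the mutex.
        $null = $mutex.WaitOne()
        try {
            [System.IO.File]::AppendAllText($Path, $Line + [Environment]::NewLine)
        }
        finally {
            # Always release, even if the write throws.
            $mutex.ReleaseMutex()
        }
    }
    finally {
        $mutex.Dispose()
    }
}
```

A data row for `-Line` could be produced from an object with `ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1`. The path should be a full path, because .NET file methods do not follow the PowerShell current location.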
My full code is below.
#Get Start Time
$ReportStartTime = (Get-Date).ToString('yyyy-MM-dd_HH-mm-ss')
If(!(test-path "C:\Temp\MasterReport_$ReportStartTime\")){
new-item -type directory -path "C:\Temp\MasterReport_$ReportStartTime\" | Out-Null
}
"Script Started:$ReportStartTime" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
#Get Logs from folder (Recursive)
$Logs = Try{
Get-ChildItem -path 'C:\Temp\RoboCopyLogs\*\*.log' -Recurse -ErrorAction Stop | Select FullName
}
catch{
$_.Exception >> "C:\Temp\MasterReport_$ReportStartTime\Errors_$ReportStartTime.log"
}
#Initialise Log Counters
$NumberOfFiles = 0
$DesktopFile = 0
$ProcessedFiles = 0
$Totalsize = 0
#Count Logs
$Logs | foreach {
$SourceLog = $_
#Get Logfile
$Log = Get-Content $SourceLog.FullName
#Get Log rows for required Error Types and begin loop
$Log | Select-String -Pattern '(^\t+ +(Newer|Older|New File|Extra File))' `
|foreach {
$NumberOfFiles=$NumberOfFiles+1
If($_ | Select-String -pattern 'Desktop.ini' -SimpleMatch){
$DesktopFile=$DesktopFile+1
}
}
}
$Expected = $NumberOfFiles - $DesktopFile
"Total Files To Check = $NumberOfFiles" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
"Total Files Excluded = $DesktopFile" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
"Total Files To Ingest = $Expected" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
$Main = (Get-Date).ToString('yyyy-MM-dd_HH-mm-ss')
"Main Script:$Main" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
$Logs | foreach {
$SourceLog = $_
#Get Logfile
$Log = Get-Content $SourceLog.FullName
#Collect Source and Destination
$S = $Log | Select-String -Pattern 'Source :'
$D = $Log | Select-String -Pattern 'Dest :'
$SourceLocation = $S -replace '\s+Source : ',''
$DestLocation = $D -replace '\s+Dest : ',''
#Get Log rows for required Error Types and begin loop
$Log | Select-String -Pattern '(^\t+ +(Newer|Older|New File|Extra File))' | Select-String -pattern 'Desktop.ini' -SimpleMatch -NotMatch `
|foreach {
#This loop could be a foreach -parallel???
#Check Percent Completed
If($ProcessedFiles -gt 0){
$PercentComplete=[Math]::Ceiling(($ProcessedFiles/$Expected)*100)
If($PercentComplete -match ('([0-9]0)')){
"$($PercentComplete)% Completed" > "C:\Temp\MasterReport_$ReportStartTime\PercentComplete.Log"
($ProcessedFiles/$Expected)*100
}
}
#Count Logs Processed
$ProcessedFiles=$ProcessedFiles+1
#Populate FilePath
$FilePath = $_ -Replace '.*(?=\\\\)', ''
#Populate Error type
$RoboErrorRaw = $_ -replace '\s+','|'
$RoboError = $RoboErrorRaw.split("|")[1]
#Check if file path relates to Source or the Destination and set path variables
if($FilePath -like "$SourceLocation*"){
$SourceFilePath = $FilePath
$DestFilePath = $FilePath.replace($SourceLocation,$DestLocation)
}
Elseif($FilePath -like "$DestLocation*"){
$DestFilePath = $FilePath
$SourceFilePath = $FilePath.replace($DestLocation,$SourceLocation)
$IsAtPartner = Test-Path $SourceFilePath
}
Else{
$DestFilepath = "Could Not Resolve UNC to Source or Destination"
}
#Check if file exists at source and destination
Try{
$IsAtPartner = Test-Path $DestFilePath -ErrorAction Stop
}
catch{
$IsAtPartner = $_.Exception
}
Try{
$IsAtSource = Test-path $SourceFilePath -ErrorAction Stop
}
catch{
$IsAtSource = $_.Exception
}
If($IsAtSource){
#Get the file details
Try{
$SourceFileDetails = Get-ChildItem $SourceFilePath -Force -ErrorAction Stop
}
catch{
$SourceFileDetails = 'Failed'
}
if($SourceFileDetails -ne 'Failed'){
#Check has temp attribute
if((($SourceFileDetails).Attributes -band 0x100) -eq 0x100){
$TempAttribute = "Yes"
}
Else{
$TempAttribute = "No"
}
#Get attributes and last modified
Try{
$AllAttributes = ($SourceFileDetails).Attributes
}
catch{
$AllAttributes = $_.Exception
}
Try{
$Modified = ($SourceFileDetails).LastWriteTime.ToString()
}
catch{
$Modified = $_.Exception
}
}
}
#Check if .bak file
if($filePath -match '\.bak$'){
$Bakfile = "Yes"
}
Else{
$Bakfile = "No"
}
#Get Hashes
If($IsAtPartner -and $IsAtSource){
$HashSource = (Get-DfsrFileHash -Path $SourceFilepath).FileHash
$HashDest = (Get-DfsrFileHash -Path $DestFilepath).FileHash
}
ElseIf(!$IsAtSource -and !$IsAtPartner){
$HashSource = 'File Does not Exist at Source'
$HashDest = 'File Does not Exist At Partner'
}
ElseIf(!$IsAtPartner){
$HashSource = (Get-DfsrFileHash -Path $SourceFilepath).FileHash
$HashDest = 'File Does not Exist At Partner'
}
ElseIf(!$IsAtSource){
$HashSource = 'File Does not Exist at Source'
$HashDest = (Get-DfsrFileHash -Path $DestFilepath).FileHash
}
Else{
$HashSource = 'ERROR'
$HashDest = 'ERROR'
}
#Compare Valid Hashes
If($HashSource -eq $HashDest){
$HashMatch = 'Yes'
}
Else{
$HashMatch = 'No'
}
#Check Filesize where hashes do not match
If($HashMatch -eq 'No'){
$FileSizeMB = ($SourceFileDetails).length/1MB
}
Else{
$FileSizeMB = $null
}
#Create output object
$Obj = [PSCustomObject]@{
ErrorType = $RoboError
FilePath = $SourceFilePath
PartnerUNC = $DestFilePath
IsAtSource = $IsAtSource
IsAtDestination = $IsAtPartner
BakFile = $Bakfile
TempAttribute = $TempAttribute
LastModified = $Modified
AllAttributes = $AllAttributes
HashSource = $HashSource
HashDest = $HashDest
HashMatch = $HashMatch
RoboSource = $SourceLocation
RoboDest = $DestLocation
FileSizeMB = $FileSizeMB
SourceLog = $SourceLog.FullName
}
$Source = $SourceLocation.split('\\')[2]
$Destination = $DestLocation.split('\\')[2]
if(!(test-path "C:\Temp\$($Source)-$($Destination)_$($ReportStartTime)")){
new-item -type directory -path "C:\Temp\$($Source)-$($Destination)_$($ReportStartTime)" | Out-Null
}
#export to csv
$obj | Export-Csv -Path "C:\Temp\$($Source)-$($Destination)_$($ReportStartTime)\RoboCopyLogChecks_$ReportStartTime.csv" -NoTypeInformation -Append
$obj | Export-Csv -Path "C:\Temp\MasterReport_$ReportStartTime\RoboCopyLogChecks_$ReportStartTime.csv" -NoTypeInformation -Append
#Increment total size of data where hashes do not match
If($HashMatch -eq "No"){
$Totalsize = $Totalsize + $SourceFileDetails.Length
}
clear-variable -name RoboError,SourceFilePath,DestFilePath,IsAtSource,IsAtPartner,Bakfile,TempAttribute,Modified,AllAttributes,HashSource,HashDest,HashMatch,FileSizeMB,Source,Destination
if($SourceFileDetails){
Remove-Variable -name SourceFileDetails
}
}
}
$Completion = (Get-Date).ToString('yyyy-MM-dd_HH-mm-ss')
"Script Completed:$Completion Excluded Processed = $DesktopFile ,Total Processed = $ProcessedFiles" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
"Files without Matching Hashes amount to $($Totalsize/1GB)GB" >> "C:\Temp\MasterReport_$ReportStartTime\Log_$ReportStartTime.Log"
Here is some example log data (could be put in C:\Temp\RoboCopyLogs\Logs\ to run with above code)
-------------------------------------------------------------------------------
ROBOCOPY :: Robust File Copy for Windows
-------------------------------------------------------------------------------
Started : 24 April 2022 17:29:57
Source : \\Test01\
Dest : \\Test02\
Files : *.*
Exc Files : ~*.*
*.TMP
Exc Dirs : \\Test01\DfsrPrivate
Options : *.* /FFT /TS /L /S /E /DCOPY:DA /COPY:DAT /PURGE /MIR /B /NP /XJD /MT:8 /R:0 /W:0
------------------------------------------------------------------------------
Newer 30720 2021/07/20 14:49:36 \\Test01\Test2121.xls
Older 651776 2020/10/25 21:49:32 \\Test01\testppt.ppt
Older 94720 2019/06/10 11:46:03 \\Test01\Thumbs.db
*EXTRA File 1.7 m 2020/09/17 10:36:57 \\Test02\months.jpg
*EXTRA File 1.8 m 2020/09/17 10:36:57 \\Test02\happy.jpg
New File 6421 2020/10/26 10:32:43 \\Test01\26-10-20.pdf
New File 6321 2020/10/26 10:32:43 \\Test01\Testing20.pdf
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow