'Delete large number of rows (e.g. ~500k rows) based on a certain criteria
I have a large number of rows and columns (e.g. 500k rows and 20 columns) all filled with numbers.
I'm trying to delete all data in column I that has a certain value (e.g. less than or equal to 8), but when I try to use autofilter to delete the values, it freezes up Excel and doesn't delete.
It works quickly for data in column A. I remade similar data in a new sheet to make sure all cells were filled, no columns/rows were hidden etc.
Why is it freezing up for column I?
Sub DeleteRow()
Application.DisplayAlerts = False
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
Dim ws As Worksheet
Dim rng As Range
Dim lastRow As Long
Set ws = ActiveWorkbook.Sheets("Sheet1")
'filter and delete all but header row which is in row 3
lastRow = ws.Range("I" & ws.Rows.count).End(xlUp).row
MsgBox lastRow
Set rng = ws.Range("I3:I" & lastRow)
' filter and delete all but header row
With rng
.AutoFilter Field:=1, Criteria1:="<=8"
.SpecialCells(xlCellTypeVisible).EntireRow.Delete
End With
' turn off the filters
If ActiveSheet.FilterMode Then
ActiveSheet.ShowAllData
End If
Application.DisplayAlerts = True
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
End Sub
Solution 1:[1]
There are a lot of posts on SO about deleting rows, some good, some not so good.
Two common ones are the Autofilter (which you are using) and building a range with Union (one of which David has linked you to).
For a data set of this size and this many deletions, you will find any method that uses references to Excel worksheet methods (such as AutoFilter, Find, Sort, Union, Formula's etc) slow. Some will be better than others, depending on the exact nature of your data.
There is another method that may work for you. That is to not actually Delete the rows, but to overwrite the data with a modified version.
Note that this only work if you DO NOT have any formulas (either on this sheet or any other) that refer to individual cells in the data being processed (whole column references should be OK, but YMMV)
I ran this code on a sample data set 500k rows, 20 columns of random numbers 1..32 (so about 25% or rows deleted)
This ran in ~10s
Sub DeleteRows2()
Dim ws As Worksheet
Dim rng As Range
Dim i As Long, j As Long
Dim NewI As Long
Dim dat, NewDat
Dim TestCol As Long
Dim Threashold As Long
Dim LastRow As Long, LastCol As Long
Dim t1 As Single, t2 As Single
t1 = Timer()
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
TestCol = 9
Threashold = 8
Set ws = Sheet1
With ws
Set rng = .Range(.Cells(.Rows.Count, 1).End(xlUp), .Cells(1, .Columns.Count).End(xlToLeft))
End With
dat = rng.Value2
ReDim NewDat(1 To UBound(dat, 1), 1 To UBound(dat, 2))
LastRow = UBound(dat, 1)
LastCol = UBound(dat, 2)
NewI = 0
For i = 1 To LastRow
If dat(i, TestCol) > Threashold Then
NewI = NewI + 1
For j = 1 To LastCol
NewDat(NewI, j) = dat(i, j)
Next
End If
Next
rng = NewDat
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
t2 = Timer()
MsgBox "deleted in " & t2 - t1 & "s"
End Sub
Solution 2:[2]
I would use Range.Resize in VBA. Deletes 200.000 rows in a few seconds. See example below:
With wb3.Sheets("Changed Confirmations")
Set HistoryTable = .ListObjects("ChangedConfirmations")
With HistoryTable
.ShowAutoFilter = True
.Range.AutoFilter Field:=1, Criteria1:="<=" & EndDate, Operator:=xlOr, Criteria2:="="
On Error Resume Next
Application.DisplayAlerts = False
.Range.Resize(.Range.Rows.Count - 1).Offset(1).SpecialCells(xlCellTypeVisible).Delete
Application.DisplayAlerts = True
On Error goto 0
.ShowAutoFilter = False
End With
End With
Solution 3:[3]
first off, with 100ks records you'd better switch to some database oriented software
sticking to Excel, if you don't mind reordering records, this is quite fast:
Option Explicit
Sub DeleteRows()
Application.DisplayAlerts = False
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
Dim rng As Range
With ActiveWorkbook.Sheets("Sheet1")
Set rng = .Range("I2", .Cells(.Rows.Count, "I").End(xlUp))
With .Sort
.SortFields.Clear
.SortFields.Add Key:=rng(1), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
.SetRange rng.CurrentRegion
.Header = xlYes
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With
With rng
.AutoFilter Field:=1, Criteria1:="<=8"
.Offset(1).Resize(.Rows.Count - 1).SpecialCells(xlCellTypeVisible).EntireRow.Delete
End With
.AutoFilterMode = False
End With
Application.DisplayAlerts = True
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
End Sub
if you mind records order it can be simply twicked to keep it
Solution 4:[4]
Trying putting the spreadsheet in csv format into Notepad++ - from there you should be able to remove empty lines and other multiple entries very quickly and easily = you will have to download and install notepad++ but it completely free
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | cigien |
Solution 3 | DisplayName |
Solution 4 | user17469156 |