'Delete large number of rows (e.g. ~500k rows) based on a certain criteria

I have a large number of rows and columns (e.g. 500k rows and 20 columns) all filled with numbers.

I'm trying to delete all data in column I that has a certain value (e.g. less than or equal to 8), but when I try to use autofilter to delete the values, it freezes up Excel and doesn't delete.

It works quickly for data in column A. I remade similar data in a new sheet to make sure all cells were filled, no columns/rows were hidden etc.

Why is it freezing up for column I?

Sub DeleteRow()

    Application.DisplayAlerts = False
    Application.Calculation = xlCalculationManual
    Application.ScreenUpdating = False

    Dim ws As Worksheet
    Dim rng As Range
    Dim lastRow As Long

    Set ws = ActiveWorkbook.Sheets("Sheet1")

    'filter and delete all but header row which is in row 3
    lastRow = ws.Range("I" & ws.Rows.count).End(xlUp).row
    MsgBox lastRow
    Set rng = ws.Range("I3:I" & lastRow)

    ' filter and delete all but header row
    With rng
         .AutoFilter Field:=1, Criteria1:="<=8"
         .SpecialCells(xlCellTypeVisible).EntireRow.Delete
    End With

    ' turn off the filters
    If ActiveSheet.FilterMode Then
        ActiveSheet.ShowAllData
    End If

    Application.DisplayAlerts = True
    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
End Sub


Solution 1:[1]

There are a lot of posts on SO about deleting rows, some good, some not so good.

Two common ones are the Autofilter (which you are using) and building a range with Union (one of which David has linked you to).

For a data set of this size and this many deletions, you will find any method that uses references to Excel worksheet methods (such as AutoFilter, Find, Sort, Union, Formula's etc) slow. Some will be better than others, depending on the exact nature of your data.

There is another method that may work for you. That is to not actually Delete the rows, but to overwrite the data with a modified version.

Note that this only work if you DO NOT have any formulas (either on this sheet or any other) that refer to individual cells in the data being processed (whole column references should be OK, but YMMV)

I ran this code on a sample data set 500k rows, 20 columns of random numbers 1..32 (so about 25% or rows deleted)

This ran in ~10s

Sub DeleteRows2()
    Dim ws As Worksheet
    Dim rng As Range
    Dim i As Long, j As Long
    Dim NewI As Long
    Dim dat, NewDat
    
    Dim TestCol As Long
    Dim Threashold As Long
    Dim LastRow  As Long, LastCol As Long
    Dim t1 As Single, t2 As Single
    
    t1 = Timer()
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    
    TestCol = 9
    Threashold = 8
    
    Set ws = Sheet1
    With ws
        Set rng = .Range(.Cells(.Rows.Count, 1).End(xlUp), .Cells(1, .Columns.Count).End(xlToLeft))
    End With
    dat = rng.Value2
    ReDim NewDat(1 To UBound(dat, 1), 1 To UBound(dat, 2))
    
    LastRow = UBound(dat, 1)
    LastCol = UBound(dat, 2)
    
    NewI = 0
    For i = 1 To LastRow
        If dat(i, TestCol) > Threashold Then
            NewI = NewI + 1
            For j = 1 To LastCol
                NewDat(NewI, j) = dat(i, j)
            Next
        End If
    Next
    
    rng = NewDat
    
    Application.ScreenUpdating = True
    Application.Calculation = xlCalculationAutomatic
    
    t2 = Timer()
    MsgBox "deleted in " & t2 - t1 & "s"
End Sub

Solution 2:[2]

I would use Range.Resize in VBA. Deletes 200.000 rows in a few seconds. See example below:

With wb3.Sheets("Changed Confirmations")
Set HistoryTable = .ListObjects("ChangedConfirmations")

With HistoryTable
    .ShowAutoFilter = True

    .Range.AutoFilter Field:=1, Criteria1:="<=" & EndDate, Operator:=xlOr, Criteria2:="="
    
    On Error Resume Next
    Application.DisplayAlerts = False
        .Range.Resize(.Range.Rows.Count - 1).Offset(1).SpecialCells(xlCellTypeVisible).Delete
    Application.DisplayAlerts = True
    On Error goto 0

    .ShowAutoFilter = False

    End With
End With

Solution 3:[3]

first off, with 100ks records you'd better switch to some database oriented software

sticking to Excel, if you don't mind reordering records, this is quite fast:

Option Explicit

Sub DeleteRows()

    Application.DisplayAlerts = False
    Application.Calculation = xlCalculationManual
    Application.ScreenUpdating = False

    Dim rng As Range

    With ActiveWorkbook.Sheets("Sheet1")
        Set rng = .Range("I2", .Cells(.Rows.Count, "I").End(xlUp))
        With .Sort
            .SortFields.Clear
            .SortFields.Add Key:=rng(1), SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal

            .SetRange rng.CurrentRegion
            .Header = xlYes
            .MatchCase = False
            .Orientation = xlTopToBottom
            .SortMethod = xlPinYin
            .Apply
        End With

        With rng
            .AutoFilter Field:=1, Criteria1:="<=8"
            .Offset(1).Resize(.Rows.Count - 1).SpecialCells(xlCellTypeVisible).EntireRow.Delete
        End With
        .AutoFilterMode = False
    End With


    Application.DisplayAlerts = True
    Application.Calculation = xlCalculationAutomatic
    Application.ScreenUpdating = True
End Sub

if you mind records order it can be simply twicked to keep it

Solution 4:[4]

Trying putting the spreadsheet in csv format into Notepad++ - from there you should be able to remove empty lines and other multiple entries very quickly and easily = you will have to download and install notepad++ but it completely free

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 cigien
Solution 3 DisplayName
Solution 4 user17469156