'Trimming site names

I'm trying to trim any combination of a webname to just web.co.

Example

1.htt:pubhorn.co
2.www.pubhorn.co
3.htt://pubhorn.co
4.pubhorn.co/awawa

I want to output pubhorn.co, if the format is already like that move on to the next.

I found this code

Dim a As Variant
Dim i As Long
a = Range("A2").Resize(Cells(Rows.Count, 1).End(xlUp).Row - 1)
With CreateObject("VBScript.RegExp")
    .Pattern = "([\A-Z\.])\w+[\.\w+]+\w+"
    For i = 1 To UBound(a)
        If a(i, 1) <> "" Then
            a(i, 1) = .Execute(a(i, 1))(0)
            If (Left(a(i, 1), 1)) = "." Then a(i, 1) = Mid(a(i, 1), 2, 255)
        End If
    Next
End With
Cells(2, 2).Resize(UBound(a)) = a

My initial run of the above code.

Test data

1.  htp://www.PubHorn01.co/asasasasas
2.  htp://PubHorn01.co/asasasasas
3.  htps://www.PubHorn01.co.ar/asasas/asasasa
4.  ww.PubHorn01.com.xx.yy/aw
5.  ww.PubHorn01.tk
6.  ww.PubHorn01.ifo
7.  PubHorn01.ino
8.  ww.john.blogspot.co
9.  john.blogspot.co
10. htps://PubHorn01.co/asasasasas
11. htps://www.john.blogspot.c
12. htp://z--z.co/
13. www.man-whole.co
14. http://yongqing.is-programmer.co/ewr
15. man-whole.co
16. ww.man-world-wide.cm/werwer
17. mn-world-wide.co/werwerwer
18. http://www.yumpu.co
19. ww.yuniti.co
20. htp://www.yunjii.co
21. htp://www.yuppee.co/write-for-us/
22. htps://www.yuuby.co/
23. htps://www.yydocklandslocksmiths.co.uk/
24. htp://www.yyool.co
25. htp://zalivaet.spb.ru/

Output

1.  PubHorn01.co
2.  PubHorn01.co
3.  PubHorn01.co.ar
4.  PubHorn01.co.xx.yy
5.  PubHorn01.tk
6.  PubHorn01.ino
7.  PubHorn01.ino
8.  john.blogspot.co
9.  blogspot.co <-- wrong
10. PubHorn01.co
11. john.blogspot.co
12. com <-- wrong
13. man <-- wrong
14. com <-- wrong
15. com <-- wrong
16. man <-- wrong
17. com <-- wrong
18. yumpu.co
19. yuniti.co
20. yunjii.co
21. yuppee.co
22. yuuby.co
23. yydocklandslocksmiths.co.uk
24. yyool.co
25. spb.ru

I want to understand the concept of how it works but the .Pattern code is new to me.

The code hits a bug on the below entry.

9.  john.blogspot.co
12. htt://z--z.co/
13. www.man-whole.co
14. htt://yongqing.is-programmer.co/ewr
15. man-whole.co
16. www.man-world-wide.co/werwer
17. man-world-wide.co/werwerwer

Output

9.  blogspot.com
12. com
13. man
14. com
15. com
16. man
17. com

It seems the code deletes symbols like - signs march-april.com etc.
It also bugs out if the website has double periods within its name, john.blogspot.

This should be what I need to edit.

With CreateObject("VBScript.RegExp") <--first time to see this
        .Pattern = "([\A-Z\.])\w+[\.\w+]+\w+" <- dont really understand 
      `this` but i have a hunch this is what i need to edit
        For i = 1 To UBound(a) <- first time to see Ubound
        If a(i, 1) <> "" Then <-- dont know
            a(i, 1) = .Execute(a(i, 1))(0) <-- dont know
            If (Left(a(i, 1), 1)) = "." Then a(i, 1) = Mid(a(i, 1), 2, 
           `255)`
            End If

I understand some part of the code but I'm lost with this part.



Solution 1:[1]

I figured this regex could work to identify everything to be removed (htp:// and www. and subdomains):

(\w+:\/\/)|w+\.|(?<=\w)\/[\w-]*

but VBA doesn't support Lookbehind and I couldn't make it work with OR ( | ) statements either.

So I figured Replacing each part separately with an empty string ("") should do the trick:

Sub trim_site_names()
    Dim a As Variant
    Dim i As Long
    
    ' Remove everything before :// and :// itself
    a = Range("A1").Resize(Cells(Rows.Count, 1).End(xlUp).Row - 1)
    With CreateObject("VBScript.RegExp")
        .Pattern = "\w+:\/\/"
        For i = 1 To UBound(a)
            If a(i, 1) <> "" Then
                a(i, 1) = .Replace(a(i, 1), "")
                If (Left(a(i, 1), 1)) = "." Then
                    a(i, 1) = Mid(a(i, 1), 2, 255)
                End If
            End If
        Next
    End With
    Cells(1, 2).Resize(UBound(a)) = a
    
    ' Remove www. or ww. or w.
    b = Range("B1").Resize(Cells(Rows.Count, 1).End(xlUp).Row - 1)
    With CreateObject("VBScript.RegExp")
        .Pattern = "^w+\."
        For i = 1 To UBound(b)
            If b(i, 1) <> "" Then
                b(i, 1) = .Replace(b(i, 1), "")
                If (Left(b(i, 1), 1)) = "." Then
                    b(i, 1) = Mid(b(i, 1), 2, 255)
                End If
            End If
        Next
    End With
    Cells(1, 2).Resize(UBound(b)) = b
    
    ' Remove /subdomains
    c = Range("B1").Resize(Cells(Rows.Count, 1).End(xlUp).Row - 1)
    With CreateObject("VBScript.RegExp")
        .Pattern = "\/.*"
        For i = 1 To UBound(c)
            If c(i, 1) <> "" Then
                c(i, 1) = .Replace(c(i, 1), "")
                If (Left(c(i, 1), 1)) = "." Then
                    c(i, 1) = Mid(c(i, 1), 2, 255)
                End If
            End If
        Next
    End With
    Cells(1, 2).Resize(UBound(c)) = c
    
    
End Sub

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1