'Trimming site names
I'm trying to trim any combination of a webname to just web.co.
Example
1.htt:pubhorn.co
2.www.pubhorn.co
3.htt://pubhorn.co
4.pubhorn.co/awawa
I want to output pubhorn.co, if the format is already like that move on to the next.
I found this code
Dim a As Variant
Dim i As Long
a = Range("A2").Resize(Cells(Rows.Count, 1).End(xlUp).Row - 1)
With CreateObject("VBScript.RegExp")
.Pattern = "([\A-Z\.])\w+[\.\w+]+\w+"
For i = 1 To UBound(a)
If a(i, 1) <> "" Then
a(i, 1) = .Execute(a(i, 1))(0)
If (Left(a(i, 1), 1)) = "." Then a(i, 1) = Mid(a(i, 1), 2, 255)
End If
Next
End With
Cells(2, 2).Resize(UBound(a)) = a
My initial run of the above code.
Test data
1. htp://www.PubHorn01.co/asasasasas
2. htp://PubHorn01.co/asasasasas
3. htps://www.PubHorn01.co.ar/asasas/asasasa
4. ww.PubHorn01.com.xx.yy/aw
5. ww.PubHorn01.tk
6. ww.PubHorn01.ifo
7. PubHorn01.ino
8. ww.john.blogspot.co
9. john.blogspot.co
10. htps://PubHorn01.co/asasasasas
11. htps://www.john.blogspot.c
12. htp://z--z.co/
13. www.man-whole.co
14. http://yongqing.is-programmer.co/ewr
15. man-whole.co
16. ww.man-world-wide.cm/werwer
17. mn-world-wide.co/werwerwer
18. http://www.yumpu.co
19. ww.yuniti.co
20. htp://www.yunjii.co
21. htp://www.yuppee.co/write-for-us/
22. htps://www.yuuby.co/
23. htps://www.yydocklandslocksmiths.co.uk/
24. htp://www.yyool.co
25. htp://zalivaet.spb.ru/
Output
1. PubHorn01.co
2. PubHorn01.co
3. PubHorn01.co.ar
4. PubHorn01.co.xx.yy
5. PubHorn01.tk
6. PubHorn01.ino
7. PubHorn01.ino
8. john.blogspot.co
9. blogspot.co <-- wrong
10. PubHorn01.co
11. john.blogspot.co
12. com <-- wrong
13. man <-- wrong
14. com <-- wrong
15. com <-- wrong
16. man <-- wrong
17. com <-- wrong
18. yumpu.co
19. yuniti.co
20. yunjii.co
21. yuppee.co
22. yuuby.co
23. yydocklandslocksmiths.co.uk
24. yyool.co
25. spb.ru
I want to understand the concept of how it works but the .Pattern
code is new to me.
The code hits a bug on the below entry.
9. john.blogspot.co
12. htt://z--z.co/
13. www.man-whole.co
14. htt://yongqing.is-programmer.co/ewr
15. man-whole.co
16. www.man-world-wide.co/werwer
17. man-world-wide.co/werwerwer
Output
9. blogspot.com
12. com
13. man
14. com
15. com
16. man
17. com
It seems the code deletes symbols like - signs march-april.com etc.
It also bugs out if the website has double periods within its name, john.blogspot.
This should be what I need to edit.
With CreateObject("VBScript.RegExp") <--first time to see this
.Pattern = "([\A-Z\.])\w+[\.\w+]+\w+" <- dont really understand
`this` but i have a hunch this is what i need to edit
For i = 1 To UBound(a) <- first time to see Ubound
If a(i, 1) <> "" Then <-- dont know
a(i, 1) = .Execute(a(i, 1))(0) <-- dont know
If (Left(a(i, 1), 1)) = "." Then a(i, 1) = Mid(a(i, 1), 2,
`255)`
End If
I understand some part of the code but I'm lost with this part.
Solution 1:[1]
I figured this regex could work to identify everything to be removed (htp:// and www. and subdomains):
(\w+:\/\/)|w+\.|(?<=\w)\/[\w-]*
but VBA doesn't support Lookbehind and I couldn't make it work with OR ( | ) statements either.
So I figured Replacing each part separately with an empty string ("") should do the trick:
Sub trim_site_names()
Dim a As Variant
Dim i As Long
' Remove everything before :// and :// itself
a = Range("A1").Resize(Cells(Rows.Count, 1).End(xlUp).Row - 1)
With CreateObject("VBScript.RegExp")
.Pattern = "\w+:\/\/"
For i = 1 To UBound(a)
If a(i, 1) <> "" Then
a(i, 1) = .Replace(a(i, 1), "")
If (Left(a(i, 1), 1)) = "." Then
a(i, 1) = Mid(a(i, 1), 2, 255)
End If
End If
Next
End With
Cells(1, 2).Resize(UBound(a)) = a
' Remove www. or ww. or w.
b = Range("B1").Resize(Cells(Rows.Count, 1).End(xlUp).Row - 1)
With CreateObject("VBScript.RegExp")
.Pattern = "^w+\."
For i = 1 To UBound(b)
If b(i, 1) <> "" Then
b(i, 1) = .Replace(b(i, 1), "")
If (Left(b(i, 1), 1)) = "." Then
b(i, 1) = Mid(b(i, 1), 2, 255)
End If
End If
Next
End With
Cells(1, 2).Resize(UBound(b)) = b
' Remove /subdomains
c = Range("B1").Resize(Cells(Rows.Count, 1).End(xlUp).Row - 1)
With CreateObject("VBScript.RegExp")
.Pattern = "\/.*"
For i = 1 To UBound(c)
If c(i, 1) <> "" Then
c(i, 1) = .Replace(c(i, 1), "")
If (Left(c(i, 1), 1)) = "." Then
c(i, 1) = Mid(c(i, 1), 2, 255)
End If
End If
Next
End With
Cells(1, 2).Resize(UBound(c)) = c
End Sub
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |