'SAX Parser in VBA
I am working on a vba macro to parse large xml files. Initially, i made use of DOM parser but due to memory leak the system hangs and that could not solve my problem. Now, i turn to using SAX handler as shown in the code below but I am getting an error. Kindly hep me check and let me know where I am wrong
Class Module
Name: clsBook
Option Explicit
Private mID As Integer
Private mAuthour As String
Private mTitle As String
Private mGenre As String
Private mPrice As String
Private mPublishDate As String
Private mDescription As String
Public Static Property Get ID() As Integer
ID = mID
End Property
Public Static Property Let ID(ByVal vNewValue As Integer)
mID = vNewValue
End Property
Public Static Property Get Authour() As String
Authour = mAuthour
End Property
Public Static Property Let Authour(ByVal vNewValue As String)
mAuthour = vNewValue
End Property
Public Property Get Title() As String
Title = mTitle
End Property
Public Property Let Title(ByVal vNewValue As String)
mTitle = vNewValue
End Property
Public Property Get Genre() As String
Genre = mGenre
End Property
Public Property Let Genre(ByVal vNewValue As String)
bGenre = mNewValue
End Property
Public Property Get Price() As String
Price = mPrice
End Property
Public Property Let Price(ByVal vNewValue As String)
mPrice = vNewValue
End Property
Public Property Get Description() As String
Description = mDescription
End Property
Public Property Let Description(ByVal vNewValue As String)
mDescription = vNewValue
End Property
Public Property Get PublishedDate() As String
PublishedDate = mPublishDate
End Property
Public Property Let PublishedDate(ByVal vNewValue As String)
mPublishDate = vNewValue
End Property
Class Module: ContentHandlerImpl
Option Explicit
Implements IVBSAXContentHandler
Private lCounter As Long
Private sNodeValues As String
Private bAuthor As Boolean
Private bTitle As Boolean
Private bPrice As Boolean
Private bGenre As Boolean
Private bDescription As Boolean
Private bPublishDate As Boolean
Private mBook As clsBook
Private mBooks As Collection
Private Sub IVBSAXContentHandler_characters(strChars As String)
If (bAuthor) Then
mBook.Authour = strChars
bAuthor = False
ElseIf (bTitle) Then
mBook.Title = strChars
bTitle = False
ElseIf (bGenre) Then
mBook.Genre = strChars
bGenre = False
ElseIf (bPrice) Then
mBook.Price = strChars
bPrice = False
ElseIf (bPublishDate) Then
mBook.PublishedDate = strChars
bPublishDate = False
ElseIf (bDescription) Then
mBook.Description = strChars
bDescription = False
End If
End Sub
Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS As MSXML2.IVBSAXLocator)
End Property
Private Sub IVBSAXContentHandler_endDocument()
End Sub
Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)
Select Case strLocalName
Case "book"
If mBooks Is Nothing Then
Set mBooks = New Collection
End If
mBooks.Add (mBook)
If mBook Is Not Nothing Then
Set mBook = Nothing
End If
Case Else
' do nothing
End Select
End Sub
Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix As String)
End Sub
Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars As String)
End Sub
Private Sub IVBSAXContentHandler_processingInstruction(strTarget As String, strData As String)
End Sub
Private Sub IVBSAXContentHandler_skippedEntity(strName As String)
End Sub
Private Sub IVBSAXContentHandler_startDocument()
End Sub
Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)
Select Case strLocalName
Case "book"
If mBook Is Nothing Then
Set mBook = New clsBook
End If
mBook.ID = CInt(oAttributes.getValueFromName("", "id"))
Case "author"
bAuthor = True
Case "title"
bTitle = True
Case "genre"
bGenre = True
Case "price"
bPrice = True
Case "publish_date"
bPublishDate = True
Case "description"
bDescription = True
Case Else
' do nothing
End Select
End Sub
Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix As String, strURI As String)
End Sub
Public Function getBooks() As Collection
getBooks = mBooks
End Function
Test Function
Sub main()
Dim saxReader As SAXXMLReader60
Dim saxhandler As ContentHandlerImpl
Dim iItems As Collection
Dim iItem As clsBook
Set saxReader = New SAXXMLReader60
Set saxhandler = New ContentHandlerImpl
Set saxReader.contentHandler = saxhandler
saxReader.Parse ThisWorkbook.Path & "\books.xml"
Set iItem = New clsBook
Set iItems = saxhandler.getBooks
For Each iItem In iItems
Debug.Print "ID: " & iItem.ID & vbCrLf & "Authour: " & iItem.Authour & vbCrLf & "Title: " & iItem.Title & vbCrLf
Next iItem
Set saxReader = Nothing
End Sub
'############Ps find below the point where I am getting the error
Sub main()
Set saxReader = New SAXXMLReader60
Set saxhandler = New ContentHandlerImpl
Set saxReader.contentHandler = saxhandler
saxReader.Parse ThisWorkbook.Path & "\books.xml"
Set saxReader = Nothing
End Sub
Please, can somebody let me know what i have done wrong or what i should do to resolve this issue.
Thanks
Solution 1:[1]
Error is System error: -2146828275 and the file is 9GB.
Thanks
Solution 2:[2]
Man, 9GB, that isn't large, that is huge! The sax parser should be able to handle it but even so it'll most likely be a bit of a stretch. The first thing I'd suggest is make everything a long, forget about using int.s, esp. in VBA. From memory an int in VB6/VBA is only 60k and change, it ain't gonna cut it. In fact even long in VB6 is max 2,147,483,647, and if you're doing a count internally for a 9GB file it'll probably pass this so just use double. Apart form this your machine's memory is going to be the limiting factor. It's been a while since you posted this - I'm curious as to how you went.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Samuel Owolabi |
Solution 2 | walbury |