How can I use BERT for an address matching problem?

I am building an address matching algorithm. The main problem is that previous models, such as the Conditional Random Fields (CRF) model from Parserator and the averaged perceptron from Libpostal, do not match address entities correctly.

I am using the free sample from AddressBase Premium: https://www.ordnancesurvey.co.uk/business-government/products/addressbase-premium

When I pass an address to the algorithm like this:

bert.parser('FLAT ABC 7-9 TEDWORTH SQUARE LONDON SW3 4DU')

I want it to return the parsed tokens with high precision:

             ('BuildingName', '7-9'),
             ('StreetName', 'TEDWORTH SQUARE'),
             ('TownName', 'LONDON'),
             ('Postcode', 'SW3 4DU')])

I have reviewed AddressNet, Usaddress, Deepmatcher and a BERT-based NER model for Chinese addresses: https://huggingface.co/cola/chinese-address-ner

I am looking for something similar for English addresses, using BERT (or an RNN/LSTM), to solve this problem.



Solution 1:[1]

NER is one option for matching addresses, but you have to prepare a labelled dataset to train the BERT model, with entity labels such as BuildingName, StreetName, TownName and Postcode. The BERT base model has no knowledge of these entities out of the box, so you have to teach it by fine-tuning on your own data.

For reference, see: https://medium.com/analytics-vidhya/creating-own-name-entity-recognition-using-bert-and-spacy-tourism-data-set-c5ee1c2955a2
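To make the dataset preparation step concrete, here is a minimal sketch of turning labelled address fields into word-level BIO tags, which is the usual input format for fine-tuning a token-classification (NER) model. The helper name `to_bio` is hypothetical, and the field split below only reuses the labels shown in the question; in practice the fields would come from your own labelled data (for example, the AddressBase columns).

```python
from typing import List, Tuple


def to_bio(fields: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (label, text) address fields into word-level BIO tags.

    The first word of each field gets a B- tag, later words get I- tags.
    """
    tagged = []
    for label, text in fields:
        for i, word in enumerate(text.split()):
            prefix = "B" if i == 0 else "I"
            tagged.append((word, f"{prefix}-{label}"))
    return tagged


# Illustrative example only: labels are taken from the question,
# and the fields are assumed to be listed in address order.
example = [
    ("BuildingName", "7-9"),
    ("StreetName", "TEDWORTH SQUARE"),
    ("TownName", "LONDON"),
    ("Postcode", "SW3 4DU"),
]

bio = to_bio(example)
for word, tag in bio:
    print(f"{word}\t{tag}")
```

Each (word, tag) pair then becomes one training row. A BERT tokenizer will usually split words further into subwords, so the tags still need to be aligned to subword tokens before training.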

Besides, addresses are an interesting topic: when people search for an address, they may rely on their own knowledge, type free text, or even make typos. Semantic search with cosine similarity could help here.
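As a rough illustration of the cosine-similarity idea, the sketch below ranks candidate addresses against a free-text query that contains a typo. Note the assumption: it uses character-trigram counts as a stand-in for real embeddings so that it runs with the standard library only. With BERT you would replace `trigram_vector` (a hypothetical helper) with sentence embeddings from a model, while the cosine step stays the same. The second candidate address is made up.

```python
import math
from collections import Counter
from typing import List, Tuple


def trigram_vector(text: str) -> Counter:
    """Count character trigrams (stand-in for a BERT sentence embedding)."""
    padded = f"  {text.upper().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query: str, candidates: List[str]) -> Tuple[str, float]:
    """Return the candidate with the highest cosine similarity to the query."""
    query_vec = trigram_vector(query)
    scored = [(c, cosine(query_vec, trigram_vector(c))) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


candidates = [
    "7-9 TEDWORTH SQUARE LONDON SW3 4DU",
    "12 EXAMPLE ROAD SAMPLETOWN ZZ1 1ZZ",  # made-up address for contrast
]

# Free-text query with a typo and an abbreviation
match, score = best_match("tedwrth sq london sw3", candidates)
print(match, round(score, 3))
```

Trigram overlap only tolerates spelling variation. It does not capture meaning, which is where embedding-based vectors would be expected to do better.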

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 pakwai122