Named Entity Recognition for British text

Mar 27, 2022

Performing Named Entity Recognition (NER) to automatically detect important pieces of information in text documents, such as personally identifiable information (PII), can be useful for several reasons:

To anonymize text
To help comply with regulatory requirements such as GDPR
To label training data for machine learning projects

We’ve discussed several techniques for performing NER and automatically finding useful words and phrases in text in this series of articles. We explored popular open source tools and techniques for identifying names of people, email addresses, phone numbers, credit card numbers, names of places, and so on.

Most efforts in the A.I. community to solve this problem for English text target an American audience. As a result, most of the available tools give worse results on British text than on text with US-style addresses, phone numbers, names, and so on.
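As a toy illustration of that format mismatch, the sketch below uses regular expressions in place of a trained model. Both patterns, the sample sentences, and the `find_us_entities` helper are simplified assumptions made for this example; they are not taken from any particular tool.

```python
import re

# Simplified, hypothetical patterns for US-style formats only.
US_PHONE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")  # e.g. 415-555-0132
US_ZIP = re.compile(r"\b\d{5}(?:-\d{4})?\b")             # e.g. 62704

us_text = "Call 415-555-0132 or write to us at Springfield, IL 62704."
uk_text = "Call 020 3477 9393 or write to us at London SW1A 1AA."


def find_us_entities(text):
    """Return every US-style phone number and ZIP code found in text."""
    return {
        "PHONE": US_PHONE.findall(text),
        "ZIP": US_ZIP.findall(text),
    }


print(find_us_entities(us_text))
print(find_us_entities(uk_text))
```

The US sentence yields one phone number and one ZIP code, while the British sentence yields two empty lists: the London number groups its digits 3-4-4 rather than 3-3-4, and a UK postcode mixes letters and digits.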

Introducing London Analytics

We work in the UK, so we regularly deal with British text data. Out-of-the-box solutions weren’t accurate enough for us, so we had to build our own NER models and APIs.

That made us think those APIs might be useful for other UK folks too.

So we’re making our beta API, London Analytics, available for other people to try out. Just contact us for an API key and give it a test drive.

Example With Python

The example prints each detected entity with its type:

PHONE: 020 3477 9393
EMAIL: spy@ninja.com
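
The one-entity-per-line output above can be imitated locally for two of the simpler types. The sketch below is a regex stand-in, not the London Analytics API or its models. The input sentence, the `PATTERNS` table, and the `find_entities` helper are all assumptions made for illustration, and the phone pattern covers only a narrow slice of real UK number formats.

```python
import re

# Hypothetical, deliberately narrow patterns for illustration only.
PATTERNS = {
    "PHONE": re.compile(r"\b0\d{2,4} ?\d{3,4} ?\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_entities(text):
    """Return (label, matched_text) pairs in order of appearance."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), label, match.group()))
    # Sort by start offset so results follow the order of the text.
    return [(label, value) for _, label, value in sorted(hits)]


text = "Ring me on 020 3477 9393 or email spy@ninja.com."
for label, value in find_entities(text):
    print(f"{label}: {value}")
```

Hand-written patterns like these break down quickly for types such as PERSON_NAME or STREET_ADDRESS, which is where trained NER models come in.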

Some of the types of data currently supported:

PHONE
EMAIL
IP_ADDRESS
CREDIT_CARD
STREET_ADDRESS
POSTCODE
PERSON_NAME
LOCATION_NAME
PRODUCT_NAME

And a lot more.

Conclusion

If you deal with British text data and want to try our API, please get in touch and we’ll give you access. The API is in beta, so expect more improvements and features.

Written by Abhinay Mehta
