Named Entity Recognition for British text
Named Entity Recognition (NER) automatically detects important pieces of information in text documents, such as personally identifiable information (PII). This can be useful for several reasons:
- To anonymize text
- To help comply with regulatory requirements such as GDPR
- To label training data for machine learning projects
In this series of articles we’ve discussed several techniques for performing NER and automatically finding useful words and phrases in text. We explored popular open source tools and techniques that identify names of people, email addresses, phone numbers, credit card numbers, names of places, etc.
Most of the A.I. community’s efforts to solve this problem for English text are aimed at an American audience. So if you run most of the available tools on British text data, the results are usually worse than on text with US-style addresses, phone numbers, names, etc.
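To make the format mismatch concrete, here is a minimal sketch using plain regular expressions. Both patterns, the `find_phone` helper and the sample sentences are simplified assumptions made up for illustration; they are not taken from any particular NER tool or from our API. The point is only that a pattern written with US phone numbers in mind finds nothing in a British one.

```python
import re

# Simplified pattern for US-style numbers such as (415) 555-0132.
# Illustrative assumption only, not taken from any real NER tool.
US_PHONE = re.compile(r"\(?\d{3}\)?[-. ]\d{3}[-. ]\d{4}")

# Simplified pattern for UK-style numbers such as 020 3477 9393.
# Also an assumption; real UK numbering has many more variants.
UK_PHONE = re.compile(r"\b0\d{2,4} ?\d{3,4} ?\d{3,4}\b")


def find_phone(pattern, text):
    """Return the first substring matching the pattern, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


us_text = "Call our office on (415) 555-0132."
uk_text = "Call our office on 020 3477 9393."

print(find_phone(US_PHONE, us_text))  # US pattern on US text
print(find_phone(US_PHONE, uk_text))  # US pattern on UK text
print(find_phone(UK_PHONE, uk_text))  # UK pattern on UK text
```

The US pattern expects groups of 3, 3 and 4 digits, so the 3, 4, 4 grouping of a London number slips through unnoticed.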
Introducing London Analytics
We work in the UK, so we regularly deal with British text data. Out-of-the-box solutions weren’t accurate enough for us, so we had to build our own NER models and APIs.
That made us think those APIs might be useful to other people in the UK too.
So we’re making our beta API, London Analytics, available for others to try out. Just contact us for an API key and take it for a test drive.
Example With Python
This prints the following:
PHONE: 020 3477 9393
EMAIL: spy@ninja.com
Some of the types of data currently supported:
PHONE
EMAIL
IP_ADDRESS
CREDIT_CARD
STREET_ADDRESS
POSTCODE
PERSON_NAME
LOCATION_NAME
PRODUCT_NAME
And a lot more.
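For a rough feel of what two of these labels mean on British text, here is a small local sketch. It uses hand-written regular expressions rather than the London Analytics API or its models, and the patterns, the `find_entities` helper and the sample sentence are all simplified assumptions made up for illustration.

```python
import re

# Hypothetical, simplified patterns for two of the labels listed above.
# A real NER model handles far more variation than these do.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "POSTCODE": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}\b"),
}


def find_entities(text):
    """Return (label, value) pairs in order of appearance in the text."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group(0)))
    # Sort by start offset so results follow the order of the text.
    return [(label, value) for _, label, value in sorted(found)]


sample = "Send the parcel to SW1A 1AA and email spy@ninja.com to confirm."
for label, value in find_entities(sample):
    print(f"{label}: {value}")
```

Regular expressions like these only go so far: labels such as PERSON_NAME or PRODUCT_NAME have no fixed shape, which is where trained models come in.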
Conclusion
If you deal with British text data and want to try our API, please get in touch and we’ll give you access. The API is in beta, so expect more improvements and features.