Machine learning to decode doctors’ handwriting

Can we build a model that accurately predicts what is written in a handwritten text containing medical terms?

Some official medical certificates contain handwritten notes by physicians. These notes are read and interpreted by people and then entered into digital applications. Automating part of this work increases efficiency, but care must be taken not to introduce errors.

Business goal: Our goal is to speed up the electronic registration of handwritten terms by building a model that accurately predicts handwritten medical terms. The partly automated process is meant to support the manual work.

Solution and workflow: The flow from scanned document to deciphered handwriting includes the following steps.

  • Anonymise the document, which contains sensitive personal information

  • Process the image to remove the background, correct scan artifacts, improve contrast and normalise pen strokes

  • Use the Google Vision API to set a detection baseline

  • Define a training set from labelled data

  • Train a convolutional neural network

  • Test the model on over 200,000 labelled handwriting samples

  • Run the model on new data
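To make the image-processing step more concrete, here is a minimal sketch of two of the operations listed above: contrast improvement and background removal. It is an illustration only, not the project's actual pipeline. The function names, the threshold of 128 and the tiny example scan are all assumptions, and a real implementation would typically use an image library instead of nested lists.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def stretch_contrast(img: Image) -> Image:
    """Linearly rescale pixels so the darkest becomes 0 and the brightest 255."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # uniform image: nothing to stretch
        return [row[:] for row in img]
    scale = 255.0 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in img]


def binarise(img: Image, threshold: int = 128) -> Image:
    """Mark pixels darker than the threshold as ink (1), the rest as background (0)."""
    return [[1 if p < threshold else 0 for p in row] for row in img]


# Made-up scan: grey paper (around 200) with a faint pen stroke (around 150).
scan = [
    [200, 198, 201, 199],
    [199, 150, 155, 200],
    [201, 199, 152, 198],
]

# Without the stretch, the faint stroke (150) would stay above the threshold
# and be lost; after stretching it maps to values near 0.
ink_mask = binarise(stretch_contrast(scan))
```

The resulting mask keeps only the pen stroke, which is the kind of cleaned input a recognition model would then receive.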

This is a typical machine-learning project, which required some exploration of both the data and the algorithms to arrive at the best results. The technologies involved are data anonymisation, image processing, the Google Vision API, deep learning, named-entity resolution and natural-language processing.

Result: The manual labelling of the data turned out to be too inaccurate to provide exact statistics. What we do know is that over 60% of the handwritten words match their labels exactly, and that for over 20% of the certificates every handwritten word matches its label exactly. These figures are a lower bound on the final statistics, because a correct prediction compared against a wrong label is counted as a rejection instead of an acceptance.
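The two figures above can be computed as follows. This is a small sketch assuming a hypothetical data layout in which each certificate maps to a list of (predicted word, labelled word) pairs. The example data is made up and only shows how a mistyped label turns a correct prediction into a counted miss.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: certificate id -> list of (predicted word, labelled word).
Pairs = List[Tuple[str, str]]


def exact_match_rates(certificates: Dict[str, Pairs]) -> Tuple[float, float]:
    """Return (fraction of words matched, fraction of certificates fully matched)."""
    total_words = matched_words = full_certificates = 0
    for pairs in certificates.values():
        hits = sum(1 for predicted, label in pairs if predicted == label)
        total_words += len(pairs)
        matched_words += hits
        if pairs and hits == len(pairs):
            full_certificates += 1
    word_rate = matched_words / total_words if total_words else 0.0
    cert_rate = full_certificates / len(certificates) if certificates else 0.0
    return word_rate, cert_rate


# Made-up example data, for illustration only.
example = {
    "cert-1": [("fracture", "fracture"), ("tibia", "tibia")],
    # The prediction "chronic" is right, but the label has a typo: counted as a miss.
    "cert-2": [("asthma", "asthma"), ("chronic", "chronlc")],
    # A genuine model error.
    "cert-3": [("migrane", "migraine")],
}

word_rate, cert_rate = exact_match_rates(example)
```

In this toy example three of the five words match and one of the three certificates is fully matched, even though four of the five predictions are actually correct.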

For each word and each certificate, the model provides a confidence level to support the person responsible for registering the words.
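One way such confidence levels could be used is sketched below. The source does not say how the certificate-level confidence is derived or which cut-off is applied, so both the aggregation (the minimum over the word confidences) and the threshold of 0.9 are assumptions made for illustration.

```python
from typing import List, Tuple

REVIEW_THRESHOLD = 0.9  # hypothetical cut-off, not taken from the project


def certificate_confidence(word_confidences: List[float]) -> float:
    """Assumed aggregation: a certificate is only as reliable as its weakest word."""
    return min(word_confidences) if word_confidences else 0.0


def words_to_review(
    words: List[Tuple[str, float]], threshold: float = REVIEW_THRESHOLD
) -> List[str]:
    """Return the words whose confidence falls below the review threshold."""
    return [word for word, confidence in words if confidence < threshold]


# Made-up model output: (predicted word, confidence between 0 and 1).
predictions = [("fracture", 0.98), ("tibia", 0.95), ("osteomyelitis", 0.62)]

flagged = words_to_review(predictions)
overall = certificate_confidence([c for _, c in predictions])
```

With these values only "osteomyelitis" is flagged, so the person registering the certificate can focus on that word and accept the rest more quickly.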
