We built several competing models: Random Forest, Support Vector Machine and Stochastic Gradient Descent classifiers. Each was then tuned extensively in our lab to extract the best results from the data.
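As a rough illustration of what comparing such models can look like, here is a minimal sketch using scikit-learn. The synthetic dataset, the hyperparameters and the names `candidates`, `scores` and `best` are assumptions made for this example, not our actual data or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 300 rows, 10 numeric features.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# The three model families named above, with illustrative settings.
# SVM and SGD are sensitive to feature scale, so they get a scaler.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "sgd": make_pipeline(StandardScaler(), SGDClassifier(random_state=0)),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

best = max(scores, key=scores.get)
print(scores, "best:", best)
```

In practice each candidate would also be tuned, for example with a hyperparameter search, before the scores are compared.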
A good example of such an adjustment is our use of Natural Language Processing on the occupation of the insured person. We analysed how often occupations appear in similar sentences, such as “…so and so became ill and went to the <vet> <doctor> <nurse>…”, which then allowed us to group occupations into clusters by linguistic similarity.
This in turn allowed us to place ‘Chauffeur’ and ‘Driver’ in the same occupation cluster, so that each occupation benefits from the data available for the other.
We managed to reduce several thousand occupations to just 100 clusters, an approach which produced a significant improvement in accuracy.
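The idea of grouping occupations by the sentences they appear in can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not our production pipeline: the corpus, the stop-word list, the context window, the 0.5 similarity threshold and the greedy grouping rule are all invented for the example.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus (assumption); occupations are marked with <...>.
SENTENCES = [
    "the <chauffeur> drove the car to the airport",
    "the <driver> drove the car to the station",
    "the <chauffeur> parked the car outside",
    "the <driver> parked the car outside",
    "she became ill and went to the <doctor>",
    "he became ill and went to the <nurse>",
    "the dog became ill and went to the <vet>",
    "the <doctor> treated the patient",
    "the <nurse> treated the patient",
]
STOP_WORDS = {"the", "to", "and", "a"}  # illustrative list

def context_vectors(sentences, window=3):
    """Count the non-stop words seen within `window` tokens of each occupation."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok.startswith("<") and tok.endswith(">"):
                nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                vectors[tok[1:-1]].update(w for w in nearby if w not in STOP_WORDS)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster(vectors, threshold=0.5):
    """Greedy grouping: join the first cluster whose first member is similar enough."""
    clusters = []
    for occupation in sorted(vectors):
        for members in clusters:
            if cosine(vectors[occupation], vectors[members[0]]) >= threshold:
                members.append(occupation)
                break
        else:
            clusters.append([occupation])
    return clusters

clusters = cluster(context_vectors(SENTENCES))
print(clusters)  # [['chauffeur', 'driver'], ['doctor', 'nurse', 'vet']]
```

The resulting cluster label can then replace the raw occupation as a model feature, so that rare occupations share training data with similar, more common ones.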