You don’t need a gun to rob a bank any more. Now a bank robber’s most powerful weapon is a computer and an internet connection.
Fraud detection in banking and finance is a critical activity. It has to cover a wide range of fraud schemes, carried out by bank employees and customers alike.
As before, the data needed to build such algorithms comes from a variety of sources, from in-house databases to third-party applications to CRMs. Divide it into training and test data to guard against overfitting, that is, ending up with an algorithm that corresponds too closely to one particular dataset and therefore fails to fit additional data or predict future observations reliably.
The more information that is available, the more accurate the model is likely to be.
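A minimal sketch of the training/test split described above, in plain Python. The field name is_fraud, the 25% test fraction and the toy rows are assumptions made for illustration. The split is stratified, so the rare fraud rows land in both parts; that is a choice of this sketch rather than something the text prescribes.

```python
import random

def stratified_split(rows, label_key="is_fraud", test_fraction=0.25, seed=0):
    """Split rows into (train, test), splitting each class separately."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)

    train, test = [], []
    for group in by_label.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        # Put at least one row of every class into the test set.
        n_test = max(1, round(len(shuffled) * test_fraction))
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test

# Hypothetical toy data: 5 fraudulent rows among 100 transactions.
rows = [{"amount": 10.0 * i, "is_fraud": 1 if i % 20 == 0 else 0}
        for i in range(100)]
train, test = stratified_split(rows)
print(len(train), len(test))
```

A purely random split would also work on large datasets, but with very few fraud rows it can leave the test set with none at all.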
Oversample the data for fraudulent accounts: such accounts are heavily outnumbered by legitimate ones, so it is preferable to artificially increase their representation in the training data.
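One simple way to do this is random oversampling, which duplicates fraud rows until both classes are the same size. The sketch below assumes a hypothetical is_fraud field and toy data; it is one possible approach, not necessarily the one used in practice.

```python
import random

def oversample_minority(rows, label_key="is_fraud", seed=0):
    """Duplicate randomly chosen fraud rows until the classes are balanced."""
    rng = random.Random(seed)
    fraud = [r for r in rows if r[label_key] == 1]
    legit = [r for r in rows if r[label_key] == 0]
    if not fraud or len(fraud) >= len(legit):
        return list(rows)  # nothing to oversample
    extra = [rng.choice(fraud) for _ in range(len(legit) - len(fraud))]
    balanced = legit + fraud + extra
    rng.shuffle(balanced)
    return balanced

# Hypothetical training data: 5 fraud rows, 95 legitimate rows.
training_rows = [{"amount": float(i), "is_fraud": 1 if i < 5 else 0}
                 for i in range(100)]
balanced = oversample_minority(training_rows)
print(len(balanced), sum(r["is_fraud"] for r in balanced))
```

Oversampling is best applied to the training portion only. Duplicated rows in the test data would make the results look better than they are.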
Since the rule for raw data is the more the merrier, we end up with more variables than we need. Prune them down to the ones that matter, which makes the modelling simpler and the final algorithm more relevant and often more accurate.
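The text does not say how the pruning is done. As one illustrative option, the sketch below ranks each variable by the absolute value of its correlation with the fraud label and keeps the top few. The column names (amount, night, branch_id) and the values are made up for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def top_features(rows, feature_names, label_key="is_fraud", k=2):
    """Keep the k features most correlated (in absolute value) with the label."""
    labels = [r[label_key] for r in rows]
    scores = {name: abs(pearson([r[name] for r in rows], labels))
              for name in feature_names}
    return sorted(feature_names, key=lambda name: scores[name], reverse=True)[:k]

# Hypothetical rows; branch_id is constant, so it carries no signal here.
rows = [
    {"amount": 20,   "night": 0, "branch_id": 7, "is_fraud": 0},
    {"amount": 35,   "night": 0, "branch_id": 7, "is_fraud": 0},
    {"amount": 900,  "night": 1, "branch_id": 7, "is_fraud": 1},
    {"amount": 15,   "night": 1, "branch_id": 7, "is_fraud": 0},
    {"amount": 1200, "night": 1, "branch_id": 7, "is_fraud": 1},
    {"amount": 50,   "night": 0, "branch_id": 7, "is_fraud": 0},
]
selected = top_features(rows, ["amount", "night", "branch_id"], k=2)
print(selected)
```

Correlation only captures linear, one-variable-at-a-time relationships, so treat this as a first filter rather than a complete selection method.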
Run the data through a model to do the computing and arrive at a basis for deciding which transactions might be fraudulent. Since the final prediction is either a yes or a no, the model selected is generally a logistic regression.
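To show what a logistic regression does with such data, here is a small from-scratch sketch trained by gradient descent. The two features (amount in thousands and a night-time flag), the learning rate and the number of epochs are all assumptions for illustration; a real project would more likely use an established library.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, features):
    """Estimated probability that a transaction is fraudulent."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

def train_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for features, label in zip(X, y):
            error = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical training data: [amount in thousands, night-time flag].
X = [[0.02, 0], [0.035, 0], [0.05, 0], [0.015, 1],
     [0.1, 0], [0.9, 1], [1.2, 1], [1.5, 0]]
y = [0, 0, 0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(X, y)
risky = predict_proba(weights, bias, [1.3, 1])   # large night-time transfer
safe = predict_proba(weights, bias, [0.03, 0])   # small daytime payment
print(round(risky, 3), round(safe, 3))
```

The model outputs a probability between 0 and 1. The yes or no comes from comparing it with a threshold, commonly 0.5, which can be lowered if missing a fraud is costlier than a false alarm.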
Test the model on another dataset, and finally apply it to live transactional data to check for fraud.
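When testing, plain accuracy can mislead, because a model that never flags anything still scores highly when fraud is rare. A hedged sketch of two more informative measures, precision and recall, is given below; the labels and predictions are invented for the example.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for the fraud class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Hypothetical held-out labels and model predictions.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(accuracy(y_true, y_pred), precision, recall)

# A model that never flags fraud still gets 70% accuracy on this data.
never_flag = [0] * len(y_true)
print(accuracy(y_true, never_flag), precision_recall(y_true, never_flag))
```

Precision is the share of flagged transactions that really were fraud, and recall is the share of actual frauds that were caught.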