Detect Anomalies - Fraud Detection

Detect anomalies in your data.

Anomaly detection, such as fraud detection, is crucial for preventing financial losses. ML models can be trained on historical credit card transaction data, for example, and then used to identify and prevent fraudulent transactions. Peliqan's platform also supports regularly updating and fine-tuning models to keep them effective at detecting fraud.

Here is an example of how to build an ML model in Peliqan.io with a few lines of Python code.

Import required modules

We will use the IsolationForest algorithm to perform the anomaly detection. More on IsolationForest here.

import numpy as np
import pandas as pd
from sklearn.metrics import classification_report, accuracy_score
from sklearn.ensemble import IsolationForest
from joblib import dump
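
As a quick standalone illustration of how IsolationForest labels points, here is a minimal sketch on toy data (not the tutorial dataset), reusing the imports above: predict() returns 1 for inliers and -1 for outliers, which is why we remap the labels further down.

# Toy data with one obvious outlier
toy = np.array([[10.0], [10.2], [9.8], [10.1], [500.0]])
toy_model = IsolationForest(contamination=0.2, random_state=42).fit(toy)
print(toy_model.predict(toy))        # Expected output like: [ 1  1  1  1 -1]
print(toy_model.score_samples(toy))  # Lower scores indicate more anomalous points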

Load a dataset

Load data from a table into a dataframe (df).

# Load Data
df = pq.load_table('transactions', df=True)

Using Streamlit to build an app

We use the Streamlit module (st), built into Peliqan.io, to build a UI and show data.

# Show a title (st = Streamlit module)
st.title("Anomaly/fraud detection")

# Show some text
st.text("Sample data")

# Show the dataframe
st.dataframe(df.head(), use_container_width=True)

Understanding the data

The dataset we are using contains 28 compressed features that are the result of a PCA transformation. The feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' is the transaction amount. The feature 'Class' is the response variable: it takes the value 1 in case of fraud and 0 otherwise.
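
If you want to verify the column layout yourself, here is a quick sketch (reusing the df and st objects from above) that lists the columns and shows summary statistics in the app:

# Show the column names and basic statistics of the loaded table
st.text("Columns: " + ", ".join(df.columns))
st.dataframe(df.describe(), use_container_width=True)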

Let’s check whether the data is balanced by plotting the number of fraudulent vs. non-fraudulent transactions.

st.header('Target variable distribution')

# Plotting the distribution of the target variable
count_classes = df['Class'].value_counts(sort=True)
st.bar_chart(count_classes)

Training the model & predicting

fraud = df[df['Class'] == 1]  # Fraudulent transactions
valid = df[df['Class'] == 0]  # Valid transactions
outlier_fraction = len(fraud) / float(len(valid))  # Expected share of anomalies

# X is the input; 'Prediction' is only dropped in case it exists from a previous run
X = df.drop(columns=['Class', 'Prediction'], errors='ignore')
Y = df['Class']  # Y is the target (1 = fraud, 0 = valid)

model = IsolationForest(max_samples=len(X), contamination=outlier_fraction).fit(X)  # Fit the model
pred = model.predict(X)  # Predict using the trained model

# Remap the predicted labels: IsolationForest returns 1 for inliers and -1 for outliers,
# so map them to 0 for valid transactions and 1 for fraudulent transactions
pred[pred == 1] = 0
pred[pred == -1] = 1

Note: it's important to drop any NaN values or impute them before training IsolationForest; more on this here.
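
For example, a minimal sketch of both options, to run before fitting the model (SimpleImputer is scikit-learn's standard imputer; which option fits best depends on your data):

# Option 1: drop rows that contain NaN values
df = df.dropna()

# Option 2: impute missing values instead (here with the column mean)
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy='mean')
X = pd.DataFrame(imputer.fit_transform(X), columns=X.columns)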

Evaluation & Saving the model

st.header('Evaluation')

accuracy = accuracy_score(Y, pred)
st.text("Accuracy: " + str(accuracy))
st.text("Report:" + classification_report(Y, pred))

# Saving the model
dump(model, '/data_app/model_credit_card')
st.success('Model saved successfully!')

More on understanding the classification_report function here.
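
If you prefer to work with the metrics programmatically, classification_report also accepts output_dict=True; here is a minimal sketch that shows the report as a table in the app:

# Get the report as a dict and display it as a dataframe
report = classification_report(Y, pred, output_dict=True)
st.dataframe(pd.DataFrame(report).transpose(), use_container_width=True)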

Peliqan's platform enables businesses to create anomaly detection systems, improving operational efficiency and preventing financial losses. By detecting anomalies in real time, businesses can make informed decisions and take proactive measures to prevent issues before they occur.

Full code
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report, accuracy_score
from sklearn.ensemble import IsolationForest
from joblib import dump

# Load Data
df = pq.load_table('transactions', df=True)

# Show a title (st = Streamlit module)
st.title("Anomaly/fraud detection")

# Show some text
st.text("Sample data")

# Show the dataframe
st.dataframe(df.head(), use_container_width=True)

st.header('Target variable distribution')

# Plotting the distribution of the target variable
count_classes = df['Class'].value_counts(sort=True)
st.bar_chart(count_classes)

fraud = df[df['Class'] == 1]  # Fraudulent transactions
valid = df[df['Class'] == 0]  # Valid transactions
outlier_fraction = len(fraud) / float(len(valid))  # Expected share of anomalies

# X is the input; 'Prediction' is only dropped in case it exists from a previous run
X = df.drop(columns=['Class', 'Prediction'], errors='ignore')
Y = df['Class']  # Y is the target (1 = fraud, 0 = valid)

model = IsolationForest(max_samples=len(X), contamination=outlier_fraction).fit(X)  # Fit the model
pred = model.predict(X)  # Predict using the trained model

# Remap the predicted labels: IsolationForest returns 1 for inliers and -1 for outliers,
# so map them to 0 for valid transactions and 1 for fraudulent transactions
pred[pred == 1] = 0
pred[pred == -1] = 1

st.header('Evaluation')

accuracy = accuracy_score(Y, pred)
st.text("Accuracy: " + str(accuracy))
st.text("Report:" + classification_report(Y, pred))

# Saving the model
dump(model, '/data_app/model_credit_card')
st.success('Model saved successfully!')

Next Steps

  1. You can make real-time predictions on new incoming data and send an alert to Slack when the model's prediction exceeds a certain threshold (see the sketch after this list).
  2. You can make predictions on real-time incoming data using the saved model. Learn more about making real-time predictions on new incoming data.
  3. Using Peliqan, you can create an app with a simple and intuitive UI for your users to consume the model you have created. Learn more about creating apps for users to consume your model.
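
For step 1, here is a minimal sketch, assuming a new transaction arrives as a single-row dataframe new_row with the same feature columns as X, and assuming you have a Slack incoming-webhook URL (the URL and threshold below are placeholders, not part of the tutorial):

import requests
from joblib import load

model = load('/data_app/model_credit_card')  # Model saved earlier with dump()

score = model.decision_function(new_row)[0]  # Lower scores are more anomalous
if score < 0:  # Placeholder threshold; tune it for your data
    requests.post(
        'https://hooks.slack.com/services/XXX/YYY/ZZZ',  # Placeholder webhook URL
        json={"text": "Possible fraudulent transaction detected (score=%.3f)" % score}
    )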