[Tut] How I Created a Customer Churn Prediction App to Help Businesses - xSicKxBot - 04-13-2023 How I Created a Customer Churn Prediction App to Help Businesses <div>
<p>Many businesses will agree that it takes a lot more time, money, and resources to get new customers than to keep existing ones. Hence, they are very interested in knowing how many existing customers are leaving their business. This is known as <strong>churn</strong>.</p>
<p>Churn tells business owners how many customers are no longer using their products and services. It is also the rate at which money is lost as a result of customers or employees leaving the company. The churn rate gives companies an idea of business performance. If the churn rate is higher than the growth rate, the business is not growing.</p>
<p>There are many reasons offered to explain customer churn. These include poor customer satisfaction, finance issues, customers not feeling appreciated, and customers’ need for a change. Understandably, companies have no absolute control over churn. But they can work to keep the churn rate to a minimum for the causes they can control.</p>
<div class="wp-block-image"> <figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="616" height="924" src="https://blog.finxter.com/wp-content/uploads/2023/04/image-69.png" alt="" class="wp-image-1271551" srcset="https://blog.finxter.com/wp-content/uploads/2023/04/image-69.png 616w, https://blog.finxter.com/wp-content/uploads/2023/04/image-69-200x300.png 200w" sizes="(max-width: 616px) 100vw, 616px" /></figure> </div>
<p>As a data scientist, your role is to assist these companies by building a churn model tailored to the company’s goals and expectations to predict customer churn. 
Due to the lack of data available to meet a company’s specific needs, it becomes challenging for data scientists to design an effective churn model.</p>
<p>However, we will make do with sample data for a fictional telecommunication company. Membership-based businesses that offer subscription services are the ones most affected by customer churn. This data, sourced from the IBM Developer Platform, is available on <a href="https://github.com/finxter/customer-churn" data-type="URL" data-id="https://github.com/finxter/customer-churn" target="_blank" rel="noreferrer noopener">my GitHub page</a>.</p>
<p>The dataset has 7043 rows and 21 columns, which comprise 17 categorical features, 3 numerical features, and the prediction feature. Check my GitHub page for more information about the dataset.</p>
<h2 class="wp-block-heading">Data Preprocessing</h2>
<p>This step makes the data suitable for machine learning. We will start by getting an overview of the dataset.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd

df = pd.read_csv('churn.csv')

# get the shape of the dataset
df.shape
# (7043, 21)

# print the columns
df.columns
'''
Index(['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents',
       'tenure', 'PhoneService', 'MultipleLines', 'InternetService',
       'OnlineSecurity', 'OnlineBackup', 'DeviceProtection', 'TechSupport',
       'StreamingTV', 'StreamingMovies', 'Contract', 'PaperlessBilling',
       'PaymentMethod', 'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')
'''

# check for missing values
df.isna().sum()
'''
customerID          0
gender              0
SeniorCitizen       0
Partner             0
Dependents          0
tenure              0
PhoneService        0
MultipleLines       0
InternetService     0
OnlineSecurity      0
OnlineBackup        0
DeviceProtection    0
TechSupport         0
StreamingTV         0
StreamingMovies     0
Contract            0
PaperlessBilling    0
PaymentMethod       0
MonthlyCharges      0
TotalCharges        0
Churn               0
dtype: int64
'''

# check for duplicates: every customerID is unique
df.customerID.nunique()
# 7043
</pre>
<p>Next, we drop the customerID column, which was only there for identification purposes.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">df.drop(['customerID'], axis=1, inplace=True)</pre>
<p>Here, <code>axis=1</code> refers to the columns, and <code>inplace=True</code> modifies the DataFrame directly instead of returning a copy.</p>
<p>If you take a look at the dataset using the <code><a href="https://blog.finxter.com/pandas-dataframe-head-method/" data-type="post" data-id="343658" target="_blank" rel="noreferrer noopener">head()</a></code> method, you will notice that many features, including the target feature, have rows with values of Yes and No. We will transform them to 0 and 1 using <code>LabelEncoder</code> from the Scikit-learn library. We will do the same with columns that have more than two categories.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()

# boolean mask of the columns with an object (string) dtype
obj = (df.dtypes == 'object')

# encode each of those columns as integers
for col in list(obj[obj].index):
    df[col] = label_encoder.fit_transform(df[col])</pre>
<h2 class="wp-block-heading">Model Building</h2>
<p>It’s now time to train models on our data using Machine Learning algorithms. 
As we don’t know which model will perform well on our dataset, we will first test different models.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier

# separate the features from the target
X = df.drop(['Churn'], axis=1)
Y = df.Churn

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=7)

models = [LogisticRegression(), RandomForestClassifier(), AdaBoostClassifier(), SVC(),
          DecisionTreeClassifier(), KNeighborsClassifier(), GaussianNB(), ExtraTreesClassifier(),
          LinearDiscriminantAnalysis(), GradientBoostingClassifier(), XGBClassifier()]

# fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()
rescaledX = scaler.fit_transform(X_train)
rescaledX_test = scaler.transform(X_test)

for model in models:
    model.fit(rescaledX, Y_train.values)
    preds = model.predict(rescaledX_test)
    results = accuracy_score(Y_test, preds)
    print(f'{results}')

'''
Scores as printed in the original post; a rerun may give different numbers:
0.2753726046841732
0.7388218594748048
0.7388218594748048
0.7388218594748048
0.2753726046841732
0.26330731014904185
0.47906316536550747
0.27324343506032645
0.7388218594748048
0.30376153300212916
0.6593328601845281
0.7402413058907026
'''</pre>
<p>The results show that XGBoost performed better than the other models on this dataset. 
Therefore, we will use XGBoost as our Machine Learning algorithm to predict customer churn.</p>
<h2 class="wp-block-heading">Tuning XGBoost</h2>
<div class="wp-block-image"> <figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="925" height="617" src="https://blog.finxter.com/wp-content/uploads/2023/04/image-70.png" alt="" class="wp-image-1271552" srcset="https://blog.finxter.com/wp-content/uploads/2023/04/image-70.png 925w, https://blog.finxter.com/wp-content/uploads/2023/04/image-70-300x200.png 300w, https://blog.finxter.com/wp-content/uploads/2023/04/image-70-768x512.png 768w" sizes="(max-width: 925px) 100vw, 925px" /></figure> </div>
<p>The XGBoost algorithm achieved a 74% accuracy score. Can it do better? Let’s try tuning the model using learning curves. 
To understand what we mean by a learning curve, please read <a href="https://machinelearningmastery.com/tune-xgboost-performance-with-learning-curves/" data-type="URL" data-id="https://machinelearningmastery.com/tune-xgboost-performance-with-learning-curves/" target="_blank" rel="noreferrer noopener">this article</a>.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># define the model (newer XGBoost releases expect eval_metric in the constructor)
model = XGBClassifier(eval_metric='logloss')

# define the datasets to evaluate each iteration
evalset = [(X_train, Y_train), (X_test, Y_test)]

# fit the model
model.fit(X_train, Y_train, eval_set=evalset)

# evaluate performance
preds = model.predict(X_test)
score = accuracy_score(Y_test, preds)
print(f'Accuracy: {round(score*100, 1)}%')
# Accuracy: 77.9%</pre>
<p>Wow, the model improved to a 77.9% accuracy score. Can it still do better? Let’s increase the number of iterations from 100 (default) to 200 and reduce the eta hyperparameter, the learning rate, to 0.05 (default is 0.3) to slow down learning.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># eta is XGBoost's alias for learning_rate
model = XGBClassifier(n_estimators=200, eta=0.05, eval_metric='logloss')

# fit the model
model.fit(X_train, Y_train, eval_set=evalset)

# evaluate performance
preds = model.predict(X_test)
score = accuracy_score(Y_test, preds)
print(f'Accuracy: {round(score*100, 1)}%')
# Accuracy: 78.6%</pre>
<p>We will stop here, although further tuning may achieve a higher score. 
An accuracy score of 78.6% is not bad.</p>
<p>Create a new folder and save the following to a file named <code>model.py</code>.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Import libraries
import pickle

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

df = pd.read_csv('churn.csv')

# Drop customerID
df.drop(['customerID'], axis=1, inplace=True)

# Convert object columns to int datatype
label_encoder = LabelEncoder()
obj = (df.dtypes == 'object')
for col in list(obj[obj].index):
    df[col] = label_encoder.fit_transform(df[col])

X = df.drop(['Churn'], axis=1)
Y = df.Churn

# splitting the dataset
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=7)

model = XGBClassifier(n_estimators=200, eta=0.05, eval_metric='logloss')

# define the datasets to evaluate each iteration
evalset = [(X_train, Y_train), (X_test, Y_test)]

# fit the model
model.fit(X_train, Y_train, eval_set=evalset)

# saving the trained model
with open('lg_model.pkl', 'wb') as f:
    pickle.dump(model, f)
</pre>
<p>Notice that we save the trained model as a <a href="https://blog.finxter.com/serialization-part-1/" data-type="post" data-id="173558" target="_blank" rel="noreferrer noopener">pickle object</a> to be used later. We want the model to run on a local Streamlit server, so we will create a Streamlit application for it. Create two more files called <code>app.py</code> and <code>predict.py</code> in the same folder. 
Check my <a href="https://github.com/finxter/customer-churn" data-type="URL" data-id="https://github.com/finxter/customer-churn" target="_blank" rel="noreferrer noopener">GitHub page</a> to see the full content of the files.</p>
<div class="wp-block-image"> <figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="713" height="401" src="https://blog.finxter.com/wp-content/uploads/2023/04/image-67.png" alt="" class="wp-image-1271544" srcset="https://blog.finxter.com/wp-content/uploads/2023/04/image-67.png 713w, https://blog.finxter.com/wp-content/uploads/2023/04/image-67-300x169.png 300w" sizes="(max-width: 713px) 100vw, 713px" /></figure> </div>
<p>Please remember to run <code>model.py</code> manually to generate the pickle file, as I won’t be pushing it to GitHub. After running the <code>model.py</code> file, the accuracy was 80.4%, showing that the model learned the data well.</p>
<div class="wp-block-image"> <figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="713" height="401" src="https://blog.finxter.com/wp-content/uploads/2023/04/image-68.png" alt="" class="wp-image-1271545" srcset="https://blog.finxter.com/wp-content/uploads/2023/04/image-68.png 713w, https://blog.finxter.com/wp-content/uploads/2023/04/image-68-300x169.png 300w" sizes="(max-width: 713px) 100vw, 713px" /></figure> </div>
<h2 class="wp-block-heading">Conclusion</h2>
<p>In this tutorial, we created a customer churn prediction app to help businesses deal with some of the challenges facing them. We used XGBoost to train the model. There are many things we didn’t do. Data visualization, feature engineering, and dealing with imbalanced classification are some of them.</p>
<p>You may wish to try them out and see if they can improve the model’s performance. 
Unfortunately, I wasn’t able to deploy the app because I couldn’t push the heavy pickle file to <a href="https://github.com/finxter/customer-churn" data-type="URL" data-id="https://github.com/finxter/customer-churn" target="_blank" rel="noreferrer noopener">GitHub</a>. Try pushing yours and then deploy it on Streamlit Cloud. Alright, enjoy your day.</p> </div> https://www.sickgaming.net/blog/2023/04/06/how-i-created-a-customer-churn-prediction-app-to-help-businesses/
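<p>The conclusion lists imbalanced classification as something left to try. As a rough sketch of why it matters for the accuracy scores quoted in this tutorial: when one class dominates, a model that always predicts the majority class can already reach a high accuracy while catching no churners at all. The example below uses only the standard library. The label lists and the helper names <code>majority_baseline_accuracy</code> and <code>recall</code> are made up for illustration and do not come from the churn dataset or the original code.</p>

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def recall(y_true, y_pred, positive=1):
    """Share of actual positives that the predictions caught."""
    actual_positives = sum(1 for t in y_true if t == positive)
    if actual_positives == 0:
        return 0.0
    true_positives = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p == positive
    )
    return true_positives / actual_positives


# Hypothetical labels: 1 = churned, 0 = stayed (illustrative, not real data)
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
# A "model" that predicts that every customer stays
y_always_stay = [0] * len(y_true)

baseline = majority_baseline_accuracy(y_true)
churn_recall = recall(y_true, y_always_stay)
print(f'Baseline accuracy: {baseline:.2f}')
print(f'Recall on churners: {churn_recall:.2f}')
```

<p>Comparing a trained model’s accuracy against this kind of baseline, and checking recall on the churn class, gives a fairer picture than accuracy alone.</p>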