[Tut] How I Built a House Price Prediction App Using Streamlit


<div>
<p>In this tutorial, I will take you through a <a rel="noreferrer noopener" href="https://blog.finxter.com/machine-learning-engineer-income-and-opportunity/" data-type="post" data-id="306050" target="_blank">machine learning</a> project on house price prediction with Python. We have previously learned how to solve a <a href="https://blog.finxter.com/how-i-built-and-deployed-a-python-loan-eligibility-prediction-app-on-streamlit/" data-type="post" data-id="1080976" target="_blank" rel="noreferrer noopener">classification problem</a>. </p>
<p class="has-base-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f449.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Recommended</strong>: <a href="https://blog.finxter.com/how-i-built-and-deployed-a-python-loan-eligibility-prediction-app-on-streamlit/" data-type="post" data-id="1080976" target="_blank" rel="noreferrer noopener">How I Built and Deployed a Python Loan Eligibility Prediction App on Streamlit</a></p>
<p>Today, I will show you how to solve a regression problem and deploy it on Streamlit Cloud.</p>
<p>You can find an app prototype to try out <a href="https://jonaben1-house-price-pred-app-8c0uje.streamlit.app/" data-type="URL" data-id="https://jonaben1-house-price-pred-app-8c0uje.streamlit.app/" target="_blank" rel="noreferrer noopener">here</a>:</p>
<div class="wp-block-image">
<figure class="aligncenter size-large"><a href="https://jonaben1-house-price-pred-app-8c0uje.streamlit.app/" target="_blank" rel="noreferrer noopener"><img loading="lazy" decoding="async" width="1024" height="803" src="https://blog.finxter.com/wp-content/uploads/2023/02/image-6-1024x803.png" alt="" class="wp-image-1104520" srcset="https://blog.finxter.com/wp-content/uploads/2023/02/image-6-1024x803.png 1024w, https://blog.finxter.com/wp-content/uplo...00x235.png 300w, https://blog.finxter.com/wp-content/uplo...68x602.png 768w, https://blog.finxter.com/wp-content/uplo...mage-6.png 1057w" sizes="(max-width: 1024px) 100vw, 1024px" /></a></figure>
</div>
<h2>What Is Streamlit?</h2>
<p class="has-global-color-8-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f4a1.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Info</strong>: Streamlit is a popular choice for data scientists who want to deploy their apps quickly, because it is easy to set up and works well with data science libraries such as pandas and scikit-learn. We are going to set up the dashboard so that when our users fill in some details, it predicts the price of a house.</p>
<p>But you may wonder: </p>
<h2>Why Is House Price Prediction Important?</h2>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="688" height="387" src="https://blog.finxter.com/wp-content/uploads/2023/02/image-8.png" alt="" class="wp-image-1104530" srcset="https://blog.finxter.com/wp-content/uploads/2023/02/image-8.png 688w, https://blog.finxter.com/wp-content/uplo...00x169.png 300w" sizes="(max-width: 688px) 100vw, 688px" /></figure>
</div>
<p>Well, house prices are an important reflection of the economy. The price of a property is important in real estate transactions as it provides information to stakeholders, including real estate agents, investors, and developers, to enable them to make informed decisions.</p>
<p>Governments also use such information to formulate appropriate regulatory policies. Overall, it helps all parties involved determine a fair selling price for a house and decide when to buy or sell.</p>
<p>We will use machine learning with Python to try to predict the price of a house. Background knowledge of Python and its use in machine learning is a prerequisite for this tutorial. </p>
<p class="has-base-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f449.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Recommended</strong>: <a rel="noreferrer noopener" href="https://blog.finxter.com/python-crash-course/" data-type="post" data-id="3951" target="_blank">Python Crash Course</a> (Blog + Cheat Sheets)</p>
<p>To keep things simple, we will not be dealing with data visualization.</p>
<h2>The Datasets</h2>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="688" height="458" src="https://blog.finxter.com/wp-content/uploads/2023/02/image-9.png" alt="" class="wp-image-1104532" srcset="https://blog.finxter.com/wp-content/uploads/2023/02/image-9.png 688w, https://blog.finxter.com/wp-content/uplo...00x200.png 300w" sizes="(max-width: 688px) 100vw, 688px" /></figure>
</div>
<p>We will be using the 1990 California Housing dataset to make this prediction. You can get the dataset on Kaggle or from my GitHub page. Let’s load it using the Pandas library and check the number of rows and columns.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd

data = pd.read_csv('housing.csv')
print(data.shape)
# (20640, 10)</pre>
<p>We can see the dataset has 20640 rows and 10 columns: 9 features plus the target, <code>median_house_value</code>.</p>
<p>Let’s get more information about the columns using the <code>.info()</code> method.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">data.info()</pre>
<p>Output:</p>
<pre class="wp-block-preformatted"><code>&lt;class 'pandas.core.frame.DataFrame'&gt;
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   longitude           20640 non-null  float64
 1   latitude            20640 non-null  float64
 2   housing_median_age  20640 non-null  float64
 3   total_rooms         20640 non-null  float64
 4   total_bedrooms      20433 non-null  float64
 5   population          20640 non-null  float64
 6   households          20640 non-null  float64
 7   median_income       20640 non-null  float64
 8   median_house_value  20640 non-null  float64
 9   ocean_proximity     20640 non-null  object
dtypes: float64(9), object(1)
memory usage: 1.6+ MB
</code></pre>
<ol type="a">
<li>The <code>longitude</code> indicates how far west a house is while the <code>latitude</code> shows how far north the house is.</li>
<li>The <code>housing_median_age</code> indicates the median age of the houses within a block. A lower number means the houses are newer.</li>
<li>The <code>total_rooms</code> and <code>total_bedrooms</code> indicate the total number of rooms and bedrooms within a block.</li>
<li>The <code>population</code> is the number of people living within a block, while <code>households</code> is the number of households (groups of people living in one home unit) within a block.</li>
<li>The <code>median_income</code> is measured in tens of thousands of US Dollars. It shows the median income of households living within a block.</li>
<li>The <code>median_house_value</code> is measured in US Dollars. It is the median house value for households living within a block.</li>
<li>The <code>ocean_proximity</code> tells us how close to the ocean a house is located.</li>
</ol>
<p>All columns have the same non-null count except <code>total_bedrooms</code>, which indicates missing values. All columns are of <code>float64</code> datatype except <code>ocean_proximity</code>, which pandas stores as <code>object</code> but which is actually categorical. Let us first confirm this.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">data.ocean_proximity.value_counts()</pre>
<p>Output:</p>
<pre class="wp-block-preformatted"><code>&lt;1H OCEAN 9136
INLAND 6551
NEAR OCEAN 2658
NEAR BAY 2290
ISLAND 5
Name: ocean_proximity, dtype: int64</code></pre>
<p>It is categorical. So, we have to convert <code>ocean_proximity</code> to an integer datatype using <code>LabelEncoder</code> from the <a href="https://blog.finxter.com/scikit-learn-cheat-sheets/" data-type="post" data-id="20549" target="_blank" rel="noreferrer noopener">Scikit-learn library</a>.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
# find the columns stored as object (here only ocean_proximity)
obj = (data.dtypes == 'object')
for col in list(obj[obj].index):
    data[col] = label_encoder.fit_transform(data[col])
</pre>
<p>Let’s check to confirm.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">data.ocean_proximity.value_counts()</pre>
<p>Output:</p>
<pre class="wp-block-preformatted"><code>0 9136
1 6551
4 2658
3 2290
2 5
Name: ocean_proximity, dtype: int64</code></pre>
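<p><code>LabelEncoder</code> numbers the distinct labels in sorted order, not by frequency, which is why <code>ISLAND</code> gets 2 even though it is the rarest category. The sketch below reproduces that mapping in plain Python as a quick check; it illustrates the ordering rule and is not scikit-learn's implementation.</p>

```python
# Distinct labels of ocean_proximity, as listed by value_counts() above
labels = ['<1H OCEAN', 'INLAND', 'NEAR OCEAN', 'NEAR BAY', 'ISLAND']

# LabelEncoder sorts the distinct classes and numbers them from 0 upwards
mapping = {label: code for code, label in enumerate(sorted(set(labels)))}

for label, code in mapping.items():
    print(code, label)
# 0 <1H OCEAN
# 1 INLAND
# 2 ISLAND
# 3 NEAR BAY
# 4 NEAR OCEAN
```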
<p>Take note of the codes <code>LabelEncoder</code> assigned: they follow the sorted (alphabetical) order of the labels, not their frequency. We will need the same mapping when creating our Streamlit dashboard. We then fill in the missing values with the mean of their respective columns.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">for col in data.columns:
    data[col] = data[col].fillna(data[col].mean())
print(data.isna().sum())</pre>
<p>Output:</p>
<pre class="wp-block-preformatted"><code>longitude 0
latitude 0
housing_median_age 0
total_rooms 0
total_bedrooms 0
population 0
households 0
median_income 0
median_house_value 0
ocean_proximity 0
dtype: int64</code></pre>
<p>Having confirmed that there are no missing values, we can now proceed to the next step.</p>
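<p>For reference, <code>fillna(data[col].mean())</code> replaces each missing entry with the column mean, and pandas' <code>mean()</code> skips NaN by default, so the mean comes from the observed values only. Here is a plain-Python sketch of the same idea; <code>fill_with_mean</code> is a hypothetical helper and the bedroom counts are made-up values.</p>

```python
import math


def fill_with_mean(values):
    """Replace NaN entries with the mean of the non-missing values."""
    observed = [v for v in values if not math.isnan(v)]
    mean = sum(observed) / len(observed)
    return [mean if math.isnan(v) else v for v in values]


# hypothetical total_bedrooms values for four blocks, one of them missing
bedrooms = [129.0, 1106.0, float('nan'), 190.0]
print(fill_with_mean(bedrooms))
# [129.0, 1106.0, 475.0, 190.0]
```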
<h2>Standardizing the Data</h2>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="688" height="459" src="https://blog.finxter.com/wp-content/uploads/2023/02/image-10.png" alt="" class="wp-image-1104533" srcset="https://blog.finxter.com/wp-content/uploads/2023/02/image-10.png 688w, https://blog.finxter.com/wp-content/uplo...00x200.png 300w" sizes="(max-width: 688px) 100vw, 688px" /></figure>
</div>
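<p>If you take a glimpse at our data using the <code>.head()</code> method, you will observe that the columns are on very different scales: <code>total_rooms</code> and <code>population</code> run into the hundreds or thousands, while <code>median_income</code> is a single-digit number. </p>
<p>This can hurt models that rely on distances or penalized coefficients, such as KNN, SVR, Lasso, and ElasticNet. </p>
<p>Hence, we will standardize our data using <code>StandardScaler</code> from Scikit-learn. To prevent data leakage, we will wrap the scaler and each model in a pipeline, so that during cross-validation the scaler is fitted only on the training folds.</p>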
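<p><code>StandardScaler</code> computes z = (x - mean) / std for each column, using the mean and the population standard deviation (ddof=0) learned from the data it is fitted on. The sketch below, with made-up numbers and hypothetical helper names, shows the key point behind avoiding leakage: the statistics come from the training data only and are then reused unchanged on other data.</p>

```python
from statistics import mean, pstdev


def fit_scaler(column):
    """Learn the mean and population standard deviation (ddof=0) of a column."""
    return mean(column), pstdev(column)


def transform(column, mu, sigma):
    """Apply z = (x - mu) / sigma using statistics learned on the training data."""
    return [(x - mu) / sigma for x in column]


# hypothetical training and test values for a single feature
train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
test = [6.0, 1.0]

mu, sigma = fit_scaler(train)      # fit on the training split only
print(mu, sigma)                   # 5.0 2.0
print(transform(test, mu, sigma))  # [0.5, -2.0]
```

<p>Inside <code>cross_val_score</code>, a <code>Pipeline</code> repeats this fit-then-transform step on every training fold, so the held-out fold never influences the scaling.</p>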
<h2>The Models</h2>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="688" height="457" src="https://blog.finxter.com/wp-content/uploads/2023/02/image-11.png" alt="" class="wp-image-1104535" srcset="https://blog.finxter.com/wp-content/uploads/2023/02/image-11.png 688w, https://blog.finxter.com/wp-content/uplo...00x199.png 300w" sizes="(max-width: 688px) 100vw, 688px" /></figure>
</div>
<p>We have no idea which algorithm or model will perform well in this regression problem. </p>
<p>We will test several algorithms using their default parameters. Since this is a <a rel="noreferrer noopener" href="https://blog.finxter.com/python-linear-regression-1-liner/" data-type="post" data-id="1920" target="_blank">regression</a> problem, we will evaluate the models with the R Squared metric, using 10-fold cross-validation as our test harness.</p>
<p class="has-global-color-8-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f4a1.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Info</strong>: The <strong>R Squared metric</strong> is an indication of goodness of fit: it measures the share of the target's variance that the model explains. A value of 1 means a perfect fit, and the closer to 1 the better. It usually lies between 0 and 1, but it can be negative when a model does worse than always predicting the mean.</p>
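<p>R Squared is defined as R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. The plain-Python sketch below computes the same quantity as <code>sklearn.metrics.r2_score</code> on made-up values, and shows how a model that is worse than "always predict the mean" ends up below zero, as SVR does in the comparison later in this tutorial.</p>

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [100.0, 200.0, 300.0]
print(r_squared(y_true, [110.0, 190.0, 310.0]))  # about 0.985: a close fit
print(r_squared(y_true, [200.0, 200.0, 200.0]))  # 0.0: always predicts the mean
print(r_squared(y_true, [300.0, 200.0, 100.0]))  # -3.0: worse than the mean
```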
<p>K-fold cross-validation works by splitting the training data into k parts, or folds (10 in our case). </p>
<p>The model is trained on k - 1 folds and evaluated on the remaining fold, and this is repeated until every fold has served once as the test set. We chose this approach over a single <code>train_test_split</code> because averaging the scores over several splits gives a more reliable estimate of how the model will perform on unseen data.</p>
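<p>To make the splitting concrete, here is a plain-Python sketch of how <code>KFold(n_splits=...)</code> without shuffling assigns row indices: the folds are consecutive blocks, and the first <code>n_samples % n_splits</code> folds get one extra row. <code>kfold_indices</code> is a hypothetical helper, shown with 10 rows and 3 folds to keep the output short.</p>

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for consecutive, unshuffled folds."""
    fold_sizes = [n_samples // n_splits] * n_splits
    for f in range(n_samples % n_splits):
        fold_sizes[f] += 1  # the first folds absorb the remainder
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [idx for idx in range(n_samples) if not start <= idx < stop]
        yield train, test
        start = stop


for train, test in kfold_indices(10, 3):
    print('test fold:', test, 'train size:', len(train))
# test fold: [0, 1, 2, 3] train size: 6
# test fold: [4, 5, 6] train size: 7
# test fold: [7, 8, 9] train size: 7
```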
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression, Lasso, ElasticNet
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# features (x) and target (y) as NumPy arrays
x = data.drop(['median_house_value'], axis=1).values
y = data.median_house_value.values

# each pipeline standardizes the data before fitting its model
pipelines = []
pipelines.append(('ScaledLR', Pipeline([('Scaler', StandardScaler()), ('LR', LinearRegression())])))
pipelines.append(('ScaledLASSO', Pipeline([('Scaler', StandardScaler()), ('LASSO', Lasso())])))
pipelines.append(('ScaledEN', Pipeline([('Scaler', StandardScaler()), ('EN', ElasticNet())])))
pipelines.append(('ScaledKNN', Pipeline([('Scaler', StandardScaler()), ('KNN', KNeighborsRegressor())])))
pipelines.append(('ScaledCART', Pipeline([('Scaler', StandardScaler()), ('CART', DecisionTreeRegressor())])))
pipelines.append(('ScaledSVR', Pipeline([('Scaler', StandardScaler()), ('SVR', SVR())])))
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=7)

def modeling(models):
    # evaluate each pipeline with 10-fold cross-validation on the training set
    for name, model in models:
        kfold = KFold(n_splits=10)
        results = cross_val_score(model, x_train, y_train, cv=kfold, scoring='r2')
        print(f'{name} = {results.mean()}')
</pre>
<p>Notice how each model is wrapped in a <code>Pipeline</code> together with <code>StandardScaler</code>. We then created a function that uses 10-fold cross-validation to train and evaluate each pipeline repeatedly, and prints the mean R Squared score across the folds.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">modeling(pipelines)</pre>
<pre class="wp-block-preformatted"><code>ScaledLR = 0.6321641933826154
ScaledLASSO = 0.6321647820595134
ScaledEN = 0.4953062096224026
ScaledKNN = 0.7106787517028879
ScaledCART = 0.6207570733565403
ScaledSVR = -0.05047991785208246</code>
</pre>
<p>The results show that KNN achieved the highest mean R Squared score (about 0.71) among these scaled models. Let’s see if we can improve the result by tuning its parameters.</p>
<h2>Tuning the Parameters</h2>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="688" height="688" src="https://blog.finxter.com/wp-content/uploads/2023/02/image-12.png" alt="" class="wp-image-1104536" srcset="https://blog.finxter.com/wp-content/uploads/2023/02/image-12.png 688w, https://blog.finxter.com/wp-content/uplo...00x300.png 300w, https://blog.finxter.com/wp-content/uplo...50x150.png 150w" sizes="(max-width: 688px) 100vw, 688px" /></figure>
</div>
<p>The default number of neighbors in scikit-learn's <a href="https://blog.finxter.com/k-nearest-neighbors-as-a-python-one-liner/" data-type="post" data-id="2445" target="_blank" rel="noreferrer noopener">KNN</a> regressor is 5, and with it KNN achieved a good result. We will run a grid search over k from 1 to 30 to find out whether another value yields an even greater score.</p>
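<p>Before tuning <code>n_neighbors</code>, it helps to see what it controls. With uniform weights (the scikit-learn default), a KNN regressor predicts the plain average of the targets of the k training points closest to the query. The sketch below uses a hypothetical helper <code>knn_predict</code> and made-up points: a small k follows individual points closely, while a large k pulls in distant points and smooths the prediction.</p>

```python
import math


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to query (Euclidean)."""
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / k


# made-up 2-D feature vectors and target values
train_x = [(0, 0), (1, 0), (0, 1), (5, 5)]
train_y = [100, 200, 300, 900]

for k in (1, 3, 4):
    print(k, knn_predict(train_x, train_y, (0.2, 0.2), k))
# 1 100.0
# 3 200.0
# 4 375.0
```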
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(x_train)
rescaledx = scaler.transform(x_train)
# try every number of neighbors from 1 to 30
param_grid = dict(n_neighbors=list(range(1, 31)))
model = KNeighborsRegressor()
kfold = KFold(n_splits=10)
grid = GridSearchCV(model, param_grid=param_grid, cv=kfold, scoring='r2')
grid_result = grid.fit(rescaledx, y_train)
print(f'Best: {grid_result.best_score_} using {grid_result.best_params_}')
# Best: 0.7242988300529242 using {'n_neighbors': 14}</pre>
<p>The best value for k is 14, with a mean R Squared score of 0.7243, a slight improvement over the 0.7107 we got with the default. </p>
<p>Can we improve on this score? I’m aiming for an R Squared score of 0.80 or above, so we will try ensemble methods.</p>
<h2>Ensemble Methods</h2>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="659" height="878" src="https://blog.finxter.com/wp-content/uploads/2023/02/image-13.png" alt="" class="wp-image-1104539" srcset="https://blog.finxter.com/wp-content/uploads/2023/02/image-13.png 659w, https://blog.finxter.com/wp-content/uplo...25x300.png 225w" sizes="(max-width: 659px) 100vw, 659px" /></figure>
</div>
<p>Let’s see what we can achieve using 4 different ensemble machine learning algorithms. Everything other than the models remains the same.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor, GradientBoostingRegressor, AdaBoostRegressor

# ensembles
ensembles = []
ensembles.append(('ScaledAB', Pipeline([('Scaler', StandardScaler()), ('AB', AdaBoostRegressor())])))
ensembles.append(('ScaledGBM', Pipeline([('Scaler', StandardScaler()), ('GBM', GradientBoostingRegressor())])))
ensembles.append(('ScaledRF', Pipeline([('Scaler', StandardScaler()), ('RF', RandomForestRegressor())])))
ensembles.append(('ScaledET', Pipeline([('Scaler', StandardScaler()), ('ET', ExtraTreesRegressor())])))
for name, model in ensembles:
    cv_results = cross_val_score(model, x_train, y_train, cv=kfold, scoring='r2')
    print(f'{name} = {cv_results.mean()}')
</pre>
<p>Output:</p>
<pre class="wp-block-preformatted"><code>ScaledAB = 0.3835320642243155
ScaledGBM = 0.772428054038791
ScaledRF = 0.81023174859107
ScaledET = 0.7978581384771901</code></pre>
<p>Random Forest Regressor achieved the highest mean R Squared score (about 0.81), which meets our target. Therefore, we are selecting the Random Forest Regressor algorithm to train the model and predict the price of a house. Could it do even better? Probably, given that we trained it only with default parameters.</p>
<p>Here is the full code. Save it as <code>model.py</code>.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pickle

import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

data = pd.read_csv('housing.csv')

# converting the categorical column to int datatype
label_encoder = LabelEncoder()
obj = (data.dtypes == 'object')
for col in list(obj[obj].index):
    data[col] = label_encoder.fit_transform(data[col])

# filling in missing values with the column mean
for col in data.columns:
    data[col] = data[col].fillna(data[col].mean())

# select only 1000 rows; this is done after encoding so the
# category codes match those of the full dataset
data = data[:1000]

# splitting features and target into NumPy arrays
x = data.drop(['median_house_value'], axis=1).values
y = data.median_house_value.values

# dividing data into train and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=7)

# standardizing the data
scaler = StandardScaler().fit(x_train)
rescaledx = scaler.transform(x_train)

# selecting and fitting the model for training
model = RandomForestRegressor()
model.fit(rescaledx, y_train)

# saving the trained model
with open('rf_model.pkl', 'wb') as f:
    pickle.dump(model, f)
# saving the fitted StandardScaler
with open('scaler.pkl', 'wb') as f:
    pickle.dump(scaler, f)
</pre>
<p>We selected only 1000 rows to keep the pickled model file small. </p>
<p>Notice that we saved the fitted <code>StandardScaler</code> object so that it can be reused in the Streamlit dashboard. Since we scaled the training data, we must also scale the input details from our users in exactly the same way.</p>
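<p>The idea behind saving the scaler is that the statistics learned at training time travel with the model. The sketch below stands in for the fitted scaler with a plain dictionary of made-up training statistics (an assumption made to keep it dependency-free), saves and reloads it with <code>pickle</code> inside <code>with</code> blocks so the files are closed properly, and scales a user's input with the training mean and standard deviation rather than statistics computed from the input itself.</p>

```python
import os
import pickle
import tempfile
from statistics import mean, pstdev

# hypothetical stand-in for a fitted scaler: statistics learned from training data
train_income = [1.5, 2.5, 3.5, 4.5]
fitted = {'mean': mean(train_income), 'std': pstdev(train_income)}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'scaler_stats.pkl')
    with open(path, 'wb') as f:  # save once, at training time
        pickle.dump(fitted, f)
    with open(path, 'rb') as f:  # load later, e.g. when the app starts
        loaded = pickle.load(f)

# scale one user input with the *training* statistics
user_income = 3.0
scaled = (user_income - loaded['mean']) / loaded['std']
print(scaled)  # 0.0
```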
<h2>Streamlit Dashboard</h2>
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="475" height="872" src="https://blog.finxter.com/wp-content/uploads/2023/02/image-14.png" alt="" class="wp-image-1104541" srcset="https://blog.finxter.com/wp-content/uploads/2023/02/image-14.png 475w, https://blog.finxter.com/wp-content/uplo...63x300.png 163w" sizes="(max-width: 475px) 100vw, 475px" /></figure>
</div>
<p>It’s now time to design our Streamlit app. Once again, we will try to keep things simple and avoid complex designs. Save the following code as <code>app.py</code>.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import streamlit as st
import pickle


def main():
    style = """&lt;div style='background-color:pink; padding:12px'>&lt;h1 style='color:black'>House Price Prediction App&lt;/h1>&lt;/div>"""
    st.markdown(style, unsafe_allow_html=True)
    # two side-by-side columns for the input widgets
    left, right = st.columns((2, 2))
    longitude = left.number_input('Enter the Longitude in negative number', step=1.0, format="%.2f", value=-21.34)
    latitude = right.number_input('Enter the Latitude in positive number', step=1.0, format='%.2f', value=35.84)
    housing_median_age = left.number_input('Enter the median age of the building', step=1.0, format='%.1f', value=25.0)
    total_rooms = right.number_input('How many rooms are there in the house?', step=1.0, format='%.1f', value=56.0)
    total_bedrooms = left.number_input('How many bedrooms are there in the house?', step=1.0, format='%.1f', value=15.0)
    population = right.number_input('Population of people within a block', step=1.0, format='%.1f', value=250.0)
    households = left.number_input('Number of households within a block', step=1.0, format='%.1f', value=43.0)
    median_income = right.number_input('Median income of a household in Dollars', step=1.0, format='%.1f', value=3000.0)
    ocean_proximity = st.selectbox('How close to the sea is the house?', ('&lt;1H OCEAN', 'INLAND', 'NEAR OCEAN', 'NEAR BAY', 'ISLAND'))
    button = st.button('Predict')
    # if button is pressed
    if button:
        # make prediction
        result = predict(longitude, latitude, housing_median_age, total_rooms, total_bedrooms,
                         population, households, median_income, ocean_proximity)
        st.success(f'The value of the house is ${result}')
</pre>
<p>We imported Streamlit and pickle, then defined our <code>main()</code> function. We want it to run as soon as we open the app, so we will call it inside an <code>if __name__ == '__main__':</code> block at the very end of our script.</p>
<p>Setting <code>unsafe_allow_html=True</code> lets <code>st.markdown</code> render the HTML tags in our style string instead of displaying them as plain text. </p>
<p>With <code>st.columns</code>, we display the input widgets side by side. Every input is a float, matching the <code>float64</code> columns in our dataset. When the Predict button is clicked, Streamlit reruns the script, <code>st.button</code> returns <code>True</code>, and the <code>predict()</code> function is called. </p>
<p class="has-base-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f449.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Recommended</strong>: <a href="https://blog.finxter.com/learning-streamlits-buttons-features/" data-type="post" data-id="462652" target="_blank" rel="noreferrer noopener">Streamlit Button — A Helpful Guide</a></p>
<p>Let’s now define the <code>predict()</code> function.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import pandas as pd

# load the trained model
with open('rf_model.pkl', 'rb') as rf:
    model = pickle.load(rf)
# load the fitted StandardScaler
with open('scaler.pkl', 'rb') as stds:
    scaler = pickle.load(stds)


def predict(longitude, latitude, housing_median_age, total_rooms, total_bedrooms,
            population, households, median_income, ocean_pro):
    # processing user input: use the same codes LabelEncoder assigned
    ocean = 0 if ocean_pro == '&lt;1H OCEAN' else 1 if ocean_pro == 'INLAND' else 2 if ocean_pro == 'ISLAND' else 3 if ocean_pro == 'NEAR BAY' else 4
    # the dataset measures median income in tens of thousands of dollars
    med_income = median_income / 10000
    lists = [longitude, latitude, housing_median_age, total_rooms, total_bedrooms,
             population, households, med_income, ocean]
    # one-row DataFrame, columns in the same order as the training data
    df = pd.DataFrame(lists).transpose()
    # scaling the data with the scaler fitted on the training set
    scaled = scaler.transform(df)
    # making predictions using the trained model
    prediction = model.predict(scaled)
    result = int(prediction[0])
    return result
</pre>
<p>We started by loading the trained model and the <code>StandardScaler</code> we saved earlier.</p>
<p>In the <code>predict()</code> function, we use a <a rel="noreferrer noopener" href="https://blog.finxter.com/python-one-line-ternary/" data-type="post" data-id="10641" target="_blank">ternary operator</a> to turn user input into a number. More info about this operator in the referenced blog tutorial or this video:</p>
<figure class="wp-block-embed-youtube wp-block-embed is-type-video is-provider-youtube"><a href="https://blog.finxter.com/how-i-built-a-house-price-prediction-app-using-streamlit/"><img src="https://blog.finxter.com/wp-content/plugins/wp-youtube-lyte/lyteCache.php?origThumbUrl=https%3A%2F%2Fi.ytimg.com%2Fvi%2F9XXcUHXrqZ4%2Fhqdefault.jpg" alt="YouTube Video"></a><figcaption></figcaption></figure>
</p>
<p>Notice that the numbers match the codes assigned by <code>LabelEncoder</code>, which follow the sorted order of the labels. If you are ever in doubt, use the <code>.value_counts()</code> method on the encoded column to confirm.</p>
<p>We <a href="https://blog.finxter.com/python-in-place-division-operator/" data-type="post" data-id="32906" target="_blank" rel="noreferrer noopener">divide</a> <code>median_income</code> by 10,000 because the corresponding column in our dataset is measured in tens of thousands of dollars, while the app asks for the income in dollars. This conversion matters: <code>StandardScaler</code> rescales the input using the mean and standard deviation learned from the training data, so an income entered in the wrong unit would still end up far outside the range the model was trained on.</p>
<p>We collect the inputs in a list and turn them into a one-row DataFrame with <code>pd.DataFrame(lists).transpose()</code>. The order of the values, and therefore of the parameters of the <code>predict()</code> function, must match the column order used during training. Note that it is the array returned by <code>scaler.transform()</code> that we pass to <code>model.predict()</code>.</p>
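<p>As an alternative to the chained ternary operator, a dictionary lookup keeps the label-to-code mapping in one readable place and fails loudly (with a <code>KeyError</code>) on an unexpected label instead of silently falling through to 4. The sketch below also performs the dollars-to-tens-of-thousands conversion; <code>OCEAN_CODES</code> and <code>encode_inputs</code> are hypothetical names, not part of the original app.</p>

```python
# codes assigned by LabelEncoder (sorted order of the labels)
OCEAN_CODES = {
    '<1H OCEAN': 0,
    'INLAND': 1,
    'ISLAND': 2,
    'NEAR BAY': 3,
    'NEAR OCEAN': 4,
}


def encode_inputs(median_income_dollars, ocean_pro):
    """Convert app inputs to the units and codes used in the training data."""
    med_income = median_income_dollars / 10_000  # dataset unit: tens of thousands of dollars
    ocean = OCEAN_CODES[ocean_pro]  # raises KeyError for an unknown label
    return med_income, ocean


print(encode_inputs(30000.0, 'NEAR BAY'))  # (3.0, 3)
```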
<div class="wp-block-image">
<figure class="aligncenter size-full"><img decoding="async" loading="lazy" width="708" height="398" src="https://blog.finxter.com/wp-content/uploads/2023/02/image-5.png" alt="" class="wp-image-1104511" srcset="https://blog.finxter.com/wp-content/uploads/2023/02/image-5.png 708w, https://blog.finxter.com/wp-content/uplo...00x169.png 300w" sizes="(max-width: 708px) 100vw, 708px" /></figure>
</div>
<p>If the app seems to predict the same amount despite changes to the input details, you may check how strongly each feature correlates with the target variable by typing <code>data.corr()</code>. </p>
<p>If we apply <strong>Recursive Feature Elimination (RFE)</strong> with its default settings to rank the features, it keeps just 4: <code>longitude</code>, <code>latitude</code>, <code>median_income</code>, and <code>ocean_proximity</code>. Let me show you what I mean.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="python" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()
rfe = RFE(model)
fit = rfe.fit(x, y)
print(fit.n_features_)
# 4
print(fit.support_)
# array([ True, True, False, False, False, False, False, True, True])
print(fit.ranking_)
# array([1, 1, 2, 6, 3, 5, 4, 1, 1])
</pre>
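<p>RFE keeps exactly 4 features here because, when <code>n_features_to_select</code> is not set, it keeps half of the features (9 // 2 = 4). The ranking is still informative: <code>longitude</code>, <code>latitude</code>, <code>median_income</code>, and <code>ocean_proximity</code> carry most of the predictive signal for the random forest, while the other features contribute comparatively little. If you kept getting nearly the same amount, that may be part of the reason.</p>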
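<p>To see how <code>ranking_</code> is built, here is a plain-Python sketch of the RFE loop with <code>step=1</code>: drop the weakest remaining feature each round until the target count remains; kept features get rank 1, and the earlier a feature is dropped, the higher its rank. Real RFE refits the estimator after every elimination to get fresh importances; this sketch uses fixed, made-up scores (chosen only so that the elimination order reproduces the ranking printed above, not the forest's measured importances), and <code>rfe_ranking</code> is a hypothetical helper.</p>

```python
def rfe_ranking(scores, n_select=None):
    """Rank features by repeatedly removing the one with the lowest score."""
    if n_select is None:
        n_select = len(scores) // 2  # RFE's default: keep half the features
    remaining = dict(scores)
    ranking = {name: 1 for name in scores}
    while len(remaining) > n_select:
        weakest = min(remaining, key=remaining.get)
        del remaining[weakest]
        for name in scores:  # every eliminated feature moves one rank down
            if name not in remaining:
                ranking[name] += 1
    return ranking


# made-up importance scores, in the dataset's feature order
scores = {
    'longitude': 0.9, 'latitude': 0.8, 'housing_median_age': 0.3,
    'total_rooms': 0.05, 'total_bedrooms': 0.2, 'population': 0.1,
    'households': 0.15, 'median_income': 0.95, 'ocean_proximity': 0.7,
}
print(list(rfe_ranking(scores).values()))
# [1, 1, 2, 6, 3, 5, 4, 1, 1]
```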
<p>The purpose of this tutorial is purely educational, to demonstrate how to use Python to solve machine learning problems. I tried to keep things simple by not going through data visualization and feature engineering. Since the data is old, it should not be relied on when making important decisions.</p>
<p>We finally came to the end of the tutorial. Be sure to <a href="https://github.com/finxter/House_Price_Pred" data-type="URL" data-id="https://github.com/finxter/House_Price_Pred" target="_blank" rel="noreferrer noopener">check my GitHub page</a> to see the full project code. </p>
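<p>To deploy on Streamlit Cloud, I assume you have already created a GitHub repository and added the required files: <code>app.py</code>, the two pickle files, and a <code>requirements.txt</code> listing the libraries the app needs. Then create an account on Streamlit Cloud and enter your repository URL. Streamlit will do the rest. </p>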
<p>I have already <a href="https://jonaben1-house-price-pred-app-8c0uje.streamlit.app" data-type="URL" data-id="https://jonaben1-house-price-pred-app-8c0uje.streamlit.app" target="_blank" rel="noreferrer noopener">deployed mine on Streamlit Cloud</a>. Alright, enjoy your day.</p>
<p class="has-base-background-color has-background"><img src="https://s.w.org/images/core/emoji/14.0.0/72x72/1f449.png" alt="?" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong>Recommended Project</strong>: <a rel="noreferrer noopener" href="https://blog.finxter.com/how-i-built-and-deployed-a-python-loan-eligibility-prediction-app-on-streamlit/" data-type="post" data-id="1080976" target="_blank">How I Built and Deployed a Python Loan Eligibility Prediction App on Streamlit</a></p>
</div>


https://www.sickgaming.net/blog/2023/02/...streamlit/