[Tut] Fitting Data With Scipy’s UnivariateSpline() and LSQUnivariateSpline()


<div><p>This article explores the use of the functions <em>.UnivariateSpline()</em> and <em>.LSQUnivariateSpline()</em>, from the <a href="https://blog.finxter.com/scipy-interpolate-1d-2d-and-3d/" target="_blank" rel="noreferrer noopener" title="Scipy Interpolate 1D, 2D, and 3D">Scipy </a>package. </p>
<h2>What Are Splines?</h2>
<p>Splines are <a href="https://blog.finxter.com/python-math-module/" target="_blank" rel="noreferrer noopener" title="Python Math Module [Ultimate Guide]">mathematical </a>functions built from an ensemble of <a href="https://blog.finxter.com/np-polyfit/" target="_blank" rel="noreferrer noopener" title="np.polyfit() — Curve Fitting with NumPy Polyfit">polynomials </a>that are joined to each other at specific points called the <em><strong>knots </strong></em>of the spline. </p>
<p>They’re used to <a href="https://blog.finxter.com/scipy-interpolate-1d-2d-and-3d/" target="_blank" rel="noreferrer noopener" title="Scipy Interpolate 1D, 2D, and 3D">interpolate </a>a set of data points with a function that is continuous over the considered range; this also means that the splines will generate a smooth function, which avoids abrupt changes in slope. </p>
<p>Compared to the more classical fitting methods, the main advantage of splines is that the polynomial equation is not the same throughout the whole range of data points. </p>
<p>Instead, the fitting function can change from one interval to the subsequent one, allowing for fitting and interpolation of very complicated point distributions. In this article we will see: </p>
<ul>
<li>i) how to generate a spline function to <strong><em>fit </em></strong>a given set of data points, </li>
<li>ii) which functions we can then use to <strong><em>interpolate </em></strong>the value of points within the fitted range, </li>
<li>iii) how to <strong><em>improve </em></strong>the fitting, and</li>
<li>iv) how to calculate the related <strong><em>error</em></strong>.   </li>
</ul>
<h2>Splines — A Mathematical Perspective</h2>
<p>In mathematics, splines are functions described by an ensemble of polynomials. </p>
<p>Even if splines seem to be described by a single equation, they are defined by different polynomial functions, each of which holds over a specific range of points whose extremes are called <em><strong>knots</strong></em>. Each knot hence represents a change in the polynomial function that describes the shape of the spline in that specific interval. </p>
<p>One of the main <strong>characteristics </strong>of splines is their continuity: they are continuous along the entire interval in which they are defined, which allows for the generation of a smooth curve that fits our set of data points. </p>
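<p>The continuity at the knots can be checked numerically. The following is a minimal sketch, assuming some smooth test data that are not part of this article (the names <code>x_demo</code>, <code>y_demo</code> and <code>demo_spline</code> are hypothetical); it evaluates a cubic spline and its first derivative just to the left and just to the right of every interior knot.</p>

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical smooth test data (not the data set used in this article)
x_demo = np.linspace(0, 10, 30)
y_demo = np.sin(x_demo)

# s=0 forces an interpolating cubic spline, so interior knots are guaranteed
demo_spline = UnivariateSpline(x_demo, y_demo, k=3, s=0)

knots = demo_spline.get_knots()
interior_knots = knots[1:-1]  # drop the two boundary knots

# Evaluate value and slope slightly left and right of each interior knot
eps = 1e-8
value_jump = np.abs(demo_spline(interior_knots + eps)
                    - demo_spline(interior_knots - eps))
slope_jump = np.abs(demo_spline(interior_knots + eps, nu=1)
                    - demo_spline(interior_knots - eps, nu=1))

print("interior knots:", len(interior_knots))
print("largest jump in value:", value_jump.max())
print("largest jump in slope:", slope_jump.max())
```

<p>Both jumps should be negligible: the polynomial changes at each knot, but the curve and its slope do not break there.</p>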
<p>One of the main <strong>advantages </strong>of using splines for fitting problems, instead of single polynomials, is the possibility of using lower degree polynomial functions to describe very complicated functions. </p>
<p>Indeed, if we wanted to use a single polynomial function, the degree of the polynomial would usually have to increase with the complexity of the function to be described; increasing the degree of the fitting polynomial can introduce unwanted errors, such as spurious oscillations between the data points.</p>
<p>Here is a nice video that explains this issue in simple terms:</p>
<figure class="wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-4-3 wp-has-aspect-ratio">
<div class="wp-block-embed__wrapper">
<iframe class='youtube-player' width='980' height='552' src='https://www.youtube.com/embed/nDs3PPnMYZ0?version=3&rel=1&fs=1&autohide=2&showsearch=0&showinfo=1&iv_load_policy=1&wmode=transparent' allowfullscreen='true' style='border:0;'></iframe>
</div>
</figure>
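<p>The problem with high-degree polynomials can also be reproduced in a few lines. The sketch below is only an illustration under stated assumptions: the test function (the classic Runge function), the 11 equally spaced nodes and all variable names are choices made for this example, not part of the article's data.</p>

```python
import numpy as np
from numpy.polynomial import Polynomial
from scipy.interpolate import UnivariateSpline


def runge(x):
    """Runge function, a standard test case for polynomial interpolation."""
    return 1.0 / (1.0 + 25.0 * x**2)


# 11 equally spaced nodes on [-1, 1] (assumed for this illustration)
x_nodes = np.linspace(-1, 1, 11)
y_nodes = runge(x_nodes)

# A single degree-10 polynomial through all 11 points
poly = Polynomial.fit(x_nodes, y_nodes, deg=10)

# A cubic spline through the same points (s=0 means pure interpolation)
cubic = UnivariateSpline(x_nodes, y_nodes, k=3, s=0)

# Compare both curves with the true function on a dense grid
x_fine = np.linspace(-1, 1, 2001)
poly_error = np.max(np.abs(poly(x_fine) - runge(x_fine)))
spline_error = np.max(np.abs(cubic(x_fine) - runge(x_fine)))

print("max error, degree-10 polynomial:", poly_error)
print("max error, cubic spline:", spline_error)
```

<p>Both curves pass through the same points, but the single polynomial oscillates strongly near the ends of the interval, while the low-degree spline stays close to the true function.</p>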
<p>Splines avoid this by varying the fitting equation over the different intervals that characterize the initial set of data points. From a historical point of view, the word “Spline” comes from the flexible spline devices that shipbuilders used to draw smooth shapes when designing vessels. Nowadays they are also widely used as fundamental tools in many CAD programs (<a href="https://en.wikipedia.org/wiki/Spline_(mathematics)" target="_blank" rel="noreferrer noopener">https://en.wikipedia.org/wiki/Spline_(mathematics)</a> ). </p>
<h2>Scipy.UnivariateSpline</h2>
<p>In the first part of this article we explore the function <em>.UnivariateSpline()</em>, which can be used to fit a spline of a specific degree to some data points.</p>
<p>To understand how this function works, we start by generating our initial x and y arrays of data points. The x array (called “x”) is defined by using the <em><a href="https://blog.finxter.com/np-linspace/" target="_blank" rel="noreferrer noopener" title="How to Use np.linspace() in Python? A Helpful Illustrated Guide">np.linspace()</a></em> function; the y array is defined by using the <em><a href="https://blog.finxter.com/python-random-module/" target="_blank" rel="noreferrer noopener" title="Python’s Random Module – Everything You Need to Know to Get Started">np.random</a></em> function called <em>.randn()</em>, which returns a sample from the standard normal distribution.</p>
<p>See: <a href="https://numpy.org/devdocs/reference/random/generated/numpy.random.randn.html" target="_blank" rel="noreferrer noopener">https://numpy.org/devdocs/reference/random/generated/numpy.random.randn.html</a> for additional documentation.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline, LSQUnivariateSpline
import numpy as np

# x and y array definition (initial set of data points)
x = np.linspace(0, 10, 30)
y = np.sin(0.5*x)*np.sin(x*np.random.randn(30))</pre>
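<p>Because the y array is random, every run produces a different data set and therefore different numerical results. If reproducible numbers are needed, one option is to seed the random generator. The following is a small sketch of this idea; the helper <code>make_data()</code> and the seed value are hypothetical and are not used elsewhere in this article.</p>

```python
import numpy as np


def make_data(seed=0, n_points=30):
    """Return x and y arrays built like the article's data, but reproducibly.

    The seed value is an arbitrary choice made for this illustration.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0, 10, n_points)
    y = np.sin(0.5 * x) * np.sin(x * rng.standard_normal(n_points))
    return x, y


# Two calls with the same seed give exactly the same data points
x_a, y_a = make_data(seed=0)
x_b, y_b = make_data(seed=0)
print("identical data:", np.array_equal(y_a, y_b))
```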
<p>Once we have defined the initial set of data points, we can call the function <em>.UnivariateSpline()</em>, from the Scipy package and calculate the spline that best fits our points. </p>
<p>While the procedure is rather simple, the fundamental parameters that define the spline function can cause some confusion; for this reason, it is worth analyzing in detail the main input parameters that can be set when calling the function in our code. </p>
<p>As can be also seen in the documentation (<a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.UnivariateSpline.html" target="_blank" rel="noreferrer noopener">https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.UnivariateSpline.html</a>), the <em>.UnivariateSpline()</em> function accepts as mandatory inputs the x and y arrays of data points that we want to fit.</p>
<p>In most cases, our aim is to fit complicated functions and, for this purpose, other parameters must be specified. </p>
<p>One of the most important parameters is “k”, which refers to the degree of the polynomials that define the spline segments. “k” can vary between one and five; increasing the degree of the polynomials allows a better fit of more complicated functions. However, in order not to introduce artifacts in our fit, the best practice is to use the lowest degree that still gives a satisfactory fit. </p>
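<p>To get a feeling for the role of “k”, it can help to fit the same data with every allowed degree and compare the outcome. The following is a minimal sketch on hypothetical noisy test data (a sine wave plus seeded noise, not the article's data); the smoothing factor of 0.5 is also just an assumption for this example.</p>

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical noisy test data (seeded so that the run is reproducible)
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 10, 50)
y_demo = np.sin(x_demo) + 0.1 * rng.standard_normal(50)

# Fit one spline per allowed degree, keeping the smoothing factor fixed
summary = {}
for k in range(1, 6):
    spl = UnivariateSpline(x_demo, y_demo, k=k, s=0.5)
    summary[k] = (len(spl.get_knots()), spl.get_residual())

for k, (n_knots, residual) in summary.items():
    print(f"k={k}: {n_knots} knots, residual={residual:.4f}")
```

<p>The printout shows how many knots each degree needed to reach the requested smoothing factor, which is one way to judge whether a higher degree is really necessary.</p>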
<p>Another relevant parameter is “s”, a float that defines the so-called <em>smoothing factor</em>, which directly affects the number of knots present in the spline. More precisely, once we fix a specific value of “s”, the number of knots is increased until the sum of the squared differences between the original data points in the y array and the corresponding points along the spline is less than or equal to “s” (see the documentation for the mathematical formula). Hence, the lower the value of “s”, the higher the fitting accuracy and (most of the time) the number of knots, since we are asking for a smaller difference between the original points and the fitted ones.</p>
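<p>The link between “s”, the number of knots and the fitting error can be observed directly. The sketch below uses hypothetical noisy test data (not the article's data) and a handful of arbitrarily chosen smoothing factors; it only relies on the methods <code>.get_knots()</code> and <code>.get_residual()</code>.</p>

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical noisy test data (seeded so that the run is reproducible)
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 10, 50)
y_demo = np.sin(x_demo) + 0.1 * rng.standard_normal(50)

# Fit the same data with decreasing smoothing factors; s=0 means interpolation
results = []
for s in (50.0, 5.0, 0.5, 0.0):
    spl = UnivariateSpline(x_demo, y_demo, k=3, s=s)
    results.append((s, len(spl.get_knots()), spl.get_residual()))

for s, n_knots, residual in results:
    print(f"s={s:5.1f}: {n_knots:2d} knots, residual={residual:.4f}")
```

<p>With s = 0 the spline passes through every data point, so the residual drops to zero at the price of the largest number of knots.</p>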
<p>Now that the parameters that govern the shape of our spline are clearer, we can return to the code and define the spline function. In particular, we will give as input arrays the “x” and “y” arrays previously defined; the value of the smoothing factor is initially set to five, while the parameter “k” is left at its default value, which is three.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">#spline definition
spline = UnivariateSpline(x, y, s = 5)</pre>
<p>The output of the <em>.UnivariateSpline()</em> function is the function that fits the given set of data points. At this point, we can generate a denser x array, called “x_spline”, and evaluate the respective values on the y axis using the spline function just defined; we then store them in the array “y_spline” and generate the plot.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">x_spline = np.linspace(0, 10, 1000)
y_spline = spline(x_spline)
#Plotting
fig = plt.figure()
ax = fig.subplots()
ax.scatter(x, y)
ax.plot(x_spline, y_spline, 'g')
plt.show()
</pre>
<p>The result of this procedure is displayed in Figure 1.</p>
<div class="wp-block-image">
<figure class="aligncenter size-large"><img loading="lazy" width="367" height="264" src="https://blog.finxter.com/wp-content/uploads/2020/12/image-24.png" alt="" class="wp-image-18397" srcset="https://blog.finxter.com/wp-content/uploads/2020/12/image-24.png 367w, https://blog.finxter.com/wp-content/uplo...00x216.png 300w, https://blog.finxter.com/wp-content/uplo...50x108.png 150w" sizes="(max-width: 367px) 100vw, 367px" /><figcaption><strong>Figure 1:</strong> Initial set of data points (blue points) and spline function generated for the fitting (green curve). As can be easily guessed, the spline function is not able to follow with sufficient accuracy the data points.</figcaption></figure>
</div>
<p>As can be seen from Figure 1, the obtained spline gives a really bad fit of our initial data points; the main reason is the relatively high value that was assigned to the <em>smoothing factor;</em> we will now explore a possible strategy for improving our spline, without introducing exaggerated alterations. </p>
<p>One of the best ways to improve this situation is to use the method <code>.set_smoothing_factor(s)</code>, which continues the spline calculation according to a new smoothing factor (“s”, given as the only input), without altering the knots already found during the last call. This is a convenient strategy because splines can be very sensitive to changes in the smoothing factor: changing it directly in the <em>.UnivariateSpline()</em> call might significantly alter the shape of the resulting spline (keep in mind that our goal is always to obtain the best fit with the simplest spline possible). The following code lines define a new and more accurate spline function, with a smoothing factor equal to 0.05. </p>
<p>After the application of the above-mentioned method, the procedure is identical to the one described for generating the first spline.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group=""># Changing the smoothing factor for a better fit
spline.set_smoothing_factor(0.05)
y_spline2 = spline(x_spline)
</pre>
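<p>The effect of <code>.set_smoothing_factor()</code> can also be followed through the number of knots and the residual. The following is a minimal sketch; it rebuilds data of the same form as above but with a seeded generator, so the names <code>x_demo</code>, <code>y_demo</code> and <code>demo_spline</code> and the seed are assumptions made for this example.</p>

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Data of the same form as in the article, but seeded (assumption)
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 10, 30)
y_demo = np.sin(0.5 * x_demo) * np.sin(x_demo * rng.standard_normal(30))

# First fit with a large smoothing factor
demo_spline = UnivariateSpline(x_demo, y_demo, s=5)
knots_before = len(demo_spline.get_knots())
residual_before = demo_spline.get_residual()

# Continue the computation with a smaller smoothing factor
demo_spline.set_smoothing_factor(0.05)
knots_after = len(demo_spline.get_knots())
residual_after = demo_spline.get_residual()

print(f"s=5:    {knots_before} knots, residual={residual_before:.4f}")
print(f"s=0.05: {knots_after} knots, residual={residual_after:.4f}")
```

<p>The second fit should report more knots and a much smaller residual than the first one.</p>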
<p>We conclude by plotting the result; Figure 2 displays the final output: the new spline is the blue curve, plotted together with the old one (green curve) and the initial data points (light blue points).</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">#Plotting
fig = plt.figure()
ax = fig.subplots()
ax.scatter(x, y)
ax.plot(x_spline, y_spline, 'g', alpha =0.5)
ax.plot(x_spline, y_spline2, 'b')
plt.show()
</pre>
<div class="wp-block-image">
<figure class="aligncenter size-large"><img loading="lazy" width="388" height="281" src="https://blog.finxter.com/wp-content/uploads/2020/12/image-25.png" alt="" class="wp-image-18400" srcset="https://blog.finxter.com/wp-content/uploads/2020/12/image-25.png 388w, https://blog.finxter.com/wp-content/uplo...00x217.png 300w, https://blog.finxter.com/wp-content/uplo...50x109.png 150w" sizes="(max-width: 388px) 100vw, 388px" /><figcaption><strong>Figure 2: </strong>New spline function (blue curve), plotted together with the old spline (green curve) and the initial data points (light blue points). After setting the smoothing factor to a lower value, the fit improves significantly; this is because we forced the initial points in the y array and the ones along the spline to have a smaller difference.</figcaption></figure>
</div>
<p>As can be seen from Figure 2, the newly generated spline function describes the initial data points well, while still being built on the knots that were found in the initial call.</p>
<p>We conclude this part by illustrating some useful methods that can be used once a suitable spline function has been generated for our data points. The first of these methods is <code>.__call__(x)</code>, which evaluates the spline at specific points, given in the form of a <a href="https://blog.finxter.com/python-lists/" target="_blank" rel="noreferrer noopener" title="The Ultimate Guide to Python Lists">list </a>or a single number. The following lines show the application of this method (we evaluate the spline for a value of 2 on the x-axis).</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">#evaluate point along the spline
print(spline.__call__(2))
</pre>
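<p>Since <code>.__call__()</code> is what Python runs when an object is used like a function, the spline can also be evaluated with the shorter form <code>spline(2)</code>. The sketch below illustrates this on hypothetical smooth test data (not the article's data); it also uses the optional argument <code>nu</code>, which selects the order of the derivative to evaluate.</p>

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical smooth test data (not the data set used in this article)
x_demo = np.linspace(0, 10, 30)
y_demo = np.sin(x_demo)
demo_spline = UnivariateSpline(x_demo, y_demo, s=0)

# Calling the object directly is the same as calling .__call__()
single_value = demo_spline(2)
explicit_value = demo_spline.__call__(2)

# A list of x values returns an array with one result per entry
several_values = demo_spline([1.0, 2.0, 3.0])

# nu=1 evaluates the first derivative (the slope) instead of the value
slope_at_2 = demo_spline(2, nu=1)
```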
<p>For the data used here, the result of the <a href="https://blog.finxter.com/the-separator-and-end-arguments-of-the-python-print-function/" target="_blank" rel="noreferrer noopener" title="Python Print Function [And Its SECRET Separator &amp; End Arguments]">print </a>command is 0.5029480519149454 (the exact value changes from run to run, because the y array is random). Another important method is <code>.get_residual()</code>, which returns the weighted <a href="https://blog.finxter.com/python-one-line-sum-list/" target="_blank" rel="noreferrer noopener" title="Python One Line Sum List">sum </a>of squared residuals of the spline approximation (more simply, an evaluation of the error in the fitting procedure).</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">#get the residuals
print(spline.get_residual())</pre>
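<p>To make the meaning of this number concrete, the same quantity can be computed by hand. The following is a minimal sketch on hypothetical noisy test data (not the article's data), assuming the default weights, which are all equal to one.</p>

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical noisy test data (seeded so that the run is reproducible)
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 10, 50)
y_demo = np.sin(x_demo) + 0.1 * rng.standard_normal(50)

demo_spline = UnivariateSpline(x_demo, y_demo, s=0.5)

# With unit weights, the residual is the plain sum of squared differences
manual_residual = np.sum((y_demo - demo_spline(x_demo)) ** 2)
reported_residual = demo_spline.get_residual()
```

<p>The two values should agree up to rounding, which confirms that the residual is measured at the original data points only.</p>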
<p>The result for this case is 0.049997585478530546, which is consistent with the smoothing factor of 0.05 set earlier. In some applications, it can be useful to calculate the definite integral of the spline (i.e. the area underneath the spline curve over a specific range along the x-axis); the simplest way to do this is the method <code>.integral(<em>a,b</em>)</code>, where “a” and “b” are the lower and upper limits along the x-axis between which we want to evaluate the area (in this case we calculate the area underneath the spline between 1 and 2). Application of this method is illustrated in the following lines.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">#definite integral of the spline
print(spline.integral(1,2))</pre>
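<p>The value returned by <code>.integral()</code> can be cross-checked in two independent ways. The sketch below uses hypothetical test data sampled from sin(x), whose exact integral is known, and compares the result with the spline's own <code>.antiderivative()</code> and with numerical quadrature from <code>scipy.integrate.quad</code>; all variable names are assumptions made for this example.</p>

```python
import numpy as np
from scipy.integrate import quad
from scipy.interpolate import UnivariateSpline

# Hypothetical smooth test data sampled from sin(x)
x_demo = np.linspace(0, 10, 30)
y_demo = np.sin(x_demo)
demo_spline = UnivariateSpline(x_demo, y_demo, s=0)

# Area under the spline between x=1 and x=2
area = demo_spline.integral(1, 2)

# Same area from the antiderivative spline, F(2) - F(1)
antiderivative = demo_spline.antiderivative()
area_from_antiderivative = antiderivative(2) - antiderivative(1)

# Same area from general-purpose numerical quadrature
area_from_quad, _ = quad(demo_spline, 1, 2)

# Exact integral of sin(x) between 1 and 2, for comparison
exact_area = np.cos(1) - np.cos(2)
```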
<p>For the data used here, the result of the integration is -0.2935394976155577. The last method returns the points at which the spline crosses the x-axis, i.e. the roots of the spline function. The method is called <code>.roots()</code> (it is only available for cubic splines, k = 3); its application is shown in the following lines.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">#finding the roots of the spline function
print(spline.roots())
</pre>
<p>The output of this last line is an array containing the values of the points for which the spline crosses the x-axis, namely: </p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">[1.21877130e-03 3.90089909e-01 9.40446113e-01 1.82311679e+00 2.26648393e+00 3.59588983e+00 3.99603385e+00 4.84430942e+00 6.04000192e+00 6.29857365e+00 7.33532448e+00 9.54966590e+00]</pre>
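<p>A quick way to gain confidence in <code>.roots()</code> is to apply it to data whose zeros are known in advance. The following is a minimal sketch, assuming test data sampled from sin(x) between 0.5 and 10 (not the article's data), where the zeros are the first three multiples of π.</p>

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Hypothetical test data: sin(x) crosses zero at pi, 2*pi and 3*pi here
x_demo = np.linspace(0.5, 10, 40)
y_demo = np.sin(x_demo)

# roots() requires a cubic spline (k=3)
demo_spline = UnivariateSpline(x_demo, y_demo, k=3, s=0)

roots = demo_spline.roots()
expected = np.pi * np.arange(1, 4)

print("roots found:   ", roots)
print("expected zeros:", expected)
```

<p>Evaluating the spline at the returned positions, for example with <code>demo_spline(roots)</code>, should give values very close to zero.</p>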
<h2>Scipy.LSQUnivariateSpline</h2>
<p>In the last part of this article, we introduce <em>.LSQUnivariateSpline()</em>, another function that can be used for spline generation. From a practical point of view, it works similarly to <em>.UnivariateSpline()</em>, indeed as we will see, there are very few differences in how we call and define it in our script. </p>
<p>The fundamental difference between this function and the previous one is that <em>.LSQUnivariateSpline()</em> generates spline curves by directly controlling the number and the position of the knots. </p>
<p>This means that we have <strong><em>full control of the knots</em></strong> that will define the spline; in the previous case, by contrast, the number of knots was indirectly regulated through the choice of the smoothing factor. In order to appreciate how our spline changes when the number of knots increases, we start by defining two different arrays, “t” and “t1”, where “t1” is the denser array.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">#LSQUnivariateSpline
t = np.array([0.5, 1, 2.5])
t1 = np.linspace(1, 9, 20)
</pre>
<p>The function <em>.LSQUnivariateSpline()</em> accepts as mandatory inputs the x and y arrays and the array “t”, which contains the coordinates of the interior knots that will define our spline. An important condition to keep in mind is that the coordinates of the knots must be located within the range of the x array. </p>
<p>In our case, we will use the same x and y arrays employed for the previous case. At this point we have to call the function twice, in order to show the difference between the two knot arrays. In addition, we specify the parameter “k”, which again refers to the degree of the polynomials that describe the spline.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">LSQUspline = LSQUnivariateSpline(x, y, t1, k = 4)
LSQUspline1 = LSQUnivariateSpline(x, y, t, k = 4)
</pre>
<p>Our final task is to plot the two splines, together with the original data points. We will generate the arrays containing the y values of the two splines directly in the plotting command.</p>
<pre class="EnlighterJSRAW" data-enlighter-language="generic" data-enlighter-theme="" data-enlighter-highlight="" data-enlighter-linenumbers="" data-enlighter-lineoffset="" data-enlighter-title="" data-enlighter-group="">#Plotting
plt.scatter(x, y, s=8)
plt.plot(x_spline, LSQUspline(x_spline), color = 'b')
plt.plot(x_spline, LSQUspline1(x_spline), color = 'g')
plt.show()
</pre>
<p>The final result is displayed in Figure 3 (blue curve for “t1”, green curve for “t”); as can be seen, by increasing the number of knots, the spline function better approximates our data points. Each spline changes from one polynomial piece to the next exactly at the knots specified in the “t1” and “t” arrays, respectively. Most of the methods previously shown for <em>.UnivariateSpline()</em> work on this function too (for additional documentation please refer to <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.LSQUnivariateSpline.html">https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.LSQUnivariateSpline.html</a> ).</p>
<div class="wp-block-image">
<figure class="aligncenter size-large"><img loading="lazy" width="346" height="251" src="https://blog.finxter.com/wp-content/uploads/2020/12/image-27.png" alt="" class="wp-image-18403" srcset="https://blog.finxter.com/wp-content/uploads/2020/12/image-27.png 346w, https://blog.finxter.com/wp-content/uplo...00x218.png 300w, https://blog.finxter.com/wp-content/uplo...50x109.png 150w" sizes="(max-width: 346px) 100vw, 346px" /><figcaption><strong>Figure 3: </strong>Representation of the two splines defined through the function LSQUnivariateSpline. Both splines join their polynomial pieces at the previously specified knots.</figcaption></figure>
</div>
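<p>The influence of the knot array can also be checked numerically rather than only visually. The following is a minimal sketch on hypothetical noisy test data (not the article's data); the two knot arrays are chosen for this example so that the denser one contains every knot of the sparser one, which guarantees that the denser fit cannot have a larger residual.</p>

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

# Hypothetical noisy test data (seeded so that the run is reproducible)
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 10, 30)
y_demo = np.sin(x_demo) + 0.1 * rng.standard_normal(30)

# Interior knots; the dense array contains every knot of the sparse one
t_sparse = np.array([2.5, 5.0, 7.5])
t_dense = np.linspace(1.25, 8.75, 7)

sparse_spline = LSQUnivariateSpline(x_demo, y_demo, t_sparse, k=3)
dense_spline = LSQUnivariateSpline(x_demo, y_demo, t_dense, k=3)

# get_knots() returns the boundary knots plus the interior knots we supplied
print("knots used:", sparse_spline.get_knots())
print("residual, 3 interior knots:", sparse_spline.get_residual())
print("residual, 7 interior knots:", dense_spline.get_residual())
```

<p>Unlike the smoothing-factor approach, the knots reported by <code>.get_knots()</code> are exactly the ones we passed in, together with the first and last x values.</p>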
<h2>Conclusion</h2>
<p>To conclude, in this article, we explored spline functions, their power and versatility. </p>
<p>One thing that is important to keep in mind is that when we use splines for fitting and interpolating a given set of data points, we should never use a polynomial degree higher than necessary; this avoids unwanted errors and incorrect interpretation of the initial data. </p>
<p>The process has to be accurately refined, possibly through repetitive iterations to double check the validity of the generated output.</p>
<p>The post <a href="https://blog.finxter.com/fitting-data-with-scipys-univariatespline-and-lsqunivariatespline/" target="_blank" rel="noopener noreferrer">Fitting Data With Scipy’s UnivariateSpline() and LSQUnivariateSpline()</a> first appeared on <a href="https://blog.finxter.com/" target="_blank" rel="noopener noreferrer">Finxter</a>.</p>
</div>

