
Python Class To Get Descriptive Statistics

I'm going to create a Python class whose objects expose the main descriptive statistics of a sample of n observations. It is an easy exercise for seeing how classes, objects, methods and attributes work in Python.

First of all, we need the formulas for all the descriptive statistics that we are going to code in Python. The main idea of this exercise is to compute them without relying on any pre-existing Python modules; external libraries appear later only to generate test data and to draw a plot.

What Are We Going To Calculate?

Let's compute the following basic descriptive statistics: the mean, standard deviation, variance, median, kurtosis, skewness, and coefficient of variation. The formulas for those descriptive statistics are as follows:

Mean:

\[\bar{x} = \frac{\sum_{i=1}^{n} x_{i}}{n}\]

Standard Deviation:

\[\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_{i} - \bar{x})^2}{n-1}}\]

Variance:

\[\sigma^2 = \frac{\sum_{i=1}^{n} (x_{i} - \bar{x})^2}{n-1}\]

Kurtosis:

\[k = \frac{\sum_{i=1}^{n} (x_{i} - \bar{x})^4}{n \, \sigma^4}\]

With this definition, a normal distribution has a kurtosis of about 3.

Skewness:

\[Sk = \frac{\sum_{i=1}^{n} (x_{i} - \bar{x})^3}{n \, \sigma^3}\]

Coefficient of Variation:

\[Cv = \frac{\sigma}{|\bar{x}|}\]
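
Median:

The list above gives no formula for the median. Assuming the usual definition, which is also what the `median` method further down implements, and writing \(x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}\) for the sorted observations (a notation introduced here, not used elsewhere in this article), it can be written as:

\[\tilde{x} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\ \frac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right) & \text{if } n \text{ is even} \end{cases}\]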

Now that we have the formulas of what we want to calculate, we simply have to set up the class and write one method per statistic. We could also store the statistics as attributes of the object, which is what we try further on.

# Note: the name "statistics" shadows the standard library module of the
# same name if both are used in the same session
class statistics:

  def __init__(self, lst):
    self.lst = lst

  def mean(self):
    return sum(self.lst) / len(self.lst)

  def stand_dev(self):
    # Sample standard deviation (divides by n - 1)
    m = self.mean()  # compute the mean once, not once per element
    return (sum((i - m)**2 for i in self.lst) / (len(self.lst) - 1))**0.5

  def variance(self):
    return self.stand_dev()**2

  def median(self):
    s = sorted(self.lst)  # sort once
    mid = len(s) // 2
    # Odd n: the middle value; even n: the average of the two middle values
    return s[mid] if len(s) % 2 != 0 else (s[mid - 1] + s[mid]) / 2

  def kurtosis(self):
    m = self.mean()
    return sum((i - m)**4 for i in self.lst) / (len(self.lst) * self.stand_dev()**4)

  def Skewness(self):
    m = self.mean()
    return sum((i - m)**3 for i in self.lst) / (len(self.lst) * self.stand_dev()**3)

  def coeff_variation(self):
    return self.stand_dev() / abs(self.mean())
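
Before testing on random data, it can help to check the formulas on a small list whose results are easy to verify by hand. The sketch below is only a sanity check, not part of the exercise: it recomputes the mean, variance, standard deviation and median with the same hand-written formulas and compares them with Python's standard library `statistics` module. The data values and the alias `stdlib_stats` are illustrative choices made here; the alias avoids a clash with the class name used in this article.

```python
import statistics as stdlib_stats  # standard library module, aliased to avoid the name clash

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values, not from the article
n = len(data)

# Hand-written formulas, matching the ones listed earlier
mean = sum(data) / n
variance = sum((x - mean) ** 2 for x in data) / (n - 1)  # sample variance (n - 1)
stand_dev = variance ** 0.5
ordered = sorted(data)
median = ordered[n // 2] if n % 2 else (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# Pair each hand-computed value with the standard library reference
checks = {
    "mean": (mean, stdlib_stats.mean(data)),
    "variance": (variance, stdlib_stats.variance(data)),
    "stand_dev": (stand_dev, stdlib_stats.stdev(data)),
    "median": (median, stdlib_stats.median(data)),
}

for name, (ours, reference) in checks.items():
    match = abs(ours - reference) < 1e-9
    print(f"{name}: ours={ours:.6f} reference={reference:.6f} match={match}")
```

The standard library's `variance` and `stdev` also divide by n - 1, so they are the right reference for the formulas used here. It has no kurtosis or skewness functions, so those two are left out of the comparison.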

Let’s Test The Class

That's it: our class has a method for each statistic, so let's try it! To test the class we use numpy to draw a sample of n = 1000 observations from a normal distribution with mean 10 and standard deviation 2.5, so the sample statistics should come out close to those values.

import numpy as np
# Let's create the array
lst = np.random.normal(10,2.5,1000)

my_object = statistics(lst) # Now let's create the object and use the methods

print(f'Mean = {my_object.mean()}')
print(f'Standard Deviation = {my_object.stand_dev()}')
print(f'Variance = {my_object.variance()}')
print(f'Median = {my_object.median()}')
print(f'Kurtosis = {my_object.kurtosis()}')
print(f'Skewness = {my_object.Skewness()}')
print(f'Coefficient of Variation = {my_object.coeff_variation()}')
Mean = 10.04895481625352
Standard Deviation = 2.477276689127902
Variance = 6.1368997944965
Median = 9.998811286033765
Kurtosis = 3.1259615742183056
Skewness = -0.025865929977696902
Coefficient of Variation = 0.24652083071575473
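
The numbers printed above change on every run, because the sample is drawn without a fixed seed. Here is a minimal sketch of a reproducible draw, assuming the standard library's `random` module rather than numpy. The helper `draw_sample` and the seed value 42 are hypothetical choices made for illustration. With numpy, the analogous call would be `np.random.default_rng(42).normal(10, 2.5, 1000)`.

```python
import random

SEED = 42  # arbitrary choice; any fixed integer works


def draw_sample(seed, n=1000, mu=10, sigma=2.5):
    """Return n draws from a normal distribution using a seeded generator."""
    rng = random.Random(seed)  # independent generator with its own state
    return [rng.gauss(mu, sigma) for _ in range(n)]


first = draw_sample(SEED)
second = draw_sample(SEED)

# Same hand-written formulas as in the class
mean = sum(first) / len(first)
stand_dev = (sum((x - mean) ** 2 for x in first) / (len(first) - 1)) ** 0.5

print(first == second)  # True: the same seed gives the same sample
print(f"Mean = {mean:.3f}, Standard Deviation = {stand_dev:.3f}")
```

The exact values depend on the seed and on the generator, so they will not match the output shown above, but they stay the same from one run to the next.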

Now that we know some descriptive statistics of this sample, we can also visualize it with a box plot.

import seaborn as sns 
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(7,3))
plt.suptitle('BoxPlot Sample')
sns.boxplot(x=lst,palette='rainbow')
plt.axvline(my_object.mean(),label='Mean',linestyle='--')
plt.legend()
plt.show()
[Figure: box plot of the sample, titled "BoxPlot Sample", with the mean marked by a dashed vertical line]

Methods or Attributes…

Now we are going to build the same class with a twist: each descriptive statistic is stored as an attribute of the object, computed once when the object is created, instead of being exposed as a method.

class statistics:

  def __init__(self, lst):
    self.lst = lst
    n = len(lst)
    s = sorted(lst)  # sort once for the median
    # Let's create more attributes: each statistic is computed once, here
    self.mean = sum(lst) / n
    self.stand_dev = (sum((i - self.mean)**2 for i in lst) / (n - 1))**0.5
    self.variance = self.stand_dev**2
    self.median = s[n//2] if n % 2 != 0 else (s[n//2 - 1] + s[n//2]) / 2
    self.kurtosis = sum((i - self.mean)**4 for i in lst) / (n * self.stand_dev**4)
    self.skewness = sum((i - self.mean)**3 for i in lst) / (n * self.stand_dev**3)
    self.coeff_variation = self.stand_dev / abs(self.mean)

  # Now let's create a summarize method 
  def summarize(self):
    print(f'Mean = {self.mean}')
    print(f'Standard Deviation = {self.stand_dev}')
    print(f'Variance = {self.variance}')
    print(f'Median = {self.median}')
    print(f'Kurtosis = {self.kurtosis}')
    print(f'Skewness = {self.skewness}')
    print(f'Coefficient of Variation = {self.coeff_variation}')
  
my_object = statistics(lst)
my_object.summarize()
Mean = 10.04895481625352
Standard Deviation = 2.477276689127902
Variance = 6.1368997944965
Median = 9.998811286033765
Kurtosis = 3.1259615742183056
Skewness = -0.025865929977696902
Coefficient of Variation = 0.24652083071575473
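
One consequence of the attribute version is that every value is fixed when the object is created: if the list changes afterwards, the stored statistics no longer match it. A possible middle ground, sketched here as an assumption rather than as part of the original exercise, is Python's built-in `property` decorator, which lets a statistic be read like an attribute while still being computed on access. The class name `LazyStatistics` is hypothetical and only two statistics are shown.

```python
class LazyStatistics:
    """Hypothetical variant: statistics read like attributes but are computed on access."""

    def __init__(self, lst):
        self.lst = list(lst)  # keep our own copy of the data

    @property
    def mean(self):
        return sum(self.lst) / len(self.lst)

    @property
    def stand_dev(self):
        m = self.mean  # no parentheses: a property is read like an attribute
        return (sum((x - m) ** 2 for x in self.lst) / (len(self.lst) - 1)) ** 0.5


sample = LazyStatistics([1, 2, 3, 4, 5])
print(sample.mean)       # 3.0
sample.lst.append(30)    # change the data after the object was created
print(sample.mean)       # 7.5, recomputed from the current data
```

The trade-off is the same as with plain methods: the value is recalculated on every access, which is convenient for small samples but wasteful if the same statistic is read many times on a large one.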

Some Conclusions

Understanding the difference between methods and attributes is important if you want to start building more complex classes and packages. A method recomputes its result on every call, so it always reflects the current data; an attribute is computed once and is cheap to read afterwards, but it goes stale if the data changes. I invite you to think up and create Python classes for anything that comes to mind.