How To Create Visualisations Using Matplotlib
A Comprehensive Guide to Mastering Data Visualisation with Matplotlib
Data visualisation is a very important step in the data analysis cycle because it presents the findings in the data and tells the story behind them. It is especially useful for communicating findings and solutions about a dataset to non-technical stakeholders and other audiences.
Let’s find out how to use Matplotlib, which is a Python library, to tell stories and visualise data.
Introduction
Matplotlib is a Python library used by data analysts and scientists for visualising data. It is popular among data analysts because it is easy to use, open-source, and has a very large and vibrant community. Matplotlib can also be used with other libraries, such as Seaborn, a visualisation library that was built upon Matplotlib.
Overview of Matplotlib
Matplotlib was created by John Hunter in 2002. It is a Python library built on NumPy arrays, and it works across all major platforms (Windows, macOS, and Linux).
Installation
The Matplotlib library can be installed with pip by running pip install matplotlib in a terminal.
After installation, the library has to be imported before it can be used, as displayed below:
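The plt.* calls used throughout this guide rely on the pyplot module being imported under the conventional alias plt:

```python
# pyplot provides the plotting functions used in this guide;
# "plt" is the conventional alias for it.
import matplotlib.pyplot as plt
```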
Creating Basic Plots with Matplotlib
Various kinds of plots can be created using the Matplotlib library; we are going to start with the basic and commonly used plots before moving on to more advanced plots.
Bar plot
A bar plot, also known as a bar chart, is a commonly used graph that represents the values of categorical data. In this example, the bar plot is built from dummy data for a fruit-selling store and displays the number of fruits sold in a particular season.
The plt.bar function is used to create the bar plot, with the fruit names on the x-axis and the number of fruits sold on the y-axis. plt.show() should be called after creating a Matplotlib graph so that the graph is displayed; this is required when running a script, although notebook environments often render plots automatically. Below is a simple bar plot created using Matplotlib.
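A minimal sketch of such a bar plot follows. The fruit names and sales figures are made up for illustration and stand in for the store's dummy data:

```python
import matplotlib.pyplot as plt

# Hypothetical data: fruit names and number of fruits sold in one season
fruits = ["Apple", "Banana", "Orange", "Mango", "Pineapple"]
units_sold = [120, 95, 80, 60, 45]

# Fruits on the x-axis, number sold on the y-axis
bars = plt.bar(fruits, units_sold)
plt.show()
```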
Histogram
A histogram is also a commonly used graph that displays the frequency of continuous data. It is also used to illustrate the distribution of data. A histogram can be created using Matplotlib; an example is shown below.
The plt.hist function is used to create the histogram. It takes the data to plot and the number of bins, which is chosen at the discretion of the user. Below is a simple histogram created using the Matplotlib library.
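As an illustration, the sketch below plots a histogram of randomly generated values. The data, the seed, and the choice of 20 bins are assumptions made for this example:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: 500 values drawn from a normal distribution
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=50, scale=10, size=500)

# bins sets how many intervals the value range is divided into
counts, bin_edges, patches = plt.hist(values, bins=20)
plt.show()
```

plt.hist also returns the count in each bin and the bin edges, which can be handy for checking the plot against the numbers.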
Line plots
A line plot is a graph that is used to display data over a specific period. A line plot makes use of dots connected by lines to display data. A line plot that shows the fluctuation in the stock price of a company in the first half of the year is shown below.
The plt.plot function is used to create the line plot. The marker argument, set to 'o' here, controls the shape drawn at each data point and can be changed to whatever shape the user desires.
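A minimal sketch of such a line plot is shown here. The monthly prices are invented for illustration:

```python
import matplotlib.pyplot as plt

# Hypothetical data: month-end stock prices for the first half of the year
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
prices = [150, 162, 155, 171, 168, 180]

# marker="o" draws a circle at each data point
lines = plt.plot(months, prices, marker="o")
plt.show()
```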
Scatterplot
The scatterplot is a graph used to illustrate the relationships between two numerical variables. In a scatterplot, it is common to see the formation of patterns if there is a relationship between the two variables that are being plotted. A simple scatterplot created with the Matplotlib library is shown below.
The plt.scatter function is used to create the simple scatterplot, which is displayed below.
There is no recognisable pattern in the scatterplot above, which suggests that the two variables are not clearly related; a scatterplot does not necessarily need to have a pattern.
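One way to produce a simple scatterplot is sketched here, assuming two sets of independently generated random values (the data and seed are hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: two unrelated sets of random values
rng = np.random.default_rng(seed=1)
x = rng.uniform(0, 100, size=50)
y = rng.uniform(0, 100, size=50)

points = plt.scatter(x, y)
plt.show()
```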
Plot Customisation
The simple plots created earlier all use Matplotlib's default colours and are not labelled or titled, so in this segment some of them will be customised and labelled.
Customising Plot Appearance
The Matplotlib library gives its users the freedom to customise their plots as they see fit, including colours, patterns, gridlines, and the shapes used to denote individual points. The scatterplot and bar plot created earlier will be used as examples for this segment.
The scatterplot is customised as shown below:
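The sketch below applies the settings described in this segment (s, c, marker, alpha, and gridlines) to hypothetical random data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: two unrelated sets of random values
rng = np.random.default_rng(seed=1)
x = rng.uniform(0, 100, size=50)
y = rng.uniform(0, 100, size=50)

# s: point size, c: colour, marker: point shape, alpha: transparency
points = plt.scatter(x, y, s=70, c="r", marker="*", alpha=0.5)
plt.grid(True)  # add gridlines
plt.show()
```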
In the code above, the size of each data point is increased (s = 70), the colour of each data point is changed (c = 'r'), the shape of each data point is changed (marker = '*'), and the transparency of the data points is adjusted (alpha = 0.5). Finally, gridlines are added to the scatterplot. The customised scatterplot is displayed below:
The bar plot is customised as shown below:
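A sketch of this customisation follows. The fruit names and sales figures are hypothetical, and writing the values with plt.text is one of several possible ways to label the bars:

```python
import matplotlib.pyplot as plt

# Hypothetical data: fruit names and number of fruits sold in one season
fruits = ["Apple", "Banana", "Orange", "Mango", "Pineapple"]
units_sold = [120, 95, 80, 60, 45]

# One colour per bar
colours = ['green', 'orange', '#adff2f', '#ffae42', '#ebe900']
bars = plt.bar(fruits, units_sold, color=colours)

# Rotate the x-axis tick labels
plt.xticks(rotation=60)

# Write each bar's value just above the bar, centred horizontally
for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width() / 2, height,
             str(int(height)), ha="center", va="bottom")

plt.show()
```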
In the code above, a different colour is defined for each bar (colours = ['green', 'orange', '#adff2f', '#ffae42', '#ebe900']) and applied to the bar plot. The x-axis tick labels are rotated (plt.xticks(rotation = 60)), and the value of each bar is written on top of the bar.
Adding Labels and Titles
In this segment, the plots customised earlier will be properly labelled and titled.
The scatterplot is labelled as shown below:
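A sketch of the labelled scatterplot is given here. The label text, title, font sizes, and colours are assumptions chosen for illustration, as is the random data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: two unrelated sets of random values
rng = np.random.default_rng(seed=1)
x = rng.uniform(0, 100, size=50)
y = rng.uniform(0, 100, size=50)

plt.scatter(x, y, s=70, c="r", marker="*", alpha=0.5)
plt.grid(True)

# Axis labels and title, each with some text styling
plt.xlabel("Variable X", fontsize=12, color="blue")
plt.ylabel("Variable Y", fontsize=12, color="blue")
plt.title("Scatterplot of Y against X", fontsize=14, fontweight="bold")
plt.show()
```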
The x-axis is labelled and customised using plt.xlabel, the y-axis is labelled and customised using plt.ylabel, and the graph itself is given a customised title using plt.title.
The properly labelled scatterplot is displayed below.
The bar plot is labelled as shown below:
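The bar plot can be labelled in the same way. In this sketch, the data, label text, and title are hypothetical:

```python
import matplotlib.pyplot as plt

# Hypothetical data: fruit names and number of fruits sold in one season
fruits = ["Apple", "Banana", "Orange", "Mango", "Pineapple"]
units_sold = [120, 95, 80, 60, 45]
colours = ['green', 'orange', '#adff2f', '#ffae42', '#ebe900']

plt.bar(fruits, units_sold, color=colours)
plt.xticks(rotation=60)

# Axis labels and title
plt.xlabel("Fruit")
plt.ylabel("Number sold")
plt.title("Fruits sold in the season")
plt.show()
```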
The x-axis is labelled using plt.xlabel, the y-axis is labelled using plt.ylabel, and the graph itself is titled using plt.title.
Advanced Plots with Matplotlib
Apart from the basic plots discussed earlier in this article, the Matplotlib library can also be used to create more complex plots, such as 3D plots and multiplots.
3D plots
The Matplotlib library can be used to create various kinds of 3D plots, such as 3D scatterplots, surface plots, volumetric plots, and wireframe plots.
3D scatterplot
The 3D scatterplot is a graph that uses Cartesian coordinates to illustrate the relationship among three numerical variables. A 3D version of the simple scatterplot created earlier can be built with the same Matplotlib library.
The 3D scatterplot is created as shown below:
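A minimal sketch, assuming the same kind of random data as before plus a hypothetical third variable z, and assuming a reasonably recent Matplotlib release where projection="3d" needs no extra import:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: three unrelated sets of random values
rng = np.random.default_rng(seed=1)
x = rng.uniform(0, 100, size=50)
y = rng.uniform(0, 100, size=50)
z = rng.uniform(0, 100, size=50)

# projection="3d" creates three-dimensional axes
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(x, y, z)
plt.show()
```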
Multiplot
A multiplot combines more than one graph, of the same or different types, into a single figure. The graphs can be plotted from the same values or from different values. A multiplot is useful for showing the difference between one set of values and another, or for telling different stories about the same values with different graphs.
A Multiplot created using the Matplotlib library is shown below.
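One possible sketch uses plt.subplots to place a bar plot and a line plot of the same values side by side. The data, figure size, and panel titles are assumptions made for this example:

```python
import matplotlib.pyplot as plt

# Hypothetical data: fruit names and number of fruits sold in one season
fruits = ["Apple", "Banana", "Orange", "Mango", "Pineapple"]
units_sold = [120, 95, 80, 60, 45]

# One row, two columns of axes in a single figure
fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(10, 4))

# Same values, two different representations
ax_bar.bar(fruits, units_sold)
ax_bar.set_title("Bar plot")

ax_line.plot(fruits, units_sold, marker="o")
ax_line.set_title("Line plot")

plt.tight_layout()  # keep the two panels from overlapping
plt.show()
```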
The Multiplot created above uses the same values to represent the data in different forms.
Conclusion
In this article, we learned how to create visually appealing visualisations using the Matplotlib library. Various graphs were created, ranging from simple graphs like the bar plot to advanced graphs such as the 3D scatterplot, and these graphs can also be customised to your needs.