By Parul Pandey, Data Science at H2O.ai | Editor @wicds.

Infographic vector created by macrovector - www.freepik.com.

Data exploration is by far one of the most important aspects of any data analysis task. The initial probing and preliminary checks that we perform, using an extensive catalog of visualization tools, give us actionable insights into the nature of the data. However, choosing a visualization tool is sometimes more complicated than the task itself. On the one hand, there are libraries that are easy to use but not very helpful for showing complex relationships in data. On the other hand, there are libraries that provide interactivity but come with a considerable learning curve. Fortunately, some open-source libraries have been created that try to address this pain point effectively.

This article looks at two such libraries: pandas_bokeh and cufflinks. We will learn to create plotly and bokeh charts with the basic pandas plotting syntax that we are all comfortable with. Since the article emphasizes syntax rather than plot types, we limit ourselves to five basic charts: line charts, bar charts, histograms, scatter plots, and pie charts. We first create each of these charts with the pandas plotting library and then recreate them in bokeh and plotly, one after the other.

Import the data set

We will work with the NIFTY-50 dataset. The NIFTY 50 index is the National Stock Exchange of India's benchmark index for the Indian equity market. The dataset is openly available on Kaggle, but we use a subset of the data that contains the stock values of only four sectors: banking, pharmaceuticals, IT, and FMCG.

You can download the sample data set from here.

Import the necessary libraries and datasets needed for visualization:

# Importing required modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Reading in the data
nifty_data = pd.read_csv('NIFTY_data_2020.csv',parse_dates=["Date"],index_col="Date")
nifty_data.head()
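If you don't have the CSV at hand, a minimal sketch like the one below builds a hypothetical stand-in DataFrame with the same layout: a DatetimeIndex named "Date" and one column per sector index. The values are random walks, not real NIFTY data. Only 'NIFTY FMCG index' and 'NIFTY Bank index' are column names used later in this article; the pharma and IT column names and the function make_stand_in_data are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def make_stand_in_data(start="2020-01-01", end="2020-07-31", seed=0):
    """Build a hypothetical NIFTY-like DataFrame of random-walk index values.

    Only 'NIFTY FMCG index' and 'NIFTY Bank index' come from the article;
    the other two column names are assumptions for illustration.
    """
    dates = pd.bdate_range(start, end, name="Date")  # business days only
    columns = ["NIFTY Bank index", "NIFTY Pharma index",
               "NIFTY IT index", "NIFTY FMCG index"]
    rng = np.random.default_rng(seed)
    # Arbitrary starting levels, then compounded small daily percentage moves
    start_levels = np.array([30000.0, 9000.0, 15000.0, 30000.0])
    steps = rng.normal(loc=0.0, scale=1.0, size=(len(dates), len(columns)))
    values = start_levels * (1 + 0.01 * steps).cumprod(axis=0)
    return pd.DataFrame(values, index=dates, columns=columns)

nifty_data = make_stand_in_data()
print(nifty_data.head())
```

Because it spans January to July, resampling this stand-in by month yields seven rows, matching the seven month labels used for the pie charts later on; the charts themselves will of course look different from the real data.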


Combined DataFrame consisting of NIFTY indices for the banking, pharmaceutical, IT, and FMCG sectors.

We can also resample (aggregate) the data by month-end. The pandas library has a resample() function that resamples time series data.

nifty_data_resample = nifty_data.resample(rule="M").mean()
nifty_data_resample
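To see what resample(rule="M").mean() does, here is a minimal sketch on a tiny hand-made series: rows are grouped into calendar-month buckets, each labelled with its month-end date, and averaged. One assumption to flag: pandas 2.2 deprecated the "M" alias in favour of "ME", while older versions only understand "M", so the sketch tries "ME" first and falls back.

```python
import pandas as pd

# Four made-up daily observations: two in January, two in February
daily = pd.Series(
    [10.0, 20.0, 30.0, 50.0],
    index=pd.to_datetime(["2020-01-02", "2020-01-15", "2020-02-03", "2020-02-20"]),
)

try:
    monthly = daily.resample("ME").mean()  # month-end alias in pandas >= 2.2
except ValueError:
    monthly = daily.resample("M").mean()   # older pandas versions

print(monthly)
```

January averages to 15.0 and February to 40.0, each stamped with the last day of its month (2020-01-31 and 2020-02-29).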


Now that our data frames are ready, it’s time to visualize them through different plots.

Plotting with pandas directly

Let's start with the most straightforward plotting technique: the pandas plotting functions. To draw a chart using pandas, we call the .plot() method on a DataFrame.

Syntax: dataframe.plot()

The plot method is just a simple wrapper around matplotlib's plt.plot(). We can also specify some additional parameters, such as those listed below:

Some of the important Parameters
--------------------------------

x : label or position, default None
    Only used if data is a DataFrame.
y : label, position or list of label, positions, default None
    Allows plotting of one column versus another. Only used if data is a DataFrame.
title : str or list
    Title to use for the plot.
xlabel, ylabel : label, optional
    Name to use for the label on the x-axis and y-axis.
figsize : a tuple (width, height) in inches
    Size of the figure object.
kind : str
    The kind of plot to produce:

    - 'line' : line plot (default)
    - 'bar' : vertical bar plot
    - 'barh' : horizontal bar plot
    - 'hist' : histogram
    - 'box' : boxplot
    - 'kde' : Kernel Density Estimation plot
    - 'density' : same as 'kde'
    - 'area' : area plot
    - 'pie' : pie plot
    - 'scatter' : scatter plot
    - 'hexbin' : hexbin plot.


For a complete list of parameters and their usage, please refer to the documentation. Let's now look at ways to create different plots. In this article, we won't explain each plot in detail; we focus only on the syntax, which is self-explanatory if you have some experience with pandas. For a detailed understanding of pandas plots, this article is helpful.
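Because plot() dispatches on kind and returns a matplotlib Axes, the kind="..." form and the accessor form (df.plot.line(), df.plot.bar(), and so on) are interchangeable, and the returned Axes can be inspected or adjusted further. A minimal sketch, assuming a small made-up DataFrame and matplotlib's non-interactive Agg backend so that it also runs outside a notebook:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values, for illustration only
df = pd.DataFrame(
    {"A": [1, 3, 2, 4], "B": [2, 2, 3, 5]},
    index=pd.date_range("2020-01-01", periods=4, freq="D", name="Date"),
)

# Two spellings of the same line chart
ax1 = df.plot(kind="line", title="Via kind=", ylabel="Values", figsize=(6, 4))
ax2 = df.plot.line(title="Via accessor", ylabel="Values", figsize=(6, 4))

# Each call returns an Axes holding one line per column
print(ax1.get_title(), ax1.get_ylabel(), len(ax1.get_lines()))
plt.close("all")
```

The xlabel and ylabel keywords require pandas 1.1 or later; on older versions, call ax.set_ylabel() on the returned Axes instead.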

Line chart
nifty_data.plot(title="Nifty Index values in 2020",
                ylabel="Values",  # value axis; the x-axis shows the Date index
                figsize=(10,6));



Line chart with pandas plotting.

Scatter plot
nifty_data.plot(kind='scatter',
        x='NIFTY FMCG index', 
        y='NIFTY Bank index',
        title="Scatter Plot for NIFTY Index values in 2020",
        figsize=(10,6));


Scatter plot with pandas plotting.

Histograms
nifty_data[['NIFTY FMCG index','NIFTY Bank index']].plot(kind='hist',figsize=(9,6), bins=30);


Histogram with pandas plotting.
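The bins=30 argument splits the range of the plotted values into 30 equal-width intervals and counts how many observations fall into each. pandas hands the drawing to matplotlib, whose hist relies on NumPy's histogram; the sketch below calls numpy.histogram directly on made-up data to make the binning concrete.

```python
import numpy as np

rng = np.random.default_rng(42)
values = rng.normal(loc=100.0, scale=5.0, size=1_000)  # made-up data

counts, edges = np.histogram(values, bins=30)

widths = np.diff(edges)
print(len(counts), len(edges))            # 30 bins are bounded by 31 edges
print(np.allclose(widths, widths[0]))     # every bin has the same width
print(counts.sum() == values.size)        # every value lands in exactly one bin
```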

Bar plots
nifty_data_resample.plot(kind='bar',figsize=(10,6));


Bar plot with pandas plotting.

nifty_data_resample.plot(kind='barh', stacked=True, figsize=(10,6));


Stacked bar plot with pandas plotting.
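With stacked=True, each month's segments are laid end to end, so the total length of a bar equals the row sum across the sector columns; without it, pandas draws the segments side by side. A minimal sketch that checks this on made-up monthly values, using the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly means for two sectors
monthly = pd.DataFrame(
    {"Bank": [30.0, 28.0, 20.0], "FMCG": [31.0, 30.0, 29.0]},
    index=["Jan", "Feb", "Mar"],
)

ax = monthly.plot(kind="barh", stacked=True, figsize=(6, 4))

# pandas adds one rectangle per (column, row), column by column, so the
# last len(monthly) rectangles belong to the final column; their right
# edges mark each stacked bar's total length
patches = list(ax.patches)
last_segments = patches[-len(monthly):]
right_edges = [p.get_x() + p.get_width() for p in last_segments]
print(right_edges, monthly.sum(axis=1).tolist())
plt.close("all")
```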

Pie charts
nifty_data_resample.index=['Jan','Feb','March','Apr','May','June','July']
nifty_data_resample['NIFTY Bank index'].plot.pie(legend=False, figsize=(10,6),autopct="%.1f");


Pie chart with pandas plotting.
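Hard-coding the month names works for this seven-month subset, but the labels can also be derived from the DatetimeIndex that resample() returns, which keeps them in sync with the data. A small sketch with made-up values; month_name() returns English names by default, and .str[:3] abbreviates them:

```python
import pandas as pd

# Month-end timestamps, as a monthly resample produces (values are made up)
monthly = pd.DataFrame(
    {"NIFTY Bank index": [31000.0, 30500.0, 24000.0]},
    index=pd.DatetimeIndex(["2020-01-31", "2020-02-29", "2020-03-31"]),
)

labelled = monthly.copy()
labelled.index = labelled.index.month_name().str[:3]  # 'January' -> 'Jan'
print(list(labelled.index))
```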

These were some of the charts that can be created directly with pandas DataFrames. However, these charts lack interactivity and features such as zooming and panning. Let's now convert these existing charts into their fully interactive counterparts with only a small change in syntax.

Bokeh backend for pandas: plotting with Pandas-Bokeh.

Image by the author.

The Bokeh library clearly stands out when it comes to data visualization. Pandas-Bokeh provides a Bokeh plotting backend for Pandas, GeoPandas, and PySpark DataFrames. This backend adds a plot_bokeh() method to DataFrames and Series.

Installation

Pandas-Bokeh can be installed from PyPI via pip or conda:

pip install pandas-bokeh

or

conda install -c patrikhlobil pandas-bokeh


Usage

The Pandas-Bokeh library should be imported after Pandas, GeoPandas, and/or PySpark.

import pandas as pd
import pandas_bokeh


You then need to define the plotting output, which can be one of the following:

pandas_bokeh.output_notebook() # for embedding plots in Jupyter Notebooks.
pandas_bokeh.output_file(filename) # for exporting plots as HTML.


Syntax

The plotting API is now available on a Pandas DataFrame as dataframe.plot_bokeh().

For more details about the plotting outputs, see the reference here or the Bokeh documentation. Let's now recreate all five plots from the previous section, using the same datasets as above.

import pandas as pd
import pandas_bokeh
pandas_bokeh.output_notebook()


Line chart
nifty_data.plot_bokeh(kind='line') #equivalent to nifty_data.plot_bokeh.line()


Line chart with pandas_bokeh.

Scatter plot
nifty_data.plot_bokeh.scatter(x='NIFTY FMCG index', y='NIFTY Bank index');


Scatter plot with pandas_bokeh.

Histograms
nifty_data[['NIFTY FMCG index','NIFTY Bank index']].plot_bokeh(kind='hist', bins=30);


Histogram with pandas_bokeh.

Bar plots
# pandas_bokeh reads figsize as (width, height) in pixels, not inches
nifty_data_resample.plot_bokeh(kind='bar', figsize=(900, 500));


Bar plot with pandas_bokeh.

nifty_data_resample.plot_bokeh(kind='barh',stacked=True);


Stacked bar plot with pandas_bokeh.

Pie charts
nifty_data_resample.index=['Jan','Feb','March','Apr','May','June','July']
nifty_data_resample.plot_bokeh.pie(y ='NIFTY Bank index')


Pie chart with pandas_bokeh.

In addition, you can create multiple nested pie charts within the same plot:

nifty_data_resample.plot_bokeh.pie()


Nested pie chart with pandas_bokeh.

In this section, we saw how to create bokeh plots seamlessly, without any significant change to the pandas plotting syntax. We now have the best of both worlds without having to learn a new format.

Plotly backend for pandas: plotting with Cufflinks.

Image by the author.

Another commonly used library for data visualization is Plotly. With Plotly, you can create interactive charts in Python, R, and JavaScript. As of version 4.8, Plotly released a Plotly Express-powered backend for pandas plotting, which means you don't even need to import plotly to create plotly-like visualizations.
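The Plotly Express-powered backend is switched on through pandas' global plotting.backend option (available since pandas 0.25) rather than through a new method. A hedged sketch with made-up data: it assumes plotly 4.8 or later for the backend part and simply skips that part if plotly is not installed.

```python
import importlib.util
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 2], "B": [2, 2, 4]})  # made-up values

# The plotting backend is a global pandas option; the default is matplotlib
print(pd.get_option("plotting.backend"))

# Switch temporarily, only if plotly is importable (needs plotly >= 4.8)
if importlib.util.find_spec("plotly") is not None:
    with pd.option_context("plotting.backend", "plotly"):
        fig = df.plot(kind="line")  # a plotly Figure, not a matplotlib Axes
        print(type(fig).__name__)
```

Using option_context restores the previous backend on exit, so the rest of a notebook keeps its matplotlib charts.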

However, the library I want to mention here is not Plotly Express but an independent third-party wrapper library around Plotly called Cufflinks. The beauty of cufflinks is that it is more versatile, has more functionality, and has an API similar to pandas plotting. This means you only need to call the .iplot() method on a pandas DataFrame to plot charts.

Installation

Make sure you have installed plotly before installing cufflinks. Read these instructions.

Usage

The repository has lots of useful examples and notebooks to get you started.

import pandas as pd
import cufflinks as cf
from IPython.display import display,HTML
#making all charts public and setting a global theme
cf.set_config_file(sharing='public',theme="white",offline=True)


That's all. We can now create visualizations with the power of plotly but with the ease of pandas. The only change in syntax is dataframe.iplot().

Line chart
nifty_data.iplot(kind='line')


Line chart with cufflinks.

Scatter plot

You need to specify the plotting mode for the scatter trace when creating a scatter plot. The mode can be lines, markers, text, or a combination of these.

nifty_data.iplot(kind='scatter',x='NIFTY FMCG index', y='NIFTY Bank index',mode="markers");


Scatter plot with cufflinks.

Histograms
nifty_data[['NIFTY FMCG index','NIFTY Bank index']].iplot(kind='hist', bins=30);


Histogram with cufflinks.

Bar plots
nifty_data_resample.iplot(kind='bar');


Bar plot with cufflinks.

nifty_data_resample.iplot(kind='barh',barmode="stack");


Stacked bar plot with cufflinks.

Pie charts
nifty_data_resample.index=['Jan','Feb','March','Apr','May','June','July']
nifty_data_resample.reset_index().iplot(kind='pie',labels="index",values="NIFTY Bank index")


Pie chart with cufflinks.
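The labels="index" argument in the cufflinks pie call works because reset_index() moves the unnamed month index into an ordinary column, which pandas names "index" by default; the pie then takes slice labels from that column and slice sizes from the values column. A sketch of the reshaping step alone (no cufflinks required), with made-up values:

```python
import pandas as pd

# Made-up monthly values with a plain (unnamed) string index
monthly = pd.DataFrame(
    {"NIFTY Bank index": [31000.0, 30500.0, 24000.0]},
    index=["Jan", "Feb", "March"],
)

flat = monthly.reset_index()   # the index becomes a column called "index"
print(flat.columns.tolist())   # ['index', 'NIFTY Bank index']
print(flat["index"].tolist())  # ['Jan', 'Feb', 'March']
```

If the index had a name (for example "Month"), reset_index() would use that name instead, and labels would need to change accordingly.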

The Cufflinks library provides an easy way to get the power of plotly within pandas plotting. The similarity in syntax is another advantage.

Conclusion

A Bokeh or Plotly plot is self-sufficient in conveying all the information. Depending on your choice and preference, you can use either or both; the primary goal is to make visualizations more intuitive and interactive at the same time.

Original. Re-posted with permission.
