Matplotlib.pyplot
Matplotlib is considered by many to be the most basic plotting library in Python. The plotting functions in pandas are built on top of Matplotlib, making it fundamental for data scientists programming in Python.
Install the Matplotlib library with either conda or pip:
conda install matplotlib
pip install matplotlib
import matplotlib.pyplot as plt
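A quick way to confirm that the installation and import worked is to print the installed version. This is a minimal check rather than part of the original notes; `matplotlib.__version__` is the package's standard version attribute.

```python
import matplotlib
import matplotlib.pyplot as plt

# Print the installed version to confirm the installation succeeded.
print(matplotlib.__version__)
```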
The examples below use the faithful dataset, which has two columns:
'eruptions': eruption time (in mins)
'waiting': waiting time to next eruption (in mins)
import pandas as pd
faithful = pd.read_csv('https://stat430.hknguyen.org/files/datasets/faithful.csv')
faithful.head()
| | eruptions | waiting |
|---|---|---|
| 0 | 3.600 | 79 |
| 1 | 1.800 | 54 |
| 2 | 3.333 | 74 |
| 3 | 2.283 | 62 |
| 4 | 4.533 | 85 |
# histogram of waiting time
plt.hist(faithful['waiting'])
plt.show()
plt.hist(faithful['waiting'], color='darkorange')
plt.show()
Use plt.xlabel() to add an x-axis label:
plt.hist(faithful['waiting'], color='darkorange')
plt.xlabel('Waiting time to the next eruption (in mins)')
plt.show()
Use plt.ylabel() to add a y-axis label:
plt.hist(faithful['waiting'], color='darkorange')
plt.xlabel('Waiting time to the next eruption (in mins)')
plt.ylabel('Count')
plt.show()
# more bins give a finer-grained histogram
plt.hist(faithful['waiting'], color='darkorange', bins=50)
plt.show()
# fewer bins give a coarser histogram
plt.hist(faithful['waiting'], color='darkorange', bins=5)
plt.show()
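To see exactly what the bins keyword changes, it can help to look at the values plt.hist() returns. The sketch below uses made-up numbers as a stand-in for faithful['waiting'] (the variable `waiting` and its values are assumptions for illustration, not the real data).

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up stand-in for faithful['waiting'] (NOT the real data).
rng = np.random.default_rng(0)
waiting = rng.normal(loc=70, scale=13, size=200)

# plt.hist() returns the bar heights, the bin edges, and the bar artists.
counts, edges, patches = plt.hist(waiting, color='darkorange', bins=5)

print(len(counts))        # 5 bars, one per bin
print(len(edges))         # 6 bin edges, one more than the number of bins
print(int(counts.sum()))  # 200, every observation falls in some bin
```

Add plt.show() at the end to display the figure when running this as a script.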
plt.hist(faithful['waiting'], color='darkorange')
plt.xlabel('Waiting time to the next eruption (in mins)')
plt.ylabel('Count')
plt.grid(color='lightgrey', linewidth=0.5)
plt.show()
Use plt.boxplot() to plot a boxplot:
# Boxplot of waiting time
plt.boxplot(faithful['waiting'])
plt.show()
plt.boxplot(faithful['waiting'], vert=False)
plt.show()
# Note: Matplotlib 3.9 and later prefer tick_labels= over labels= for this keyword
plt.boxplot(faithful['waiting'], labels=['waiting'])
plt.ylabel('Time (in mins)')
plt.grid(color='lightgrey', linewidth=0.5)
plt.show()
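A boxplot summarizes a column by its median and quartiles. As a rough illustration of that link, the sketch below reads the median back from the dictionary of line artists that plt.boxplot() returns and compares it with NumPy's values. The data are made up (an assumed stand-in for faithful['waiting']).

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up stand-in for faithful['waiting'] (NOT the real data).
rng = np.random.default_rng(1)
waiting = rng.normal(loc=70, scale=13, size=200)

# plt.boxplot() returns a dict of the line artists it drew.
bp = plt.boxplot(waiting)

# The median line is horizontal, so both of its y-values equal the median.
median_on_plot = bp['medians'][0].get_ydata()[0]

# The box itself spans the first to the third quartile.
q1, q3 = np.percentile(waiting, [25, 75])

print(median_on_plot, np.median(waiting))
print(q1, q3)
```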
The iris dataset has five columns:
'Sepal.Length': the sepal length in cm.
'Sepal.Width': the sepal width in cm.
'Petal.Length': the petal length in cm.
'Petal.Width': the petal width in cm.
'Species': the specific Iris species ('setosa', 'versicolor', 'virginica').
iris = pd.read_csv('https://stat430.hknguyen.org/files/datasets/iris.csv')
iris
| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
150 rows × 5 columns
plt.boxplot([iris['Sepal.Length'], iris['Sepal.Width'], iris['Petal.Length'], iris['Petal.Width']])
plt.grid(color='lightgrey', linewidth=0.5)
plt.show()
plt.boxplot([iris['Sepal.Length'], iris['Sepal.Width'], iris['Petal.Length'], iris['Petal.Width']],
            labels=['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width'])
plt.ylabel('Measurement (in cm)')
plt.grid(color='lightgrey', linewidth=0.5)
plt.show()
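Typing every column by hand gets long quickly. One possible shortcut, sketched below, builds the list of columns with a comprehension and sets the tick labels with plt.xticks(). The DataFrame `iris_like` holds made-up numbers under the iris column names and is an assumption for illustration only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up measurements using the iris column names (NOT the real data).
rng = np.random.default_rng(2)
cols = ['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width']
iris_like = pd.DataFrame(rng.uniform(0.1, 8.0, size=(30, 4)), columns=cols)

# Build the list of columns with a comprehension instead of typing each one.
bp = plt.boxplot([iris_like[c] for c in cols])

# Boxes sit at positions 1, 2, 3, ...; label each one after its column.
plt.xticks(range(1, len(cols) + 1), [c.replace('.', ' ') for c in cols])
plt.ylabel('Measurement (in cm)')
plt.grid(color='lightgrey', linewidth=0.5)
```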
The boxplot() function provided by pandas, covered later, improves this syntax significantly.
Scatter plots can be drawn in Matplotlib using plt.scatter():
plt.scatter(x=iris['Sepal.Length'], y=iris['Sepal.Width'])
plt.show()
plt.scatter(x=iris['Sepal.Length'], y=iris['Sepal.Width'], color='darkorange')
plt.xlabel('Sepal Length (in cm)')
plt.ylabel('Sepal Width (in cm)')
plt.title('Iris Sepal Length vs. Sepal Width')
plt.grid(color='lightgrey', linewidth=0.5)
plt.show()
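Because the iris data include a 'Species' column, a natural next step is to color the points by species. The sketch below is one way to do that, calling plt.scatter() once per group and adding a legend. The DataFrame `iris_like` contains made-up values and is an assumption for illustration, not the real dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data with the same column names as iris (NOT the real data).
rng = np.random.default_rng(3)
iris_like = pd.DataFrame({
    'Sepal.Length': rng.uniform(4.0, 8.0, size=60),
    'Sepal.Width': rng.uniform(2.0, 4.5, size=60),
    'Species': np.repeat(['setosa', 'versicolor', 'virginica'], 20),
})

# One plt.scatter() call per species; each call gets the next default color.
for species, group in iris_like.groupby('Species'):
    plt.scatter(x=group['Sepal.Length'], y=group['Sepal.Width'], label=species)

plt.xlabel('Sepal Length (in cm)')
plt.ylabel('Sepal Width (in cm)')
legend = plt.legend(title='Species')
```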
Matplotlib Inside pandas
The plotting functions in the pandas library are built upon the functions provided by the Matplotlib library.
faithful['waiting'].plot.hist()
<AxesSubplot:ylabel='Frequency'>
faithful['waiting'].hist()
<AxesSubplot:>
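The `<AxesSubplot:...>` lines are the notebook's display of the object each plotting call returns, which is a Matplotlib Axes. The exact text depends on the installed Matplotlib version. The sketch below checks this on a made-up Series (an assumed stand-in for faithful['waiting']).

```python
import numpy as np
import pandas as pd
import matplotlib.axes

# Made-up stand-in for faithful['waiting'] (NOT the real data).
rng = np.random.default_rng(4)
waiting = pd.Series(rng.normal(70, 13, size=200), name='waiting')

# The pandas plotting call hands back the Matplotlib Axes it drew on.
ax = waiting.plot.hist()

print(isinstance(ax, matplotlib.axes.Axes))  # True
print(ax.get_ylabel())                       # Frequency
```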
The pandas plotting functions are extremely useful when we want to layer plots (histograms in this case):
iris.plot.hist(alpha=0.5)
<AxesSubplot:ylabel='Frequency'>
Use plot.box() or boxplot() to plot boxplot(s) of column(s) of a DataFrame:
iris.plot.box()
<AxesSubplot:>
iris.boxplot()
<AxesSubplot:>
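The pandas boxplot() method also accepts `column` and `by` keywords, which draw one box per group for a chosen column. The sketch below groups a made-up 'Sepal.Length' column by 'Species'; the DataFrame `iris_like` is an assumption for illustration and does not contain the real measurements.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data with the same column names as iris (NOT the real data).
rng = np.random.default_rng(5)
iris_like = pd.DataFrame({
    'Sepal.Length': rng.uniform(4.0, 8.0, size=60),
    'Species': np.repeat(['setosa', 'versicolor', 'virginica'], 20),
})

# One box per species for the chosen column.
iris_like.boxplot(column='Sepal.Length', by='Species')
plt.ylabel('Sepal Length (in cm)')
```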
# Scatterplot
iris.plot.scatter(x='Sepal.Length', y='Sepal.Width')
<AxesSubplot:xlabel='Sepal.Length', ylabel='Sepal.Width'>
We can also call plot() and set the kind keyword to 'scatter' for scatter plots:
iris.plot(x='Sepal.Length', y='Sepal.Width', kind='scatter')
<AxesSubplot:xlabel='Sepal.Length', ylabel='Sepal.Width'>
To customize a pandas plot further, import Matplotlib.pyplot and use the functions provided by Pyplot:
iris.plot.scatter(x='Sepal.Length', y='Sepal.Width', color='darkorange')
plt.xlabel('Sepal Length (in cm)')
plt.ylabel('Sepal Width (in cm)')
plt.title('Iris Sepal Length vs. Sepal Width')
plt.grid(color='lightgrey', linewidth=0.5)
plt.show()
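An alternative to the Pyplot functions is to call the equivalent methods on the Axes object that the pandas plotting call returns. The sketch below shows that variant on made-up data; the DataFrame `iris_like` is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data with the same column names as iris (NOT the real data).
rng = np.random.default_rng(6)
iris_like = pd.DataFrame({
    'Sepal.Length': rng.uniform(4.0, 8.0, size=60),
    'Sepal.Width': rng.uniform(2.0, 4.5, size=60),
})

# Keep the returned Axes and customize it directly.
ax = iris_like.plot.scatter(x='Sepal.Length', y='Sepal.Width', color='darkorange')
ax.set_xlabel('Sepal Length (in cm)')   # counterpart of plt.xlabel()
ax.set_ylabel('Sepal Width (in cm)')    # counterpart of plt.ylabel()
ax.set_title('Iris Sepal Length vs. Sepal Width')
ax.grid(color='lightgrey', linewidth=0.5)
```

Holding on to the Axes avoids relying on which figure is currently active, which can matter once several plots are open at the same time.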