Say we would like to create a histogram to see how petal length varies in iris flowers. We can do this with the sns.distplot command.
We customize the behavior of the command with two additional pieces of information:
a= chooses the column we'd like to plot (in this case, we chose 'Petal Length (cm)').
kde=False is something we'll always provide when creating a histogram; leaving it out makes sns.distplot draw a KDE curve on top of the histogram bars.
The next type of plot is a kernel density estimate (KDE) plot. In case you're not familiar with KDE plots, you can think of one as a smoothed histogram.
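To make the "smoothed histogram" idea concrete, here is a minimal hand-rolled sketch of a Gaussian kernel density estimate. It places a small bell curve on every observation and averages them. The helper name gaussian_kde, the bandwidth value, and the sample values are all illustrative assumptions. Seaborn chooses its bandwidth automatically, so this is not a reproduction of what sns.kdeplot computes.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    n = len(samples)
    # Normalising constant so the resulting curve integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, centred on that observation.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Placeholder petal lengths (made-up values, for illustration only).
petal_lengths = [1.4, 1.5, 1.3, 4.5, 4.7, 5.1]
density = gaussian_kde(petal_lengths, bandwidth=0.5)
```

Evaluating density on a grid of x values and plotting the result gives the smooth curve. A larger bandwidth gives a smoother curve, a smaller one follows the individual points more closely.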
To make a KDE plot, we use the sns.kdeplot command. Setting shade=True colors the area below the curve, and data= plays the same role that a= did when we made the histogram above: it chooses the column to plot.
We're not restricted to a single column when creating a KDE plot. We can create a two-dimensional (2D) KDE plot with the sns.jointplot command.
In the plot below, the color-coding shows us how likely we are to see different combinations of sepal width and petal length, where darker parts of the figure are more likely.
For the next part of the tutorial, we'll create plots to understand differences between the species. To accomplish this, we begin by breaking the dataset into three separate files, with one for each species.
In this case, the legend does not automatically appear on the plot. To force it to show (for any plot type), we can always use plt.legend().