Numerical × Numerical Data
Categorical Lines and Markers
Line plots and scatter plots use numerical values for both the x and y axes. In this case, the groups are distinguished by properties such as color and marker symbol.
from whitecanvas import new_canvas
# sample data
df = {
"label": ["A"] * 5 + ["B"] * 5,
"x": [0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
"y": [3, 1, 2, 4, 3, 5, 3, 3, 1, 2],
"some-info": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"],
}
By setting color= to one of the column names, the lines are split by that column and a different color is used for each group.
canvas = new_canvas("matplotlib")
canvas.cat(df, "x", "y").add_line(color="label")
canvas.show()
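Conceptually, splitting by a column amounts to grouping the rows by the values of that column and drawing one line per group. The following is a minimal pure-Python sketch of that grouping step only. The helper split_by is hypothetical and is not part of whitecanvas, whose internal implementation may differ.

```python
from collections import defaultdict

# Same sample data as above (only the columns needed here).
df = {
    "label": ["A"] * 5 + ["B"] * 5,
    "x": [0, 1, 2, 3, 4, 0, 1, 2, 3, 4],
    "y": [3, 1, 2, 4, 3, 5, 3, 3, 1, 2],
}


def split_by(data, by, columns):
    """Group the rows of a dict-of-lists table by the values in column `by`.

    Hypothetical helper for illustration; not a whitecanvas function.
    """
    groups = defaultdict(lambda: {c: [] for c in columns})
    for i, key in enumerate(data[by]):
        for c in columns:
            groups[key][c].append(data[c][i])
    return dict(groups)


groups = split_by(df, "label", ["x", "y"])
for label, xy in groups.items():
    # Each group would be drawn as its own line with its own color.
    print(label, xy["x"], xy["y"])
```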
By setting style=, different line styles are used instead. In the following example, color="black" means that all the lines are drawn in the same color (black).
canvas = new_canvas("matplotlib")
canvas.cat(df, "x", "y").add_line(color="black", style="label")
canvas.show()
In the case of markers, you can use symbols to distinguish groups.
canvas = new_canvas("matplotlib")
canvas.cat(df, "x", "y").add_markers(symbol="label")
canvas.show()
The layers implement hover texts by default, based on the input data frame.
canvas = new_canvas("plotly", size=(400, 300))
canvas.cat(df, "x", "y").add_markers(color="label")
canvas.show()
Group-wise linear regression can be added with the with_reg method.
canvas = new_canvas("matplotlib")
(
canvas.cat(df, "x", "y")
.add_markers(color="label")
.with_reg()
)
canvas.show()
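To make "group-wise" concrete, the sketch below fits one straight line per label using NumPy alone. It assumes an ordinary least-squares fit via np.polyfit. The estimator that with_reg actually uses is not stated here, so treat this as an illustration of the idea rather than a reproduction of the method.

```python
import numpy as np

# Same sample data as above, as NumPy arrays.
label = np.array(["A"] * 5 + ["B"] * 5)
x = np.array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4], dtype=float)
y = np.array([3, 1, 2, 4, 3, 5, 3, 3, 1, 2], dtype=float)

fits = {}
for name in np.unique(label):
    mask = label == name
    # Degree-1 least-squares fit: y ~ slope * x + intercept (assumed estimator).
    slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
    fits[str(name)] = (float(slope), float(intercept))

for name, (slope, intercept) in fits.items():
    print(f"{name}: y = {slope:.2f} * x + {intercept:.2f}")
```

Each group gets its own slope and intercept, which is why the regression lines for "A" and "B" can point in different directions.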
Automatic Creation of Legends
As mentioned in Legend for the Layers, legends can be created automatically with the add_legend function. In the case of a categorical plot, the legend is created based on the categories.
canvas = new_canvas("matplotlib")
canvas.cat(df, "x", "y").add_line(color="label")
canvas.add_legend()
canvas.show()
Distribution of Numerical Data
There are several ways to visualize the distribution of numerical data.
- Histogram
- Kernel Density Estimation (KDE)
These representations use only one array of numerical data. Therefore, either x or y should be left empty in the cat method.
import numpy as np
rng = np.random.default_rng(12345)
# sample data
steps = np.array([0] * 60 + [3] * 30 + [6] * 40)
df = {
"label": ["A"] * 60 + ["B"] * 30 + ["C"] * 40,
"X": rng.normal(loc=0.0, size=130) + steps,
"Y": rng.normal(loc=1.0, size=130) + steps / 2,
}
x="X" means that the x-axis is "X" and the y-axis is the count. Arguments are forwarded to the histogram function of numpy.
canvas = new_canvas("matplotlib")
canvas.cat(df, x="X").add_hist(bins=10)
canvas.show()
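Since the arguments are passed on to NumPy's histogram function, it can help to see what bins=10 produces there. The sketch below calls np.histogram directly on the same "X" column. It shows only the binning, not how whitecanvas draws the result.

```python
import numpy as np

rng = np.random.default_rng(12345)
steps = np.array([0] * 60 + [3] * 30 + [6] * 40)
X = rng.normal(loc=0.0, size=130) + steps

# bins=10 requests ten equal-width bins spanning the data range.
counts, edges = np.histogram(X, bins=10)

print(counts)      # number of samples falling in each bin
print(len(edges))  # ten bins are delimited by eleven edges
```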
To transpose the histogram, use y="X".
canvas = new_canvas("matplotlib")
canvas.cat(df, y="X").add_hist(bins=10)
canvas.show()
Histograms can be grouped by color.
canvas = new_canvas("matplotlib")
canvas.cat(df, x="X").add_hist(bins=10, color="label")
canvas.show()
If both x and y are set, the plotter cannot determine which axis to use. To tell the plotter which axis to use, call along_x() or along_y() to restrict the dimension.
canvas = new_canvas("matplotlib")
# canvas.cat(df, x="label", y="X").add_hist(bins=10) # This will raise an error
canvas.cat(df, x="label", y="X").along_y().add_hist(bins=10)
canvas.show()
KDE can be similarly added.
canvas = new_canvas("matplotlib")
canvas.cat(df, x="X").add_kde(color="label")
canvas.show()
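For readers unfamiliar with KDE, the following is a minimal NumPy sketch of a one-dimensional Gaussian kernel density estimate. The function gaussian_kde_1d and the bandwidth rule are assumptions chosen for illustration. The kernel and bandwidth that add_kde uses are not stated here.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian KDE of `samples` at the points in `grid`.

    Hypothetical helper for illustration; not a whitecanvas function.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance between every grid point and every sample.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    # Average of the kernels, rescaled so that the estimate integrates to 1.
    return kernels.sum(axis=1) / (samples.size * bandwidth)


rng = np.random.default_rng(12345)
X = rng.normal(loc=0.0, size=60)

# Rule-of-thumb bandwidth (assumed): sample standard deviation times n^(-1/5).
bw = X.std(ddof=1) * X.size ** (-1 / 5)
grid = np.linspace(X.min() - 5 * bw, X.max() + 5 * bw, 512)
density = gaussian_kde_1d(X, grid, bw)

# Riemann-sum check that the estimated density has unit area.
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

Grouping by color="label" then corresponds to computing one such curve per label.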
A 2-dimensional histogram can be added with add_hist2d.
canvas = new_canvas("matplotlib")
canvas.cat(df, x="X", y="Y").add_hist2d(bins=(8, 10), color="label")
canvas.show()
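The meaning of a two-element bins argument can be illustrated with NumPy's np.histogram2d, where the first number applies to the x data and the second to the y data. This sketch assumes add_hist2d follows the same (x, y) ordering, which is not confirmed here.

```python
import numpy as np

rng = np.random.default_rng(12345)
steps = np.array([0] * 60 + [3] * 30 + [6] * 40)
X = rng.normal(loc=0.0, size=130) + steps
Y = rng.normal(loc=1.0, size=130) + steps / 2

# 8 bins along X and 10 bins along Y.
counts, x_edges, y_edges = np.histogram2d(X, Y, bins=(8, 10))

print(counts.shape)               # one row per X bin, one column per Y bin
print(len(x_edges), len(y_edges))
```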
Similarly, a 2-dimensional KDE can be added with add_kde2d.
canvas = new_canvas("matplotlib")
canvas.cat(df, x="X", y="Y").add_kde2d(color="label")
canvas.show()
Note
add_hist and add_hist2d return completely different objects (a histogram and a heatmap), and they are configured by different arguments. That is why whitecanvas splits them into two separate methods.