Welcome to altair_recipes’s documentation!
Introduction to altair_recipes
A collection of ready-made statistical graphics for vega.
vega is a statistical graphics system for the web, meaning the plots are displayed in a browser. As an added bonus, it adds interactions, again through web technologies: select data points, reveal information on hover, etc. Interaction and the web are clearly the future of statistical graphics. Even the successor to the famous ggplot for R, ggvis, is based on vega.
altair is a Python package that produces vega graphics. Like vega, it adopts an approach to describing statistical graphics known as the grammar of graphics, which underlies other well-known packages such as ggplot for R. It represents an extremely useful compromise of power and flexibility. Its elements are data, marks (points, lines), encodings (relations between data and marks), scales, etc.
Sometimes we want to skip all of that and just produce a boxplot (or heatmap or histogram, the argument is the same) by calling:
boxplot(data.iris(), columns="petalLength", group_by="species")
because:
- It’s a well known type of statistical graphics that everyone can recognize and understand on the fly.
- Creativity is nice, in statistical graphics as in many other endeavors, but dangerous: there are more bad charts out there than good ones. The grammar of graphics is no insurance.
- While it’s simple to put together a boxplot in altair, it isn’t trivial: there are rectangles, vertical lines, horizontal lines (whiskers), points (outliers). Each element is related to a different statistic of the data. It’s about 30 lines of code and, unless you run them, it’s hard to tell you are looking at a boxplot.
- One doesn’t always need the control that the grammar of graphics affords. There are times when I need to see a boxplot as quickly as possible. At other times, for instance when preparing a publication, I need to control every detail.
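To make concrete why a boxplot is a composite rather than a single mark, here is a minimal sketch, in plain Python with only the standard library, of the statistics that each graphical element needs. The helper name boxplot_stats and the 1.5 × IQR whisker rule (Tukey's convention) are assumptions made for illustration; they are not taken from altair or altair_recipes.

```python
from statistics import quantiles


def boxplot_stats(values, whisker=1.5):
    """Return the statistics behind each graphical element of a boxplot.

    Hypothetical helper for illustration; uses Tukey's 1.5 * IQR rule.
    """
    data = sorted(values)
    # The box (rectangle) spans the first to the third quartile;
    # the line inside it marks the median.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "box": (q1, q3),
        "median": median,
        # Whiskers end at the most extreme points within the fences.
        "whiskers": (min(inside), max(inside)),
        # Points beyond the fences are drawn individually as outliers.
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

Four different summaries feed four different kinds of marks, which is the bookkeeping a ready-made boxplot function hides.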
The boxplot is not the only example. The scatterplot, the quantile-quantile plot, the heatmap are important idioms that are battle tested in data analysis practice. They deserve their own abstraction. Other packages offering an abstraction above the grammar level are:
- seaborn and the graphical subset of pandas, for example, both provide high level statistical graphics primitives (higher than the grammar of graphics) and they are quite successful (but not web-based).
- ggplot, even if named after the Grammar of Graphics, slipped in some more complex charts, pretending they are elements of the grammar, such as geom_boxplot, because sometimes even R developers are lazy. But a boxplot is not a geom or mark. It’s a combination of several of them, certain statistics and so on. I suspect the authors of altair know better than to mix the two levels.
altair_recipes aims to fill this space above altair while making full use of its features. It provides a growing list of “classic” statistical graphics without going down to the grammar level. At the same time it is hoped that, over time, it can become a repository of examples and model best practices for altair, a computable form of its gallery.
There is one more thing. It’s nice to have all these famous chart types available at a stroke of the keyboard, but we still have to decide which type of graphics to use and, in certain cases, the association between variables in the data and channels in the graphics (what becomes coordinate, what becomes color etc.). It is still work and things can still go wrong, sometimes in subtle ways. Enter autoplot. autoplot inspects the data, selects a suitable graphic and generates it. While no claim is made that the result is optimal, it will make reasonable choices and avoid common pitfalls, like overlapping points in scatterplots. While there are interesting research efforts aimed at characterizing the optimal graphics for a given data set, their goal is more ambitious than just selecting from a repertoire of pre-defined graphics types and they are fairly complex. Therefore, at this time autoplot is based on a set of reasonable heuristics derived from decades of experience such as:
- use stripplot and scatterplot to display continuous data, barcharts for discrete data
- use opacity to counter mark overlap, but not with discrete color maps
- switch to summaries (count and averages) when the amount of overlap is too high
- use facets for discrete data
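The heuristics listed above can be pictured as a small decision procedure. The sketch below is not autoplot's actual implementation: the function name choose_chart, the threshold of 1000 rows, and the returned labels are all hypothetical, chosen only to show how column kinds and data size could drive the choice for one or two columns. Opacity and faceting are left out.

```python
def choose_chart(kinds, n_rows, max_marks=1000):
    """Pick a chart type from column kinds ("num" or "cat") and row count.

    Hypothetical illustration of the heuristics; the threshold is made up.
    """
    n_cat = sum(kind == "cat" for kind in kinds)
    n_num = len(kinds) - n_cat
    crowded = n_rows > max_marks  # too much overlap: switch to summaries
    if len(kinds) == 1:
        if n_cat == 1:
            return "barchart"  # discrete data: show counts
        return "histogram" if crowded else "stripplot"
    if len(kinds) == 2:
        if n_num == 2:
            return "count heatmap" if crowded else "scatterplot"
        if n_num == 1:
            return "boxplot" if crowded else "stripplot"
        return "count heatmap"  # two discrete columns
    raise ValueError("this sketch handles one or two columns only")


small = choose_chart(["num", "num"], n_rows=50)
large = choose_chart(["num", "num"], n_rows=5000)
```

The same pair of numerical columns yields a scatterplot for 50 rows and a summary chart for 5000 rows, which mirrors the third heuristic.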
autoplot is a work in progress, and perhaps always will be; feedback is most welcome. A large number of charts generated with it are available at the end of the Examples page and should give a good idea of what it does. In particular, in this first iteration we do not make any attempt to detect if a dataset represents a function or a relation, hence scatterplots are preferred over line plots. Moreover, there is no special support for evenly spaced data, such as a time series.
Features
- Free software: BSD license.
- Fully documented.
- Highly consistent API enforced with autosig.
- Near 100% regression test coverage.
- Support for dataframe and vector inputs.
- Support for both wide and long dataframe formats.
- Data can be provided as a dataframe or as a URL pointing to a csv or json file.
- All charts produced are valid altair charts and can be modified, combined, saved, served, and embedded exactly like any other altair chart.
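The difference between the wide and long formats mentioned in the list above can be shown without any dependency. The sketch below uses plain dictionaries and lists instead of dataframes; wide_to_long is a hypothetical helper written for this illustration and is not the library's own gather function.

```python
def wide_to_long(wide, key="key", value="value"):
    """Turn {column: [values]} into a list of {key: column, value: v} rows."""
    return [
        {key: column, value: v}
        for column, values in wide.items()
        for v in values
    ]


# Wide format: one column per group.
wide = {"Trial A": [0.1, 0.3], "Trial B": [-2.0, -1.5]}
# Long format: one row per observation, with the group stored in "key".
long_rows = wide_to_long(wide)
```

With wide data the groups are selected through columns (for example columns=["Trial A", "Trial B"]); with long data a single value column is combined with group_by (for example columns=["value"], group_by="key"), as in the layered histogram examples.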
Examples
These examples are taken unedited from the test suite. Look at the body of each test to see how altair_recipes can be used.
import altair as alt
import altair_recipes as ar
from altair_recipes.common import viz_reg_test
from altair_recipes.display_pweave import show_test
from vega_datasets import data
Areaplot
@viz_reg_test
def test_areaplot():
    return alt.vconcat(
        *map(
            lambda stack: ar.areaplot(
                data.iowa_electricity(),
                x="year",
                y="net_generation",
                color="source",
                stack=stack,
            ),
            ar.StackType,
        )
    )
show_test(test_areaplot)
import altair_recipes as ar
from altair_recipes.common import viz_reg_test
from altair_recipes.display_pweave import show_test
import numpy as np
import pandas as pd
Autocorrelation
@viz_reg_test
def test_autocorrelation():
    data = pd.DataFrame(dict(x=np.random.uniform(size=100)))
    return ar.autocorrelation(data, column="x", max_lag=15)
show_test(test_autocorrelation)
import altair_recipes as ar
from altair_recipes.common import viz_reg_test
from altair_recipes.display_pweave import show_test
from vega_datasets import data
Barchart
@viz_reg_test
def test_barchart_color():
    source = data.barley()
    return ar.barchart(source, x="year", y="mean(yield)", color=True)
show_test(test_barchart_color)
import altair_recipes as ar
from altair_recipes.common import viz_reg_test
from altair_recipes.display_pweave import show_test
from vega_datasets import data
Boxplot from melted data
@viz_reg_test
def test_boxplot_melted():
    return ar.boxplot(data.iris(), columns=["petalLength"], group_by="species")
show_test(test_boxplot_melted)
Boxplot from cast data
@viz_reg_test
def test_boxplot_cast():
    iris = data.iris()
    return ar.boxplot(iris, columns=list(iris.columns[:-1]))
show_test(test_boxplot_cast)
Boxplot with color
@viz_reg_test
def test_boxplot_color():
    source = data.barley()
    return ar.boxplot(
        source,
        columns=["yield"],
        group_by="year",
        color=True,
        width=800 // len(source["site"].unique()),
    ).facet(column="site")
show_test(test_boxplot_color)
import altair_recipes as ar
from altair_recipes.common import viz_reg_test
from altair_recipes.display_pweave import show_test
import numpy as np
import pandas as pd
from vega_datasets import data
Heatmap
@viz_reg_test
def test_heatmap():
    # Compute x^2 + y^2 across a 2D grid
    x, y = np.meshgrid(range(-5, 6), range(-5, 6))
    z = x ** 2 + y ** 2
    # Convert this grid to columnar data expected by Altair
    data = pd.DataFrame({"x": x.ravel(), "y": y.ravel(), "z": z.ravel()})
    return ar.heatmap(data, x="x", y="y", color="z")
show_test(test_heatmap)
Count Heatmap
@viz_reg_test
def test_count_heatmap():
    source = data.movies.url
    return ar.heatmap(
        source, x="IMDB Rating", y="Rotten Tomatoes Rating", color="", aggregate="count"
    )
show_test(test_count_heatmap)
import altair_recipes as ar
from altair_recipes.common import viz_reg_test, gather
from altair_recipes.display_pweave import show_test
import numpy as np
import pandas as pd
from vega_datasets import data
Histogram
@viz_reg_test
def test_histogram():
    return ar.histogram(data.movies(), column="IMDB Rating")
show_test(test_histogram)
Layered Histogram from wide data
@viz_reg_test
def test_layered_histogram_wide():
    df = pd.DataFrame(
        {
            "Trial A": np.random.normal(0, 0.8, 1000),
            "Trial B": np.random.normal(-2, 1, 1000),
            "Trial C": np.random.normal(3, 2, 1000),
        }
    )
    return ar.layered_histogram(df, columns=["Trial A", "Trial B", "Trial C"])
show_test(test_layered_histogram_wide)
Layered Histogram from long data
@viz_reg_test
def test_layered_histogram_long():
    data = pd.DataFrame(
        {
            "Trial A": np.random.normal(0, 0.8, 1000),
            "Trial B": np.random.normal(-2, 1, 1000),
            "Trial C": np.random.normal(3, 2, 1000),
        }
    )
    columns = list(data.columns)
    ldata = gather(data, key="key", value="value", columns=columns)
    return ar.layered_histogram(ldata, columns=["value"], group_by="key")
show_test(test_layered_histogram_long)
import altair_recipes as ar
from altair_recipes.common import viz_reg_test
from altair_recipes.display_pweave import show_test
import numpy as np
import pandas as pd
Qqplot
@viz_reg_test
def test_qqplot():
    df = pd.DataFrame(
        {
            "Trial A": np.random.normal(0, 0.8, 1000),
            "Trial B": np.random.normal(-2, 1, 1000),
            "Trial C": np.random.uniform(3, 2, 1000),
        }
    )
    return ar.qqplot(df, x="Trial A", y="Trial C")
show_test(test_qqplot)
import altair_recipes as ar
from altair_recipes.common import viz_reg_test
from altair_recipes.display_pweave import show_test
from hypothesis import given
from hypothesis.extra.pandas import columns, data_frames
from vega_datasets import data
Scatterplot
@viz_reg_test
def test_scatterplot():
    return ar.scatterplot(
        data.iris(),
        x="petalWidth",
        y="petalLength",
        color="sepalWidth",
        tooltip="species",
    )
show_test(test_scatterplot)
Scatterplot alternate data syntax
@viz_reg_test
def test_scatterplot_alternate_data():
    d = data.iris()
    return ar.scatterplot(
        x=d["petalWidth"],
        y=d["petalLength"],
        color=d["sepalWidth"],
        tooltip=d["species"],
    )
show_test(test_scatterplot_alternate_data)
@given(data=data_frames(columns=columns(["a", "b", "c"], dtype=float)))
def test_scatterplot_series(data):
    chart1 = ar.scatterplot(data=data[["a", "c"]])
    chart2 = ar.scatterplot(x=data["a"], y=data["c"])
    assert chart1.to_dict() == chart2.to_dict()
Multiscatterplot at defaults
@viz_reg_test
def test_multiscatterplot_defaults():
    return ar.multiscatterplot(data.iris())
show_test(test_multiscatterplot_defaults)
Multiscatterplot with explicit parameters
@viz_reg_test
def test_multiscatterplot_args():
    """Test multiscatterplot."""
    return ar.multiscatterplot(
        data.iris(), columns=data.iris().columns[:-1], color="species"
    )
show_test(test_multiscatterplot_args)
Multiscatterplot alternate data syntax
@viz_reg_test
def test_multiscatterplot_args_alternate():
    """Test multiscatterplot."""
    d = data.iris()
    return ar.multiscatterplot(
        columns=[d["sepalLength"], d["sepalWidth"], d["petalLength"]],
        color=d["species"],
    )
show_test(test_multiscatterplot_args_alternate)
@given(data=data_frames(columns=columns(["a", "b", "c"], dtype=float)))
def test_multiscatterplot_series(data):
    chart1 = ar.multiscatterplot(data=data)
    chart2 = ar.multiscatterplot(columns=[data["a"], data["b"], data["c"]])
    assert chart1.to_dict() == chart2.to_dict()
import altair_recipes as ar
from altair_recipes.common import viz_reg_test
from altair_recipes.display_pweave import show_test
from vega_datasets import data
Lineplot
@viz_reg_test
def test_lineplot():
    return ar.lineplot(
        data.iowa_electricity(), x="year", y="net_generation", color="source"
    )
show_test(test_lineplot)
import altair_recipes as ar
from altair_recipes.common import viz_reg_test
from altair_recipes.display_pweave import show_test
import numpy as np
import pandas as pd
Stripplot
@viz_reg_test
def test_stripplot():
    x = np.array(range(100)) // 10
    data = pd.DataFrame(dict(x=x, y=np.random.normal(size=len(x))))
    return ar.stripplot(data)
show_test(test_stripplot)
import altair_recipes as ar
import numpy as np
import pandas as pd
from altair_recipes.common import viz_reg_test
from altair_recipes.display_pweave import show_test
Autoplot
Autoplot is very easy to use but can produce a variety of charts that are reasonably appropriate for the data to be displayed. Here is a longish sequence of examples of what autoplot will do with different combinations of up to three categorical or numerical variables and different data sizes.
test_size = 5000
def rand_cat(x, n):
    return (
        pd.Series((x + np.random.normal(size=test_size) * n) + 77)
        .astype(int)
        .apply(chr)
    )
np.random.seed(seed=0)
x = np.random.normal(size=test_size)
y = np.random.normal(size=test_size) + x
z = np.random.normal(size=test_size) + y
data = pd.DataFrame(
    dict(
        x=x,
        x_cat=rand_cat(x, 1),
        y=y,
        y_cat=rand_cat(y, 0.5),
        z=z,
        z_cat=rand_cat(z, 0.5),
    )
)
#
# numvars = ["x", "y", "z"]
# catvars = ["x_cat", "y_cat", "z_cat"]
# n = 0
# for nvars in range(1, 4):
#     for ncatvars in range(0, nvars + 1):
#         vars = catvars[:ncatvars] + numvars[ncatvars:nvars]
#         for nrows in [10, 50, 250, 1000, 5000]:
#             n = n + 1
#             print(
#                 """
# #' <h3> Test autoplot #{n}</h3>
#
# @viz_reg_test
# def test_autoplot_{n}():
#     return ar.autoplot(data.head({nrows}), columns={vars})
#
# show_test(test_autoplot_{n})
# """.format(
#                     nrows=nrows, vars=vars, n=n
#                 )
#             )
Test autoplot #1
@viz_reg_test
def test_autoplot_1():
    return ar.autoplot(data.head(10), columns=["x"])
show_test(test_autoplot_1)
Test autoplot #2
@viz_reg_test
def test_autoplot_2():
    return ar.autoplot(data.head(50), columns=["x"])
show_test(test_autoplot_2)
Test autoplot #3
@viz_reg_test
def test_autoplot_3():
    return ar.autoplot(data.head(250), columns=["x"])
show_test(test_autoplot_3)
Test autoplot #4
@viz_reg_test
def test_autoplot_4():
    return ar.autoplot(data.head(1000), columns=["x"])
show_test(test_autoplot_4)
Test autoplot #5
@viz_reg_test
def test_autoplot_5():
    return ar.autoplot(data.head(5000), columns=["x"])
show_test(test_autoplot_5)
Test autoplot #6
@viz_reg_test
def test_autoplot_6():
    return ar.autoplot(data.head(10), columns=["x_cat"])
show_test(test_autoplot_6)
Test autoplot #7
@viz_reg_test
def test_autoplot_7():
    return ar.autoplot(data.head(50), columns=["x_cat"])
show_test(test_autoplot_7)
Test autoplot #8
@viz_reg_test
def test_autoplot_8():
    return ar.autoplot(data.head(250), columns=["x_cat"])
show_test(test_autoplot_8)
Test autoplot #9
@viz_reg_test
def test_autoplot_9():
    return ar.autoplot(data.head(1000), columns=["x_cat"])
show_test(test_autoplot_9)
Test autoplot #10
@viz_reg_test
def test_autoplot_10():
    return ar.autoplot(data.head(5000), columns=["x_cat"])
show_test(test_autoplot_10)
Test autoplot #11
@viz_reg_test
def test_autoplot_11():
    return ar.autoplot(data.head(10), columns=["x", "y"])
show_test(test_autoplot_11)
Test autoplot #12
@viz_reg_test
def test_autoplot_12():
    return ar.autoplot(data.head(50), columns=["x", "y"])
show_test(test_autoplot_12)
Test autoplot #13
@viz_reg_test
def test_autoplot_13():
    return ar.autoplot(data.head(250), columns=["x", "y"])
show_test(test_autoplot_13)
Test autoplot #14
@viz_reg_test
def test_autoplot_14():
    return ar.autoplot(data.head(1000), columns=["x", "y"])
show_test(test_autoplot_14)
Test autoplot #15
@viz_reg_test
def test_autoplot_15():
    return ar.autoplot(data.head(5000), columns=["x", "y"])
show_test(test_autoplot_15)
Test autoplot #16
@viz_reg_test
def test_autoplot_16():
    return ar.autoplot(data.head(10), columns=["x_cat", "y"])
show_test(test_autoplot_16)
Test autoplot #17
@viz_reg_test
def test_autoplot_17():
    return ar.autoplot(data.head(50), columns=["x_cat", "y"])
show_test(test_autoplot_17)
Test autoplot #18
@viz_reg_test
def test_autoplot_18():
    return ar.autoplot(data.head(250), columns=["x_cat", "y"])
show_test(test_autoplot_18)
Test autoplot #19
@viz_reg_test
def test_autoplot_19():
    return ar.autoplot(data.head(1000), columns=["x_cat", "y"])
show_test(test_autoplot_19)
Test autoplot #20
@viz_reg_test
def test_autoplot_20():
    return ar.autoplot(data.head(5000), columns=["x_cat", "y"])
show_test(test_autoplot_20)
Test autoplot #21
@viz_reg_test
def test_autoplot_21():
    return ar.autoplot(data.head(10), columns=["x_cat", "y_cat"])
show_test(test_autoplot_21)
Test autoplot #22
@viz_reg_test
def test_autoplot_22():
    return ar.autoplot(data.head(50), columns=["x_cat", "y_cat"])
show_test(test_autoplot_22)
Test autoplot #23
@viz_reg_test
def test_autoplot_23():
    return ar.autoplot(data.head(250), columns=["x_cat", "y_cat"])
show_test(test_autoplot_23)
Test autoplot #24
@viz_reg_test
def test_autoplot_24():
    return ar.autoplot(data.head(1000), columns=["x_cat", "y_cat"])
show_test(test_autoplot_24)
Test autoplot #25
@viz_reg_test
def test_autoplot_25():
    return ar.autoplot(data.head(5000), columns=["x_cat", "y_cat"])
show_test(test_autoplot_25)
Test autoplot #26
@viz_reg_test
def test_autoplot_26():
    return ar.autoplot(data.head(10), columns=["x", "y", "z"])
show_test(test_autoplot_26)
Test autoplot #27
@viz_reg_test
def test_autoplot_27():
    return ar.autoplot(data.head(50), columns=["x", "y", "z"])
show_test(test_autoplot_27)
Test autoplot #28
@viz_reg_test
def test_autoplot_28():
    return ar.autoplot(data.head(250), columns=["x", "y", "z"])
show_test(test_autoplot_28)
Test autoplot #29
@viz_reg_test
def test_autoplot_29():
    return ar.autoplot(data.head(1000), columns=["x", "y", "z"])
show_test(test_autoplot_29)
Test autoplot #30
@viz_reg_test
def test_autoplot_30():
    return ar.autoplot(data.head(5000), columns=["x", "y", "z"])
show_test(test_autoplot_30)
Test autoplot #31
@viz_reg_test
def test_autoplot_31():
    return ar.autoplot(data.head(10), columns=["x_cat", "y", "z"])
show_test(test_autoplot_31)
Test autoplot #32
@viz_reg_test
def test_autoplot_32():
    return ar.autoplot(data.head(50), columns=["x_cat", "y", "z"])
show_test(test_autoplot_32)
Test autoplot #33
@viz_reg_test
def test_autoplot_33():
    return ar.autoplot(data.head(250), columns=["x_cat", "y", "z"])
show_test(test_autoplot_33)
Test autoplot #34
@viz_reg_test
def test_autoplot_34():
    return ar.autoplot(data.head(1000), columns=["x_cat", "y", "z"])
show_test(test_autoplot_34)
Test autoplot #35
@viz_reg_test
def test_autoplot_35():
    return ar.autoplot(data.head(5000), columns=["x_cat", "y", "z"])
show_test(test_autoplot_35)
Test autoplot #36
@viz_reg_test
def test_autoplot_36():
    return ar.autoplot(data.head(10), columns=["x_cat", "y_cat", "z"])
show_test(test_autoplot_36)
Test autoplot #37
@viz_reg_test
def test_autoplot_37():
    return ar.autoplot(data.head(50), columns=["x_cat", "y_cat", "z"])
show_test(test_autoplot_37)
Test autoplot #38
@viz_reg_test
def test_autoplot_38():
    return ar.autoplot(data.head(250), columns=["x_cat", "y_cat", "z"])
show_test(test_autoplot_38)
Test autoplot #39
@viz_reg_test
def test_autoplot_39():
    return ar.autoplot(data.head(1000), columns=["x_cat", "y_cat", "z"])
show_test(test_autoplot_39)
Test autoplot #40
@viz_reg_test
def test_autoplot_40():
    return ar.autoplot(data.head(5000), columns=["x_cat", "y_cat", "z"])
show_test(test_autoplot_40)
Test autoplot #41
@viz_reg_test
def test_autoplot_41():
    return ar.autoplot(data.head(10), columns=["x_cat", "y_cat", "z_cat"])
show_test(test_autoplot_41)
Test autoplot #42
@viz_reg_test
def test_autoplot_42():
    return ar.autoplot(data.head(50), columns=["x_cat", "y_cat", "z_cat"])
show_test(test_autoplot_42)
Test autoplot #43
@viz_reg_test
def test_autoplot_43():
    return ar.autoplot(data.head(250), columns=["x_cat", "y_cat", "z_cat"])
show_test(test_autoplot_43)
Test autoplot #44
@viz_reg_test
def test_autoplot_44():
    return ar.autoplot(data.head(1000), columns=["x_cat", "y_cat", "z_cat"])
show_test(test_autoplot_44)
Test autoplot #45
@viz_reg_test
def test_autoplot_45():
    return ar.autoplot(data.head(5000), columns=["x_cat", "y_cat", "z_cat"])
show_test(test_autoplot_45)
w = pd.Series(range(10))
no_overlap_data = pd.DataFrame(
    dict(
        x=pd.concat([w, w - 3]).astype(str),
        y=pd.concat([w, w]).astype(str),
        z=pd.concat([w, w]).astype(str),
    )
)
@viz_reg_test
def test_autoplot_CCC():
    return ar.autoplot(no_overlap_data)
show_test(test_autoplot_CCC)
@viz_reg_test
def test_autoplot_CC():
    return ar.autoplot(no_overlap_data, columns=["x", "y"])
show_test(test_autoplot_CC)
@viz_reg_test
def test_autoplot_C():
    return ar.autoplot(no_overlap_data.head(10), columns=["x"])
show_test(test_autoplot_C)
Installation
Stable release
To install altair_recipes, run this command in your terminal:
$ pip install altair_recipes
This is the preferred method to install altair_recipes, as it will always install the most recent stable release.
If you don’t have pip installed, this Python installation guide can guide you through the process.
From sources
The sources for altair_recipes can be downloaded from the GitHub repo.
You can either clone the public repository:
$ git clone https://github.com/piccolbo/altair_recipes
Or download the tarball:
$ curl -OL https://github.com/piccolbo/altair_recipes/tarball/master
Once you have a copy of the source, you can install it with:
$ python setup.py install
altair_recipes
altair_recipes package
Module contents
Top-level package for altair_recipes.
altair_recipes.areaplot(data=None, x=0, y=1, color=None, stack=<StackType.auto: None>, height=600, width=800)
Generate an areaplot.
Parameters:
- data (altair.Data or pandas.DataFrame or csv or json file URL) – The data from which the statistical graphics is being generated
- x (int, str, pandas Series or a type convertible to it.) – The column containing the data associated with the horizontal dimension
- y (int, str, pandas Series or a type convertible to it.) – The column containing the data associated with the vertical dimension
- color (str or int) – The column containing the data associated with the color of the mark
- stack (StackType) – One of StackType.auto (automatic selection), StackType.true (force), StackType.false (no stacking) and StackType.normalize (for normalized stacked)
- height (int) – The height of the chart
- width (int) – The width of the chart
Returns: An altair Chart.
Return type: altair.Chart or altair.LayerChart
altair_recipes.autocorrelation(data=None, column=0, max_lag=None, height=600, width=800)
Generate an autocorrelation plot.
Parameters:
- data (altair.Data or pandas.DataFrame or csv or json file URL) – The data from which the statistical graphics is being generated
- column (int, str, pandas Series or a type convertible to it.) – The column containing the data to be used in the graphics
- max_lag (int) – Maximum lag to show in the plot, defaults to number of rows in data
- height (int) – The height of the chart
- width (int) – The width of the chart
Returns: An altair Chart.
Return type: altair.Chart or altair.LayerChart
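As a rough illustration of what an autocorrelation plot displays and of the role of max_lag, the sketch below computes the sample autocorrelation of a sequence in plain Python. The helper name sample_acf, the normalization by the lag-0 sum of squares, and the default of n - 1 lags are assumptions made for this example; the formula used internally by altair_recipes is not documented here.

```python
from statistics import fmean


def sample_acf(values, max_lag=None):
    """Sample autocorrelation for lags 0..max_lag (hypothetical helper)."""
    n = len(values)
    if max_lag is None:
        max_lag = n - 1  # assumption: use every lag that has data
    mean = fmean(values)
    centered = [v - mean for v in values]
    # Lag-0 sum of squares; zero for a constant series, which this
    # sketch does not handle.
    total = sum(c * c for c in centered)
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / total
        for lag in range(max_lag + 1)
    ]


acf = sample_acf([1, 2, 3, 4, 5], max_lag=2)
```

The value at lag 0 is always 1; the plot shows how quickly the remaining values decay toward zero.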
altair_recipes.autoplot(data=None, columns=None, group_by=None, height=600, width=800)
Automatically choose and produce a statistical graphics based on up to three columns of data.
Parameters:
- data (altair.Data or pandas.DataFrame or csv or json file URL) – The data from which the statistical graphics is being generated
- columns (collection of: int, str, pandas Series or a type convertible to it.) – The column or columns to be used in the graphics, defaults to all
- group_by (int, str, pandas Series or a type convertible to it.) – The column to be used to group the data when in long form. When group_by is specified columns should contain a single column
- height (int) – The height of the chart
- width (int) – The width of the chart
Returns: An altair Chart.
Return type: altair.Chart or altair.LayerChart
altair_recipes.barchart(data=None, x=0, y=1, color=False, height=600, width=800)
Generate a barchart.
Parameters:
- data (altair.Data or pandas.DataFrame or csv or json file URL) – The data from which the statistical graphics is being generated
- x (int, str, pandas Series or a type convertible to it.) – The column containing the data associated with the horizontal dimension
- y (int, str, pandas Series or a type convertible to it.) – The column containing the data associated with the vertical dimension
- color (bool) – Whether to also use color to encode the same data as the x coordinate
- height (int) – The height of the chart
- width (int) – The width of the chart
Returns: An altair Chart.
Return type: altair.Chart or altair.LayerChart
altair_recipes.boxplot(data=None, columns=None, group_by=None, color=False, height=600, width=800)
Generate a boxplot.
Parameters:
- data (altair.Data or pandas.DataFrame or csv or json file URL) – The data from which the statistical graphics is being generated
- columns (collection of: int, str, pandas Series or a type convertible to it.) – The column or columns to be used in the graphics, defaults to all
- group_by (int, str, pandas Series or a type convertible to it.) – The column to be used to group the data when in long form. When group_by is specified columns should contain a single column
- color (bool) – Whether to also use color to encode the same data as the x coordinate
- height (int) – The height of the chart
- width (int) – The width of the chart
Returns: An altair Chart.
Return type: altair.Chart or altair.LayerChart
altair_recipes.layer(*layers, **kwargs)
Layer charts: a drop-in replacement for altair.layer that does a deepcopy of the layers to avoid side effects and lifts identical datasets from one level down to the top level.
altair_recipes.lineplot(data=None, x=0, y=1, color=None, height=600, width=800)
Generate a lineplot.
Parameters:
- data (altair.Data or pandas.DataFrame or csv or json file URL) – The data from which the statistical graphics is being generated
- x (int, str, pandas Series or a type convertible to it.) – The column containing the data associated with the horizontal dimension
- y (int, str, pandas Series or a type convertible to it.) – The column containing the data associated with the vertical dimension
- color (str or int) – The column containing the data associated with the color of the mark
- height (int) – The height of the chart
- width (int) – The width of the chart
Returns: An altair Chart.
Return type: altair.Chart or altair.LayerChart
altair_recipes.heatmap(data=None, x=0, y=1, color=2, opacity=None, aggregate='average', height=600, width=800)
Generate a heatmap.
Parameters:
- data (altair.Data or pandas.DataFrame or csv or json file URL) – The data from which the statistical graphics is being generated
- x (int, str, pandas Series or a type convertible to it.) – The column containing the data associated with the horizontal dimension
- y (int, str, pandas Series or a type convertible to it.) – The column containing the data associated with the vertical dimension
- color (str or int) – The column containing the data associated with the color of the mark
- opacity (str) – The column containing the data that determines opacity of the mark
- aggregate (str) – The aggregation function to set the color of each mark, see https://altair-viz.github.io/user_guide/encoding.html#encoding-aggregates for available options
- height (int) – The height of the chart
- width (int) – The width of the chart
Returns: An altair Chart.
Return type: altair.Chart or altair.LayerChart
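To illustrate what the aggregate parameter means for a heatmap, the sketch below groups rows by their (x, y) cell and reduces each group to a single number, which is what ends up encoded as color. The helper name aggregate_cells is hypothetical and only the two aggregates that appear on this page ('average' and 'count') are covered; in the library itself the aggregation is presumably delegated to altair rather than done in Python like this.

```python
from collections import defaultdict
from statistics import fmean


def aggregate_cells(rows, aggregate="average"):
    """Reduce (x, y, z) rows to one value per (x, y) cell (hypothetical)."""
    cells = defaultdict(list)
    for x, y, z in rows:
        cells[(x, y)].append(z)
    if aggregate == "count":
        # The color shows how many rows fall into each cell.
        return {cell: len(zs) for cell, zs in cells.items()}
    if aggregate == "average":
        # The color shows the mean of z within each cell.
        return {cell: fmean(zs) for cell, zs in cells.items()}
    raise ValueError(f"aggregate not covered by this sketch: {aggregate}")


rows = [(0, 0, 1.0), (0, 0, 3.0), (1, 0, 5.0)]
averages = aggregate_cells(rows)
counts = aggregate_cells(rows, aggregate="count")
```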
altair_recipes.histogram(data=None, column=0, height=600, width=800)
Generate a histogram.
Parameters:
- data (altair.Data or pandas.DataFrame or csv or json file URL) – The data from which the statistical graphics is being generated
- column (int, str, pandas Series or a type convertible to it.) – The column containing the data to be used in the graphics
- height (int) – The height of the chart
- width (int) – The width of the chart
Returns: An altair Chart.
Return type: altair.Chart or altair.LayerChart
altair_recipes.layered_histogram(data=None, columns=None, group_by=None, height=600, width=800)
Generate multiple overlapping histograms.
Parameters:
- data (altair.Data or pandas.DataFrame or csv or json file URL) – The data from which the statistical graphics is being generated
- columns (collection of: int, str, pandas Series or a type convertible to it.) – The column or columns to be used in the graphics, defaults to all
- group_by (int, str, pandas Series or a type convertible to it.) – The column to be used to group the data when in long form. When group_by is specified columns should contain a single column
- height (int) – The height of the chart
- width (int) – The width of the chart
Returns: An altair Chart.
Return type: altair.Chart or altair.LayerChart
altair_recipes.multiscatterplot(data=None, columns=None, group_by=None, color=None, opacity=1, tooltip=None, height=600, width=800)
Generate many scatterplots.
Based on several columns, pairwise.
Parameters:
- data (altair.Data or pandas.DataFrame or csv or json file URL) – The data from which the statistical graphics is being generated
- columns (collection of: int, str, pandas Series or a type convertible to it.) – The column or columns to be used in the graphics, defaults to all
- group_by (int, str, pandas Series or a type convertible to it.) – The column to be used to group the data when in long form. When group_by is specified columns should contain a single column
- color (str or int) – The column containing the data associated with the color of the mark
- opacity (float) – A constant value for the opacity of the mark
- tooltip (str or int) – The column containing the data associated with the tooltip text
- height (int) – The height of the chart
- width (int) – The width of the chart
Returns: An altair Chart.
Return type: altair.Chart or altair.LayerChart
altair_recipes.qqplot(data=None, x=0, y=1, height=600, width=800)
Generate a quantile-quantile plot.
Parameters:
- data (altair.Data or pandas.DataFrame or csv or json file URL) – The data from which the statistical graphics is being generated
- x (int, str, pandas Series or a type convertible to it.) – The column containing the data associated with the horizontal dimension
- y (int, str, pandas Series or a type convertible to it.) – The column containing the data associated with the vertical dimension
- height (int) – The height of the chart
- width (int) – The width of the chart
Returns: An altair Chart.
Return type: altair.Chart or altair.LayerChart
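For readers unfamiliar with the idea behind a quantile-quantile plot, the sketch below pairs up matching quantiles of two samples using only the standard library. The helper name qq_points and the choice of quantile method are assumptions made for illustration; they do not describe how altair_recipes computes its plot.

```python
from statistics import quantiles


def qq_points(xs, ys, n=4):
    """Pair the n - 1 cut points of xs with those of ys (hypothetical)."""
    qx = quantiles(xs, n=n, method="inclusive")
    qy = quantiles(ys, n=n, method="inclusive")
    # Each pair becomes one point of the quantile-quantile plot.
    return list(zip(qx, qy))


xs = list(range(11))           # 0, 1, ..., 10
ys = [2 * x + 1 for x in xs]   # same shape, shifted and rescaled
points = qq_points(xs, ys)
```

When two samples share the same distribution up to location and scale, as here, the points fall on a straight line; departures from a line signal a difference in shape.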
altair_recipes.scatterplot(data=None, x=0, y=1, color=None, opacity=1, tooltip=None, height=600, width=800)
Generate a scatterplot.
Parameters:
- data (altair.Data or pandas.DataFrame or csv or json file URL) – The data from which the statistical graphics is being generated
- x (int, str, pandas Series or a type convertible to it.) – The column containing the data associated with the horizontal dimension
- y (int, str, pandas Series or a type convertible to it.) – The column containing the data associated with the vertical dimension
- color (str or int) – The column containing the data associated with the color of the mark
- opacity (float) – A constant value for the opacity of the mark
- tooltip (str or int) – The column containing the data associated with the tooltip text
- height (int) – The height of the chart
- width (int) – The width of the chart
Returns: An altair Chart.
Return type: altair.Chart or altair.LayerChart
-
altair_recipes.
smoother
(data=None, x=0, y=1, window=None, interquartile_area=True, height=600, width=800)[source]¶ Generate a smooth line plot with optional IRQ shading area.
Parameters: - data (altair.Data or pandas.DataFrame or csv or json file URL) – The data from which the statistical graphics is being generated
- x (int, str, pandas Series or a type convertible to it.) – The column containing the data associated with the horizontal dimension
- y (int, str, pandas Series or a type convertible to it.) – The column containing the data associated with the vertical dimension
- window (int) – The size of the smoothing window
- interquartile_area (bool) – Whether to plot the IQR as an area
- height (int) – The height of the chart
- width (int) – The width of the chart
Returns: An altair Chart.
Return type: altair.Chart or altair.LayerChart
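To make the window and interquartile_area parameters concrete, here is a minimal sketch of the kind of statistic such a chart displays: a median plus the first and third quartiles over a sliding window. It uses only the standard library. The helper name rolling_quartiles, the trailing-window choice and the inclusive quantile method are assumptions made for illustration; this is not a description of how altair_recipes computes its smoother.

```python
from statistics import quantiles


def rolling_quartiles(values, window):
    """Return (q1, median, q3) for each full trailing window of `values`."""
    bands = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        # n=4 yields the three cut points between quartiles.
        q1, median, q3 = quantiles(chunk, n=4, method="inclusive")
        bands.append((q1, median, q3))
    return bands


y = [1, 2, 3, 4, 5, 6, 7, 8]
bands = rolling_quartiles(y, window=5)
# The first window is [1, 2, 3, 4, 5], giving (2.0, 3.0, 4.0).
```

The median would be drawn as the smooth line, and the span between q1 and q3 as the shaded area.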
-
class
altair_recipes.
StackType
[source]¶ Bases:
enum.Enum
An enumeration.
-
auto
= None¶
-
false
= False¶
-
normalize
= 'normalize'¶
-
true
= True¶
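The members listed above can be mirrored in a small standalone enum to see how values map to names. This sketch re-declares the enumeration locally instead of importing altair_recipes, so it only reflects the listed values; how the library passes them on to altair is not covered here.

```python
from enum import Enum


class StackType(Enum):
    """Local copy of the members listed above, for illustration only."""

    auto = None
    false = False
    normalize = "normalize"
    true = True


# Look a member up by its value, or read the value back from a member.
by_value = StackType("normalize")
raw = StackType.auto.value
names = [member.name for member in StackType]
```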
-
-
altair_recipes.
stripplot
(data=None, columns=None, group_by=None, color=None, opacity=1, height=600, width=800)[source]¶ Generate a stripplot.
Parameters: - data (altair.Data or pandas.DataFrame or csv or json file URL) – The data from which the statistical graphics is being generated
- columns (collection of: int, str, pandas Series or a type convertible to it.) – The column or columns to be used in the graphics, defaults to all
- group_by (int, str, pandas Series or a type convertible to it.) – The column to be used to group the data when in long form. When group_by is specified columns should contain a single column
- color (str or int) – The column containing the data associated with the color of the mark
- opacity (float) – The value of the constant opacity of the mark (use to counter overlap)
- height (int) – The height of the chart
- width (int) – The width of the chart
Returns: An altair Chart.
Return type: altair.Chart or altair.LayerChart
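The columns and group_by parameters correspond to two layouts of the same data: wide form (one column per group) and long form (one value column plus one grouping column). The sketch below converts between them with plain dictionaries. The sample values and the helper name wide_to_long are made up for illustration and are not part of altair_recipes.

```python
def wide_to_long(wide, value_name="value", group_name="group"):
    """Turn {column: [values]} into a list of {group, value} rows."""
    return [
        {group_name: column, value_name: value}
        for column, values in wide.items()
        for value in values
    ]


# Wide form: one column per group; `columns` selects among these.
wide = {"a": [1.0, 2.0], "b": [3.0]}

# Long form: a single value column plus a grouping column, the layout
# where `columns` names one column and `group_by` names the other.
long_rows = wide_to_long(wide)
```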
Contributing¶
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
You can contribute in many ways:
Types of Contributions¶
Report Bugs¶
Report bugs at https://github.com/piccolbo/altair_recipes/issues.
If you are reporting a bug, please include:
- Your operating system name and version.
- Any details about your local setup that might be helpful in troubleshooting.
- Detailed steps to reproduce the bug.
Fix Bugs¶
Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.
Propose Features¶
New features fall into two categories. The first is more flexibility for charts that altair_recipes can already produce, e.g. the recent addition of height and width controls; the second is entirely new types of charts. As to the first, we are trying to balance two aims: keeping it simple and making it powerful enough to cover common visualization needs. This isn’t very precise, but we will try to make it more so over time. Controlling the width seemed a necessity. Changing a color palette, maybe not so much (it can also be controlled with altair
’s configure_*
methods). As to entirely new types of chart, we’d like to include any charts that are in widespread use in data analysis practice, as shown by a scientific article or a Wikipedia entry devoted to them, or by other supporting evidence of statistical relevance. Chart types that have been used once or are implemented in a single library, like the jointplot, are not good candidates. To propose a new feature, please open a new issue with a description, a rationale, an example and, ideally, a sample implementation in altair
or vega-lite
.
Implement Features¶
Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it. A new type of chart will require a new test.
Write Documentation¶
altair_recipes could always use more documentation, whether as part of the official altair_recipes docs, in docstrings, or even on the web in blog posts, articles, and such.
Submit Feedback¶
The best way to send feedback is to file an issue at https://github.com/piccolbo/altair_recipes/issues.
If you are proposing a feature:
- Explain in detail how it would work.
- Keep the scope as narrow as possible, to make it easier to implement.
- Remember that this is a volunteer-only project, and that contributions are welcome :)
Get Started!¶
Ready to contribute? Here’s how to set up altair_recipes for local development.
Fork the altair_recipes repo on GitHub.
Clone your fork locally:
$ git clone git@github.com:your_name_here/altair_recipes.git
Install your local copy into a virtualenv. This is how you set up your fork for local development:
$ curl -sSL https://raw.githubusercontent.com/sdispater/poetry/master/get-poetry.py | python  # if needed, or use another method to install poetry
$ cd altair_recipes/
$ poetry install
Create a branch for local development:
$ git checkout -b <branch-name>
Where <branch-name> can be as simple as
issue-<issue-number>
but should always end with -<issue-number>
. Now you can make your changes locally.
When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:
$ flake8 altair_recipes tests
$ make test
$ tox  # in the works
Commit your changes and push your branch to GitHub:
$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin <branch-name>
Submit a pull request through the GitHub website.
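The branch-naming rule in the steps above (a name that ends with a hyphen and the issue number) can be checked mechanically. The following is a minimal sketch, assuming issue numbers are plain digits; the helper issue_number is hypothetical and not part of the project’s tooling.

```python
import re

# Assumed pattern: any prefix, then a hyphen and a numeric issue number.
BRANCH_PATTERN = re.compile(r"^.*-(\d+)$")


def issue_number(branch):
    """Return the trailing issue number of a branch name, or None."""
    match = BRANCH_PATTERN.match(branch)
    return int(match.group(1)) if match else None


simple = issue_number("issue-42")
descriptive = issue_number("fix-boxplot-color-7")
missing = issue_number("my-branch")
```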
Pull Request Guidelines¶
Before you submit a pull request, check that it meets these guidelines:
- The pull request should include tests. Coverage should never decrease (check with make coverage).
- If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add new chart types to the list in README.rst.
- The pull request should work for Python 3.5 and 3.6, or as listed in file travis.yml. Check https://travis-ci.org/piccolbo/altair_recipes/pull_requests and make sure that the tests pass for all supported Python versions.
Tips¶
To run a subset of tests:
$ py.test tests.test_altair_recipes
Tests should be decorated with @viz-reg-test
and produce an altair chart. This will save the json output for regression testing and produce an html file for visual inspection.
Deploying¶
A reminder for the maintainers on how to deploy. Make sure all your changes are committed (including an entry in HISTORY.rst). Then run:
$ bumpversion patch # possible: major / minor / patch
We use semantic versioning. Then:
$ git push
$ git push --tags
Travis will then deploy to PyPI if tests pass (not implemented yet, use make release
)
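As a reminder of what the major / minor / patch choices mean under semantic versioning, here is a minimal sketch of the arithmetic. The bump function is a hypothetical helper written for illustration; it is not how bumpversion is implemented, and it ignores pre-release and build metadata.

```python
def bump(version, part):
    """Return `version` ("X.Y.Z") with the given part incremented."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Incompatible changes: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backward-compatible features: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backward-compatible bug fixes.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


next_patch = bump("0.9.0", "patch")
next_minor = bump("0.9.0", "minor")
next_major = bump("0.9.0", "major")
```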
Credits¶
Development Lead¶
- Antonio Piccolboni <antonio@piccolboni.info>
Contributors¶
None yet. Why not be the first?
History¶
0.9.0 (2020-06-11)¶
- Fixed color in boxplot
- Upgrade to altair 4. Mandatory. Let me know if you need compatibility with 3.x.x
0.8.0 (2019-10-16)¶
- Added lineplots and areaplots #11 and #12
0.7.1 (2019-10-07)¶
Accepts vector data in addition to dataframe, as in:
import altair_recipes as ar
from numpy.random import normal
ar.scatterplot(x=normal(size=100), y=normal(size=100))
0.6.5 (2019-10-01)¶
- Make ipython dep optional (for pweave support). Use piccolbo’s pweave fork (upstream doesn’t pass its own tests) for doc generation. Adapt to breaking changes in autosig (a dependency).
0.6.4 (2019-09-18)¶
- Switched to poetry for package management
0.6.0 (2019-01-25)¶
- Fine tuned API:
- no faceting but all returned charts are facet-able
- Color made a bool option when separate color dim can’t work
- Eliminated some special cases from autoplot for very small datasets
- Some refactoring in boxplot and autoplot to shrink and clarify the code
0.5.0 (2019-01-17)¶
- Autoplot for automatic statistical graphics
- Stripplots and barcharts
0.4.0 (2018-09-25)¶
- Custom height and width for all charts
0.3.2 (2018-09-21)¶
- Dealt with breaking changes from autosig, but code is simpler and paves the way for some new features
0.3.1 (2018-09-20)¶
- Addressing a documentation mishap
0.3.0 (2018-09-20)¶
- Better readme and a raft of examples
- Some test flakiness addressed
0.2.4 (2018-08-29)¶
- One more issue with col resolution
- Switch to using docstring support in autosig
0.2.3 (2018-08-29)¶
- Some issues with processing of columns and group_by args
- Fixed travis-ci build (3.6 only, 3.5 looks like a minor RNG issue)
0.2.2 (2018-08-28)¶
- Switch to a simpler, flatter API a la qplot
- Added two types of heatmaps
- Extensive use of autosig features for API consistency and reduced boilerplate
- Fixed build to follow requests model (pip for users, pipenv for devs)
0.1.2 (2018-08-14)¶
- Fixed a number of loose ends particularly wrt docs
0.1.0 (2018-08-06)¶
- First release on PyPI.