Plotting in Python
Jack of all trades, old master of none
October 8, 2014 — August 1, 2023
I’m visualising data in Python because it is the lingua franca of my team. I’d like plots that are real-time and interactive, or publication-quality, but I won’t be inconsolable if I cannot achieve both simultaneously.
Visualisation is not an especially strong suit of Python; the strong suit is hodgepodge, decoupage, bricolage, and, uh, potpourri. Therefore my solution is to cobble something together, or better, to use someone else’s cobbling.
1 Matplotlib
The default option. Well documented, ubiquitous, reliable. Built upon a foundation of design choices that have aged poorly. Basic plots are easy; nuanced plots are a maze of margin tweaking, overlapping and incompatible helper libraries, confusing naming and API collisions. A classic example of a tool that is easier to use by copy-pasting from Stack Overflow than by working it out yourself. Complicated enough that I made a new notebook. See matplotlib.
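To make the “basic plots are easy” part concrete, here is a minimal sketch using the object-oriented API and the non-interactive Agg backend. The data and labels are made up for illustration.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, fine for scripts and batch jobs
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data: a noisy sine wave.
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)

fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(x, y, label="noisy sine")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.tight_layout()

# Render to an in-memory PNG; a filename ending in .pdf or .svg gives vector output.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150)
png_bytes = buffer.getvalue()
plt.close(fig)
```

The maze starts roughly one step past this, when you want the margins, fonts and legends to look right in a journal column.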
2 Send it to R
R’s plotting is good, so it can be worth the overhead of exporting data to R.
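One low-tech way to do the hand-off is a CSV file. The following is a sketch only, assuming pandas on the Python side; the column names, the file name and the ggplot2 snippet are hypothetical, and the R code is shown as a string rather than executed.

```python
import os
import tempfile

import pandas as pd

# Illustrative data frame; the column names are made up.
df = pd.DataFrame(
    {"dose": [0.5, 1.0, 2.0, 4.0], "response": [1.2, 2.1, 3.9, 8.3]}
)

# Write a CSV that an R session can read back.
csv_path = os.path.join(tempfile.mkdtemp(), "for_r.csv")
df.to_csv(csv_path, index=False)

# On the R side, something like this would pick it up (not run here).
r_snippet = f"""
library(ggplot2)
df <- read.csv("{csv_path}")
ggplot(df, aes(dose, response)) + geom_point()
"""
```

CSV loses type information such as categoricals and dates, so a richer interchange format may be worth it for anything non-trivial.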
3 Holoviz
A suite of visualisation tools and guides called Holoviz includes a lot of plotting infrastructure. Fresh, enthusiastic.
HoloViz tools build on the many excellent visualization tools available in the scientific Python ecosystem, allowing you to access their power conveniently and efficiently. The core tools make use of Bokeh’s interactive plotting, Matplotlib’s publication-quality output, and Plotly’s interactive 3D visualizations. Panel lets you combine any of these visualizations with output from nearly any other Python plotting library, including specific support for seaborn, altair, vega, plotnine, graphviz, ggplot2, plus anything that can generate HTML, PNG, or SVG.
HoloViz tools and examples generally work with any Python standard data types (lists, dictionaries, etc.), plus Pandas or Dask DataFrames and NumPy, Xarray, or Dask arrays, including remote data from the Intake data catalog library. They also use Dask and Numba to speed up computations along with algorithms and functions from SciPy.
HoloViz tools are designed for general-purpose use, but also support some domain-specific datatypes like graphs from NetworkX and geographic data from GeoPandas and Cartopy and Iris.
Panel can be used with yt for volumetric and physics data and SymPy or LaTeX for visualising equations.
HoloViz tools provide extensive support for Jupyter notebooks, as well as for standalone web servers and exporting as static files.
Is it good? Some think so, notably Sophia Yang, who wrote some intros, e.g.
She explains:
HoloViz allows users to build Python visualization and interactive dashboard with super easy and flexible Python code. It provides the flexibility to choose among several API backends, including bokeh, matplotlib, and plotly, so you can choose different backends based on your preferences. Plus, it’s 100% open source!
Unlike the other Python viz and dashboarding options, HoloViz is very serious about supporting every reasonable context in which you might want to use a Python viz or app tool:
- a Jupyter notebook,
- a Python file,
- a batch job generating PDFs or SVGs or PNGs or GIFs,
- as part of an automated report,
- as a standalone server,
- as a standalone .html file on a website.…
Each of the alternative technologies supports a few of those cases well but lets all the rest slide. HoloViz minimizes the friction and cost of switching between all of these contexts, because that’s the reality of any scientist or analyst—as soon as you publish it, people want changes! Once you have your Dash app; that’s all you have, but once you have a Panel app, you can go back to Jupyter the next day and start right where you left off.
- Panel builds interactive dashboards and apps. It’s like R Shiny, but more powerful. I can’t say enough how much I love Panel.
- hvPlot is easier than any other plotting libraries in my experience, especially if you like to plot Pandas DataFrames. With one line of code, hvPlot will provide you an interactive plot with all the nice built-in functionalities you want.
- HoloViews is a great tool for data exploration and data mining through visualization.
- GeoViews plots geographic data.
- Datashader handles big data visualization. Using Numba (Python compiler) and Dask (distributed computing), Datashader creates meaningful visualizations of large datasets very quickly. I absolutely love Datashader and love the beautiful plots it generates.
- Param creates declarative user-configurable objects.
- Colorcet creates colormaps.
4 Plotly
Originally a mostly browser-based visualisation tool, Plotly now has native Python support (source) that is supposed to be quite good and quite general, since browser tech is easy to embed in other things. It supports high-resolution, print-quality graphics, vector rendering and so on. The Plotly library is certainly hipper than Matplotlib and seems to incorporate the input of some graphic designers from the internet. Matplotlib seems to do that rarely, perhaps because it is old and/or confusing and/or unlikely to pop up as a highlight in a web portfolio, since its main target is scientific journals.
Credit: I am indebted to Andy MacKinlay for reminding me Plotly is a thing. See plotly.
5 Bokeh
Bokeh does “big-data” and streaming-based browser graphing for Python. Its website probably looks the nicest out of everything I’ve mentioned, which says something important about priorities. However, its print output seems to be bad; this is a web-oriented tool.
Bokeh is a Python library for creating interactive visualizations for modern web browsers. It helps you build beautiful graphics, ranging from simple plots to complex dashboards with streaming datasets. With Bokeh, you can create JavaScript-powered visualizations without writing any JavaScript yourself.
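A minimal sketch of the “no JavaScript yourself” workflow, with made-up data: build a figure in Python and render it to a standalone HTML string whose JavaScript is loaded from Bokeh’s CDN.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Illustrative data.
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

p = figure(title="Squares", x_axis_label="n", y_axis_label="n squared")
p.line(x, y, line_width=2)
p.scatter(x, y, size=8)

# A complete HTML document as a string; write it to disk to view it in a browser.
html = file_html(p, CDN, "Squares")
```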
6 Vega/Altair
Browser visualiser Vega is available for Python, via the library Altair. USP: easy interactives. Vega-Lite provides a Grammar of Graphics approach, like ggplot2.
7 Misc browser options
Jupyter notebooks have a rich enough API to integrate various more exotic pure-browser graphics options. In fact, since you are now using the web browser, you can inspect a menu at browser datavis.
Here are some hacks:
superset is Airbnb’s python+browser interactive data exploration tool; filed under dashboards.
mpld3 plugs browser d3.js into Jupyter to emulate Matplotlib.
The mpld3 package is extremely easy to use: you can simply take any script generating a matplotlib plot, run it through one of mpld3’s convenience routines, and embed the result in a web page.
2D only, AFAICT.
8 GR
GR.py wraps GR, a cross-platform visualisation framework:
GR is a universal framework for cross-platform visualization applications. It offers developers a compact, portable and consistent graphics library for their programs. Applications range from publication quality 2D graphs to the representation of complex 3D scenes. […]
GR is essentially based on an implementation of a Graphical Kernel System (GKS) and OpenGL. […] GR is characterized by its high interoperability and can be used with modern web technologies and mobile devices. The GR framework is especially suitable for real-time environments.
It will also function as a Matplotlib backend. GR’s graph presentation is brutalist, even ugly, but it works fine.
9 Visdom
Visdom is
A flexible tool for creating, organising, and sharing visualizations of live, rich data. Supports Torch and Numpy.
It pumps graphs to a visualisation server enabling some kind of shared visualisation of a thing of interest. You want more of a pitch?
- Visdom aims to facilitate visualization of (remote) data with an emphasis on supporting scientific experimentation.
- Broadcast visualizations of plots, images, and text for yourself and your collaborators.
- Organize your visualization space programmatically or through the UI to create dashboards for live data, inspect results of experiments, or debug experimental code.
It sounds like one could e.g. build an SGD diagnostic convergence diagram using this as an alternative to Tensorboard.
10 Plotting networks in particular
- INRIA’s Tulip has fans
- Visualising a NetworkX graph in the IPython notebook with d3.js
In browser datavis I found Sigma.js; there are surely more JS graph visualisations.
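The usual first step for getting a NetworkX graph into a JavaScript library is to serialise it as node-link JSON. The following is a sketch of that step only, on a toy graph; it is my assumption about the general approach, not the code from the linked post.

```python
import json

import networkx as nx

# Toy graph: a path 0 - 1 - 2 - 3.
G = nx.path_graph(4)

# Dictionary with a list of nodes and a list of edges.
# The name of the edge-list key has varied between NetworkX versions.
data = nx.node_link_data(G)

# JSON string that browser-side code can load.
payload = json.dumps(data)
```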
11 TomViz
- OpenChemistry/tomviz: Cross platform, open source application for the processing, visualization, and analysis of 3D tomography data
- tomviz for tomographic visualization of nanoscale materials
The Tomviz project is developing a cross platform, open source application for the processing, visualization, and analysis of 3D tomographic data. It features a complete pipeline capable of processing data from alignment, reconstruction, and segmentation through to displaying, visualising, and interacting with 3D reconstructions of tomographic data. Many of the data operators are available as editable Python scripts that can be modified in the interface to experiment with different techniques. The pipeline can be saved to disk, and a number of common file formats are supported for importing and exporting data.
12 VisPy
VisPy is OpenGL-backed data visualisation, focusing on science (ooh!). It also offers a Matplotlib compatibility layer. Here are some howtos:
It seems to require more writing of OpenGL shaders than one would like to draw a line graph.
However, there are less messy-looking tools in the ecosystem: napari is a multidimensional image viewer leveraging VisPy.
13 Mayavi
Mayavi is an opinionated open-source commercially-backed interactive 3D visualiser. The source code repository is worryingly hard to find. For future reference, it’s here.
On a similar tip, although looking more basic and more bitrotten, is vtk — if I understand correctly, VTK is the engine used by Mayavi? Better maintained and possibly still vtk-based is Paraview, which supports pluggable backends.
14 Not exactly graphing libraries
Disney (!) has a game library, Panda3D, which seems to do all the fun things.
even more bareback, more-or-less-directly calling into openGL, but seriously, I’m a statistician, not a coder. I could also hand-pulp hemp to make my own graph paper to draw my visualisations, drawn in home-made iron gall ink, but I would find it equally hard to argue that it was an efficient prioritisation.
I haven’t used PREdator, although I understand it’s been around longer than I have. Heh. (Wiedemann, Bellstedt, and Görlach 2014)
15 Bayes in particular
ArviZ is a Python package for exploratory analysis of Bayesian models. Includes functions for posterior analysis, data storage, sample diagnostics, model checking, and comparison.
The goal is to provide backend-agnostic tools for diagnostics and visualizations of Bayesian inference in Python, by first converting inference data into xarray objects. See here for more on xarray and ArviZ usage and here for more on InferenceData structure and specification.
16 General image reading and writing
- Imageio is a workhorse Python image system.
17 Animations
17.1 GIFs
Of course, what we all truly want is animated GIFs. Here is a classic using Python, Pillow. See also the specialised array2gif.
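For the Pillow route, a minimal sketch: draw a few frames and save them as one animated GIF. The frame contents here (a dot sliding right) are made up for illustration.

```python
import io

from PIL import Image, ImageDraw

# Build the frames: a dot moving from left to right.
frames = []
n_frames = 12
for i in range(n_frames):
    frame = Image.new("RGB", (120, 60), "white")
    draw = ImageDraw.Draw(frame)
    left = 10 + i * 8
    draw.ellipse((left, 20, left + 20, 40), fill="steelblue")
    frames.append(frame)

# Save all frames into a single GIF; duration is milliseconds per frame, loop=0 loops forever.
buffer = io.BytesIO()
frames[0].save(
    buffer,
    format="GIF",
    save_all=True,
    append_images=frames[1:],
    duration=80,
    loop=0,
)
gif_bytes = buffer.getvalue()
```

Passing a filename instead of the buffer writes the GIF to disk.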
17.2 manim
3b1b’s manim is a curious passion project to create animations through code. It is famous on e.g. YouTube for powerful examples like this one.
See manim for more documentation; I am currently using this tool and might document some stuff there.