Ten Simple Rules for Better Data Visualization

Nicolas Rougier and colleagues have written an excellent outline of how to produce superior data graphics.  Their article focuses on preparing individual charts for academic publication.  However, their points hold in a wider sense (and overlap with some of Edward Tufte’s guidelines).

I am synopsizing and expanding for my own benefit.

Rule 1:  Know Your Audience

Rule 2:  Identify Your Message

Rule 3:  Adapt The Figure To The Support Medium

Rule 4:  Captions Are Not Optional

Rule 5:  Do Not Trust The Defaults

Rule 6:  Use Color Effectively

Rule 7:  Do Not Mislead The Reader

Rule 8:  Avoid “Chartjunk”

Rule 9:  Message Trumps Beauty

Rule 10:  Get The Right Tool

In that last rule, they inventory a number of open-source tools useful for preparing data graphics for presentation and publication.  I want to capture these for my own skills-planning purposes.

Matplotlib is a Python plotting library that comes with a huge gallery of examples that cover virtually all scientific domains.

R provides a wide variety of statistical and graphical techniques, and is highly extensible.

Inkscape is a professional vector graphics editor. It can also read a PDF file in order to extract figures and transform them.

TikZ and PGF are TeX packages for creating graphics programmatically.

GIMP is a photo compositing application that can quickly retouch an image or add some legends or labels.

ImageMagick is a software suite to create, edit, compose, or convert bitmap images. It can be used to quickly convert an image into another format.

D3.js offers an easy way to create and control interactive data-based graphical forms which run in web browsers.

Cytoscape is for visualizing complex networks and integrating these with any type of attribute data.

Circos was originally designed for visualizing genomic data but can create figures from data in any field.

(Image courtesy of cocoparisenne at Pixabay)


Getting Value from Data

I really like this business-value extraction loop outlined by Lynn Langit in one of her courses on Lynda.  Lynn specifically addresses getting value when moving from relational to non-relational databases, but I believe her point clearly extends to getting value from data in general.

She outlines a process that can be iterated over projects, applications, or data sets.

  1. Formulate clear business questions.
  2. Select the solution that provides a path to answering those questions.
  3. Find, load, and clean all the source data.
  4. Query the data.
  5. Present (or visualize) the data.
  6. Iterate.
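
To make steps 3 through 5 concrete for myself, here is a minimal Python sketch of one pass through the loop. It is my own illustration under stated assumptions, not anything taken from Lynn's course: the sample sales records, the business question ("which region has the highest total sales?"), and the helper names `load_and_clean`, `query_totals`, and `present` are all hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw data standing in for a real source.
# Note the inconsistent region spelling and the missing amount.
RAW_CSV = """region,amount
North,120.50
South,80
north ,30.25
East,
South,19.75
"""


def load_and_clean(text):
    """Step 3: load rows, normalize region names, drop rows with no amount."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip incomplete records
        rows.append({
            "region": row["region"].strip().title(),
            "amount": float(amount),
        })
    return rows


def query_totals(rows):
    """Step 4: answer the business question with a total per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def present(totals):
    """Step 5: render a plain-text bar chart, largest total first."""
    lines = []
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    for region, total in ranked:
        bar = "#" * int(total // 10)  # one mark per 10 units
        lines.append(f"{region:<6} {total:>8.2f} {bar}")
    return "\n".join(lines)


rows = load_and_clean(RAW_CSV)
totals = query_totals(rows)
print(present(totals))
```

Step 6 would then mean looking at the output, refining the question (or the cleaning rules), and running the pass again.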

I look forward to seeing how other professionals extend or surpass this approach.  I particularly look forward to developing my own insight into the real-world details of this process.