Makeover Monday, 2017 #17

This week Tableau themselves chose the data; a celebration of the importance of data skills based on LinkedIn’s annual Top Skills reports. The Tableau example includes a bump chart and box and whisker plot, whereas the LinkedIn version is a simple list per country in a slide share style:

LinkedInSkillsReport



.

Thoughts on the original charts

The LinkedIn version is simple and clear. You can see which skills are ranked highest and how that ranking has changed since last year. However it is hard to compare the countries to each other or to the overall global ranking (you have to flick back and forth in the slide deck). Also you can’t see how rankings have changed from years prior to 2015.

The Tableau version addresses some of these issues. It’s still a little hard to compare countries to the overall global ranking. The bump chart shows the change in ranking across multiple years, but personally I find bump charts a little hard to follow particularly if there’s no labelling or colour legend.

 

Is there such a thing as spark dots?

My first idea was to build a set of spark lines showing the current and historical rankings for each skill. Unfortunately this didn’t look great as some skills don’t have rankings in some years. The spark  lines looked a bit disjoint and in some cases were just spark dots, which presumably isn’t a real thing!

 

So, what could I add to fix this?

Hang on, why did I want to add something when the data was just a ranked list? Instead I decided to step right back and try a simple styled list. I’d still want to show change in rank from 2015, but would leave out comparisons to 2014. This would still present a challenge as I’d need to balance the emphasis on the skill versus the information on ranking changes, and I’d need to deal with cases where there was no change in ranking or the skill was new to the rankings. I’d also have the challenge of comparing different countries.

Now to be honest I passed a bit on this second challenge, deciding that with limited space (and audience attention) it was better to just provide a comparison to the overall global ranking and the option to chose the country. So I wasn’t going to address all of the points I noted with the originals, but I was happy that I’d be honouring  the nature of the source data.

Here is the finished viz:

LinkedInSkillsMakeover



.

The core of the viz is a pretty basic matrix of text by rank and country. Check out the version on Tableau Public if you want to dig into anything in particular. You may note that time could be spent making the workbook more robust as I have hardcoded 2015 and 2016 as the years to use, instead of using LOD or table calcs to find the latest and immediately previous year. There were two particular tricks I used this week that seemed to be worth elaborating on though:

 

Showing just the selected country and the global list

To handle this I didn’t just want to use the existing country field as a filter because I’d be relying on the user to always have one country plus “Global” selected. Instead I created a parameter with all countries except global (tip: you can use the “add from field” option when creating a parameter to save typing all of the options in). I could then add a workbook filter on a calculated field to only include global and the selected country:

Step3


I also created a calculated field for  ”country order” so that global always appeared in the right hand column. Otherwise sort order would be alphabetic and sometimes global would appear in the left hand column.

 

Colouring the text in EACH column differently

This was quite fiddly. I’d already created calculations to show an up arrow, down arrow and relevant text for the supporting label about each skill. I then realised that I wanted the selected country to stand out more than the global list. To do this I ended up creating two calculations, one for global which would return NULL when country was not global and one for the selected country which would return NULL in the opposite situation. The idea here is that only one of the calculated fields ever returns a value so I could give each a different colour safe in the knowledge that only one of them would be displayed in any one cell. Note the lack of an ELSE block in the calculation below which ensures NULL is returned. I could have used ELSE “” given that the calculation is used in label text of course, which may have been more maintainable?

Step2



.
Step1

.

If you’re wondering what those strange symbols are in the label definition above then it’s just that I’ve selected a font of Wingdings 3 for the calculated fields that return and up and down arrow.

 

Job done!

Another great week for honing Tableau skills. Nothing I’ve done this week is rocket science by any means, but as someone still learning the product, and about data visualisation as a whole, it feels like I’m making progress. And of course it’s pretty cool to see that some of the skills used are in demand!

Makeover Monday, 2017 #16

Week 16 and it’s back to some big data on EXASOL thanks to Eva and Johannes. This time over 750 million rows of GP practice prescribing data for the UK from 2010 – 2017. The nominal challenge to makeover some of the charts in a House of Commons research briefing based on the data.

orig



.

Like most of the Makeover Monday community I didn’t find the graphs particularly exciting or engaging. but I did find the research briefing to be interesting and it covered some of the key questions I had in my mind about prescriptions as a topic, such as what was happening with antibiotic use.

Also like others I didn’t simply try to recreate the graphs in the briefing. Actually that’s not entirely true, I do tend to start these makeovers by recreating at least one of the graphs or data tables. Recreating something is a good way to check that I’m getting the right numbers and helps me to orientate myself to the data. For my actual makeover though I wanted to dive into some of the detail like many others. One of the early submissions by Adam Crahen identified the growth in prescriptions of Apixaban. I knew a bit about anticoagulants, so this felt like a good category to drill into:

DASH



.

The first step was to identify the main anticoagulants being prescribed. I picked these four as an obvious story seemed to emerge; the move away from the traditionally prescribed Warfarin to three new novel oral anticoagulants. A little insider knowledge made me include Dabigatran as it doesn’t really stand out from the crowd (the remaining grey lines) unless you look closely.

Next I sourced a reusable image relating to the vascular system. Now at this point I got pretty excited. I thought that I could blend the arteries, etc from the image into the lines in the graphs. So I had the timeline running down the viz and was trying to resize things to line up nicely. End result? It looked like terrible! Back to the drawing board and I’m glad that I did.

Instead I decided to use the image as a backdrop to some context (using the title and the number of anticoagulant items prescribed compared to the overall number of items prescribed) and for aspects of the colour pallet. A much cleaner result I think whilst still retaining some visual impact.

The final step was to source detail for the labels on the left (the story) and edit this down to fit. Reducing the text down to what I ended up with was probably the second hardest part of the process! If I were revisiting this visualisation I’d tie the story into the headline a little more by specifically mentioning heart attacks in the section on conditions. I would also try to pull in some antiplatelet prescription data as aspirin has a part to play in the story.

In terms of new things tried in Tableau, this week I went with an entirely floating layout for the dashboard to see if that helped with alignment variations between devices and Tableau Public. The approach of floating everything seems to have helped a bit, although I maybe should have split the story points into individual text objects so that they would better align to the aspects of the graphs that they referred to?

On a final note it was great to play with EXASOL again – a pleasure exploring and working with this large data set remotely.

My makeover is also available on Tableau Public.

Workout Wednesday: practicing level of detail and table calcs in Tableau

I don’t usually participate in Tableau #WorkoutWednesday, but this week the chart we had to reproduce featured my team so I had to give it a go!

Here is the finished chart, which is also available on Tableau Public:

How many times has a team been top of the premier league

 



The source data included the match day (week), team name and their total points at the end of that match day. Here are the first two rows to give you an idea:

Match day Team Total points
1 Arsenal 0
2 Arsenal 3

So the challenge is to:

  1. Look at the total points each team has at the end of a game week
  2. Rank them to find out which teams are top that week (NB: in actual fact only one team would be top and goal difference would be applied if there was a tie, but the purpose of the challenge wasn’t to get into that. Also it was pre-acknowledged that the data didn’t cover a full season).
  3. Count up the number of times each team was top

The idea was to do that once using level of detail calculations and again using table calculations. Now to be honest both of these do my head in at times given where I’m at with learning Tableau so it’s a great challenge to try even if the mighty foxes weren’t featured!

Lets start with the table calculations because they are sometimes perceived to be harder.

Firstly I calculated the rank for each team based on their total points at the end of each match day, using RANK working along the list of teams. I.e. ”give me the rank of each team based on their total points”:

calc1

 


Next I calculated the number of match days each team was at the top using a running sum of :1 when the team was top and 0 when they weren’t, this time along match day.  I.e. “give me a running count of the number of times the team has been at the top of the league as we progress through the match days”:
calc2



The final piece of the puzzle was to be able to filter the results down to the final match day, without impacting the table calculations above (filters are generally applied before table calculations in Tableau so you apply your filter and all your table calculation seem to break! They don’t really, they’re just being applied on a subset of the data now, but it feels like they break when you’re new to them). The trick here – and I wish I could remember where I picked this up so that I could credit that person – is to filter on another table calculation. That way the filter is applied after table calculations are done. The calculation is pretty simple: “is this the last match day?”:

calc3



After taking this approach (one calculation along team, the next along match day, and then filtering to final match day) I saw that other people had stopped at a version of the first table calculation. They could then plot a series of stacked bars (of value 1) for each week where a team was top. Cunning! And it saves getting your head around the double table calculation right? But on the downside you’ve got stacked bars where you didn’t really need them, and you need to use a reference line for an overall label. On the upside your head didn’t explode due to table calculations on top of table calculations. And the next poor analyst who has to maintain your workbook doesn’t curse you to all of your colleagues!

So what about the level of detail version? Well it’s basically doing the same thing, but in a different way of course. I haven’t broken this down into multiple calculations so bear with me. Starting in the middle with the first calculation {FIXED [Match Day]: MAX([Points Total])} I’m saying “tell me what the maximum total points were, regardless of team given that my viz is at the team level of detail, on any match day”* by fixing at the match day level of detail. We can then identify the teams with a points total that matches this max. These are the teams at the top of the table for that match day. We given them a 1 and others a 0. We can then SUM the 1s and 0s up at the Team Name level of detail (i.e. across all match days) to give us a final count:

calc4



*I wonder if I should have used EXCLUDE instead of FIXED in the level of detail calculations, but the above worked so I stuck with it.

Another little trick in the workout was to get the axis labels to appear at the top rather than the bottom (in Tableau continuous measures get a bottom axis whereas discrete measures get a category header at the top). The trick here, which I had to Google (leading me back to Andy’s blog of course!) was to duplicate the measures and drag them on to the top of the panel. I then had to clear the axis label and marks at the bottom (you can’t just unclick “show header” as you lose the top labels too).

All in all a fun challenge and some great Tableau calculation practice.

Makeover Monday, 2017 #15

Uh oh! Making over a 2009 graph by Andy Kriebel this week. No pressure. The graph looks at the correlation between gold and oil prices with the premise that they’re closely related. You can see Andy’s own makeover here. Another of my favourites is one by Mike Cisneros, which you can read about here. I love the story telling element Mike achieves; something I was aiming for myself.

I started off looking at each price as a percentage of its average over the period (because an ounce of gold is much more expensive than a barrel of crude oil). This allowed me to produce an area chart showing which commodity was most “expensive” compared to it’s average price at each point during the period:

Dashboard 1



.

I didn’t love this as I couldn’t quite work out what I was trying to tell the viewer, so moved on!

Next up I considered scatter plots by decade with the aim of highlighting outlying points that weren’t fitting the correlation. My thinking was to display one scatter plot per decade – ideally in a tile like view. I probably should have pursued this idea as I think it would have allowed me to address a couple of concerns with the original chart. Instead I developed the idea of a tile per decade, focussing on four key fluctuations in the gold oil price ratio:

gold oil ratio 2



.

I really like the way I can make each tile focus in on a key fluctuation. The tile look was pretty straightforward to achieve. I created four separate charts – one for each decade – and set the background colour for each chart panel and axis. I appreciate that I’m not addressing every change in the ratio of gold to oil price, and perhaps haven’t even picked the most important ones, but it does allow a simple story to emerge for the reader. My story points focus on oil prices and were sourced from Wikipedia (1980s oil glut; 2000s energy crisis; 2010s oil glut). It would have been nice to add a parameter allowing the reader to chose what sort of story they wanted to read – one skewed to oil, gold or a mix. Unfortunately time didn’t allow.

In hindsight I think I may have added too much to each tile. The mini spark lines for prices added a bit of context for each story point, but had to be layered on as floating charts which was fiddly. They could also be misread to indicate that the gold and oil prices were quite close at certain points  because I haven’t synchronised the axis. I’ve tried to address that issue with a note, but in doing so I’ve added more to an already crowded first tile. Learning point: once I start adding I often end up adding more; better to take away! In my defence I did do some pruning. Initially there was a large number at the top right of each tile showing the closing ratio for each decade. I removed that number as it seemed better, and more space effective, to label the key line points I was referring to in the story points.

The story points are added as floating labels. The positioning of these varies depending on what screen resolution I’m using – you’ll see that the Tableau Public version doesn’t look quite the same. It would have been nice to make the viz more robust by using labelled points instead of the floating text.

All in all though a good opportunity to focus on story telling, and learn about oil and gold prices!

Makeover Monday, 2017 #14

The Guardian recently reported on the a PWC report into the risk of UK job losses from breakthroughs in robotics and artificial intelligence. Whilst technological unemployment isn’t a new concept and the report and article certainly do comment on job creation too, the percentages at risk in some sectors are quite striking.

The graphic in the article appears to be clean and easy to read at a glance, however I found it a little hard to interpret the “job automation at potential high risk” percentage, particularly as it is not immediately comparable to the other percentage shown, and that the definition of high risk is itself based on a percentage. The article does go on to clarify the value to be fair but it did initially confuse me.

 

Can we automate our own job to do a makeover?

Before tackling a makeover in Tableau I thought it would be interesting to see if the act of doing a data visualisation could actually be automated, e.g. by IBM Watson Analytics. Signing up for a 30-day IBM Watson trial was quick and easy after which I could upload the source data spreadsheet and have it prepared for analysis. Once prepared I could either ask a structured question of the data set or choose from some recommended forms of analysis for the measures and dimensions identified. I didn’t spend long on this, and whilst a visualisation could indeed be produced with little input from me, it certainly had some flaws. For example the total/all dimension value was included which skews the chart.

Here is the viz that was produced:

UK jobs at risk of automation - IBM Watson



.

Doing the makeover manually – much better?

Moving on to the Tableau makeover, I really liked a submission by Tamara Gross and have borrowed some of the styling from that. My makeover follows below. I’ve converted the percentage of jobs at risk into a percentage of total UK jobs so that it is more comparable with the overall share of employment – we don’t lose the actual proportion at risk as that is apparent from the stacked bars. However we gain a set of percentages that sum up to the overall headline of 30%. The viz is also available on Tableau Public.

DASH



.

But surely we can automate some of the data visualisation process?

Continuing the theme of whether the data visualisation process can be automated I took a look at the Narrative Science Narratives for Tableau Google Chrome extension. This extension is intended to give you an automated explanation of a Tableau visualisation, in “natural, conversational language everyone can understand”. How much time do you spend crafting a headline and narrative for your charts or dashboards? Could this extension really automate that hard work? I took it for a test drive with my makeover.

First up you need to open the link above in Chrome, click ”Try it Now” and install the extension. You then get a leaf icon in the top right of your Chrome tool bar. Click the icon and you should be prompted to register. Once registered if you visit one of your visualisations in Tableau Public and click the icon again you’ll be prompted to select the dimensions and measures that you want to write about:

step01



.

This bit took some fiddling! I couldn’t get the narrative to work without including “measure names” in the dimensions list despite not wanting to write about measure names. Perhaps this hiccup is a reflection of undue complexity in my visualisation though. For simplicity I removed all measures except the percentage share of UK jobs at risk measure.

Once you click next you get a draft narrative and can then tweak various settings, like specifying the word “industry” rather than “entity”, whether the measure is a number or percentage and whether higher values are good, bad or neutral. After changing these settings you can go back to the narrative which will have refreshed.

I selected the least verbose narrative, unlike if I’d written it myself!

step04



.

I’m reasonably happy with the end result. I wouldn’t use it as is, but parts of it would certainly be a great starting point for a narrative. Here’s the start of the text produced:

This analysis measures Percentage share of jobs at high risk by Industry and by Measure Names.

Average Percentage share of jobs at high risk is 4.53% across all 20 industries. Wholesale and retail trade stood out with a very high Percentage share of jobs at high risk value. The individual industries discussed in detail are selected based on the total Percentage share of jobs at high risk across all subcategories.

For Domestic personnel and self-subsistence:

Average Percentage share of jobs at high risk is 0.02% across all three subcategories (representing 0.08% of the total Percentage share of jobs at high risk across all industries).”

I’ve emphasised the aspects I’d keep which nicely highlight a couple of the industries that many in the Makeover Monday community identified to talk about.

Want to find out more about natural language and Tableau? Check out this Tableau blog post if you haven’t already.

 

Wrapping up

In a week where the topic was automation of jobs due to breakthroughs in robotics and AI it was really interesting to try to automate aspects of my own job using AI / analytics tools. I’ve clearly just scratched the surface in terms of what I’ve tried, but my impression is that tools like those discussed can certainly augment the job of data analysis and visualisation. However, there will still be a reliance on human intuition and design skills in this particular field for some time.