One-step Tableau Prep solutions

This quarter I set myself the goal to learn more about Tableau Prep, and a key part of that has been participating in the weekly #PreppinData challenges. Something I’ve noticed, and have been super intrigued about, is that some participants have been posting one-step solutions. That wasn’t surprising during beginner month, but now I’m seeing one-step flows covering reasonably complex multi-step data transformations. Cool!

This week I took a deeper dive into one of those one-step solutions to learn, and share, how they’re being done.

Two quick things first: They say a magician never reveals their secrets, so my apologies in advance to those Tableau Prep magicians who’d rather not see the “magic” shared. Also a hat tip to Hiroaki Morita, whose week 7 solution I picked as the example to dive into. If I don’t do the techniques justice, that’s on me, not Hiroaki!

Right, what do we mean by a “single step” solution? Basically, a single step between the input and output, like so:

For comparison here is my solution to week 7, which has two inputs and then four steps before the output:

The first thing you may notice is that there is only one source: a UNION to pull the two required data sets together. That might seem strange to you as the two tables of data are quite different. I’m going to add a quick “clean” step off the union to take a look at what it’s producing:

What we’re seeing here, boxed in red, is that the two tables are indeed unioned together, and so we get two chunks of data. The first chunk has the columns and rows from the first table of data (the couples and when their relationship started), and the second chunk has the columns and rows from the second table (the gift relevant to each year of a relationship – this challenge was about finding the right gift for each couple based on the length of their relationship).

This union approach seems to be a hallmark of single step solutions: get all of the data into one place/table so that we can operate over it in a single step. It does leave us with a challenge, though, as the data isn’t structured or related in the ways we’re used to, so let’s see how that is dealt with in the single clean step.

There are four sub-steps (or changes) in that clean step:

The first change is to calculate a consistent field across the two sets of data – in this case a “number of valentines days as a couple”. Where Year is NULL (no data) we’ll use the relationship start date [A]. Where Year is not NULL we’ll remove the st, nd, rd, th suffixes from that Year field to leave just a number [B]. In a multi-step solution this would be the consistent field used to join the two data sets together.
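Here’s a minimal sketch of what that calculation might look like. The field names ([Year], [Start Date]) and the exact date logic are my assumptions, not necessarily Hiroaki’s:

// Chunk A rows have no Year: count the Valentine's Days since the start date.
// Chunk B rows have an ordinal Year like "4th": strip the suffix to get a number.
IF ISNULL([Year]) THEN
    DATEDIFF('year', [Start Date], #2024-02-14#)
ELSE
    INT(REPLACE(REPLACE(REPLACE(REPLACE([Year], "st", ""), "nd", ""), "rd", ""), "th", ""))
END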

Because we have all the data in one data set we don’t need to join; instead we need to “look up” the gift from chunk B and plug it into chunk A (where it is missing). And we want to do that based on the consistent field calculated above. So in the screenshot above we can see that we have “number of valentines days as a couple” = 4 on the top row (for “The Loves”) but no gift (it is NULL). But we also have a row in chunk B with “number of valentines days…” = 4 where we do have the gift (“Fruit/Flowers”).

The next change handles the lookup. It uses a FIXED level of detail expression to say “get me the maximum gift from across the whole data set, for this number of valentines days”. Aggregations like MAX ignore NULLs, so in essence we look up the gift from chunk B:
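In calculation form, the lookup might look something like this (field names assumed):

// For each value of the consistent field, take the MAX of Gift across
// the whole data set – MAX skips NULLs, so chunk A rows pick up the
// gift supplied by the matching chunk B row.
{ FIXED [Number of Valentines Days as a Couple] : MAX([Gift]) }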

This is potentially another hallmark of one-step solutions then: look up the value you need from further down the combined data set, rather than using joins.

The fourth change is to filter down to just chunk A; chunk B was only there for the lookup, after all:
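As a sketch, assuming the Year field is only populated on chunk B rows, the filter could be as simple as:

// Keep chunk A (the couples); drop the chunk B lookup rows.
ISNULL([Year])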

After that the solution simply removes the unnecessary columns, ready to produce the output. Clever, eh?

For me the key points of this one-step solution were:

  1. Get all of the data into one place/table so that we can operate over it in a single step.
  2. Look up the value you need from further down the combined data set, rather than using joins.
  3. Filter out the data we only pulled in for the lookup.

I hope that you found this as intriguing as I did. And if you’re interested in seeing more one-step magic, keep an eye on the #prep1stepclub X/Twitter hashtag!

UPDATE: Following a really good discussion with another member of the data community I thought I’d add a few notes about why and when you might use a one-step solution. Our conclusion was that one of the strengths of Tableau Prep is its clear, easy-to-understand, and maintainable visual layout. Maintainability is a really important consideration, so you may never* use a one-step solution in production, favouring clarity and maintainability instead. However, for your own professional development, one-step solutions present a useful challenge. They introduce a constraint that forces you to think about problems differently, and in all likelihood use product features that you wouldn’t normally use. That gives you good practice. And after all, why do we climb hills and mountains that we could otherwise go around?

* Although I say “never”, I should point out that I haven’t performance-tested common one-step solution patterns against their more natural counterparts. Consequently there may be some benefits (or indeed further drawbacks) that I’m not yet aware of.

Transform field per line data with Tableau Prep (1)

I recently answered a question on the Tableau Community Forums about transposing data from a text file that had a field per line, with a line of dashes separating records. I’m not sure what the formal name for this format is, but there are similarities with RecFiles.

Here is an example:
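The original file isn’t reproduced here, so below is a representative sample I’ve reconstructed to match the steps that follow – one field per line (field name, a TAB, then the value), three fields per record, and a line of dashes between records:

Name:	Alice
Age:	34
Eyes:	Brown
--------
Name:	Bob
Age:	28
Eyes:	Green
--------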

I don’t know of a way to use data formatted like that directly in Tableau Desktop. But we can use Tableau Prep to transform it into a more natural row-per-record format!

In this post I’ll cover how I suggested doing that for the forum question. And I plan to follow up with some more generic and robust options in a future post.

First let’s take a look at the overall flow given the data above:

What we have here is:

  • An input step (on the left) to load in the file
  • A “clean” step to add a record number to each line
  • Three steps to separate the lines for each field
  • Join steps (Name+Age and Name+Age+Eyes) to join the data set for each field back together to give a traditional record structure
  • An output step to write out as CSV

Diving into each of these components:

The input step defines a split on TAB, headers (F1, F2, etc.), and enables the built-in source row number that Prep can add. This row number will be important for identifying a record number next.

The next step adds a record number and removes the dashes which act as a record divider:

Record number is calculated using:

DIV([Source Row Number]-1,4)+1

This is basically just a whole-number division (using DIV) of the row number by the number of rows per record, including the divider (4 in this case). Then we filter out the rows of dashes to get rid of the record dividers. Note that I’ve also neatened up the field names in column F1 above to remove the colon.
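To make the arithmetic concrete (my worked example, not from the original flow):

DIV(1-1, 4)+1 = 1   DIV(4-1, 4)+1 = 1   // source rows 1–4 → record 1
DIV(5-1, 4)+1 = 2   DIV(8-1, 4)+1 = 2   // source rows 5–8 → record 2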

Caveat: Because the record number is based on an expected number of fields, this approach won’t be robust enough to handle optional fields that do not appear on some records. This is one reason I’d like to come back and do another post on the topic!

Next we have a clean-up step per field to grab just that field and its record number, including renaming the column header (F2) to the field name. Here is the step for “Name”:
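In calculation terms the step amounts to a filter like the one below, plus removing F1 and renaming F2 to “Name”:

// Keep only the lines that carry the Name field.
[F1] = "Name"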

This is repeated per field (annoying if you have a large number of fields!) but starts to get us closer to what looks like a row per record.

At this point though each step gives us a record with just one of the fields, and its record number. So next we need to join these up, two at a time, to bring the fields per record back together:

One more of these joins gives us a final output like this…

… meaning that we’ve successfully transformed a data set where each field is on its own line into a more traditional row-per-record / CSV format, which is much better suited to analysis in a tool like Tableau Desktop.

Watch this space for part 2 where I dig into alternative and more robust approaches – e.g. to handle optional fields!

UPDATE: part 2 is now available.

Tableau Prep and #PreppinData 2024 week 8

#PreppinData 2024 week 8 – a “what if?” analysis of two different customer loyalty reward systems for Prep Air, aiming to identify the cost and the number of customers benefiting.

The “estimated yearly flights” calculation tripped me up for a while; I overthought it with a DATEDIFF on days, applied only when the flights spanned more than a year. The challenge just required a division by the number of years flown over! I enjoyed expanding the data set throughout the flow (pivoting the benefits, joining onto cost per benefit, and then joining onto those tiers less than or equal to each customer’s tier) to then roll back up at the end.
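For the record, the simpler calculation is something along these lines – the field names are my assumptions, not the challenge’s:

// Flights divided by the number of years flown over (inclusive).
[Number of Flights] / (DATEDIFF('year', [First Flight], [Last Flight]) + 1)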

PD 2024 Wk 8

 

Tableau Prep and #PreppinData 2024 week 6

The #PreppinData 2024 week 6 challenge was to find the latest salary per staff member and summarise their tax position given UK income tax bands.

We’re now into intermediate-level challenges and so there are fewer prescriptive steps, and more options to solve the problem your way. For me the problem had two key parts: (1) get the latest row per staff member; and (2) the various calculations for salary and tax paid based on tax bands.

For part one I introduced an aggregate step to get the maximum row number per staff member, and then joined that back onto the input to return only the details for that last row. I wondered if an alternative might have been to use a “level of detail” expression, with a filter, in a single step.
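That single-step alternative might look something like the filter calculation below (field names assumed):

// TRUE only for each staff member's latest row.
[Row Number] = { FIXED [Staff Member] : MAX([Row Number]) }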

For part two I included all of the calculations in one “clean” step. I did hardcode the tax bands, and probably could have used a mapping table to allow for reuse in future years. I also hardcoded the sum of month [1] to [12], but perhaps could have found a way to allow for fewer months to be in the data set, in case the flow needed to be run mid-year.

PD 2024 Wk 6

Tableau Prep and #PreppinData 2024 week 5

The final week of beginner month for #PreppinData involved a bit more complexity around joins, calculations and outputs. On top of Tableau’s getting-started tutorial, #PreppinData has been a great way to get into Tableau Prep. I’ve invested about 12–15 hours and feel like I’ve got a good initial grasp of the product.

The challenge walkthrough provided less info on the “how”, which in some ways was quite nice as I felt more license to solve the problem my way. On the other hand, I wonder if I should have made my flow less complex instead of aiming for one data set that could then be filtered down to the different outputs.

#PreppinData 2024 week 5 solution flow

Tableau Prep and #PreppinData 2024 week 4

#PreppinData 2024 week 4 covered using join types, in this case to understand which seats aren’t chosen given a seating plan and booking data.

I had a preconceived idea of the solution here, as I’m used to using a LEFT OUTER JOIN in SQL and then having a WHERE clause that returns rows where the result from the right-hand table IS NULL. So I was expecting to have a join and then a filter in my flow. However, Tableau Prep has some additional join types that let you return entries where there are only values in the left table, only values in the right table, or even a “not inner” join for entries in the left or right but not in both. I gave the “left only” option a go and it did the job nicely! It’s great how you can click on the segment of the Venn diagram representation of the join to select the type too.

PreppinData week 4 solution using left only join

 

Extra left and right only join types in Tableau Prep

Tableau Prep and #PreppinData 2024 week 3

Two lots of Tableau Prep practice this week: a forum question (see end of post) and #PreppinData 2024 week 3. The challenge for #PreppinData was to join targets from a spreadsheet, with a sheet per quarter, to our previous sales figures, and then to calculate the difference from target. Similar union and clean-up steps to previous challenges got me to the point where there are two data sets to join, and where we have consistent fields in both (first letter of class, and a month number). Then the join is pretty straightforward:

Tableau prep solution for PreppinData week 3, showing join
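For reference, the two consistent join fields are simple calculations along these lines (field names assumed, not from the challenge spec):

LEFT([Class], 1)           // first letter of the class
DATEPART('month', [Date])  // month number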


The forum question involved duplicating a set of rows – once for each value in a comma-separated list in one of the columns – and then filtering out any cases where a value in another column appeared in the list. What I found interesting about Tableau Prep in this case was that I can specify a wildcard search for the pivot (B below), but for the initial split, whilst I can select “all”, it still hardcodes the number of columns split out (A below). So one of the tasks would robustly handle the introduction of more values in the comma-separated list, but the other would not … I think. The workarounds I came across seemed to be to work out how many values you could have in the string and specify enough splits to handle that number. I wonder if that could be improved…

Forum question solution flow, showing the split (A) and the wildcard pivot (B)

Tableau Prep and #PreppinData 2024 week 2

Week two of getting to grips with Tableau Prep, and I decided to continue with #PreppinData. The team of Carl Allchin, Jenny Martin and Tom Prowse do a great job of picking challenges that gradually introduce you to functionality. This week covered unions, aggregation and reshaping data using pivots. I was particularly interested in pivots, as that’s a frequent challenge people have on the Tableau forums, where we talk about data prep being a good option.

A more complex flow this time, and it probably took around 45–60 minutes. Below is my finished flow, and following that I share some more thoughts on this week’s use of Tableau Prep in the form of a quick “I like, I wish, I wonder” retro. Spoiler alert: there was a wow moment for me this week!

Week 2 solution:

My PreppinData week 2 solution


I like:

  • I liked that I could copy and paste the first aggregation step and then change the type of aggregation and the description on the copy.
  • And there was a moment of genuine delight when I realised I could copy the pivot step and paste against a different input (with the same pattern). I loved that!
    Tableau Prep option to paste in a step



I wish:

  • The fields didn’t reorder between steps. This is probably my OCD talking though!
  • I could define a sort order for the output. It would make it easier to sense check the output (granted it is easy to open in Excel and sort though).

I wonder:

  • If I needed to aggregate before pivoting, as the pivot step allowed me to pick an aggregation?
  • If it would be valuable to have a quick create option for pivot “rows to columns”? It took me a while to spot the option to change the pivot direction (screenshot below), when my brain was looking for an “unpivot” option instead.
    Tableau Prep pivot option to switch between Rows to Columns and Columns to Rows

Getting started with Tableau Prep

I’ve been meaning to explore Tableau Prep for a while and finally took it for a test drive.

Many data professionals have experienced the need to prepare and clean data prior to analysis in tools like Tableau Desktop. Classic examples are: splitting data out of a single combined field; un-pivoting when each year of a measure is in a separate column; or maybe combining sales data from multiple differently formatted sources.

For me data preparation has usually been in our SQL Server / Data Warehouse environment, using SQL or enterprise integration and ETL tools like SSIS, WhereScape RED, or ADF & Databricks. For others it’s likely to have involved tidy up in Excel, a tool like EasyMorph, or even custom development in Python. But for analysts a number of these tools aren’t always accessible.

Tableau Prep is a tool that makes it easier and faster for analysts to combine, shape, and clean data for analysis within tools like Tableau. It provides a visual low-code environment for repeatable “flows” to get your data ship-shape. I’m often answering Tableau community forum questions with the comment that “yes you can do this in Desktop, but it would be easier in a data preparation stage”. Hence I’ve been keen to give it a go myself.

How did I get started? Three easy steps!

1. Downloaded Tableau Prep. You can download the free trial, then if you have a Creator license you can enter your license key. Chances are if you’re a Tableau Desktop user you have a Creator license.

2. Followed the Get Started with Tableau Prep Builder tutorial on the Tableau help site. I found that this took 1-2 hours, and was a nicely paced introduction to the UI and basic features. I like how it encouraged you to “give something a go” – there is always an undo button!

3. Completed the #PreppinData 2024 week 1 challenge. This took about 30 minutes as most of what I needed had been covered above!

PreppinData is one of the amazing projects that is run by, and for, Tableau community members. If you’ve participated in Makeover Monday, Workout Wednesday, Back to Viz Basics, or one of the other community projects, then the approach will be familiar to you. Each week the organisers post a challenge to social media and their site. The challenge comes with a curated input data set, some requirements for you to try in Tableau Prep (or another data prep tool of your choice), and the expected output. You can give the challenge a go, share your solution, and see what other members of the community come up with. A great way to learn and practice!

Week 1 challenge – tidy and split some fake airline loyalty card data into two files ready for analysis.

My solution:

- One clean step to: split out the combined field using two custom splits (Flight Details on //, From-To on -); remove the now-unnecessary fields; rename and fix up data types; and replace 1/0 with Yes/No (see the sketch after this list).

- Two clean steps to produce the two filtered lists (card holders and non-card holders).

- Two corresponding output steps to output the required CSV files.
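The 1/0 to Yes/No replacement in the first step is a one-line calculation; something like this, with an entirely hypothetical field name:

// Map the 1/0 flag to a friendlier Yes/No value.
IF [Card Holder] = 1 THEN "Yes" ELSE "No" END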

Summary screenshot below. Impressions so far … loving the tool! So much neater and more repeatable than a series of Excel sheets, and it seems more accessible for analysts than other tools I’ve used.

PreppinData 2024 week 1 solution summary showing flow and steps