Transform field per line data with Tableau Prep (2)

A couple of weeks ago I wrote about a Tableau Prep approach to transposing data from a text file that had a field per line, with another line separating records. At the time I noted that the approach wasn’t robust enough to handle optional fields, and that it would be annoying to need a join per field in cases where you had a large number of fields. In this follow up post I look at an alternative that doesn’t have those drawbacks.

The basic approach is to use a pivot of rows to columns, along with a record number to group fields from the same record. I also introduce a mapping table to allow for less hardcoding of field name and the record separator.

Here is the example source data I have used:

You’ll notice that Hobby and Food are optional fields that don’t appear on every record.

I also have an Excel sheet the defines the record structure:

Here you will see that I map a Field Label (which appears in the source date) to a Field Name that I want in the output. There is also a field called “(end)” which defines the record separator – in this case a blank line.

The Tableau Prep flow is as follows:

This flow:

  • Loads the source and structure files
  • Prepares them to allow them to be joined
  • Joins them together so that we have field names and know the “end of record” lines
  • Calculates a record number by looking for the “end of record” lines
  • Pivots the data from rows to columns
  • And outputs the result

When loading the source data I generate column names (F1, F2) and switch on the extra source row number field (which will be required for the record number calc later):

The “prep to join” steps replace NULLs with empty strings (for the record separator lines) and, for the source data, renames F1 and F2 to Field Label and Value.

The join step is now fairly straightforward, just joining the source data to the structure info on Field Label:

It results in the data like this:

For the pivot step we need something that groups fields for a record together. The “calc record num” step does that, as well as a little tidy up to remove now unnecessary fields and the record separator lines.

The calc for Record Number is using a running sum to work through the source lines in order (ORDERBY [Source Row Number] ASC), generating a number that increments every time we hit an “end of record” line:

 [Source Row Number] ASC:
 IF [Field Name]='(end)' THEN 1 ELSE 0 END

A couple of notes re this calc. It actually makes the “end of record” line part of the next record, but as those lines will be filtered out anyway that doesn’t matter. I also have a follow up calc that adds 1 – it isn’t really necessary but Record Number “0″ looks a bit strange! This is the part that avoids the problem in my previous approach where optional fields weren’t catered for. Previously I was using whole number division to generate a record number from the source row number, and that relied on a set number of fields. The running sum option just looks for the record separators.

Next up, a basic pivot translates the rows to columns:

The pivot requires an aggregation for the values, but as we only have one value per row we can pick either MIN or MAX here. This pivot avoids the join per field that I had in the previous solution.

The output step then outputs the results, in this case:

NB: bold/line added after output.

The new approach successfully handles optional fields, doesn’t require a join per field, and has the added benefit of a mapping file to define structure.¬†One downside is that the field/column order is reversed, but as the resulting CSV is for analysis in Tableau (say) that doesn’t really matter. In my first run I didn’t include the “Food” field; adding that in for a second run. So the solution is reasonably plug and play! You can use the same flow with your own source data and structure definition, you just have to right click the pivot step and “refresh” for it to pick up new field names.

A packaged flow for use in Tableau Prep is available on the Tableau Forums.