Transform field per line data with Tableau Prep (1)

I recently answered a question on the Tableau Community Forums about transposing data from a text file that had a field per line, with a line of dashes separating records. I’m not sure what the formal name for this format is, but there are similarities with RecFiles.

Here is an example:

I don’t know of a way to use data formated like that directly in Tableau Desktop. But we can use Tableau Prep to transform it into a more natural row per record format!

In this post I’ll cover how I suggested doing that for the forum question. And I plan to follow up with some more generic and robust options in a future post.

First lets take a look at the overall flow given the data above:

What we have here is:

  • An input step (on the left) to load in the file
  • A “clean” step to add a record number to each line
  • Three steps to separate the lines for each field
  • Join steps (Name+Age and Name+Age+Eyes) to join the data set for each field back together to give a traditional record structure
  • An output step to write out as CSV

Diving into each of these components:

The input step defines a split on TAB, headers (F1, F2, etc.), and enables the built in source row number that Prep can add. This row number will be important for identifying a record number next.

The next step adds a record number and removes the dashes which act as a record divider:

Record number is calculated using:

DIV([Source Row Number]-1,4)+1

This is basically just a whole number division (using DIV) of the row number by the number of rows per record, including the divider (4 in this case). Then we filter out the rows with the dashes to get rid of the record dividers. Note that I’ve also neatened up the field names in column F1 above to remove the colon.

Caveat: Because the record number is based on an expected number of fields, this approach won’t be robust enough to handle optional fields that do not appear on some records. This is one reason I’d like to come back and do another post on the topic!

Next we have a clean up step per field to grab just that field and it’s row number, including renaming the column header (F2) to the field name. Here is the step for “Name”:

This is repeated per field (annoyingly if you have a large number of fields!) but starts to get us closer to what looks like a row per record.

At this point though each step gives us a record with just one of the fields, and its record number. So next we need to join these up, two at a time, to bring the fields per record back together:

One more of these joins gives us a final output like this…

… meaning that we’ve successfully transformed a data set where each field is on it’s own line, into a more traditional row per record / CSV format, which is much more suited to analysis in a tool like Tableau Desktop.

Watch this space for part 2 where I dig into alternative and more robust approaches – e.g. to handle optional fields!

UPDATE: part 2 is now available.