Is it so wrong to have a favourite statement in a programming language? Probably. However, I am not embarrassed to admit that in QlikView mine is ApplyMap.
On face value it is a humble little function, which I tend to explain to the un-initiated as being “a bit like a VLookup in Excel”. This however is to do it a great disservice.
So, why do I hold ApplyMap in such great affection?
Basically it helps make light work of fixing problems in the data model that can otherwise lead to serious performance issues. Let me demonstrate.
Let us start with the assertion that too many tables in your data model is a bad thing (a topic for another article). So, you don’t want a situation where you have many lookups joined to your fact table. This can be resolved by creating a mapping table and switching out the ID with the value using ApplyMap. In its simplest form you just create a mapping table and give it a name, this name is then used in the ApplyMap statement – which then swaps the ID for the value. The basic syntax looks like this:
Simple enough. Just a couple of quick points after the first example:
– The mapping table must have only two columns
– The first column is always the ID and second column always the value
– The column names in the mapping table are not relevant
– The mapping table only exists during load
– No associations are made by column name
– Getting the table name right is essential
– No error is raised if you use a mapping table name that doesn’t exist
– It is easy to miss the MAPPING prefix on the load – check this before reload!
Right, now we have the basics down we can look further.
The source from the mapping table can be database, spreadsheet, resident table, QVD (requires a workaround in v9 and below) or as in this example an inline table:
This is great for very simple fixed replacements, such as swapping Y / N with Yes / No. A single mapping table can be used many times in a script – which can be useful for these sort of applications.
If you are of a SQL background you may now be thinking that you could resolve lookups like these with a JOIN in SQL – maybe a view on the SQL Server that resolves these links pre-load? If so, let me stop you there. This approach is simple but can drastically increase load times as the full description will be sent from the database to QlikView many times (once per row). With ApplyMap the much shorter ID is sent many times and is switched out for the longer description during the QlikView load. Your network and users will thank you for not building a view up front!
Now, where ApplyMap starts to set itself apart is in its third (optional) parameter. If omitted the function returns the ID – this is great for debugging, but can produce field values that look untidy. The third parameter allows you to return something more useful, such as a fixed string like Unknown:
Alternatively you could opt for a different field from the same table instead:
Or build something using a mixture of both:
With the third parameter ApplyMap has already exceeded what can be done with a join. I shall revisit the third parameter later with something which I think really sets ApplyMap apart from the rest of the field.
The third parameter (or its omission) neatly deals with the problems that can arise if your lookup table is missing rows. The missing rows can then easily be identified and fixed at source. But what if there are duplicates? With a SQL JOIN statement this can be disastrous – as a duplicate row is caused in your fact table often leading to double counting. With ApplyMap the first instance of the key in the mapping table is used and the subsequent duplicate values are ignored – avoiding any duplication of fact data. If you wish to identify duplicates in your lookup tables this can be done in a separate QlikView document – or another tool if you wish. Rest assured that with ApplyMap duplicates will not cause you to get incorrect numbers.
As with associations between tables in QlikView, mappings can only be done on a single key field. This can be dealt with elegantly by building a composite key on both the mapping table and the table being mapped to:
The good thing here is that if a long concatenation of fields is required to get a unique key it does not take up valuable space in memory in the final document as the field is dropped once it has been used for the mapping.
If multiple fields are required from a table then you may consider joining. Alternatively you can create multiple maps from the same table. This will be more efficient at run time, but requires more code in the load script which can become unweildy. I tend to draw the line at around four or five fields from a single table for this approach:
Be aware that as soon as there is a one to many situation between tables (ie. not a straight lookup) then a separate table must be loaded into the data model – rather than using a mapping table.
The parameters of the ApplyMap statement can be fixed strings, other functions and fields from the row being loaded, this example has a mix of those things:
When building mapping tables you are sometimes presented with getting information from tables that themselves have ID’s in. These can be dealt with by using an ApplyMap statement in the build of the mapping table, like this:
Various permutations of using one map within the creation of another are possible. All the techniques discussed above can also be layered on top of other mapping tables – perhaps a composite key for one mapping table built from a field value and the output from another ApplyMap statement?
Another way that mapping tables can be combined is by using an ApplyMap statement as the one of the parameters of another ApplyMap statement. In this example if nothing is found when looking up on the first field then a different map on another field is tried:
As you can see one ApplyMap can be nested happily inside another one. This nesting can go to as many levels as you require.
In it’s simplest form ApplyMap can serve a very important purpose with very little code required. By nesting maps, combining with other functions and concatenating; complex manipulations can be done on data load that save large amounts of memory and processor time at run time – when it is needed most.
This little statement has made my life easier on countless occasions – I don’t think it is too wrong to call it my favorite?