Recently I blogged on the pitfalls of not investing sufficient effort at the start of a QlikView project to avoid problems later on (see the post Start Your QlikView Project The Right Way). In this article I advised that the single most significant factor in whether a QlikView project is a success is a well designed data model. Here I would like to share some of my suggestions for getting the data model right, as this could save you a lot of pain if you are just about to build a QlikView Document or indeed a whole environment.
First of all, it is crucial that you decide exactly which columns and rows you require and only pull them in to the data model. Redundant fields can cause unncessary confusion, performance issues and ultimately extra effort. Prototyping with more fields is acceptable – but strip back once you know what is required.
Careful consideration also has to be given to the fact that QlikView associations work in a very different way to SQL joins. As you probably already know; any fields with the same name in more than one table will associate. If there are more than one field with the same name a synthetic key is created – and this is always a ‘bad thing’. If you see extra tables in your data model with a dollar sign in the name you need to revise and improve your model.
In my experience it is almost always better to concatenate fact tables together rather than to try and join them. This way you can have many columns with the same name and the values in them will appear just as one field in QlikView. Selecting a value in that field will pull out values from each of the concatenated tables. This technique is especially useful when pulling data from different sources, or applying budget or forecast data into your model. Concatenation is ideally done by having exactly the same fields in all tables you wish to concatenate (concatenation will then happen automatically) or by forcing a concatenation with a CONCATENATE statement. If using the latter method make sure you are aware of the impact on Optimized QVD Loads.
Whenever you are building on top of a database with a number of fair sized tables get the business to consider exactly what they need from each table – columns and rows. If they have direct access to the back end database they can investigate the tables that way. However, what I generally do is build a separate QlikView document for each source table (so there are no join issues) that they can use for this investigation. Each document lists each field in that table in a multi box. This allows users to see every value in that table and decide whether it is required or not. Typically I will add a Table Box of all values (with a calculation condition to limit rows) so they can see all of the raw data in context. Encourage the business users to be ruthless with this shedding of fields and rows – it could really help as the project progresses. A good example of shedding rows is historical data – most businesses change their shape considerably over a a few years – so comparisons with data more than a couple of years old is usually not relevant.
During this data identification process also get the users to think about how the fields should be referred to in the UI. Consistency and clarity are essential. I will often provide spreadsheets of the source field names (or even a printout if they are paper-people) and get them to fill in the required ‘user friendly’ field names. Rename to these field names as soon as you extract from the database – using an ‘as’ statement in the load. Insert spaces in field names where these are not present in the source – this will require you to use square brackets in expressions but it will save retyping legends each time a field is used. Also any underscores and the like should be removed at this point.
Do all you can to simplify and rationalise the data model in the load script. This is better than having lots of joins in the finished data model. The best way of doing this is with ApplyMap (for more details see my previous posting ApplyMap – Is It So Wrong?). Pre-calculate any values you can in the load script to save processor time later (perhaps calculate margins or FX values?). Also, change any codes into values (eg. Y/N into Yes/No) thinking always of what your users will expect to see. I always add a ‘Count’ field to each table (ie. ’1 as CustomerCount,’) – as this can greatly simplify expressions later. In short, the more work you can do in your load script the better – the benefits to document performance and development time can be immense.
It is almost always (and I had to think twice about the ‘almost’ there) a good idea to have a QVD layer built into your reload strategy. These are essential for incremental loads, but are also very useful for chunking up your data extracts into manageable portions. It can also aid in the breaking down of the development / UAT / promote cycle into segments that are simpler to unit test. Extracting of data from multiple tables in the source databases can be done concurrently if there is a separate ‘QVD Load’ routine for each source table.
When data is loaded into your presentation layer test any remaining joins carefully. As previously mentioned do not allow synthetic keys as whilst these can work some of the time they will always hurt performance and can cause crashes - always build your own composite keys if required. Watch out for issues with non-numeric join keys that can be caused by rogue spaces or mismatch of capitalisation. The best thing to remember when testing a data model is to assume nothing.
Whilst QlikView is often viewed as a tool for producing flashy charts that are dynamic and engaging it is in fact nothing without a solid data model behind it. In most developments more than half of the time needs to be spent on perfecting the data model. If you ensure that you invest that time then you have a far greater chance of success in your QlikView projects.





Nice one Steve!
Actually, this is the kind of post that, if fed properly, gives you a real idea on how to approach to the earlier stages of the QlikView development.
I’d stress on the AutoNumber() types of functions for linking tables and differentiate since the very beginning the fields which values will be shown and the fields that will be used for expressions, joins, maps, links and so. For example, a MonthName() field to show the actual date properly formatted in the charts, and another Num() field to be used in set analysis. This might not be worth doing when a document only pulls a few hundreds of thousands of records, but boosts performance dramatically when the document works with millions of rows.
Finally, it depends largely on the requirements and the data model of course, but the fewer the steps between tables the faster the model will work. It’s always preferable to use more disk space storing more fields (numeric, strings) than make more complex expressions that will result on slower performance and poorer user experience.
Thanks Steve for an interesting post. Like you also mentioned in your post I would really like to stress the importance of doing as much of the calculations as possible in the load scripts, rather than in the sheet objects.
I also like to separate my load scripts into tabs based on the tabs I will use in the UI. That makes it a bit clearer (together with comments of course), especially if someone else will work on the data model in the future. Cheers Mathias
Miguel said:
For example, a MonthName() field to show the actual date properly formatted in the charts, and another Num() field to be used in set analysis.
I never thought of that and could be a large driver in performance. Great tip. Oh yeah, and great article as well, Steve.
Hi Steve,
Thanks for the great post. You suggested to combine multiple fact tables in to a single fact. Can you shed some light on how to handle the data model in case there are fact tables with different levels of granularity.
Regards,
Sanjay
Hi Sanjay – thanks for your comment. Ideally, if the most granular data sums up to your aggregated data, you would only bring in the lowest level of granularity. QlikView will sum up across many rows very quickly. If the granular data doesn’t roll up to the top level (perhaps the lowest level is based on transaction date whilst your rolled up data is at invoice date) it is best to have both sets of data with separate fields for the values. So, to take the example above, you would pull in the transaction level data into a field called Transaction Amount and force a zero into the Invoice Amount field. Then you would concatenate the invoice level data in with a value in the Invoice Amount field and a zero in the Transaction Amount field. I have projects where there are several different types of data concatenated in this way.
Nice article Steve, there is no substitute for a solid data model in any BI project.
Regards,
Nagendhran Rangarajan
I agree with you, Steve!
Good data model is the key for one sucessful QV dashboard.
As my point of view and habit, I prefer the dimensional modeling for QV model. I found the synthetic key is always big problem in QV model based on traditional table relationship(ER mode), some problems we need to fix, like loop bug!
Based on dimensional model(Star schema), relationship of tables is quite clear and our QV job will be also high efficient. so most of time, this dimensional model will be obtained from DM and EDW. Most of job for table integration and clear will be implemented by ETL staff, QV guys should focus on dashboard design rather than cleaning synthetic key bug.
Hi Steve,
Regarding concatenating two fact tables into one. If both of the tables data is in hundred millions of record then concatenating may impact the performance as we need to use set analysis or if clause in expression for differentiating the data. We can take the example of Transaction and Invoice amount.
What should be the solution in that case.
Regards,
Anosh Nathaniel
Hi Anosh,
You will find that using Set Analysis to access half of the rows in one huge fact table is much more performant than accessing values in separate tables with a complex join key. In the case of invoice line items and invoice details there will be an obvious join key, and for simplicity with relatively small numbers of rows I might go for two tables with a join – however with serious numbers of rows the single fact will be better. The other advantage of concatenation is that if you have any other commonality (eg. Date, Year, Qtr) etc. you can have single fields for these – rather than needing separate fields with slightly different names on each of the tables.
Hi Steve,
Thanks for your reply. I understood what you want to say. I have few questions regarding implementation and bug fixing of Qlikview appliation.
1. How the tabs in Qlikview for datamodel script and QVD generation script should be created? Should each tab be created based on database tables or they should be created based on UI tabs? Which approach should be better in terms of code quality and maintainability?
2. If an production Qlikview application is taking ~1 hour to load and there is any changes/modification/bug comes, doing a little change in script and loading it fully to see the effect is very cumbersome process. We can also partially load the script but it can only be used to check whether our changes are syntactically correct.
Any suggestion how can we speed up this process?
Hi Anosh,
I personally always have a tab per main datasource in my load scripts – with no bearing on UI tabs at all. Other tabs that may be present are Mapping Tables, Looping Constructs, Variables, Clean-Up.
If you have an hour long load there is probably some optimisation that can take place – certainly incremental loads if those are not already in place.
If you can split out the creation of different QVD files into separate generators this can mean that you can work on one at a time that can save a lot of time – it also means you can schedule things to run concurrently which can also speed up overall run times. Where I do have multiple QVD creates in a single app I will often put each tab in an IF statement, with a set of flags on the first tab to select which tabs should run.
Debug with limited rows can be useful – but make sure you comment out any QVD writes if you use this (particularly if used for incrementals). Also, putting a TOP statement in your SQL script rather than using the QlikView debug can be quicker – and means you can get fewer rows from just one of the tables in your load.
It’s always a case of finding the best solution for the specific set of circumstances – no two implementations are ever exactly the same!