Recently I blogged on the pitfalls of not investing sufficient effort at the start of a QlikView project to avoid problems later on (see the post Start Your QlikView Project The Right Way). In this article I advised that the single most significant factor in whether a QlikView project is a success is a well designed data model. Here I would like to share some of my suggestions for getting the data model right, as this could save you a lot of pain if you are just about to build a QlikView Document or indeed a whole environment.
First of all, it is crucial that you decide exactly which columns and rows you require and only pull them in to the data model. Redundant fields can cause unncessary confusion, performance issues and ultimately extra effort. Prototyping with more fields is acceptable – but strip back once you know what is required.
Careful consideration also has to be given to the fact that QlikView associations work in a very different way to SQL joins. As you probably already know; any fields with the same name in more than one table will associate. If there are more than one field with the same name a synthetic key is created – and this is always a ‘bad thing’. If you see extra tables in your data model with a dollar sign in the name you need to revise and improve your model.
In my experience it is almost always better to concatenate fact tables together rather than to try and join them. This way you can have many columns with the same name and the values in them will appear just as one field in QlikView. Selecting a value in that field will pull out values from each of the concatenated tables. This technique is especially useful when pulling data from different sources, or applying budget or forecast data into your model. Concatenation is ideally done by having exactly the same fields in all tables you wish to concatenate (concatenation will then happen automatically) or by forcing a concatenation with a CONCATENATE statement. If using the latter method make sure you are aware of the impact on Optimized QVD Loads.
Whenever you are building on top of a database with a number of fair sized tables get the business to consider exactly what they need from each table – columns and rows. If they have direct access to the back end database they can investigate the tables that way. However, what I generally do is build a separate QlikView document for each source table (so there are no join issues) that they can use for this investigation. Each document lists each field in that table in a multi box. This allows users to see every value in that table and decide whether it is required or not. Typically I will add a Table Box of all values (with a calculation condition to limit rows) so they can see all of the raw data in context. Encourage the business users to be ruthless with this shedding of fields and rows – it could really help as the project progresses. A good example of shedding rows is historical data – most businesses change their shape considerably over a a few years – so comparisons with data more than a couple of years old is usually not relevant.
During this data identification process also get the users to think about how the fields should be referred to in the UI. Consistency and clarity are essential. I will often provide spreadsheets of the source field names (or even a printout if they are paper-people) and get them to fill in the required ‘user friendly’ field names. Rename to these field names as soon as you extract from the database – using an ‘as’ statement in the load. Insert spaces in field names where these are not present in the source – this will require you to use square brackets in expressions but it will save retyping legends each time a field is used. Also any underscores and the like should be removed at this point.
Do all you can to simplify and rationalise the data model in the load script. This is better than having lots of joins in the finished data model. The best way of doing this is with ApplyMap (for more details see my previous posting ApplyMap – Is It So Wrong?). Pre-calculate any values you can in the load script to save processor time later (perhaps calculate margins or FX values?). Also, change any codes into values (eg. Y/N into Yes/No) thinking always of what your users will expect to see. I always add a ‘Count’ field to each table (ie. ’1 as CustomerCount,’) – as this can greatly simplify expressions later. In short, the more work you can do in your load script the better – the benefits to document performance and development time can be immense.
It is almost always (and I had to think twice about the ‘almost’ there) a good idea to have a QVD layer built into your reload strategy. These are essential for incremental loads, but are also very useful for chunking up your data extracts into manageable portions. It can also aid in the breaking down of the development / UAT / promote cycle into segments that are simpler to unit test. Extracting of data from multiple tables in the source databases can be done concurrently if there is a separate ‘QVD Load’ routine for each source table.
When data is loaded into your presentation layer test any remaining joins carefully. As previously mentioned do not allow synthetic keys as whilst these can work some of the time they will always hurt performance and can cause crashes - always build your own composite keys if required. Watch out for issues with non-numeric join keys that can be caused by rogue spaces or mismatch of capitalisation. The best thing to remember when testing a data model is to assume nothing.
Whilst QlikView is often viewed as a tool for producing flashy charts that are dynamic and engaging it is in fact nothing without a solid data model behind it. In most developments more than half of the time needs to be spent on perfecting the data model. If you ensure that you invest that time then you have a far greater chance of success in your QlikView projects.
Welcome to the Quick Intelligence blog - a space where I post my thoughts on all things QlikView and related topics. 

Nice one Steve!
Actually, this is the kind of post that, if fed properly, gives you a real idea on how to approach to the earlier stages of the QlikView development.
I’d stress on the AutoNumber() types of functions for linking tables and differentiate since the very beginning the fields which values will be shown and the fields that will be used for expressions, joins, maps, links and so. For example, a MonthName() field to show the actual date properly formatted in the charts, and another Num() field to be used in set analysis. This might not be worth doing when a document only pulls a few hundreds of thousands of records, but boosts performance dramatically when the document works with millions of rows.
Finally, it depends largely on the requirements and the data model of course, but the fewer the steps between tables the faster the model will work. It’s always preferable to use more disk space storing more fields (numeric, strings) than make more complex expressions that will result on slower performance and poorer user experience.
Thanks Steve for an interesting post. Like you also mentioned in your post I would really like to stress the importance of doing as much of the calculations as possible in the load scripts, rather than in the sheet objects.
I also like to separate my load scripts into tabs based on the tabs I will use in the UI. That makes it a bit clearer (together with comments of course), especially if someone else will work on the data model in the future. Cheers Mathias
Miguel said:
For example, a MonthName() field to show the actual date properly formatted in the charts, and another Num() field to be used in set analysis.
I never thought of that and could be a large driver in performance. Great tip. Oh yeah, and great article as well, Steve.
Hi Steve,
Thanks for the great post. You suggested to combine multiple fact tables in to a single fact. Can you shed some light on how to handle the data model in case there are fact tables with different levels of granularity.
Regards,
Sanjay
Hi Sanjay – thanks for your comment. Ideally, if the most granular data sums up to your aggregated data, you would only bring in the lowest level of granularity. QlikView will sum up across many rows very quickly. If the granular data doesn’t roll up to the top level (perhaps the lowest level is based on transaction date whilst your rolled up data is at invoice date) it is best to have both sets of data with separate fields for the values. So, to take the example above, you would pull in the transaction level data into a field called Transaction Amount and force a zero into the Invoice Amount field. Then you would concatenate the invoice level data in with a value in the Invoice Amount field and a zero in the Transaction Amount field. I have projects where there are several different types of data concatenated in this way.
Nice article Steve, there is no substitute for a solid data model in any BI project.
Regards,
Nagendhran Rangarajan