Welcome to the Quick Intelligence blog - a space where I post my thoughts on all things QlikView and related topics. Steve Dark is an independent QlikView consultant based in Bracknell, Berkshire, UK.

Perfect Your QlikView Data Model

Recently I blogged on the pitfalls of not investing sufficient effort at the start of a QlikView project to avoid problems later on (see the post Start Your QlikView Project The Right Way).  In this article I advised that the single most significant factor in whether a QlikView project is a success is a well designed data model.  Here I would like to share some of my suggestions for getting the data model right, as this could save you a lot of pain if you are just about to build a QlikView Document or indeed a whole environment.

First of all, it is crucial that you decide exactly which columns and rows you require and only pull them in to the data model.  Redundant fields can cause unncessary confusion, performance issues and ultimately extra effort.   Prototyping with more fields is acceptable – but strip back once you know what is required.

Careful consideration also has to be given to the fact that QlikView associations work in a very different way to SQL joins.  As you probably already know; any fields with the same name in more than one table will associate.  If there are more than one field with the same name a synthetic key is created – and this is always a ‘bad thing’.  If you see extra tables in your data model with a dollar sign in the name you need to revise and improve your model.

Avoid multiple joins in your data model

In my experience it is almost always better to concatenate fact tables together rather than to try and join them.  This way you can have many columns with the same name and the values in them will appear just as one field in QlikView.  Selecting a value in that field will pull out values from each of the concatenated tables.  This technique is especially useful when pulling data from different sources, or applying budget or forecast data into your model.  Concatenation is ideally done by having exactly the same fields in all tables you wish to concatenate (concatenation will then happen automatically) or by forcing a concatenation with a CONCATENATE statement.  If using the latter method make sure you are aware of the impact on Optimized QVD Loads.

Whenever you are building on top of a database with a number of fair sized tables get the business to consider exactly what they need from each table – columns and rows.  If they have direct access to the back end database they can investigate the tables that way.  However, what I generally do is build a separate QlikView document for each source table (so there are no join issues) that they can use for this investigation.  Each document lists each field in that table in a multi box.  This allows users to see every value in that table and decide whether it is required or not.  Typically I will add a Table Box of all values (with a calculation condition to limit rows) so they can see all of the raw data in context.  Encourage the business users to be ruthless with this shedding of fields and rows – it could really help as the project progresses.  A good example of shedding rows is historical data – most businesses change their shape considerably over a a few years – so comparisons with data more than a couple of years old is usually not relevant.

Multiboxes Allow Checking Of All Fields

During this data identification process also get the users to think about how the fields should be referred to in the UI.  Consistency and clarity are essential.  I will often provide spreadsheets of the source field names (or even a printout if they are paper-people) and get them to fill in the required ‘user friendly’ field names.  Rename to these field names as soon as you extract from the database – using an ‘as’ statement in the load.  Insert spaces in field names where these are not present in the source – this will require you to use square brackets in expressions but it will save retyping legends each time a field is used.  Also any underscores and the like should be removed at this point.

Do all you can to simplify and rationalise the data model in the load script.  This is better than having lots of joins in the finished data model.  The best way of doing this is with ApplyMap (for more details see my previous posting ApplyMap – Is It So Wrong?).  Pre-calculate any values you can in the load script to save processor time later (perhaps calculate margins or FX values?).  Also, change any codes into values (eg. Y/N into Yes/No) thinking always of what your users will expect to see.  I always add a ‘Count’ field  to each table (ie. ’1 as CustomerCount,’) – as this can greatly simplify expressions later.  In short, the more work you can do in your load script the better – the benefits to document performance and development time can be immense.

It is almost always (and I had to think twice about the ‘almost’ there) a good idea to have a QVD layer built into your reload strategy.  These are essential for incremental loads, but are also very useful for chunking up your data extracts into manageable portions.  It can also aid in the breaking down of the development / UAT / promote cycle into segments that are simpler to unit test.  Extracting of data from multiple tables in the source databases can be done concurrently if there is a separate ‘QVD Load’ routine for each source table.

When data is loaded into your presentation layer test any remaining joins carefully.  As previously mentioned do not allow synthetic keys as whilst these can work some of the time they will always hurt performance and can cause crashes - always build your own composite keys if required.  Watch out for issues with non-numeric join keys that can be caused by rogue spaces or mismatch of capitalisation.  The best thing to remember when testing a data model is to assume nothing.

Whilst QlikView is often viewed as a tool for producing flashy charts that are dynamic and engaging it is in fact nothing without a solid data model behind it.  In most developments more than half of the time needs to be spent on perfecting the data model.  If you ensure that you invest that time then you have a far greater chance of success in your QlikView projects.

9 responses to “Perfect Your QlikView Data Model”

  1. Miguel Angel

    Nice one Steve!

    Actually, this is the kind of post that, if fed properly, gives you a real idea on how to approach to the earlier stages of the QlikView development.

    I’d stress on the AutoNumber() types of functions for linking tables and differentiate since the very beginning the fields which values will be shown and the fields that will be used for expressions, joins, maps, links and so. For example, a MonthName() field to show the actual date properly formatted in the charts, and another Num() field to be used in set analysis. This might not be worth doing when a document only pulls a few hundreds of thousands of records, but boosts performance dramatically when the document works with millions of rows.

    Finally, it depends largely on the requirements and the data model of course, but the fewer the steps between tables the faster the model will work. It’s always preferable to use more disk space storing more fields (numeric, strings) than make more complex expressions that will result on slower performance and poorer user experience.

  2. Mathias Wallgren

    Thanks Steve for an interesting post. Like you also mentioned in your post I would really like to stress the importance of doing as much of the calculations as possible in the load scripts, rather than in the sheet objects.
    I also like to separate my load scripts into tabs based on the tabs I will use in the UI. That makes it a bit clearer (together with comments of course), especially if someone else will work on the data model in the future. Cheers Mathias

  3. Aaron Couron

    Miguel said:

    For example, a MonthName() field to show the actual date properly formatted in the charts, and another Num() field to be used in set analysis.

    I never thought of that and could be a large driver in performance. Great tip. Oh yeah, and great article as well, Steve.

  4. Sanjay

    Hi Steve,

    Thanks for the great post. You suggested to combine multiple fact tables in to a single fact. Can you shed some light on how to handle the data model in case there are fact tables with different levels of granularity.

    Regards,
    Sanjay

  5. Nagendhran CR

    Nice article Steve, there is no substitute for a solid data model in any BI project.

    Regards,
    Nagendhran Rangarajan

  6. Richard He

    I agree with you, Steve!

    Good data model is the key for one sucessful QV dashboard.

    As my point of view and habit, I prefer the dimensional modeling for QV model. I found the synthetic key is always big problem in QV model based on traditional table relationship(ER mode), some problems we need to fix, like loop bug!

    Based on dimensional model(Star schema), relationship of tables is quite clear and our QV job will be also high efficient. so most of time, this dimensional model will be obtained from DM and EDW. Most of job for table integration and clear will be implemented by ETL staff, QV guys should focus on dashboard design rather than cleaning synthetic key bug.

  7. Anosh Nathaniel

    Hi Steve,
    Regarding concatenating two fact tables into one. If both of the tables data is in hundred millions of record then concatenating may impact the performance as we need to use set analysis or if clause in expression for differentiating the data. We can take the example of Transaction and Invoice amount.
    What should be the solution in that case.

    Regards,
    Anosh Nathaniel

Leave a Reply