Recently I blogged on the pitfalls of not investing sufficient effort at the start of a QlikView project to avoid problems later on (see the post Start Your QlikView Project The Right Way). In this article I advised that the single most significant factor in whether a QlikView project is a success is a well designed data model. Here I would like to share some of my suggestions for getting the data model right, as this could save you a lot of pain if you are just about to build a QlikView Document or indeed a whole environment.
First of all, it is crucial that you decide exactly which columns and rows you require and only pull them in to the data model. Redundant fields can cause unncessary confusion, performance issues and ultimately extra effort. Prototyping with more fields is acceptable – but strip back once you know what is required.
Careful consideration also has to be given to the fact that QlikView associations work in a very different way to SQL joins. As you probably already know; any fields with the same name in more than one table will associate. If there are more than one field with the same name a synthetic key is created – and this is always a ‘bad thing’. If you see extra tables in your data model with a dollar sign in the name you need to revise and improve your model.
In my experience it is almost always better to concatenate fact tables together rather than to try and join them. This way you can have many columns with the same name and the values in them will appear just as one field in QlikView. Selecting a value in that field will pull out values from each of the concatenated tables. This technique is especially useful when pulling data from different sources, or applying budget or forecast data into your model. Concatenation is ideally done by having exactly the same fields in all tables you wish to concatenate (concatenation will then happen automatically) or by forcing a concatenation with a CONCATENATE statement. If using the latter method make sure you are aware of the impact on Optimized QVD Loads.
Whenever you are building on top of a database with a number of fair sized tables get the business to consider exactly what they need from each table – columns and rows. If they have direct access to the back end database they can investigate the tables that way. However, what I generally do is build a separate QlikView document for each source table (so there are no join issues) that they can use for this investigation. Each document lists each field in that table in a multi box. This allows users to see every value in that table and decide whether it is required or not. Typically I will add a Table Box of all values (with a calculation condition to limit rows) so they can see all of the raw data in context. Encourage the business users to be ruthless with this shedding of fields and rows – it could really help as the project progresses. A good example of shedding rows is historical data – most businesses change their shape considerably over a a few years – so comparisons with data more than a couple of years old is usually not relevant.
During this data identification process also get the users to think about how the fields should be referred to in the UI. Consistency and clarity are essential. I will often provide spreadsheets of the source field names (or even a printout if they are paper-people) and get them to fill in the required ‘user friendly’ field names. Rename to these field names as soon as you extract from the database – using an ‘as’ statement in the load. Insert spaces in field names where these are not present in the source – this will require you to use square brackets in expressions but it will save retyping legends each time a field is used. Also any underscores and the like should be removed at this point.
Do all you can to simplify and rationalise the data model in the load script. This is better than having lots of joins in the finished data model. The best way of doing this is with ApplyMap (for more details see my previous posting ApplyMap – Is It So Wrong?). Pre-calculate any values you can in the load script to save processor time later (perhaps calculate margins or FX values?). Also, change any codes into values (eg. Y/N into Yes/No) thinking always of what your users will expect to see. I always add a ‘Count’ field to each table (ie. ‘1 as CustomerCount,’) – as this can greatly simplify expressions later. In short, the more work you can do in your load script the better – the benefits to document performance and development time can be immense.
It is almost always (and I had to think twice about the ‘almost’ there) a good idea to have a QVD layer built into your reload strategy. These are essential for incremental loads, but are also very useful for chunking up your data extracts into manageable portions. It can also aid in the breaking down of the development / UAT / promote cycle into segments that are simpler to unit test. Extracting of data from multiple tables in the source databases can be done concurrently if there is a separate ‘QVD Load’ routine for each source table.
When data is loaded into your presentation layer test any remaining joins carefully. As previously mentioned do not allow synthetic keys as whilst these can work some of the time they will always hurt performance and can cause crashes – always build your own composite keys if required. Watch out for issues with non-numeric join keys that can be caused by rogue spaces or mismatch of capitalisation. The best thing to remember when testing a data model is to assume nothing.
Whilst QlikView is often viewed as a tool for producing flashy charts that are dynamic and engaging it is in fact nothing without a solid data model behind it. In most developments more than half of the time needs to be spent on perfecting the data model. If you ensure that you invest that time then you have a far greater chance of success in your QlikView projects.