Everyone has their way of creating a QlikView calendar script. Here, however, I argue you should ditch that script and do something different.
The Traditional Calendar
Ever since my initial QlikView training I have frequently seen examples of calendar scripts and how to build them. Torben Seebach’s recent Community post being a great example of how you can super charge your calendar create script. Long ruminations on how to obtain the min and max values over a large data set have largely been driven by setting the variables ready to enumerate for a calendar script (though this is not the only reason you may want to get a min and max value out of a table).
The Inefficient Calendar
Calendar tables are useful for splitting out a single date into all the myriad of ways that date can be categorised; it’s year, month, quarter, day etc. etc.. I’ve seen some calendar tables with many many fields in them. Herein lies my first problem with calendar tables; I always advise keeping the number of fields in your data model to a minimum to avoid confusion and clutter. A pre-canned calendar script with many dimensions encourages developers to have more fields than than they actually need. Another thing I try to keep to a minimum is joins in the data model. Whilst these are not generally a problem, there is a slight overhead in each hop that need to be made to pull all data together for an expression. As date fields are common dimensions expressions involving them, in a linked calendar, over large data sets will start to perform less well. Finally, no matter how well you optimise the script, there is an overhead to building your load script each time you reload your front end application.
The Alternative Calendar
What I tend to do as an alternative to the calendar table is to build my date dimensions in the fact tables themselves. The code to split a date into it’s constituent parts is the same as when done in a separate table, and if manipulation is required to derive the date (for example converting from a text value in a CSV) then this can be done in a preceding load. Yes, there is an overhead in calculating the date parts for each row, but QlikView’s expression caching should kick in here, and I’ll wager that this overhead is nothing compared to the drive\network lag of pulling your data. If you have an incremental load then the dates are not recalculated on old rows. If you have many dates in your table you can choose which fields you break out into what dimensions – based on need rather than what is in your saved script.
Most of the advantages of using a calendar script and separate table can be obtained, without the disadvantages in the section above.
In this code you can see I am deriving a few date based dimensions in among other fields:
Plugging The Gaps
One of the potential disadvantages of the approach above is that you can have gaps in the date field. in many cases this is a moot point, as either there is data for every day or it doesn’t matter that there isn’t. Having a month list box with only a few months in may look strange (I tend not to worry about this though), but this can be overcome by a force concatenate of missing months (using WHERE NOT EXISTS). If a report is required where you show all days where nothing happened then the same approach may be used for days, in this case remember you can derive all your dimensions as you append the dates.
Another disadvantage, if you have incremental loads, is that it is easier to add new date dimensions to a calendar script than to stored QVDs. It is not to difficult when your dates are in-line though. If you spend new rows to an existing QVD on each load simply add the new field to the increment part of the load, and also when loading from Dave QVD – this will break the optimised load on first run but you can restore this by reverting to the saved field for subsequent runs. If you have QVDs that are not appended to you will need to create an ad-hoc QVD generator to add the new field to all stored QVDs – this is a relatively simple task.
A Further Thought
If calendar tables hanging off your fact table is what you want (perhaps performance is not an issue for you) you could generate a calendar QVD with all the date parts in for as long a period as you like. This can then be loaded with a WHERE EXISTS into the presentation document. Each field that date break downs are required for simply requires the generic fields names in the QVD to be renamed, to avoid loops and synthetic keys. This way the min and max deriving and the loop is not required and your front end load is simply optimised loads from a number of QVDs – which is clean and tidy.
This is a topic that I am very keen to hear other people’s thoughts on, particularly those who know details of QlikView’s inner workings. There are people I know of who are QlikView architecture experts who use calendars, so perhaps I am on the wrong tack here? What I do know is that fewer tables can perform better and calculating date dimensions in-line does not adversely affect data loads.
Please discuss below, I am ready to be challenged on this.