As a QlikView consultant I am often expected to give hard and fast answers to questions that don’t necessarily have them. Perennial favourites are ‘how big a server do I need’, ‘should I bring in all columns’ and ‘should we build one big document or many smaller ones’. The answers to the first two are ‘show me the data’ and ‘almost certainly not’ respectively. The third question warrants a bit more discussion.
With section access taking care of security and segregation of data very nicely, these considerations do not necessarily dictate separate documents. On the other hand, users are used to swapping between documents and browser tabs, so multiple documents do not present a particular issue either.
So, how should the call be made? Often with large data volumes the answer comes down to memory. Each open document needs to have all of its data in memory. Also, every user accessing a document has a user cache of about 10% of this size. So, if a user accesses just the small amount of data they wish to see in a large document (one that contains a large amount of data they don’t need), they are using more memory than they need to. On the other hand, if separate documents are created for each user’s needs, but there are common dimensions between them, then that data is duplicated in memory many times. Finding the best approach takes some analysis and thought.
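To make that trade-off concrete, here is a rough back-of-envelope sketch in Python. It uses only the rule of thumb above (all data in memory, plus roughly 10% per user); the document sizes and user counts are invented for illustration and are no substitute for measuring a real server.

```python
# Rough memory model based on the rule of thumb in the text:
# an open document holds all of its data in RAM, plus ~10% of that per user.
USER_CACHE_FRACTION = 0.10  # "about 10%"; an approximation, not a guarantee


def document_ram_gb(data_gb, concurrent_users):
    """Estimated RAM for one open document plus its user caches."""
    return data_gb * (1 + USER_CACHE_FRACTION * concurrent_users)


# Hypothetical figures, purely for illustration.
# Option A: one 10 GB document (9 GB facts + 1 GB shared dimensions), 30 users.
one_big = document_ram_gb(10.0, 30)

# Option B: three 4 GB documents (3 GB facts + the same 1 GB of dimensions
# duplicated in each), with 10 users apiece.
split = sum(document_ram_gb(4.0, 10) for _ in range(3))

print(f"one big document:        {one_big:.1f} GB")
print(f"three smaller documents: {split:.1f} GB")
```

With these made-up figures the split comes out lower, because the per-user caches shrink by more than the duplicated dimensions add. Change the numbers (fewer users, or a larger share of common dimensions) and the result can easily swing the other way, which is why the analysis is worth doing.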
Where segregation can certainly help is when there is a definite distinction between the data. For example, it may be a good idea to have a ‘recent’ document with a rolling thirty days of line level transactions, then another document covering all time with the data aggregated to a selected set of dimensions. Again this takes some thought and planning.
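The shape of that split can be sketched in a few lines of Python. This is only an illustration of the idea, not QlikView load script: the field names (`date`, `product`, `amount`) and the choice of month and product as the aggregation dimensions are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date, timedelta


def split_transactions(rows, today, recent_days=30):
    """Return (recent line-level rows, all-time totals by (month, product))."""
    cutoff = today - timedelta(days=recent_days)
    # 'Recent' document: keep full line-level detail for the rolling window.
    recent = [r for r in rows if r["date"] > cutoff]
    # 'All time' document: aggregate everything to a chosen set of dimensions.
    totals = defaultdict(float)
    for r in rows:
        key = (r["date"].strftime("%Y-%m"), r["product"])
        totals[key] += r["amount"]
    return recent, dict(totals)


# Hypothetical sample data.
rows = [
    {"date": date(2024, 1, 5), "product": "A", "amount": 10.0},
    {"date": date(2024, 1, 20), "product": "A", "amount": 5.0},
    {"date": date(2024, 3, 28), "product": "B", "amount": 7.5},
]
recent, summary = split_transactions(rows, today=date(2024, 4, 1))
print(len(recent), "recent rows;", len(summary), "aggregated rows")
```

The planning effort goes into choosing the aggregation dimensions: every dimension kept in the all-time document multiplies the number of rows it has to hold.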
One thing that would strongly suggest the creation of multiple documents is the appearance of ‘data islands’. These are tables of data that do not link to other data in the model in any way. Whilst this can be useful (and is fine) for lookup tables, multiple unconnected fact tables are a bad thing. The two main reasons for this are that it causes confusing behaviour on selections, and that there is a danger of someone mixing dimensions in a chart and causing a Cartesian product. The main reason people tend to try and put multiple fact tables in a single document is QlikView’s document CAL licensing – where there is an extra cost for creating a separate document. This, to my mind, is not an excuse for bad design and could lead to documents that simply fail to perform.
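The Cartesian product problem is easy to demonstrate outside QlikView. The Python sketch below is an analogy rather than a model of the QlikView engine, and the two tables are hypothetical: with no common key, every row of one table pairs with every row of the other, so a measure from one table charted against a dimension from the other gives the same meaningless total everywhere.

```python
from itertools import product

# Two hypothetical fact tables with no field in common (data islands).
sales = [("North", 100), ("South", 250)]                     # (region, amount)
tickets = [("Billing", 3), ("Delivery", 8), ("Returns", 2)]  # (category, count)

# With nothing to link on, every sales row pairs with every ticket row.
pairs = list(product(sales, tickets))
print(len(sales), "x", len(tickets), "=", len(pairs), "combinations")

# 'Sales amount by ticket category': each category receives ALL of the sales.
sales_by_category = {}
for (_region, amount), (category, _count) in pairs:
    sales_by_category[category] = sales_by_category.get(category, 0) + amount

print(sales_by_category)
```

Two and three rows are harmless. The same multiplication applied to two fact tables of a few million rows each is where performance falls apart.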
If data from disparate systems need to be shown in a single space, then a concatenated fact table is required rather than data islands – but that is another post.
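As a very small taste of the idea, the following Python sketch stacks rows from two sources into one table that shares whatever dimension fields they have in common. It is an illustration only: the `FactType` flag and the field names are assumptions for the example, and a real QlikView implementation would be written in load script.

```python
def concatenate_facts(sources):
    """Stack rows from several sources into one table with a common column set."""
    # Collect the union of all column names, in first-seen order.
    columns = ["FactType"]
    for rows in sources.values():
        for row in rows:
            for col in row:
                if col not in columns:
                    columns.append(col)

    combined = []
    for name, rows in sources.items():
        for row in rows:
            out = {col: None for col in columns}  # missing fields stay empty
            out.update(row)
            out["FactType"] = name                # records which source a row came from
            combined.append(out)
    return combined


# Hypothetical sources sharing only a Date field.
sales = [{"Date": "2024-03-01", "Region": "North", "SalesAmount": 100}]
tickets = [{"Date": "2024-03-01", "TicketCount": 3}]

facts = concatenate_facts({"Sales": sales, "Tickets": tickets})
for row in facts:
    print(row)
```

Because both sets of rows now sit in one table and share the `Date` field, a selection on a date applies to both sources, rather than leaving one of them unaffected.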
If you are feeling forced into splitting a document due to its size, ensure you are following best practices in your data model design. I have blogged previously on getting the best performance from QlikView, but the headline things to consider are granularity of data (e.g. how many decimal places), number of joins in the model (keep to an absolute minimum) and redundant columns. Adopting a robust QVD strategy is also essential. Creating a subset of data to develop against is also very helpful.
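The granularity point is worth a quick illustration. The Python sketch below simply counts distinct values before and after rounding; the generated prices are hypothetical, and how much memory this saves in a given QlikView document is something to measure rather than assume.

```python
def distinct_count(values, decimal_places=None):
    """Count distinct values, optionally after rounding to a number of decimal places."""
    if decimal_places is None:
        return len(set(values))
    return len({round(v, decimal_places) for v in values})


# Hypothetical prices held to three decimal places: 0.001, 0.002, ... 2.000
prices = [i / 1000 for i in range(1, 2001)]

print("distinct at full precision:", distinct_count(prices))
print("distinct at 2 dp:          ", distinct_count(prices, 2))
```

The same check works for timestamps: truncating to the minute or the day, where the analysis allows it, cuts the number of distinct values in the same way.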
With the answer being swayed by so many factors, such as user requirements, performance, licensing and, to an extent, preference, it is little wonder there is no hard and fast answer. I hope this article has given you a good idea of the things you should be considering though.
Hi Steve,
Interesting question and some good points to consider.
I seem to remember that data islands are forbidden by the Document CAL license agreement, though I cannot find it anywhere on the current price list. Regardless of license issues, I do agree that applications with data islands are prime candidates for splitting up.
One specific scenario that I have used a few times with applications that contain a lot of transaction data is to create two versions: an aggregated version that can drill down/through into a non-aggregated version. Depending on usage patterns, this can save quite some memory. I’ve found that most users tend to use the aggregated version and only a few will, occasionally, drill through to the non-aggregated version to see the transaction data.
Kind regards,
Barry
Hi Barry,
Thanks for your comments. I believe that the licence restrictions around data islands were removed at either v9 SR2 or SR3 – they were very unpopular because people had lookup tables in their existing documents, which stopped those documents working when migrating from version 8.5 to 9.0.
Excellent point about the ability to link from one document to another. I have used this technique a few times, but not yet for drill to detail.
Love the video you have put up by the way, showing QlikView-esque functionality using Excel and VBA. Keep up the good work at QlikFix – a very valuable resource.
– Steve
“Documents with multiple logical islands are normally not allowed. Multiple logical islands are only allowed, if the additional tables are unconnected and contain only few records or a single column.
In addition, the document may not contain any loosely coupled tables.
Finally, the cardinality (that is, the number of distinct values) of the key fields must decrease when moving away from the fact table.”
Page 82.
Chapter 5 Licencing.
QlikView Server Reference Manual v11.
Thanks for pointing this out from the manual. It was the case a few versions back that QlikView simply would not work with two large data islands. This did not go down well with customers, as it broke many existing apps, so that restriction was then lifted (from the product, if not the licence agreement). Now it is just a bad idea, from a performance and maintainability point of view, but I do know of apps where the customer has insisted on a ‘helicopter view’ with disparate data all in one app. That said, it is almost always possible to associate on a date key if nothing else.