As a QlikView consultant, I am often expected to give hard and fast answers to questions that don’t necessarily have them. Perennial favourites are ‘how big a server do I need’, ‘should I bring in all columns’ and ‘should we build one big document or many smaller ones’. The answers to the first two are ‘show me the data’ and ‘almost certainly not’ respectively. The third question warrants a bit more discussion.
With section access taking care of security and segregation of data very nicely, these considerations do not necessarily dictate separate documents. On the other hand, users are used to swapping between documents and browser tabs, so having multiple documents does not present a particular problem either.
So, how should the call be made? Often, with large data volumes, the answer comes down to memory. Each open document needs to have all of its data in memory. On top of that, every user accessing a document has a user cache of about 10% of the document’s size. So, if a user accesses just the small amount of data they wish to see in a large document (one that contains a large amount of data they don’t need), they are using more memory than they need to. On the other hand, if separate documents are created for each user’s needs but there are common dimensions between them, then that data is duplicated in memory many times. Finding the best approach takes some analysis and thought.
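To make that trade-off concrete, here is a rough back-of-envelope sketch in Python. It takes the 10% per-user cache figure from above as a rule of thumb; the document sizes and user counts are invented purely for illustration, and the function names are my own rather than anything QlikView provides.

```python
# Rough memory estimate for "one big document vs. many smaller ones".
# Assumption (rule of thumb from the text): each concurrent user adds
# a cache of about 10% of the document's in-memory size.
USER_CACHE_RATIO = 0.10


def document_memory_gb(doc_size_gb, concurrent_users, cache_ratio=USER_CACHE_RATIO):
    """Base footprint of one open document plus a cache per concurrent user."""
    return doc_size_gb * (1 + cache_ratio * concurrent_users)


def total_memory_gb(documents):
    """Sum the estimate over (size_gb, concurrent_users) pairs."""
    return sum(document_memory_gb(size, users) for size, users in documents)


# Hypothetical figures, for illustration only.
one_big = [(10.0, 40)]    # one 10 GB document shared by 40 users
split = [(3.0, 10)] * 4   # four 3 GB documents (shared dimensions duplicated), 10 users each

one_big_total = total_memory_gb(one_big)
split_total = total_memory_gb(split)

print(f"One big document:     {one_big_total:.1f} GB")
print(f"Four split documents: {split_total:.1f} GB")
```

With these made-up numbers the split documents come out lower (roughly 24 GB against 50 GB), even though the four documents together hold more data than the single one. Change the sizes so that the shared dimensions dominate, or reduce the number of users, and the result can easily swing the other way, which is exactly why the real figures need to be measured.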
Where segregation can certainly help is when there is a definite distinction between the data. For example, it may be a good idea to have a ‘recent’ document with a rolling thirty days of line-level transactions, then another document covering all time with the data aggregated to a selected set of dimensions. Again, this takes some thought and planning.
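The recent/all-time split can be sketched as follows. This is plain Python rather than QlikView load script, and the field names (`date`, `region`, `product`, `amount`) and the choice of month and region as the aggregation dimensions are assumptions for the example; in practice the same logic would live in the load scripts of the two documents.

```python
from collections import defaultdict
from datetime import date, timedelta

RECENT_DAYS = 30  # rolling window for the line-level document


def split_transactions(rows, today, agg_keys=("month", "region")):
    """Return (recent line-level rows, all-time totals keyed by agg_keys)."""
    cutoff = today - timedelta(days=RECENT_DAYS)

    # 'Recent' document: keep every line, but only inside the rolling window.
    recent = [r for r in rows if r["date"] > cutoff]

    # 'All time' document: keep every period, but only at aggregated grain.
    totals = defaultdict(float)
    for r in rows:
        derived = {**r, "month": r["date"].strftime("%Y-%m")}
        key = tuple(derived[k] for k in agg_keys)
        totals[key] += r["amount"]
    return recent, dict(totals)


# Hypothetical sample data, for illustration only.
rows = [
    {"date": date(2024, 1, 5), "region": "North", "product": "A", "amount": 100.0},
    {"date": date(2024, 1, 20), "region": "North", "product": "B", "amount": 50.0},
    {"date": date(2024, 3, 10), "region": "South", "product": "A", "amount": 75.0},
]

recent, totals = split_transactions(rows, today=date(2024, 3, 15))
print(len(recent), "recent line-level rows")
print(len(totals), "aggregated all-time rows")
```

The point of the design is that product-level detail only exists where users are likely to need it, in the last thirty days, while the long history is stored at a much coarser grain.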
One thing that would strongly suggest the creation of multiple documents is the appearance of ‘data islands’. These are tables of data that do not link to other data in the model in any way. Whilst this can be useful (and is fine) for lookup tables, multiple unlinked fact tables are a bad thing. The two main reasons for this are that they cause confusing behaviour on selections, and that there is a danger of someone mixing dimensions from different islands in a chart and causing a Cartesian product. The main reason people tend to try to put multiple fact tables in a single document is QlikView’s document CAL licensing, where there is an extra cost for creating a separate document. This, to my mind, is not an excuse for bad design and could lead to documents that simply fail to perform.
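To see why the Cartesian product matters, consider this small Python illustration. The two lists stand in for dimension fields from two unlinked fact tables; the field values and the distinct counts are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical dimension values from two unlinked ("island") fact tables.
sales_regions = ["North", "South", "East", "West"]
ticket_categories = ["Billing", "Delivery", "Returns"]

# With no field linking the tables, every value from one side pairs
# with every value from the other side.
combos = list(product(sales_regions, ticket_categories))
print(len(combos), "combinations from 4 x 3 values")


def cartesian_rows(*distinct_counts):
    """Number of combinations when unlinked dimensions are mixed."""
    return prod(distinct_counts)


# The same effect at a more realistic (still invented) scale.
print(cartesian_rows(5000, 2000), "combinations from 5,000 x 2,000 values")
```

Twelve combinations are harmless, but the count grows multiplicatively, so two modest dimensions of a few thousand values each already imply millions of combinations for a single chart to work through.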
If data from disparate systems need to be shown in a single space then a concatenated fact table is required rather than data islands – but that is another post.
If you are feeling forced into splitting a document due to its size, first ensure you are following best practices in your data model design. I have blogged previously on getting the best performance from QlikView, but the headline things to consider are the granularity of the data (e.g. how many decimal places are stored), the number of joins in the model (keep these to an absolute minimum) and redundant columns. Adopting a robust QVD strategy is also essential, and creating a subset of data to develop against is very helpful.
With the answer being swayed by so many factors, such as user requirements, performance, licensing and, to an extent, preference, it is little wonder there is no hard and fast answer. I hope this article has given you a good idea of the things you should be considering, though.