In part three of my Back To Basics series of QlikView blog posts I look at QVDs. What they are, how to create them and why you should be using them in your QlikView project.
QlikView QVDs play a crucial part in the majority of QlikView implementations. As such I have blogged about their use many times previously; such as in this post on Incremental Loads and this one on Optimised Loads. Now that I am going Back To Basics in this series of posts I felt it was a good point to go back to first principles with QVDs.
What Is A QVD?
At a basic level a QVD is a flat data file with the extension .qvd. It can store a single table of data (is. each row shares the same column list) and is typically created in the load script of a QVW file. The structure of the file is essentially an XML format, with various bits of header information stored at the top of the file and the rest of the data beneath. One of the beauties of a QVD file is that it is compressed on creation using the same algorithms as QlikView uses to store data in memory – so the files can be incredibly small for the amount of data they contain. Loading from QVD back into memory is blindingly fast as the format of the file mirrors how QlikView addresses data in RAM.
Why Should You Use QVDs?
One of the points that is often noted when pitching QlikView is that it does not need a data warehouse or predefined data cube to work off. This is completely true, but the requirement to have a logical data layer often remains. This is particularly true if the data being collected is coming from multiple systems or is being distributed via multiple QlikView applications. QVDs can fulfil this requirement admirably. QVDs are also essential when you want to adopt an incremental load strategy.
Personally I would go as far as saying your implementations should always be built on QVDs – except perhaps where the data source is a simple Excel spreadsheet that is stored locally. There are many reasons I would suggest this, some of these are:
- Decoupling data extract from data presentation
- Ability to do parallel data extracts
- Easier unit testing of parts of the load process
- Incremental loads
- Sharing of extracted data between presentation apps
- Ability to scale the solution when data volumes grow
- Ability to delegate responsibility for different parts of the data load to different teams
The question you should be asking when designing your data load strategy is not why you should be using QVDs – but rather are there any reasons why you shouldn’t be using them (the reasons here are very few).
How Do I Create A QVD?
Typically QVDs are created during the execution of your QlikView load script. A STORE statement writes the current contents of a single table in your data model to a file on the disk. The syntax for this is:
STORE TableName INTO ..\Data\FileName.qvd (QVD);
Note that the STORE command can also be used to write data into a comma or tab delimited file – but that is a topic for another blog post.
QVDs can be created during the execution of any QlikView load script, but best practice is to have separate applications which deal solely with the creation of QVD files. This “extraction layer” then handles all interaction with source databases. Depending on the size and complexity of the data being loaded I may create one app for each table being extracted or a single app may create all QVDs for the solution (or anything else in between). What is important though is that the front end can be reloaded quickly from locally stored data files. This ability to refresh the presentation layer quickly can massively speed up development.
How Do I Use My QVDs?
Once you have created a data layer consisting of a number of QVDs you simply load from each QVD file in the same way you would a CSV or Excel file. The syntax is:
FROM ..\Data\FileName.qvd (qvd);
Note that where with CSV or Excel files the load statement contains a chunk of information about file formats this is not required with a QVD load. You can also use the wizard in the load script by clicking the File button and locating the QVD in the folder browser.
When loading from a QVD you can apply a number of transformations, such as renaming columns, excluding rows and adding derived columns. Be aware though that many of these transformations will cause your QVD load to be non-optimised (which will make the load up to 100 times slower) please see this blog post for more information: QlikView Optimised Loads.
By loading from multiple QVD’s into a single application you can build up your associative data model. This could involve some data from QVDs (perhaps originally from different data sources) and maybe some small lookup type data values from yet another source.
As well as being a very quick way of loading data into QlikView apps QVDs can also be useful for archive – due to the excellent compression which is used in their creation. You could, for example, store dated copies of a data set into QVDs; your front end would then typically work off the latest version, but it would be possible to point it to an archive version of the data if required.
QVD’s are QlikView’s proprietary way of staging and storing data. They can be used to provide a logical data tier in your solution and in some cases could remove the need for a data warehouse. Using QVDs you can share data between multiple applications without having to keep going back to the source data, you can also chunk up the data load process into manageable steps.
If you are presently using QlikView but not using QVDs then I strongly recommend that you consider the advantages of doing so.