QlikView Optimised QVD Loads

In previous articles I have mentioned how critical it is to ensure your loads from QVD are optimised, but have not gone into the detail of how to do this. This post rectifies that.  Here I explain what an optimised load is, why you should use them and how to perform them.

The headline grabbing fact here is that optimised QVD loads are up to 100 times quicker than non-optimised ones.  That makes a lot of difference if you are watching a reload dialog, it even makes a lot of difference to your server performance if your reload is running on a schedule.

The reason for the vast difference is related to the much publicised compression algorithm that QlikView uses when storing data for in memory analysis. QVD files are stored in a format that mirrors the compression used in memory (which is why QVD files are so small on disk) and during an optimised load the data is sent directly from disk to memory directly in the same compressed format. When a non-optimised load is performed this is not the case.

So why not make all loads from QVD optimised? The simple fact is that some operations require the data to be unpacked, modified and then re-packed.   This significantly slows the process.  Just about any change to the data on the way out of the file and into memory will cause a load to be non-optimised.

Some examples of things that will cause a non-optimised load are:

– Adding new fields to the table
– Deriving new values from a field in the QVD
– Retrieving a field twice
– Most WHERE conditions
– Joining to an existing in memory table
– Loading data into a mapping table

In contrast the things you are allowed to do are:

– Rename fields
– Omit fields
– Do a simple one field WHERE EXISTS on a field returned in the record set

This sounds hugely restrictive, but then most things you would want to achieve can be coded for. For example, if you need to add fields – do this in the QVD generate routine rather than when reading the QVD. Similarly, if you need to derive a value do this when you generate the QVD also. Even complex WHERE statements can be done by deriving flags or composite keys in the QVD generate routine and then doing a simple WHERE EXISTS on a temporary table (even if that temporary table is just a single row from an in-line table).

In fact, optimised QVD loads with a WHERE EXISTS clause on each subsequent load statement is a simple but effective way of quickly building documents which contain related subsets of data – but that is something for another post.

So, how do you know if your load is optimised?  Well, the first way is by noticing it is still running when you return to your desk with a fresh cup of coffee.  The other is by checking the load progress dialog.  Optimised loads show the text qvd optimized as the data is being pulled from the QVD – in contrast no message is shown when the load is non-optimised.

Optimised QVD Load

Always look out for that text and if it is not there when loading from a QVD then there will be merit in reviewing your load script to make the load optimised.

When the incredible speed of an optimised load is really essential is when you look to build an incremental load strategy and fresh data from source databases are combined with data previously stored in QVD’s.  If the retrieval of the old data is not quick and efficient then the whole point of the incremental load is eroded.

It is worth noting here however, that that a non-optimised load from a local QVD file will still typically be much much faster than from any other data source.  Sometimes non-optimised loads can not be avoided (or the development required to avoid them is not worth the time saving).

Hopefully this article has given you the information you need to make sure you loads are optimised, or make an informed decision to allow non-optimised loads.

As I have said in previous articles the back end of your QlikView document is at least as important as the front end – and ensuring optimised loads is an important part of getting the back end correct.

By | 2016-06-25T01:45:28+00:00 February 20th, 2013|Load Script, QlikView Tutorial|34 Comments

About the Author:

Steve is owner and principal consultant at Quick Intelligence. He is a Qlik Luminary, Qlik Community MVP and Technical Editor of a number of QlikView Books.

34 Comments

  1. adam cooke March 30, 2013 at 1:14 am - Reply

    Just thought it would be worth noting a quick way to unoptimise a QVD is by using WHERE 1=1 statement.

    • Steve Dark March 30, 2013 at 5:59 pm - Reply

      Hi Adam – this used to be required back when you could not do an optimised load into a MAPPING LOAD statement. That restriction was removed in (I think) version 10. I can now not think of any situation where you would want to force an unoptimised load.

      The most common way I have seen people causing inadvertent unoptimised loads is by adding a counter to a table (eg. 1 as InvoiceCount,), even that will make your load many times slower – so the counter must be added at QVD create time.

  2. Daniel Rozental April 5, 2013 at 4:21 pm - Reply

    Steve, another tip to keeping your loads optimized when concatenating two tables.

    If the 2nd table contains the all the fields in the first and then some other ones the load will be optimized.

    Example 1 – Only first load will be optimized
    TABLE:
    LOAD
    FIELD1,
    FIELD2,
    FIELD3
    FROM
    B.QVD
    (qvd);

    concatenate(TABLE)
    LOAD
    FIELD1,
    FIELD2
    FROM
    A.QVD
    (qvd);

    Example 2 – Both loads are optimized

    TABLE:
    LOAD
    FIELD1,
    FIELD2
    FROM
    A.QVD
    (qvd);

    concatenate(TABLE)
    LOAD
    FIELD1,
    FIELD2,
    FIELD3
    FROM
    B.QVD
    (qvd);

    • Steve Dark April 5, 2013 at 4:39 pm - Reply

      Thanks Daniel – yes that is exactly how you can ensure concatenated loads are optimised. Sometimes you will find that you need to add dummy fields, perhaps with null values, to some QVD’s so that they concatenate onto others.

  3. Ralf Becher December 11, 2013 at 1:00 pm - Reply

    Hi Steve,

    you wrote “Most WHERE conditions”. Which are known Where conditions which can be used with Optimized Load (except commonly known exists)?

    Best,
    Ralf

    • Steve Dark December 11, 2013 at 1:28 pm - Reply

      Hi Ralf, the only two I am aware of are WHERE EXISTS and WHERE NOT EXISTS, and even then there are restrictions around their use. Notably that the field name must be the same in the table you are loading and the one you are checking the field against, also the field used in the EXISTS must be one of the fields being loaded from the QVD (you can drop it afterwards, if need be).

      • Ralf Becher December 11, 2013 at 1:45 pm - Reply

        Another amazing fact is that a LOAD DISTINCT from QVD can be optimized too. So, in my case I could use it to solve a slow Where condition:

        /* Slow Load, not qvd optimized:
        Bookings:
        LOAD * From Bookings.qvd (qvd)
        Where ID>0; // ..or with other condition: Not IsNull(ID)
        */

        // qvd optimized
        ExistingID:
        LOAD DISTINCT ID From Bookings.qvd (qvd);

        // qvd optimized
        Bookings:
        LOAD * From Bookings.qvd (qvd)
        Where Exists(ID);

        Drop Table ExistingID;

        – Ralf

        • Steve Dark December 11, 2013 at 3:01 pm - Reply

          That’s a good tip Ralf. I’ve noticed that there are more things that you can do now with regards to keeping things optimised than when I wrote the article. For instance, if you load from a QVD with a LEFT JOIN prefix that can now be optimised (as long as the other criteria are met about not modifying data).

  4. Håkan December 11, 2013 at 1:01 pm - Reply

    When you address QVD generate routine – do you then mean resident load?

    • Steve Dark December 11, 2013 at 1:31 pm - Reply

      Hi Håkan, no, by QVD generate routine I mean the QVW file that is used to load data from source and write out a QVD. This should be kept separate from the QVW which has your analysis in it. It is in this routine that all of the manipulation of fields (such as building composite keys) should be done.

  5. […] The headline grabbing fact here is that optimised QVD loads are up to 100 times quicker than non-optimised ones.  That makes a lot of difference if you are watching a reload dialog, it even makes a lot of difference to your server performance if your reload is running on a schedule. — by Steve Dark via QlikIntelligence.co.uk […]

  6. Debbie Pyykkonen November 6, 2014 at 7:41 pm - Reply

    Steve, what is best practice in optimizing Group By loads for large tables. should you group by as few fields as possible and use aggregate functions such as firstvalue() for text fieldsalong with your max() sum() etc… or should you use less aggregates and group by as many fields as possible to still allow for the specific group level you are after?

    • Steve Dark November 6, 2014 at 8:21 pm - Reply

      Hi Debbie. That is a good question! I would always look to put all fields in the group by statement – simply to avoid the risk of removing values accidentally with FirstSortedValue – the only aggregation functions I would typically use would be to aggregate numeric fields. If optimizing performance to the n’th degree is important you would have to try benchmarking over a serious amount of data. My gut feel would be more group by fields would be more performant – but QlikView can often surprise on things like this.

  7. […] QlikView Optimized QVD Loads via QuickIntelligence Shilpan Want to become QlikView developer? Enroll in my course at […]

  8. Joydip Chakraborty May 20, 2015 at 7:10 pm - Reply

    Hello Steve. You explained the topic clearly. But I have a question. In the third paragraph in the second line you are saying that “QVD files are stored in a format that mirrors the compression
    used in memory”. What you mean by “mirrors the compression” ? Please explain it. I am stuck in this point.
    Thanks in advance.

    • Steve Dark May 20, 2015 at 7:32 pm - Reply

      Hi Joydip. Thanks for your comment. Much of what is really neat about QlikView is based around the clever way that they compress the data so that large volumes of data can be stored in memory. Obviously having the data in memory is one thing, getting it there in the first place is another. By using exactly the same compression algorithms when QVDs are created as to when data is placed in memory the Qlik load script can move the data directly from one location to the other without having to transform it.

      By ‘mirrors’ I meant that it is exactly the same, but in a different location.

  9. Joydip Chakraborty May 20, 2015 at 7:45 pm - Reply

    Thanks Steve for your quick reply. I got the point. Can you please explain how Where Exist() works ?

    • Steve Dark May 20, 2015 at 10:42 pm - Reply

      Hi Joydip. The WHERE EXISTS compares data in a field that has already been loaded during the load script to data in the table that is presently being loaded. In its simplest form the field name in memory is the same as the field name in the file being loaded (this is a prerequisite for optimised load). You can kind of think of it as doing an inner join between the table being loaded and some previously loaded data.

  10. Srinivasan R June 30, 2015 at 5:07 am - Reply

    Steve
    I used the WHERE EXISTS condition , I could not find any data in the table. loding data for 2014,2015,2016,2017

    • Steve Dark June 30, 2015 at 6:09 am - Reply

      Hi Srinivasan.

      Were you doing a match against a field with just the year in the QVD? How did you create the field you were looking up against?

      One potential problem is with data types, if the year is stored as a string in the QVD, and a numeric in the data model the WHERE EXISTS will return no rows. The code should be as simple as:

      Temp_Year:
      LOAD
      Year
      INLINE [
      Year
      2014
      2015
      2016
      2017
      ];

      MainData:
      LOAD
      *
      FROM MyQVD.qvd (qvd)
      WHERE EXISTS (Year)
      ;

      DROP TABLE Temp_Year;

      If you run the code without the WHERE EXISTS line and the Drop and then add the Year field as a listbox in the app you will see if the values are the same (as fields from both table will show separately in the listbox) or whether the year from both sources shows separately.

      Good luck!

  11. Albert October 26, 2015 at 7:48 pm - Reply

    Hi Steve, this is a very interesting article. I see a lot of comments around the web and in books pushing to use QVD’s. This requires using a “QVD generator.qvw”. However, for my use, I only have one QVW and in development, I am often reloading (either limited or partial). If I had to manage two QVW’s, I’m not sure if this would speed up my development process since I would have to reload “QVD generator.qvd” and then reload my main QVW.

    Is there still any value in using QVD’s given I only use one main QVW for my dashboard?

    Worth mentioning, my main fact table contains 300M+ rows. Reload times are 5+ hrs–this is due to the complex joins (currently trying to make this more efficient via SQL/incremental loads).

    • Steve Dark October 27, 2015 at 7:30 am - Reply

      QVD files are more important then ever when you have that number of rows. You can use these to do incremental loads, where a partial load needs a full refresh on failure. There are usually ways of simplifying complex joins. I have speed a month end load process that took a couple of days down to just a couple of hours in this way before now.

      Given the joins and volumes I would expect you to have many generators, which will allow you to reload things at different frequencies and unit test.

      I have a separate blog post on incremental loads. I would recommend a read of that.

  12. Ramesh A November 17, 2016 at 2:28 am - Reply

    Great article , Steve.

  13. James Richmond September 18, 2017 at 5:01 pm - Reply

    How can you get the indicator in the log file that it is an optimized load? Nice that it is in the Load Progress Dialog but – for many reasons – having it listed in the actual log would be much more preferable. Can this be done?

    • Steve Dark September 19, 2017 at 6:25 am - Reply

      Hi James, I agree that this would be more useful, so that you can look back after the event. If it is not there by default though I’m not sure how you would go about making it happen. One to raise to support, perhaps?

  14. vinod September 29, 2017 at 3:51 pm - Reply

    if i uses exist then is it optimized or un optimized?

    • Steve Dark September 29, 2017 at 4:12 pm - Reply

      WHERE EXISTS is optimised, provided there is only one done for each QVD load, and the field exists in the data-model already and is included in the LOAD list. For example WHERE EXISTS (SalesID) would be optimised, but WHERE EXISTS (SalesID, [Sales Identifier]) would not be.

  15. Jishnu January 29, 2018 at 9:14 am - Reply

    Hi Steve

    Really helpful article.
    Any way to tackle multiple WHERE clauses in a single QVD load? One way might be to create a new field in the generator that is the concatenation of the WHERE fields. But if I don’t want to touch the generator script, any way to do it in the main application itself?

    Also will mixmatch() break the optimized load?

    Thanks

    • Steve Dark February 1, 2018 at 4:18 pm - Reply

      To get optimised loads you need to plan ahead a bit. Concatenating fields to do optimised WHERE EXISTS is a common use case. Most things will break an optimised load, I’m reasonably sure MixMatch will.

  16. Matus Kelemen February 1, 2018 at 4:05 pm - Reply

    Hi,
    I created Idea in community for optimized load info in the log.
    https://community.qlik.com/ideas/4815
    BR,

    Matus

  17. Yossi May 31, 2018 at 5:38 am - Reply

    HI,

    I’m trying to load data for a Tree Hierarchy and wondering if INCREMENTAL LOAD would help me.
    Is a total of 800k (Customer, Enroller, Name).

    The loading goes fast without delays, but when qlick start to built hierarchies (during the Loading process) it got stuck after around 300M hierarchies.

    Will incremental load help the hierarchy process (during the loading process) or it helps for the loading itself only ?

    Thanks

    • Steve Dark May 31, 2018 at 7:23 am - Reply

      If you can work out old hierarchies and then store those to QVD and then only work out new hierarchies on each load, then yes it can. It all depends on whether you know when each row changed and if you can uniquely identify each row. This post on incremental load may help? https://www.quickintelligence.co.uk/qlikview-incremental-load/

  18. Rob Jackson June 2, 2018 at 7:43 am - Reply

    Hello Steve, I read an article https://community.qlikview.com/thread/60910 that suggests that using applymap is faster than Where Exists. I have tried this, and it does not say it is optimised, but it is pretty fast. Do you have any further knowledge on this?

    • Steve Dark June 2, 2018 at 8:01 am - Reply

      That is certainly interesting. The sort of speed difference they are seeing is certainly the sort of improvement you could get from an Optimised load. However, I would have said that the WHERE EXISTS would be optimised and the ApplyMap not. In their test they may have done something that caused the WHERE EXISTS not to be be optimised. The ApplyMap may have caused less rows to be returned, which would speed it up. Or it could have been exactly as described. Even if it was though, it may not be the same now – the post was from 2012, and the engine was overhauled for Qlik Sense, so even QlikView has a re-written back end now.

      The only way to know is to benchmark it. If you try it out please post back here. Thanks!

Leave A Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.