In previous articles I have mentioned how critical it is to ensure your loads from QVD are optimised, but have not gone into the detail of how to do this. This post rectifies that. Here I explain what an optimised load is, why you should use them and how to perform them.
Please note that this post was written prior to the release of Qlik Sense but QVD loads in Qlik Sense will also be either optimised or non-optimised for all of the same reasons.
The headline grabbing fact here is that optimised QVD loads are up to 100 times quicker than non-optimised ones. That makes a lot of difference if you are watching a reload dialog, and it makes just as much difference to your server performance if your reload is running on a schedule.
The reason for the vast difference is related to the much publicised compression algorithm that QlikView uses when storing data for in-memory analysis. QVD files are stored in a format that mirrors the compression used in memory (which is why QVD files are so small on disk), and during an optimised load the data is sent directly from disk to memory in the same compressed format. When a non-optimised load is performed this is not the case.
So why not make all loads from QVD optimised? The simple fact is that some operations require the data to be unpacked, modified and then re-packed. This significantly slows the process. Just about any change to the data on the way out of the file and into memory will cause a load to be non-optimised.
Some examples of things that will cause a non-optimised load are:
- Adding new fields to the table
- Deriving new values from a field in the QVD
- Retrieving a field twice
- Most WHERE conditions
- Joining to an existing in memory table
- Loading data into a mapping table
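To illustrate, a load along these lines (the field names here are assumed for the example, not from the article) would be non-optimised, because it both derives a new value and applies a calculated WHERE condition:

```
Sales:
LOAD
    InvoiceID,
    Amount,
    Year(InvoiceDate) AS InvoiceYear   // deriving a value breaks optimisation
FROM Sales.qvd (qvd)
WHERE Amount > 0;                      // as does this WHERE condition
```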
In contrast the things you are allowed to do are:
- Rename fields
- Omit fields
- Do a simple one field WHERE EXISTS on a field returned in the record set
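As a quick sketch (field names assumed for illustration), all three of these can appear in a single load and it should remain optimised:

```
Sales:
LOAD
    InvoiceID,                     // the field used in the simple WHERE EXISTS
    Amount AS [Invoice Amount]     // renaming a field is allowed
                                   // any field not listed is simply omitted
FROM Sales.qvd (qvd)
WHERE EXISTS (InvoiceID);
```

Note that the field used in the WHERE EXISTS is left with its original name; renaming that particular field in the same load would break the optimisation, as discussed in the comments below.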
This sounds hugely restrictive, but then most things you would want to achieve can be coded for. For example, if you need to add fields – do this in the QVD generate routine rather than when reading the QVD. Similarly, if you need to derive a value do this when you generate the QVD also. Even complex WHERE statements can be done by deriving flags or composite keys in the QVD generate routine and then doing a simple WHERE EXISTS on a temporary table (even if that temporary table is just a single row from an in-line table).
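As a sketch of that approach (the field and flag names are illustrative assumptions): derive the flag when the QVD is created, then do the simple WHERE EXISTS when reading it:

```
// In the QVD generate routine, derive the flag once:
// If(Amount > 1000 and Region = 'EMEA', 'Y', 'N') AS LargeEmeaFlag

// In the reading application:
Temp_Flag:
LOAD * INLINE [
LargeEmeaFlag
Y
];

Sales:
LOAD *
FROM Sales.qvd (qvd)
WHERE EXISTS (LargeEmeaFlag);   // optimised, replaces the complex WHERE

DROP TABLE Temp_Flag;
```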
In fact, an optimised QVD load with a WHERE EXISTS clause on each subsequent load statement is a simple but effective way of quickly building documents which contain related subsets of data – but that is something for another post.
So, how do you know if your load is optimised? Well, the first way is by noticing it is still running when you return to your desk with a fresh cup of coffee. The other is by checking the load progress dialog. Optimised loads show the text qvd optimized as the data is being pulled from the QVD – in contrast no message is shown when the load is non-optimised.
Always look out for that text; if it is not there when loading from a QVD then it is worth reviewing your load script to see whether the load can be made optimised.
The incredible speed of an optimised load becomes essential when you build an incremental load strategy, where fresh data from source databases is combined with data previously stored in QVDs. If the retrieval of the old data is not quick and efficient then the whole point of the incremental load is eroded.
It is worth noting here, however, that a non-optimised load from a local QVD file will still typically be much, much faster than a load from any other data source. Sometimes non-optimised loads cannot be avoided (or the development required to avoid them is not worth the time saving).
Hopefully this article has given you the information you need to make sure your loads are optimised, or to make an informed decision to allow non-optimised loads.
As I have said in previous articles the back end of your QlikView document is at least as important as the front end – and ensuring optimised loads is an important part of getting the back end correct.
Just thought it would be worth noting that a quick way to unoptimise a QVD load is by using a WHERE 1=1 statement.
Hi Adam – this used to be required back when you could not do an optimised load into a MAPPING LOAD statement. That restriction was removed in (I think) version 10. I can no longer think of any situation where you would want to force an unoptimised load.
The most common way I have seen people causing inadvertent unoptimised loads is by adding a counter to a table (e.g. 1 as InvoiceCount,). Even that will make your load many times slower, so the counter must be added at QVD create time.
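To illustrate where the counter belongs, a sketch of a generate routine (source and field names are assumed for the example):

```
// QVD generate routine - the counter is added here, not in the reading app
Invoices:
LOAD
    InvoiceID,
    Amount,
    1 AS InvoiceCount;         // safe to add at QVD create time
SQL SELECT InvoiceID, Amount
FROM dbo.Invoices;

STORE Invoices INTO Invoices.qvd (qvd);
```

The later read of Invoices.qvd can then list InvoiceCount like any other field and stay optimised.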
Steve, another tip to keeping your loads optimized when concatenating two tables.
If the second table contains all the fields in the first, plus some additional ones, the load will be optimized.
Example 1 – Only the first load will be optimized

TABLE:
LOAD
    FIELD1,
    FIELD2,
    FIELD3
FROM B.QVD (qvd);

concatenate(TABLE)
LOAD
    FIELD1,
    FIELD2
FROM A.QVD (qvd);
Example 2 – Both loads are optimized

TABLE:
LOAD
    FIELD1,
    FIELD2
FROM A.QVD (qvd);

concatenate(TABLE)
LOAD
    FIELD1,
    FIELD2,
    FIELD3
FROM B.QVD (qvd);
Thanks Daniel – yes, that is exactly how you can ensure concatenated loads are optimised. Sometimes you will find that you need to add dummy fields, perhaps with null values, to some QVDs so that they concatenate onto others.
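As a sketch, using the field names from Daniel's example: in the generator for the smaller table you could pad with a null field so both QVDs share the same field list (source name assumed):

```
// Generator for A.QVD - add the field that B.QVD has but A does not
TABLE_A:
LOAD
    FIELD1,
    FIELD2,
    Null() AS FIELD3;      // dummy field keeps later concatenation optimised
SQL SELECT FIELD1, FIELD2
FROM SourceA;

STORE TABLE_A INTO A.QVD (qvd);
```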
Hi Steve,
you wrote “Most WHERE conditions”. Which WHERE conditions are known to work with an optimised load (apart from the commonly known EXISTS)?
Best,
Ralf
Hi Ralf, the only two I am aware of are WHERE EXISTS and WHERE NOT EXISTS, and even then there are restrictions around their use. Notably that the field name must be the same in the table you are loading and the one you are checking the field against, also the field used in the EXISTS must be one of the fields being loaded from the QVD (you can drop it afterwards, if need be).
Another amazing fact is that a LOAD DISTINCT from a QVD can be optimized too. So, in my case I could use it to solve a slow WHERE condition:
/* Slow Load, not qvd optimized:
Bookings:
LOAD * From Bookings.qvd (qvd)
Where ID>0; // ..or with other condition: Not IsNull(ID)
*/
// qvd optimized
ExistingID:
LOAD DISTINCT ID From Bookings.qvd (qvd);
// qvd optimized
Bookings:
LOAD * From Bookings.qvd (qvd)
Where Exists(ID);
Drop Table ExistingID;
– Ralf
That’s a good tip Ralf. I’ve noticed that there are more things that you can do now with regards to keeping things optimised than when I wrote the article. For instance, if you load from a QVD with a LEFT JOIN prefix that can now be optimised (as long as the other criteria are met about not modifying data).
When you refer to the QVD generate routine, do you mean a resident load?
Hi Håkan, no, by QVD generate routine I mean the QVW file that is used to load data from source and write out a QVD. This should be kept separate from the QVW which has your analysis in it. It is in this routine that all of the manipulation of fields (such as building composite keys) should be done.
Steve, what is best practice for optimizing GROUP BY loads on large tables? Should you group by as few fields as possible and use aggregate functions such as FirstValue() for text fields along with your Max(), Sum() etc., or should you use fewer aggregates and group by as many fields as possible while still allowing for the specific group level you are after?
Hi Debbie. That is a good question! I would always look to put all fields in the GROUP BY statement, simply to avoid the risk of removing values accidentally with FirstSortedValue; the only aggregation functions I would typically use would be on numeric fields. If optimizing performance to the nth degree is important you would have to benchmark over a serious amount of data. My gut feel is that more GROUP BY fields would be more performant, but QlikView can often surprise on things like this.
Hello Steve. You explained the topic clearly, but I have a question. In the third paragraph you say that “QVD files are stored in a format that mirrors the compression used in memory”. What do you mean by “mirrors the compression”? I am stuck on this point.
Thanks in advance.
Hi Joydip. Thanks for your comment. Much of what is really neat about QlikView is based around the clever way that they compress the data so that large volumes of data can be stored in memory. Obviously having the data in memory is one thing, getting it there in the first place is another. By using exactly the same compression algorithms when QVDs are created as to when data is placed in memory the Qlik load script can move the data directly from one location to the other without having to transform it.
By ‘mirrors’ I meant that it is exactly the same, but in a different location.
Thanks Steve for your quick reply. I got the point. Can you please explain how WHERE EXISTS works?
Hi Joydip. The WHERE EXISTS compares data in a field that has already been loaded during the load script to data in the table that is presently being loaded. In its simplest form the field name in memory is the same as the field name in the file being loaded (this is a prerequisite for optimised load). You can kind of think of it as doing an inner join between the table being loaded and some previously loaded data.
Steve
I used the WHERE EXISTS condition but could not find any data in the table. I am loading data for 2014, 2015, 2016, 2017.
Hi Srinivasan.
Were you doing a match against a field with just the year in the QVD? How did you create the field you were looking up against?
One potential problem is with data types: if the year is stored as a string in the QVD, and as a numeric in the data model, the WHERE EXISTS will return no rows. The code should be as simple as:
Temp_Year:
LOAD
Year
INLINE [
Year
2014
2015
2016
2017
];
MainData:
LOAD
*
FROM MyQVD.qvd (qvd)
WHERE EXISTS (Year)
;
DROP TABLE Temp_Year;
If you run the code without the WHERE EXISTS line and the DROP, and then add the Year field as a listbox in the app, you will see whether the values from both sources are the same or whether they show as separate values in the listbox.
Good luck!
Hi Steve, this is a very interesting article. I see a lot of comments around the web and in books pushing to use QVDs. This requires using a “QVD generator.qvw”. However, for my use, I only have one QVW and in development I am often reloading (either limited or partial). If I had to manage two QVWs, I’m not sure this would speed up my development process, since I would have to reload “QVD generator.qvw” and then reload my main QVW.
Is there still any value in using QVDs given I only use one main QVW for my dashboard?
Worth mentioning, my main fact table contains 300M+ rows. Reload times are 5+ hours; this is due to the complex joins (currently trying to make this more efficient via SQL/incremental loads).
QVD files are more important than ever when you have that number of rows. You can use them to do incremental loads, whereas a partial load needs a full refresh on failure. There are usually ways of simplifying complex joins. I have sped up a month end load process in this way before now, from a couple of days down to just a couple of hours.
Given the joins and volumes I would expect you to have many generators, which will allow you to reload things at different frequencies and unit test.
I have a separate blog post on incremental loads. I would recommend a read of that.
Great article , Steve.
How can you get the indicator in the log file that it is an optimized load? Nice that it is in the Load Progress Dialog but – for many reasons – having it listed in the actual log would be much more preferable. Can this be done?
Hi James, I agree that this would be more useful, so that you can look back after the event. If it is not there by default though I’m not sure how you would go about making it happen. One to raise to support, perhaps?
If I use EXISTS, is it optimized or unoptimized?
WHERE EXISTS is optimised, provided there is only one done for each QVD load, and the field exists in the data-model already and is included in the LOAD list. For example WHERE EXISTS (SalesID) would be optimised, but WHERE EXISTS (SalesID, [Sales Identifier]) would not be.
Hi Steve
Really helpful article.
Any way to tackle multiple WHERE clauses in a single QVD load? One way might be to create a new field in the generator that is the concatenation of the WHERE fields. But if I don’t want to touch the generator script, any way to do it in the main application itself?
Also, will MixMatch() break the optimized load?
Thanks
To get optimised loads you need to plan ahead a bit. Concatenating fields in the generator so that you can do an optimised WHERE EXISTS is a common approach. Most things will break an optimised load; I’m reasonably sure MixMatch will.
Hi,
I’m trying to load data for a tree hierarchy and am wondering if an INCREMENTAL LOAD would help me.
It is a total of 800k rows (Customer, Enroller, Name).
The loading goes fast without delays, but when Qlik starts to build the hierarchies (during the loading process) it gets stuck after around 300M hierarchies.
Will an incremental load help the hierarchy-building step (during the loading process), or does it only help the loading itself?
Thanks
If you can work out old hierarchies and then store those to QVD and then only work out new hierarchies on each load, then yes it can. It all depends on whether you know when each row changed and if you can uniquely identify each row. This post on incremental load may help? https://www.quickintelligence.co.uk/qlikview-incremental-load/
Hello Steve, I read an article https://community.qlik.com/t5/QlikView-App-Dev/comparing-quot-where-exists-quot-and-quot-applymap-quot/td-p/383836 that suggests that using applymap is faster than Where Exists. I have tried this, and it does not say it is optimised, but it is pretty fast. Do you have any further knowledge on this?
That is certainly interesting. The sort of speed difference they are seeing is certainly the sort of improvement you could get from an optimised load. However, I would have said that the WHERE EXISTS would be optimised and the ApplyMap not. In their test they may have done something that caused the WHERE EXISTS not to be optimised. The ApplyMap may have caused fewer rows to be returned, which would speed it up. Or it could have been exactly as described. Even if it was, though, it may not be the same now – the post was from 2012, and the engine was overhauled for Qlik Sense, so even QlikView has a re-written back end now.
The only way to know is to benchmark it. If you try it out please post back here. Thanks!
Hi, what about optimized loads in QlikView 12.4? We have updated our QlikView environment from version 12.2 SR9 to 12.4 SR1. With 12.2 optimized loads worked fine; with 12.4 they no longer do.
I’ve just done an optimised load in 12.40.20000.0, then added 1 as a to the load and it was non-optimised. There may be some rule which didn’t cause a non-optimised load previously but now does. Are you doing a straight load of all rows? If they have broken optimised loads in a release they will need to fix it PDQ in a service release, as I have loads that would go from minutes to many hours if optimised loading were broken.
Hi Steve,
I have noticed that Qlik keeps the load optimised only when you either 1) use a simple WHERE EXISTS clause or 2) alias field names. When you alias field names and also use WHERE EXISTS in the same load, the optimised load is broken. The Qlik Help manual is not very clear on this. I thought you could alias fields and use a WHERE EXISTS clause at the same time, but unfortunately this breaks the optimised load.
Is this a product bug or is this how optimised load works?
Thanks
Nat
Hi Nat,
This is expected, given how WHERE EXISTS works when the field names do not match. You can most likely optimise your load by using a RENAME FIELD statement before the load, so that the field being matched against has the same name as the field in the QVD being loaded. You can then do another RENAME FIELD after the load to make the field name whatever you need it to be.
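A sketch of that workaround (field and file names assumed):

```
// The data model field is KeyField; the QVD stores it as ID
RENAME FIELD KeyField TO ID;    // match the QVD field name before loading

Data:
LOAD
    ID,
    Value
FROM Data.qvd (qvd)
WHERE EXISTS (ID);              // names match, so the load stays optimised

RENAME FIELD ID TO KeyField;    // restore the preferred name afterwards
```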
Hope that helps.
Steve