How To Convert Drop Folder Files to QVD

Recently I posted a solution to a thread on Qlik Community that I thought would be worth sharing with readers of this blog. The question was around speading up the load times from an increasing number of CSV files arriving in a drop folder. The solution I gave used a couple of helpful techniques. Here I look to take those techniques further.

The CSV Advantage

CSV files are a good way of shifting data between disparate systems. They are completely universal and most data tools have a way of dealing with them. However, if you are used to the speed you get from an optimised QVD load; CSV load times can be a drag. This is due to the fact that CSV files do not benefit from the clever way that QVD files are constructed, where repeated values are not stored on every line.

If you use QlikView a lot, at some point you will have to deal with many CSVs. So, how can you deal with them effectively?

The Basic Solution

In the Community post the question was around an increasing number of dated files. Each file containing rows to append to the existing data. Volumes were large, but not massive.

Given that QVDs load so much quicker than CSVs all we needed to do was load old data from QVD and new data from CSV – creating QVD files from the new CSVs as we go. The slow CSV load for each file would therefore only need to be done once.

The code I suggested to do this was as follows:


for each vFile in FileList('..\Data\*.csv')
     let vQVD = replace(vFile, '.csv', '.qvd');

     if alt(FileSize('$(vQVD)'), 0) = 0 then
          NextFile:
          LOAD
              *
          FROM $(vFile)
          (txt, codepage is 1252, no labels, delimiter is ',', msq)
          ;

          STORE NextFile INTO $(vQVD) (qvd);

          DROP TABLE NextFile;
     end if
next

Data:
LOAD
	*
FROM ..\Data\*.qvd (qvd)
;

The key thing to note about this approach is how quick the file iteration and exists check takes place – this surprised me when I first tried this approach. When running over hundreds of files that have already been processed this routine still takes next to no time. The load from multiple QVDs at the end is optimised, so again is very quick.

The thing that I like about this approach is the simplicity. If, for some reason, you wanted to rebuild some or all of the QVDs you simply delete the QVDs you want to be rebuilt.

Taking It Further

There are many ways in which this approach could be improved upon. Here are a few thoughts on doing this.

I would always add some fields to any load – to make front end processing quicker. This can be done in this code when each CSV is loaded:


          NextFile:
          LOAD
              1 as RowCounter,
              filename() as SourceFile,
              *
          FROM $(vFile)
          (txt, codepage is 1252, no labels, delimiter is ',', msq)
          ;

If you wanted to reload a CSV file when it’s date has changed since the QVD was last built; the FileSize could be added to with a FileDate check.


if alt(FileSize('$(vQVD)'), 0) = 0 or FileTime('$(vQVD)') < FileTime('$(vFile)') then

If you wanted to be able to delete one of the CSVs and have the data from the corresponding QVD not loaded, you could replace the load * from the QVD replaced with the following code:


for each vFile in FileList('..\Data\*.csv')
	let vQVD = replace(vFile, '.csv', '.qvd');
	
	Data:
	LOAD
		*
	FROM $(vQVD) (qvd)
	;
next

If optimisation is critical to you then loading from a single appended QVD will be quicker. This is because duplicate values are only removed within one QVD, not across multiple QVDs – the same data loaded from one QVD will be much quicker than loading from many. In order to achieve this data must be appended from each file (CSV or QVD) and stored into a single QVD. As there is no longer a QVD for each source file, a flag file is required to denote a processed file. The code to implement this approach would be:


LoadDone:
LOAD
	now() as DateLoaded
AUTOGENERATE(1)
;

for each vFile in FileList('..\Data\*.csv')
     let vDone = replace(vFile, '.csv', '.done');

     if alt(FileSize('$(vDone)'), 0) = 0 or
        FileTime('$(vDone)') < FileTime('$(vFile)') then
          Data:
          LOAD
              1 as RowCounter,
              filename() as SourceFile,
              *
          FROM $(vFile)
          (txt, codepage is 1252, no labels, delimiter is ',', msq)
          ;

          STORE LoadDone INTO $(vDone) (txt);
     end if
next
	
if Alt(FileSize('..\Data\Data.qvd'), 0) > 0 then
	Data:
	LOAD
		*
	FROM ..\Data\Data.qvd (qvd);
end if

STORE Data INTO ..\Data\Data.qvd (qvd);

Finally, if there is a unique key and the chance of records to be updated in subsequent files then standard Incremental Load techniques can be layered on top, such as WHERE NOT EXISTS:


	Data:
	LOAD
		*
	FROM ..\Data\Data.qvd (qvd)
	WHERE NOT EXISTS(ID)
	;

Conclusions

As with any software development, with a bit of thinking through simple solutions can be arrived at that deliver great value. The time savings from this approach could be massive on each load. It could make the difference between having to bring in data overnight to doing frequent reloads during the day.

The Qlik Community is a great resource and you will find inspired solutions from a number of great QlikView developers on there. You can find a list of the responses I have made under my Qlik Community Profile. Add me as a friend to have posts I make added to your recent feed on Qlik Community.

By Steve Dark|2015-10-12T21:46:10+00:00March 6th, 2014|QlikView Tutorial|23 Comments

About the Author: Steve Dark

Steve is owner and principal consultant at Quick Intelligence. He is a Qlik Luminary, Qlik Community MVP and Technical Editor of a number of QlikView Books.

23 Comments

Karl Pover March 11, 2014 at 7:46 am - Reply
Great advanced scripting example that got me thinking about incremental loads of text files. Rob Wunderlich recently posted about the buffer command and I remember that a buffer (incremental) command also existed. I didn’t know if it worked on CSV files, but after some test it apparently does. So if you have a simple requirement to load additional rows from a single CSV file without changing or deleting previously loaded rows this could also be a nice, quick fix. Of course, your solution is much more robust.
Karl
Rene van der Velde April 15, 2014 at 10:07 am - Reply
Great insights! Thanks for sharing them with us.
- Steve Dark April 15, 2014 at 10:16 am - Reply
  Thanks for your comment Rene – glad you find the solutions I post useful.
Kushal Thakral May 10, 2014 at 9:14 am - Reply
Hi Steve
Really a good document to read and helped me to utilize it in real time scenario
Thanks
Kushal
- Steve Dark May 10, 2014 at 10:36 am - Reply
  Hi Kushal, many thanks for your comment, glad you found the post useful. This technique that I use regularly when coming up against text based data sources.
Marek June 3, 2014 at 12:39 pm - Reply
Hi Steve,
Thank you for this, its really helpful and indeed incredible how fast the loads are.
I have however came across a small problem where the csv files contain “()” brackets in the filename:
eg. Test 123 (Production).csv
When that’s the case the following error comes up
Unknown file format specifier:Production
I am currently trying to come up with a workaround for this so will share when I do.
Regards
Marek
- Steve Dark June 3, 2014 at 12:42 pm - Reply
  Hi Marek,
  Thanks for your comment. You will want to add square brackets around each instance of $(vQVD) or $(vFile). I’m usually very careful to avoid punctuation or spaces in filenames, so don’t need to worry about square brackets. In your case though you will need to add the square brackets in.
Marek June 3, 2014 at 1:13 pm - Reply
Hi again,
Thanks so much for the quick and “obvious” suggestion! I have been looking at the problem from a completely wrong angle :)
Solution now works great!
Pravin October 13, 2014 at 5:24 pm - Reply
Hi Steve – That’s a Nice article, as USUAL!
- Steve Dark October 13, 2014 at 6:35 pm - Reply
  Thanks, Pravin. Glad you like the posts.
Steve Dark June 25, 2015 at 10:07 pm - Reply
Just posted some additional code on QlikCommunity, for merging the QVDs created from the routines above into Monthly and Master QVDs: https://community.qlik.com/thread/170103
Enjoy.
James Kennedy April 13, 2017 at 7:38 pm - Reply
I can’t seem to get the basic run of this to work. It keeps telling me that the system cannot find the qvd file in the data: section. What am I doing wrong?
- Steve Dark April 15, 2017 at 7:17 am - Reply
  Hi James – have you used the Debug feature in the Script editor to step through each line of code to see what is happening? My best guess is that the CSV folder is not where the code is expecting it to be, so no data is being loaded from the CSV (as it would not be found by the IF statement), no QVD is therefore written so there is no QVD to load. If you are using the path ..\Data\*.csv you would need to have an app in a folder, which has a Data folder at the same level as it. You could try putting the CSV files in the same folder as the QVW and changing all instances of ..\Data\ to .\, to see if that works?
Sudhakar December 27, 2018 at 2:44 pm - Reply
Hi Steve,
I need to writ the qvds into a separate folder. How can I achieve this please?
Thanks
SB
- Steve Dark December 29, 2018 at 8:45 am - Reply
  You can construct the output path in code, and this can include variable folders as well as filenames. Set the variables for the various parts in the loop and then use them in the store, something like this:
  STORE CurrentFile INTO [..\QVDRoot\$(thisFolder)\$(currentFile)_$(dateSuffix).qvd];
Ishuca May 25, 2019 at 9:24 pm - Reply
Hi..nice info you have there, but I have an issue where i want to load each excel files in a folder(all excel) and store each of them in another folder(all qvd). I think your basic solution works for the issue but I can’t figure out how to store the qvds to another folder. I need to use the same file name so i need the replace function.
Hope you can help me.
Thanks.
- Steve Dark May 26, 2019 at 8:06 am - Reply
  The simplest way is probably to have two replace statements. The first to replace the extension and the other to replace the file path. The other way is to split the file name part from the path and then concatenate on the output path. Hope that helps.
Ali March 24, 2020 at 9:28 pm - Reply
Hi Steve,
This looks like what i am looking to do in QS. However, I cannot get the script to work.
Any thoughts? I am new to coding in QS
- Steve Dark March 25, 2020 at 8:41 am - Reply
  Hi,
  You are going to need to give me a little more information than that, regarding how far you have got so far and what error you are seeing. Please share some more detail and I will be happy to help.
Meenakshi mantri January 8, 2021 at 5:06 pm - Reply
Hi Steve,
Will this code work with Qlik sense.Thanks
- Steve Dark January 8, 2021 at 10:20 pm - Reply
  Hi,
  The approach, and the majority of the code will work in Sense. The only thing that needs to change is that the references to drive letters, UNC or relative paths need to change to valid lib:// connections. Give a shout if you get stuck.
marcus January 20, 2022 at 10:11 am - Reply
Hi Steve,
would it be possible to load in the files from multiple folders instead of one folder? i have hundreds of sub folders that contain further sub folders where i need to get all the csvs from.
all the folders have one parent folder.
is it possible to search all the sub folders in the parent folder and pull in all the csvs?
thanks
- Steve Dark January 20, 2022 at 4:11 pm - Reply
  Hi,
  The quick answer is, Yes, you can. As well as the FileList function there is also a DirList function, which works in a similar way but lists sub folders. You should then be able to create a nested loop that finds all files in all folders. Good luck!

The CSV Advantage

The Basic Solution

Taking It Further

Conclusions

About the Author: Steve Dark

Google SMTP and IMAP to break in Qlik Apps Next Month

Qlik QVD XML Header Viewer

Qlik Set Analysis Tutorial – Star Equals

The Qlik Developers Toolbox

Variables With Parameters In Qlik Load Scripts

23 Comments

Leave A Comment Cancel reply

How To Convert Drop Folder Files to QVD

The CSV Advantage

The Basic Solution

Taking It Further

Conclusions

Share This Story, Choose Your Platform!

About the Author: Steve Dark

Related Posts

23 Comments

Leave A Comment Cancel reply