Recently I blogged about the pitfalls of not investing sufficient effort at the start of a QlikView project to avoid problems later on (see the post Start Your QlikView Project The Right Way). In that post I argued that the single most significant factor in whether a QlikView project succeeds is a well designed data model. Here I would like to share some of my suggestions for getting the data model right, as this could save you a lot of pain if you are just about to build a QlikView document or indeed a whole environment.
First of all, it is crucial that you decide exactly which columns and rows you require and only pull those into the data model. Redundant fields can cause unnecessary confusion, performance issues and ultimately extra effort. Prototyping with more fields is acceptable – but strip back once you know what is required.
Careful consideration also has to be given to the fact that QlikView associations work in a very different way to SQL joins. As you probably already know, any field with the same name in more than one table will associate those tables. If two tables share more than one field name, a synthetic key is created – and this is always a ‘bad thing’. If you see extra tables in your data model with a dollar sign in the name (the $Syn tables) you need to revise and improve your model.
In my experience it is almost always better to concatenate fact tables together rather than to try to join them. This way you can have many columns with the same name and the values in them will appear as a single field in QlikView. Selecting a value in that field will pull out rows from each of the concatenated tables. This technique is especially useful when pulling data from different sources, or when adding budget or forecast data to your model. Concatenation is ideally done by having exactly the same fields in all tables you wish to concatenate (concatenation will then happen automatically) or by forcing it with a CONCATENATE statement. If using the latter method make sure you are aware of the impact on Optimised QVD Loads.
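A minimal sketch of a forced concatenation (table and file names here are illustrative):

Facts:
LOAD
    OrderDate,
    Region,
    SalesAmount
FROM Sales.qvd (qvd);

// Force the budget rows onto the same table even though the field lists
// differ; fields missing from either source are simply left null.
CONCATENATE (Facts)
LOAD
    BudgetDate as OrderDate,
    Region,
    BudgetAmount
FROM Budget.qvd (qvd);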
Whenever you are building on top of a database with a number of fair sized tables, get the business to consider exactly what they need from each table – columns and rows. If they have direct access to the back end database they can investigate the tables that way. However, what I generally do is build a separate QlikView document for each source table (so there are no join issues) that they can use for this investigation. Each document lists every field in that table in a multi box. This allows users to see every value in that table and decide whether it is required or not. Typically I will add a Table Box of all values (with a calculation condition to limit rows) so they can see all of the raw data in context. Encourage the business users to be ruthless with this shedding of fields and rows – it could really help as the project progresses. A good example of shedding rows is historical data – most businesses change their shape considerably over a few years – so comparisons with data more than a couple of years old are usually not relevant.
During this data identification process also get the users to think about how the fields should be referred to in the UI. Consistency and clarity are essential. I will often provide spreadsheets of the source field names (or even a printout if they are paper-people) and get them to fill in the required ‘user friendly’ field names. Rename to these field names as soon as you extract from the database – using an ‘as’ statement in the load. Insert spaces in field names where these are not present in the source – this will require you to use square brackets in expressions but it will save retyping legends each time a field is used. Also any underscores and the like should be removed at this point.
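For example, renaming at the point of extraction might look like this (the cryptic source field names are invented, and a database connection is assumed to already be open):

CustomerOrders:
LOAD
    CUST_NM as [Customer Name],
    CUST_REGION_CD as [Customer Region],
    ORD_DT as [Order Date];
SQL SELECT CUST_NM, CUST_REGION_CD, ORD_DT
FROM dbo.CUSTOMER_ORDERS;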
Do all you can to simplify and rationalise the data model in the load script. This is better than having lots of joins in the finished data model. The best way of doing this is with ApplyMap (for more details see my previous posting ApplyMap – Is It So Wrong?). Pre-calculate any values you can in the load script to save processor time later (perhaps calculate margins or FX values?). Also, change any codes into values (eg. Y/N into Yes/No) thinking always of what your users will expect to see. I always add a ‘Count’ field to each table (ie. ‘1 as CustomerCount,’) – as this can greatly simplify expressions later. In short, the more work you can do in your load script the better – the benefits to document performance and development time can be immense.
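Something along these lines, assuming a hypothetical sales table and a Map_Country mapping table loaded earlier in the script:

Sales:
LOAD
    OrderID,
    ApplyMap('Map_Country', CountryCode, 'Unknown') as Country,    // code translated to a value
    if(ActiveFlag = 'Y', 'Yes', 'No') as [Active Customer],        // Y/N turned into Yes/No
    SalesAmount - CostAmount as Margin,                            // pre-calculated in the script
    1 as OrderCount                                                // counter field for simple sums later
FROM Sales.qvd (qvd);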
It is almost always (and I had to think twice about the ‘almost’ there) a good idea to have a QVD layer built into your reload strategy. QVDs are essential for incremental loads, but are also very useful for chunking your data extracts into manageable portions. They also help break the development / UAT / promote cycle into segments that are simpler to unit test. Data can be extracted from multiple source tables concurrently if there is a separate ‘QVD Load’ routine for each source table.
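A very simple QVD generator for one source table might look like this (names are illustrative, with a connection already made); having one such routine per table lets them be scheduled independently:

Customers:
LOAD *;
SQL SELECT *
FROM dbo.Customers;

STORE Customers INTO Customers.qvd (qvd);
DROP TABLE Customers;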
When data is loaded into your presentation layer, test any remaining joins carefully. As previously mentioned, do not allow synthetic keys – whilst these can work some of the time, they will always hurt performance and can cause crashes – always build your own composite keys if required. Watch out for issues with non-numeric join keys caused by rogue spaces or mismatched capitalisation. The best thing to remember when testing a data model is to assume nothing.
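For instance, a composite key can be built explicitly while cleaning up the values that go into it (field names here are illustrative):

Orders:
LOAD
    upper(trim(Region)) & '|' & upper(trim(Branch)) as %RegionBranchKey,  // explicit key, cleaned of rogue spaces and case differences
    OrderID,
    SalesAmount
FROM Orders.qvd (qvd);

// Build the same %RegionBranchKey in the other table, then drop the separate
// Region and Branch fields from one side so no synthetic key is formed.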
Whilst QlikView is often viewed as a tool for producing flashy charts that are dynamic and engaging it is in fact nothing without a solid data model behind it. In most developments more than half of the time needs to be spent on perfecting the data model. If you ensure that you invest that time then you have a far greater chance of success in your QlikView projects.
Nice one Steve!
Actually, this is the kind of post that, if taken on board properly, gives you a real idea of how to approach the early stages of QlikView development.
I’d stress the AutoNumber() type of functions for linking tables, and differentiating from the very beginning the fields whose values will be shown from the fields that will be used for expressions, joins, maps, links and so on. For example, a MonthName() field to show the actual date properly formatted in the charts, and another Num() field to be used in set analysis. This might not be worth doing when a document only pulls a few hundred thousand records, but it boosts performance dramatically when the document works with millions of rows.
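As a rough sketch of that idea (field names are invented for illustration):

Transactions:
LOAD
    MonthName(TransactionDate) as [Transaction Month],                   // formatted dual value for display in charts
    Num(Floor(TransactionDate)) as %TransactionDateNum,                  // plain number for set analysis and links
    AutoNumber(CustomerID & '|' & SiteID, 'CustSite') as %CustSiteKey    // compact integer key for linking tables
FROM Transactions.qvd (qvd);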
Finally, it depends largely on the requirements and the data model of course, but the fewer the steps between tables the faster the model will work. It’s always preferable to use more disk space storing more fields (numeric, strings) than to write more complex expressions that will result in slower performance and a poorer user experience.
Thanks Steve for an interesting post. As you also mentioned in your post, I would really like to stress the importance of doing as many of the calculations as possible in the load script, rather than in the sheet objects.
I also like to separate my load scripts into tabs based on the tabs I will use in the UI. That makes it a bit clearer (together with comments of course), especially if someone else will work on the data model in the future. Cheers Mathias
Miguel said:
For example, a MonthName() field to show the actual date properly formatted in the charts, and another Num() field to be used in set analysis.
I never thought of that and it could be a large driver of performance. Great tip. Oh yeah, and great article as well, Steve.
Hi Steve,
Thanks for the great post. You suggested combining multiple fact tables into a single fact. Can you shed some light on how to handle the data model when there are fact tables with different levels of granularity?
Regards,
Sanjay
Hi Sanjay – thanks for your comment. Ideally, if the most granular data sums up to your aggregated data, you would only bring in the lowest level of granularity. QlikView will sum up across many rows very quickly. If the granular data doesn’t roll up to the top level (perhaps the lowest level is based on transaction date whilst your rolled up data is at invoice date) it is best to have both sets of data with separate fields for the values. So, to take the example above, you would pull in the transaction level data into a field called Transaction Amount and force a zero into the Invoice Amount field. Then you would concatenate the invoice level data in with a value in the Invoice Amount field and a zero in the Transaction Amount field. I have projects where there are several different types of data concatenated in this way.
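Sketched out, that approach could look something like this (table and field names are illustrative):

Facts:
LOAD
    TransactionDate as [Fact Date],
    CustomerID,
    Amount as [Transaction Amount],
    0 as [Invoice Amount]               // placeholder so both fact types share the same fields
FROM Transactions.qvd (qvd);

CONCATENATE (Facts)
LOAD
    InvoiceDate as [Fact Date],
    CustomerID,
    0 as [Transaction Amount],
    Amount as [Invoice Amount]
FROM Invoices.qvd (qvd);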
Nice article Steve, there is no substitute for a solid data model in any BI project.
Regards,
Nagendhran Rangarajan
I agree with you, Steve!
A good data model is the key to a successful QV dashboard.
From my own point of view and habit, I prefer dimensional modelling for QV models. I have found that synthetic keys are always a big problem in QV models based on traditional table relationships (the ER model), with problems such as loops that we then need to fix!
Based on a dimensional model (star schema), the relationships between tables are quite clear and our QV work is also highly efficient. So most of the time this dimensional model will be obtained from the DM and EDW. Most of the table integration and cleansing work will be done by the ETL staff, so the QV developers can focus on dashboard design rather than cleaning up synthetic key problems.
Hi Steve,
Regarding concatenating two fact tables into one: if both tables contain hundreds of millions of records, concatenating may impact performance, as we need to use Set Analysis or an IF clause in expressions to differentiate the data. We can take the example of Transaction and Invoice amounts.
What should the solution be in that case?
Regards,
Anosh Nathaniel
Hi Anosh,
You will find that using Set Analysis to access half of the rows in one huge fact table is much more performant than accessing values in separate tables with a complex join key. In the case of invoice line items and invoice details there will be an obvious join key, and for simplicity with relatively small numbers of rows I might go for two tables with a join – however with serious numbers of rows the single fact will be better. The other advantage of concatenation is that if you have any other commonality (eg. Date, Year, Qtr) etc. you can have single fields for these – rather than needing separate fields with slightly different names on each of the tables.
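For example, assuming a DataType field was written onto each row when the facts were concatenated, an expression only needs to consider the invoice rows:

Sum({<DataType = {'Invoice'}>} [Invoice Amount])

The set is resolved once per chart, so restricting the rows this way is typically much cheaper than evaluating an IF() on every row.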
Hi Steve,
Thanks for your reply. I understand what you mean. I have a few questions regarding the implementation and bug fixing of a QlikView application.
1. How should the tabs for the data model script and the QVD generation script be created? Should each tab be based on database tables, or should they be based on UI tabs? Which approach is better in terms of code quality and maintainability?
2. If a production QlikView application takes ~1 hour to load and a change/modification/bug fix is needed, making a small change to the script and reloading it fully to see the effect is a very cumbersome process. We can also partially load the script, but that can only be used to check whether our changes are syntactically correct.
Any suggestions on how we can speed up this process?
Hi Anosh,
I personally always have a tab per main datasource in my load scripts – with no bearing on UI tabs at all. Other tabs that may be present are Mapping Tables, Looping Constructs, Variables, Clean-Up.
If you have an hour long load there is probably some optimisation that can take place – certainly incremental loads if those are not already in place.
If you can split the creation of different QVD files into separate generators you can work on one at a time, which can save a lot of time – it also means you can schedule things to run concurrently, which can speed up overall run times. Where I do have multiple QVD creates in a single app I will often put each tab in an IF statement, with a set of flags on the first tab to select which tabs should run.
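A sketch of that flag pattern (variable, table and file names are illustrative):

// On the first tab:
SET vLoadSales = 1;
SET vLoadStock = 0;

// At the top of the Sales tab:
IF '$(vLoadSales)' = '1' THEN
    Sales:
    LOAD *;
    SQL SELECT * FROM dbo.Sales;

    STORE Sales INTO Sales.qvd (qvd);
    DROP TABLE Sales;
END IF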
Debugging with limited rows can be useful – but make sure you comment out any QVD writes if you use this (particularly if used for incremental loads). Also, putting a TOP statement in your SQL script rather than using the QlikView debugger can be quicker – and means you can get fewer rows from just one of the tables in your load.
It’s always a case of finding the best solution for the specific set of circumstances – no two implementations are ever exactly the same!
Hi Steve,
A couple of things I’ve understood from your post so far; I just wanted to check on combining multiple facts into one fact, for example:
I already have a table called “A”, which is summary data in the data warehouse; on the other side I have another table called “B”, which is at transaction level. If I want to show analysis based on table A and drill down to table B, what will be the best practice for that?
Regards
Imran
Hi Imran,
In this case – because you have different levels of granularity – you will probably be best having two tables in the data model and allowing QlikView to do the associations in the front end.
Thanks Steve for throwing some light on the approach. I just wanted to check: if I want to store multiple tables in one QVD, is that possible?
E.g. I have 1 fact table and 7 dimension tables. My requirement is to store all these tables in one QVD.
Hi Imran,
No. A QVD can only store one table. You will need to issue a store statement for each of your tables. If you require a single file, for portability, you will either need to use ApplyMap or JOIN to create a single table – or use a QVW as a data source and load from it with a binary load.
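As a small sketch, the store statements themselves can even be generated with a loop (table names are illustrative):

FOR EACH vTable IN 'Fact', 'Customer', 'Product'
    STORE [$(vTable)] INTO [$(vTable).qvd] (qvd);
NEXT vTable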
Thanks Steve,
Especially when designing a larger application (mine has about 600 million records), building a good foundation in the form of a well thought out data model is very important. The model affects the response time, the appetite for RAM and, finally, the complexity of expressions. It is worth spending a few days testing different model variations…
Steve, thank you for all the interesting tips.
Darek
Hi Steve, this is a nice blog to get all the answers about data modelling.
Could you please guide me in the right direction for creating a data model?
I want to build a data model with five different tables (coming from different sources, like flat files and an Oracle database). The tables have both measures and dimensions (OLTP tables).
I did the following for the OLTP tables:
1) I have concatenated all 5 different tables and identified each row with a table name field.
2) The reason I did this is that some of the tables have 3 fields in common, some have 4 fields in common, and so on…
3) After concatenating all the tables I took the common date from the resulting table with a resident load (distinct) and created a calendar table.
4) On the front end I have 5 different tabs with different data and used Set Analysis in the expressions, using the table name as an identifier; that works fine.
5) Now the issue is that the tables are getting huge day by day and I have a huge amount of data, so I question myself: is this the better approach or should I have done it another way?
6) Could you tell us about this kind of experience and how to deal with it?
In QlikView we have the flexibility of designing our own ETL (so we don’t have any data warehouse where the data massaging is done).
Practically it’s impossible to get a star or snowflake schema (without having the facts and dimensions as separate tables).
So is doing this good practice or bad data model design?
Thanks
sree..
If you are getting correct values out of your data model and are getting results without having to wait for calculations to complete then it sounds like you are on the right lines. If you are experiencing data loads taking longer to complete due to new data being created each day you probably need to look at incremental loads. I have done another blog post on this topic that you may find helpful: Incremental Loads.
If data volumes get problematic with analysis then you may need to look to aggregate historical files to remove some of the detail and only have row level data for recent data. You will find plenty of advice on this on QlikCommunity and other blogs.
Steve… how do you generally handle tables with “header” and “lines”? Do you join them before bringing into QlikView (de-normalize the header data) or do you bring in separate header and lines tables and let the internal key join the table naturally?
Also, what is the best method for what I will call “Common Lookups”?! For example… consider employee records with business group and cost center. But you also have business group and cost center fields in your accounts payable tables as well. Yet, these records should NOT automatically join together. If I change the field names to “Employee Business Group” and “Employee Cost Center” the records do not join. (GOOD). But if I want to set up TWO Tabs, one with employee data and one with AP data I don’t want to “re-select” the current Business Group and Cost Center “selections”. When I switch tabs, I want the AP data to show me the current selection of data. Currently I’m stumped. Thanks for any advice you can provide.
Hi Kevin.
Regarding header and body tables in a source database; it depends. I have created apps both ways – the call is typically based on how much information exists in the header that is not in the line record. If it is only a few points of data I will usually ApplyMap this in or use a SQL JOIN (depending on size of tables and bandwidth).
A good approach for common keys, as you describe them, is to create a Link Table. To do this you need a unique ID in your main fact table. As well as loading your main fact table (perhaps without any business group or cost centre information), load in a link table, making a pass for each instance of the key. So you would load the ID, hard code a type and give the GroupID, repeating this for each type, e.g.:
BusinessGroupLink:
LOAD
UniqueID,
'Employee' as [BG Type],
EmployeeBusinessGroupID as BGID
FROM MainData.qvd (qvd);
BusinessGroupLink:
LOAD
UniqueID,
'Accounts' as [BG Type],
AccountBusinessGroupID as BGID
FROM MainData.qvd (qvd);
You could also ApplyMap the Business Group Name onto the link table – to avoid an extra join in the data model.
That works for joins to the same table from more than one field in your fact table.
If you have IDs joining to the same lookup table from more than one table then you could load your lookup table as a data island (i.e. with a field name that doesn’t actually link to either table) and have a list box on the name from the data island. To make selections on that list box apply to values in expressions you would need to apply the selection using Set Analysis and the P() operator. The downside of this approach is that the green/white/grey selection behaviour will not work correctly on this field.
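For example, with a data island table holding a field called [Selected Business Group] (the names here are hypothetical), an expression over the AP data could read:

Sum({<[AP Business Group] = P([Selected Business Group])>} [AP Amount])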
Hope that gives you some ideas to start from!
How many reports can we generate in a dashboard?
Hi Gireesh. This depends how you define ‘report’. QlikView is very much an analysis tool rather than a reporting one. However, if you make use of features like Cycle groups and selection of expressions – perhaps by variables set by buttons – then you can get as many different views of your data as you need in a single table or chart.
Whether to combine facts or not also depends on the business model and the level of granularity. We cannot simply concatenate tables on the fly if they have different levels of granularity.
Steve,
Your screenshot of the 3 side-by-side multiboxes comes from 3 separate QV docs, correct? Also, you just use this as a basis for users to prune unwanted columns from each, correct? Also, if you are concatenating tables, each possibly from a separate business process, do you then add another column Ex. BusinessProcessType (‘Sales’, ‘Customer Service’, ‘Inventory’), to allow the users to filter on? Without this separate column, users would never really know which dimensional columns go with which measures. I am a dimensional modeling guy used to spinning individual data marts for each business process.
Hi Dave – the multiboxes pictured there are all from one application, a single table in fact. There are just three of them so that all fields can be seen at one time – rather than having to use the scroll bar. This is one of the ways I use for pruning fields – null and empty fields stand out very quickly using this method.
When concatenating tables (or more accurately when creating QVD’s for concatenation) I will always add in a field to say what type of data it is on each row. Often I will add other fields, such as source file name when loading from Excel – for example. The field defining the type of data in each row can be useful for users to select on – but I use it more often in Set Analysis when writing expressions. This is a very efficient way of ensuring QlikView only has to look at a smaller set of data when getting values from a large concatenated fact table.
Ah – that makes sense. Concerning concatenation, I just created a new forum post “I see no need to ever concatenate or use link tables”. I am only referencing that here, as it is verbose, so trying to save space. In that post I tried to illustrate my point. Maybe my logic is flawed, however :)
Dave, I’ve followed your lead and taken this thread over to QlikCommunity, here: http://community.qlikview.com/message/454533 – Many thanks for your input.
Nice
How do you associate two tables in a data model?
Tables will associate if there are one or more common fields (exact matches) between the two tables.
Hello Steve,
This is a very useful article indeed. The hint of adding the count 1 for customer is something I will definitely apply.
I am just about to start on a new project where the fact table, bringing back 2 years of data, is around 200 million records. I was leaning towards making a QVD for each month and concatenating them at the end.
The fact table contains a transaction date, dim 1, dim 2, dim 3, customer id, amount, no of transactions
There is a need to use the count(distinct customer_id) to identify unique customers and I am not too sure if this will prove costly.
There is a need to slice and dice the measure ‘ number of unique customers’ by dim 1, dim 2 and dim 3.
If not for this measure I could pre-aggregate and the size of the data would be greatly reduced.
I am wondering what the best approach towards this would be. Your advice on this would be highly appreciated.
Thanks in Advance
Hi Lav,
Some things you can only really tell by benchmarking. Given that counting customers is an important thing, it is probably worth having a customer table with a 1 as CustomerCount in it, associated on the CustomerID. If there is not a separate source table with customers in, you can create this by loading from other QVDs with WHERE NOT EXISTS (CustomerID); this will load the first instance of each customer record – giving a countable table.
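Roughly like this, assuming the transactions are in a QVD called Transactions.qvd (note that the Exists clause makes this a non-optimised load, and it must run before any other table containing CustomerID is loaded):

Customers:
LOAD
    CustomerID,
    1 as CustomerCount
FROM Transactions.qvd (qvd)
WHERE NOT Exists(CustomerID);    // keeps only the first row seen for each CustomerID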
A QVD for each month sounds sensible. You can then decide how far back data can change and only refresh those QVD’s. This only works though if there are sensible date based indexes in the source database. You can get cleverer and have daily QVD’s for the current month and then consolidate into a monthly QVD at the end of the month. See whether you need to optimise to that extent or not – if monthly files are quick enough I would stick with that.
Pre-aggregation is a good approach. You can aggregate to different levels based on the date, so for the current year you may have data for each day, but then aggregate to the 1st of the month for more historical data. Again, see if poor performance forces you down that route – if not don’t worry about it.
Steve
Hi Steve,
I have a task in which I need to create a data model from 3 different tables:
Table1
Tbl1 Docnumber,
Matnr,
Site,
Date,
Cost,
Unit,
SRP
Table2
Tbl1 Docnumber
Tbl2 Docnumber,
Matnr,
Site,
Date,
Cost,
Unit,
SRP,
Table3
Tbl2 Docnumber,
Matnr,
Site,
Date,
Cost,
Unit,
SRP
This is how the tables link when a natural load is used. The problem with this is that Matnr, Site and Date have their own dimension tables, which are also needed in the project. To solve the issue I used a link table with a key consisting of the three fields (Matnr, Site, Date) and disregarded the links on the Docnumbers. I encountered a problem with this approach because when I select a certain key it shows multiple doc numbers, which is wrong. Upon investigation, my key combinations are not present in only one document, so they are not good to use as a key. How can I link the tables on their doc numbers and at the same time load the dimension tables for Matnr, Site and Date?
Thanks in Advance!!
-Charles
Hi Charles. A link table certainly sounds like the route ahead. If your keys are duplicating it may be that you need to also place the document number in the key and have many more distinct values in your link table. It’s hard to second guess though without spending time looking at the data.
Hi Steve!
I have built a star schema data model for a requirement I am currently working on.
In the fact table there is Income Statement allocation logic and Balance Sheet distribution logic against various branches. Since the two are different, this resulted in two branch fields (IS_BRANCH and BS_BRANCH). Across the records there are often different branch names under the two fields, but not always.
Now, how do I unify branch selection across the dashboard?
Any ideas or suggestions would help!
Thanks,
KrX
Hi,
I tend to find that concatenating the various data sources, whilst keeping as many fields as possible named the same, is the best bet. If you have incomplete branch names – but the code is correct – I would suggest an ApplyMap onto the codes to get the name. In order to keep the concatenation optimised (and therefore much quicker to load from QVD) you may want to add null BS fields to your IS table and null IS fields to your BS table. By having the same structure in both QVDs, concatenation becomes easier.
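A sketch of that padding idea at the point where the QVDs are created (the branch field name is borrowed from the question above, the rest is illustrative):

// In the IS QVD generator: include the BS-only fields as nulls so both
// QVDs end up with identical field lists.
IS_Facts:
LOAD
    IS_BRANCH as Branch,
    Amount as [IS Amount],
    Null() as [BS Amount]
FROM IS_Source.qvd (qvd);

STORE IS_Facts INTO IS_Facts.qvd (qvd);
DROP TABLE IS_Facts;

// Do the reverse in the BS generator; the presentation app can then load
// both QVDs with plain optimised loads and they will auto-concatenate.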
Nice article!
Hi Steve, thank you so much for all your informative posts. I love the way you explain :) Pardon me if I am asking a very basic question. When I create a data model that involves more than 20 tables from different data sources, which of the following 2 methods is best practice? 1. Let QlikView join all the tables by association (having renamed the key fields appropriately) and calculate all the KPIs in the front end using complex expressions (mostly aggregated by count). 2. I have been given SQL queries by data modellers for every KPI. Do I implement these queries in LOAD scripts (each query/KPI per table), then join/concatenate all of them into 2 or 3 fact/dimension tables and finally write simple expressions (like count(Customers)) in the front end? Please advise. It would help me a lot.
Unfortunately it depends. Generally having things pre-aggregated for you in SQL means that you can get to the KPI values you need very quickly in QlikView and things perform well there. Too many associations may make your app feel a bit slow – but you will be able to slice and dice the data many different ways in the front end – and drill to detail. I would suggest seeing how things perform with the full data model and then decide if you need to revert to pre-aggregated values because of performance.
Thanks Steve! My final questions please – well, my requirement doesn’t involve a lot of slicing and dicing; rather, users might want to look at counts of customers split by a limited set of KPIs (like how many house loans have been requested/issued/rejected) on a time scale. There are very few dimensions but every KPI appears to be a fact (the key measure being the count of customers). I understand from your post that all facts need to be concatenated. But if I concatenate all KPI tables by the same field (Customer), my data model ends up having just one fact table and one calendar table. All dimensions are available in the fact itself. Is this a bad design? And sorry for being naive, but what does this mean – “add a ‘Count’ field to each table (ie. ‘1 as CustomerCount,’)”? I believe it would help me if I understood, since I am dealing with counts most of the time. Thanks again in advance :)
Hi Nisha, no, it sounds like the single table design here is exactly the right thing to be doing. You may want to ensure you have Set Analysis on all expressions so that you never aggregate numbers across KPIs.
The count field is simply a way of allowing you to get a count of rows by doing sum(MyCounter); this is more efficient than doing count(SomeField). To enable this you have to create the field in the load, like this:
LOAD
1 as MyCounter,
CustomerName,
KPI
…
Hope that makes sense?
Yeah I got it Steve :) Thanks a bunch for the info and your valuable time :)
Steve, great article. You mention this: “change any codes into values (eg. Y/N into Yes/No)”.
I’m pulling data in from various tables in a single SQL database and come across this a lot in many of the fields. For example, on a status field, where the user sees “issued”, “cancelled” or “stopped” (in the main application front end), the database stores “I”, “Ca” or “St”, which is what is then pulled into QlikView. What’s the best way to display the full value when the field (and values) are pulled into QlikView? Can you give a quick example?
Hi Dan, thanks for the comment.
If it is a simple Y/N you could put: if(upper(MyField) = 'Y', 'Yes', 'No') as MyNewField,
Or for fields with more values you could use ApplyMap: ApplyMap('Map_Status', Status, 'Other') as NewStatus,
ApplyMap is a very versatile statement, and I have written a blog post on it here:
https://www.quickintelligence.co.uk/applymap-is-it-so-wrong/
Hope that helps.
Steve
Steve,
Regarding this portion:
“I will often provide spreadsheets of the source field names (or even a printout if they are paper-people) and get them to fill in the required ‘user friendly’ field names. Rename to these field names as soon as you extract from the database – using an ‘as’ statement in the load.”
An even better/faster way to do this is to load the spreadsheet they return to you as a mapping table and then
Rename Fields Using FieldNameMap;
That way you don’t have to manually put in all those “as” and fieldnames into your script.
For ease of development, I tend to work with source table names in the script if they’re developer-friendly and then do the field renaming step at the end of the reload process, that gives the advantages of the friendly field names in the UI while making development easier by not requiring the extra characters (square brackets, spaces, etc).
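A minimal sketch of that renaming approach, assuming the returned spreadsheet has SourceName and FriendlyName columns:

FieldNameMap:
MAPPING LOAD
    SourceName,
    FriendlyName
FROM FieldNames.xlsx (ooxml, embedded labels, table is Sheet1);

// ... all extraction and transformation uses the source field names ...

RENAME FIELDS USING FieldNameMap;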
Great post!
I don’t type in the AS statements. I use a little Excel to create the statement. If the Original fieldname is in Column A, Keep Flag in Column B and New Name (optional) in C I would use an expression like this in column D:
=IF(B1 = "Y", CONCATENATE(" ", A1, " as [", IF(C1 = "", A1, C1), "],"), "")
This can then be copied and pasted directly into the script. The syntax may not be spot on (I’ve just typed it straight into the blog) but you get the idea.
I think that this approach is simpler for someone else reading the code later to see what is going on. I do think the RENAME FIELDS syntax is quite clever (and you will see from another post that I like MAPPING LOADs), but it is not something I use often.
Nice one Steve, it’s really informative :)
Great post Steve. A lot of people have an idea of QlikView as a tool with which you can just build great dashboards and reports from nothing.
I am seeing exponential growth of QlikView in Brazil, and it’s great, but it’s also important to remember that QlikView is nothing without a good data model behind it.
Thanks Washington. It’s amazing how many people think you can just use a RDBMS style data model and still have it perform.
Hi,
Guys, please help me with the query below.
I have a field [Actual 2013] and I want to take the Year from the field. Please suggest how to do this.
Waiting for your reply.
Thanks
Ali
If the fieldname is always [Actual 2013] then the Year is always 2013 – so you can just hard code this (2013 as Year,). I suspect that you have more than one column with years in though? If so, what you need to use is the CROSSTABLE function, so, if you have data like this:
Product,Region,Actual 2013,Actual 2014,Actual 2015 etc. etc.
You would need to do:
Temp_Table:
CROSSTABLE (ColumnHead,Value, 2)
LOAD
*
FROM DataSource.qvd (qvd);
Table:
LOAD
*,
right(ColumnHead, 4) as Year
RESIDENT Temp_Table;
DROP TABLE Temp_Table;    // otherwise the shared fields would form a synthetic key
Hopefully I’ve second guessed your data structure correctly and that gives a solution, or if not, it points in the right direction?
Hi Steve,
I have a question regarding the data model. I have a few fact tables storing data on different areas, all of which consist of millions of records. When creating the Qlik data model, should we concatenate all these fact tables into one single fact table, or use a link table as a bridge to link them up?
Which approach performs better?
That is the $1,000,000 question.
For pure performance concatenation would be best (create QVDs with identical columns so you can do an optimised load from all of them), but this can break important associations between data types. If that is the case, and performance is ‘okay’, then a link table may be better. I’ve had a case where we couldn’t get the performance from a link table, due to data volume, so we had a single fact table and built the associations manually using Set Analysis and the P() function.
If you can make sure it is purely facts in your concatenated fact table, and you have all dimension values off to the side, then the association issues can often be avoided anyway.
Thanks for your reply. (I saved $1,000,000 :D)
Actually, I think I used the wrong word; I think I should have said “JOIN” rather than “CONCATENATE”, because these tables could be joined with a key.
What I am thinking is: should we join these tables in the script to form the data model, rather than letting QlikView join them on the fly in the app? I’m thinking that joining on the fly with these huge tables will have a big impact on performance. Any thoughts?
In my opinion, using JOIN in the load is almost always a bad idea. Duplicate values are created where there is a one to many join, and this runs the risk of giving incorrect values; it also hurts performance. Associations (i.e. multiple tables in the Qlik data model) are almost always better, but as you rightly say they can slow things down in the front end. You may be pleasantly surprised at how efficient Qlik is at dealing with these. Bringing dimension tables into your fact tables is a different case altogether: using ApplyMap to bring in simple ID/Value type dimension values is always a good idea, and doesn’t carry the risk of a join. I have other posts on this kind of subject on this blog.
Hope that helps.