Qlik Sense can consume data from just about any data source and present it visually. One rich source of data you may not have considered though is the RSS feed for a blog. This can, however, be loaded by Sense just like any other source. This blog post describes how and gives the code for reading from WordPress (and perhaps other) RSS feeds.
Why Load Data From RSS Feeds?
The answer to that question is entirely up to you.
For me though it was purely to provide an example data set. Recently I made available my Instant Sense Application, which can load any data set and some simple metadata about those data and provide a user configurable application. Along with the application I wanted to provide some example data feeds, and loading details of 2,000+ blog posts from AskQV.com seemed like a nice example. I then set about seeing whether this would work.
If you want to know more about what the Instant Sense Application is and how it has been built, you will want to see my previous blog post, which is on that very topic.
Setting Up A REST Connection
In Sense we can’t just type a URL straight into our load script and pull data from it. We need to create a connection, which is then available as a library item. Out of the box each REST endpoint requires its own connection, and paging is a problem. That is why the WITH CONNECTION statement was introduced: it allows us to inject changes to our REST connection at run time. It’s worth creating a new connection while developing against a new source, but ultimately you only want two REST connections on your production Sense server, which you can then re-purpose. The reason it’s two connections is that you can not change the method (GET or POST) using WITH CONNECTION.
I always create a connection called GenericGET and point it to https://jsonplaceholder.typicode.com/posts. The type is set to GET and everything else is left as defaults.
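As a minimal sketch of that re-purposing idea (the endpoint here is the placeholder service the connection already points at, and the field names come from its JSON response), a load can override the URL at run time like this:

```qlik
// Re-point the GenericGET connection at run time via WITH CONNECTION
LIB CONNECT TO 'GenericGET';

Posts:
SQL SELECT
    "id",
    "title"
FROM JSON (wrap on) "root"
WITH CONNECTION (
    URL "https://jsonplaceholder.typicode.com/posts"
);
```

The same pattern, with a different URL injected, is what the RSS loads below rely on.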
The first part of our code is to connect to that library:
// Connect to our generic REST connection
LIB CONNECT TO 'GenericGET';
Configuring Blog Sources
Now that we have a REST connection, we need to tell Sense where to get data from. Often I use a temporary INLINE table for this kind of purpose. In this instance though the table forms part of the end data model, so there are a couple of extra fields in the INLINE load. Obviously you can change the sources, and/or use a different data source such as a spreadsheet:
// Provide a list of blogs that you wish to load information from
Blogs:
LOAD
    1 as BlogCount,
    [Blog URL],
    [Blog Name],
    [Blog Title]
INLINE [
Blog URL,Blog Name,Blog Title
https://www.quickintelligence.co.uk/,Quick Intelligence,The Quick Intelligence Blog
https://qlikshow.com/,QlikShow,Patrick Tehubijuluw
https://dataonthe.rocks/,Data On The Rocks,Data On The Rocks
https://qlikfreak.wordpress.com/,Qlik Freak,Useful Qlik Resources
];
You can add any number of sources, but it is imperative that they all share the same RSS structure. As mentioned above, all these are WordPress blogs so the RSS is common.
The next thing we need to do is loop around each of these. This is a pretty standard technique for enumerating over the rows of a table once you have data loaded into it. To get the values we need from the table we use the Peek() function.
// Loop around each of these feeds
for iBlog = 0 to NoOfRows('Blogs') - 1

    // Grab the URL of the next blog
    let sWPSite = peek('Blog URL', iBlog, 'Blogs');
Getting on the Right Page
It seems that WordPress RSS feeds always serve up ten posts at a time. Some may vary, but that is what I have observed. This drives the paging of our load: we keep loading from the RSS feed, newest posts to oldest, until we get fewer than ten posts back, which tells us we are on the last page. If the total number of posts is exactly divisible by ten that logic breaks down, and though it pains me to do so I have put an ERRORMODE setting in here (rather than pulling the number of posts from the RSS header and using that).
// Set some paging variables
let iDone = 0;
let iPage = 1;

// And loop for each page - the 500 is a back-stop, increase this if you need
do while iDone = 0 and iPage <= 500;

    // Set the paging URL - asking for paged=1 fails
    let sPage = if(iPage = 1, '', '?paged=$(iPage)');

    // Cheeky error mode setting - if number of posts is divisible by 10 (the default page size) a crash can occur
    set errormode = 0;
Note also here that we are setting a variable which contains nothing for page one, but has the paged=n value for all subsequent pages. The WordPress RSS does a redirect if you request page one, and that freaks out the REST connector.
Some RSS feeds may not require paging at all. For instance, the AskQV.com aggregated RSS feed (and many others I have looked at) just serve up all posts in one long feed.
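As a sketch of that simpler case (assuming the aggregated feed shares the same WordPress-style RSS structure, which you should verify for your own source), the whole paging loop collapses to a single pull:

```qlik
// Sketch: a non-paged feed needs just one pull, no do/loop or page suffix
let sFeedURL = 'https://askqv.com/feed/';

RawData:
SQL SELECT
    "__KEY_rss",
    (SELECT
        "__KEY_channel",
        "__FK_channel",
        (SELECT
            "title" AS "title_u0",
            "link" AS "link_u0",
            "creator",
            "pubDate",
            "description",
            "__KEY_item",
            "__FK_item"
        FROM "item" PK "__KEY_item" FK "__FK_item")
    FROM "channel" PK "__KEY_channel" FK "__FK_channel")
FROM XML "rss" PK "__KEY_rss"
WITH CONNECTION (
    URL "$(sFeedURL)"
);
```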
Fetch the RSS
This is the part which is going to vary the most, depending on the feed that you are pulling. As already mentioned, this code is good for quite a lot of blogs, but not all of them. Create a new REST connection pointing at the specific endpoint and let Sense generate some code for you. Tick only the fields that you are interested in, to slim things down a bit. The code that comes back from the REST connector wizard is always ugly, so make sure you tidy it up after the load. In this case we are only pulling one level of data, so the generated key fields are largely irrelevant (only __FK_item gets used, to filter down to the item rows).
// Pull the raw data from the feed
RawData:
SQL SELECT
    "__KEY_rss",
    (SELECT
        "__KEY_channel",
        "__FK_channel",
        (SELECT
            "title" AS "title_u0",
            "link" AS "link_u0",
            "creator",
            "pubDate",
            "description",
            // "encoded",
            // "commentRss",
            "__KEY_item",
            "__FK_item"
        FROM "item" PK "__KEY_item" FK "__FK_item")
    FROM "channel" PK "__KEY_channel" FK "__FK_channel")
FROM XML "rss" PK "__KEY_rss"
WITH CONNECTION (
    URL "$(sWPSite)feed/$(sPage)"
);

set errormode = 1;
Note that the ERRORMODE has been set back to reporting errors at the end of each RSS load, so any other failure will notify the user and cause the load to stop.
Formatting the Data
Now we just have to tidy up the data that we have received. As it comes to us in XML, everything is plain text. This means that the date and time field is only a string, which needs to be converted to a dual value. We also want different date breakdowns. In order to have a little more to analyse, some other values are derived from the data. All relatively straightforward:
// Tidy up the load of fields for the current page
BlogPosts:
LOAD
    *,
    Date(DayStart([Date Time]), 'DD MMM YYYY') as Date,
    Date(MonthStart([Date Time]), 'MMM YYYY') as Month,
    Year([Date Time]) as Year,
    Month([Date Time]) as [Month Name],
    WeekDay([Date Time]) as Day,
    Hour([Date Time]) as Hour,
    Time(Round(([Date Time] - Floor([Date Time])), (1/24/4)), 'hh:mm') as Time, // to nearest 15 minutes
    today() - DayStart([Date Time]) as [Days Ago],
    len([Post Title]) as [Title Length]
    ;
LOAD
    1 as PostCount,
    '$(sWPSite)' as [Blog URL],
    [title_u0] as [Post Title],
    [link_u0] as [Post URL],
    [creator] as [Post Author],
    Date(Date#(mid(pubDate, 6, 17), 'DD MMM YYYY hh:mm'), 'DD MMM YYYY hh:mm') as [Date Time],
    [description] as Description
    //,
    // [encoded] as [Full Text],
    // [commentRss] as [Comment Feed]
RESIDENT RawData
WHERE NOT IsNull([__FK_item]);
I’m sure that there are further things which can be derived, but this gave me enough for my purposes. Notice that I have commented out the load of the full text of the blog post. This would give plenty of detail for searching, but is not helpful for analysis. The commentRss value gives an endpoint for all of the comments on the given post. You can craft code to go off and get those as well, if you feel so inclined.
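If you do feel so inclined, a sketch of the idea follows. It assumes you have first uncommented the commentRss line above so that [Comment Feed] exists in BlogPosts, and that the comment feed shares the standard WordPress RSS item structure; the Comments table and sCommentFeed variable are names I have made up for illustration:

```qlik
// Hypothetical: pull the comment feed for the first post in BlogPosts
// (loop over all rows in the same way as the blog loop if you want every post)
let sCommentFeed = peek('Comment Feed', 0, 'BlogPosts');

Comments:
SQL SELECT
    "__KEY_rss",
    (SELECT
        "__KEY_channel",
        "__FK_channel",
        (SELECT
            "creator",
            "pubDate",
            "description",
            "__KEY_item",
            "__FK_item"
        FROM "item" PK "__KEY_item" FK "__FK_item")
    FROM "channel" PK "__KEY_channel" FK "__FK_channel")
FROM XML "rss" PK "__KEY_rss"
WITH CONNECTION (
    URL "$(sCommentFeed)"
);
```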
There are a few bits of tidying up to do at the end of our script: finishing off the loop for each of the pages and the loop for each of the blogs, and finally, because we are good tidy coders, clearing out the variables we created along the way:
// Check if we need another page and clean up
let iDone = if(alt(NoOfRows('RawData'), 0) < 10, 1, iDone);
let iPage = iPage + 1;
DROP TABLE RawData;
loop

// End loop of blogs
next

// Tidy up behind ourselves
let iBlog = ;
let sPage = ;
let iDone = ;
let iPage = ;
So, there you have it, all you need to load details of all the posts from your favourite blogs into Sense.
Visualising the Results
Once you have the data in Sense you can obviously start building your own charts and tables over those data. Another approach though is to leverage our Instant Sense Application. Included in this free download is the load script above and a metadata definition file which populates the application with all the measures and dimensions to allow you to visualise these data.
If you would like to try the application, please go to https://www.quickintelligence.co.uk/isa and fill in the form. There is also a video there which demonstrates its use.
Also included with the download are load scripts which demonstrate the use of open APIs (Kids in Data and Magic The Gathering), Qlik Cloud Connectors (Twitter) and another RSS load (AskQV). Further data sources will be added over time – if you have a suggestion for one you would like to see please use the comments below.