Website content analysis is about understanding which pages are seen by most of the users, and helps us to identify where to focus to improve the website performance.
First of all, let us see the data that I am going to use for deriving the insights.
Table Name : website_pageviews
SELECT * FROM website_pageviews
WHERE website_pageview_id < 1000
There is a pageview_id column which is incremented for each page view by the user automatically. created_at column is the date and time in which the particular page has been viewed. website_session_id is the column which is used to identify a user by using cookies. And finally the pageview_url is the name of the page the users viewed. I have limited the rows to 1000 for simplicity purposes in the query above.
Now, since we are aware about the columns and the table, we can dive into the insights in a Q/A manner. We will be answering some analytical questions which will be used to make decisions and improve the website performance.
We have to find which page is viewed first by the users. We can use the created_at or the website_pageview_id column to find it easily. For instance, we can say that the minimum value of the pageview_id indicates that it is seen first by the user in a session. Note: A website_session is a user viewing the website, we track it by using site cookies.
First, check the first entry pages:
SELECT
website_session_id,
MIN(website_pageview_id) as min_pv_id
FROM website_pageviews
WHERE website_pageview_id < 1000
GROUP BY website_session_id;
Here, I am getting the first pageview id that a user has viewed by grouping with the website_session_id.
We can see that, for each website_session_id, we are able to get the first viewed pageview_id by using the MIN function in the query.
Now, we need to create this as a temporary table so that we can join it with the original website_pageviews table so that we can find the page which is viewed the most by the users.
Note : TEMPORARY TABLE is a table which will only exists for that particular workbench session in mysql.
CREATE TEMPORARY TABLE first_pageview
SELECT
website_session_id,
MIN(website_pageview_id) as min_pv_id
FROM website_pageviews
WHERE website_pageview_id < 1000
GROUP BY website_session_id;
Here, I have created a temporary table named first_pageview which holds the website_session_id and its respective first viewed page by a user.
Now, We can join the two table to find out which page_url has more views.
SELECT
website_pageviews.pageview_url AS landing_page,
COUNT(DISTINCT first_pageview.website_session_id) AS session_landing_this_page
FROM first_pageview
LEFT JOIN website_pageviews
ON first_pageview.min_pv_id = website_pageviews.website_pageview_id
GROUP BY website_pageviews.pageview_url;
We are grouping both the tables to get the count of first viewed pages and its page_url.
And we are getting /home page as entry page for all the users. So we should try to keep that page more user friendly and enhance its performance, because it is the most viewed landing page for the first 1000 users we tracked.
コメント