Spotify Data 2023 - analysis #2

Merged
merged 3 commits into from
Feb 7, 2024
solve: Easy #1-#5
faizanxmulla committed Feb 4, 2024
commit e69f5458bfbaa9b78abdae3252db668378f06e75
44 changes: 44 additions & 0 deletions datalemur-solutions/1 - Easy/01-histogram-of-tweets.sql
@@ -0,0 +1,44 @@
-- Group the users by the number of tweets they posted in 2022 and count the number of users in each group.

-- Solution 1 : using subquery

SELECT tweet_count_per_user AS tweet_bucket,
       COUNT(user_id)       AS users_num
FROM   (SELECT user_id,
               COUNT(tweet_id) AS tweet_count_per_user
        FROM   tweets
        -- half-open range: still covers all of 2022-12-31 if tweet_date is a timestamp
        WHERE  tweet_date >= '2022-01-01'
               AND tweet_date < '2023-01-01'
        GROUP  BY 1) AS total_tweets
GROUP  BY 1;


-- Solution 2 : using CTE

WITH total_tweets
     AS (SELECT user_id,
                COUNT(tweet_id) AS tweet_count_per_user
         FROM   tweets
         -- half-open range: still covers all of 2022-12-31 if tweet_date is a timestamp
         WHERE  tweet_date >= '2022-01-01'
                AND tweet_date < '2023-01-01'
         GROUP  BY 1)

SELECT tweet_count_per_user AS tweet_bucket,
       COUNT(user_id)       AS users_num
FROM   total_tweets
GROUP  BY 1;


-- My first approach: use SUBSTRING or EXTRACT(YEAR FROM ...) to pick out the tweets from 2022.
-- This query only extracts the year as text; it does not filter or count yet.

SELECT SUBSTRING(CAST(tweet_date AS VARCHAR), 1, 4) AS subs
FROM   tweets;
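
-- A possible variant of the idea above (a sketch, assuming PostgreSQL and that
-- tweet_date is a DATE or TIMESTAMP column): filter on the year directly
-- instead of casting the date to text.

SELECT user_id,
       COUNT(tweet_id) AS tweet_count_per_user
FROM   tweets
WHERE  EXTRACT(YEAR FROM tweet_date) = 2022
GROUP  BY user_id;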

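-- Approach towards the solution:
-- First, find the number of tweets posted by each user in 2022.
-- Then group that result by tweet count and count the users in each bucket.

SELECT user_id,
       COUNT(tweet_id) AS tweet_count_per_user
FROM   tweets
-- half-open range: still covers all of 2022-12-31 if tweet_date is a timestamp
WHERE  tweet_date >= '2022-01-01'
       AND tweet_date < '2023-01-01'
GROUP  BY 1;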


23 changes: 23 additions & 0 deletions datalemur-solutions/1 - Easy/02-data-science-skills.sql
@@ -0,0 +1,23 @@
-- Given a table of candidates and their skills, you're tasked with finding the candidates best suited for an open Data Science job. You want to find candidates who are proficient in Python, Tableau, and PostgreSQL.

-- Write a query to list the candidates who possess all of the required skills for the job. Sort the output by candidate ID in ascending order.


-- solution :

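-- GROUP BY already returns one row per candidate, so DISTINCT is not needed.
-- Assumes no duplicate (candidate_id, skill) rows; otherwise use COUNT(DISTINCT skill).
SELECT candidate_id
FROM   candidates
WHERE  skill IN ('Python', 'Tableau', 'PostgreSQL')
GROUP  BY 1
HAVING COUNT(skill) = 3
ORDER  BY 1;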


-- First approach (incorrect): returns every candidate who has at least one of the three skills.

SELECT DISTINCT candidate_id
FROM   candidates
WHERE  skill IN ('Python', 'Tableau', 'PostgreSQL')
GROUP  BY 1;

-- REMARKS: I didn't read the question properly. Always look at the expected output table, if one is given, and work backwards from it.
52 changes: 52 additions & 0 deletions datalemur-solutions/1 - Easy/03-page-with-no-likes.sql
@@ -0,0 +1,52 @@
-- Assume you're given two tables containing data about Facebook Pages and their respective likes (as in "Like a Facebook Page").

-- Write a query to return the IDs of the Facebook pages that have zero likes. The output should be sorted in ascending order based on the page IDs.


-- Solution 1: using NOT IN (my approach)

SELECT page_id
FROM   pages
WHERE  page_id NOT IN (SELECT page_id
                       FROM   page_likes
                       WHERE  page_id IS NOT NULL)
ORDER  BY page_id;
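
-- Why the IS NOT NULL filter in Solution 1 matters (a sketch, assuming standard
-- SQL three-valued logic as PostgreSQL implements it): if the subquery returned
-- a NULL, "x NOT IN (...)" could never evaluate to TRUE, so no row would pass.
-- The standalone query below returns zero rows for that reason.

SELECT 1 AS kept
WHERE  1 NOT IN (2, NULL);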


-- Solution 2: using NOT EXISTS

SELECT page_id
FROM   pages
WHERE  NOT EXISTS (SELECT page_id
                   FROM   page_likes AS likes
                   WHERE  likes.page_id = pages.page_id)
ORDER  BY page_id;


-- Solution 3: using a LEFT JOIN (my initial approach)

SELECT pages.page_id
FROM   pages
       LEFT OUTER JOIN page_likes AS likes
                    ON pages.page_id = likes.page_id
WHERE  likes.page_id IS NULL
ORDER  BY pages.page_id;


-- Solution 4: using EXCEPT (kind of ingenious)

SELECT page_id
FROM   pages
EXCEPT
SELECT page_id
FROM   page_likes
ORDER  BY page_id;



-- First approach (does not work): an aggregate such as COUNT() is not allowed in WHERE,
-- and an INNER JOIN drops exactly the pages that have no likes.

SELECT page_id
FROM   pages p
       INNER JOIN page_likes pl
               ON p.page_id = pl.page_id
WHERE  COUNT(pl.page_id) = 0
ORDER  BY 1

-- REMARKS:
-- 1. Saw two tables and immediately joined them without thinking. The problem is easier to solve without a join.
-- 2. Remember the set operators EXCEPT and INTERSECT.
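
-- Sketch for remark 2: INTERSECT is the counterpart of EXCEPT. Under the same
-- schema used above (pages.page_id, page_likes.page_id), this lists the pages
-- that have at least one like.

SELECT page_id
FROM   pages
INTERSECT
SELECT page_id
FROM   page_likes
ORDER  BY page_id;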
13 changes: 13 additions & 0 deletions datalemur-solutions/1 - Easy/04-unfinished-parts.sql
@@ -0,0 +1,13 @@
-- Tesla is investigating production bottlenecks and they need your help to extract the relevant data. Write a query to determine which parts have begun the assembly process but are not yet finished.

-- Assumptions:

-- 1. parts_assembly table contains all parts currently in production, each at varying stages of the assembly process.
-- 2. An unfinished part is one that lacks a finish_date.


-- Solution (as well as first approach):
SELECT part, assembly_step
FROM parts_assembly
WHERE finish_date IS NULL;

52 changes: 52 additions & 0 deletions datalemur-solutions/1 - Easy/05-laptop-vs-mobile-viewership.sql
@@ -0,0 +1,52 @@
-- Assume you're given the table on user viewership categorised by device type where the three types are laptop, tablet, and phone.

-- Write a query that calculates the total viewership for laptops and mobile devices where mobile is defined as the sum of tablet and phone viewership. Output the total viewership for laptops as laptop_views and the total viewership for mobile devices as mobile_views.



-- Solution 1: using CASE WHEN ... THEN ... ELSE

SELECT SUM(CASE
             WHEN device_type = 'laptop' THEN 1
             ELSE 0
           END) AS laptop_views,
       SUM(CASE
             WHEN device_type IN ('tablet', 'phone') THEN 1
             ELSE 0
           END) AS mobile_views
FROM   viewership;


-- Solution 2: using FILTER (didn't know FILTER existed)

SELECT COUNT(*) FILTER (WHERE device_type = 'laptop')              AS laptop_views,
       COUNT(*) FILTER (WHERE device_type IN ('tablet', 'phone')) AS mobile_views
FROM   viewership;


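-- Solution 3: using a self-join (would not have thought of this)
-- Note: COUNT(DISTINCT user_id) counts distinct users rather than rows, so it matches
-- Solutions 1 and 2 only when no user appears more than once per device group.

SELECT COUNT(DISTINCT a.user_id) AS "laptop_views",
       COUNT(DISTINCT b.user_id) AS "mobile_views"
FROM   viewership AS a
       INNER JOIN viewership AS b
               ON a.device_type = 'laptop'
                  AND b.device_type IN ('tablet', 'phone');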



-- First approach: got stuck here. The CASE expressions are missing THEN, so this does not parse.
-- SUM(CASE ... THEN 1 ELSE 0 END), as in Solution 1, fixes it.

SELECT COUNT(CASE
               WHEN device_type = 'laptop'
             END) AS laptop_views,
       COUNT(CASE
               WHEN device_type IN ('tablet', 'phone')
             END) AS mobile_views
FROM   viewership;
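
-- A working variant of the attempt above (a sketch, not part of the original
-- solutions): COUNT skips NULLs, so a CASE with THEN 1 and no ELSE branch
-- counts only the matching rows.

SELECT COUNT(CASE WHEN device_type = 'laptop' THEN 1 END)              AS laptop_views,
       COUNT(CASE WHEN device_type IN ('tablet', 'phone') THEN 1 END) AS mobile_views
FROM   viewership;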



-- REMARKS:
-- 1. Didn't think of 'THEN 1 ELSE 0', which makes this very easy to solve.
-- 2. NOT IN ('laptop') can replace IN ('tablet', 'phone'), but only because laptop, tablet, and phone are the only three device types.