Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datalemur Solutions - Easy #5

Merged
merged 14 commits into from
Mar 10, 2024
Prev Previous commit
Next Next commit
solve: Easy #12-#18
  • Loading branch information
faizanxmulla committed Feb 22, 2024
commit 5f715dd38a0023541ac0db2429ccadf3427ca32f
44 changes: 44 additions & 0 deletions datalemur-solutions/1 - Easy/12-app-clickthrough-rate-ctr-.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
-- Assume you have an events table on Facebook app analytics. Write a query to calculate the click-through rate (CTR) for the app in 2022 and round the results to 2 decimal places.

-- Definition and note:

-- - Percentage of click-through rate (CTR) = 100.0 * Number of clicks / Number of impressions
-- - To avoid integer division, multiply the CTR by 100.0, not 100.


-- Solution 1: (my approach too)

SELECT app_id,
ROUND(100.0 *
COUNT(*) FILTER(WHERE event_type='click') /
COUNT(*) FILTER(WHERE event_type='impression'), 2) as ctr
FROM events
WHERE EXTRACT(year from timestamp) = '2022'
GROUP BY 1


-- other approaches :

-- 1. using SUM(CASE)

ROUND(100.0 *
SUM(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END) /
SUM(CASE WHEN event_type = 'impression' THEN 1 ELSE 0 END), 2) AS ctr


-- 2. using COUNT(CASE)

ROUND(100.0 *
COUNT(CASE WHEN event_type = 'click' THEN 1 ELSE NULL END) /
COUNT(CASE WHEN event_type = 'impression' THEN 1 ELSE NULL END), 2) AS ctr


-- 3. using SUM(FILTER)

ROUND(100.0 *
SUM(1) FILTER (WHERE event_type = 'click') /
SUM(1) FILTER (WHERE event_type = 'impression'), 2) AS ctr



-- remarks : messed up the brackets.
31 changes: 31 additions & 0 deletions datalemur-solutions/1 - Easy/13-second-day-confirmation.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
-- Assume you're given tables with information about TikTok user sign-ups and confirmations through email and text. New users on TikTok sign up using their email addresses, and upon sign-up, each user receives a text message confirmation to activate their account.

-- Write a query to display the user IDs of those who did not confirm their sign-up on the first day, but confirmed on the second day.

-- Definition:
-- action_date refers to the date when users activated their accounts and confirmed their sign-up through text messages.


-- user_id
-- <> confirm signup on first date but on second date.


SELECT DISTINCT e.user_id
FROM emails e JOIN texts t USING(email_id)
WHERE DATE_PART('Day', t.action_date - e.signup_date) = 1 and signup_action='Confirmed'




-- other approaches :

1. WHERE t.action_date = e.signup_date + INTERVAL '1 day'

2. EXTRACT(DAY FROM (t.action_date - e.signup_date)) = 1

3. CAST(action_date AS DATE) = CAST(signup_date AS DATE)+1

4. t.action_date::DATE - e.signup_date::DATE = 1


-- remarks: was trying to work with DATEDIFF() and DATE_ADD(), but didn't realize that it doesn't work in PostgreSQL.
24 changes: 24 additions & 0 deletions datalemur-solutions/1 - Easy/14-cards-issued-difference.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
-- Your team at JPMorgan Chase is preparing to launch a new credit card, and to gain some insights, you're analyzing how many credit cards were issued each month.

-- Write a query that outputs the name of each credit card and the difference in the number of issued cards between the month with the highest issuance cards and the lowest issuance. Arrange the results based on the largest disparity.


-- Solution 1 (solved)
SELECT card_name, MAX(issued_amount) - MIN(issued_amount) as difference
FROM monthly_cards_issued
GROUP BY card_name
ORDER BY 2 DESC




-- my initial approach:

SELECT DISTINCT(card_name)
FROM monthly_cards_issued
WHERE issued_amount = (
SELECT MAX(issued_amount) - MIN(issued_amount)
FROM monthly_cards_issued)


-- remarks: complicated the problem initially, but found the solution after some time.
32 changes: 32 additions & 0 deletions datalemur-solutions/1 - Easy/15-compressed-mean.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
-- You're trying to find the mean number of items per order on Alibaba, rounded to 1 decimal place using tables which includes information on the count of items in each order (item_count table) and the corresponding number of orders for each item count (order_occurrences table).


SELECT ROUND(SUM(item_count::decimal * order_occurrences) / SUM(order_occurrences), 1) as mean
FROM items_per_order



-- other approaches:

-- 1.
ROUND((SUM(order_occurrences * item_count) / SUM(order_occurrences))::numeric, 1) AS mean

-- 2.
ROUND(CAST(SUM(item_count*order_occurrences) / SUM(order_occurrences) AS DECIMAL),1) AS mean






-- my approach:

-- mean #items per order
-- round to 1

SELECT ROUND(SUM(item_count * order_occurrences) / SUM(order_occurrences), 1) as mean
FROM items_per_order



-- remarks : both item_count and order_occurrences are of integer type by default, which means that division will return an integer result. To ensure that the output is rounded to 1 decimal place, we can cast either column to a decimal type.
18 changes: 18 additions & 0 deletions datalemur-solutions/1 - Easy/16-pharmacy-analytics-1.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
-- CVS Health is trying to better understand its pharmacy sales, and how well different products are selling. Each drug can only be produced by one manufacturer.

-- Write a query to find the top 3 most profitable drugs sold, and how much profit they made. Assume that there are no ties in the profits. Display the result from the highest to the lowest total profit.

-- Definition:

-- - cogs stands for Cost of Goods Sold which is the direct cost associated with producing the drug.
-- - Total Profit = Total Sales - Cost of Goods Sold



SELECT drug, (total_sales - cogs) as total_profit
FROM pharmacy_sales
ORDER BY 2 DESC
LIMIT 3


-- remarks: pretty straightforward.
16 changes: 16 additions & 0 deletions datalemur-solutions/1 - Easy/17-pharmacy-analytics-2.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
-- CVS Health is analyzing its pharmacy sales data, and how well different products are selling in the market. Each drug is exclusively manufactured by a single manufacturer.

-- Write a query to identify the manufacturers associated with the drugs that resulted in losses for CVS Health and calculate the total amount of losses incurred.

-- Output the manufacturer's name, the number of drugs associated with losses, and the total losses in absolute value. Display the results sorted in descending order with the highest losses displayed at the top.



SELECT manufacturer, COUNT(drug) as drug_count, ABS(SUM(total_sales - cogs)) as total_loss
FROM pharmacy_sales
WHERE total_sales - cogs <= 0
GROUP BY 1
ORDER BY 2 DESC, 3 DESC


-- remarks: didn't do -> SUM(total_sales - cogs) and directly took absolute value.
27 changes: 27 additions & 0 deletions datalemur-solutions/1 - Easy/18-pharmacy-analytics-3.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
-- CVS Health wants to gain a clearer understanding of its pharmacy sales and the performance of various products.

-- Write a query to calculate the total drug sales for each manufacturer. Round the answer to the nearest million and report your results in descending order of total sales. In case of any duplicates, sort them alphabetically by the manufacturer name.

-- Since this data will be displayed on a dashboard viewed by business stakeholders, please format your results as follows: "$36 million".


-- Solution 1:


SELECT manufacturer, CONCAT('$', ROUND(SUM(total_sales) / 1000000) ,' million') AS sales_mil
FROM pharmacy_sales
GROUP BY 1
ORDER BY SUM(total_sales) DESC, 1




-- my approach:

SELECT manufacturer, CONCAT('$', ROUND(SUM(total_sales) / 1000000) ,' million') AS sales_mil
FROM pharmacy_sales
GROUP BY 1
ORDER BY 2 DESC, 1


-- remarks: was performing "ORDER BY 2" instead of "ORDER BY sum(total_sales)".