Sup, Hi, Hello, Good (evening/morning/afternoon). I created this repository in purpose to study the LaTeX basis, due to the Microsoft Word's tools are not enough for me, and perhaps it will be helpful in my future education. The task has been presented by AAA Media Corporation Rambler Group. I'd like to use this repository as a representation of me (I mean how do I work) for the corporations and other people. Soooo, let's plunge into the tasks:)
This task checks whether how well you know the SQL, or how well you use Google, or both.
Rambler Group's advertising campaign uses most fascinating and most memorable banners. Analytics have access to the databases containing data regard the banner' showing. The Shows_table contains:
- show_id - an identifier of a showing
- day - a day of a showig
show_id | day |
---|---|
12367 | 2018-10-04 |
28736 | 2019-02-22 |
19862 | 2019-01-31 |
The Click_table contains:
- click_id - a show identifier clicked by an user
- bounce - an user dismissing from an advertising after click (0 - when an user relinked to the site he keened in the information on the site. 1 - an user immediatly left the site.)
show_id | bounce |
---|---|
12367 | 1 |
28735 | 0 |
15627 | 0 |
You need to get all users who clicked at a banner in February 2020, and they din't reject an advertising.
- First of all, I'd like to obtain all users, I mean the whole database of users, without any filters. But I banned by the task, because it is clearly said that we need to fetch the "users", however I do not really aware what exactly is the "users". Ensuing of this I decided to derive the show_id (it also can be the click_id, result will be the same).
SELECT show_id FROM Shows_table
- The next scanty modification is to retrieve unique show_id. For instance, if an user clicked twice on the same banner, it shouldn't be represented twice, therefore distinct it.
SELECT DISTINCT show_id FROM Shows_table
- The next step is to detect those who clicked and remained on the advertising site. The INER JOIN the most appropriating command for this purpose.
The INNER JOIN keyword selects records that have matching values in both tables
SELECT DISTINCT show_id FROM Shows_table
INNER JOIN Clicks_table ON
show_id = click_id AND bounce ='0'
Let's deem this query on the example
|
|
In the above case the banners which ids are 15627 and 28735 are those banners that an user remained by clicking on. So, this request returns such list:
SELECT DISTINCT show_id FROM Shows_table
12367 |
---|
28736 |
15627 |
If be honest this one does the same as the previous one
SELECT DISTINCT show_id FROM Shows_table
INNER JOIN Clicks_table ON
show_id = click_id
And the last thing in this subsection is to filter the case when an user remains on the advertising site. According the task "bounce - an user dismissing from an advertising after click (0 - when an user relinked to the site he keened in the information on the site. 1 - an user immediatly left the site.)" This request perfectly does this, and returns such id's list.
28736 |
---|
15627 |
- The last what we should to do is to distill by date. The task says "You need to get all users who clicked at a banner in February 2020, and they din't reject an advertising.", it does pretty simple in SQL, by adding up this to the request.
... day BETWEEN '2020-02-01' AND '2020-02-29'
- The result looks like this:
SELECT DISTINCT show_id FROM Shows_table
INNER JOIN Clicks_table ON
show_id = click_id AND bounce ='0' AND day BETWEEN '2020-02-01' AND '2020-02-29'
In our example this query has to return only the 15627, because this is the only click_id within February 2020.
In purpose to test my solution I was using the SQLFiddle and PostgreSQL 9.6.
- I created the test-tables, using DLL, by these queries:
CREATE TABLE Shows_table (
show_id INT ,
day DATE
);
CREATE TABLE Clicks_table (
click_id INT ,
bounce BOOLEAN
);
- Fill them (tables) up, by using these requests
INSERT INTO Shows_table ( show_id, day) VALUES (12367, '2018-10-04');
INSERT INTO Shows_table ( show_id, day) VALUES (28736, '2019-02-22');
INSERT INTO Shows_table ( show_id, day) VALUES (19862, '2019-01-31');
INSERT INTO Shows_table ( show_id, day) VALUES (11111, '2020-02-04');
INSERT INTO Shows_table ( show_id, day) VALUES (22222, '2020-02-01');
INSERT INTO Shows_table ( show_id, day) VALUES (33333, '2020-02-29');
INSERT INTO Clicks_table ( click_id, bounce ) VALUES (12367, '1');
INSERT INTO Clicks_table ( click_id, bounce ) VALUES (28736, '0');
INSERT INTO Clicks_table ( click_id, bounce ) VALUES (19862, '0');
INSERT INTO Clicks_table ( click_id, bounce ) VALUES (11111, '1');
INSERT INTO Clicks_table ( click_id, bounce ) VALUES (22222, '0');
INSERT INTO Clicks_table ( click_id, bounce ) VALUES (33333, '0')
And now the tables looks like these:
|
|
- For this example the query (my solution) returns the 22222 and 33333.
- 12367 - isn't suitable, because an user left the site and it was in October 2018
- 28736 - isn't suitable, because it was in February 2019.
- 19862 - isn't suitable, because it was in January 2019.
- 11111 - isn't suitable, because an user left the site.
The answer for the First task
SELECT DISTINCT show_id FROM Shows_table
INNER JOIN Clicks_table ON
show_id = click_id AND bounce ='0' AND day BETWEEN '2020-02-01' AND '2020-02-29'
This task checks your ability to use math.
The friendly Rambler Group's community likes to play in the table football: At the odd days they play before lunch, at the even days the play after lunch. They are splitting at the N teams among each other, and every team plays with each another team. Because of the splitting onto the teams is randomly, the product of the games is random. Also I would note that there are no ties. Only win or lose.
- Estimate the probability if one of the teams will finish the tournament without defeat.
- How many times do you need to hold a tournament, so that with a probability of 98% at least once this happened?
What we have: |
---|
N - number of teams P (win) = 1/2 P (lose) = 1/2 |
Necessary to seek: |
---|
P (if one of the teams will finish the tournament without defeat) - ? |
There are two ways to solve it, the First one is the combinatorics and the second is the probability theory. I will show the each one.
-
First of all we need to estimate how many games N teams plays. Because of each team plays with every another team, therefore the number of games is the number of combination N by 2. ("by 2" because two teams participate in a game.)
I'd like to recall that the factorial of a positive integer n, denoted by n!, is the product of all positive integers less than or equal to n. For instance, the factorial of N is 1 × 2 × 3 × ... × (N − 2) × (N − 1) × N.
-
Each team either wins nor loses, therefore there are two outcomes. Ensuing of this the amount of all possible outcomes in whole tournament is
-
Let's assume that the team A is won each game in the tournament. It means that from the (N - 1) games the team A is won (N - 1) games. Moreover, it means that the quantity of uncertain game's outcomes declines on (N - 1). Therefore the overall number of outcomes, with the condition that team A wins all games is:
-
However, nobody knows which exactly team wins all games. It's not necessary that team A wins, as same as A it could be either team B, nor team C, nor team D and so on. If we will deem each case (for each team), then the overall amount of outcomes increases by N (number of teams) times. For N teams it (number of all possible outcomes, where one of the teams wins) equals to:
-
Recall that probability definition:
The probability of an event is a number indicating how likely that event will occur. And the probability of an event is: where m - the number of demanded outcomes and N - the number of all possible outcomes.
Therefore to obtain the probability of a team will finish the tournament without defeat we need to divide the number of outcomes where one of the teams is won whole tournament by number of all possible outcomes
-
Here it is, the answer. It means that the probability of a team wins the whole tournament without lose is
If be honest this approach much easier than previous one.
-
Let's assume that the team A wins each game in the tournament, and the probability of this event is - the probability of an outcome (whether lose nor victory) of one game. And (N − 1) the number of games which plays the team A.
-
However the probability of that the one of the teams wins the tournament is
-
So, the probability that one of the teams wins the tournament without defeat is
How many times do you need to hold a tournament, so that with a probability of 98% at least once this happened (one of the teams wins the tournament without defeat)?
-
Let x be the quantity of tournaments required to befell to approximate the probability of the event (victory without defeat) to 98%.
-
Let's explore the equation above. In the previous part we've figured out the probability that one of the teams wins the whole tournament without defeat is or . Of course the probability of that this event never happens is (100% - probability of happens at least once). Therefore, in our case it looks like , in other words this equation shows us the probability that there is no a team which wins a tournament without defeat. But this probability represents the only tournament, however we have x tournaments, and in the each one there is shouldn't be a team which wins at least 1 tournament without defeat.
And the last thing, when we subtracting from 100% the probability of the event when there is no a team which wins at least one tournament without defeat. This shows us the probability when a team which wins at least one tournament exsists.
-
And according the task's condition this probability equal to 98% or 0.98
-
And use log-power rule
-
Derive the x
Here is it! The number of required tournaments is Of course it depends on number of teams, for instance for two teams the only tournament enough.
The answer for the Problem 1