This exercise consists of developing a simple tool to compute the continuity of work for the workers on the platform.
In worker_activity.csv
(please use the file provided in this repository), you will find data structured as below (here ordered by Worker for convenience):
Worker, Employer, Role, Date
1435, 234, 86, 2021-11-01 17:00:00
1435, 234, 86, 2021-11-04 12:30:00
1435, 234, 86, 2021-11-08 07:00:00
135, 45, 696, 2021-11-25 18:00:00
135, 45, 95, 2021-11-27 18:00:00
135, 45, 95, 2021-11-29 22:15:00
456, 78, 576, 2021-11-02 05:00:00
456, 78, 576, 2021-11-27 14:30:00
We want to generate a report describing the continuity of work for each user as of 2021-12-01. This date is important when considering the first requirement (no activity for 6 days), as this should be generated as if we're running the report on 1st December 2021 (as opposed to today).
We increment continuity for each day worked, and the counter is reset when one of the rules below applies:
- A worker had no activity for more than 6 days.
- A worker stayed active but switched to a different employer.
- A worker stayed active but switched to a different role.
In the dataset above, this gives:
- Worker
1435
worked three different days between November 1st and November 8th, and never switched role nor employer, but the report is at 1st December so more than 6 days have passed. So continuity=0 - Worker
135
appears three times, but switched role after the first occurrence; thus we only count the last two positions, therefore continuity=2. - Worker
456
worked a first time on 2nd November 2021, then paused for more than 6 days, then worked again at the end of November 2021; so continuity=1.
We expect a results.csv
file following this format, ordered by Continuity
(descending order).
The field Continuity
is expressed in days.
Worker,Continuity
1435,3
135,2
456,1
We don't expect your solution to use sophisticated paradigms or frameworks. However, you can spend as much time you want to demonstrate your skills. You could, for example, implement some tests to ensure your output is correct.
As part of your submission, please answer the following questions in a readme.txt
file.
- What is the complexity of your algorithm?
- We provided a CSV file for convenience, but in a production environment, what format or database would you recommend instead for this use-case (and why)?
- Is there anything else you would implement differently for a large-scale application in production?
Please write your application in either Python, Java, Scala or Kotlin, with a runnable shell script named run-app.sh
to launch your application.
Please put your results in result.csv
and submit all the files in a zip file named as follows: indeedflex-dataeng-<firstname>-<lastname>.zip
.
It's recommended to spend approximately 1 to 2 hours on this challenge. However, you are free to elaborate as much as you want to show us your ability to deliver great software!