Develop #18

Merged
merged 52 commits into from
May 4, 2021
Commits
1f40705
Add context and system analysis
velcrin May 4, 2021
81d5796
Update context and system analysis
velcrin May 4, 2021
bbc7c79
adds architecture diagrams
stitakis May 4, 2021
3a11ca5
Add principals and ticket workflow alteration to the solution overview
velcrin May 4, 2021
3c9510c
ADRs
cschweikert May 4, 2021
978beb6
link ticket workflow
cschweikert May 4, 2021
025b729
Merge pull request #2 from stitakis/architecture-images
stitakis May 4, 2021
bab53af
Solution overview introduction
SHoen May 4, 2021
33814c7
Merge branch 'develop' of https://github.com/stitakis/ArchElekt into …
SHoen May 4, 2021
76adf6f
Update 001 Allow experts to maintain their own skills.md
stitakis May 4, 2021
0d43883
Update 007 Move login-relevant data and functionality into separate a…
stitakis May 4, 2021
b4da601
Update 009 Introduce component responsible for the whole ticket life-…
stitakis May 4, 2021
bb3b7b7
Initial solution background: TOC and ADRs (#3)
cschweikert May 4, 2021
3d4c49c
add diagrams
SHoen May 4, 2021
4b3b565
adds ADR split website
stitakis May 4, 2021
6c27a0b
Merge branch 'develop' of https://github.com/stitakis/ArchElekt into …
cschweikert May 4, 2021
cf4cd7a
intro page (#5)
cschweikert May 4, 2021
15f672a
apply fixes from last PR
cschweikert May 4, 2021
f15819b
cleanup
cschweikert May 4, 2021
abd1482
Add ADR 012
velcrin May 4, 2021
d69dec8
cleanup
cschweikert May 4, 2021
1fd8936
Merge branch 'develop' of https://github.com/stitakis/ArchElekt into …
SHoen May 4, 2021
f73f9d6
added context
SHoen May 4, 2021
6a06343
fix broken links
SHoen May 4, 2021
906c2ec
Merge pull request #1 from stitakis/updates-on-problem-background
velcrin May 4, 2021
9e66c7a
Merge pull request #8 from stitakis/initial-solution-background
velcrin May 4, 2021
ceda3ed
Merge pull request #6 from stitakis/adr-split-website
velcrin May 4, 2021
b6e6dce
Merge pull request #7 from stitakis/add-adr-012
velcrin May 4, 2021
fb8cb4b
Tradeoffs (#9)
stitakis May 4, 2021
0ad6cef
fixes typos and removes dead link in Solution Background README.md
stitakis May 4, 2021
6a2367f
Add System Overview (#10)
velcrin May 4, 2021
c287ec3
Add Readme to ADRs
velcrin May 4, 2021
5d5e870
Add Solution Background Discussion (#11)
velcrin May 4, 2021
2f0ef30
Merge pull request #12 from stitakis/add-readme-to-adrs
cschweikert May 4, 2021
5bbd4ed
implement recommended changes of review velcrin
SHoen May 4, 2021
b94226f
remove wrong picture link
cschweikert May 4, 2021
5e0f47f
fix TOC links
cschweikert May 4, 2021
3309914
Merge branch 'solutionoverview' of https://github.com/stitakis/ArchEl…
SHoen May 4, 2021
6c801ec
Merge branch 'develop' of https://github.com/stitakis/ArchElekt into …
SHoen May 4, 2021
5216342
Merge pull request #4 from stitakis/solutionoverview
SHoen May 4, 2021
7f0143d
fix link
cschweikert May 4, 2021
50d15e4
add missing entry
cschweikert May 4, 2021
66ee9b4
add and fix links
cschweikert May 4, 2021
28397d7
fix link
cschweikert May 4, 2021
5010816
fix relative links
cschweikert May 4, 2021
5c6aca6
Fix typos (#14)
velcrin May 4, 2021
7f97616
fix intro
cschweikert May 4, 2021
ef2b9a0
Fix typos (#13)
velcrin May 4, 2021
36b6b9a
Update tradeoffs (#15)
velcrin May 4, 2021
97fbd90
Add assumptions (#16)
velcrin May 4, 2021
b058c6d
Solutionoverview (#17)
SHoen May 4, 2021
d00dba7
Update Solution Background/Solution Overview.md
velcrin May 4, 2021
16 changes: 16 additions & 0 deletions Problem Background/Context.md
@@ -0,0 +1,16 @@
# Sysops Squad
Penultimate Electronics is a large electronics giant that has numerous retail stores throughout the country. When customers buy computers, TVs, stereos, and other electronic equipment, they can choose to purchase a support plan. Customer-facing technology experts (the “Sysops Squad”) will then come to the customer’s residence (or work office) to fix problems with the electronic device.

## A Bad Situation...
Things have not been good with the Sysops Squad lately. The current trouble ticket system is a large monolithic application that was developed many years ago. Customers are complaining that consultants are never showing up due to lost tickets, and oftentimes the wrong consultant shows up to fix something they know nothing about. Customers and call-center staff have been complaining that the system is not always available for web-based or call-based problem ticket entry. Change is difficult and risky in this large monolith: whenever a change is made, it takes too long and something else usually breaks. Due to reliability issues, the monolithic system frequently “freezes up” or crashes; they think it’s mostly due to a spike in usage and the number of customers using the system. If something isn’t done soon, Penultimate Electronics will be forced to abandon this very lucrative business line and fire all of the experts (including you, the architect).

Current process in the monolithic system:

Sysops squad experts are added and maintained in the system through an administrator, who enters in their locale, availability, and skills.
Customers who have purchased the support plan can enter a problem ticket using the sysops squad website. Customer registration for the support service is part of the system. The system bills the customer on an annual basis when their support period ends by charging their registered credit card.

Once a trouble ticket is entered in the system, the system then determines which sysops squad expert would be the best fit for the job based on skills, current location, service area, and availability (free or currently on a job).
The sysops squad expert is then notified via a text message that they have a new ticket. Once this happens an email or SMS text message is sent to the customer (based on their profile preference) that the expert is on their way.
The sysops squad expert then uses a custom mobile application on their phone to access the ticketing system to retrieve the ticket information and location. The sysops squad expert can also access a knowledge base through the mobile app to find out what things have been done in the past to fix the problem.
Once the sysops squad expert fixes the problem, they mark the ticket as “complete”. The sysops squad expert can then add information about the problem and fix to the knowledge base.
After the system receives notification that the ticket is complete, the system sends an email to the customer with a link to a survey, which the customer then fills out.
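
The matching step in the process above (skills, service area, location, availability) can be sketched roughly as follows. This is only an illustration in Python, not the existing implementation: the `Expert` fields, the `distance_km` measure and the `best_fit` function are hypothetical names, and "closest available expert" is an assumed tie-breaking rule.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Expert:
    name: str
    skills: set[str]
    service_area: str
    distance_km: float  # hypothetical: distance to the customer's location
    available: bool     # free (True) or currently on a job (False)


def best_fit(experts: list[Expert], required_skill: str, area: str) -> Expert | None:
    """Return the closest available expert with the skill in the service area."""
    candidates = [
        e for e in experts
        if e.available and required_skill in e.skills and e.service_area == area
    ]
    # Finding no candidate is a legitimate outcome that the caller must handle.
    return min(candidates, key=lambda e: e.distance_km, default=None)


experts = [
    Expert("Ana", {"tv", "stereo"}, "north", 4.0, True),
    Expert("Ben", {"computer"}, "north", 1.0, True),
    Expert("Cleo", {"tv"}, "north", 2.5, False),
]
match = best_fit(experts, "tv", "north")
```

Returning `None` instead of raising keeps the "no expert found" case explicit, which matters because an unassigned ticket is one of the ways a ticket can appear lost.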
8 changes: 8 additions & 0 deletions Problem Background/README.md
@@ -0,0 +1,8 @@
# Problem Background

- [Context](Context.md)
- Definition of the context of the system and the problem
- [System Analysis](System%20Analysis.md)
- Analysis of actors, requirements and system characteristics
- [System Overview](System%20Overview.md)
- Description of the current system and discussions of the concepts
42 changes: 42 additions & 0 deletions Problem Background/System Analysis.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,42 @@
# System Analysis

| Actors | Actions |
| ------- | ------- |
| Admin | <ul><li>add and maintain experts (locale, availability and skills)</li><li>maintain the internal users</li><li>manage all of the billing processing for customers</li><li>maintain static reference data (support products, name-value pairs in the systems, etc...)</li></ul> |
| Customer | <ul><li>register for the Sysops Squad service</li><li>view billing history and statements</li><li>view support plans based on products purchased</li><li>maintain profile (including credit card and billing information)</li><li>enter a problem ticket via website</li><li>complete a survey</li></ul> |
| Experts | <ul><li>retrieve ticket information and location from custom mobile app</li><li>browse a knowledge base to find issue history</li><li>mark ticket as complete</li><li>add fix information to the issue/knowledge base</li></ul> |
| Manager | <ul><li>request reports (financial, expert perf, ticketing)</li><li>keep track of problem ticket operations</li></ul> |
| System | <ul><li>send reports to manager (financial, expert perf, ticketing)</li><li>bill customer annually</li><li>determine which expert would be the best fit for the job (skills, location, service area, availability)</li><li>notify expert by SMS</li><li>notify customer by email or SMS (based on profile pref) that the ticket has been assigned</li><li>notify customer by email or SMS that the ticket is complete</li><li>send survey to customer to fill out</li></ul> |

## Ticket Workflow
![Ticket workflow](./resources/ticket-workflow.png?raw=true)

## Identified Problems
1. Wrong expert shows up to appointment
- Expert skills in the system are outdated because the manager maintains them.
2. Tickets lost
- Ticket disappeared from the DB (data loss). This could happen during the workflow while moving data from one component to another.
- No expert with the right skills is available (e.g. holidays).
- SMS to the expert is lost. SMS is a one-way communication channel. An acknowledgment feature exists in the protocol, but it is unreliable.
- Pushing the ticket to the expert's mobile app seems unreliable and error-prone.
3. System not always available for web-based (or call-based) problem ticket entry
- System crashed after a change by the development team, due to the monolithic architecture of the system.
- Performance issues of the system leading to slowdowns and timeouts.
4. System freezes up or crashes due to reliability issues (maybe due to a usage spike)
- A usage spike seems unlikely, as the system is supposed to serve a customer base the size of a mid-sized city. Still, better monitoring and observability of the system could help identify bottlenecks.
- The reporting feature could be the cause of the issue, as it might put pressure on the system while customers are trying to add tickets.
5. Changes are difficult and risky, take too long, and something else usually breaks
- The monolithic architecture is most likely the cause of this, as it might have led to strong coupling. Decomposition and isolation could help.
- Responsibilities might have been intertwined within components, making it difficult to update functionality without regressions.

## Architecture Characteristics

Following the identification of the current system problems, the following architecture characteristics emerge:

![Architecture Characteristics](./resources/architecture-characteristics.png?raw=true)

- Reliability
- Availability
- Elasticity
- Observability
- Maintainability
28 changes: 28 additions & 0 deletions Problem Background/System Overview.md
@@ -0,0 +1,28 @@
# System Overview

## Architecture Components

![Architecture Components](./resources/architecture-components.png)

### Responsibilities

![Components Responsibilities 1](./resources/existing-components-1.png)
![Components Responsibilities 2](./resources/existing-components-2.png)

### Discussion

The current system clearly implements a monolithic architecture style. It is interesting to note that the architecture is partitioned by domain rather than by technical layers. Even though the monolithic architecture probably helped in getting the system to where it is now, it seems to have reached a stage where the need for greater decomposition and the complexity outweigh the ease of building, testing or deploying a monolithic system. It seems clear that some of the problems mentioned, such as the coupling of components that makes changes hard or the performance issues, are due to the monolithic nature of the system.


## Database Tables

![Database Tables](./resources/existing-tables-1.png)

### Responsibilities

![Tables Responsibilities 1](./resources/existing-tables-2.png)
![Tables Responsibilities 2](./resources/existing-tables-3.png)

### Discussion

The database architecture does not really mirror the decomposition of the existing components. Many dependencies emerge, which might make splitting the database difficult if it is needed. Also, from a security point of view, it might be dangerous to mix payment tables with tables that have other application responsibilities. This calls for better partitioning, although it might be challenging.
Binary file added Problem Background/resources/context.png
Binary file added Problem Background/resources/ticket-workflow.png
28 changes: 26 additions & 2 deletions README.md
@@ -1,2 +1,26 @@
# ArchElekt
Architectural Katas 2021 Fall - Group ArchElekt
# Architectural Katas 2021 Spring - Group ArchElekt

Welcome everybody to our first participation in the Architectural Katas 😄

## Members
- Vincent Elcrin
- Steffen Hoening
- Christian Schweikert
- Sebastian Titakis

## Resources
- [Architectural Katas](https://learning.oreilly.com/attend/architectural-katas/0636920054100/0636920054099)

# Solution Structure

Table of contents:
- [Problem Background](Problem%20Background/README.md)
- [Context](Problem%20Background/Context.md)
- [System Analysis](Problem%20Background/System%20Analysis.md)
- [System Overview](Problem%20Background/System%20Overview.md)
- [Solution Background](Solution%20Background/README.md)
- [Assumptions](Solution%20Background/Assumptions.md)
- [Solution Overview](Solution%20Background/Solution%20Overview.md)
- [Trade-offs](Solution%20Background/Tradeoffs.md)
- [Architecture Decision Records](Solution%20Background/ADRs/README.md)
- [Discussion](Solution%20Background/Discussion.md)
@@ -0,0 +1,26 @@
# 1. Allow experts to maintain their own skills

Date: 2021-04-28

## Status

Accepted

## Context

Customers reported that sometimes the wrong expert shows up to the appointment to fix something they know nothing about. Sometimes no expert shows up at all. We assume that one of the reasons might be that the expert does not have the right skill set, because skills are maintained by the manager and might be outdated.

## Decision

To improve the situation, experts should be able to maintain their own skills. This will:
- relieve the administrator of this activity
- help the system match tickets to available skill sets more accurately
- help the system find an expert and therefore reduce the perception of lost tickets

A possible solution is a new section in the expert's mobile app for editing/maintaining their own profile including skills.

## Consequences

Allowing experts to maintain their own skills could lead to misuse of this function. Some experts could enter more skills than they actually have in order to get more jobs. The current setup limits skill set entry to the administrator, who acts as a filter, but it also creates a bottleneck and doesn't guarantee the accuracy of the skill sets entered by the administrator.

We believe that the benefits of allowing experts to maintain their own skill sets outweigh these risks and will help to improve the overall situation.
@@ -0,0 +1,19 @@
# 2. API layer as single point of contact for all user interfaces

Date: 2021-04-28

## Status

Accepted

## Context

Usage spikes might make the user interface unavailable, freeze the system and create the perception of lost tickets.

## Decision

Add an API layer as the single point of contact for the mobile app and the website, to allow decomposition of the monolith and provide services at runtime.

## Consequences

Having an additional layer adds complexity. On the other hand, a unifying API layer might simplify attaching roles to certain endpoints and thereby improve security. The API layer also provides an abstraction over the downstream services, which increases flexibility and allows decomposition of responsibilities over time.
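
As a rough illustration of attaching roles to endpoints in a single API layer, the sketch below routes requests through one table that maps each endpoint to its allowed roles and a downstream handler. Everything here is an assumption made for illustration: the paths, role names, handler functions and the dictionary-based responses are hypothetical and not part of the proposed system.

```python
# Hypothetical downstream services, reduced to plain functions for the sketch.
def create_ticket(payload: dict) -> dict:
    return {"status": 201, "service": "ticket-creation"}


def run_report(payload: dict) -> dict:
    return {"status": 200, "service": "reporting"}


# One place that declares which roles may call which endpoint.
ROUTES = {
    ("POST", "/tickets"): ({"customer", "admin"}, create_ticket),
    ("GET", "/reports"): ({"manager"}, run_report),
}


def handle(role: str, method: str, path: str, payload: dict) -> dict:
    """Check the route and the caller's role, then delegate downstream."""
    route = ROUTES.get((method, path))
    if route is None:
        return {"status": 404}
    allowed_roles, handler = route
    if role not in allowed_roles:
        return {"status": 403}
    return handler(payload)
```

Because clients only ever see `handle`, a handler can later be swapped from the monolith to a separate service without changing the mobile app or the website.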
@@ -0,0 +1,19 @@
# 3. Segregate ticket creation into a separate container

Date: 2021-04-29

## Status

Accepted

## Context

Usage spikes might lead to not being able to enter tickets, system freezes and the perception of lost tickets.

## Decision

Segregate ticket creation into a separate service. Create a persistent ticket queue to decouple the ticket creation flow from the DB.

## Consequences

Storing a new ticket is no longer dependent on the monolith. It is a rather atomic operation on an easy-to-scale service. This separation adds slight complexity but also improves the availability and reliability of ticket creation.
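
To make the idea of a persistent ticket queue more concrete, here is a minimal sketch that uses SQLite from the Python standard library as a stand-in for a durable queue. This is an assumption for illustration only; the ADR does not prescribe a technology, and the table name `ticket_queue` and the functions `open_queue`, `enqueue` and `dequeue` are hypothetical. The sketch assumes a single consumer; several consumers would need additional locking.

```python
import json
import sqlite3


def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    """Open the queue; pass a file path instead of ':memory:' for durability."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ticket_queue ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )
    return conn


def enqueue(conn: sqlite3.Connection, ticket: dict) -> int:
    """Store the ticket and return its queue id; this is all ticket creation does."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO ticket_queue (payload) VALUES (?)", (json.dumps(ticket),)
        )
    return cur.lastrowid


def dequeue(conn: sqlite3.Connection):
    """Remove and return the oldest ticket, or None if the queue is empty."""
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM ticket_queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("DELETE FROM ticket_queue WHERE id = ?", (row[0],))
    return json.loads(row[1])
```

The point of the design is that `enqueue` succeeds even when the rest of the system is slow or down, so a customer's ticket is accepted and stored first and processed later.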
@@ -0,0 +1,19 @@
# 4. Segregate reporting into a separate container

Date: 2021-04-29

## Status

Accepted

## Context

The load that reporting operations put on the system could also be a cause of its unavailability and unreliability.

## Decision

Segregate reporting into a separate service.

## Consequences

More complexity, but more availability and reliability. Resources can be scaled on demand during usage spikes (elasticity); alternatively, reports can simply take longer to produce.
@@ -0,0 +1,23 @@
# 5. Expert needs to actively accept or reject an assigned ticket

Date: 2021-04-29

## Status

Accepted

## Context

The system may not have assigned an expert with the right expertise (maybe because skills are not up to date), the expert may not have time (vacation, for example), or the expert may miss the notification message. This leads to the wrong expert or no expert showing up, which in turn leads to the perception of ticket loss.

## Decision

To mitigate the perception of lost tickets:

- the expert's mobile app pulls assigned tickets, in addition to the ticket assignment engine pushing tickets to the expert's mobile app
- the expert needs to open the ticket and actively accept or reject it
- for rejected tickets, or tickets that have not been accepted after a defined amount of time, the system can try to reassign the ticket to another expert or notify a manager to take action on the ticket

## Consequences

Tickets cannot get lost due to a missing reaction from an expert. There is a well-defined process for keeping a ticket "alive", including escalation if assignment doesn't work in time. This increases the complexity of the assignment process.
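
The accept/reject process with a time limit could look roughly like the following sketch. It is an illustration under assumptions, not a specified design: the `Assignment` class, its state names, the ordered `candidates` list and the 15-minute default `timeout_s` are all hypothetical. Time is passed in explicitly (`now`) so that the logic stays easy to test.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Assignment:
    ticket_id: str
    candidates: list[str]       # experts to try, best fit first
    timeout_s: float = 900.0    # hypothetical acceptance window
    current: str | None = None
    assigned_at: float = 0.0
    state: str = "unassigned"   # unassigned | offered | accepted | escalated

    def offer_next(self, now: float) -> None:
        """Offer the ticket to the next candidate, or escalate to a manager."""
        if self.candidates:
            self.current = self.candidates.pop(0)
            self.assigned_at = now
            self.state = "offered"
        else:
            self.current = None
            self.state = "escalated"

    def accept(self, expert: str) -> bool:
        """Only the expert who currently holds the offer can accept it."""
        if self.state == "offered" and expert == self.current:
            self.state = "accepted"
            return True
        return False

    def reject(self, expert: str, now: float) -> None:
        if self.state == "offered" and expert == self.current:
            self.offer_next(now)

    def tick(self, now: float) -> None:
        """Reassign if the current expert did not react within the window."""
        if self.state == "offered" and now - self.assigned_at >= self.timeout_s:
            self.offer_next(now)
```

Every path ends in either `accepted` or `escalated`, so a ticket never stays silently in an offered state.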
@@ -0,0 +1,19 @@
# 6. Handle cases for system not able to assign ticket to an expert

Date: 2021-04-29

## Status

Accepted

## Context

If the system doesn't find an expert with a matching skill set, the right location, etc., it is not able to assign the ticket → the ticket stays in the system forever → perception of ticket loss.

## Decision

The system notifies a manager that it couldn't assign the ticket and why. The manager can then take action on this ticket.

## Consequences

Closes potential gaps in a ticket's life-cycle and thereby increases the perceived reliability of the system.
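
Telling the manager why a ticket could not be assigned requires the matching step to report which criterion failed. A minimal sketch of that idea follows, assuming experts are represented as dictionaries with hypothetical `skills`, `area` and `available` keys; the function name and the reason texts are likewise illustrative only.

```python
def unassignable_reason(experts: list, skill: str, area: str):
    """Return a human-readable reason, or None if some expert can be assigned."""
    with_skill = [e for e in experts if skill in e["skills"]]
    if not with_skill:
        return "no expert has the required skill"
    in_area = [e for e in with_skill if e["area"] == area]
    if not in_area:
        return "no skilled expert covers the service area"
    if not any(e["available"] for e in in_area):
        return "all matching experts are currently unavailable"
    return None
```

The criteria are checked from most to least restrictive, so the manager sees the first criterion that ruled out every expert.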
@@ -0,0 +1,19 @@
# 7. Move login-relevant data and functionality into separate authentication service

Date: 2021-04-30

## Status

Accepted

## Context

Usernames and passwords are stored in the same database as all other data, which increases the chance of passwords being stolen.

## Decision

Move usernames and passwords into a separate database (e.g. with stronger security measures) and create a separate authentication service that provides the login functionality. Alternatively, use an external authentication provider.

## Consequences

Increases the security, reliability and availability of login, but also increases complexity (usernames and customers in the database need to be kept in sync). Using an external authentication provider might bring the benefit of supporting additional login options, e.g. social login.
@@ -0,0 +1,19 @@
# 8. Move payment-relevant data into separate database

Date: 2021-04-30

## Status

Proposed

## Context

Payment information is stored in the same database as all other data, which increases the chance of credit card information being stolen.

## Decision

Move payment-related information into a separate database and restrict access to this database to the billing functionality only.

## Consequences

Increases security
@@ -0,0 +1,29 @@
# 9. Introduce component responsible for the whole ticket life-cycle

Date: 2021-04-30

## Status

Accepted

## Context

Tickets get lost, and it is difficult to make changes to the system. Assumption: there is no central entity driving the ticket life-cycle. Instead, the system reacts to events with direct actions.

## Decision

We want to introduce a ticket life-cycle component that represents and drives the whole life-cycle of a ticket. This component will be responsible for pulling tickets from the persistent ticket queue mentioned in [ADR 3](./003%20Segregate%20ticket%20creation%20into%20a%20separate%20container.md), until the queue is exhausted or new tickets are added, and for orchestrating the life-cycle of the ticket:
- assignment to an expert
- awaiting acceptance or rejection of the assignment by the expert
- retries of assignments
- timeouts when an expert does not respond to an assigned ticket
- escalation to a manager
- triggering notifications
- requesting the survey
## Consequences

- The ticket life-cycle component is the single source of truth and manages all phases of the ticket workflow. By isolating this responsibility and making it explicit, we hope to lower complexity and increase maintainability.
- Allows an easy way of implementing more complex processes like escalation if a ticket assignment was rejected by the expert (see [ADR 5](./005%20Expert%20needs%20to%20actively%20accept%20or%20reject%20an%20assigned%20ticket.md)).
- Improves elasticity by consuming the persistent ticket queue.
- Parallelisation might introduce the issue of a ticket being assigned to multiple experts.
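
One way to make the ticket life-cycle explicit is a small state machine with a table of allowed transitions. The sketch below is an assumption for illustration, not the decided design: the state names and the transition table are hypothetical and would have to be aligned with the actual ticket workflow.

```python
from enum import Enum


class State(Enum):
    QUEUED = "queued"
    ASSIGNED = "assigned"
    ACCEPTED = "accepted"
    ESCALATED = "escalated"
    COMPLETED = "completed"
    SURVEY_REQUESTED = "survey_requested"


# Allowed transitions; anything not listed here is rejected.
TRANSITIONS = {
    State.QUEUED: {State.ASSIGNED, State.ESCALATED},
    # Back to QUEUED models a retry after a rejection or a timeout.
    State.ASSIGNED: {State.ACCEPTED, State.QUEUED, State.ESCALATED},
    State.ACCEPTED: {State.COMPLETED},
    State.ESCALATED: {State.ASSIGNED},
    State.COMPLETED: {State.SURVEY_REQUESTED},
    State.SURVEY_REQUESTED: set(),
}


def advance(current: State, target: State) -> State:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Keeping all transitions in one table means that a change to the workflow, such as a new escalation path, is made in a single place rather than spread over event handlers.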