gabrielbcn/thalestest

A small Azure and AKS challenge to facilitate the discussion

Dear Candidate,

We want to have an open exchange with you to assess your level of resourcefulness and flexibility when it comes to DevOps and Cloud, and in order to facilitate that discussion we have thought of a small challenge for you...

The challenge will be to deploy a K8s cluster from scratch in Azure using the AKS service, and to adapt the deployment automation baseline to address some specific problem statements.

The files used are all in this repo, which you can clone. Since we're giving you this document in advance, it is fine if you complete it beforehand and simply walk us through it. If you prefer, we will give you time to do it and check back with you later.

This exercise is not about replicating the steps but about discussing the process, showing understanding and discussing possible ways of improving it.

In order to deploy the K8s cluster, you will be given an Azure account:

Username: (to be provided by email)

Password: (to be provided by email)

And a service principal:

Client ID: (to be provided by email)

Secret: (to be provided by email)

It is suggested to use Cloud Shell and, once in Cloud Shell, automation to deploy the cluster. But you can also use other approaches, as long as they are representative of a DevOps approach.
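For instance, since the playbook below reads the service principal from environment variables (the client_id and client_secret lookups), a Cloud Shell session could export them first; the placeholder values are the ones provided by email:

export AZURE_CLIENT_ID="<client-id-provided-by-email>"
export AZURE_CLIENT_SECRET="<secret-provided-by-email>"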

We're proposing the following ansible playbook:

# Description
# ===========
# This playbook first creates an Azure Kubernetes Service cluster of 2 nodes with provided username and ssh key and scales it to 3 nodes.
# Change variables below to customize your AKS deployment.
# It also requires a valid service principal to create AKS cluster, so fill:
# - client_id
# - client_secret
# This sample requires Ansible 2.6 

- name: Create Azure Kubernetes Service
  hosts: localhost
  connection: local
  vars:
    ssh_key: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDH14vPP1NE+qc3SgpoSPYMVaPLj8ouA1azujnMPbVKtN8mSg0KBWVWebx+GLqGI9nrKwmVwPHVzM8LhbbpYhfnN0UA2OV7sxBIshy7BshDTewiLyR9FXuNjuf8Cwd7WI+vgkKO0Yh6EdgwZ2hz+yVTn+5umCRDOg0ht0i2kyPMn1MHv61aN3noYp5zV8W3Tg8oJ2DPei6FxcNheRWkyvjan73/rISQJYWURlI90b3MCQOTXuI2w6BIxF3kHIHLUtzD7FqQhZdEkM/8cA9wUDgnS6+3ySkGOuQazkEWYn60Ll/597u2fLsq6P/pOCM1+KWLf2kY6mTl/dNxiIdV1d01
    resource_group: "{{ resource_group_name }}"
    location: eastus
    aks_name: myAKSCluster
    username: azureuser
    client_id: "{{ lookup('env', 'AZURE_CLIENT_ID') }}"
    client_secret: "{{ lookup('env', 'AZURE_CLIENT_SECRET') }}"
  tasks:
  - name: Create resource group
    azure_rm_resourcegroup:
      name: "{{ resource_group }}"
      location: "{{ location }}"
  - name: Create a managed Azure Container Services (AKS) cluster
    azure_rm_aks:
      name: "{{ aks_name }}"
      location: "{{ location }}"
      resource_group: "{{ resource_group }}"
      dns_prefix: "{{ aks_name }}"
      linux_profile:
        admin_username: "{{ username }}"
        ssh_key: "{{ ssh_key }}"
      service_principal:
        client_id: "{{ client_id }}"
        client_secret: "{{ client_secret }}"
      agent_pool_profiles:
        - name: default
          count: 2
          vm_size: Standard_D2_v2
      tags:
        Environment: Production
  - name: Scale created Azure Container Services (AKS) cluster
    azure_rm_aks:
      name: "{{ aks_name }}"
      location: "{{ location }}"
      resource_group: "{{ resource_group }}"
      dns_prefix: "{{ aks_name }}"
      linux_profile:
        admin_username: "{{ username }}"
        ssh_key: "{{ ssh_key }}"
      service_principal:
        client_id: "{{ client_id }}"
        client_secret: "{{ client_secret }}"
      agent_pool_profiles:
        - name: default
          count: 3
          vm_size: Standard_D2_v2

The Challenge:

In this challenge we want you to use the baseline given above as a starting point.

We then want you to investigate and propose appropriate means of adapting the cluster deployment, using automation and abstraction of variable elements where possible, in order to solve the following set of "problem statements". Each of these problems can be addressed in various ways; however, solutions based on automation and abstraction principles are valued more highly than (for example) "hard-coded" ones.

You are not limited to using specific solution areas or tools -- you are free to propose what you assess as most effective and flexible. Obviously, we would like you to present and explain your choices and achievements after the challenge.

Problem Statements:

We don't want this cluster to be created as it is in the script by default.

  • Our applications are memory intensive. Sometimes pod deployments get stuck because no nodes are available, and we are also facing intermittent eviction of some of our pods. We want to prevent this from happening, especially to pods providing critical services.

  • We are in Singapore, and we want to be sure to cater for sovereignty issues.

  • We are not sure what the RSA key is for.

  • We would also like to choose proper names for the cluster and the group of resources.

  • We are not happy with a cluster that has a fixed number of nodes: we would like the cluster to have a minimum of three nodes, but be able to grow.

  • We would like to build more resilience to our cluster and we're thinking of availability zones.

What are your recommendations? We would like some quick and easy solutions for a start; include one or more such solutions in your answer.
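As a quick illustration (one possible direction, not the only valid answer), several of these statements can be addressed directly in the azure_rm_aks task. The sketch below assumes a module version recent enough to expose the autoscaler and availability-zone fields; exact field names may differ between Ansible releases and the azure.azcollection module:

- name: Create a managed AKS cluster in Singapore with autoscaling
  azure_rm_aks:
    name: "{{ aks_name }}"                  # parameterised, not hard-coded
    location: southeastasia                 # Singapore region, for data sovereignty
    resource_group: "{{ resource_group }}"
    dns_prefix: "{{ aks_name }}"
    service_principal:
      client_id: "{{ client_id }}"
      client_secret: "{{ client_secret }}"
    agent_pool_profiles:
      - name: default
        count: 3                            # initial size
        vm_size: Standard_E2s_v3            # memory-optimised SKU for memory-intensive apps
        type: VirtualMachineScaleSets       # scale sets are required for autoscaling and zones
        enable_auto_scaling: true           # let the pool grow beyond the minimum
        min_count: 3
        max_count: 6
        availability_zones: [1, 2, 3]       # spread nodes across zones for resilience

Intermittent eviction of critical pods is better handled at the workload layer, with resource requests and limits (sketched in the deployment example further down) and, for critical services, a PriorityClass.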

Getting Started:

In order to get the files mentioned, you can use:

git clone ...

You will be able to execute it as:

ansible-playbook myplaybook.yml

But first, please remember to adapt it to our needs; later on, fetch the cluster credentials:

az aks get-credentials --resource-group yourResourceGroup --name yourAKSname

And you should be able to see the nodes created. We leave this to you.
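For instance, a plain kubectl check should list the agent nodes:

kubectl get nodes -o wide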

Once we have a Kubernetes cluster, we need to give it some content. An example would be a service that returns some useful information. In this case we're giving you a sample app in Go that does a few things:

// HTTP server, coded in Go, that does three different things, all of them quite easy to guess...

package main

import (
	"fmt"
	"log"
	"math/rand"
	"net"
	"net/http"
	"os"
	"time"
)

const defaultAddr = ":8080"

// main starts an http server on the $PORT environment variable.
func main() {
	addr := defaultAddr
	// $PORT environment variable is provided in the Kubernetes deployment.
	if p := os.Getenv("PORT"); p != "" {
		addr = ":" + p
	}

	log.Printf("server starting to listen on %s", addr)
	log.Printf("http://localhost%s", addr)
	log.Printf("http://localhost%s/test", addr)
	log.Printf("http://localhost%s/ip", addr)
	http.HandleFunc("/", home)
	http.HandleFunc("/ip/", getip)
	http.HandleFunc("/test/", test)
	if err := http.ListenAndServe(addr, nil); err != nil {
		log.Fatalf("server listen error: %+v", err)
	}
}

// home logs the received request and returns a simple response.
func home(w http.ResponseWriter, r *http.Request) {
	log.Printf("received request main: %s %s", r.Method, r.URL.Path)
	rand.Seed(time.Now().UnixNano())
	answers := []string{
		"It is certain",
		"It is decidedly so",
		"Without a doubt",
		"Yes definitely",
		"You may rely on it",
		"As I see it yes",
		"Most likely",
		"Outlook good",
		"Yes",
		"Signs point to yes",
		"Reply hazy try again",
		"Ask again later",
		"Better not tell you now",
		"Cannot predict now",
		"Concentrate and ask again",
		"Don't count on it",
		"My reply is no",
		"My sources say no",
		"Outlook not so good",
		"Very doubtful",
	}
	fmt.Fprintf(w, "Magic 8-Ball says:", answers[rand.Intn(len(answers))])
}

func getip(w http.ResponseWriter, r *http.Request) {
	addrs, err := net.InterfaceAddrs()
	if err != nil {
		log.Printf("Oops: " + err.Error() + "\n")
	}

	for _, a := range addrs {
		if ipnet, ok := a.(*net.IPNet); ok && !ipnet.IP.IsLoopback() {
			if ipnet.IP.To4() != nil {
				fmt.Fprintln(w, ipnet.IP.String())
			}
		}
	}
}

func test(w http.ResponseWriter, r *http.Request) {
	log.Printf("received request test: %s %s", r.Method, r.URL.Path)
	fmt.Fprintf(w, "The test page")
}
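Before containerising it, the server can be smoke-tested locally (a sketch; it assumes the source is saved as thalestest.go, the file name the Dockerfile below also expects):

PORT=8080 go run thalestest.go &
curl http://localhost:8080/test/
curl http://localhost:8080/ip/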

You should create a Dockerfile that builds a container from this code; here you have a sample that works:

FROM golang:latest
WORKDIR /app
COPY ./ /app
RUN go run thalestest.go

Time to build the container:

docker build . -t thalestest

This will start executing immediately from Cloud Shell, and you will be able to interact with it using the proper controls. The image should be tagged as thalestest; make sure that has been done.

Now create a deployment; an example YAML follows. But take into account that you need to deploy the container we have created, not the sample given below.

apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: frontend
  labels:
    app: guestbook
spec:
  selector:
    matchLabels:
      app: guestbook
      tier: frontend
  replicas: 3
  template:
    metadata:
      labels:
        app: guestbook
        tier: frontend
    spec:
      containers:
      - name: php-redis
        image: gcr.io/google-samples/gb-frontend:v4
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        env:
        - name: GET_HOSTS_FROM
          value: dns
          # Using `GET_HOSTS_FROM=dns` requires your cluster to
          # provide a dns service. As of Kubernetes 1.3, DNS is a built-in
          # service launched automatically. However, if the cluster you are using
          # does not have a built-in DNS service, you can instead
          # access an environment variable to find the master
          # service's host. To do so, comment out the 'value: dns' line above, and
          # uncomment the line below:
          # value: env
        ports:
        - containerPort: 80

kubectl apply -f ./my-manifest.yaml
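For the thalestest container, a minimal adaptation could look like the sketch below. The image name <your-dockerhub-user>/thalestest is a placeholder for wherever you pushed the image, and the memory figures are illustrative; the requests and limits are one simple way to tackle the scheduling and eviction problem statements:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: thalestest
  labels:
    app: thalestest
spec:
  replicas: 3
  selector:
    matchLabels:
      app: thalestest
  template:
    metadata:
      labels:
        app: thalestest
    spec:
      containers:
      - name: thalestest
        image: <your-dockerhub-user>/thalestest:latest  # placeholder image name
        env:
        - name: PORT              # the Go server reads $PORT at startup
          value: "8080"
        ports:
        - containerPort: 8080     # must match $PORT ("check that the ports are right")
        resources:
          requests:
            memory: 256Mi         # reserving memory steers scheduling onto nodes with room
          limits:
            memory: 512Mi         # a cap; pods with requests set are evicted after best-effort pods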

Now the deployment should be made accessible to the internet. The following sample exposes the deployment as a NodePort, but we'd rather use a load balancer. Comment on the available options and on whether you would have to create the load balancer manually. If some entity creates it for you (recommended), comment on the identity of that entity.

apiVersion: v1
kind: Service
metadata:
  name: frontend
  labels:
    app: guestbook
    tier: frontend
spec:
  # comment or delete the following line if you want to use a LoadBalancer
  type: NodePort 
  # if your cluster supports it, uncomment the following to automatically create
  # an external load-balanced IP for the frontend service.
  # type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: guestbook
    tier: frontend
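With type: LoadBalancer you do not create the load balancer manually: the cloud controller built into AKS provisions an Azure Load Balancer and a public IP in the cluster's node resource group, acting under the cluster's credentials (the service principal supplied at creation time); that is the "entity" the question above hints at. A minimal variant matching the thalestest deployment sketched earlier:

apiVersion: v1
kind: Service
metadata:
  name: thalestest
spec:
  type: LoadBalancer   # AKS provisions the Azure Load Balancer and public IP for you
  ports:
  - port: 80           # external port
    targetPort: 8080   # the container's $PORT
  selector:
    app: thalestest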

Now the service should be accessible through the internet. We should be able to see the IP inside the pods using the given Go code. It is interesting to comment on the IP address we are getting, whether it is always the same, and why.

A few hints you may, or may not, need:

If you have a problem: check that the ports are right. Map the ports accordingly.

To make the container available to the Kubernetes cluster, you can choose to store it in Docker Hub or any other way you see fit.
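Pushing to Docker Hub, for example, could look like this (a sketch; <your-dockerhub-user> is a placeholder):

docker tag thalestest <your-dockerhub-user>/thalestest:latest
docker login
docker push <your-dockerhub-user>/thalestest:latest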

Entry points are important.

Singapore is in South East Asia, which Azure knows as "southeastasia".

You can also fork this repo, make all the fine tuning, and show us your own version of the exercise.

If you want to do something different to make this challenge work, go ahead, the more DevOps the better.

Release notes

Exercise tested working, with modified files, on Azure, 12th June 2020.

Testing this exercise on 12th June 2020, an issue was found for the specific case in which you run brew-installed Ansible on macOS, which uses Python 3.8, with the azure_rm_aks module:

  • Assuming you have all the dependent Python packages installed as per the requirements here,
  • the azure_rm_aks module uses a function, time.clock(), that was removed in Python 3.8 (no backwards compatibility from Python 3.7 to 3.8, can you believe it? kindly use a superior language such as Java or Go instead -okay Kotlin too-).
  • This can be sorted out by editing line 37 of /usr/local/lib/python3.8/site-packages/azure/cli/core/_session.py and replacing time.clock() with time.process_time() (a one-liner for this is sketched after this list).
  • Additionally, you may need to change a line in the Ansible Azure module to avoid an error as described in Azure/azure_preview_modules#338 (look for the function def create_aks_dict(aks) and comment out line 304, # addon=create_addon_dict(aks.addon_profiles), of /usr/local/lib/python3.8/site-packages/ansible/modules/cloud/azure/azure_rm_aks.py). In principle the error doesn't prevent the cluster from being created, but it doesn't allow for a neat exit.
  • The problem only happens when you run the playbook on a Mac with this specific module; if you run it in Azure Cloud Shell, which is the recommendation above, you will find all the right versions and dependencies installed.
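The time.clock() patch above can be applied as a one-liner (assuming the brew Python 3.8 path quoted in the list; the empty '' argument is for BSD sed on macOS, GNU sed users should drop it):

sed -i '' 's/time.clock()/time.process_time()/' /usr/local/lib/python3.8/site-packages/azure/cli/core/_session.py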
