**Table of Contents**
- The Design of Crossplane Provider for Ansible
- Overview
- How It Works
- Supported Sources
- Requirements Declaration
- Supported Ansible Contents
- Passing Variables
- AnsibleRun Lifecycle
- Comparing with Ansible Operator
- Appendix: Feature List
## Overview

Crossplane is often used to assemble infrastructure from multiple vendors declaratively in Kubernetes, a typical cloud native architecture. However, this may not match the reality of many organizations, which run many traditional IT systems and have invested heavily in management automation for those systems.

Ansible, as a popular automation technology, has a large user base and a mature ecosystem. It is widely adopted to automate the management of a wide variety of IT systems, both cloud native and non cloud native.

The Crossplane provider for Ansible (or Ansible provider for short) extends the scope of Crossplane by integrating it with Ansible, building a bridge between the cloud native and non cloud native worlds. It opens the door to driving and reusing existing automation assets, whether cloud native or not, to manage hybrid technologies in a single, consistent way, and helps organizations transition to cloud native while keeping their existing investments.
## How It Works

The high-level architecture of the Ansible provider is illustrated below.

The Ansible provider uses the Kubernetes resource `AnsibleRun` to represent a particular Ansible run, e.g. running a set of Ansible roles or playbooks. The `AnsibleRun` resource also refers to a `ProviderConfig` resource, which includes all provider-level information needed to support Ansible runs, e.g. credentials needed to access the remote place that hosts the Ansible contents, and requirements to install as prerequisites before running them. The Ansible contents can be collections, roles, or playbooks hosted in Ansible Galaxy, a public or private Automation Hub, or a GitHub repository.

Once an `AnsibleRun` resource, the managed resource of the Ansible provider, is defined, the provider downloads the Ansible contents specified in that resource into the provider working directory via the `Connect()` method of the `AnsibleRun` controller, then triggers the Ansible run according to the provider lifecycle via the methods below, defined in the controller:
| Method | Description |
|---|---|
| `Observe()` | Detects whether a create, update, or delete is needed by comparing the actual state in the target system with the desired state defined in the `AnsibleRun` resource. |
| `Create()` | Creates something on the target system by running Ansible contents. |
| `Update()` | Updates something on the target system by running Ansible contents. |
| `Delete()` | Deletes something from the target system by running Ansible contents. |
Internally, the provider relies on the command line tool `ansible-galaxy` to download Ansible contents, then uses `ansible-runner` to execute them. The module `ansible-runner-http` is an ansible-runner plugin that emits Ansible status and events to HTTP services as POST requests; this is how the `AnsibleRun` controller is notified about run results so it can attach them to the `AnsibleRun` resource as status.

The provider working directory hosts the Ansible contents downloaded from the remote place. It currently lives inside the provider container, so it is not persisted by the provider. As a result, when the provider pod restarts the contents are lost, but the provider will simply download them from the remote place again.
## Supported Sources

There are two types of sources from which the Ansible provider can retrieve, install, and run Ansible contents.

The first is the inline source, mainly intended for quick tests. You can inline the Ansible content to be run directly in an `AnsibleRun` resource. It will be wrapped as a `playbook.yml` file and stored in the working directory for the provider to run.
Here is an example that uses an inline playbook to call a builtin Ansible module:
```yaml
apiVersion: ansible.crossplane.io/v1alpha1
kind: AnsibleRun
metadata:
  name: inline-example
spec:
  forProvider:
    # For simple cases, you can use an inline source to specify the content of
    # playbook.yml as opaque, inline YAML.
    playbookInline: |
      ---
      - hosts: localhost
        tasks:
          - name: simple-playbook
            debug:
              msg: You are running 'simple-playbook'
  providerConfigRef:
    name: provider-config-example
```
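The wrapping step can be sketched as follows. This is a minimal illustration in Python (the provider itself is implemented in Go, and the function name and file layout here are assumptions):

```python
import os
import tempfile

def write_inline_playbook(playbook_inline: str, workdir: str) -> str:
    """Store the playbookInline content as playbook.yml in the working
    directory, where the provider will later run it."""
    path = os.path.join(workdir, "playbook.yml")
    with open(path, "w") as f:
        f.write(playbook_inline)
    return path

# Example: the inline content from the AnsibleRun above.
inline = """---
- hosts: localhost
  tasks:
    - name: simple-playbook
      debug:
        msg: You are running 'simple-playbook'
"""
workdir = tempfile.mkdtemp()
path = write_inline_playbook(inline, workdir)
```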
The second is the remote source, which is more useful for real projects where Ansible contents are hosted in a remote place. The Ansible contents can be retrieved from Ansible Galaxy as community contents, from Automation Hub as Red Hat certified and supported contents, from a private Automation Hub hosting contents created and curated by an organization, or even from a GitHub repository.

Here is an example that runs multiple Ansible roles listed under `spec.forProvider.roles`; they will be run sequentially, one after another:
```yaml
apiVersion: ansible.crossplane.io/v1alpha1
kind: AnsibleRun
metadata:
  name: remote-example
spec:
  forProvider:
    roles:
      - sample_namespace.sample_role
      - sample_namespace.another_sample_role
  providerConfigRef:
    name: provider-config-example
```
By default, the referenced roles are retrieved from Ansible Galaxy. This is identical to running `ansible-galaxy` from the command line as below:
```shell
ansible-galaxy role install sample_namespace.sample_role
```
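Building that command from the roles list is straightforward; here is a small illustrative Python sketch (the function name is an assumption, not the provider's actual code):

```python
def galaxy_role_install_cmd(roles):
    """Build the ansible-galaxy argument vector for installing a list of
    roles, one argv entry per role."""
    return ["ansible-galaxy", "role", "install", *roles]

cmd = galaxy_role_install_cmd(
    ["sample_namespace.sample_role", "sample_namespace.another_sample_role"]
)
print(" ".join(cmd))
```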
To retrieve Ansible contents from other places, please refer to Requirements Declaration.
## Requirements Declaration

The Ansible provider supports retrieving Ansible contents from different places, including Ansible Galaxy, public or private Automation Hub, and GitHub repositories. This is configured by declaring `requirements` in the `ProviderConfig` resource. The requirements definition will be wrapped as a `requirements.yml` file and stored in the working directory for the provider to consume.
Here is an example to retrieve an Ansible collection from Ansible Galaxy:
```yaml
apiVersion: ansible.crossplane.io/v1alpha1
kind: ProviderConfig
metadata:
  name: provider-config-example
spec:
  requirements: |
    ---
    collections:
      # Install a collection from Ansible Galaxy.
      - name: sample_namespace.sample_collection
        version: 0.1.0
        source: https://galaxy.ansible.com
```
This is identical to running `ansible-galaxy` from the command line as below:
```shell
ansible-galaxy install -r requirements.yml
```
To retrieve Ansible contents from a GitHub repository:
```yaml
apiVersion: ansible.crossplane.io/v1alpha1
kind: ProviderConfig
metadata:
  name: provider-config-example
spec:
  requirements: |
    ---
    collections:
      # Install a collection from a GitHub repository.
      - name: https://github.com/sample_namespace/sample_collection.git
        version: 0.1.0
        type: git
```
To retrieve Ansible contents from Automation Hub or from a private GitHub repository that requires credentials, declare `credentials` in the `ProviderConfig`.
Here is an example to retrieve an Ansible collection from a private GitHub repository using git credentials:
```yaml
apiVersion: ansible.crossplane.io/v1alpha1
kind: ProviderConfig
metadata:
  name: provider-config-example
spec:
  credentials:
    - filename: .git-credentials
      source: Secret
      secretRef:
        namespace: crossplane-system
        name: git-credentials
        key: .git-credentials
  requirements: |
    ---
    collections:
      # Install a collection from a GitHub repository.
      - name: https://github.com/sample_namespace/sample_collection.git
        version: 0.1.0
        type: git
```
This requires creating a secret `git-credentials` that includes the credentials, and referencing it in the `ProviderConfig` as above.
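Conceptually, the provider materializes each declared credential into a file before invoking `ansible-galaxy`. A rough Python sketch of that idea (the file location, permissions, and function name are assumptions, not the provider's actual implementation):

```python
import os
import tempfile

def materialize_credential(filename: str, secret_data: bytes, home: str) -> str:
    """Write a credential taken from the referenced Secret key to the given
    file name, so tools like git can pick it up during content download."""
    path = os.path.join(home, filename)
    with open(path, "wb") as f:
        f.write(secret_data)
    os.chmod(path, 0o600)  # credentials should not be world-readable
    return path

home = tempfile.mkdtemp()
# A .git-credentials entry uses git's credential-store format.
path = materialize_credential(
    ".git-credentials", b"https://user:token@github.com\n", home
)
```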
Besides Ansible collections, you can also define Ansible roles as requirements in `ProviderConfig`, and you can define both roles and collections in the same `ProviderConfig` resource. For example:
```yaml
apiVersion: ansible.crossplane.io/v1alpha1
kind: ProviderConfig
metadata:
  name: provider-config-example
spec:
  requirements: |
    ---
    roles:
      # Install a role from Ansible Galaxy.
      - name: sample_namespace.sample_role
        version: 0.1.0
    collections:
      # Install a collection from Ansible Galaxy.
      - name: sample_namespace.sample_collection
        version: 0.1.0
        source: https://galaxy.ansible.com
```
## Supported Ansible Contents

The Ansible provider supports running different types of Ansible contents using `AnsibleRun`, including roles and playbooks. You cannot define roles and playbooks in the same `AnsibleRun` resource; they are mutually exclusive.

You have already seen how to run an Ansible role and an inline playbook. Here is an example that runs an Ansible playbook included in a collection, using `spec.forProvider.playbook`:
```yaml
apiVersion: ansible.crossplane.io/v1alpha1
kind: AnsibleRun
metadata:
  name: remote-example
spec:
  forProvider:
    playbook: sample_namespace.sample_collection.sample_playbook
  providerConfigRef:
    name: provider-config-example
```
If multiple playbooks are listed using `spec.forProvider.playbooks`, they will be run sequentially, one after another. Note that at the time of writing, only single playbook execution is supported.
```yaml
apiVersion: ansible.crossplane.io/v1alpha1
kind: AnsibleRun
metadata:
  name: remote-example
spec:
  forProvider:
    playbooks:
      - sample_namespace.sample_collection.sample_playbook
      - sample_namespace.sample_collection.another_sample_playbook
  providerConfigRef:
    name: provider-config-example
```
The collection that includes the playbook is defined as a requirement in `ProviderConfig` and retrieved from a remote place.

When using an inline playbook, besides builtin Ansible modules, you can also include roles or playbooks from collections that are defined as requirements in `ProviderConfig` and retrieved from a remote place. For example, the inline playbook below calls the `nginx` role from the `nginxinc.nginx_core` collection:
```yaml
apiVersion: ansible.crossplane.io/v1alpha1
kind: AnsibleRun
metadata:
  name: inline-example
spec:
  forProvider:
    playbookInline: |
      ---
      - hosts: all
        collections:
          - nginxinc.nginx_core
        tasks:
          - name: Install NGINX
            include_role:
              name: nginx
  providerConfigRef:
    name: provider-config-example
```
The corresponding collection is defined as a requirement in the `ProviderConfig`:
```yaml
apiVersion: ansible.crossplane.io/v1alpha1
kind: ProviderConfig
metadata:
  name: example
spec:
  requirements: |
    ---
    collections:
      - name: nginxinc.nginx_core
        version: 0.5.0
```
## Passing Variables

Ansible uses variables to manage differences among the systems it operates on, so it can run roles or playbooks on multiple systems with a single command. The Ansible provider allows you to pass those differences into an Ansible run through the `vars` field when defining the `AnsibleRun` resource.

Here is an example that defines a simple variable, a list variable, and a dictionary variable; all of these variable types are supported in Ansible. The variables will be passed into the Ansible role `sample_namespace.sample_role` to consume:
```yaml
apiVersion: ansible.crossplane.io/v1alpha1
kind: AnsibleRun
metadata:
  name: remote-example
spec:
  forProvider:
    roles:
      - sample_namespace.sample_role
    vars:
      foo: value1
      bar:
        - value2
        - value3
      baz:
        field1: value1
        field2: value2
  providerConfigRef:
    name: provider-config-example
```
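Conceptually, the `vars` block is handed to the Ansible run as extra variables, which is why lists and dictionaries survive the trip intact. A Python sketch of the idea (illustrative only; the provider passes variables to the run via ansible-runner, not this code):

```python
import json

def to_extravars(vars_spec: dict) -> str:
    """Serialize the AnsibleRun vars verbatim so simple, list, and dict
    variables all reach the Ansible run with their structure preserved."""
    return json.dumps(vars_spec)

spec_vars = {
    "foo": "value1",
    "bar": ["value2", "value3"],
    "baz": {"field1": "value1", "field2": "value2"},
}
payload = to_extravars(spec_vars)
print(json.loads(payload)["baz"]["field2"])  # prints value2
```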
`AnsibleRun` supports the same set of variable types as Ansible for two reasons:

- Following the same conventions makes adopting the Ansible provider smooth for existing Ansible users who are already familiar with how variables are used in Ansible.
- Simple key/value pairs are not always sufficient to express variables. Sometimes people need variables in the form of a list or a dictionary.
After you define the variables as above, you can reference them in Ansible roles or playbooks using Jinja2 syntax as below:
```yaml
- name: An example to show the use of vars
  debug:
    msg: "Print {{ foo }}."
- name: An example to show the use of vars
  debug:
    msg: "Print {{ bar[0] }}."
- name: An example to show the use of vars
  debug:
    msg: "Print {{ baz['field1'] }}."
- name: An example to show the use of vars
  debug:
    msg: "Print {{ baz.field2 }}."
```
Besides that, you can also define variables in a `ConfigMap` or `Secret` as a reusable piece, then reference it from one or more `AnsibleRun` resources using `varFiles`, just as you would define variables in reusable variable files in Ansible and reference them using `vars_files` in playbooks.

Here is an example:
```yaml
apiVersion: ansible.crossplane.io/v1alpha1
kind: AnsibleRun
metadata:
  name: remote-example
spec:
  forProvider:
    roles:
      - sample_namespace.sample_role
    varFiles:
      - source: ConfigMapKey
        configMapKeyRef:
          namespace: default
          name: plain_vars
          key: plain_vars.yml
      - source: SecretKey
        secretKeyRef:
          namespace: default
          name: secret_vars
          key: secret_vars.yml
  providerConfigRef:
    name: provider-config-example
```
Please note that the `varFiles` feature has not been implemented yet. It will be supported in a coming release.
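Once implemented, the provider will presumably merge the variables gathered from `varFiles` with the inline `vars`. A hypothetical Python sketch of such a merge; the precedence order shown (later sources win, inline vars last) is an assumption, not documented behavior:

```python
def merge_vars(*sources: dict) -> dict:
    """Shallow-merge variable sources; later sources override earlier ones
    on key collisions."""
    merged = {}
    for src in sources:
        merged.update(src)
    return merged

config_map_vars = {"region": "us-east", "replicas": 1}
secret_vars = {"token": "s3cr3t"}
inline_vars = {"replicas": 3}  # inline vars win in this sketch
print(merge_vars(config_map_vars, secret_vars, inline_vars))
```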
To support loading Ansible roles or playbooks at runtime, the provider also allows users to manage their Ansible contents by specifying some native Ansible environment variables that customize Ansible's default behavior. Since such configuration may have a global impact across all Ansible runs, this is done by passing variables in `ProviderConfig`.
Here is an example:
```yaml
apiVersion: ansible.crossplane.io/v1alpha1
kind: ProviderConfig
metadata:
  name: provider-config-example
spec:
  vars:
    # Specify the path where the Ansible roles are located
    - key: ANSIBLE_ROLES_PATH
      value: /path/to/roles
    # Specify the path where the Ansible collections are located
    - key: ANSIBLE_COLLECTIONS_PATH
      value: /path/to/collections
```
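Conceptually, these vars become environment variables of the `ansible-galaxy` and `ansible-runner` child processes. A minimal Python sketch of that overlay (illustrative, not the provider's Go code); `ANSIBLE_ROLES_PATH` and `ANSIBLE_COLLECTIONS_PATH` are the variable names documented by Ansible itself:

```python
import os

def build_ansible_env(provider_vars):
    """Overlay ProviderConfig vars onto a copy of the current process
    environment, for use by the Ansible child processes."""
    env = dict(os.environ)
    for item in provider_vars:
        env[item["key"]] = item["value"]
    return env

env = build_ansible_env([
    {"key": "ANSIBLE_ROLES_PATH", "value": "/path/to/roles"},
    {"key": "ANSIBLE_COLLECTIONS_PATH", "value": "/path/to/collections"},
])
```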
## AnsibleRun Lifecycle

This section discusses how the Ansible provider maps an Ansible run onto the Crossplane resource management lifecycle, which is resource management centric. Before that, let's look at how Crossplane manages resources.

In Crossplane, the Managed Resource Reconciler is a key component that drives resource management by following the Resource Management Lifecycle. It exposes a set of lifecycle methods that allow provider developers to customize the actual behavior, typically CRUD operations, for the series of resources they are interested in.

When a managed resource is created, updated, or deleted, the reconciler detects the change, then goes through the lifecycle, triggering the corresponding method at each stage. This is illustrated in the following diagram.
- It first calls `Connect()`, which usually connects to the target system that hosts the resources to be created.
- It then calls `Observe()`, which usually compares the desired state defined by the locally created managed resource with the actual state of the resource hosted on the target system, also called the external resource as opposed to the managed resource. `Observe()` is also called periodically, at a configurable poll interval.
- If the external resource does not exist, it calls `Create()` to delegate the actual resource creation to the provider. It then requeues to wait for the next reconciliation run, so that it can keep the state of the external resource and the managed resource synchronized.
- If the external resource exists but is not up to date, it calls `Update()` to delegate the actual resource update to the provider. Once the update succeeds, it requeues to wait for the next reconciliation run, controlled by the poll interval, to discover any state difference between the external resource and the managed resource.
- When the managed resource is deleted, it calls `Delete()`, which allows the provider to clean up the external resource. As after `Create()`/`Update()`, it then requeues so that it can keep the state of the external resource and the managed resource synchronized.
- For all of the CRUD methods above, if an error occurs, the reconciler reports the error and requeues to wait for the next reconciliation run, giving it another try.
The Crossplane resource management lifecycle is composed of a set of phases, or methods. Implementing a Crossplane provider usually involves writing code for each method to supply the behavior for the corresponding phase. The Ansible provider instead delegates that work to the Ansible binary, which makes the changes to the resource on the target system. This is the major difference from other Crossplane providers. For example, as opposed to providers that manage resources on a public cloud, we no longer make direct API calls to the cloud using local binaries or Go libraries inside the provider; instead, we rely on the local Ansible binary to execute the Ansible contents retrieved from remote places to make those calls or changes. This is illustrated by the following diagram.
From the resource management perspective, in the Ansible provider, `AnsibleRun` is a way to describe the desired state of the resource on the target system. Ensuring consistency between the resource's desired state and its actual state depends on the cooperation among the Managed Resource Reconciler, the Ansible provider, and the Ansible contents.

The logic in the Ansible contents is entirely determined by the developers who create them and is not covered in this document. The cooperation between the Ansible provider and the Managed Resource Reconciler, however, is deterministic, and is discussed in the following sections. You will see how an Ansible run maps to a subset of the phases defined by the Crossplane Resource Management Lifecycle to support the resource management use case.
To support an Ansible run, the Ansible provider requires the Ansible contents to be available before the run. This may involve retrieving the Ansible contents from a remote place, or creating a `playbook.yml` file from the inline content, and then storing the result in the local working directory.

In the Ansible provider, this logic is implemented in `Connect()`.

Once an `AnsibleRun` resource is created, the reconciler calls the provider method `Connect()` to retrieve the Ansible contents from the remote place or to generate the inline playbook file, depending on how the `AnsibleRun` is defined.
Once the Ansible contents are available, we can start the Ansible run. The Ansible provider supports a couple of run policies to fulfill different types of requirements. The policy is represented by the annotation `ansible.crossplane.io/runPolicy`, which can be applied to an `AnsibleRun` resource to instruct the provider how to run the corresponding Ansible contents.
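Reading the policy from the annotation, with `ObserveAndDelete` as the fallback, can be sketched as follows (illustrative Python; the function name is an assumption):

```python
RUN_POLICY_ANNOTATION = "ansible.crossplane.io/runPolicy"

def run_policy(annotations: dict) -> str:
    """Read the run policy from AnsibleRun annotations; ObserveAndDelete
    is the default when the annotation is absent."""
    return annotations.get(RUN_POLICY_ANNOTATION, "ObserveAndDelete")

print(run_policy({}))                                            # ObserveAndDelete
print(run_policy({RUN_POLICY_ANNOTATION: "CheckWhenObserve"}))   # CheckWhenObserve
```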
ObserveAndDelete is the default and probably the most commonly used policy for managing Ansible runs. When this policy is applied, the provider uses `Observe()` to handle the case where the managed resource `AnsibleRun` is present, and uses `Delete()` to handle the case where it is absent. Both `Observe()` and `Delete()` call the same set of Ansible contents.
Here is an example that runs an Ansible role with the ObserveAndDelete policy to provision an OpenShift cluster remotely:
```yaml
apiVersion: ansible.crossplane.io/v1alpha1
kind: AnsibleRun
metadata:
  name: openshift-cluster
  annotations:
    ansible.crossplane.io/runPolicy: ObserveAndDelete
spec:
  forProvider:
    roles:
      - sample_namespace.openshift_cluster
    vars:
      ocpVersion: "4.8.27"
      platform: "x"
      size: "large"
  providerConfigRef:
    name: provider-config-example
```
In this case, we define the desired state, the source of truth for our OpenShift cluster, using `spec.forProvider.vars`. For example, it specifies the OpenShift version, the platform, and the size of the cluster to be provisioned. It relies on the Ansible role `sample_namespace.openshift_cluster` to provision the cluster for us.

The annotation used to specify the policy is not part of the desired state or source of truth. It is just a small chunk of metadata that instructs the Ansible provider how to trigger the Ansible role.
When a user creates the `AnsibleRun` resource, they are requesting the cluster. This triggers the Ansible role in `Observe()`.

When a user edits the `AnsibleRun` resource, they are requesting an update to the cluster. This also triggers the Ansible role in `Observe()`.

When a user deletes the `AnsibleRun` resource, they are requesting that the cluster be dropped. This triggers the same Ansible role in `Delete()` to clean up the cluster.
To differentiate between the presence and absence of the `AnsibleRun` resource, a special variable is sent to the Ansible role when it starts to run:

```
ansible_provider_meta.managed_resource.state = absence|presence
```
This variable belongs to a special set of variables maintained by the Ansible provider; developers who create Ansible contents can reference these variables when needed. For example, to run Ansible tasks conditionally by checking the variable that indicates the presence or absence of the `AnsibleRun`:
```yaml
- include_tasks: setup-resource.yml
  when: ansible_provider_meta.managed_resource.state == 'present'
- include_tasks: cleanup-resource.yml
  when: ansible_provider_meta.managed_resource.state == 'absent'
```
In a future release, we should allow users to choose an arbitrary name for the variable that represents the presence or absence of the `AnsibleRun` resource, so that user-maintained Ansible contents do not have to be coupled with, or even aware of, the Ansible provider.
The CheckWhenObserve policy can be used when the Ansible modules used in your roles or playbooks support check mode. According to the Ansible documentation, check mode is a way for Ansible to do a "dry run": Ansible runs without making any changes on remote systems, and modules that support check mode report the changes they would have made.

When this policy is applied, the provider runs the Ansible contents in `Observe()`, but in check mode. This does not apply any real change to the target system; it only detects differences between the actual state on the target system and the desired state defined in the `AnsibleRun` resource. If any change is detected, the provider then triggers `Update()` to kick off the actual run of the same set of Ansible contents, and `ansible-runner` determines whether it is a create or an update operation. `Delete()`, as previously discussed, is triggered when the `AnsibleRun` resource is deleted.
Here is an example to run an Ansible role using CheckWhenObserve policy:
```yaml
apiVersion: ansible.crossplane.io/v1alpha1
kind: AnsibleRun
metadata:
  name: remote-example
  annotations:
    ansible.crossplane.io/runPolicy: CheckWhenObserve
spec:
  forProvider:
    roles:
      - sample_namespace.sample_role
  providerConfigRef:
    name: provider-config-example
```
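The CheckWhenObserve flow can be sketched as a dry run that detects drift, followed by a real run only when drift was reported (illustrative Python; the real provider delegates both runs to ansible-runner):

```python
def reconcile_check_when_observe(run_ansible):
    """Sketch of CheckWhenObserve: a check-mode run first, then a real run
    only when the check-mode run reports pending changes."""
    changed = run_ansible(check_mode=True)   # Observe(): detect drift only
    if changed:
        run_ansible(check_mode=False)        # Update(): apply the change
        return "updated"
    return "in-sync"

calls = []
def fake_run(check_mode):
    calls.append(check_mode)
    return True  # pretend the dry run reported a pending change

print(reconcile_check_when_observe(fake_run))  # updated
print(calls)                                   # [True, False]
```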
To differentiate between the presence and absence of the `AnsibleRun` resource, we can still use the previously discussed variable, maintained by the provider and sent to Ansible when the Ansible contents start to run. When `Observe()`, `Create()`, or `Update()` is called, the value `presence` is passed; otherwise, the value `absence` is passed.
Note that Ansible modules that do not support check mode report nothing and do nothing in check mode. If you use this policy with such modules, `Observe()` will not detect any change, so neither `Create()` nor `Update()` will be triggered.
The policy annotation is not mandatory. If no policy annotation is specified, the provider takes `ObserveAndDelete` as the default policy, which does not rely on check mode. The reasons for using an annotation to specify the policy are:

- To avoid confusing the policy with the desired state defined in the `spec` field. The policy is just a small chunk of metadata that instructs the Ansible provider how to trigger the Ansible roles or playbooks.
- To avoid the overhead of an API version upgrade if we change the behavior later based on user feedback. The idea of run policies is still at an early stage and may be subject to change. If we added the policy to the `spec` field instead of an annotation, we would have to deal with an API version upgrade to support backward compatibility or migration for existing provider users.
Although there is no significant hard requirement for Ansible contents to work with the Ansible provider, there are some best practices that developers who maintain Ansible contents can take as reference. These are also general guidelines for writing effective Ansible roles or playbooks, not Ansible provider specific ones.
It is always a best practice to write Ansible roles or playbooks in an idempotent way. For example, if a playbook consists of 5 steps and the system deviates from the desired state only in step 3, then only that particular step needs to be applied. By nature, Ansible tasks only change the system when there is something to do.

From the Ansible provider's perspective, this is required because the same Ansible contents will be run many times over the Resource Management Lifecycle.
It is a common practice for Ansible modules to support a `state` field and behave differently according to its value, e.g. `present` or `absent`. Explicitly setting `state=present` or `state=absent` makes playbooks and roles clearer.
From the Ansible provider's perspective, this is required because the same Ansible contents will be used to handle both the case when the `AnsibleRun` resource is present and the case when it is absent.
## Comparing with Ansible Operator

The Operator Framework is an open source toolkit for managing Kubernetes native applications, called Operators, in an effective, automated, and scalable way. The Operator SDK is a framework that makes writing operators simpler; it enables Operator development in Go, Helm, and Ansible. In this section, we discuss how an Ansible operator works and compare it with the Ansible provider.

The way an Ansible operator works can be illustrated by the following diagram.
An Ansible operator lets you bundle a set of Ansible roles or playbooks that are typically used to manage Kubernetes resources. A file called `watches.yaml` defines mappings from Kubernetes custom resources to Ansible roles or playbooks, so when a custom resource is created, the operator knows which role or playbook to call.
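For reference, a `watches.yaml` entry maps one custom resource type to one role or playbook. A typical entry looks like this; the group, kind, and role names below are placeholders, not taken from this document:

```yaml
# Map a custom resource (group/version/kind) to the Ansible role that
# reconciles it. All names here are illustrative.
- version: v1alpha1
  group: cache.example.com
  kind: Memcached
  role: memcached
```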
Just like the Ansible provider, an Ansible operator uses Ansible Runner to execute the actual Ansible roles or playbooks, and uses `ansible-runner-http`, a plugin for ansible-runner, to collect the run results. It then acts as an Event Receiver, listening on a Unix socket for a peered Kubernetes controller to query.
An Ansible operator has a small embedded proxy listening at localhost:8888 that is specifically designed to intercept calls from the Ansible roles or playbooks that manage Kubernetes resources. It mainly does two things:

- Injects an owner reference into the Kubernetes resources created by the Ansible roles or playbooks. The resources created by Ansible are thereby treated as children of the user-created resource (e.g. created with kubectl) that triggered the Ansible run. This allows the operator to delete the child resources in a cascaded way when the top-level resource is deleted.
- Adds a watch to the controller for the Kubernetes resources created by the Ansible roles or playbooks. Any time an Ansible-created resource is modified, the change is detected and reverted by the controller, which re-runs the Ansible roles or playbooks. This guarantees that the actual state on the target system is always identical to the desired state defined by the user-created resource.
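The owner-reference injection can be sketched as follows (illustrative Python; the real proxy does this transparently for API requests passing through it, and the resource names below are placeholders):

```python
def inject_owner_reference(child: dict, owner: dict) -> dict:
    """Add the triggering custom resource as an owner of a resource created
    by the Ansible run, enabling cascaded deletion."""
    ref = {
        "apiVersion": owner["apiVersion"],
        "kind": owner["kind"],
        "name": owner["metadata"]["name"],
        "uid": owner["metadata"]["uid"],
    }
    refs = child.setdefault("metadata", {}).setdefault("ownerReferences", [])
    refs.append(ref)
    return child

owner = {"apiVersion": "cache.example.com/v1alpha1", "kind": "Memcached",
         "metadata": {"name": "example", "uid": "1234"}}
child = {"apiVersion": "apps/v1", "kind": "Deployment",
         "metadata": {"name": "web"}}
print(inject_owner_reference(child, owner)["metadata"]["ownerReferences"][0]["kind"])
```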
As you can see, an Ansible operator can also be used to drive Ansible runs in a Kubernetes native way. However, there are a few major differences:

- The Ansible operator is mainly designed for managing Kubernetes resources. As the diagram above shows, the embedded proxy does a lot of dirty work to handle the child-level Kubernetes resources generated by the Ansible run, and this proxy is only useful when the Ansible run interacts with a Kubernetes cluster. The Ansible provider, on the other hand, has no such limitation: it can handle arbitrary target systems, whether cloud native, such as Kubernetes or public clouds, or non cloud native, such as virtual machines, bare metal machines, etc.
- The Ansible operator requires you to bundle Ansible contents into a container image and run it as a Kubernetes application. If you have ten Ansible roles, you may need to build multiple container images, each bundling one or more Ansible roles that handle different Kubernetes resources. The Ansible provider works differently. Instead of bundling the controller and the Ansible contents into the same container image, the provider is a standalone Kubernetes application that runs separately. This lets the provider pull Ansible contents remotely from any place, such as Ansible Galaxy, Automation Hub, or a GitHub repository.
- The Ansible operator is usually implemented as an in-cluster Kubernetes application whose resources or workloads are co-located in the same cluster. The Ansible provider, on the contrary, is designed to manage resources remotely: it usually runs in one Kubernetes cluster, then operates on resources or workloads on remote systems. Of course, it can also handle in-cluster resources or workloads if your Ansible contents manage Kubernetes resources using the Ansible k8s module.
## Appendix: Feature List

The following list includes the major features discussed in this document, with their current status: implemented (✅) or not implemented (❎).
- ✅ Inline Playbook
- ✅ Remote Role
- ❎ Remote Playbook
- ✅ Credentials
- ✅ Requirements
- ✅ Variables
- ✅ Ansible Run Policy: ObserveAndDelete
- ✅ Ansible Run Policy: CheckWhenObserve