An overview of articles
There are a couple of ways to loop in Terraform. Let’s have a look.
Count
This is the “oldest” method. It’s quite powerful.
resource "some_resource" "some_name" {
count = 3
name = "my_resource_${count.index}"
}
As you can see, the resource some_resource is created 3 times (count = 3). The name should be unique, so the count.index variable is used; it is available whenever count is set. The variable count.index takes these values:
| iteration | count.index value |
|-----------|-------------------|
| first     | 0                 |
| second    | 1                 |
| third     | 2                 |
And so on.
Counting and looking up values
The parameter count sounds simple, but is actually quite powerful. Let's have a look at this example.
Here is a sample .tfvars file:
virtual_machines = [
{
name = "first"
size = 16
},
{
name = "second"
size = 32
}
]
The above structure is a “list of maps”:
- The list is indicated by the [ and ] characters.
- Each map is indicated by the { and } characters.
Now let's loop over that list:
resource "fake_virtual_machine" "default" {
count = length(var.virtual_machines)
name = var.virtual_machines[count.index].name
size = var.virtual_machines[count.index].size
}
A couple of tricks happen here:
- count is calculated by the length function, which counts how many maps there are in the list virtual_machines.
- name is looked up in the variable var.virtual_machines: in the first iteration, the first (0) entry is picked.
- Similarly, size is looked up.
This results in two resources:
resource "fake_virtual_machine" "default" {
name = "first"
size = 16
}
# NOTE: This code does not work; `default` may not be repeated. It's just to explain what happens.
resource "fake_virtual_machine" "default" {
name = "second"
size = 32
}
For each
The looping mechanism for_each can also be used. Similar to the count example, let's think of a data structure to make virtual machines:
virtual_machines = [
{
name = "first"
size = 16
},
{
name = "second"
size = 32
}
]
And let's use for_each to loop over this structure. Note that for_each only accepts a map or a set, so the list is first converted to a map with a for expression:
resource "fake_virtual_machine" "default" {
for_each = { for vm in var.virtual_machines : vm.name => vm }
name = each.value.name
size = each.value.size
}
This pattern creates the same virtual machines as the count example, except that each instance is addressed by its key (for example "first") instead of by a numeric index.
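To make the difference concrete: with count, instances are addressed by a numeric index; with for_each, by their map key. A sketch (assuming a for_each keyed by the machine name; the output names are illustrative):

```hcl
# With count, instances are addressed by a numeric index:
output "first_vm_from_count" {
  value = fake_virtual_machine.default[0].name
}

# With for_each (keyed by name), instances are addressed by that key:
output "first_vm_from_for_each" {
  value = fake_virtual_machine.default["first"].name
}
```

The for_each form is more robust: removing the first list entry does not renumber (and thus recreate) the remaining instances.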
Dynamic block
Imagine there is some block in the fake_virtual_machine resource, like this:
resource "fake_virtual_machine" "default" {
name = "example"
disk {
name = "os"
size = 32
}
disk {
name = "data"
size = 128
}
}
The variable structure we’ll use looks like this:
virtual_machines = [
{
name = "first"
disks = [
{
name = "os"
size = 32
},
{
name = "data"
size = 128
}
]
},
{
name = "second"
disks = [
{
name = "os"
size = 64
},
{
name = "data"
size = 256
}
]
}
]
As you can see:
- There is a list of virtual_machines.
- Each virtual machine has a list of disks.
Now let’s introduce the dynamic block:
resource "fake_virtual_machine" "default" {
for_each = { for vm in var.virtual_machines : vm.name => vm }
name = each.value.name
dynamic "disk" {
for_each = each.value.disks
content {
name = disk.value.name
size = disk.value.size
}
}
}
Wow, that's a lot to explain:
- dynamic "disk" { starts a dynamic block. The name (“disk”) must match the block name in the resource; it cannot be just any name. A new object named disk is now available.
- for_each = each.value.disks loops the dynamic block. The loop uses disks from the already-looped value var.virtual_machines.
- The content { block is what Terraform renders for each iteration.
- name = disk.value.name uses the disk object (created by the block iterator disk) to find the value in each disks map.
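Put together, for the first virtual machine the dynamic block renders the equivalent of this (a sketch for illustration only; as with the count example earlier, writing the resource out literally like this is not valid Terraform):

```hcl
resource "fake_virtual_machine" "default" {
  name = "first"

  # Rendered once per entry in each.value.disks:
  disk {
    name = "os"
    size = 32
  }

  disk {
    name = "data"
    size = 128
  }
}
```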
Hope that helps a bit when writing loops in Terraform!
Looping is quite an important mechanism in coding. (Thanks @Andreas for the word coding, a mix of scripting and programming.)
Looping allows you to write a piece of logic once and reuse it as many times as required.
Looping is difficult to understand for people new to coding. It's sometimes difficult for me too. This article will probably help me a bit as well!
A sequence
The simplest loop I know repeats a piece of logic for a sequence of numbers or letters.
First off, here is how to generate a sequence:

| bash | ansible | terraform |
|------|---------|-----------|
| seq 1 3 | with_sequence: start=1 end=3 | range(1, 4) |
| {1..3} | | |
For all languages, this returns a list of numbers:

| bash | ansible | terraform |
|------|---------|-----------|
| 1 2 3 | item: 1, item: 2, item: 3 | [1, 2, 3] |
So, here is a more complete example for the three languages:
Bash
for number in {1..3} ; do
  echo "number: ${number}"
done
The above script returns:
number: 1
number: 2
number: 3
Ansible
- name: show something
  debug:
    msg: "number: {{ item }}"
  with_sequence: start=1 end=3
Note: see that = sign? I was expecting a :, so I made an issue.
The above script returns:
ok: [localhost] => (item=1) => {
    "msg": "number: 1"
}
ok: [localhost] => (item=2) => {
    "msg": "number: 2"
}
ok: [localhost] => (item=3) => {
    "msg": "number: 3"
}
Terraform
locals {
numbers = range(1,4)
}
output "number" {
value = local.numbers
}
The above script returns:
number = tolist([
1,
2,
3,
])
Ansible testing components
To test Ansible, I use quite a number of components. This page lists the components used, their versions, and where they are used.
| Component | Used | Latest | Used where |
|-----------|------|--------|------------|
| ansible | 2.9 | 2.9.18 | tox.ini |
| ansible | 2.10 | 2.10.7 | tox.ini |
| molecule | >=3,<4 | | docker-github-action-molecule |
| tox | latest | n.a. | docker-github-action-molecule |
| ansible-lint | latest | | docker-github-action-molecule |
| pre-commit | 2.9.3 | v2.10.1 | installed on development desktop |
| molecule-action | 2.6.16 | | .github/workflows/molecule.yml |
| github-action-molecule | 3.0.6 | | .gitlab-ci.yml |
| ubuntu | 20.04 | 20.04 | .github/workflows/galaxy.yml |
| ubuntu | 20.04 | 20.04 | .github/workflows/molecule.yml |
| ubuntu | 20.04 | 20.04 | .github/workflows/requirements2png.yml |
| ubuntu | 20.04 | 20.04 | .github/workflows/todo.yml |
| galaxy-action | 1.1.0 | | .github/workflows/galaxy.yml |
| graphviz-action | 1.0.7 | | .github/workflows/requirements2png.yml |
| checkout | v2 | | .github/workflows/requirements2png.yml |
| checkout | v2 | | .github/workflows/molecule.yml |
| todo-to-issue | v2.3 | | .github/workflows/todo.yml |
| python | 3.9 | 3.9 | .travis.yml |
| pre-commit-hooks | v3.4.0 | | .pre-commit-config.yaml |
| yamllint | v1.26.0 | v1.26.0 | .pre-commit-config.yaml |
| my pre-commit | v1.4.5 | | .pre-commit-config.yaml |
| fedora | 33 | 33 | docker-github-action-molecule |
Debugging GitLab builds
Now that Travis has become unusable, I'm moving stuff to GitLab. Some builds are breaking; this is how to reproduce the errors.
Start the dind container
export role=ansible-role-dns
cd Documents/github/buluma
docker run --rm --name gitlabci --volume $(pwd)/${role}:/${role}:z --privileged --tty --interactive docker:stable-dind
Login to the dind container
docker exec --tty --interactive gitlabci /bin/sh
Install software
The dind image is Alpine-based and lacks the software required to run molecule or tox.
apk add --no-cache python3 python3-dev py3-pip gcc git curl build-base autoconf automake py3-cryptography linux-headers musl-dev libffi-dev openssl-dev openssh
Tox
GitLab CI tries to run tox (if tox.ini is found). To emulate GitLab CI, run:
python3 -m pip install tox --ignore-installed
And simply run tox to see the results.
Molecule
For more in-depth troubleshooting, try installing molecule:
python3 -m pip install ansible molecule[docker] docker ansible-lint
Now you can run molecule:
cd ${role}
molecule test --destroy=never
molecule login
YAML Anchors and References
This is a post to help me remind how YAML Anchors and References work.
What?
In YAML you can make an Anchor:
- first_name: &me Michael
Now the anchor me contains Michael. To refer to it, do something like this:
- first_name: &me Michael
  given_name: *me
The value for given_name has been set to Michael.
You can also anchor to a whole list item:
people:
- person: &me
name: Michael
family_name: Buluma
Now you may refer to the me anchor:
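A minimal sketch, assuming a hypothetical access list that reuses the anchored map:

```yaml
people:
  - person: &me
      name: Michael
      family_name: Buluma

access:
  - *me
```

The alias *me expands to the whole anchored map, so the entry under access is a copy of the person map.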
Now Michael has access.
Good to keep things DRY (don't repeat yourself).
Ansible alternatives for shell tricks
If you're used to shells and their commands like bash, sed and grep, here are a few Ansible alternatives.
Using these native alternatives has an advantage: the developers maintain the Ansible modules for you, they are idempotent, and they likely work on more distributions or platforms.
Grep
Imagine you need to know if a certain pattern is in a file. With a shell script you would use something like this:
grep "pattern" file.txt && do-something-if-pattern-is-found.sh
With Ansible you can achieve a similar result like this:
---
- hosts: all
gather_facts: no
tasks:
- name: check if pattern is found
lineinfile:
path: file.txt
regexp: '.*pattern.*'
line: 'whatever'
register: check_if_pattern_is_found
check_mode: yes
notify: do something
handlers:
- name: do something
command: do-something-if-pattern-is-found.sh
Hm, much longer than the bash example, but native Ansible!
Sed
So you like sed? So do I. It's one of the most powerful tools I've used.
Let's replace some pattern in a file:
sed 's/pattern_a/pattern_b/g' file.txt
This would replace all occurrences of pattern_a with pattern_b. Let's see what the Ansible version looks like.
---
- name: replace something
hosts: all
gather_facts: no
tasks:
- name: replace patterns
lineinfile:
path: file.txt
regexp: '^(.*)pattern_a(.*)$'
line: '\1pattern_b\2'
Have a look at the lineinfile module documentation for more details.
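Worth noting: lineinfile only acts on the last line that matches the regexp. When every occurrence in the file must change, like sed with the g flag, the replace module is the closer match. A minimal sketch:

```yaml
---
- name: replace all occurrences
  hosts: all
  gather_facts: no
  tasks:
    - name: replace pattern_a with pattern_b everywhere
      replace:
        path: file.txt
        regexp: 'pattern_a'
        replace: 'pattern_b'
```

Unlike lineinfile, replace rewrites every match in the file, which mirrors the sed example above.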
Find and remove
The find (UNIX) tool is really powerful too. Imagine this command:
find / -name some_file.txt -exec rm {} \;
That command would find all files named some_file.txt and remove them. A bit dangerous, but powerful.
With Ansible:
- name: find and remove
hosts: all
gather_facts: no
tasks:
- name: find files
find:
paths: /
patterns: some_file.txt
register: found_files
- name: remove files
file:
path: "{{ item.path }}"
state: absent
loop: "{{ found_files.files }}"
Conclusion
Well, have fun with all the non-shell solutions. You hardly need the shell or command modules once you get the hang of it.
Debugging services in Ansible
Sometimes services don’t start and give an error like:
Unable to start service my_service: A dependency job for my_service.service failed. See 'journalctl -xe' for details.
Well, if you're testing in CI, you can't really access the instance that has the issue. So how do you troubleshoot this?
I use this pattern frequently:
- name: debug start my_service
block:
- name: start my_service
service:
name: "my_service"
state: started
rescue:
- name: collect information
command: "{{ item }}"
register: my_service_collect_information
loop:
- journalctl --unit my_service --no-pager
- systemctl status my_service
- name: show information
debug:
msg: "{{ item }}"
loop: "{{ my_service_collect_information.results }}"
What's happening here?
- The block is a grouping of tasks.
- The rescue runs when the initial task (start my_service) fails. It has two tasks.
- The task collect information runs a few commands (add extra ones if you know more) and saves the results in my_service_collect_information.
- The task show information displays the saved result. Because collect information has a loop, the variable has .results appended, which is a list that needs to be looped over.
Hope this helps you troubleshoot services in Travis or GitHub Actions.
5 times why
Why do I write all this code? What's the purpose, and where does it stop?
To answer this question, there is a method you can use: 5 times why. Answer the initial question and pose a new question: “Why?”. Repeat the process a couple of times until you get to the core of the reason. Let’s go.
1. Why do I write all this code?
Because it’s a great way to learn a language.
2. Why do I want to learn a new language?
Technology changes, this allows me to keep up to date.
3. Why do I need to be up to date?
Being up to date allows me to be relevant.
4. Why do I need to be relevant?
Being relevant allows me to steer my career.
5. Why do I need to steer my career?
I don't want to depend on a single employer, and I want many options when choosing a job.
Conclusion
So, I write all this code to not depend on any one employer. Interesting conclusion, I tend to agree.
GitHub action to release to Galaxy
GitHub Actions is an approach to offering CI, using other people's actions from the GitHub Actions Marketplace.
I’m using 2 actions to:
- test the role using Molecule
- release the role to Galaxy
GitHub offers 20 concurrent builds, which is quite a lot: more than Travis's limit of 5, so builds could be 4 times faster. Faster CI, happier developers. ;-)
Here are a few examples. First: just release to Galaxy, no testing included (not a smart idea).
---
name: GitHub Action
on:
- push
jobs:
release:
runs-on: ubuntu-latest
steps:
- name: checkout
uses: actions/checkout@v4
- name: galaxy
uses: buluma/galaxy-action@master
with:
galaxy_api_key: "${{ secrets.galaxy_api_key }}"
As you can see, 2 actions are used: checkout, which gets the code, and galaxy-action, which pushes the role to Galaxy. Galaxy does lint-testing, but not functional testing; you can use the molecule-action to do that.
---
name: GitHub Action
on:
- push
jobs:
test:
runs-on: ubuntu-latest
steps:
- name: checkout
uses: actions/checkout@v4
with:
path: "${{ github.repository }}"
- name: molecule
uses: buluma/molecule-action@1.0.0
with:
image: "$"
release:
needs:
- test
runs-on: ubuntu-latest
steps:
- name: galaxy
uses: buluma/galaxy-action@master
with:
galaxy_api_key: ${{ secrets.galaxy_api_key }}
The build is split in 2 parts now: test and release, and release needs test to be done.
You can also see that checkout is now called with a path, which allows Molecule to find the role. (ANSIBLE_ROLES_PATH: $ephemeral_directory/roles/:$project_directory/../)
Finally, you can include a matrix to build with a set of variable combinations.
---
name: GitHub Action
on:
- push
jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
image:
- alpine
- amazonlinux
- debian
- centos
- fedora
- opensuse
- ubuntu
steps:
- name: checkout
uses: actions/checkout@v4
with:
path: "$"
- name: molecule
uses: buluma/molecule-action@1.0.0
with:
image: ${{ matrix.image }}
release:
needs:
- test
runs-on: ubuntu-latest
steps:
- name: galaxy
uses: buluma/galaxy-action@master
with:
galaxy_api_key: ${{ secrets.galaxy_api_key }}
Now your role is tested on each image in the matrix.
Hope these actions make it easier to develop, test and release your roles. If you find problems, please make an issue for either the molecule or the galaxy action.
GitHub action to run Molecule
GitHub Actions is an approach to offering CI, using other people's actions from the GitHub Actions Marketplace.
The intent is to let the developer of an action think about the ‘hard stuff’, while the user of an action simply includes another step in a workflow.
So I wrote a GitHub action to test an Ansible role with a single step.
Using the GitHub Action
Have a look at the Molecule action.
It boils down to adding this snippet to .github/workflows/molecule.yml:
---
on:
- push
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: checkout
uses: actions/checkout@v4
with:
path: "$"
- name: molecule
uses: buluma/molecule-action@master
How it works
You may want to write your own action, here is an overview of the required components.
+--- Repository with an Ansible role ---+
| - .github/workflows/molecule.yml |
+-+-------------------------------------+
|
| +-------- buluma/molecule-action --------+
+--> | - image: buluma/github-action-molecule |
+-+--------------------------------------------+
|
| +--- github-action-molecule ---+
+--> | - pip install molecule |
| - pip install tox |
+------------------------------+
1. Create a container
First create a container that has all the tools you need installed, and push it to Docker Hub. Here is the code for my container.
2. Create an action
Create a GitHub repository per action. It should at least contain an action.yml. Have a look at the documentation for Actions.
3. Integrate your action
Pick a repository, and add a file (likely with the name of the action) in .github/workflows/my_action.yml. The contents should refer to the action:
steps:
- name: checkout
uses: actions/checkout@v4
with:
path: "$"
- name: molecule
uses: buluma/molecule-action@master
with:
image: $
A full example here.
The benefit is that you (or others) can reuse the action. Have fun making GitHub actions!
And, or and not
Today I spent a couple of hours on a condition that contained a mistake. Let me try to help myself by describing a few situations.
Condition?
A condition in Ansible can be described in a when statement. This is a simple example:
- name: do something only to virtual instances
debug:
msg: "Here is a message from a guest"
when: ansible_virtualization_role == "guest"
And
It's possible to describe multiple conditions. In Ansible, the when statement can be a string (see above) or a list:
- name: do something only to Red Hat virtual instances
debug:
msg: "Here is a message from a Red Hat guest"
when:
- ansible_virtualization_role == "guest"
- ansible_os_family == "RedHat"
The above example will only run when it's both a virtual instance and a Red Hat-like system.
Or
Instead of combining (‘and’) conditions, you can also allow multiple conditions where either is true:
- name: do something to either Red Hat or virtual instances
debug:
msg: "Here is a message from a Red Hat system or a guest"
when:
- ansible_virtualization_role == "guest" or ansible_os_family == "RedHat"
I like to keep lines short to increase readability:
when:
- ansible_virtualization_role == "guest" or
ansible_os_family == "RedHat"
And & or
You can also combine and and or:
- name: do something to a Debian or Red Hat system, if it's a virtual instance
debug:
msg: "Here is a message from a Red Hat or Debian guest"
when:
- ansible_virtualization_role == "guest"
- ansible_os_family == "RedHat" or ansible_os_family == "Debian"
In
It’s also possible to check if some pattern is in a list:
- name: make some list
set_fact:
allergies:
- apples
- bananas
- name: Test for allergies
debug:
msg: "A match was found: {{ item }}"
when: item in allergies
loop:
- pears
- milk
- nuts
- apples
You can have multiple lists and check multiple times:
- name: make some list
set_fact:
fruit:
- apples
- bananas
dairy:
- milk
- eggs
- name: Test for allergies
debug:
msg: "A match was found: {{ item }}"
when:
- item in fruit or
item in dairy
loop:
- pears
- milk
- nuts
- apples
Negate
It's also possible to negate an in search against a list. This is where it gets difficult (for me!):
- name: make some list
set_fact:
fruit:
- apples
- bananas
dairy:
- milk
- eggs
- name: Test for allergies
debug:
msg: "No match was found: {{ item }}"
when:
- item not in fruit
- item not in dairy
loop:
- pears
- milk
- nuts
- apples
The twist here is that the conditions are combined with and: the item must not be in either list.
Well, I’ll certainly run into some issue again in the future, hope this helps you (and me) if you ever need a complex condition in Ansible.
Relations between containernames, setup and Galaxy
It's not easy to find the relation between container names, facts returned from setup (or gather_facts) and Ansible Galaxy platform names.
Here is an attempt to make life a little easier:
Alpine
containername: alpine
ansible_os_family: Alpine
ansible_distribution: Alpine
galaxy_platform: Alpine
| galaxy_version | docker_tag | ansible_distribution_major_version |
|----------------|------------|------------------------------------|
| all | latest | 3 |
| all | edge | 3 |
AmazonLinux
containername: amazonlinux
ansible_os_family: RedHat
ansible_distribution: Amazon
galaxy_platform: Amazon
| galaxy_version | docker_tag | ansible_distribution_major_version |
|----------------|------------|------------------------------------|
| Candidate | latest | 2 |
| 2018.03 | 1 | 2018 |
CentOS
containername: centos
ansible_os_family: RedHat
ansible_distribution: CentOS
galaxy_platform: EL
| galaxy_version | docker_tag | ansible_distribution_major_version |
|----------------|------------|------------------------------------|
| 8 | latest | 8 |
| 7 | 7 | 7 |
RockyLinux
containername: rockylinux
ansible_os_family: RedHat
ansible_distribution: Rocky
galaxy_platform: EL
| galaxy_version | docker_tag | ansible_distribution_major_version |
|----------------|------------|------------------------------------|
| 8 | latest | 8 |
Debian
containername: debian
ansible_os_family: Debian
ansible_distribution: Debian
galaxy_platform: Debian
| galaxy_version | docker_tag | ansible_distribution_major_version |
|----------------|------------|------------------------------------|
| bullseye | latest | 11 |
| bookworm | bookworm | testing/12 |
Fedora
containername: fedora
ansible_os_family: RedHat
ansible_distribution: Fedora
galaxy_platform: Fedora
| galaxy_version | docker_tag | ansible_distribution_major_version |
|----------------|------------|------------------------------------|
| 32 | 32 | 32 |
| 33 | latest | 33 |
| 34 | rawhide | 34 |
OpenSUSE
containername: opensuse
ansible_os_family: Suse
ansible_distribution: OpenSUSE
galaxy_platform: opensuse
| galaxy_version | docker_tag | ansible_distribution_major_version |
|----------------|------------|------------------------------------|
| all | latest | 15 |
Ubuntu
containername: ubuntu
ansible_os_family: Debian
ansible_distribution: Ubuntu
galaxy_platform: Ubuntu
| galaxy_version | docker_tag | ansible_distribution_major_version |
|----------------|------------|------------------------------------|
| focal | latest | 20 |
| bionic | bionic | 18 |
| xenial | xenial | 16 |
Why would you write Ansible roles for multiple distributions?
During some discussion with the audience at DevOps Amsterdam, I got some feedback.
My statements are:
- “Keep your code as simple as possible”
- “Write roles for multiple distributions” (To improve logic.)
These two contradict each other: simplicity would mean 1 role for 1 (only my) distribution.
Hm, that’s a very fair point. Still I think writing for multiple operating systems is a good thing, for these reasons:
- You get a better understanding of all the operating systems. For example Ubuntu is (nearly) identical to Debian, SUSE is very similar to Red Hat.
- By writing for multiple distributions, the logic (in tasks/main.yml) becomes more stable.
- It’s just very useful to be able to switch distributions without switching roles.
Super important Ansible facts
There are some facts that I use very frequently; they are super important to me.
This is more a therapeutic post for me than a great read for you. ;-)
Sometimes, actually most of the time, each operating system or distribution needs something specific. For example, Apache httpd has a different package name on almost every distribution. This mapping (distro: package name) can be done using the variable ansible_os_family.
Try to select packages/service-names/directories/files/etc based on the most general level and work your way down to more specific when required. This results in a sort of priority list:
1. A general variable, not related to a distribution. For example: postfix_package.
2. The ansible_os_family variable, related to the type of distribution. For example: httpd_package, which differs for Alpine, Archlinux, Debian, Suse and RedHat (but is the same for the docker images debian and ubuntu).
3. The ansible_distribution variable, when each distribution has differences. For example: reboot_requirements; CentOS needs yum-utils, but Fedora needs dnf-utils.
4. ansible_distribution and ansible_distribution_major_version, when there are differences per distribution release. For example firewall_packages: CentOS 6 and CentOS 7 need different packages.
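The ansible_os_family level of that priority list can be sketched as a lookup map in vars/main.yml (the variable name is illustrative; the package names are the usual Apache httpd names per family):

```yaml
# vars/main.yml (sketch): map ansible_os_family to a package name.
_httpd_package:
  Alpine: apache2
  Archlinux: apache
  Debian: apache2
  RedHat: httpd
  Suse: apache2

# A task can then simply use:
# package:
#   name: "{{ _httpd_package[ansible_os_family] }}"
```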
Here is a list of containers and their ansible_os_family.
| Container image | ansible_os_family |
|-----------------|-------------------|
| alpine | Alpine |
| archlinux/base | Archlinux |
| centos | RedHat |
| debian | Debian |
| fedora | RedHat |
| opensuse/leap | Suse |
| ubuntu | Debian |
What I wish ansible collections would be
Ansible Collections is a way of:
- Packaging Ansible Content (modules/roles/playbooks).
- Distributing Ansible Content through Ansible Galaxy.
- Reducing the size of Ansible Engine.
All modules that are now in Ansible will move to Ansible Collections.
I’m not 100% sure how Ansible Collections will work in the future, but here is a guess.
From an Ansible role
I could imagine that requirements.yml will list the roles and collections a role depends on. Something like this:
---
- src: buluma.x
type: role
- src: buluma.y
type: collection
That structure would ensure that all modules required to run the role are going to be installed.
From an Ansible playbook repository
Identical to the role setup, I could imagine a requirements.yml
that basically prepares the environment with all required dependencies, either roles or collections.
Loop dependencies
Ansible Collections can depend on other Ansible Collections.
Imagine my_collection's requirements.yml:
---
- src: buluma.y
type: collection
The Ansible Collection y could refer back to my_collection:
my_collection ---> y
^ |
| |
+-----------+
I’m not sure how that can be resolved or prevented.
How many modules do you need?
Ansible Collections are coming. They are a big change in Ansible, so the stable version will likely be a good moment to go to Ansible 3. Listening to the developers, I think we can expect Ansible 3 in the spring of 2020.
Anyway, let’s get some stats:
How many modules am I using?
That was not so difficult to estimate: 97 modules.
What ‘weird’ modules?
A bit more difficult to answer, I’ve taken two approaches:
- Take the bottom of the list of “most used modules”.
- Walk through the 97 modules and spot the odd ones.
- bigip_*: I've written a role for a workshop.
- gem: Don't know, weird.
- debug: What, get it out of there!
- include_vars: Why would that be?
- fail: Let's check that later.
- set_fact: I'm not a big fan of set_fact; most facts can be “rendered” in vars/main.yml.
How many ‘vendor’ modules?
I expect some Ansible Collections will be maintained by the vendors: Google (GCP), Microsoft (Azure), F5 (BigIP), Red Hat (yum), etc. That's why knowing this upfront is likely smart.
| Module | Times used | Potential maintainer |
|--------|------------|----------------------|
| pip | 17 | PyPi |
| apt | 16 | Canonical |
| yum | 9 | Red Hat |
| apt_key | 6 | Canonical |
| apt_repository | 5 | Canonical |
| rpm_key | 4 | Red Hat |
| zypper | 3 | SUSE |
| yum_repository | 3 | Red Hat |
| dnf | 3 | Red Hat/Fedora |
| zypper_repository | 2 | SUSE |
| zabbix_host | 2 | Zabbix |
| zabbix_group | 2 | Zabbix |
| apk | 2 | Alpine |
| tower_* | 7 (combined) | RedHat |
| redhat_subscription | 1 | RedHat |
| pacman | 1 | ArchLinux |
| bigip_* | 6 (combined) | F5 |
How often do I use what modules?
| Place | Module | Times used |
|-------|--------|------------|
| 1 | package | 138 |
| 2 | service | 137 |
| 3 | command | 73 |
| 4 | template | 64 |
| 5 | file | 62 |
| 6 | meta | 27 |
| 7 | assert | 26 |
| 8 | unarchive | 24 |
| 9 | lineinfile | 21 |
| 10 | copy | 20 |
Wow, I'm especially surprised by two modules:
- command: I'm going to review if there are modules I can use instead of command. I know very well that command should be used as a last resort, not 73 times… Painful.
- assert: Most roles use it to check that variables meet the criteria (that a variable is defined and its type is correct). Rather wait for the role spec feature.
Ansible Fest Atlanta 2019
Announcements on Ansible, AWX, Molecule, Galaxy, ansible-lint and many other products are always made at Ansible Fest.
Here is what I picked up on Ansible Fest 2019 in Atlanta, Georgia.
Ansible Collections
Ansible is full of modules; “batteries included” is a common expression. This reduces the velocity of adding modules, fixing issues with modules, or adding features to modules. Ansible Collections are there to solve this issue.
Ansible will (in a couple of releases) only be the framework, without modules or plugins. Modules will have to be installed separately.
There are a few unknowns:
- How to manage dependencies between collections and Ansible. For example, what collections work on which Ansible version.
- The documentation of Ansible is very important, but how to keep the same central point of documentation while spreading all these collections.
- How to deal with colliding module names? Imagine the file module is included in more than 1 collection; which one takes precedence?
Anyway, the big take-away: Start to learn to develop or use Ansible Collections, it’s going to be important.
Here is how to develop Ansible Collections and how to use them.
AWX
AWX is refactoring components to improve development velocity and the performance of the product itself.
- New UI, based on React and PatternFly.
- tower-cli will be replaced by awx, which exposes the available commands based on the capabilities of the AWX API. The version of awx will be the same as the AWX web/API tool.
Data analysis
There are a few applications to analyse data and give insights on development and usage of Ansible:
There are many more perspectives, have a look.
Next Ansible Fest not in Europe
Spain seems to be the largest contributor of Ansible, but next Ansible Fest will be in San Diego.
The Contributors Summit will be in Europe though.
Why “hardening” is not a role
I see many developers writing an Ansible role for hardening. Although these roles can absolutely be useful, here is why I think there is a better way.
Roles are (not always, but frequently) product centric. Think of role names like:
A role for hardening your system has the potential to cover all kinds of topics that are already covered in the product-specific roles.
Besides that, in my opinion a role should be:
- Small
- Cover one function
A good indicator of a role that's too big is having multiple task files in tasks.
So my suggestion is to not use a harden role, but rather have each role that you compose a system out of use secure defaults.
Ansible Galaxy Collections are here!
As the documentation describes:
Collections are a new way to package and distribute Ansible related content.
I write a lot of roles, roles are nice, but it’s a bit like ingredients without a recipe: A role is only a part of the whole picture.
Collections allow you to package:
- roles
- actions
- filters
- lookup plugins
- modules
- strategies
So instead of [upstreaming](https://en.wikipedia.org/wiki/Upstream_(software_development)) content to Ansible, you can publish or consume content yourself.
The whole process is documented and should not be difficult.
I’ve published my development_environment and only had to change these things:
1. Add galaxy.yml
namespace: "buluma"
name: "development_environment"
description: Install everything you need to develop Ansible roles.
version: "1.0.4"
readme: "README.md"
authors:
- "Michael Buluma"
dependencies:
license:
- "Apache-2.0"
tags:
- development
- molecule
- ara
repository: "https://github.com/buluma/ansible-development-environment"
documentation: "https://github.com/buluma/ansible-development-environment/blob/master/README.md"
homepage: "https://buluma.nl"
issues: "https://github.com/buluma/ansible-development-environment/issues"
2. Enable Travis for the repository
Go to Travis and click Sync account. Wait a minute or so and enable the repository containing your collection.
3. Save a hidden variable in Travis
Under settings for a repository you can find Environment Variables. Add one; I called it galaxy_api_key. You'll refer to this variable in .travis.yml later.
4. Add .travis.yml
---
language: python
install:
- pip install mazer
- release=$(mazer build | tail -n1 | awk '{print $NF}')
script:
- mazer publish --api-key=${galaxy_api_key} ${release}
Bonus hint: normally you don't commit roles, so you add something like roles/* to .gitignore, but in this case the roles are part of the collection. So if you have a requirements.yml, download all the roles locally using ansible-galaxy install -r roles/requirements.yml -f and include them in the commit.
Fedora 30 and above use python-3
Fedora 30 (and above) uses python 3 and starts to deprecate python 2 packages like python2-dnf.
Ansible 2.8 and above discover the python interpreter, but Ansible 2.7 and lower do not have this feature.
So for a while, you have to tell Ansible to use python 3. This can be done by setting ansible_python_interpreter somewhere. Here are a few locations you could use:
1. inventory
This is quite a good location, because you could decide to give a single node this variable:
inventory/host_vars/my_host.yml:
---
ansible_python_interpreter: /usr/bin/python3
Or you could group hosts and apply a variable to it:
inventory/hosts:
inventory/group_vars/python3.yml
---
ansible_python_interpreter: /usr/bin/python3
2. command line
You could start a playbook and set the ansible_python_interpreter once:
ansible-playbook my_playbook.yml --extra-vars "ansible_python_interpreter=/usr/bin/python3"
It’s not very persistent though.
3. playbook or role
You could save the variable in your playbook or role, but this makes re-using code more difficult; it will only work on machines with /usr/bin/python3:
---
- name: do something
hosts: all
vars:
ansible_python_interpreter: /usr/bin/python3
tasks:
- name: do some task
debug:
msg: "Yes, it works."
4. molecule
The last case I can think of is to let Molecule set ansible_python_interpreter.
molecule/default/molecule.yml:
---
# Many parameters omitted.
provisioner:
name: ansible
inventory:
group_vars:
all:
ansible_python_interpreter: /usr/bin/python3
# More parameters omitted.
Why you should use the Ansible set_fact module
So far it seems that the Ansible set_fact
module is not required very often. I found 2 cases in the roles I write:
In the awx role:
- name: pick most recent tag
set_fact:
awx_version: ""
with_items:
- ""
In the zabbix_server role:
- name: find version of zabbix-server-mysql
set_fact:
zabbix_server_version: ""
In both cases a "complex" variable structure is saved into a variable with a simpler name.
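As a hedged sketch of what such a task does (the registered variable and attribute names are made up), a nested result gets flattened into one short variable:

```yaml
# Hypothetical: package_query is a previously registered result with a nested structure
- name: find version of zabbix-server-mysql
  set_fact:
    zabbix_server_version: "{{ package_query.results[0].version }}"
```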
Variables that are constructed of other variables can be set in vars/main.yml
. For example the kernel role needs a version of the kernel in defaults/main.yml
:
And the rest can be calculated in vars/main.yml
:
kernel_unarchive_src: https://cdn.kernel.org/pub/linux/kernel/v.x/linux-.tar.xz
So sometimes set_fact
can be used to keep code simple, other (most) times vars/main.yml
can help.
For a moral compass, South Park uses Brian Boitano; my moral coding compass is Jeff Geerling, who would say something like: "If your code is complex, it's probably not good."
Different methods to include roles
There are several ways to include roles from playbooks or roles.
Classic
The classic way:
---
- name: Build a machine
hosts: all
roles:
- buluma.bootstrap
- buluma.java
- buluma.tomcat
Or a variation that allows per-role variables:
---
- name: Build a machine
hosts: all
roles:
- role: buluma.bootstrap
- role: buluma.java
  vars:
    java_version: 9
- role: buluma.tomcat
Include role
The include_role way:
---
- name: Build a machine
hosts: all
tasks:
- name: include bootstrap
include_role:
name: buluma.bootstrap
- name: include java
include_role:
name: buluma.java
- name: include tomcat
include_role:
name: buluma.tomcat
Or a with_items (since Ansible 2.3) variation:
---
- name: Build a machine
hosts: all
tasks:
- name: include role
include_role:
name: "{{ item }}"
with_items:
- buluma.bootstrap
- buluma.java
- buluma.tomcat
Sometimes it can be required to call one role from another role. I’d personally use import_role like this:
---
- name: do something
debug:
msg: "Some task"
- name: call another role
import_role:
name: role.name
If the role (role.name in this example) requires variables, you can set them in vars/main.yml
, like so:
variable_x_for_role_name: foo
variable_y_for_role_name: bar
A real-life example: my buluma.artifactory role calls the buluma.service role to add a service.
The code for the artifactory role contains:
# snippet
- name: create artifactory service
import_role:
name: buluma.service
# endsnippet
and the variables are set in vars/main.yml (https://github.com/buluma/ansible-role-artifactory/blob/master/vars/main.yml):
service_list:
- name: artifactory
description: Start script for Artifactory
start_command: "/bin/artifactory.sh start"
stop_command: "/bin/artifactory.sh stop"
type: forking
status_pattern: artifactory
How to write and maintain many Ansible roles
It’s great to have many code nuggets around to help you setup an environment rapidly. Ansible roles are perfect to describe what you want to do on systems.
As soon as you start to write more roles, you start to develop a style and way of working. Here are the things I've learned managing many roles.
Use a skeleton for starting a new role
When you start to write a new role, you can start with pre-populated code:
ansible-galaxy init --role-skeleton=ansible-role-skeleton role_name
To explain what happens:
- ansible-galaxy is a command. This may change to molecule in the future.
- init tells ansible-galaxy to initialize a new role.
- --role-skeleton=ansible-role-skeleton refers to a skeleton ansible role; I use this repository.
- role_name is the name of your new role. I use short names here, like nginx or postfix.
Use ansible-lint for quick feedback
Andrew has written a tool including many rules that help you write readable and consistent code.
There are times when I don't agree with the rules, but the feedback is quickly processed.
There are also times where I initially think rules are useless, but after a while I’m convinced about the intent and change my code.
You can also describe your preferences and use ansible-lint to verify your code. Great for teams that need to agree on a style.
Use molecule on Travis to test
In my opinion the most important part of writing code is testing. I spend a lot of time writing and executing tests. It helps you prove that certain scenarios work as intended.
Travis can help test your software. A typical commit takes some 30 to 45 minutes to test, but after that I know:
- It works on the platforms I want to support.
- When it works, the software is released to Galaxy.
- Pull requests are automatically tested.
It makes me less afraid of committing.
When I write some new functionality, I typically need a few iterations to make it work. Using GitHub releases helps me to capture (and release) a working version of a role.
You can play as much as you want in between releases, but when a release is done, the role should work.
Go forth and develop!
You can setup a machine yourself for developing Ansible roles. I’ve prepared a repository that may help.
The playbook in that repository looks something like this:
---
- name: setup an ansible development environment
hosts: all
become: yes
gather_facts: no
roles:
- buluma.bootstrap
- buluma.update
- buluma.fail2ban
- buluma.openssh
- buluma.digitalocean_agent
- buluma.common
- buluma.users
- buluma.postfix
- buluma.docker
- buluma.investigate
- buluma.ansible
- buluma.ansible_lint
- buluma.buildtools
- buluma.molecule
- buluma.ara
- buluma.ruby
- buluma.travis
tasks:
- name: copy private key
copy:
src: id_rsa
dest: /home/robertdb/.ssh/id_rsa
mode: "0400"
owner: robertdb
group: robertdb
- name: copy git configuration
copy:
src: gitconfig
dest: /home/robertdb/.gitconfig
- name: create repository_destination
file:
path: "{{ repository_destination }}"
state: directory
owner: robertdb
group: robertdb
- name: clone all roles
git:
repo: "/.git"
dest: "/"
accept_hostkey: yes
key_file: /home/robertdb/.ssh/id_rsa
with_items: ""
become_user: robertdb
When is a role a role
Sometimes it’s not easy to see when Ansible code should be captured in an Ansible role, or when tasks can be used.
Here are some guidelines that help me decide when to choose for writing an Ansible role:
Don’t repeat yourself
When you start to see that you're repeating blocks of code, it's probably time to move those tasks into an Ansible role.
Repeating yourself may:
- Introduce more errors
- Be more difficult to maintain
Keep it simple
Over time Ansible roles tend to get more complex. Jeff Geerling tries to keep Ansible roles under 100 lines. That can be a challenge, but I agree with Jeff.
Whenever I open up somebody else's Ansible role and the code keeps on scrolling, I tend to get demotivated:
- Where can you find the error/issue/bug?
- How can this be maintained?
- There is probably no easy way to test this.
- The code does many things and misses focus.
Cleanup your playbook
Another reason to put code in Ansible roles, is to keep your playbook easy to read. A long list of tasks is harder to read than a list of roles.
Take a look at this example:
- name: build the backend server
hosts: backend
become: yes
gather_facts: no
roles:
- buluma.bootstrap
- buluma.update
- buluma.common
- buluma.python_pip
- buluma.php
- buluma.mysql
- buluma.phpmyadmin
This code is simple to read; anybody could get an understanding of what it does.
Some roles can have variables to change the installation; imagine this set of variables:
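A hypothetical example of such variables (names and values are made up for illustration):

```yaml
# Hypothetical defaults/main.yml: sane defaults a user may override
httpd_port: 80
httpd_document_root: /var/www/html
```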
The role can assert variables, for example:
- name: test input
assert:
that:
- httpd_port <= 65535
- httpd_port >= 1
Check yourself
To verify that you’ve made the right decision:
Could you publish this role?
That means you did not put data in the role, except sane defaults.
Would anybody else be helped with your role?
That means you thought about the interface (defaults/main.yml
).
Is there a simple way to test your role?
That means the role is focused and can do just a few things.
Was it easy to think of the title?
That means you knew what you were building.
Conclusion
Hope this helps you decide when a role is a role.
Testing CVE 2018-19788 with Ansible
So a very simple exploit on polkit has been found. There is no solution so far.
To test if your system is vulnerable, you can run this Ansible role.
A simple playbook that includes a few roles:
---
- name: test cve 2018 19788
hosts: all
gather_facts: no
become: yes
roles:
- buluma.bootstrap
- buluma.update
- buluma.cve_2018_19788
And a piece of altered-for-readability code from the role:
- name: create a user
user:
name: cve_2018_19788
uid: 2147483659
- name: execute a systemctl command as root
service:
name: chronyd
state: started
In my tests these were the results (snipped; only the interesting part is kept):
TASK [ansible-role-cve_2018_19788 : test if user can manage service] ***********
ok: [cve-2018-19788-debian] => {
"changed": false,
"msg": "All assertions passed"
}
fatal: [cve-2018-19788-ubuntu-16]: FAILED! => {
"assertion": "not execute_user.changed",
"changed": false,
"evaluated_to": false,
"msg": "users can manage services"
}
...ignoring
fatal: [cve-2018-19788-ubuntu-18]: FAILED! => {
"assertion": "not execute_user.changed",
"changed": false,
"evaluated_to": false,
"msg": "users can manage services"
}
...ignoring
fatal: [cve-2018-19788-ubuntu-17]: FAILED! => {
"assertion": "not execute_user.changed",
"changed": false,
"evaluated_to": false,
"msg": "users can manage services"
}
...ignoring
fatal: [cve-2018-19788-fedora]: FAILED! => {
"assertion": "not execute_user.changed",
"changed": false,
"evaluated_to": false,
"msg": "users can manage services"
}
...ignoring
fatal: [cve-2018-19788-centos-7]: FAILED! => {
"assertion": "not execute_user.changed",
"changed": false,
"evaluated_to": false,
"msg": "users can manage services"
}
...ignoring
ok: [cve-2018-19788-centos-6] => {
"changed": false,
"msg": "All assertions passed"
}
So for now these distributions seem vulnerable, even after an update:
- Ubuntu 16
- Ubuntu 17
- Ubuntu 18
- Fedora 28
- Fedora 29
- CentOS 7
Ansible on Fedora 30
Fedora 30 (currently under development as rawhide) does not have python2-dnf anymore.
The Ansible module dnf tries to install python2-dnf if it is running in a python2 environment. It took me quite some time to figure out why this error appeared:
fatal: [bootstrap-fedora-rawhide]: FAILED! => {"attempts": 10, "changed": true, "msg": "non-zero return code", "rc": 1, "stderr": "Error: Unable to find a match\n", "stderr_lines": ["Error: Unable to find a match"], "stdout": "Last metadata expiration check: 0:01:33 ago on Thu Nov 29 20:16:32 2018.\nNo match for argument: python2-dnf\n", "stdout_lines": ["Last metadata expiration check: 0:01:33 ago on Thu Nov 29 20:16:32 2018.", "No match for argument: python2-dnf"]}
(I was not trying to install python2-dnf, hence the confusion.)
I've tried these options to work around the problem:
- Use alternatives to set /usr/bin/python to /usr/bin/python3. Does not work, the Ansible module dnf will still try to install python2-dnf.
- Set ansible_python_interpreter for Fedora 30 hosts. Does not work: my bootstrap role does not gather facts, so it does not know about ansible_distribution (Fedora) or ansible_distribution_major_version (30).
So far the only reasonable option is to set ansible_python_interpreter
as documented by Ansible.
provisioner:
name: ansible
inventory:
group_vars:
all:
ansible_python_interpreter: /usr/bin/python3
This means all roles that use distributions that:
- use dnf
- don’t have python2-dnf
will need to be modified… Quite a change.
2 December 2018 update: I’ve created pull request 49202 to fix issue 49362.
TL;DR On Fedora 30 (and higher) you have to set ansible_python_interpreter
to /usr/bin/python3
.
Ansible Molecule testing on EC2
Molecule is great to test Ansible roles, but testing locally has its limitations:
- Docker - Not everything is possible in Docker, like starting services, rebooting, or working with block devices.
- Vagrant - Nearly everything is possible, but it's resource intensive, making testing slow.
I use my bus-ride time to develop Ansible Roles and the internet connection is limited, which means a lot of waiting. Using AWS EC2 would solve a lot of problems for me.
Here is how to add an EC2 scenario to an existing role.
Save AWS credentials
Edit ~/.aws/credentials using information downloaded from the AWS Console.
[default]
aws_access_key_id=ABC123
aws_secret_access_key=ABC123
On the node where you initiate the tests, a few extra pip modules are required (typically boto and boto3, used by the Ansible EC2 modules).
Add a scenario
If you already have a role and want to add a single scenario:
cd ansible-role-your-role
molecule init scenario --driver-name ec2 --role-name ansible-role-your-role --scenario-name ec2
Start testing
And simply start testing in a certain region.
export EC2_REGION=eu-central-1
molecule test --scenario-name ec2
The molecule.yml should look something like this:
---
dependency:
name: galaxy
driver:
name: ec2
lint:
name: yamllint
platforms:
- name: rhel-7
image: ami-c86c3f23
instance_type: t2.micro
vpc_subnet_id: subnet-0e688067
- name: sles-15
image: ami-0a1886cf45f944eb1
instance_type: t2.micro
vpc_subnet_id: subnet-0e688067
- name: amazon-linux-2
image: ami-02ea8f348fa28c108
instance_type: t2.micro
vpc_subnet_id: subnet-0e688067
provisioner:
name: ansible
lint:
name: ansible-lint
scenario:
name: ec2
Weirdness
It feels as if the ec2 driver has had a little less attention than, for example, the vagrant or docker driver. Here are some strange things:
- The region needs to be set using an environment variable, the credentials from a file. This may be my mistake, but it's a little strange; it would feel more logical to add region: to the platform section.
- The vpc_subnet_id should be found by the user and put into molecule.yml.
Molecule and ARA
To test playbooks, molecule is really great. And since AnsibleFest 2018 (Austin, Texas) made it clear that Molecule will be a part of Ansible, I guess it's safe to say that retr0h's tool is here to stay.
When testing, it's even nicer to have great reports. That's where ARA comes in. ARA collects job output as a callback plugin, saves it and is able to display it.
Here is how to set it up.
Install molecule (pip install molecule), install ARA (pip install ara) and start the ARA web server (at the time, something like ara-manage runserver).
Edit molecule.yml, under provisioner:
provisioner:
name: ansible
config_options:
defaults:
callback_plugins: /usr/lib/python2.7/site-packages/ara/plugins/callbacks
Now point your browser to http://localhost:9191/ and run a molecule test.
[204] Lines should be no longer than 120 chars
It seems Galaxy is going to use galaxy-lint-rules to star roles.
One of the controls tests the length of the lines. Here are a few ways to pass those rules.
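The check itself is easy to reason about. Here is a minimal Python sketch of what such a max-line-length rule does (the function name is made up; the real rule lives in galaxy-lint-rules):

```python
def find_long_lines(text, limit=120):
    """Return (line_number, length) for every line over the limit."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]

# A two-line sample: the second line is 130 characters long
sample = "short line\n" + "x" * 130
print(find_long_lines(sample))  # → [(2, 130)]
```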
Spread over lines
In YAML you can use multi-line strings to spread long content over several lines.
Without new lines
The > character replaces newlines with spaces.
- name: demonstrate something
debug:
msg: >
This will just
be a single long
line.
With new lines
The | character keeps newlines.
- name: demonstrate something
debug:
msg: |
The following lines
will be spread over
multiple lines.
Move long lines to vars
Sometimes variables can get very long. You can save a longer expression in a shorter variable.
For example, this task in main.yml is too long:
- name: unarchive zabbix schema
command: gunzip /usr/share/doc/zabbix-server-{{ zabbix_server_type }}-{{ zabbix_version_major }}.{{ zabbix_version_minor }}/create.sql.gz
Copy-paste that command to vars/main.yml:
gunzip_command: "gunzip /usr/share/doc/zabbix-server-{{ zabbix_server_type }}-{{ zabbix_version_major }}.{{ zabbix_version_minor }}/create.sql.gz"
And change main.yml to simply:
- name: unarchive zabbix schema
command: "{{ gunzip_command }}"
Conclusion
Yes it’s annoying to have a limitation like this, but it does make the code more readable and it’s not difficult to change your roles to get 5 stars.
Ansible roles for clusters
Ansible can be used to configure clusters. It’s actually quite easy!
Typically a cluster has some master/primary/active node where certain tasks need to run, while other tasks need to run on the rest of the nodes.
Ansible can use run_once: yes
on a task, which “automatically” selects a primary node. Take this example:
inventory:
tasks/main.yml:
- name: do something on all nodes
package:
name: screen
state: present
- name: select the master/primary/active node
set_fact:
master: "{{ inventory_hostname }}"
run_once: yes
- name: do something to the master only
command: id
when:
- inventory_hostname == master
- name: do something on the rest of the nodes
command: id
when:
- inventory_hostname != master
It’s a simple and understandable solution. You can even tell Ansible that you would like to pin a master:
- name: select the master/primary/active node
set_fact:
master: "{{ inventory_hostname }}"
run_once: yes
when:
- master is not defined
In the example above, a user can choose to set "master" somewhere instead of relying on the "random" selection.
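For example, pinning could be done in the inventory (the file name and value are hypothetical):

```yaml
# Hypothetical inventory/group_vars/all.yml: pin the master explicitly
master: node-1
```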
Hope it helps you!
Ansible Galaxy Lint
Galaxy currently is a dumping place for Ansible roles; anybody can submit a role of any quality there and it's kept indefinitely.
For example Nginx is listed 1122 times. Happily Jeff Geerling's role shows up near the top of the list, probably because it has the most downloads.
The Galaxy team has decided that checking for quality is one way to improve search results. It looks like roles will be scored on a few criteria:
The rules are stored in galaxy-lint-rules. So far Andrew, House and Robert have contributed; feel free to propose new rules or improvements!
You can prepare your roles:
cd directory/to/save/the/rules
git clone https://github.com/ansible/galaxy-lint-rules.git
cd directory/to/your/role
ansible-lint -r directory/to/save/the/rules/galaxy-lint-rules/rules .
I’ve removed quite a few errors by using these rules:
- Missing spaces after {{ or before }}.
- Comparing booleans using == yes.
- meta/main.yml mistakes
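A hedged before/after sketch for two of those findings (variable names are made up):

```yaml
# Before: msg was "{{my_var}}" (no spaces) and the condition was "my_flag == yes"
- name: show a variable
  debug:
    msg: "{{ my_var }}"
  when: my_flag | bool
```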
You can peek at how your roles are scored on the development instance.
Ansible 2.7
As announced, Ansible 2.7 is out. The changes look good; I'm testing my bootstrap role against it.
In 2.7 (actually since 2.3) all package modules don’t need with_items:
or loop:
anymore. This makes for simpler code.
- name: customize machine
hosts: all
vars:
packages:
- bash
- screen
- lsof
tasks:
- name: install packages
package:
name: "{{ packages }}"
state: present
Wow, that’s simpler so better.
A reboot module has been introduced. Rebooting in Ansible is not easy, so this could make life much simpler.