14 May

Manage the Docker Swarm nodes – new Ansible 2.8 modules developed by me!

Ansible 2..8 introduces huge update for Docker Swarm modules

There are several good things about open source projects. But there is one which makes the good projects grow quickly – if you miss a feature and you have some programming skills you can write your extension. Then you can propose the change to get merged into the official repository. This is how my journey as a community Ansible developer started.

Lately, I worked more and more with Docker Swarm clusters and wanted to automate some tasks. Unfortunately, with the Ansible 2.7 there were only three modules available – thedocker_swarm, docker_swarm_serviceanddocker_secret. The tasks I wanted to automate required features as reading the Swarm configuration (this was partially covered by the docker_swarm), node status and change the node configuration. I had three options – use the shell module and execute the CLI command locally on the remote host, wait for someone to provide the module or write the module by myself.

I may not be a professional developer, but I had a great teacher when I started coding in the last class of primary school and some experience in programming. You can’t really be an engineer if you cannot write scripts or simpler apps. I think it is not a surprise I decided to test myself in professional project 😛

In the previous post, I presented the docker_swarm_facts module. I am a co-author there. Let me now show you the Swarm Node related modules fully developed and now maintained by me. All of them will be available in Ansible 2.8 release in next few weeks, but you can test it using the devel branch code on GitHub!

Tell me about my Docker Swarm node…

Before you start changing the configuration of any device, software etc. you really should get and store some information about its current state. To read the essentials about the Docker Swarm node you need to use the docker_node_facts module. The module inherits all default docker modules options like the TLS configuration. You can execute it locally using the Docker socket or remotely using the Docker API. Remember that Docker by default use an only local socket so you need to reconfigure your daemon to support remote management. And one of the most important requirements – you need to run the module on Docker Swarm Manager node.

    - name: Get the swarm node facts
      docker_node_facts:
        self: yes
        docker_host: "tcp://172.10.10.10:2376"
        tls: true

By default, Ansible will open an SSH connection to the remote host and execute the module there connecting to local Docker socket. The host is usually an inventory entry. Using the API requires providing the URL of the management interface as the docker_host parameter. You can execute the module still on the remote host (the same as in docker_host or different), but usually, in such case, the playbook is set with the parameter connection: local and run on the localhost. Understanding this difference is crucial for running any Swarm module and avoid problems.

You also need to tell the module information on which Swarm node you want the manager to return. The default behavior is returning facts about all Swarm nodes. If you know the names of the nodes you are interested in you can provide them as a list in the name parameter. Setting the self: yes option will tell the module you want facts only about the node module communicates with.

Module output matches the output of the docker node inspect CLI command. The nodes key in the returned structure contains an array of dictionaries where each element matches the CLI command. Its structure may vary depending on the version of Docker daemon and Docker API – you need to check the documentation for details.

Changing the node configuration

The docker_node module allows changing some parameters of the Swarm node configuration like the node role or the availability. The first parameter defines if the node is a manager or a worker. Initially, you must define the role when you add a node to the cluster. The availability defines how the swarm cluster will use the node for work distribution. Three allowed states are active (accepting new containers), pause (operating the existing containers but not accepting new ones) or drain (not accepting new containers and existing will restart on another node). The last state is useful during upgrades or when you want to remove a node from the cluster

The docker_node module also allows changing the labels assigned to the Swarm node. Labels are usefull for management and automated work distribution. Labels are the key-value pairs. The default module behavior is merging the labels provided in playbook with those already assigned to node and update the value of a label if it is already defined.

- name: Merge node labels and new labels
  docker_node:
    hostname: mynode
    labels:
      key: value

To override default behavior and replace the labels with new ones you must set labels_state: replace. To remove all assigned labels you also have to use this option, but without providing any new labels.

- name: Remove all labels assigned to node
  docker_node:
    hostname: mynode
    labels_state: replace

If you want to remove specified labels you need to provide the list of label keys in parameter labels_to_remove.

11 Mar

Using Docker Swarm? You gonna love Ansible 2.8!

Ansible 2..8 introduces huge update for Docker Swarm modules

The Ansible is kind of an icon of automation platform. Owned by RedHat but available as a free product on the public license. Developed both by RedHat employees and the community. Docker itself is an icon of containerization. If you use containers you know that automation is the key to simplify management of dockerized infrastructure. In Ansible 2.x many modules covering the docker operations has been introduced, but Docker Swarm was not really covered. However, it is going to change soon! There is a huge update coming with Ansible 2.8 release!

Lately, I started missing some features in Ansible that will allow me to perform some operations on Docker Swarm clusters. And I try to avoid using the command or shell modules as much as possible – running CLI commands on a remote host is like asking for troubles. I decided to fill this gap by myself and as a result, I can say I am now the Ansible community developer, author, and maintainer of a few modules – docker_swarm_facts, docker_node_facts, docker_host_facts, docker_node; and co-author of docker_swarm and the ansible.docker.swarm library.

In next posts, I wanna show you how those modules work. However, if you are looking forward using them I strongly advise you to give them a try now and report any bugs you find so we can fix them before Ansible 2.8 release. You can find them in the devel branch of Ansible repository.

Read More
26 Nov

Conditional parameter value in Ansible playbooks

Juniper automation with Ansible

Ansible playbook is just a list of tasks executed one by one in the order you define them in the playbook code. Using the conditional statements you may skip execution of some tasks, but there is one general rule that should apply to all playbooks you create – keep the number of tasks at the minimum.

Execution of each task takes time, and until you provide additional optimization, Ansible will establish the connection, perform an authentication process and then terminate the connection with a remote device. The longer this process takes, the more time you waste when you execute the playbook. It is a good practice to consolidate the tasks – if you need to perform multiple commands on a remote network device, you should define them as a parameter of one task instead of running them in separate tasks. In rare cases it may lead to unexpected problems like the one I described in my post Automated scripts can send commands faster than RP can process.

Sometimes you need to vary the value of an option provided to the task module. If you need to get an output of a CLI command on Juniper device, you will use the module junos_command which is part of standard Ansible library. Using the display parameter, you can specify if command output will be encoded in XML or JSON format. JSON is a more flexible format, but it is not a supported output format on older JunOS version. If you try to request it but the firmware does not support it your task, and the whole playbook will fail. Most of the developers will create two separate task and the conditional test to check version as a task with the when option. However, let me show you the other, not that well known, way.

Read More

05 Nov

Ansible can’t read some facts from Juniper devices

Juniper automation with Ansible

It is really amazing how fast Ansible is developed lately. Stable versions are released more often and contain more changes required by IT professionals. Many of them fill the gaps between two worlds – the developers and operations engineers. Unfortunately, some modules are not catching up as fast as they should which causes problems in developing simple tasks. I experienced such when I was working on playbook example required for my latest press articles for ‘IT Professional’ magazine. The default Ansible junos_facts module couldn’t correctly read JunOS version on some devices. Usually on devices running the older firmware release. This can be a real problem if some tasks execution depends on the firmware version on the router or switch.

Besides the official modules and lots of roles available on Ansible Galaxy repository many vendors developed their own modules and let them use for free. In many cases, it should be considered a better, more secure approach as long as the vendor repository is still maintained. In my situation it was the easiest workaround of my problem.

Read More
18 Jun

Automated scripts can send commands faster than RP can process

Juniper automation with Ansible

When I was writing one Ansible playbook I faced an interesting situation. We all keep forgetting that automated tasks are executed faster than from CLI. Ane we do not take it into consideration when we write playbooks. Time is only a problem when playbook execution takes to much time.

On Juniper SRX you can enable VPN debugging using the traceoptions feature on IKE and IPSec processes. By default, it stores logs of all configured VPNs on the device in either /var/log/kmd file or one specified in traceoptions configuration. JunOS 11.4R3 brings additional enhancement to limit debugging to single VPN tunnel specified by local and peer IPs. You can turn it on by request security ike debug-enable command. This is very handy because in most cases you want to troubleshoot just the not operational connection.

There is a slight difference between the output of those commands in the log file. If you use the traceoptions you will see pretty much standard Juniper log output

[Mar 25 13:12:12]IPSec SA done callback called for sa-cfg VPN-V201 local:10.0.201.3, remote:10.0.201.4 IKEv1 with status Timed out

If you decide to perform troubleshooting using the second method you will get output like that

[Mar 25 13:20:12][10.0.201.3 <-> 10.0.201.4] IPSec SA done callback called for sa-cfg VPN-V201 local:10.0.201.3, remote:10.0.201.4 IKEv1 with status Timed out

The log now contains both peers IP addresses. It is really handy for my playbook. Because in one of the tasks I need to get specific information from the log output I can easily identify interesting lines using the peer IP addresses.

The Playbook

There are two tasks in my playbook divided by 60 seconds pause. In the first task, I send set of three commands to the router using the juniper_junos_command module here but it works the same in the one available in Ansible. I send three commands to save the time, and also provide the variables as a parameter – I gather them in previous tasks.

    - name: Prepare and start VPN debugging on remote device
      juniper_junos_command:
        commands:
          - "request security ike debug-disable"
          - "clear log VPN"
          - "request security ike debug-enable local {{ sa_tunnel_local_gateway }} remote {{ sa_tunnel_remote_gateway }}"
        format: xml

Then after one minute pause required to generate the logs I run another task where I simply read the log file via CLI command.

- name: Collect VPN logs
  juniper_junos_command:
    commands:
      - "show log VPN"
    format: xml
  register: result_VPN_log_xml

The problem

If you run those commands in this order from CLI you will get logs from troubleshooting the single VPN specified by IP parameters. But if you execute them from Ansible playbook you would not get the peer-specific logs output. It looks like the request command is ignored by the router. If you enable commands logging on the router you will see those from netconf as well. You can also check the debugging status directly from the router CLI. If no peer-specific debugging is running you will get following output

show security ike debug-status
Enabled
flag: all
level: 5

When the debug is running you will get information about its level and the peer’s addresses

show security ike debug-status
Enabled
flag: all
level: 7
Local IP: 10.0.201.3, Remote IP: 10.0.201.4

Because the delivery of command itself was not a problem then one of the workarounds could be a playbook, where I send only one command in each task. This workaround is working but it is not optimal. Remember that connection to the router is opened for each task and closed when the task completes. It takes time and the more command you want to send the longer it takes to finish the playbook.

Don’t send commands to fast

I asked JTAC for help in this case. It happened not to be standard customer problem, it was verified in their lab and at some point, the Engineering team was asked for support. I received the answer I like the best – “it is not a bug, it is a feature”!

Because there is no output when you execute any of the three commands from the first task the Ansible module sends the next command almost without the gap after the previous one. As Junos does not acknowledge that it has finished the first request command, the second comes in and gets ignored as the first one is not finished yet. Remember, you can never issue commands via CLI as fast as you can send them from the script.

I still want to send multiple commands in one task to save time, so here is other way to mittigate the problem

    - name: Prepare and start VPN debugging on remote device
      juniper_junos_command:
        commands:
          - "request security ike debug-disable"
          - "clear log VPN"
          - "show system uptime"
          - "request security ike debug-enable local {{ sa_tunnel_local_gateway }} remote {{ sa_tunnel_remote_gateway }}"
        format: xml

To slow down the task a little I put additional command before the ignored request command. It does not take much time to execute it and it generates the output that I simply ignore.

There is no documentation of how long the gap between commands should be but it would differ depending on platform.

 

06 Jun

Dynamic inventory from GIT on AWX

Blog post title - automation

In many deployments, people do not run the Ansible playbooks directly via command line. In the long term, it is not flexible and cannot provide proper permission granulation as well as trusted code control. So people use Ansible or its free version – AWX. Every playbook, no matter how you run it, needs an inventory definition. Sometimes you use static files with a list of hosts, another time you use the script that dynamically provides the inventory for the playbook. We call it then a dynamic inventory.

If you keep your Ansible project with all scripts and playbooks on GIT repository, you can import it to AWX. You can also maintain dynamic inventory scripts as a part of your project and use them to build AWX inventory.

Read More

13 Apr

Use Ansible to check SNMP on remote device

Best way to learn Ansible and the whole idea of automation is to start from the small playbooks and then grow big. If you first automate simple tasks, even those that may be easier and quicker to perform from the command line, you will learn how Ansible is working. Let’s say, we want to test if SNMP is responding on the remote host (we name it HostA). We will use SNMPv3 and authPriv security model. And of course, we want to write the Ansible playbook and run it on server HostNOC.

Read More

23 Nov

Dynamic VIRL inventory for Ansible playbooks

Ansible is one of the powerful tools providing us an automation of recurring tasks. In the current world, it is impossible to manage infrastructure manually efficiently. Many people still do this but the world has already changed and we need to progress otherwise our business will be cost ineffective. You can provide static inventory – list of the devices where you want to execute the playbook. But in dynamic environments, such as Cisco VIRL simulations you don’t want to edit inventory file manually. That is why I use Python script that will generate Dynamic VIRL inventory for Ansible playbook for me.

Read More