29 Jun

Using Terraform to manage Azure vWAN

A few weeks ago I delivered a webinar about Azure Virtual WAN (vWAN) for the Polish audience. The Azure vWAN service is nice and worth a post by itself, but I wanted to share something different: the way you can automate the deployment of Azure vWAN using Terraform. I try to automate all the things I do in the public cloud, as it is quite rare that I perform such a configuration only once. So my natural approach to creating the webinar lab was to build it in the IaC model. It worked nicely – since the webinar, I have used this code three times already.

If you want to test vWAN yourself, or you are interested in how to automate vWAN management using Terraform (my favorite tool lately), check the Szkola DevNet GitHub (my DevNet education project for the Polish-speaking audience), where I put the complete Terraform script. Using it, you can create a simple vWAN topology with two hubs, a VPN connection to another VNet, some VNet peering and, of course, routing.

https://github.com/SzkolaDevNet/Terraform-Azure-vWAN
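
To give you an idea of what the code looks like, here is a minimal sketch of the vWAN core in Terraform – the resource names, location and address prefix are placeholders for illustration, not the actual repo code:

    resource "azurerm_resource_group" "lab" {
      name     = "vwan-lab"
      location = "West Europe"
    }

    resource "azurerm_virtual_wan" "lab" {
      name                = "lab-vwan"
      resource_group_name = azurerm_resource_group.lab.name
      location            = azurerm_resource_group.lab.location
    }

    # Each hub gets its own address space; the full topology uses two hubs
    resource "azurerm_virtual_hub" "hub1" {
      name                = "lab-hub-1"
      resource_group_name = azurerm_resource_group.lab.name
      location            = azurerm_resource_group.lab.location
      virtual_wan_id      = azurerm_virtual_wan.lab.id
      address_prefix      = "10.100.0.0/24"
    }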

14 May

Manage the Docker Swarm nodes – new Ansible 2.8 modules developed by me!

Ansible 2.8 introduces a huge update for Docker Swarm modules

There are several good things about open source projects, but there is one which makes the good projects grow quickly – if you are missing a feature and you have some programming skills, you can write your own extension. Then you can propose the change to get it merged into the official repository. This is how my journey as a community Ansible developer started.

Lately, I have worked more and more with Docker Swarm clusters and wanted to automate some tasks. Unfortunately, in Ansible 2.7 there were only three modules available – docker_swarm, docker_swarm_service and docker_secret. The tasks I wanted to automate required features such as reading the Swarm configuration (partially covered by docker_swarm), reading the node status and changing the node configuration. I had three options – use the shell module and execute the CLI commands locally on the remote host, wait for someone to provide the modules, or write the modules myself.

I may not be a professional developer, but I had a great teacher when I started coding in the last year of primary school, and I have some experience in programming. You can’t really be an engineer if you cannot write scripts or simple apps. I think it is no surprise that I decided to test myself in a professional project 😛

In the previous post, I presented the docker_swarm_facts module, which I co-authored. Let me now show you the Swarm node related modules fully developed and now maintained by me. All of them will be available in the Ansible 2.8 release in the next few weeks, but you can test them already using the devel branch code on GitHub!

Tell me about my Docker Swarm node…

Before you start changing the configuration of any device, software etc., you really should get and store some information about its current state. To read the essentials about a Docker Swarm node, you use the docker_node_facts module. The module inherits all the default Docker module options, like the TLS configuration. You can execute it locally using the Docker socket or remotely using the Docker API. Remember that Docker by default uses only the local socket, so you need to reconfigure your daemon to support remote management. And one of the most important requirements – you need to run the module against a Docker Swarm manager node.

    - name: Get the swarm node facts
      docker_node_facts:
        self: yes
        docker_host: "tcp://172.10.10.10:2376"
        tls: true

By default, Ansible will open an SSH connection to the remote host and execute the module there, connecting to the local Docker socket; the host is usually an inventory entry. Using the API requires providing the URL of the management interface as the docker_host parameter. You can still execute the module on a remote host (the same as in docker_host or a different one), but usually, in such a case, the play is set with connection: local and runs on the localhost. Understanding this difference is crucial for running any Swarm module and avoiding problems.
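
A minimal sketch of that API-driven pattern at the play level, reusing the example above (the address is a placeholder):

    - hosts: localhost
      connection: local
      tasks:
        - name: Get the swarm node facts over the Docker API
          docker_node_facts:
            self: yes
            docker_host: "tcp://172.10.10.10:2376"
            tls: true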

You also need to tell the module which Swarm nodes’ information you want the manager to return. The default behavior is to return facts about all Swarm nodes. If you know the names of the nodes you are interested in, you can provide them as a list in the name parameter. Setting the self: yes option tells the module you want facts only about the node the module communicates with.
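
For example, querying selected nodes by name (the node names are placeholders):

    - name: Get facts about selected swarm nodes
      docker_node_facts:
        name:
          - swarm-worker-1
          - swarm-worker-2
      register: nodes_result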

The module output matches the output of the docker node inspect CLI command. The nodes key in the returned structure contains an array of dictionaries, where each element matches the CLI command output for a single node. Its structure may vary depending on the version of the Docker daemon and Docker API – you need to check the documentation for details.

Changing the node configuration

The docker_node module allows changing some parameters of the Swarm node configuration, like the node role or the availability. The first parameter defines whether the node is a manager or a worker; you initially define the role when you add a node to the cluster. The availability defines how the Swarm cluster will use the node for work distribution. The three allowed states are active (accepting new containers), pause (operating the existing containers but not accepting new ones) and drain (not accepting new containers; the existing ones will be restarted on another node). The last state is useful during upgrades or when you want to remove a node from the cluster.
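
For example, draining a node before maintenance (the hostname is a placeholder):

- name: Drain a node before maintenance
  docker_node:
    hostname: mynode
    availability: drain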

The docker_node module also allows changing the labels assigned to the Swarm node. Labels are useful for management and automated work distribution. Labels are key-value pairs. The default module behavior is to merge the labels provided in the playbook with those already assigned to the node, updating the value of a label if it is already defined.

- name: Merge node labels and new labels
  docker_node:
    hostname: mynode
    labels:
      key: value

To override the default behavior and replace the labels with new ones, you must set labels_state: replace. To remove all assigned labels, you also use this option, but without providing any new labels.

- name: Remove all labels assigned to node
  docker_node:
    hostname: mynode
    labels_state: replace

If you want to remove only specific labels, provide the list of label keys in the labels_to_remove parameter.
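
For example, removing a single label by its key:

- name: Remove selected labels from node
  docker_node:
    hostname: mynode
    labels_to_remove:
      - key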

26 Nov

Conditional parameter value in Ansible playbooks

Juniper automation with Ansible

An Ansible playbook is just a list of tasks executed one by one, in the order you define them in the playbook code. Using conditional statements you may skip the execution of some tasks, but there is one general rule that should apply to all playbooks you create – keep the number of tasks to a minimum.

Execution of each task takes time, and unless you provide additional optimization, Ansible will establish a connection, perform the authentication process and then terminate the connection with the remote device for every task. The longer this process takes, the more time you waste when you execute the playbook. It is a good practice to consolidate tasks – if you need to run multiple commands on a remote network device, you should define them as a parameter of one task instead of running them in separate tasks. In rare cases this may lead to unexpected problems, like the one I described in my post Automated scripts can send commands faster than RP can process.

Sometimes you need to vary the value of an option provided to a task module. If you need to get the output of a CLI command on a Juniper device, you will use the junos_command module, which is part of the standard Ansible library. Using the display parameter, you can specify whether the command output will be encoded in XML or JSON format. JSON is the more flexible format, but it is not a supported output format on older JunOS versions. If you request it on firmware that does not support it, the task, and the whole playbook, will fail. Most developers will create two separate tasks and check the version with a conditional when option – see the sketch below. However, let me show you another, not that well known, way.
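
For reference, a sketch of that common two-task pattern – the version boundary is an assumption for illustration, and ansible_net_version comes from the Juniper facts gathered earlier in the play:

    - name: Get command output in JSON on newer JunOS
      junos_command:
        commands:
          - show interfaces terse
        display: json
      when: ansible_net_version is version('14.2', '>=')

    - name: Fall back to XML on older JunOS
      junos_command:
        commands:
          - show interfaces terse
        display: xml
      when: ansible_net_version is version('14.2', '<')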

Read More

14 Nov

Should vendors ask for a license to enable the API?

My friend let me use one of his older, unused servers in his data center, so I have a place to run some virtual machines for my automation projects. As you may expect, there is no SLA for this server, no redundancy and limited storage. For me, it is still better to have a lab running 24/7 there than to use my desktop PC or pay for resources in the public cloud. The server is running the ESXi 6.0 hypervisor and has a free license installed.

The latest release of new VMware modules for Ansible was a trigger to develop a playbook that would let me back up a virtual machine from the datastore on the remote server to my home storage. A quite simple and straightforward idea led to some unexpected problems and questions – where should the limit of features available in free licenses be? Should automation, or at least some automated tasks, be blocked in free editions?

Read More

24 Sep

If you build containers Alpine Linux is your friend

This post is related to Docker and automation

Every container image must start from a parent image or a base image (scratch). The parent image is the image you base your image on. The base image is like a completely empty container you need to fill with content. In most cases, you will use another image as a parent, and you want it to be as minimal as possible. Alpine Linux is your friend – remember this name and use it as much as possible.
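
A minimal sketch of a Dockerfile using Alpine as the parent image (the tag and package are just examples):

    FROM alpine:3.9

    # apk is Alpine's package manager; --no-cache keeps the image small
    RUN apk add --no-cache python3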

Read More
28 Aug

Let Jenkins build the Jenkins image

Jenkins and Docker - flexible environment

A properly designed and implemented automation system requires its own infrastructure. You need to place components like a code and configuration repository, agents responsible for executing the automated tests, acceptance test software and a tool to define the automation processes. A leader in the last category is Jenkins. You can use it to automate the building, testing and deployment processes. Many of Jenkins’ features are available as plugins. In my previous post, I recommended running the infrastructure components as Docker containers. That includes Jenkins as well. You can use the official images available on Docker Hub, but very soon you will find those images are missing many components, so you need to make your own. Of course not manually! Let Jenkins build the Jenkins image!
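
The image Jenkins would build typically starts from the official one; a minimal sketch, assuming a plugins.txt file listing the missing plugins (current official images ship the jenkins-plugin-cli tool, older ones used install-plugins.sh):

    FROM jenkins/jenkins:lts

    # Preinstall the plugins missing from the official image
    COPY plugins.txt /usr/share/jenkins/ref/plugins.txt
    RUN jenkins-plugin-cli --plugin-file /usr/share/jenkins/ref/plugins.txt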

Read More
20 Aug

Run Jenkins in the container

Jenkins and Docker - flexible environment

The more you work with automation, the more you will like containers. They fit and scale nicely in the CI/CD model and can be easily managed. The whole infrastructure for automation should be flexible, easy to maintain and extendable – containers fit perfectly into this model. So why not start by putting Jenkins in a container?
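
In its simplest form it is a single command, following the official image documentation – the ports expose the web UI and the agent protocol, and the named volume persists the configuration:

    docker run -d --name jenkins \
      -p 8080:8080 -p 50000:50000 \
      -v jenkins_home:/var/jenkins_home \
      jenkins/jenkins:lts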

Read More
18 Jun

Automated scripts can send commands faster than RP can process

Juniper automation with Ansible

When I was writing an Ansible playbook, I faced an interesting situation. We all keep forgetting that automated tasks are executed faster than commands typed in the CLI, and we do not take it into consideration when we write playbooks. We usually treat time as a problem only when playbook execution takes too long.

On a Juniper SRX you can enable VPN debugging using the traceoptions feature on the IKE and IPSec processes. By default, it stores the logs of all VPNs configured on the device in either the /var/log/kmd file or the one specified in the traceoptions configuration. JunOS 11.4R3 brought an additional enhancement that limits debugging to a single VPN tunnel, specified by the local and peer IPs. You can turn it on with the request security ike debug-enable command. This is very handy because in most cases you want to troubleshoot just the non-operational connection.

There is a slight difference between the output of those commands in the log file. If you use traceoptions, you will see pretty much the standard Juniper log output:

[Mar 25 13:12:12]IPSec SA done callback called for sa-cfg VPN-V201 local:10.0.201.3, remote:10.0.201.4 IKEv1 with status Timed out

If you decide to perform troubleshooting using the second method, you will get output like this:

[Mar 25 13:20:12][10.0.201.3 <-> 10.0.201.4] IPSec SA done callback called for sa-cfg VPN-V201 local:10.0.201.3, remote:10.0.201.4 IKEv1 with status Timed out

The log now contains both peers’ IP addresses. This is really handy for my playbook – because one of the tasks needs to extract specific information from the log output, I can easily identify the interesting lines by the peer IP addresses.

The Playbook

There are two tasks in my playbook, divided by a 60-second pause. In the first task, I send a set of three commands to the router – I use the juniper_junos_command module here, but it works the same with the module available in the standard Ansible library. I send three commands in one task to save time, and I provide the variables as parameters – I gathered them in previous tasks.

    - name: Prepare and start VPN debugging on remote device
      juniper_junos_command:
        commands:
          - "request security ike debug-disable"
          - "clear log VPN"
          - "request security ike debug-enable local {{ sa_tunnel_local_gateway }} remote {{ sa_tunnel_remote_gateway }}"
        format: xml
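
The 60-second pause between the tasks is an ordinary task as well; a minimal sketch using the pause module:

    - name: Wait for the device to generate VPN logs
      pause:
        seconds: 60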

After the pause, which gives the device time to generate the logs, I run another task where I simply read the log file via a CLI command.

- name: Collect VPN logs
  juniper_junos_command:
    commands:
      - "show log VPN"
    format: xml
  register: result_VPN_log_xml

The problem

If you run those commands in this order from the CLI, you will get logs from troubleshooting the single VPN specified by the IP parameters. But if you execute them from the Ansible playbook, you will not get the peer-specific log output. It looks like the request command is ignored by the router. If you enable command logging on the router, you will see the commands arriving via netconf as well. You can also check the debugging status directly from the router CLI. If no peer-specific debugging is running, you will get the following output:

show security ike debug-status
Enabled
flag: all
level: 5

When the debug is running, you will get information about its level and the peers’ addresses:

show security ike debug-status
Enabled
flag: all
level: 7
Local IP: 10.0.201.3, Remote IP: 10.0.201.4

Because the delivery of the commands itself was not the problem, one of the workarounds could be a playbook where I send only one command in each task. This workaround works, but it is not optimal. Remember that the connection to the router is opened for each task and closed when the task completes. It takes time, and the more commands you want to send, the longer it takes to finish the playbook.

Don’t send commands too fast

I asked JTAC for help with this case. It happened not to be a standard customer problem; it was verified in their lab and at some point the Engineering team was asked for support. I received the answer I like the best – “it is not a bug, it is a feature”!

Because there is no output when you execute any of the three commands from the first task, the Ansible module sends the next command almost without a gap after the previous one. As JunOS does not acknowledge that it has finished the first request command, the second one comes in and gets ignored, because the first one is not finished yet. Remember, you can never issue commands via CLI as fast as you can send them from a script.

I still want to send multiple commands in one task to save time, so here is another way to mitigate the problem:

    - name: Prepare and start VPN debugging on remote device
      juniper_junos_command:
        commands:
          - "request security ike debug-disable"
          - "clear log VPN"
          - "show system uptime"
          - "request security ike debug-enable local {{ sa_tunnel_local_gateway }} remote {{ sa_tunnel_remote_gateway }}"
        format: xml

To slow down the task a little, I put an additional command before the ignored request command. It does not take much time to execute, and it generates output that I simply ignore.

There is no documentation on how long the gap between commands should be, and it will differ depending on the platform.

06 Jun

Dynamic inventory from GIT on AWX

In many deployments, people do not run Ansible playbooks directly via the command line. In the long term, it is not flexible and does not provide proper permission granulation or trusted code control. So people use Ansible Tower or its free version – AWX. Every playbook, no matter how you run it, needs an inventory definition. Sometimes you use static files with a list of hosts; other times you use a script that dynamically provides the inventory for the playbook. We then call it a dynamic inventory.

If you keep your Ansible project with all the scripts and playbooks in a GIT repository, you can import it into AWX. You can also maintain dynamic inventory scripts as a part of your project and use them to build the AWX inventory.
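
A dynamic inventory script is just an executable that prints JSON in the structure Ansible expects; a minimal sketch with placeholder host names:

    #!/usr/bin/env python
    # Minimal dynamic inventory: print groups, hosts and host variables as JSON
    import json

    inventory = {
        "routers": {"hosts": ["r1.example.com", "r2.example.com"]},
        "_meta": {"hostvars": {}},
    }
    print(json.dumps(inventory))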

Read More

26 Apr

Container names and docker-compose

This post is related to Docker and automation

You learn new things every day, quite often by accident, because something is not working as you expect or it just stopped working. This is one of those situations where I learned to be careful with container names when using docker-compose, and with how containers refer to each other’s names. This is very important if they share the same network.

To perform some tests, I created a lab where I run Ansible AWX (the free Ansible Tower version) and the NetBox IPAM. I decided to use docker-compose because it gives me an easy way to manage the lab. By default, both AWX and NetBox are put in a dedicated network created just for the containers of each installation. This means isolation – the desired way to run containers. But in my lab, AWX needs to talk to the NetBox API to pull the inventory. Because it is my lab, I did the simplest thing.
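
For illustration, a sketch of two services sharing one user-defined network, where each container can reach the other by its service name (the image names and network name are placeholders):

    version: "3"
    services:
      awx:
        image: ansible/awx_task:latest
        networks:
          - labnet
      netbox:
        image: netboxcommunity/netbox:latest
        networks:
          - labnet
    networks:
      labnet: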

Read More