Tag Archives: novaclient

fping Support in OpenStack

OpenStack is very good at launching virtual machines – that’s its purpose, isn’t it? But usually you want to monitor state of you machines somehow, and there are many reasonable ways.

  1. You can test daemons running on the machine, e.g., check up open ports or poll known services. Of course, this approach means that you know exactly what services should be running – and this is the most precise way to test system health.
  2. You can ask hypervisor if the machine is ok. That’s a very dirty check since hypervisor will likely report that VM is active while its operating system kernel can encounter problems.
  3. A compromise settlement may be pinging the machine. It’s a general solution since a lot of VMs respond to ping normally. Sure, VM can ignore ping or its daemons can have problems while host is responding to ping, but this solution is far easier to implement then check each machine according to an individual plan.

Let’s concentrate on the last two approaches. I would like to launch a machine and check it.

[root@a001 ~]# nova image-list
+--------------------------------------+--------------+--------+--------+
| ID                                   | Name         | Status | Server |
+--------------------------------------+--------------+--------+--------+
| 960dc70a-3e0e-496a-b8da-0e9cd91d3a44 | selenium-img | ACTIVE |        |
+--------------------------------------+--------------+--------+--------+
[root@a001 ~]# nova boot --flavor m1.small --image 960dc70a-3e0e-496a-b8da-0e9cd91d3a44 selenium-0
...
[root@a001 ~]# nova list
+--------------------------------------+-------------------+--------+-------------------------+
| ID                                   | Name              | Status | Networks                |
+--------------------------------------+-------------------+--------+-------------------------+
| a9060a07-d32a-4dcf-8387-1c7d69f897dc | selenium-0        | ACTIVE | selenium-net=10.109.0.4 |
+--------------------------------------+-------------------+--------+-------------------------+
[root@a001 ~]# fping 10.109.0.4
10.109.0.4 is unreachable

As you can see, VM status is reported as active, but the machine has not booted really. Even more, consider a damaged image (I use a text file for this purpose):

[root@a001 ~]# glance index 
ID                                   Name                           Disk Format          Container Format     Size          
------------------------------------ ------------------------------ -------------------- -------------------- --------------
7d8007fe-a63c-4d02-8edf-a6cc19fa1d73 text                           qcow2                ovf                           17043
[root@a001 ~]# nova boot --flavor m1.small --image 7d8007fe-a63c-4d02-8edf-a6cc19fa1d73 text-0
[root@a001 ~]# nova list
+--------------------------------------+-------------------+--------+-------------------------+
| ID                                   | Name              | Status | Networks                |
+--------------------------------------+-------------------+--------+-------------------------+
| a9060a07-d32a-4dcf-8387-1c7d69f897dc | selenium-0        | ACTIVE | selenium-net=10.109.0.4 |
| 461e73e4-7f88-4c8f-bb1f-49df9ec18d84 | text-0            | ACTIVE | selenium-net=10.109.0.5 |
+--------------------------------------+-------------------+--------+-------------------------+

Nova bravely reports that the new instance is active, but it obviously is not functioning: a text file is not a disk image with an operating system. And fping reveals that the VM is ill:

[root@a001 ~]# fping 10.109.0.5
10.109.0.5 is unreachable

We can extend nova API adding this fping feature. Nova will run fping for requested instances and report which ones seems to be truly alive. I have developed this extension and it was accepted to Grizzly on November 16, 2012 (https://github.com/openstack/nova/commit/a220aa15b056914df1b9debc95322d01a0e408e8).

fping API is simple and straightforward. We can ask to check all instances or a single one. In fact, we have two API calls.

  1. GET /os-fping/<uuid> – check a single instance.
  2. GET /os-fping?[all_tenants=1]&[include=uuid[,uuid...][&exclude=...] – check all VMs in the current project. If all_tenants is requested, data for all projects is returned (by default, this option is allowed only for admins). include and exclude are parameters specifying VM masks. These parameters are mutually exclusive and exclude is ignored if include is specified. Consider that VM list is VM_all, then if include list is set, the only VM_all * VM_to_include (set intersection) will be tested – thus we can check several instances in a single API call. If exclude list is provided, VM_all -
    VM_to_exclude
    (set difference) will be polled – thus we can skip testing for instances that are not supposed to respond to ping.

fping increases I/O load on nova-api node, so, by default, fping API is limited to 12 calls in an hour (nevertheless it’s a single or several instances poll).

I have added nova fping support to python-novaclient (https://github.com/openstack/python-novaclient/commit/ff69e4d3830f463afa48ca432600224f29a2c238) making easy to write a daemon in Python that will periodically check instance states and send notifications on detected problems. This daemon is available in Grid Dynamics Altai Private Cloud For Developers and is called instance-notifier (https://github.com/altai/instance-notifier). The daemon is installed and configured by Altai installer automatically. Despite Altai 1.0.2 runs Essex, not Grizzly, I have added nova-fping as an additional extension package.

Let’s see how to use fping from client side. We have three instances: selenium-0 (shut off), selenium-1 (up and running), and text (invalid image). Nova reports that they are active:

[root@a001 /]# nova list
+--------------------------------------+-------------------+--------+-------------------------+
| ID                                   | Name              | Status | Networks                |
+--------------------------------------+-------------------+--------+-------------------------+
| a9060a07-d32a-4dcf-8387-1c7d69f897dc | selenium-0        | ACTIVE | selenium-net=10.109.0.4 |
| 20325b87-6858-49df-ab30-795a189dd2ac | selenium-1        | ACTIVE | selenium-net=10.109.0.3 |
| 461e73e4-7f88-4c8f-bb1f-49df9ec18d84 | text-0            | ACTIVE | selenium-net=10.109.0.5 |
+--------------------------------------+-------------------+--------+-------------------------+

Check them with nova fping!

[root@a001 /]# python
Python 2.6.6 (r266:84292, Jun 18 2012, 14:18:47) 
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from novaclient.v1_1 import Client
>>> cl = Client("admin", "topsecret", "systenant", "http://localhost:5000/v2.0")
>>> ping_list = cl.fping.list()
>>> ping_list
[<Fping: 461e73e4-7f88-4c8f-bb1f-49df9ec18d84>, <Fping: a9060a07-d32a-4dcf-8387-1c7d69f897dc>, <Fping: 20325b87-6858-49df-ab30-795a189dd2ac>]
>>> import json
>>> print json.dumps([p._info for p in ping_list], indent=4)
[
    {
        "project_id": "4fd17bd4ac834dcf8ba1236368f79986", 
        "id": "461e73e4-7f88-4c8f-bb1f-49df9ec18d84", 
        "alive": false
    }, 
    {
        "project_id": "4fd17bd4ac834dcf8ba1236368f79986", 
        "id": "a9060a07-d32a-4dcf-8387-1c7d69f897dc", 
        "alive": false
    }, 
    {
        "project_id": "4fd17bd4ac834dcf8ba1236368f79986", 
        "id": "20325b87-6858-49df-ab30-795a189dd2ac", 
        "alive": true
    }
]

As expected, nova fping reported that only selenium-1 (id=20325b87-6858-49df-ab30-795a189dd2ac) is really alive.

So, fping in nova is a fast and quite reliable way to check instance health. Like a phonendoscope, it cannot provide full information, but if a human doesn’t respire, he’s likely to be dead.

fping-phonendoscope

Advertisements

Improving novaclient CLI. Boot server with specific key.

Before now there was only one way to specify ssh-key when booting a new server: by passing path to the public key file in the local machine (if you don’t specify that path then taking public key file from the /home/.ssh/ directory occurs). That way you couldn’t use keypairs which you have created earlier by nova add-keypair command. But starting from now you can do it!

$ nova boot <server_name> --image <image_id> --flavor <flavor_id> [--key_path [<path_to_the_public_key>], --key_name <keypair_name>]

As you can see you are free to use both methods (old and new) and it is up to you what the way you go.

Here is a sample sequence of commands to work with the new feature:

$ nova keypair-add key1 > private_key_file — save returned private key to the file.

$ chmod 600 private_key_file — only owner shoud have access to the private key (-rw-------).

$ nova boot <server_name> --image 3 --flavor 1 --key_name key1 — boot new server with specified image id, flavor id and also inject previously created private key by its name.

$ nova list – show available servers. Check the status of the newly created server and if it is BUILD (creating in progress) perform this command until the status becomes ACTIVE. After that you can connect to the server by ssh. Let’s say servers’s ip is 172.30.254.4 (You can find out ip from the same table as the status).

$ ssh -i private_key_file root@172.30.254.4 — it is done — you are in the created server’s shell.

You can see sources on

https://github.com/griddynamics/osc-robot-python-novaclient

or binaries for RHEL

http://yum.griddynamics.net/yum/diablo/

and for CentOS

http://yum.griddynamics.net/yum/diablo-centos/