Tag Archives: OpenStack

OpenStack Unit Testing: a Long Journey Starts With a Small Step

Alessio Ababilov

This time I would like to talk about OpenStack unit tests we were working on at Grid Dynamics.

First and furthermost, writing unit tests for OpenStack is a tar pit: if you have started, you cannot resist. The task is really difficult by itself; moreover, creating tests for such quickly developed project is like a life on a volcano.

Our task seemed to be quite trivial. We have a set of projects with different test coverage. We have to cover them at least for 80%. Python coverage utility is at our service.

I have introduced the following infrastructure. We publish our patches on Gerrit that replicates its repositories to another server for easy recovering. Jenkins monitors Gerrit and recalculates coverage after every patch publication. Coverage for each project is gathered to table.

I hoped that coverage calculation is not a big deal, but I was too optimistic. Imagine that we simply use run_tests.sh script that is available for almost all the projects. Than we will face some problems.

  1. One does not simply use run_tests.sh. It installs a fresh virtual environment for Python packages using corresponding pip-requires and test-requires files. This environment lacks some necessary packages (e.g., iso8601) and its eventlet is incompatible with Python 2.6 that we use in CentOS. So, I had to prepare a virtual environment with a custom script. So does OpenStack Jenkins.
  2. It usually takes a lot of time to download and compile all Python dependencies. It was too long even for a Gentoo user, so I cached the environment using a common env for all the packages.
  3. pip is not a reliable utility. During package installation, it is used to erasing all package data except for metainformation. The last point means that the package is considered to be installed while its files are gone, so, you have to uninstall it and install again.

Well, now we have a shiny virtual environment. It’s time to run tests and calculate coverage. Currently, OpenStack unit tests for Nova, Glance, Quantum, and Keystone are slow and run about 10-20 minutes on our testing server. There is a good idea to migrate for nosetests to testr: this will allow to run the tests in parallel (the thing is that nosetests parallelization is incompatible with eventlet/greenlet widely used in OpenStack). Nova already uses testr, but updating other packages is a laborious task. Some tests depend on each other and cannot be run in parallel, here is an example of one such problem I’ve detected and fixed: https://bugs.launchpad.net/quantum/+bug/1125951.

I have used incremental testing to increase the speed. I cache .coverage file that accumulates statistics for previous testing and run only those tests that were added or updated. The latter are determined using git diff --name-only command. Also, incremental testing solves another problem: summarize coverage for all commits for given project. These commits are siblings on commit tree, so, you cannot checkout to the latest one and run all published tests that are not accepted yet.

OpenStack includes the Oslo project – a set of python libraries shared by other projects. Unfortunately, Oslo is not a true library now: its code it copied with a special script to other projects. Thus our coverage statistics would be incorrect if we include uncovered Oslo code to common report. So, I had to implement a blueprint to fix this issue.

All preparations are done, now we are ready for testing!

Our testing process differs from one used in OpenStack Jenkins. It happens that upstream tests fail just because they were not run during verification, but sometimes they pass somehow in OpenStack jobs and fail in my environment. Here is an example: Quantum’s test_policy passes in https://review.openstack.org/#/c/21091/, but they should fail because one necessary configuration parameter is not imported properly (my fix: https://review.openstack.org/#/c/21205/).

So, now OpenStack community has a new team that looks at unit tests from slightly different point of view detecting problems and fixing them.

fping Support in OpenStack

OpenStack is very good at launching virtual machines – that’s its purpose, isn’t it? But usually you want to monitor state of you machines somehow, and there are many reasonable ways.

  1. You can test daemons running on the machine, e.g., check up open ports or poll known services. Of course, this approach means that you know exactly what services should be running – and this is the most precise way to test system health.
  2. You can ask hypervisor if the machine is ok. That’s a very dirty check since hypervisor will likely report that VM is active while its operating system kernel can encounter problems.
  3. A compromise settlement may be pinging the machine. It’s a general solution since a lot of VMs respond to ping normally. Sure, VM can ignore ping or its daemons can have problems while host is responding to ping, but this solution is far easier to implement then check each machine according to an individual plan.

Let’s concentrate on the last two approaches. I would like to launch a machine and check it.

[root@a001 ~]# nova image-list
+--------------------------------------+--------------+--------+--------+
| ID                                   | Name         | Status | Server |
+--------------------------------------+--------------+--------+--------+
| 960dc70a-3e0e-496a-b8da-0e9cd91d3a44 | selenium-img | ACTIVE |        |
+--------------------------------------+--------------+--------+--------+
[root@a001 ~]# nova boot --flavor m1.small --image 960dc70a-3e0e-496a-b8da-0e9cd91d3a44 selenium-0
...
[root@a001 ~]# nova list
+--------------------------------------+-------------------+--------+-------------------------+
| ID                                   | Name              | Status | Networks                |
+--------------------------------------+-------------------+--------+-------------------------+
| a9060a07-d32a-4dcf-8387-1c7d69f897dc | selenium-0        | ACTIVE | selenium-net=10.109.0.4 |
+--------------------------------------+-------------------+--------+-------------------------+
[root@a001 ~]# fping 10.109.0.4
10.109.0.4 is unreachable

As you can see, VM status is reported as active, but the machine has not booted really. Even more, consider a damaged image (I use a text file for this purpose):

[root@a001 ~]# glance index 
ID                                   Name                           Disk Format          Container Format     Size          
------------------------------------ ------------------------------ -------------------- -------------------- --------------
7d8007fe-a63c-4d02-8edf-a6cc19fa1d73 text                           qcow2                ovf                           17043
[root@a001 ~]# nova boot --flavor m1.small --image 7d8007fe-a63c-4d02-8edf-a6cc19fa1d73 text-0
[root@a001 ~]# nova list
+--------------------------------------+-------------------+--------+-------------------------+
| ID                                   | Name              | Status | Networks                |
+--------------------------------------+-------------------+--------+-------------------------+
| a9060a07-d32a-4dcf-8387-1c7d69f897dc | selenium-0        | ACTIVE | selenium-net=10.109.0.4 |
| 461e73e4-7f88-4c8f-bb1f-49df9ec18d84 | text-0            | ACTIVE | selenium-net=10.109.0.5 |
+--------------------------------------+-------------------+--------+-------------------------+

Nova bravely reports that the new instance is active, but it obviously is not functioning: a text file is not a disk image with an operating system. And fping reveals that the VM is ill:

[root@a001 ~]# fping 10.109.0.5
10.109.0.5 is unreachable

We can extend nova API adding this fping feature. Nova will run fping for requested instances and report which ones seems to be truly alive. I have developed this extension and it was accepted to Grizzly on November 16, 2012 (https://github.com/openstack/nova/commit/a220aa15b056914df1b9debc95322d01a0e408e8).

fping API is simple and straightforward. We can ask to check all instances or a single one. In fact, we have two API calls.

  1. GET /os-fping/<uuid> – check a single instance.
  2. GET /os-fping?[all_tenants=1]&[include=uuid[,uuid...][&exclude=...] – check all VMs in the current project. If all_tenants is requested, data for all projects is returned (by default, this option is allowed only for admins). include and exclude are parameters specifying VM masks. These parameters are mutually exclusive and exclude is ignored if include is specified. Consider that VM list is VM_all, then if include list is set, the only VM_all * VM_to_include (set intersection) will be tested – thus we can check several instances in a single API call. If exclude list is provided, VM_all -
    VM_to_exclude
    (set difference) will be polled – thus we can skip testing for instances that are not supposed to respond to ping.

fping increases I/O load on nova-api node, so, by default, fping API is limited to 12 calls in an hour (nevertheless it’s a single or several instances poll).

I have added nova fping support to python-novaclient (https://github.com/openstack/python-novaclient/commit/ff69e4d3830f463afa48ca432600224f29a2c238) making easy to write a daemon in Python that will periodically check instance states and send notifications on detected problems. This daemon is available in Grid Dynamics Altai Private Cloud For Developers and is called instance-notifier (https://github.com/altai/instance-notifier). The daemon is installed and configured by Altai installer automatically. Despite Altai 1.0.2 runs Essex, not Grizzly, I have added nova-fping as an additional extension package.

Let’s see how to use fping from client side. We have three instances: selenium-0 (shut off), selenium-1 (up and running), and text (invalid image). Nova reports that they are active:

[root@a001 /]# nova list
+--------------------------------------+-------------------+--------+-------------------------+
| ID                                   | Name              | Status | Networks                |
+--------------------------------------+-------------------+--------+-------------------------+
| a9060a07-d32a-4dcf-8387-1c7d69f897dc | selenium-0        | ACTIVE | selenium-net=10.109.0.4 |
| 20325b87-6858-49df-ab30-795a189dd2ac | selenium-1        | ACTIVE | selenium-net=10.109.0.3 |
| 461e73e4-7f88-4c8f-bb1f-49df9ec18d84 | text-0            | ACTIVE | selenium-net=10.109.0.5 |
+--------------------------------------+-------------------+--------+-------------------------+

Check them with nova fping!

[root@a001 /]# python
Python 2.6.6 (r266:84292, Jun 18 2012, 14:18:47) 
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from novaclient.v1_1 import Client
>>> cl = Client("admin", "topsecret", "systenant", "http://localhost:5000/v2.0")
>>> ping_list = cl.fping.list()
>>> ping_list
[<Fping: 461e73e4-7f88-4c8f-bb1f-49df9ec18d84>, <Fping: a9060a07-d32a-4dcf-8387-1c7d69f897dc>, <Fping: 20325b87-6858-49df-ab30-795a189dd2ac>]
>>> import json
>>> print json.dumps([p._info for p in ping_list], indent=4)
[
    {
        "project_id": "4fd17bd4ac834dcf8ba1236368f79986", 
        "id": "461e73e4-7f88-4c8f-bb1f-49df9ec18d84", 
        "alive": false
    }, 
    {
        "project_id": "4fd17bd4ac834dcf8ba1236368f79986", 
        "id": "a9060a07-d32a-4dcf-8387-1c7d69f897dc", 
        "alive": false
    }, 
    {
        "project_id": "4fd17bd4ac834dcf8ba1236368f79986", 
        "id": "20325b87-6858-49df-ab30-795a189dd2ac", 
        "alive": true
    }
]

As expected, nova fping reported that only selenium-1 (id=20325b87-6858-49df-ab30-795a189dd2ac) is really alive.

So, fping in nova is a fast and quite reliable way to check instance health. Like a phonendoscope, it cannot provide full information, but if a human doesn’t respire, he’s likely to be dead.

fping-phonendoscope

LVM disks support in OpenStack Nova Folsom

Recently, we have implemented LVM disk support functionality (based on our work for Diablo) and successfully delivered it to upstream.

New LVM image backend can be enabled by setting libvirt_images_type flag to value lvm.

You must also specify volume group name for VM disks, which is done by setting libvirt_images_volume_group option (for example libvirt_images_volume_group=NovaVG).

Backend supports usual logical volumes with full space allocation. However, it is possible to use sparse logical volumes (which are created with option –virtualsize). In this case, you need specify libvirt_sparse_logical_volumes=True in nova.conf. When this mode is enabled, nova will file warning messages to log on attempt to allocate sparse logical volume with possible size, that exceeds size of volume group used to hold VM disks.

For more information on this functionality please see blueprint.

You can also review code changes here.

RHEL and Centos RPM packages for OpenStack Essex (2012.1) release is out

We are happy to inform you, that we have prepared Essex RPM packages for RHEL and Centos. We tested packages for compatibility with Scientific Linux as well. For Centos you should also use EPEL repository.

RHEL yum repo: http://yum.griddynamics.net/yum/essex/.

Centos yum repo: http://yum.griddynamics.net/yum/essex-centos/

Essex Packages:

  • openstack-nova-essex
  • openstack-glance-essex
  • openstack-keystone-essex
  • openstack-swift-essex
  • openstack-quantum-essex
  • python-quantumclient
  • python-novaclient-essex
Setup instructions for essex (for testing, not for production) :

http://openstack.griddynamics.com/setup_single_essex.html.

We are waiting for your questions/comments at our mailing list: openstack@griddynamics.com.

Grid Dynamics’ GitHub repos are renamed

In order to unify our repo names with openstack GitHub origanization, “osc-robot-” prefix is removed.

So, these repos are renamed:

  • osc-robot-keystone;
  • osc-robot-nova;
  • osc-robot-openstackx;
  • osc-robot-glance;
  • osc-robot-swift;
  • osc-robot-openstack-compute;
  • osc-robot-noVNC;
  • osc-robot-python-novaclient.

Also, we add python-keystoneclient repo (forked from openstack/python-keystoneclient) that contains patches for RedHat packaging.

Improving novaclient CLI. Boot server with specific key.

Before now there was only one way to specify ssh-key when booting a new server: by passing path to the public key file in the local machine (if you don’t specify that path then taking public key file from the /home/.ssh/ directory occurs). That way you couldn’t use keypairs which you have created earlier by nova add-keypair command. But starting from now you can do it!

$ nova boot <server_name> --image <image_id> --flavor <flavor_id> [--key_path [<path_to_the_public_key>], --key_name <keypair_name>]

As you can see you are free to use both methods (old and new) and it is up to you what the way you go.

Here is a sample sequence of commands to work with the new feature:

$ nova keypair-add key1 > private_key_file — save returned private key to the file.

$ chmod 600 private_key_file — only owner shoud have access to the private key (-rw-------).

$ nova boot <server_name> --image 3 --flavor 1 --key_name key1 — boot new server with specified image id, flavor id and also inject previously created private key by its name.

$ nova list – show available servers. Check the status of the newly created server and if it is BUILD (creating in progress) perform this command until the status becomes ACTIVE. After that you can connect to the server by ssh. Let’s say servers’s ip is 172.30.254.4 (You can find out ip from the same table as the status).

$ ssh -i private_key_file root@172.30.254.4 — it is done — you are in the created server’s shell.

You can see sources on

https://github.com/griddynamics/osc-robot-python-novaclient

or binaries for RHEL

http://yum.griddynamics.net/yum/diablo/

and for CentOS

http://yum.griddynamics.net/yum/diablo-centos/