Planet TripleO


February 21, 2017

Juan Antonio Osorio

Deploying a TLS everywhere environment with oooq and an existing FreeIPA server

As an attempt to make the “TLS everywhere” bits more usable and easier for people to try out, I added the deployment steps to tripleo-quickstart.

This currently works if you have an existing FreeIPA server installed somewhere accessible. Note that in this example the IP is set to ‘’, because that’s the value that we use in CI. Use whatever suits your deployment.

The main things to be added to the configuration are the following:

# Main switch to enable all the workflow
enable_tls_everywhere: true

# Undercloud FQDN
undercloud_undercloud_hostname: ""

# Hostnames and domain relevant for the overcloud
overcloud_cloud_name: ""
overcloud_cloud_name_internal: ""
overcloud_cloud_name_storage: ""
overcloud_cloud_name_storage_management: ""
overcloud_cloud_name_ctlplane: ""
overcloud_cloud_domain: ""

# Nameservers for both the undercloud and the overcloud
overcloud_dns_servers: [""]
undercloud_undercloud_nameservers: [""]

# FreeIPA server details
freeipa_server_hostname: ""
freeipa_admin_password: FreeIPA4All

  • enable_tls_everywhere: This is the main switch that will enable the whole workflow. It defaults to false.
  • undercloud_undercloud_hostname: This will set the hostname for the undercloud node and will be used in this workflow to create the host principal for the undercloud.
  • The following are the hostnames for the overcloud VIPs. They will be used as the keystone endpoints. Please note that these values are network dependent, and the names should reflect that. The values are these:

    • overcloud_cloud_name
    • overcloud_cloud_name_internal
    • overcloud_cloud_name_storage
    • overcloud_cloud_name_storage_management
    • overcloud_cloud_name_ctlplane
  • overcloud_cloud_domain: This is the domain for the cloud deployment. It will be used for the overcloud nodes, and should match the FreeIPA Kerberos realm.
  • overcloud_dns_servers: This is a list of servers that will be used as the nameservers for the overcloud nodes. It gets persisted in the DnsServers parameter in heat.
  • undercloud_undercloud_nameservers: This is a list of servers that will be used as the nameservers for the undercloud node.
  • freeipa_admin_password: This is the password for the admin user of your FreeIPA server.
  • freeipa_server_hostname: The FQDN of your FreeIPA server.

The main things that are added to the deployment workflow are the following:

  • Before installing the undercloud, we install the novajoin package, and use the FreeIPA credentials to set up the necessary permissions/privileges in FreeIPA, as well as create the undercloud service principal.

  • Before uploading the overcloud images to glance, we install a specific version of cloud-init for novajoin to work. This is because the version that’s currently in CentOS has a bug, and the newest version available has dependency issues that don’t let Heat software deployments work.

  • It adds the relevant environment files to the overcloud deploy script created by quickstart. These will in turn deploy the overcloud with TLS-everywhere enabled.

In some instances, you might not want to give your FreeIPA credentials to ansible. If this is the case, you’ll need to run the preparation script for novajoin yourself. If you want to do this, you will also need to set up the following flag:

prepare_novajoin: false
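For reference, a complete quickstart configuration for this workflow might look like the following. All of the hostnames, the domain, and the nameserver address below are illustrative placeholders, not values from this post; substitute your own:

```yaml
# Main switch to enable all the workflow
enable_tls_everywhere: true

# Undercloud FQDN (placeholder)
undercloud_undercloud_hostname: undercloud.example.com

# Hostnames and domain relevant for the overcloud (placeholders,
# note the network-dependent naming)
overcloud_cloud_name: overcloud.example.com
overcloud_cloud_name_internal: overcloud.internalapi.example.com
overcloud_cloud_name_storage: overcloud.storage.example.com
overcloud_cloud_name_storage_management: overcloud.storagemgmt.example.com
overcloud_cloud_name_ctlplane: overcloud.ctlplane.example.com
overcloud_cloud_domain: example.com

# Nameservers for both the undercloud and the overcloud (placeholder IP)
overcloud_dns_servers: [""]
undercloud_undercloud_nameservers: [""]

# FreeIPA server details (placeholder hostname)
freeipa_server_hostname: ipa.example.com
freeipa_admin_password: FreeIPA4All

# Set to false if you don't want to hand ansible your FreeIPA credentials
prepare_novajoin: true
```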

February 21, 2017 08:04 AM

February 14, 2017

Ben Nemec

Improving TripleO CI Throughput

If you spend any significant amount of time working on TripleO, you have probably run into the dreaded CI queue. In this case that typically refers to the check-tripleo queue that runs OVB. Why does that queue back up more than the regular check queue and what can/have we done about it? That's what we're going to talk about here.

The Problems

Before we discuss solutions, it's important to understand the problems we face in scaling TripleO's OVB CI. It's a very different beast from the rest of OpenStack CI. Here's a (probably incomplete) list of how:

  • Our test environments are considerably larger than normal. An average TripleO OVB job makes use of 5 VMs. As of this writing, the most that any regular infra job uses is 3, and that's an experimental TripleO job. In general they max out at 2, and most use a single node. A TripleO environment averages around 35 GB of memory (generally our limiting factor) per test environment, as well as a lot of vcpus and disk.
  • Our test environments are also considerably more complex. Those 5 VMs are attached to some combination of 6 different neutron subnets. One of the VMs is also configured as an IPMI server that controls the others. We use Heat to deploy them in order to keep the whole thing manageable. This adds yet another layer of complexity because regular infra doesn't know how to deploy the Heat stacks, so we have to run private infrastructure to handle that.
  • Related to the previous point, our test environments have some unusual requirements on the host cloud. While some work has been done to reduce the number of ways our CI cloud is a snowflake, there are currently just 2 available clouds on which our jobs can run. And one of those is being used by TripleO developers and not available for CI. So we have exactly one cloud of ~30 128 GB compute nodes for TripleO CI.
  • In the interest of maximizing compute capacity, our CI cloud has a single, non-HA controller. When a large number of test environments are being created and/or deleted at once, it can overload that controller and cause all kinds of strange (read: broken) behavior.
  • TripleO CI jobs are long. At this time, I estimate we average about 1 hour 50 minutes per job. And I'm happy with that number, if you can believe it. It used to be well above 2 hours. If that sounds bad, keep in mind that we're deploying not one, but two OpenStack clouds in that time. We're also simulating the baremetal deployment part of the process, which no other OpenStack CI jobs do.

Just for some context, at our peak utilization of TripleO CI I've seen as many as 750 jobs being run in a 24 hour period. You can do the math on the number of VMs, memory, and networks that involves. It also means even small regressions in performance can have a huge impact on our daily throughput. A 5 minute regression per job adds up to 62.5 extra hours per day spent running test jobs. The good news is that a 5 minute improvement has the same impact in the positive.
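To make that arithmetic concrete, here is the calculation the paragraph above is doing, as a trivial Python sketch:

```python
# Cost of a per-job CI regression at peak TripleO CI utilization.
jobs_per_day = 750          # peak jobs observed in a 24 hour period
regression_minutes = 5      # slowdown per job

extra_hours_per_day = jobs_per_day * regression_minutes / 60
print(extra_hours_per_day)  # 62.5 extra hours of test time per day
```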

The Solutions

Best strap in tight for this section, because we've been busy.

One useful tool that I want to point out is a simple webapp I wrote to keep an eye on the check-tripleo queue: check-tripleo queue status. It can show other queues as well, but it was specifically designed for the tripleo queues so some things may not make sense elsewhere. It's also designed to be as compact as possible, and it may not be obvious what some of the numbers mean. If there's interest, I can write a more complete post about the tool itself.

There are two main categories of changes that helped our CI throughput: bug fixes and optimizations. I'll start with the bugs that were hurting performance.


  • Five minute delay DHCP'ing isolated nics: This has actually bitten us twice. It's a fairly long-standing bug that goes back to at least Mitaka and causes deployments to spend 5 minutes attempting to DHCP nics that will never get a response. Fixing it saved time in every single job we run.
  • IPA image build not skipped even if image already exists: This bug crept in when we moved to a YAML-based image build system. There was an issue with the check for existing images that meant even when we could use cached images in CI, we were spending 10 minutes rebuilding the IPA image. This didn't affect every job (some can't use cached images), but it was a big time suck for the ones it did.
  • overcloud-full ramdisk being rebuilt twice: Thanks to a recent change, we ended up with two different image elements doing forced ramdisk rebuilds during our image builds. This was a less serious performance hit, but it still saves 1.5-2 minutes per job when we have to build images.


  • Run with actual node count: Due to a scheduler race between Nova and Ironic, we had previously added an extra node to each test environment so the scheduler could retry when it failed. This no longer seems to be necessary, and removing the extra node freed up around 20% of the resources from each test environment. It also makes environment creation and deletion faster because there is less to do.
  • Disable optional undercloud features in longer jobs: The undercloud has grown a lot of new services over the past few cycles, and this has caused it to take an increasingly long time to install. Since we aren't exercising many of these features in CI anyway, there's no point deploying them in all jobs. This is saving around 10 minutes in the ha and updates jobs.
  • Deploy network envs appropriate for the job: Not all of our jobs require the full 6 networks I discussed earlier. Since neutron-server is one of the biggest CPU users on the CI cloud controller, reducing the number of ports attached to the VMs was a big win in terms of controller capacity. It also reduces the time to create a test environment by a minute or more for some jobs. And in case that's not enough, this change will also allow us to test with bonded nics in CI.
  • Always use cached images in updates job: The updates job is especially painful from a runtime perspective. Not only does it deploy two full clouds, but it also has to update one of them, which takes a significant amount of time as well. Since the updates job is never run in isolation and image builds for it are not job-specific, there's no reason we can't always use cached images. If an image build is broken by a patch it will be caught by one of the other jobs. This can save as much as 30+ minutes in updates jobs.
  • Parallelization wherever possible: There were a few patches related to this, but essentially there are some processes in CI (such as log collection) that were being run in serial. Since our VMs are typically going to be running on different compute nodes, there's really no benefit to that and running those processes in parallel can save significant amounts of time.
  • Use http delorean urls instead of https: At some point, our default delorean repos (which is where we get OpenStack and friends) switched to using https by default. While this is good from a security perspective, it's bad from a CI perspective because it means we can't cache those packages. This is both slower and results in more bandwidth wasted on both ends. Note: As of this writing, the problem is only half fixed. Some of our repos force redirect http to https so there's nothing we can do on our end.

And I think this one deserves special notice: Clean up testenv if Jenkins instance goes away. Previously we had an issue where test environments were being left around for some time after the job they were attached to had been killed. This can happen, for example, if a new patch set is pushed to a change that has jobs actively running on it. Zuul kills the active jobs on the old patch set and starts new ones on the new patch set. However, before this change we did not immediately clean up the test environment from the killed jobs. This was very problematic and caused us to exceed our capacity in the CI cloud on several occasions. It also meant we couldn't make full use of the capacity at other times because the more jobs we ran the more likely it was that this situation would occur. Since the patch, I have never seen us exceed our configured capacity for jobs, and the problem scenario has occurred 1300 times in the two weeks since the change merged. That's a lot of resources not wasted.

All of these optimizations combined have both reduced our job runtimes and allowed us to run more jobs at once. We've increased our concurrent job limit from 60 to 70, and the CI cloud is still under less load than it was before. We could probably go even higher, but since things are generally under control right now there's no need to push the limit. There's also diminishing returns (more jobs running at once means more load on the compute nodes, which leads to lower performance) and some existing limits in the cloud that would require downtime to change if we go much higher. It could be done if necessary, but so far it hasn't been.

It's also worth noting that the effort to keep the CI queues reasonable is ongoing. Even while we merged the changes discussed above, other changes happened that regressed CI performance. Some because they were adding new things to deploy that take more time, and some for unanticipated reasons. Unfortunately, performance regressions tend to get ignored until they become so painful that jobs time out. This is a bad approach because CI performance affects every developer working on TripleO, and I'm hoping we can do a better job of keeping things in good shape going forward.

And just to drive the previous point home, in the time since I started writing this post and publishing it, we've regressed the ha job performance enough to start causing job failures.

by bnemec at February 14, 2017 07:32 PM

January 31, 2017

Dougal Matthews

Interactive Mistral Workflows over Zaqar

It is possible to do some really nice automation with the Mistral Workflow engine. However, sometimes user input is required or desirable. I set about writing an interactive Mistral Workflow, one that could communicate with a user over Zaqar.

If you are not familiar with Mistral Workflows you may want to start here, here or here.

The Workflow

Okay, this is what I came up with.

---
version: '2.0'

interactive-workflow:

  input:
    - input_queue: "workflow-input"
    - output_queue: "workflow-output"

  tasks:

    request_user_input:
      action: zaqar.queue_post
      retry: count=5 delay=1
      input:
        queue_name: <% $.output_queue %>
        messages:
          body: "Send some input to '<% $.input_queue %>'"
      on-success: read_user_input

    read_user_input:
      pause-before: true
      action: zaqar.queue_pop
      input:
        queue_name: <% $.input_queue %>
      publish:
        user_input: <% task(read_user_input).result[0].body %>
      on-success: done

    done:
      action: std.echo output=<% $.user_input %>
      on-success: send_response

    send_response:
      action: zaqar.queue_post
      retry: count=5 delay=1
      input:
        queue_name: <% $.output_queue %>
        messages:
          body: "You sent: '<% $.user_input %>'"

Breaking it down...

  1. The Workflow uses two queues, one for input and one for output - it would be possible to use the same queue for both, but this seemed simpler.

  2. On the first task, request_user_input, the Workflow sends a Zaqar message to the user requesting a message be sent to the input_queue.

  3. The read_user_input task pauses before it starts, see the pause-before: true. This means we can unpause the Workflow after we send a message. It would be possible to create a loop here that polls for messages, see below for more on this.

  4. After the input is provided, the Workflow must be un-paused manually. It then reads from the queue and sends the message back to the user via the output Zaqar queue.

See it in Action

We can demonstrate the Workflow with just the Mistral client. First you need to save it to a file and use the mistral workflow-create command to upload it.

First we trigger the Workflow execution.

$ mistral execution-create interactive-workflow
| Field             | Value                                |
| ID                | e8e2bfd5-3ae4-46db-9230-ada00a2c0c8c |
| Workflow ID       | bdd1253e-68f8-4cf3-9af0-0957e4a31631 |
| Workflow name     | interactive-workflow                 |
| Description       |                                      |
| Task Execution ID | <none>                               |
| State             | RUNNING                              |
| State info        | None                                 |
| Created at        | 2017-01-31 08:22:17                  |
| Updated at        | 2017-01-31 08:22:17                  |

The Workflow will complete the first task and then move to the PAUSED state before read_user_input. This can be seen with the mistral execution-list command.

In this Workflow we know there will now be a message in Zaqar. The Mistral action zaqar.queue_pop can be used to receive it...

$ mistral run-action zaqar.queue_pop '{"queue_name": "workflow-output"}'
{"result": [{"body": "Send some input to 'workflow-input'", "age": 4, "queue": {"_metadata": null, "client": null, "_name": "workflow-output"}, "href": null, "ttl": 3600, "_id": "589049397dcad341ecfb72cf"}]}

The JSON is a bit hard to read, but you can see the message body Send some input to 'workflow-input'.
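If you are scripting this interaction, the message body can be pulled out of that JSON with a few lines of Python (the raw string below is the exact output shown above):

```python
import json

# Output from `mistral run-action zaqar.queue_pop ...`, copied from above.
raw = '''{"result": [{"body": "Send some input to 'workflow-input'", "age": 4, "queue": {"_metadata": null, "client": null, "_name": "workflow-output"}, "href": null, "ttl": 3600, "_id": "589049397dcad341ecfb72cf"}]}'''

# "result" is a list of popped messages; each message has a "body".
messages = json.loads(raw)["result"]
print(messages[0]["body"])  # Send some input to 'workflow-input'
```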

Great. We can do that with another Mistral action...

$ mistral run-action zaqar.queue_post '{"queue_name": "workflow-input", "messages":{"body": {"testing": 123}}}'
{"result": {"resources": ["/v2/queues/workflow-input/messages/589049447dcad341ecfb72d0"]}}

After sending the message back to the Workflow we can unpause it. This can be done like this...

$ mistral execution-update -s RUNNING e8e2bfd5-3ae4-46db-9230-ada00a2c0c8c
| Field             | Value                                |
| ID                | e8e2bfd5-3ae4-46db-9230-ada00a2c0c8c |
| Workflow ID       | bdd1253e-68f8-4cf3-9af0-0957e4a31631 |
| Workflow name     | interactive-workflow                 |
| Description       |                                      |
| Task Execution ID | <none>                               |
| State             | RUNNING                              |
| State info        | None                                 |
| Created at        | 2017-01-31 08:22:17                  |
| Updated at        | 2017-01-31 08:22:38                  |

Finally we can confirm it worked by getting a message back from the Workflow...

$ mistral run-action zaqar.queue_pop '{"queue_name": "workflow-output"}'
{"result": [{"body": "You sent: '{u'testing': 123}'", "age": 6, "queue": {"_metadata": null, "client": null, "_name": "workflow-output"}, "href": null, "ttl": 3600, "_id": "5890494f7dcad341ecfb72d1"}]}

You can see a new message is returned which shows the input we sent.


As mentioned above, the main limitation here is that you need to manually unpause the Workflow. It would be nice if there was a way for the Zaqar message to automatically do this.

Polling for messages in the Workflow would be quite easy, with a retry loop and Mistral's continue-on. However, that would be quite resource intensive. If you wanted to do this, a Workflow task like this would probably do the trick.

    wait_for_message:
      action: zaqar.queue_pop
      input:
        queue_name: <% $.input_queue %>
      timeout: 14400
      retry:
        delay: 15
        count: <% $.timeout / 15 %>
        continue-on: <% len(task(wait_for_message).result) > 0 %>

The other limitation is that this Workflow now requires a specific interaction pattern that isn't obvious and documenting it might be a little tricky. However, I think the flexible execution it provides might be worthwhile in some cases.

by Dougal Matthews at January 31, 2017 07:40 AM

January 26, 2017

Ben Nemec

Setting a Root Password on overcloud-full

By default the overcloud-full image built as part of a TripleO deployment does not have a root password set. Sometimes it can be useful to set one, particularly if you're having network setup trouble that prevents you from ssh'ing into the instance after it's deployed. One simple way to do this is as follows:

sudo yum install -y libguestfs-tools
virt-customize -a overcloud-full.qcow2 --root-password password:password
. stackrc
openstack overcloud image upload --update-existing

This will install the necessary tools, set the root password, then upload it to the undercloud Glance. There are no doubt other ways to handle setting a password, but this one is pretty simple and doesn't require rebuilding the image from scratch.

Note that to set a different password, you change the "password" after the :. So the virt-customize call would look like:

virt-customize -a overcloud-full.qcow2 --root-password password:some-other-password


by bnemec at January 26, 2017 06:27 PM

Carlos Camacho

OpenStack and services for BigData applications

Yesterday I had the opportunity of presenting together with Daniel Mellado a brief talk about OpenStack and its integration with services to support Big Data tools (OpenStack Sahara).

It was a combined talk for two Meetups MAD-for-OpenStack and Data-Science-Madrid.

The presentation is stored in GitHub.


  • We prepared a 1 hour presentation that had to be delivered in 20 minutes.
  • We weren’t able to get access to our demo server.

by Carlos Camacho at January 26, 2017 12:00 AM

January 25, 2017

Dan Prince

Docker Puppet

Today TripleO leverages Puppet to help configure and manage the deployment of OpenStack services. As we move towards using Docker one of the big questions people have is how will we generate config files for those containers. We'd like to continue to make use of our mature configuration interfaces (Heat parameters, Hieradata overrides, Puppet modules) to allow our operators to seamlessly take the step towards a fully containerized deployment.

With the recently added composable services we've got everything we need. This is how we do it...

Install puppet into our base container image

Turns out the first thing you need if you want to generate config files with Puppet is, well... puppet. TripleO uses containers from the Kolla project and by default they do not install Puppet. In the past TripleO used an 'agent container' to manage the puppet installation requirements. This worked okay for the compute role (a very minimal set of services) but doesn't work as nicely for the broader set of OpenStack services because packages need to be pre-installed into the 'agent' container in order for config file generation to work correctly (puppet overlays the default config files in many cases). Installing packages for all of OpenStack and its requirements into the agent container isn't ideal.

Enter TripleO composable services (thanks Newton!). TripleO now supports composability and Kolla typically has individual containers for each service so it turns out the best way to generate config files for a specific service is to use the container for the service itself. We do this in two separate runs of a container: one to create config files, and the second one to launch the service (bind mounting/copying in the configs). It works really well.

But we still have the issue of how do we get puppet into all of our Kolla containers. We were happy to discover that Kolla supports a template-overrides mechanism (a Jinja template) that allows you to customize how containers are built. This is how you can use that mechanism to add puppet into the CentOS base image used for all the OpenStack docker containers generated by Kolla build scripts.

$ cat template-overrides.j2
{% extends parent_template %}
{% set base_centos_binary_packages_append = ['puppet'] %}

$ kolla-build --base centos --template-override template-overrides.j2

Control the Puppet catalog

A puppet manifest in TripleO can do a lot of things like installing packages, configuring files, starting a service, etc. For containers we only want to generate the config files. Furthermore we'd like to do this without having to change our puppet modules.

One mechanism we use is the --tags option for 'puppet apply'. This option allows you to specify which resources within a given puppet manifest (or catalog) should be executed. It works really nicely to allow you to select what you want out of a puppet catalog.

An example of this is listed below where we have a manifest to create a '/tmp/foo' file. When we run the manifest with the 'package' tag (telling it to only install packages) it does nothing at all.

$ cat test.pp
file { '/tmp/foo':
  content => 'bar',
}
$ puppet apply --tags package test.pp
Notice: Compiled catalog for undercloud.localhost in environment production in 0.10 seconds
Notice: Applied catalog in 0.02 seconds
$ cat /tmp/foo
cat: /tmp/foo: No such file or directory

When --tags doesn't work

The --tags option of 'puppet apply' doesn't always give us the behavior we are after which is to generate only config files. Some puppet modules have custom resources with providers that can execute commands anyway. This might be a mysql query or an openstackclient command to create a keystone endpoint. Remember here that we are trying to re-use puppet modules from our baremetal configuration and these resources are expected to be in our manifests... we just don't want them to run at the time we are generating config files. So we need an alternative mechanism to suppress (noop out) these offending resources.

To do this we've started using a custom built noop_resource function that exists in puppet-tripleo. This function dynamically configures a default provider for the named resource. For mysql this ends up looking like this:

['Mysql_datadir', 'Mysql_user', 'Mysql_database', 'Mysql_grant', 'Mysql_plugin'].each |String $val| { noop_resource($val) }

Running a puppet manifest with this at the top will noop out any of the named resource types and they won't execute. Thus allowing puppet apply to complete and finish generating the config files within the specified manifest.

The good news is most of our services don't require the noop_resource in order to generate config files cleanly. But for those that do the interface allows us to effectively disable the resources we don't want to execute.

Putting it all together:

Everything comes together in tripleo-heat-templates in a single container configuration interface that lets us configurably generate per-service config files. It looks like this:

  • manifest: the puppet manifest to use to generate config files (Thanks to composable services this is now per service!)
  • puppet_tags: the puppet tags to execute within this manifest
  • config_image: the docker image to use to generate config files. Generally we use the same image as the service itself.
  • config_volume: where to output the resulting config tree (includes /etc/ and some other directories).

And then we've created a custom tool to drive this per-service configuration. The tool takes the information above, in a JSON file format, and drives generation of the config files in a single action.
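The post doesn't show the file format itself, but conceptually each per-service entry in that JSON file carries the four fields described above. A hypothetical entry might look like this (the service name, manifest line, and image tag are made up for illustration):

```json
[
  {
    "config_volume": "heat_api",
    "puppet_tags": "heat_config",
    "manifest": "include ::tripleo::profile::base::heat::api",
    "config_image": ""
  }
]
```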

It ends up working like this:

Video demo: Docker Puppet

And thats it. Our config interfaces are intact. We generate the config files we need. And we get to carry on with our efforts to deploy with containers.


by Dan Prince at January 25, 2017 02:00 PM

January 23, 2017

James Slagle

Update on TripleO with already provisioned servers

In a previous post, I talked about using TripleO with already deployed and provisioned servers. Since that was published, TripleO has made a lot of progress in this area. I figured it was about time for an update on where the project is with this feature.

Throughout the Ocata cycle, I’ve had the chance to help make this feature more
mature and easier to consume for production deployments.

Perhaps most importantly, for pulling their deployment metadata from Heat, the servers are now configured to use a Swift Temporary URL instead of having to rely on a Keystone username and password.

Also, instead of having to bootstrap the servers with all the expected packages and initial configuration that TripleO typically expects from instances that it has deployed from pre-built images, you can now start with a basic CentOS image installed with only the initial python-heat-agent packages and the agent configuration.
There have also been other bug fixes and enhancements to enable this to work with things such as network isolation and fixed predictable IPs for all networks.
I’ve started on some documentation that shows how to use this feature for TripleO deployments. The documentation is still in progress, but I invite people to give it a try and let me know how it works.

Using this feature, I’ve been able to deploy an Overcloud on 4 servers in a
remote lab from a virtualized Undercloud running in an entirely different lab.
There’s no L2 provisioning network connecting the 2 labs, and I don’t have
access to run a DHCP server on it anyway. The 4 Overcloud servers were
initially provisioned with the existing lab provisioning system.

This flexibility helps build upon the composable nature of the
tripleo-heat-templates framework that we’ve been developing in TripleO
in that it allows integration with already existing provisioning environments.

Additionally, we’ve been using this capability extensively in our
Continuous Integration tests. Since TripleO does not have to be responsible for
provisioning the initial operating system on instances, we’ve been able to make
use of virtual instances provided by the OpenStack Infra project and
their managed Nodepool instance.

Like all other OpenStack CI jobs running in the standard check and gate queues,
our jobs are spread across several redundant OpenStack clouds. That means we
have a lot more virtual compute capacity for running tests than we previously
had available.

We’ve further been able to define job definitions using 2, 3, and 4 nodes in the same test. These multinode tests, and the increased capacity, allow us to test different deployment scenarios such as customized composable roles, and recently, a job upgrading from the previous OpenStack release all the way to master.
We’ve also scaled out our testing using scenario tests. Scenario tests allow us
to run a test with a specific configuration based on which files are actually
modified by the patch being tested. This allows the project to make
sure that patches affecting a given service are actually tested, since a
scenario test will be triggered deploying that service. This is important to
scaling our CI testing, because it’s unrealistic to expect to be able to deploy
every possible OpenStack service and test that it can be initially deployed, is
functional, and can be upgraded on every single TripleO patch.

If this is something you try out and have any feedback, I’d love to hear it and
see how we could improve this feature and make it easier to use.

by slagle at January 23, 2017 02:05 PM

January 16, 2017

Carlos Camacho

TripleO deep dive session #7 (Undercloud - TripleO UI)

This is the seventh release of the TripleO “Deep Dive” sessions.

In this session Liz Blanchard and Ana Krivokapic will give us some bits about how to contribute to the TripleO UI project. After watching this session you will have a general overview of the project’s history, properties, architecture and contribution steps.

So please, check the full session content on the TripleO YouTube channel.

Here you will be able to see a quick overview about how to install the UI as a development environment.

The summarized steps are also available in this blog post.

Sessions index:

    * TripleO deep dive #1 (Quickstart deployment)

    * TripleO deep dive #2 (TripleO Heat Templates)

    * TripleO deep dive #3 (Overcloud deployment debugging)

    * TripleO deep dive #4 (Puppet modules)

    * TripleO deep dive #5 (Undercloud - Under the hood)

    * TripleO deep dive #6 (Overcloud - Physical network)

    * TripleO deep dive #7 (Undercloud - TripleO UI)

by Carlos Camacho at January 16, 2017 04:00 PM

January 12, 2017

Dougal Matthews

Calling Ansible from Mistral Workflows

I have spoken with a few people that were interested in calling Ansible from Mistral Workflows.

I finally got around to trying to make this happen. All that was needed was a very small and simple custom action that I put together, uploaded to github and also published to pypi.

Here is an example of a simple Mistral Workflow that makes use of these new actions.

---
version: 2.0

run_ansible_playbook:
  type: direct
  tasks:
    run_playbook:
      action: ansible-playbook
      input:
        playbook: path/to/playbook.yaml

Installing and getting started with this action is fairly simple. This is how I did it on my TripleO undercloud.

sudo pip install mistral-ansible-actions;
sudo mistral-db-manage populate;
sudo systemctl restart openstack-mistral*;

There is one gotcha that might be confusing: the Mistral Workflow runs as the mistral user, which means that user needs permission to access the Ansible playbook files.

After you have installed the custom actions, you can test them with the Mistral CLI. The first command should work without any extra setup; the second requires you to create a playbook somewhere and provide access.

mistral run-action ansible '{"hosts": "localhost", "module": "setup"}'
mistral run-action ansible-playbook '{"playbook": "path/to/playbook.yaml"}'

The action supports a few other input parameters, they are all listed for now in the README in the git repo. This is a very young project, but I am curious to know if people find it useful and what other features it would need.

If you want to write custom actions, check out the Mistral documentation.

by Dougal Matthews at January 12, 2017 02:20 PM

December 16, 2016

Giulio Fidente

TripleO to deploy Ceph standalone

Here is a nice Christmas present: you can use TripleO for a standalone Ceph deployment, with just a few lines of YAML. Assuming you have an undercloud ready for a new overcloud, create an environment file like the following:

resource_registry:
  OS::TripleO::Services::CephMon: /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-mon.yaml
  OS::TripleO::Services::CephOSD: /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-osd.yaml

parameter_defaults:
  ControllerServices:
    - OS::TripleO::Services::CephMon
  CephStorageServices:
    - OS::TripleO::Services::CephOSD

and launch a deployment with:

openstack overcloud deploy --compute-scale 0 --ceph-storage-scale 1 -e the_above_env_file.yaml

The two lines from the environment file in resource_registry are mapping (and enabling) the CephMon and CephOSD services in TripleO while the lines in parameters are defining which services should be deployed on the controller and cephstorage roles.

This will bring up a two-node overcloud with one node running ceph-mon and the other ceph-osd, but the actual Christmas gift is that it implicitly provides and allows usage of all the features we already know about in TripleO, like:

  • baremetal provisioning
  • network isolation
  • a web GUI
  • lifecycle management
  • ... containers
  • ... upgrades

For example, you can scale up the Ceph cluster with:

openstack overcloud deploy --compute-scale 0 --ceph-storage-scale 2 -e the_above_env_file.yaml

and this will provision a new Ironic node with the cephstorage role, configuring the required networks on it and updating the cluster config for the new OSDs. (Note the --ceph-storage-scale parameter going from 1 to 2 in the second example).

Even more interesting is that the above will work for any service, not just Ceph; new services can be added to TripleO with just some YAML and Puppet, letting TripleO take care of a number of common issues in any deployment tool, for example:

  • supports multinode deployments
  • synchronizes and orders the deployment steps across different nodes
  • supports propagation of config data across different services

Time to try it and join the fun in #tripleo :)

by Giulio Fidente at December 16, 2016 10:00 PM

December 15, 2016

Julie Pichon

A Quick Introduction to Mistral Usage in TripleO (Newton) | For developers

Since Newton, Mistral has become a central component to the TripleO project, handling many of the operations in the back-end. I recently gave a short crash course on Mistral, what it is and how we use it to a few people and thought it might be useful to share some of my bag of tricks here as well.

What is Mistral?

It's a workflow service. You describe what you want as a series of steps (tasks) in YAML, and it will coordinate things for you, usually asynchronously.

Link: Mistral overview.

We are using it for a few reasons:

  • it lets us manage long-running processes (e.g. introspection) and track their state
  • it acts as a common interface/API that is currently used by both the TripleO CLI and UI, thus avoiding duplication, and can also be consumed directly by external non-OpenStack consumers (e.g. ManageIQ).


A workbook contains multiple workflows. (The TripleO workbooks live at

A workflow contains a series of 'tasks' which can be thought of as steps. We use the default 'direct' type of workflow on TripleO, which means tasks are executed in the order written, moving around based on the on-success and on-error values.

Every task calls to an action (or to another workflow), which is where the work actually gets done.

OpenStack services are automatically mapped into actions thanks to the mappings defined in Mistral, so we get a ton of actions for free already.

Useful tip: with the following commands you can see locally which actions are available, for a given project.

$ mistral action-list | grep $projectname

You can of course create your own actions. Which we do. Quite a lot.

$ mistral action-list | grep tripleo

An execution is what an instance of a running workflow is called, once you have started one.

Link: Mistral terminology (very detailed, with diagrams and examples).

Where the TripleO Mistral workflows live

Let's look at a couple of examples.

A short one to start with, scaling down

It takes some input, starts with the 'delete_node' task and continues on to on-success or on-error depending on the action result.

Note: You can see we always end the workflow with send_message, which is a convention we use in the project. Even if an action failed and moves to on-error, the workflow itself should be successful (a failed workflow would indicate a problem at the Mistral level). We end with send_message because we want to let the caller know what was the result.
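That send_message convention can be sketched as a minimal direct workflow. This is only an illustrative shape, not the real tripleo-common workflow: the workflow name and the std.noop stand-ins are placeholders for the actual actions (such as tripleo.scale.delete_node and the Zaqar post).

```yaml
---
version: '2.0'

scale_down_sketch:
  type: direct
  input:
    - queue_name
  tasks:
    delete_node:
      action: std.noop          # stand-in for the real action doing the work
      on-success: send_message  # both branches converge on send_message...
      on-error: send_message    # ...so the workflow itself still ends successfully
    send_message:
      action: std.noop          # in TripleO this posts the result to the Zaqar queue
```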

How will the consumer get to that result? We associate every workflow with a Zaqar queue. This is a TripleO convention, not a Mistral requirement. Each of our workflows takes a queue_name as input, and the clients are expected to listen to the Zaqar socket for that queue in order to receive the messages.

Another point, about the action itself on line 20: tripleo.scale.delete_node is a TripleO-specific action, as indicated in the name. If you were interested in finding the code for it, you should look at the entry_points in setup.cfg for tripleo-common (where all the workflows live):
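The registration itself is just an entry point line. A hedged sketch of what such a line looks like in setup.cfg (the module path and class name here are illustrative, not necessarily the real ones in tripleo-common):

```ini
[entry_points]
mistral.actions =
    tripleo.scale.delete_node = tripleo_common.actions.scale:ScaleDownAction
```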

which would lead you to the code at:

A bit more complex: node configuration

It's "slightly more complex" in that it has a couple more tasks, and it also calls to another workflow (line 426). You can see it starts with a call to ironic.node_list in its first task at line 417, which comes for free with Mistral. No need to reimplement it.

Debugging notes on workflows and Zaqar

Each workflow creates a Zaqar queue, to send progress information back to the client (CLI, web UI, other...).

Sometimes these messages get lost and the process hangs. It doesn't mean the action didn't complete successfully.

  • Check the Zaqar processes are up and running: $ sudo systemctl | grep zaqar (this has happened to me after reboots)
  • Check Mistral for any errored workflow: $ mistral execution-list
  • Check the Mistral logs (executor.log and engine.log are usually where the interesting errors are)
  • Ocata has timeouts for some of the commands now, so this is getting better

Following a workflow through its execution via CLI

This particular example will run somewhat fast so it's more of a "tracing back what happened afterwards."

$ openstack overcloud plan create my-new-overcloud
Started Mistral Workflow. Execution ID: 05d550f2-5d13-4782-be7f-a775a1d86a84
Default plan created

The CLI nicely tells you which execution ID to look for, so let's use it:

$ mistral task-list 05d550f2-5d13-4782-be7f-a775a1d86a84

| ID                                   | Name                            | Workflow name                              | Execution ID                         | State   | State info                   |
| c6e0fef0-4e65-4ee6-9ae4-a6d9e8451fd0 | verify_container_doesnt_exist   | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | ERROR   | Failed to run action [act... |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 72c1310d-8379-4869-918e-62eb04530e46 | verify_environment_doesnt_exist | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | ERROR   | Failed to run action [act... |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 74438300-8b18-40fd-bf73-62a1d90f71b3 | create_container                | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 667c0e4b-6f6c-447d-9325-ab6c20c8ad98 | upload_to_container             | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| ef447ea6-86ec-4a62-bca2-a083c66f96d3 | create_plan                     | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| f37ebe9f-b39c-4f7a-9a60-eceb80782714 | ensure_passwords_exist          | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 193f65fb-502a-4e4c-9a2d-053966500990 | plan_process_templates          | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 400d7e11-aea8-45c7-96e8-c61523d66fe4 | plan_set_status_success         | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 9df60103-15e2-442e-8dc5-ff0d61dba449 | notify_zaqar                    | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |

This gives you an idea of what Mistral did to accomplish the goal. You can also map it back to the workflow defined in tripleo-common to follow through the steps and find out exactly what was run. If the workflow stopped too early, this can give you an idea of where the problem occurred.

Side-note about plans and the ERRORed tasks above

As of Newton, information about deployment is stored in a "Plan" which is implemented as a Swift container together with a Mistral environment. This could change in the future but for now that is what a plan is.

To create a new plan, we need to make sure there isn't already a container or an environment with that name. We could implement this in an action in Python, or since Mistral already has commands to get a container / get an environment we can be clever about this and reverse the on-error and on-success actions compared to usual:

If we do get a 'container' then it means it already exists and the plan already exists, so we cannot reuse this name. So 'on-success' becomes the error condition.
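A hedged sketch of that inversion (the task wiring is illustrative; swift.head_container is assumed here to be one of the auto-generated Swift actions):

```yaml
verify_container_doesnt_exist:
  action: swift.head_container container=<% $.container %>
  on-success: send_message      # the container was found, so the name is taken: error case
  on-error: create_container    # a lookup failure (ERROR task state) means we can proceed
```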

I sometimes regret a little us going this way because it leaves exception tracebacks in the logs, which is misleading when folks go to the Mistral logs for the first time in order to debug some other issue.

Finally I'd like to end all this by mentioning the Mistral Quick Start tutorial, which is excellent. It takes you from creating a very simple workflow to following its journey through the execution.

How to create your own action/workflow in TripleO

Mistral documentation:

In short:

  • Start writing your python code, probably under tripleo_common/actions
  • Add an entry point referencing it to setup.cfg
  • /!\ Restart Mistral /!\ Action code is only taken in once Mistral starts

This is summarised in the TripleO common README (personally I put this in a script to easily rerun it all).
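A self-contained sketch of the shape of a custom action. In real code you would subclass Mistral's actual action base class (mistral.actions.base.Action in the Newton era) instead of the stand-in defined below; the class and method here are illustrative only.

```python
class Action(object):
    """Stand-in for Mistral's action base class, so this sketch runs standalone."""
    def run(self):
        raise NotImplementedError


class GreetAction(Action):
    """A trivial custom action: Mistral instantiates it with the task inputs
    and calls run() to produce the action result."""

    def __init__(self, name):
        self.name = name

    def run(self):
        return "Hello, %s!" % self.name


print(GreetAction("TripleO").run())  # -> Hello, TripleO!
```

After adding such a class, the entry point in setup.cfg makes it visible to Mistral, and Mistral must be restarted as noted above.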

Back to deployments: what's in a plan

As mentioned earlier, a plan is the combination of a Swift container and a Mistral environment. In theory this is an implementation detail which shouldn't matter to deployers. In practice knowing this gives you access to a few more debugging tricks.

For example, the templates you initially provided will be accessible through Swift.

$ swift list $plan-name

Everything else will live in the Mistral environment. This contains:

  • The default passwords (which is a potential source of confusion)
  • The parameter_defaults aka overridden parameters (these take priority and would override the passwords above)
  • The list of enabled environments (this looks nicer for plans created from the UI, as they are all munged into one user-environment.yaml file when deploying from CLI - see bug 1640861)
$ mistral environment-get $plan-name

For example, with an SSL-deployment done from the UI:

$ mistral environment-get ssl-overcloud
| Field       | Value                                                                             |
| Name        | ssl-overcloud                                                                     |
| Description | <none>                                                                            |
| Variables   | {                                                                                 |
|             |     "passwords": {                                                                |
|             |         "KeystoneFernetKey1": "V3Dqp9MLP0mFvK0C7q3HlIsGBAI5VM1aW9JJ6c5lLjo=",     |
|             |         "KeystoneFernetKey0": "ll6gbwcbhyAi9jNvBnpWDImMmEAaW5dog5nRQvzvEz4=",     |
|             |         "HAProxyStatsPassword": "NXgvwfJ23VHJmwFf2HmKMrgcw",                      |
|             |         "HeatPassword": "Fs7K3CxR636BFhyDJWjsbAQZr",                              |
|             |         "ManilaPassword": "Kya6gr2zp2x8ApD6wtwUUMcBs",                            |
|             |         "NeutronPassword": "x2YK6xMaYUtgn8KxyFCQXfzR6",                           |
|             |         "SnmpdReadonlyUserPassword": "5a81d2d83ee4b69b33587249abf49cd672d08541",  |
|             |         "GlancePassword": "pBdfTUqv3yxpH3BcPjrJwb9d9",                            |
|             |         "AdminPassword": "KGGz6ApEDGdngj3KMpy7M2QGu",                             |
|             |         "IronicPassword": "347ezHCEqpqhmANK4fpWK2MvN",                            |
|             |         "HeatStackDomainAdminPassword": "kUk6VNxe4FG8ECBvMC6C4rAqc",              |
|             |         "ZaqarPassword": "6WVc8XWFjuKFMy2qP2qqqVk82",                             |
|             |         "MysqlClustercheckPassword": "M8V26MfpJc8FmpG88zu7p3bpw",                 |
|             |         "GnocchiPassword": "3H6pmazAQnnHj24QXADxPrguM",                           |
|             |         "CephAdminKey": "AQDloEFYAAAAABAAcCT546pzZnkfCJBSRz4C9w==",               |
|             |         "CeilometerPassword": "6DfAKDFdEFhxWtm63TcwsEW2D",                        |
|             |         "CinderPassword": "R8DvNyVKaqA44wRKUXEWfc4YH",                            |
|             |         "RabbitPassword": "9NeRMdCyQhekJAh9zdXtMhZW7",                            |
|             |         "CephRgwKey": "AQDloEFYAAAAABAACIfOTgp3dxt3Sqn5OPhU4Q==",                 |
|             |         "TrovePassword": "GbpxyPdnJkUCjXu4AsjmgqZVv",                             |
|             |         "KeystoneCredential0": "1BNiiNQjthjaIBnJd3EtoihXu25ZCzAYsKBpPQaV12M=",    |
|             |         "KeystoneCredential1": "pGZ4OlCzOzgaK2bEHaD1xKllRdbpDNowQJGzJHo6ETU=",    |
|             |         "CephClientKey": "AQDloEFYAAAAABAAoTR3S00DaBpfz4cyREe22w==",              |
|             |         "NovaPassword": "wD4PUT4Y4VcuZsMJTxYsBTpBX",                              |
|             |         "AdminToken": "hdF3kfs6ZaCYPUwrTzRWtwD3W",                                |
|             |         "RedisPassword": "2bxUvNZ3tsRfMyFmTj7PTUqQE",                             |
|             |         "MistralPassword": "mae3HcEQdQm6Myq3tZKxderTN",                           |
|             |         "SwiftHashSuffix": "JpWh8YsQcJvmuawmxph9PkUxr",                           |
|             |         "AodhPassword": "NFkBckXgdxfCMPxzeGDRFf7vW",                              |
|             |         "CephClusterFSID": "3120b7cc-b8ac-11e6-b775-fa163e0ee4f4",                |
|             |         "CephMonKey": "AQDloEFYAAAAABAABztgp5YwAxLQHkpKXnNDmw==",                 |
|             |         "SwiftPassword": "3bPB4yfZZRGCZqdwkTU9wHFym",                             |
|             |         "CeilometerMeteringSecret": "tjyywuf7xj7TM7W44mQprmaC9",                  |
|             |         "NeutronMetadataProxySharedSecret": "z7mb6UBEHNk8tJDEN96y6Acr3",          |
|             |         "BarbicanPassword": "6eQm4fwqVybCecPbxavE7bTDF",                          |
|             |         "SaharaPassword": "qx3saVNTmAJXwJwBH8n3w8M4p"                             |
|             |     },                                                                            |
|             |     "parameter_defaults": {                                                       |
|             |         "OvercloudControlFlavor": "control",                                      |
|             |         "ComputeCount": "2",                                                      |
|             |         "ControllerCount": "3",                                                   |
|             |         "OvercloudComputeFlavor": "compute",                                      |
|             |         "NtpServer": ""                                  |
|             |     },                                                                            |
|             |     "environments": [                                                             |
|             |         {                                                                         |
|             |             "path": "overcloud-resource-registry-puppet.yaml"                     |
|             |         },                                                                        |
|             |         {                                                                         |
|             |             "path": "environments/inject-trust-anchor.yaml"                       |
|             |         },                                                                        |
|             |         {                                                                         |
|             |             "path": "environments/tls-endpoints-public-ip.yaml"                   |
|             |         },                                                                        |
|             |         {                                                                         |
|             |             "path": "environments/enable-tls.yaml"                                |
|             |         }                                                                         |
|             |     ],                                                                            |
|             |     "template": "overcloud.yaml"                                                  |
|             | }                                                                                 |
| Scope       | private                                                                           |
| Created at  | 2016-12-02 16:27:11                                                               |
| Updated at  | 2016-12-06 21:25:35                                                               |

Note: 'environment' is an overloaded word in the TripleO world, be careful. Heat environment, Mistral environment, specific templates (e.g. TLS/SSL, Storage...), your whole setup, ...

Bonus track

There is documentation on going from zero (no plan, no nodes registered) till running a deployment, directly using Mistral:

Also, with the way we work with Mistral and Zaqar, you can switch between the UI and CLI, or even use Mistral directly, at any point in the process.


Thanks to Dougal for his feedback on the initial outline!

Tagged with: open-source, openstack, tripleo

by jpichon at December 15, 2016 11:09 AM

October 14, 2016

Juan Antonio Osorio

Changing the SSL cipher and rules for TripleO’s HAProxy

To change the SSL cipher and TLS rules for TripleO’s HAProxy, one needs to set the following attributes for the haproxy.pp manifest in puppet-tripleo:

  • ssl_cipher_suite: This will set a default cipher suite that HAProxy will use for all endpoints.
  • ssl_options: This will set the SSL options used for all bind lines in HAProxy.

We can set these two options via hieradata, and since we use the same manifest for the undercloud and the overcloud, this is relatively straightforward.

So, let’s say that we want to set our cipher suite to be the one recommended by Mozilla:


Please note that the cipher list is pretty long, and in the files it should all be on one line.

and we want to set the bind options to use the following TLS options:

no-sslv3 no-tls-tickets


To set these extra options in hiera, we need to use the hieradata_override option in undercloud.conf. This option lets us tell the undercloud which file to use for the extra hiera data we need to set. So, let’s say that the file in our case is called haproxy-hiera-overrides.yaml; in that case we would set the following in undercloud.conf:

hieradata_override = haproxy-hiera-overrides.yaml

the file haproxy-hiera-overrides.yaml would then contain:

tripleo::haproxy::ssl_options: no-sslv3 no-tls-tickets

Now, with this set, we can run openstack undercloud install and after some minutes of waiting we’re set!


For the overcloud, we need an environment file that passes the ExtraConfig parameter to the overcloud stack. That parameter contains the extra hiera data we need, so the environment file would look like the following:

parameter_defaults:
  ExtraConfig:
    tripleo::haproxy::ssl_options: no-sslv3 no-tls-tickets

So, assuming this file is called extraconfig.yaml, we could pass it to the overcloud deploy command as an environment (with the -e parameter).

October 14, 2016 11:31 AM

October 10, 2016

Steven Hardy

TripleO composable/custom roles

This is a follow-up to my previous post outlining the new composable services interfaces, which covered the basics of the new-for-Newton composable services model.

The final piece of the composability model we've been developing this cycle is the ability to deploy user-defined custom roles, in addition to (or even instead of) the built in TripleO roles (where a role is a group of servers, e.g "Controller", which runs some combination of services).

What follows is an overview of this new functionality, the primary interfaces, and some usage examples and a summary of future planned work.

Fully Composable/Custom Roles

As described in previous posts, TripleO has for a long time provided a fixed architecture with 5 roles (where "roles" means groups of nodes), e.g. Controller, Compute, BlockStorage, CephStorage and ObjectStorage.

This architecture has been sufficient to enable standardized deployments, but it's not very flexible.  With the addition of the composable-services model, moving services around between these roles becomes much easier, but many operators want to go further, and have full control of service placement on any arbitrary roles.

Now that the custom-roles feature has been implemented, this is possible, and operators can define arbitrary role types to enable fully composable deployments. When combined with composable services, this represents a huge step forward for TripleO flexibility! :)

Usage examples

To deploy with additional custom roles (or to remove/rename the default roles), a new option has been added to the python-tripleoclient “overcloud deploy” command, so you simply need to copy the default roles_data.yaml, modify it to suit your requirements (for example by moving services between roles, or adding a new role), then do a deployment referencing the modified roles_data.yaml file:

cp /usr/share/openstack-tripleo-heat-templates/roles_data.yaml my_roles_data.yaml
<modify my_roles_data.yaml>
openstack overcloud deploy --templates -r my_roles_data.yaml

Alternatively you can copy the entire tripleo-heat-templates tree (or use a git checkout):

cp -r /usr/share/openstack-tripleo-heat-templates my-tripleo-heat-templates
<modify my-tripleo-heat-templates/roles_data.yaml>
openstack overcloud deploy --templates my-tripleo-heat-templates

Both approaches are essentially equivalent, the -r option simply overwrites the default roles_data.yaml during creation of the plan data (stored in swift on the undercloud), but it's slightly more convenient if you want to use the default packaged tripleo-heat-templates instead of constantly rebasing a copied tree.

So, let’s say you wanted to deploy one additional node, only running the OS::TripleO::Services::Ntp composable service; you’d copy roles_data.yaml, and append a list entry like this:

- name: NtpRole
  CountDefault: 1
  ServicesDefault:
    - OS::TripleO::Services::Ntp

(Note that in practice you'll probably also want some of the common services deployed on all roles, such as OS::TripleO::Services::Kernel, OS::TripleO::Services::TripleoPackages, OS::TripleO::Services::TripleoFirewall and OS::TripleO::Services::VipHosts)
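Putting that together (using the ServicesDefault key from roles_data.yaml), the appended role entry with those common services included might look like:

```yaml
- name: NtpRole
  CountDefault: 1
  ServicesDefault:
    - OS::TripleO::Services::Kernel
    - OS::TripleO::Services::Ntp
    - OS::TripleO::Services::TripleoPackages
    - OS::TripleO::Services::TripleoFirewall
    - OS::TripleO::Services::VipHosts
```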


Nice, so how does it work?

The main change made to enable custom roles is a pre-deployment templating step which runs Jinja2. We define a roles_data.yaml file (which can be overridden by the user), which contains a list of role names, and optionally some additional data related to default parameter values (such as the default services deployed on the role, and the default count in the group).

The roles_data.yaml definitions look like this:

- name: Controller
  CountDefault: 1
  ServicesDefault:
    - OS::TripleO::Services::CACerts
    - OS::TripleO::Services::CephMon
    - OS::TripleO::Services::CinderApi
    - ...

The format is simply a yaml list of maps, with a mandatory “name” key in each map, and a number of optional FooDefault keys which set the parameter defaults for the role (as a convenience so the user won't have to specify it via an environment file during the overcloud deployment).

A custom mistral action is used to run Jinja2 when creating or updating a “deployment plan” (which is a combination of some heat templates stored in swift, and a mistral environment containing user parameters) – and this basically consumes the roles_data.yaml list of required roles, and outputs a rendered tree of Heat templates ready to deploy your overcloud.
Custom Roles, overview

There are two types of Jinja2 templates which are rendered differently, distinguished by the file extension/suffix:


foo.j2.yaml: This will pass in the contents of the roles_data.yaml list and iterate over each role in the list. The resulting file in the plan swift container will be named foo.yaml.
Here's an example of the syntax used for j2 templating inside these files:

  list_join:
    - ','
{% for role in roles %}
    - {get_attr: [{{role.name}}ServiceChain, role_data, service_names]}
{% endfor %}

This example is from overcloud.j2.yaml; it does a jinja2 loop appending the service_names of all roles' *ServiceChain resources (which are also dynamically generated via a similar loop), which is then processed on deployment via a heat list_join function.


foo.role.j2.yaml: This will generate a file per role, where only the name of the role is passed in during the templating step, with the resulting files being called rolename-foo.yaml. (Note that if you have a role which requires a special template, it is possible to disable this file generation by adding the path to the j2_excludes.yaml file.)

Here's an example of the syntax used in these files (taken from the role.role.j2.yaml file, which is our new definition of a server for a generic role):

type: OS::TripleO::Server
command: {get_param: ConfigCommand}
image: {get_param: {{role}}Image}

As you can see, this simply allows use of a {{role}} placeholder, which is then substituted with the role name when rendering each file (one file per role defined in the roles_data.yaml list).
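For example, rendering those lines for a role named Compute would substitute the placeholder to give:

```yaml
type: OS::TripleO::Server
command: {get_param: ConfigCommand}
image: {get_param: ComputeImage}
```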

Debugging/Development tips

When making changes to either the roles_data.yaml, and particularly when making changes to the *.j2.yaml files in tripleo-heat-templates, it's often helpful to view the rendered templates before any overcloud deployment is attempted.

This is possible via use of the “openstack overcloud plan create” interface (which doesn't yet support the -r option above, so you have to copy or git clone the tree), combined with swiftclient:

openstack overcloud plan create overcloud --templates my_tripleo_heat_templates
mkdir tmp_templates && pushd tmp_templates
swift download overcloud

This will download the full tree of rendered files from the swift container (named “overcloud” due to the name passed to plan create), so you can e.g view the rendered overcloud.yaml that's generated by combining the overcloud.j2.yaml template with the roles_data.yaml file.

If you make a mistake in your *.j2.yaml file, the jinja2 error should be returned via the plan create command, but it can also be useful to tail -f /var/log/mistral/mistral-server.log for additional information during development (this shows the output logged from running jinja2 via the custom mistral action plugin).

Limitations/future work

These new interfaces allow for much greater deployment flexibility and choice, but there are a few remaining issues which will be addressed in future development cycles:
  1. All services managed by pacemaker are still tied to the Controller role. Thanks to the implementation of a more lightweight HA architecture during the Newton cycle, the list of services managed by pacemaker is considerably reduced, but a number of services (primarily DB & RPC services) still are; until the composable-ha blueprint is completed (hopefully during Ocata), these services cannot be moved to a non-Controller role.
  2. Custom isolated networks cannot be defined. Since arbitrary roles types can now be defined, there may be a requirement to define arbitrary additional networks for network-isolation, but right now this is not possible.
  3. roles_data.yaml must be copied. As in the examples above, it's necessary to copy either roles_data.yaml, (or the entire tripleo-heat-templates tree), which means if the packaged roles_data.yaml changes (such as to add new services to the built-in roles), you must merge these changes in with your custom roles_data. In future we may add a convenience interface which makes it easier to e.g add a new role without having to care about the default role definitions.
  4. No model for dependencies between services.  Currently ensuring the right combination of services is deployed on specific roles is left to the operator, there's no validation of incompatible or inter-dependent services, but this may be addressed in a future release.

by Steve Hardy ( at October 10, 2016 09:04 AM

September 08, 2016

Emilien Macchi

Scaling-up TripleO CI coverage with scenarios

TripleO CI up to eleven!



When the OpenStack project started, it was “just” a set of services with the goal of spawning a VM. I remember you could run everything on your laptop and test things really quickly.
The project has now grown: thousands of features have been implemented, more backends / drivers are supported and new projects have joined the party.
This makes testing very challenging because not everything can be tested in a CI environment.

TripleO aims to be an OpenStack installer that takes care of services deployment. Our CI was only testing a set of services and a few plugins/drivers.
We had to find a way to test more services, more plugins and more drivers in an efficient way, and without wasting CI resources.

So we thought we could create scenarios with a limited set of services, each configured with a specific backend / plugin, and have one CI job deploy and test one scenario.
Example: scenario001 would be the Telemetry scenario, testing required services like Keystone, Nova, Glance and Neutron, but also Aodh, Ceilometer and Gnocchi.

Puppet OpenStack CI has been using this model for a while and it works pretty well. We’re going to reproduce it in TripleO CI for consistency.


How are scenarios run when patching TripleO?

We are using a feature in Zuul that allows us to select which scenario to test depending on the files touched by a commit.
For example, if I submit a patch to TripleO Heat Templates that modifies “puppet/services/ceilometer-api.yaml”, which is the composable service for Ceilometer-API, Zuul will trigger scenario001. See the Zuul layout:

- name: ^gate-tripleo-ci-centos-7-scenario001-multinode.*$
  files:
    - ^puppet/services/aodh.*$
    - ^manifests/profile/base/aodh.*$
    - ^puppet/services/ceilometer.*$
    - ^manifests/profile/base/ceilometer.*$
    - ^puppet/services/gnocchi.*$
    - ^manifests/profile/base/gnocchi.*$
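For illustration, the effect of such a file-based trigger can be modelled in a few lines of Python — this is only a sketch of the matching idea (job fires if any touched file matches any pattern), not Zuul's actual implementation:

```python
import re

# Patterns taken from the scenario001 layout above.
scenario001_files = [
    r"^puppet/services/aodh.*$",
    r"^manifests/profile/base/aodh.*$",
    r"^puppet/services/ceilometer.*$",
    r"^manifests/profile/base/ceilometer.*$",
    r"^puppet/services/gnocchi.*$",
    r"^manifests/profile/base/gnocchi.*$",
]

def job_triggered(changed_files, patterns=scenario001_files):
    # The job runs when any file touched by the commit matches any pattern.
    return any(re.match(p, f) for f in changed_files for p in patterns)

print(job_triggered(["puppet/services/ceilometer-api.yaml"]))  # True
print(job_triggered(["docs/README.rst"]))                      # False
```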


How can I bring my own service in a scenario?

The first step is to look at the Puppet CI matrix and see if we already test the service in a scenario. If so, please keep the scenario number consistent with the TripleO CI matrix. If not, you’ll need to pick a scenario, usually the least loaded one to avoid performance issues.
Now you need to patch openstack-infra/project-config and specify the files that are deploying your service.
For example, if your service is “Zaqar”, you’ll add something like:

- name: ^gate-tripleo-ci-centos-7-scenario002-multinode.*$
  files:
    - ^puppet/services/zaqar.*$
    - ^manifests/profile/base/zaqar.*$

Every time you send a patch to TripleO Heat Templates touching puppet/services/zaqar* files, or to puppet-tripleo touching manifests/profile/base/zaqar*, scenario002 will be triggered.

Finally, you need to send a patch to openstack-infra/tripleo-ci:

  • Modify to add the new service in the matrix.
  • Modify templates/scenario00X-multinode-pingtest.yaml and add a resource to test the service (in Zaqar, it could be a Zaqar Queue).
  • Modify test-environments/scenario00X-multinode.yaml and add the TripleO composable services and parameters to deploy the service.

Once you send the tripleo-ci patch, you can block it with a -1 workflow vote to avoid an accidental merge. Now go to openstack/tripleo-heat-templates and modify the zaqar composable service by adding a comment or something you actually want to test. In the commit message, add “Depends-On: XXX” where XXX is the Change-Id of the tripleo-ci patch. When you send the patch, you’ll see that Zuul triggers the appropriate scenario and your service is tested.



What’s next?

  • Extend testing beyond the pingtest. Some services, for example Ironic, can’t be tested with the pingtest. Running Tempest for a set of services might be something to investigate.
  • Zuul v3 is the big thing we’re all waiting for to extend the granularity of our matrix. A limitation of the current Zuul version (2.5) is that we can’t run scenarios in Puppet OpenStack module CI because there is no way to combine the file rules we saw before AND run the jobs for a specific project without file restrictions (e.g. puppet-zaqar for scenario002). In other words, our CI will be better with Zuul v3, and we’ll improve our testing coverage by running the right scenarios on the right projects.
  • Extend the number of nodes. We currently use multinode jobs which deploy an undercloud and a subnode for the overcloud (all-in-one). Some use-cases might require a third node (for example Ironic).

Any feedback on this blog post is highly welcome; please let me know if you want me to cover something in more detail.

by Emilien at September 08, 2016 10:52 PM

September 01, 2016

Steven Hardy

Complex data transformations with nested Heat intrinsic functions

Disclaimer: what follows is either pretty neat or pure evil, depending on your viewpoint ;)  But it's based on a real use-case and it works, so I'm posting this to document the approach and why it's needed, and hopefully to stimulate some discussion around optimizations leading to an improved/simplified implementation in the future.

The requirement

In TripleO we have a requirement to enable composition of different services onto different roles (groups of physical nodes); we need input data to configure the services which combines knowledge of the enabled services, which nodes/role they're running on, and which overlay network each service is bound to.

To do this, we need to input several pieces of data:

1. A list of the OpenStack services enabled for a particular deployment; expressed as a heat parameter, it looks something like this:

  EnabledServices:
    type: comma_delimited_list
    default:
      - heat_api
      - heat_engine
      - nova_api
      - neutron_api
      - glance_api
      - ceph_mon

2. A mapping of service names to one of several isolated overlay networks, such as "internal_api" "external" or "storage" etc:

  ServiceNetMap:
    type: json
    default:
      heat_api_network: internal_api
      nova_api_network: internal_api
      neutron_api_network: internal_api
      glance_api_network: storage
      ceph_mon_network: storage

3. A mapping of the network names to the actual IP address (either a single VIP pointing to a loadbalancer, or a list of the IPs bound to that network for all nodes running the service):

  NetIpMap:
    type: json

The implementation, step by step

Dynamically generate an initial mapping for all enabled services

Here we can use a nice pattern which combines the heat repeat function with map_merge:

    map_merge:
      repeat:
        template:
          SERVICE_ip: SERVICE_network
        for_each:
          SERVICE: {get_param: EnabledServices}

Step1: repeat dynamically generates lists (including lists of maps as in this case), so we use it to generate a list of maps for every service in the EnabledServices list with a placeholder for the network, e.g:

  - heat_api_ip: heat_api_network
  - heat_engine_ip: heat_engine_network
  - nova_api_ip: nova_api_network
  - neutron_api_ip: neutron_api_network
  - glance_api_ip: glance_api_network
  - ceph_mon_ip: ceph_mon_network

Step2: map_merge combines this list of single-key maps into one big map for all EnabledServices:

  heat_api_ip: heat_api_network
  heat_engine_ip: heat_engine_network 
  nova_api_ip: nova_api_network
  neutron_api_ip: neutron_api_network
  glance_api_ip: glance_api_network
  ceph_mon_ip: ceph_mon_network
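As a side note, the combined effect of Step1 and Step2 can be modelled in plain Python — a sketch of the semantics only, not Heat's implementation:

```python
# Python model of the Heat functions: "repeat" expands a template once per
# list item, substituting the placeholder; "map_merge" folds the resulting
# single-key maps into one map.
def repeat(template, for_each):
    placeholder, values = next(iter(for_each.items()))
    return [{k.replace(placeholder, v): val.replace(placeholder, v)
             for k, val in template.items()}
            for v in values]

def map_merge(maps):
    merged = {}
    for m in maps:
        merged.update(m)
    return merged

enabled_services = ["heat_api", "heat_engine", "nova_api"]
result = map_merge(repeat({"SERVICE_ip": "SERVICE_network"},
                          {"SERVICE": enabled_services}))
# result == {'heat_api_ip': 'heat_api_network',
#            'heat_engine_ip': 'heat_engine_network',
#            'nova_api_ip': 'nova_api_network'}
```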

Substitute placeholder for the actual network/IP

We approach this in two passes, with two nested map_replace calls (a new function I wrote for newton Heat which can do key/value substitutions on any mapping):

    map_replace:
      - map_replace:
          - heat_api_ip: heat_api_network
            heat_engine_ip: heat_engine_network
            nova_api_ip: nova_api_network
            neutron_api_ip: neutron_api_network
            glance_api_ip: glance_api_network
            ceph_mon_ip: ceph_mon_network
          - values: {get_param: ServiceNetMap}
      - values: {get_param: NetIpMap}

Step3: The inner map_replace substitutes each placeholder value with the actual network provided in the ServiceNetMap mapping, which gives e.g.:

  heat_api_ip: internal_api
  heat_engine_ip: heat_engine_network
  nova_api_ip: internal_api
  neutron_api_ip: internal_api
  glance_api_ip: storage
  ceph_mon_ip: storage

Note that if there's no network assigned in ServiceNetMap for the service, no replacement will occur, so the value will remain e.g. heat_engine_network; more on this later.

Step4: the outer map_replace substitutes the network name, e.g. internal_api, with the actual VIP for that network provided by the NetIpMap mapping, which gives the final mapping of:

  heat_engine_ip: heat_engine_network 
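To make the two passes concrete, here is a small Python model of map_replace's values-mode; the VIP address below is a made-up example value, not from the original post:

```python
# map_replace (values mode): each map value that appears as a key in the
# substitution map is replaced; anything else passes through untouched,
# which is exactly why heat_engine_network survives both passes.
def map_replace_values(mapping, values):
    return {k: values.get(v, v) for k, v in mapping.items()}

ip_map = {"heat_api_ip": "heat_api_network",
          "heat_engine_ip": "heat_engine_network"}
service_net_map = {"heat_api_network": "internal_api"}  # no heat_engine entry
net_ip_map = {"internal_api": ""}             # example VIP, made up

step3 = map_replace_values(ip_map, service_net_map)
# {'heat_api_ip': 'internal_api', 'heat_engine_ip': 'heat_engine_network'}
step4 = map_replace_values(step3, net_ip_map)
# {'heat_api_ip': '', 'heat_engine_ip': 'heat_engine_network'}
```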

Filter any values we don't want

As you can see we got a value we don't want - heat_engine is like many non-api services in that it's not bound to any network, it only talks to rabbitmq, so we don't have any entry in ServiceNetMap for it.

We can therefore remove any entries which remain in the mapping using the yaql heat function, which is an interface to run yaql queries inside a heat template. 

It has to be said yaql is very powerful, but the docs are pretty sparse (but improving), so I tend to read the unit tests instead of the docs for usage examples.

    yaql:
      expression: dict($.data.map.items().where(isString($[1]) and not $[1].endsWith("_network")))
      data:
        map:
          heat_engine_ip: heat_engine_network

Step5: via yaql, filter out all map entries where the value is a string ending with "_network", which gives:



So, that's it - we have now transformed two input maps and a list into a dynamically generated mapping based on the list items! :)
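The final filter step can likewise be modelled in Python; again, the IP is a made-up example value:

```python
# Python rendering of the Step5 filter: keep only entries whose value is
# not a string still ending in "_network" (i.e. drop services that never
# got a network assigned).
def drop_unassigned(mapping):
    return {k: v for k, v in mapping.items()
            if not (isinstance(v, str) and v.endswith("_network"))}

final = drop_unassigned({"heat_api_ip": "",
                         "heat_engine_ip": "heat_engine_network"})
# {'heat_api_ip': ''}
```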

Implementation, completed

Pulling all of the above together, here's a full example (you'll need a Newton Heat environment to run this); it combines all the steps described above into one big combination of nested intrinsic functions:
Edit - also available on github

heat_template_version: 2016-10-14

description: >
  Example of nested heat functions

parameters:
  NetIpMap:
    type: json

  EnabledServices:
    type: comma_delimited_list
    default:
      - heat_api
      - nova_api
      - neutron_api
      - glance_api
      - ceph_mon

  ServiceNetMap:
    type: json
    default:
      heat_api_network: internal_api
      nova_api_network: internal_api
      neutron_api_network: internal_api
      glance_api_network: storage
      ceph_mon_network: storage

outputs:
  service_ip_map:
    description: Mapping of service names to IP address for the assigned network
    value:
      yaql:
        expression: dict($.data.map.items().where(isString($[1]) and not $[1].endsWith("_network")))
        data:
          map:
            map_replace:
              - map_replace:
                  - map_merge:
                      repeat:
                        template:
                          SERVICE_ip: SERVICE_network
                        for_each:
                          SERVICE: {get_param: EnabledServices}
                  - values: {get_param: ServiceNetMap}
              - values: {get_param: NetIpMap}



by Steve Hardy ( at September 01, 2016 09:31 AM

August 26, 2016

Giulio Fidente

Ceph, TripleO and the Newton release

Time to roll up some notes on the status of Ceph in TripleO. Most of these functionalities were available in the Mitaka release too, but the examples use code from the Newton release, so they might not apply identically to Mitaka.

The TripleO default configuration

No default is going to fit everybody, but we want to know what the default is to improve from there. So let's try and see:

uc$ openstack overcloud deploy --templates tripleo-heat-templates -e tripleo-heat-templates/environments/puppet-pacemaker.yaml -e tripleo-heat-templates/environments/storage-environment.yaml --ceph-storage-scale 1
Deploying templates in the directory /home/stack/example/tripleo-heat-templates
Overcloud Deployed

Monitors go on the controller nodes, one per node; the above command deploys a single controller though. The first interesting thing to point out is:

oc$ ceph --version
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)

Jewel! Kudos to Emilien for bringing support for it to puppet-ceph. Continuing our investigation, we notice the OSDs go on the cephstorage nodes and are backed by the local filesystem, as we didn't tell TripleO to do otherwise:

oc$ ceph osd tree
-1 0.03999 root default
-2 0.03999     host overcloud-cephstorage-0
 0 0.03999         osd.0                         up  1.00000          1.00000

Notice we got SELinux covered:

oc$ ls -laZ /srv/data
drwxr-xr-x. ceph ceph system_u:object_r:ceph_var_lib_t:s0 .

And that we use CephX with autogenerated keys:

oc$ ceph auth list
installed auth entries:

        key: AQC2Pr9XAAAAABAAOpviw6DqOMG0syeEYmX2EQ==
        caps: [mds] allow *
        caps: [mon] allow *
        caps: [osd] allow *
        key: AQC2Pr9XAAAAABAAA78Svmmt+LVIcRrZRQLacw==
        caps: [mon] allow r
        caps: [osd] allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics

But which OpenStack services are using Ceph? The storage-environment.yaml file has some information:

uc$ grep -v '#' tripleo-heat-templates/environments/storage-environment.yaml | uniq

   OS::TripleO::Services::CephMon: ../puppet/services/ceph-mon.yaml
   OS::TripleO::Services::CephOSD: ../puppet/services/ceph-osd.yaml
   OS::TripleO::Services::CephClient: ../puppet/services/ceph-client.yaml

   CinderEnableIscsiBackend: false
   CinderEnableRbdBackend: true
   CinderBackupBackend: ceph
   NovaEnableRbdBackend: true
   GlanceBackend: rbd
   GnocchiBackend: rbd

The registry lines enable the Ceph services, while the parameters set Ceph as the backend for Cinder, Nova, Glance and Gnocchi. They can be configured to use other backends; see the comments in the environment file. Regarding the pools:

oc$ ceph osd lspools
0 rbd,1 metrics,2 images,3 backups,4 volumes,5 vms,

The replica size is set to 3 by default, but we only have a single OSD, so the cluster will never get into HEALTH_OK:

oc$ ceph osd pool get vms size
size: 3

Good to know, now a new deployment with more interesting stuff.

A more realistic scenario

What makes it "more realistic"? We'll have enough OSDs to cover the replica size. We'll use physical disks for our OSDs (and journals), not the local filesystem. We'll cope with a node with a different disk topology, and we'll decrease the replica size for one of the pools.

Set a default disks map for the OSD nodes

Define a default configuration for the storage nodes, telling TripleO to use sdb for the OSD data and sdc for the journal:

          journal: /dev/sdc
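A complete environment file along these lines might look like the following hypothetical sketch; the ExtraConfig parameter and the ceph::profile::params::osds hiera key are assumptions, inferred from the puppet-ceph ceph::profile::params namespace that appears later in this post for osd_journal_size:

```yaml
# Hypothetical sketch -- the ExtraConfig/hiera key names are assumptions
parameter_defaults:
  ExtraConfig:
    ceph::profile::params::osds:
      /dev/sdb:
        journal: /dev/sdc
```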

Customize the disks map for a specific node

For the node which has two (instead of a single) rotational disks, we'll need a specific map. First get its system-uuid from the Ironic introspection data:

uc$ openstack baremetal introspection data save | jq .extra.system.product.uuid

then create the node specific map:

resource_registry:
  OS::TripleO::CephStorageExtraConfigPre: tripleo-heat-templates/puppet/extraconfig/pre_deploy/per_node.yaml

parameter_defaults:
  NodeDataLookup: >
    {"/dev/sdb": {"journal": "/dev/sdd"},
     "/dev/sdc": {"journal": "/dev/sdd"}}

Fine tune pg_num, pgp_num and replica size for a pool

Finally, to override the replica size (and, why not, the PG numbers) of the "vms" pool (where the Nova ephemeral disks go by default):

        size: 2
        pg_num: 128
        pgp_num: 128
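Wrapped in an environment file, the override might look like this hypothetical sketch; the CephPools parameter name is an assumption, not confirmed by the post:

```yaml
# Hypothetical sketch -- the CephPools parameter name is an assumption
parameter_defaults:
  CephPools:
    vms:
      size: 2
      pg_num: 128
      pgp_num: 128
```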

Zap all disks for the new deployment

We also want to clear and prepare all the non-root disks with a GPT label, which will allow us, for example, to repeat the deployment multiple times reusing the same nodes. The implementation of the disk cleanup script can vary, but we can use a sample script and wire it to the overcloud nodes via NodeUserData:

uc$ curl -O

resource_registry:
  OS::TripleO::NodeUserData: ceph_wipe_disk.yaml

parameter_defaults:
  ceph_disks: "/dev/sdb /dev/sdc /dev/sdd"

All the above environment files could have been merged into a single one, but we split them out into multiple files for clarity. Now the new deploy command:

uc$ openstack overcloud deploy --templates tripleo-heat-templates -e tripleo-heat-templates/environments/puppet-pacemaker.yaml -e tripleo-heat-templates/environments/storage-environment.yaml --ceph-storage-scale 3 -e ceph_pools_config.yaml -e ceph_mynode_disks.yaml -e ceph_default_disks.yaml -e ceph_wipe_env.yaml
Deploying templates in the directory /home/stack/example/tripleo-heat-templates
Overcloud Deployed

Here is our OSD tree, with two instances running on the node with two rotational disks (sharing the same journal disk):

oc$ ceph osd tree
-1 0.03119 root default
-2 0.00780     host overcloud-cephstorage-1
 0 0.00780         osd.0                         up  1.00000          1.00000
-3 0.01559     host overcloud-cephstorage-2
 1 0.00780         osd.1                         up  1.00000          1.00000
 2 0.00780         osd.2                         up  1.00000          1.00000
-4 0.00780     host overcloud-cephstorage-0
 3 0.00780         osd.3                         up  1.00000          1.00000

and the custom PG/size values for the "vms" pool:

oc$ ceph osd pool get vms size
size: 2
oc$ ceph osd pool get vms pg_num
pg_num: 128

Another simple customization could have been to set the journal size. For example:

      ceph::profile::params::osd_journal_size: 1024

Also, we did not provide any customization for the crushmap, but a recent addition from Erno makes it possible to disable global/osd_crush_update_on_start so that any customization becomes possible after the deployment has finished.

Nor did we deploy the RadosGW service, as it is still a work in progress, expected for the Newton release. Submissions for its inclusion are on review.

We're also working on automating the upgrade from the Ceph/Hammer release deployed with TripleO/Mitaka to Ceph/Jewel, installed with TripleO/Newton. The process will be integrated with the OpenStack upgrade and, again, the submissions are on review in a series.

For more scenarios

The mechanism recently introduced in TripleO to make roles composable, discussed in Steven's blog post, makes it possible to test a complete Ceph deployment using a single controller node too (hosting the OSD service as well), just by adding OS::TripleO::Services::CephOSD to the list of services deployed on the controller role.

And if the above still isn't enough, TripleO continues to support configuration of OpenStack with a pre-existing, unmanaged Ceph cluster. To do so, we'll want to customize the parameters in puppet-ceph-external.yaml and pass that as an argument to the deploy command instead. For example:

    OS::TripleO::Services::CephExternal: tripleo-heat-templates/puppet/services/ceph-external.yaml

    # NOTE: These example parameters are required when using Ceph External and must be obtained from the running cluster
    #CephClusterFSID: '4b5c8c0a-ff60-454b-a1b4-9747aa737d19'
    #CephClientKey: 'AQDLOh1VgEp6FRAAFzT7Zw+Y9V6JJExQAsRnRQ=='
    #CephExternalMonHost: ','

    # the following parameters enable Ceph backends for Cinder, Glance, Gnocchi and Nova
    NovaEnableRbdBackend: true
    CinderEnableRbdBackend: true
    CinderBackupBackend: ceph
    GlanceBackend: rbd
    GnocchiBackend: rbd
    # If the Ceph pools which host VMs, Volumes and Images do not match these
    # names OR the client keyring to use is not named 'openstack',  edit the
    # following as needed.
    NovaRbdPoolName: vms
    CinderRbdPoolName: volumes
    GlanceRbdPoolName: images
    GnocchiRbdPoolName: metrics
    CephClientUserName: openstack
    # finally we disable the Cinder LVM backend
    CinderEnableIscsiBackend: false

Come help in #tripleo @ freenode and don't forget to check the docs! Some related topics are described there, for example how to set the root device via Ironic for nodes with multiple disks, or how to push additional arbitrary settings into ceph.conf.

by Giulio Fidente at August 26, 2016 03:00 AM

August 04, 2016

Dan Prince

TripleO: onward dark owl

Onward dark owl

I was on PTO last week and started hacking on the beginnings of what could be a new Undercloud installer that:

  • Uses a single-process Heat (heat-all)
  • Does not require MySQL or Rabbit
  • Uses noauth (no Keystone)
  • Drives the deployment locally via os-collect-config

The prototype ends up looking like this:

openstack undercloud deploy --templates=/root/tripleo-heat-templates

A short presentation of the reasons behind this and a demo of the prototype are available here:

Video demo: TripleO onward dark owl

An etherpad with links to the code/patches is here:


by Dan Prince at August 04, 2016 10:00 PM

June 16, 2016

Marios Andreou

Deploying a stable/mitaka OpenStack with tripleo-docs (and grep, git-blame and git-log).

Deploying a stable/mitaka OpenStack with tripleo-docs (and grep, git-blame and git-log).

This post is about how I was able to (mostly successfully) follow the tripleo-docs to deploy a stable/mitaka 3-control 1-compute development (virt) setup, so I can ultimately test upgrading it to Newton.

I wasn’t sure there was something worth writing here, but then the same tools I used to address the two issues I hit deploying mitaka kept coming up during the week when trying to upgrade that environment. I’ve had to use a lot of grep and git blame/log to get to the bottom of issues I’m seeing trying to upgrade the undercloud from stable/mitaka to latest/newton.

The Newton upgrade work is ongoing and possibly worthy of a future post.

I guess this post is mostly about git blame, and about munging the change-id into a URI to get from an error/issue you are seeing to the actual gerrit code review.

For the record I deployed stable/mitaka following the instructions at tripleo-docs and setting stable/mitaka repos in appropriate places. For example, during the virt-setup and the undercloud installation I followed the ‘Stable Branch’ admonition and enabled mitaka repos like:

sudo curl -o /etc/yum.repos.d/delorean-mitaka.repo
sudo curl -o /etc/yum.repos.d/delorean-deps-mitaka.repo

Then when building images I enabled the mitaka repo like:

export NODE_DIST=centos7
export DELOREAN_REPO_FILE="delorean.repo"

The two issues I hit:

The pebcak issue.

This issue is the pebcak issue because, whilst there is indeed a bona-fide bug that I hit here, I only hit it because of a nit in my deployment command.

My deployment command looked like this:

openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 \
  --libvirt-type qemu \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e network_env.yaml --ntp-server ""

Deploying like that ^^^ got me this:

The files ('overcloud-without-mergepy.yaml', 'overcloud.yaml') not found
in the /usr/share/openstack-tripleo-heat-templates/ directory

Err.. no I’m pretty sure those files are there (!)

# [stack@instack ~]$ ls -l /usr/share/openstack-tripleo-heat-templates/overcloud-without-mergepy.yaml
  lrwxrwxrwx. 1 root root 14 Jun 17 08:55 /usr/share/openstack-tripleo-heat-templates/overcloud-without-mergepy.yaml -> overcloud.yaml

So I know that message is very likely from the tripleoclient so I traced it. The code has actually already been fixed on master so grep gave me nothing there. However when I also tried against stable/mitaka:

[m@m python-tripleoclient]$ git checkout stable/mitaka
Switched to branch 'stable/mitaka'
[m@m python-tripleoclient]$ grep -rni "not found in the" ./*
./tripleoclient/v1/  message = "The files {0} not
found in the {1} directory".format(

So now we can use git blame to get to the code review that fixed it. Since we know the file that error message comes from, we can use git blame against the master branch. Since it is fixed on master, something must have fixed it:

[m@m python-tripleoclient]$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
[m@m python-tripleoclient]$ git blame tripleoclient/v1/

1077cf13 tripleoclient/v1/        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  382)     def _try_overcloud_deploy_with_compat_yaml(self, tht_root, stack,
1077cf13 tripleoclient/v1/        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  383)                                                stack_name, parameters,
1077cf13 tripleoclient/v1/        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  384)                                                environments, timeout):
7a05679e tripleoclient/v1/        (James Slagle               2016-04-01 08:57:41 -0400  385)         messages = ['The following errors occurred:']
1077cf13 tripleoclient/v1/        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  386)         for overcloud_yaml_name in constants.OVERCLOUD_YAML_NAMES:
1077cf13 tripleoclient/v1/        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  387)             overcloud_yaml = os.path.join(tht_root, overcloud_yaml_name)
1077cf13 tripleoclient/v1/        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  388)             try:
1077cf13 tripleoclient/v1/        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  389)                 self._heat_deploy(stack, stack_name, overcloud_yaml,
1077cf13 tripleoclient/v1/        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  390)                                   parameters, environments, timeout)
7a05679e tripleoclient/v1/        (James Slagle               2016-04-01 08:57:41 -0400  391)             except six.moves.urllib.error.URLError as e:
7a05679e tripleoclient/v1/        (James Slagle               2016-04-01 08:57:41 -0400  392)                 messages.append(str(e.reason))
1077cf13 tripleoclient/v1/        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  393)             else:
1077cf13 tripleoclient/v1/        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  394)                 return
7a05679e tripleoclient/v1/        (James Slagle               2016-04-01 08:57:41 -0400  395)         raise ValueError('\n'.join(messages))

So the git blame output may not display great above, but I see the following line as particularly interesting, since it is different from stable/mitaka:

7a05679e tripleoclient/v1/        (James Slagle               2016-04-01 08:57:41 -0400  392)                 messages.append(str(e.reason))

So now we can use git log to see the actual commit and check it is the one we are looking for:

[m@m python-tripleoclient]$ git log 7a05679e
commit 7a05679ebc944e3bec6f20c194c40fae1cf39d8d
Author: James Slagle <>
Date:   Fri Apr 1 08:57:41 2016 -0400

Show correct missing files when an error occurs

This function was swallowing all missing file exceptions, and then
printing a message saying overcloud.yaml or
overcloud-without-mergepy.yaml were not found.

The problem is that the URLError could occur for any missing file, such
as a missing environment file, typo in a relative patch or filename,
etc. And in those cases, the error message is actually quite misleading,
especially if the overcloud.yaml does exist at the exact shown path.

This change makes it such that the actual missing file paths are shown
in the output.

Closes-Bug: 1584792
Change-Id: Id9a70cb50d7dfa3dde72eefe0a5eaea7985236ff

Now that sounds promising! Not only do we have the actual bug number, but we have the Change-Id. We can use that to get to the gerrit code review:

[m@m ~]$ gimmeGerrit Id9a70cb50d7dfa3dde72eefe0a5eaea7985236ff

Where gimmeGerrit is a bash alias in my .profile:

gimme_gerrit() {
    gerrit_url=",$1,n,z"
    firefox $gerrit_url
}
alias gimmeGerrit=gimme_gerrit

So from the review to master I just made a cherry-pick to stable/mitaka.

Now, the reason I was seeing this issue in the first place was that my deploy command was indeed wrong (it's just that the error message was eaten by this particular bug). I was using ‘network_env.yaml’ but I had actually created network-env.yaml. Yes, much palmface, but if I hadn’t, I wouldn’t have backported the fix, so meh.

The overcloud needs moar memory bug.

It is more or less well known in the tripleo community that 4GB overcloud nodes will no longer cut it even in a virt environment, which is why we default to 5GB on current master instack-undercloud.

I was seeing OOM issues on the overcloud nodes with current stable/mitaka like:

16021:Jun 14 10:53:07 overcloud-controller-0 os-collect-config[2330]: u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mError: /Stage[main]/Main/Pacemaker::Constraint::Base[storage_mgmt_vip-then-haproxy]/Exec[Creating order constraint storage_mgmt_vip-then-haproxy]: Could not evaluate: Cannot allocate memory - fork(2)\u001b[0m\n\u001b[1;31mError: /Stage[main]/Main/Pacemaker::Resource::Service[openstack-nova-novncproxy]/Pacemaker::Resource::Systemd[openstack-nova-novncproxy]/Pcmk_resource[openstack-nova-novncproxy]: Could not evaluate: Cannot allocate memory - /usr/sbin/pcs resource show openstack-nova-novncproxy > /dev/null 2>&1 2>&1\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Base[nova-vncproxy-then-nova-api-constraint]/Exec[Creating order constraint nova-vncproxy-then-nova-api-constraint]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Colocation[nova-api-with-nova-vncproxy-colocation]/Pcmk_constraint[colo-openstack-nova-api-clone-openstack-nova-novncproxy-clone]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Base[nova-consoleauth-then-nova-vncproxy-constraint]/Exec[Creating order constraint nova-consoleauth-then-nova-vncproxy-constraint]: Skipping because of failed 
dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Colocation[nova-vncproxy-with-nova-consoleauth-colocation]/Pcmk_constraint[

16313:Jun 14 10:53:07 overcloud-controller-0 os-collect-config[2330]:
Error: /Stage[main]/Sahara::Service::Api/Service[sahara-api]: Could not
evaluate: Cannot allocate memory - fork(2)
16314:Jun 14 10:53:07 overcloud-controller-0 os-collect-config[2330]:
Error: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/Exec[concat_/etc/haproxy/haproxy.cfg]:
Could not evaluate: Cannot allocate memory - fork(2)

Suspecting from previous experience that this would be defaulted in instack-undercloud:

[m@m instack-undercloud]$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
[m@m instack-undercloud]$ grep -rni 'NODE_MEM' ./*
./scripts/instack-virt-setup:89:export NODE_MEM=${NODE_MEM:-5120}

[m@m instack-undercloud]$ git blame scripts/instack-virt-setup | grep  NODE_MEM
2dec7d75 (Carlos Camacho  2016-03-30 09:17:44 +0000  89) export NODE_MEM=${NODE_MEM:-5120}

So using git log to see more about 2dec7d75:

[m@m instack-undercloud]$ git log 2dec7d75
commit 2dec7d7521799c0323d076cd66ba71ebb444c706
Author: Carlos Camacho <>
Date:   Wed Mar 30 09:17:44 2016 +0000

    Overcloud is not able to deploy with the default 4GB of RAM using instack-undercloud

    When deploying the overcloud with the default value of 4GB of RAM the overcloud fails throwing "Cannot allocate memory" errors.
    By increasing the default memory to 5GB the error is solved in instack-undercloud

    Change-Id: I29036edeebefc1959643a04c5396e72863fdca5f
    Closes-Bug: #1563750

So as in the case of the pebcak issue, gimmeGerrit yields the review so I then just cherrypicked that to stable/mitaka too.

June 16, 2016 03:00 PM

June 02, 2016

Marios Andreou

Monitoring a tripleo Overcloud upgrade

The tripleo overcloud upgrades workflow (WIP Docs) has been well tested for upgrades to stable/liberty. There is ongoing work to adapt this workflow for upgrades to stable/mitaka/newton (current master), as well as to change the process altogether and make it more composable.

This post is a description of the kinds of things I look for when monitoring a stable/liberty upgrade - verification points after a given step and some explanation at various points that may or may not be helpful. I recently had to share a lot of this information as part of a customer POC upgrade and thought it would be useful to have written down somewhere.

For reference, the overcloud being upgraded in the examples below was deployed like:

openstack overcloud deploy --templates /home/stack/tripleo-heat-templates \
  -e /home/stack/tripleo-heat-templates/overcloud-resource-registry-puppet.yaml \
  -e /home/stack/tripleo-heat-templates/environments/puppet-pacemaker.yaml \
  --control-scale 3 --compute-scale 1 --libvirt-type qemu \
  -e /home/stack/tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e network_env.yaml --ntp-server ''

Upgrade your undercloud.

The first thing to check and very likely have to re-instate is any post-create customizations you had to make to your undercloud, such as creation of a new ovs interface for talking to your overcloud nodes, or any custom IP routes. The undercloud upgrade will revert those and you’ll have to re-add/create them.

The upgrade to liberty delivers a new script for the undercloud, so you can check this:

[stack@instack ~]$ which

Other than that I always just sanity check that services are running OK post upgrade:

[stack@instack ~]$ openstack-service status
MainPID=2107 Id=neutron-dhcp-agent.service ActiveState=active
MainPID=2106 Id=neutron-openvswitch-agent.service ActiveState=active
MainPID=1191 Id=neutron-server.service ActiveState=active
MainPID=1232 Id=openstack-glance-api.service ActiveState=active
MainPID=1172 Id=openstack-glance-registry.service ActiveState=active
MainPID=1201 Id=openstack-heat-api-cfn.service ActiveState=active

Execute the upgrade initialization step

This is called the initialization step since it sets up the repos on the overcloud nodes (for the upgrade we are going to) and delivers the upgrade script to the non-controller nodes. This step is instigated through the inclusion of the major-upgrade-pacemaker-init.yaml in the deployment command. For example:

openstack overcloud deploy --templates /home/stack/tripleo-heat-templates \
  -e /home/stack/tripleo-heat-templates/overcloud-resource-registry-puppet.yaml \
  -e /home/stack/tripleo-heat-templates/environments/puppet-pacemaker.yaml \
  --control-scale 3 --compute-scale 1 --libvirt-type qemu \
  -e /home/stack/tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e /home/stack/tripleo-heat-templates/environments/major-upgrade-pacemaker-init.yaml \
  -e network_env.yaml --ntp-server ''

Once the heat stack has gone to UPDATE_COMPLETE you can check all non-controller nodes for the presence of the newly delivered upgrade script:

[root@overcloud-novacompute-0 ~]# ls -l /root
-rwxr-xr-x. 1 root root 348 Jun  3 11:26

One point to note is that the rpc version which we will use for pinning nova rpc during the upgrade is set in the compute upgrade script:

[root@overcloud-novacompute-0 ~]# cat
### This file is automatically delivered to the compute nodes as part of the
### tripleo upgrades workflow

# pin nova to kilo (messaging +-1) for the nova-compute service

crudini  --set /etc/nova/nova.conf upgrade_levels compute mitaka

yum -y install python-zaqarclient  # needed for os-collect-config
yum -y update

The line with upgrade_levels compute above is actually written using a parameter we passed in via the major-upgrade-pacemaker-init.yaml environment file.

You should also see the updated /etc/yum.repos.d/* on all overcloud nodes after this step so you can confirm that is all in order for the upgrade to proceed.

Upgrade controller nodes (and your entire pacemaker cluster)

(I skipped upgrading the swift nodes, as there isn’t much interesting to say about them; see the WIP Docs for more, or ping me.)

This step will upgrade your controller nodes and during this process the entire cluster will be taken offline - this is normal. This step is instigated by including the major-upgrade-pacemaker.yaml environment file. For example:

openstack overcloud deploy --templates /home/stack/tripleo-heat-templates \
  -e /home/stack/tripleo-heat-templates/overcloud-resource-registry-puppet.yaml \
  -e /home/stack/tripleo-heat-templates/environments/puppet-pacemaker.yaml \
  --control-scale 3 --compute-scale 1 --libvirt-type qemu \
  -e /home/stack/tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e /home/stack/tripleo-heat-templates/environments/major-upgrade-pacemaker.yaml \
  -e network_env.yaml --ntp-server ''

I typically observe the pacemaker cluster during the upgrade process. For example, on controller-1 I have watch -d pcs status and on controller-2 I have watch -d pcs status | grep -ni stop -C 2. During the upgrade the pacemaker cluster goes down completely at some point, before yum packages are updated, and then the cluster is brought back up.

Once you start to see pacemaker services go down it means that the code in is running and eventually the cluster is stopped completely.

Every 2.0s: pcs status | grep -ni stop -C2 -B1                                                               Fri Jun  3 11:52:07 2016

Error: cluster is not currently running on this node

At this point you can start to monitor /var/log/yum.log to see packages being upgraded.

[root@overcloud-controller-0 ~]# tail -f /var/log/yum.log
Jun 03 11:51:52 Updated: erlang-otp_mibs-18.3.3-1.el7.x86_64
Jun 03 11:51:52 Installed: python2-rjsmin-1.0.12-2.el7.x86_64
Jun 03 11:51:52 Updated: python-django-compressor-2.0-1.el7.noarch
Jun 03 11:51:53 Updated: ntp-4.2.6p5-22.el7.centos.2.x86_64
Jun 03 11:51:53 Updated: rabbitmq-server-3.6.2-3.el7.noarch

Once the cluster starts to come back online and services start then you know that is being executed.

After the stack is UPDATE_COMPLETE, you can check the rpc pin is set on nova.conf on all controllers:

[root@overcloud-controller-0 ~]# grep -rni upgrade -A 1 /etc/nova/*
/etc/nova/nova.conf-107-compute = mitaka

Upgrade compute and ceph nodes

This uses a script on the undercloud to execute the delivered upgrade script on each non-controller node, for example:

[stack@instack ~]$ --upgrade overcloud-novacompute-0

On both node types you can check that the yum update has been executed successfully. Note that the script is customized for each node type, so they will be different between computes and ceph nodes for example. However in all cases there will at some point be a yum -y update. See the and for more info on how else they might differ.

For compute nodes you can check that upgrade_levels is set for the nova rpc pinning in /etc/nova/nova.conf (which in the case of computes is used by nova-compute itself; the api/scheduler/conductor services live on the controllers).

[root@overcloud-novacompute-0 ~]# grep -rni upgrade -A 1 /etc/nova/*
/etc/nova/nova.conf-107-compute = mitaka

Upgrade converge - apply config deployment wide and restart things.

The last step in the upgrade workflow is where we re-apply the deployment-wide config as specified by the tripleo-heat-templates used in the deploy/upgrade commands. It is instigated by including the major-upgrade-pacemaker-converge.yaml environment file, for example:

openstack overcloud deploy --templates /home/stack/tripleo-heat-templates \
  -e /home/stack/tripleo-heat-templates/overcloud-resource-registry-puppet.yaml \
  -e /home/stack/tripleo-heat-templates/environments/puppet-pacemaker.yaml \
  --control-scale 3 --compute-scale 1 --libvirt-type qemu \
  -e /home/stack/tripleo-heat-templates/environments/network-isolation.yaml \
  -e /home/stack/tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e /home/stack/tripleo-heat-templates/environments/major-upgrade-pacemaker-converge.yaml \
  -e network_env.yaml --ntp-server ''

For both major-upgrade-pacemaker-init.yaml (upgrade initialisation) as well as major-upgrade-pacemaker.yaml (controller upgrade) we specify for the resource registry:

OS::TripleO::ControllerPostDeployment: OS::Heat::None
OS::TripleO::ComputePostDeployment: OS::Heat::None
OS::TripleO::ObjectStoragePostDeployment: OS::Heat::None
OS::TripleO::BlockStoragePostDeployment: OS::Heat::None
OS::TripleO::CephStoragePostDeployment: OS::Heat::None

which means that things like the controller-config-pacemaker.yaml do not happen for controllers during those steps. That is, application of the overcloud_**.pp manifests does not happen during upgrade initialisation or controller upgrade.

However, for converge we simply do not override this in the major-upgrade-pacemaker-converge.yaml environment file, so the normal puppet manifests get applied for each node, delivering any config changes (e.g. updates to liberty had to deal with a rabbitmq password change causing issues such as this).

Since we are applying new config we need to make sure everything is restarted properly to pick this up so we use the after the normal puppet manifests are applied.

So during this step the pacemaker cluster will first go into an “unmanaged” state; this is to be expected and not a cause for alarm. As a matter of practice, before applying the controller puppet manifest, we set the cluster to maintenance mode (as we are going to write the pacemaker resource definitions/constraints to the cib) like this, which uses the script here.

After the manifest is applied we unset maintenance mode here.

You should then see services restarting as is being executed. Seeing all the services running again at this point is a good indication that the converge step is coming to an end successfully.

June 02, 2016 03:00 PM

April 18, 2016

James Slagle

TripleO with already deployed servers

Recently I’ve been prototyping how to use TripleO with already deployed
and provisioned servers. In such a scenario, Nova and Ironic would not be used
to do the initial operating system provisioning of the Overcloud nodes.
Instead, the nodes would already be powered on, running an OS, and ready to
start to configure OpenStack.

There are a couple of reasons why I find this worth prototyping. It would allow
users to make use of other provisioning systems and technologies, such as
Foreman, Cobbler, kickstart, etc. It also allows users or developers
to use other virtual infrastructure for testing, as it would be possible to
deploy to any virt instances where you may not be able to pxe provision.

It’s worth mentioning how this concept relates to other ongoing work in TripleO
such as OpenStack Virtual Baremetal (OVB) and split-stack. OVB is an effort to
use OpenStack itself to create virt instances as needed for TripleO testing.
The prototype I’ve explored could use OpenStack itself (as I’ll show), but it
doesn’t have to, as it can make use of any running server, including actual
baremetal. OVB also still exercises Nova and Ironic to do the provisioning,
whereas the deployed server idea does not.

Split-stack is a concept of splitting the single overcloud stack in TripleO
into 2 or more stacks. The stacks would be split along primary
responsibilities, such as infrastructure provisioning, network configuration,
bootstrap configuration, and OpenStack configuration. Not all the stacks would
be required, so split-stack would also allow for using already deployed servers
that were provisioned with other tools. Split-stack is an architecture change
for TripleO, and is probably a little ways down the roadmap.

Instead, I wanted to prototype a solution that would fit relatively easily into
the existing architecture. To do so, it makes the OS::Nova::Server resource
pluggable in tripleo-heat-templates, via an OS::TripleO::Server resource. By
default, OS::TripleO::Server is just mapped back to OS::Nova::Server.

To use already deployed servers, I use Heat’s resource-registry to
alternatively map OS::TripleO::Server to a new nested stack called
deployed-server.yaml. This nested stack has no OS::Nova::Server resources, so
no nova servers will be created.
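The remapping just described amounts to a one-line resource_registry override in a Heat environment file; a minimal sketch (the relative template path is assumed for illustration; the deployed-server-environment.yaml used in the deploy command below carries the real mapping):

```yaml
# Hedged sketch: point OS::TripleO::Server at the nested stack instead of
# letting it default to OS::Nova::Server (path assumed for illustration).
resource_registry:
  OS::TripleO::Server: deployed-server/deployed-server.yaml
```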

It needs to have the same interface (properties/outputs) as OS::Nova::Server so
that it’s a pluggable replacement in the templates. To do so it applies some
SoftwareDeployments to the deployed servers to query for their hostnames and IP
addresses to set as outputs on the stack as those values are needed elsewhere
in the templates.

In essence, how it works is that the SoftwareDeployments used to apply the
network configuration and puppet manifests to the overcloud nodes will be
associated with this nested stack instead of an instance of OS::Nova::Server.

The deployed servers will be configured out of band to query for available
SoftwareDeployments for their associated nested stack, and they’ll then run the
necessary hooks (puppet/script/os-apply-config) to apply the configuration to
create an overcloud.

There are a few other patches needed to enable this all to work, I won’t detail
all those here, but I used a single topic branch called “deployed-server” in
gerrit so they’re all grouped together.

Configuring the networking on the servers can be a bit of a challenge
depending on the infrastructure in use. For instance, if you can’t route
traffic for a private subnet due to the firewall configuration that is outside
your control, it makes things a bit more difficult. In those cases, tunnels or
vpn’s could be used. I plan to detail some of the networking configurations in
a later post.

To test it out initially though, I decided to use the Rackspace public cloud
where I could create a private network with its own dedicated subnet that I
controlled. I hadn’t actually directly used the Rackspace public cloud in a few months; overall I was really pleased with the web Control Panel and the performance of the

I created a new network and called it “ctlplane”, and gave it the default subnet that TripleO uses for deployment:


I then created 3 servers in the cloud, and made sure to attach each one to the
ctlplane network that I had created. I used the “7.5 GB Compute v1” flavor,
which has 4 vcpus and 7.5 GB ram. Of the 3 servers, one would be the
undercloud, and the other 2 would be for the overcloud nodes.


On the undercloud server, I just installed a normal TripleO undercloud using
the standard process. For the local_interface configuration setting, I
specified eth2, since that was the interface connected to the ctlplane network
I had created in the cloud.

For the 2 deployed servers, I launched a vanilla Centos 7 image offered by
Rackspace in their cloud. Once the servers were up, I used a script to add the
needed packages and initial configuration to the servers. The goal of the
script is just to make the instance look the same as the initial overcloud-full
image — nothing more than that. The bulk of that work just makes use of
instack to apply the same elements that are used in the diskimage-builder build
of overcloud-full.

At this point, I’m ready to start the overcloud deployment.

Here’s what my deployment command looks like:

openstack overcloud deploy \
 --control-scale 1 \
 --compute-scale 1 \
 --overcloud-ssh-user root \
 --ntp-server \
 --templates /home/stack/deployed-server/tripleo-heat-templates \
 -e /home/stack/deployed-server/tripleo-heat-templates/environments/puppet-pacemaker.yaml \
 -e /home/stack/deployed-server/tripleo-heat-templates/deployed-server/deployed-server-environment.yaml \
 -e /home/stack/deployed-server/deployed-server-hosts.yaml

And the contents of deployed-server-hosts.yaml:

resource_registry:
  OS::TripleO::Controller::Net::SoftwareConfig: /home/stack/deployed-server/tripleo-heat-templates/net-config-static-bridge.yaml
  OS::TripleO::Compute::Net::SoftwareConfig: /home/stack/deployed-server/tripleo-heat-templates/net-config-static.yaml

parameter_defaults:
  NeutronPublicInterface: nic3
  HypervisorNeutronPublicInterface: nic3
  ControlPlaneDefaultRoute: ""
  ControlPlaneSubnetCidr: "24"
  EC2MetadataIp: ""

One aspect here is that we still need to configure os-collect-config on each of
the overcloud nodes, but we can’t do that until we know the unique nested stack IDs
to query for SoftwareDeployment data. So, once those stacks are created, we can
look up their UUIDs and go configure os-collect-config on each already
deployed server. I wrote another script to do that automatically.

Once that is done, the servers start pulling configuration from Heat, and the
overcloud stack should run to CREATE_COMPLETE.

Now, it of course took me a few iterations to get this working, but once it did
and the overcloud finishes, here is what you’re left with:

# On the undercloud
[stack@undercloud ~]$ source stackrc

[stack@undercloud ~]$ nova list
| ID | Name | Status | Task State | Power State | Networks |

[stack@undercloud ~]$ ironic node-list
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |

[stack@undercloud ~]$ openstack stack list
| ID                                   | Stack Name | Stack Status    | Creation Time       | Updated Time |
| 798e62b5-0e59-4cce-b3c7-b3f4c5ee7862 | overcloud  | CREATE_COMPLETE | 2016-04-17T13:46:06 | None         |

No nova servers deployed, but we have a CREATE_COMPLETE stack :).

Here’s how the deployed-server nested stack looks as a Heat resource:

[stack@undercloud ~]$ openstack stack resource list overcloud
| resource_name                             | physical_resource_id                          | resource_type                                     | resource_status | updated_time        |
| Controller                                | 1ae04997-b1ec-4ebe-bef7-5831b9169638          | OS::Heat::ResourceGroup                           | CREATE_COMPLETE | 2016-04-17T13:46:07 |
| Compute                                   | 37a935df-2de0-4b88-8731-2f0b2039098c          | OS::Heat::ResourceGroup                           | CREATE_COMPLETE | 2016-04-17T13:46:07 |

[stack@undercloud ~]$ openstack stack resource list 1ae04997-b1ec-4ebe-bef7-5831b9169638
| resource_name | physical_resource_id                 | resource_type           | resource_status | updated_time        |
| 0             | 0b52fbc9-154e-44fe-90c7-df9972de828b | OS::TripleO::Controller | CREATE_COMPLETE | 2016-04-17T13:46:24 |

[stack@undercloud ~]$ openstack stack resource list 0b52fbc9-154e-44fe-90c7-df9972de828b
| resource_name            | physical_resource_id                 | resource_type                                   | resource_status | updated_time        |
| Controller               | 85a628ea-d606-482e-85c2-36fdde9028a6 | OS::TripleO::Server                             | CREATE_COMPLETE | 2016-04-17T13:46:25 |

[stack@undercloud ~]$ openstack stack resource list 85a628ea-d606-482e-85c2-36fdde9028a6
| resource_name        | physical_resource_id                 | resource_type                     | resource_status | updated_time        |
| deployed-server      | 6f33d5f9-ebed-4660-9962-21c98892b92e | OS::TripleO::DeployedServerConfig | CREATE_COMPLETE | 2016-04-17T13:46:29 |
| HostsEntryDeployment | 96fc194a-2278-4e2d-aa5d-bb8f548ab1c9 | OS::Heat::SoftwareDeployment      | CREATE_COMPLETE | 2016-04-17T13:46:29 |
| InstanceIdDeployment | 8b79d743-163d-4556-a905-b999c8899411 | OS::Heat::StructuredDeployment    | CREATE_COMPLETE | 2016-04-17T13:46:29 |
| InstanceIdConfig     | 31acdf25-96eb-4bf6-8c15-a3d4bd61ac9c | OS::Heat::StructuredConfig        | CREATE_COMPLETE | 2016-04-17T13:46:29 |
| ControlPlanePort     | b7535b97-a0c5-48b0-b6ad-d0a01bf833a0 | OS::Neutron::Port                 | CREATE_COMPLETE | 2016-04-17T13:46:29 |
| HostsEntryConfig     | 7390c1e3-1d42-4d13-8c5a-34738c5e7c17 | OS::Heat::SoftwareConfig          | CREATE_COMPLETE | 2016-04-17T13:46:29 |

[stack@undercloud ~]$ openstack stack resource list 6f33d5f9-ebed-4660-9962-21c98892b92e
| resource_name          | physical_resource_id                 | resource_type            | resource_status | updated_time        |
| deployed-server-config | afa2a513-75db-4507-b8d5-922203d5db8c | OS::Heat::SoftwareConfig | CREATE_COMPLETE | 2016-04-17T13:46:30 |

Let’s have a look at the neutron ports created:

[stack@undercloud ~]$ neutron port-list
| id                                   | name                            | mac_address       | fixed_ips                                                                         |
| 0068899e-13d1-4c3f-9b27-ee1f66f5a2a7 | redis_virtual_ip                | fa:16:3e:41:bc:8a | {"subnet_id": "2d01b5fa-450c-484d-8a86-6e023721f08e", "ip_address": ""} |
| 6b50e4e0-defb-4740-a4f3-86c9a571651d | deployed-server-1-ctlplane-port | fa:16:3e:3a:91:17 | {"subnet_id": "2d01b5fa-450c-484d-8a86-6e023721f08e", "ip_address": ""} |
| 708953bc-5fc5-4b6a-8713-07b52cff871b |                                 | fa:16:3e:91:51:9a | {"subnet_id": "2d01b5fa-450c-484d-8a86-6e023721f08e", "ip_address": ""}  |
| bbb6c5e8-09d2-4ebf-918e-c9b22bfe50fd | control_virtual_ip              | fa:16:3e:65:19:e6 | {"subnet_id": "2d01b5fa-450c-484d-8a86-6e023721f08e", "ip_address": ""} |
| d9144e0a-dd83-482f-a2a5-33860ea58864 | deployed-server-2-ctlplane-port | fa:16:3e:25:dd:d9 | {"subnet_id": "2d01b5fa-450c-484d-8a86-6e023721f08e", "ip_address": ""} |

Now, let’s examine the deployed overcloud using the generated overcloudrc:

[stack@undercloud ~]$ source overcloudrc 

[stack@undercloud ~]$ nova service-list
| Id | Binary           | Host                          | Zone     | Status  | State | Updated_at                 | Disabled Reason |
| 1  | nova-scheduler   | deployed-server-1             | internal | enabled | up    | 2016-04-17T14:10:48.000000 | -               |
| 7  | nova-consoleauth | deployed-server-1             | internal | enabled | up    | 2016-04-17T14:10:46.000000 | -               |
| 8  | nova-conductor   | deployed-server-1             | internal | enabled | up    | 2016-04-17T14:10:53.000000 | -               |
| 9  | nova-compute     | deployed-server-2.localdomain | nova     | enabled | up    | 2016-04-17T14:10:44.000000 | -               |

[stack@undercloud ~]$ nova hypervisor-list
| ID | Hypervisor hostname           | State | Status  |
| 1  | deployed-server-2.localdomain | up    | enabled |

If we ssh into the controller we can see the right IP addresses applied (including the VIP’s):

[stack@undercloud ~]$ ssh root@
Last login: Mon Apr 18 13:08:17 2016 from

[root@deployed-server-1 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 inet scope host lo
    valid_lft forever preferred_lft forever
 inet6 ::1/128 scope host 
    valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
 link/ether bc:76:4e:21:17:15 brd ff:ff:ff:ff:ff:ff
 inet brd scope global eth0
    valid_lft forever preferred_lft forever
 inet6 2001:4802:7806:102:be76:4eff:fe21:1715/64 scope global 
    valid_lft forever preferred_lft forever
 inet6 fe80::be76:4eff:fe21:1715/64 scope link 
    valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
 link/ether bc:76:4e:21:28:db brd ff:ff:ff:ff:ff:ff
 inet brd scope global eth1
    valid_lft forever preferred_lft forever
 inet6 fe80::be76:4eff:fe21:28db/64 scope link 
    valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP qlen 1000
 link/ether bc:76:4e:21:0d:b3 brd ff:ff:ff:ff:ff:ff
 inet6 fe80::be76:4eff:fe21:db3/64 scope link 
    valid_lft forever preferred_lft forever
5: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN 
 link/ether d6:fa:eb:d5:5f:8d brd ff:ff:ff:ff:ff:ff
6: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
 link/ether bc:76:4e:21:0d:b3 brd ff:ff:ff:ff:ff:ff
 inet brd scope global br-ex
    valid_lft forever preferred_lft forever
 inet brd scope global br-ex
    valid_lft forever preferred_lft forever
 inet brd scope global br-ex
    valid_lft forever preferred_lft forever
 inet6 fe80::be76:4eff:fe21:db3/64 scope link 
    valid_lft forever preferred_lft forever

We have a successful overcloud stack with the proper networking applied and a
functioning OpenStack deployed. My next steps would be to test this prototype
further using a full 3 node HA cluster and network isolation with separate

Overall, I think this is a useful concept. I could see it being used in
additional ways as well such as using Heat to configure the undercloud or being
able to test TripleO using regular nodepool instances.

by slagle at April 18, 2016 02:17 PM

Last updated: February 27, 2017 06:41 AM
