Planet TripleO


October 30, 2018

Emilien Macchi

OpenStack Containerization with Podman – Part 3 (Upgrades)

For this third episode, here are some thoughts on how upgrades from Docker to Podman could work for us in OpenStack TripleO. Don’t miss the first and second episodes where we learnt how to deploy and operate Podman containers.

Edit: the upstream code merged and we finally decided we wouldn’t remove the container during the migration from Docker to Podman. We would only stop it, and then remove containers at the end of the upgrade process. The principles remain the same and the demo is still valid at this point.

I spent some time this week investigating how we could upgrade the OpenStack Undercloud that is running Docker containers to run Podman containers, without manual intervention or service disruption. The way I see it at this time (the discussion is still ongoing), we could remove the Docker containers in Paunch, just before starting the Podman containers and services in Systemd. It would be done per container, in serial.

for container in containers:
    docker rm container
    podman run container
    create systemd unit file && enable service
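That per-container loop could be sketched in shell as a dry run that only prints the commands it would execute (the container names, image names and flags here are illustrative; the real logic lives in Paunch):

```shell
# Dry-run sketch of the serial Docker-to-Podman migration: remove the
# Docker container, start it under Podman, then hand lifecycle
# management to systemd. Echoing instead of executing keeps it safe.
migrate() {
    name="$1"
    image="$2"
    echo "docker rm -f ${name}"
    echo "podman run --detach --name ${name} ${image}"
    echo "systemctl enable --now ${name}.service"
}

# Illustrative container list; Paunch derives this from its config.
for c in haproxy keystone glance_api; do
    migrate "$c" "docker.io/tripleo/${c}:latest"
done
```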

In the following demo, you can see the output of openstack undercloud upgrade with a work-in-progress prototype. You can observe HAproxy running in Docker and, during Step 1 of the containers deployment, the container being stopped (top right) and immediately started in Podman (bottom right).

You might think “that’s it?”. Of course not. There are still some problems that we want to figure out:

  • Migrate containers not managed by Paunch (Neutron containers, Pacemaker-managed containers, etc).
  • Whether we want to remove the Docker containers or just stop them (in the demo, the containers are removed from Docker).
  • Stopping Docker daemon at the end of the upgrade (will probably be done by upgrade_tasks in Docker service from TripleO Heat Templates).

The demo is a bit long as it shows the whole upgrade output. However, if you want to see when HAproxy is stopped in Docker and started in Podman, go to 7 minutes. Also, don’t miss the last minute of the video, where we see the results (Podman containers, no more Docker containers managed by Paunch, and SystemD services).

Thanks for following this series of OpenStack / Podman related posts. Stay in touch for the next one! By the way, did you know you could follow our backlog here? Any feedback on these efforts is warmly welcome!

by Emilien at October 30, 2018 04:02 PM

October 05, 2018

Emilien Macchi

OpenStack Containerization with Podman – Part 2 (SystemD)

In the first post, we demonstrated that we can now use Podman to deploy a containerized OpenStack TripleO Undercloud. Let’s see how we can operate the containers with SystemD.

Podman, by design, doesn’t run any daemon to manage the containers’ lifecycle, while Docker runs dockerd-current and docker-containerd-current, which take care of a bunch of things, such as restarting containers when they fail (if configured to do so, with restart policies).

In OpenStack TripleO, we still want our containers to restart when they are configured to, so we thought about managing the containers with SystemD. I recently wrote a blog post about how Podman can be controlled by SystemD, and we finally implemented it in TripleO.

The way it works, as of today, is that any container managed by Podman with a restart policy in Paunch container configuration, will be managed by SystemD.

Let’s take the example of Glance API. This snippet is the configuration of the container at step 4:

        - glance_api:
            start_order: 2
            image: *glance_api_image
            net: host
            privileged: {if: [cinder_backend_enabled, true, false]}
            restart: always
            healthcheck:
              test: /openstack/healthcheck
            volumes: *glance_volumes

As you can see, the Glance API container is configured to always restart (which Docker would handle). With Podman, we re-use this flag and we create (and enable) a SystemD unit file:

[Unit]
Description=glance_api container

[Service]
Restart=always
ExecStart=/usr/bin/podman start -a glance_api
ExecStop=/usr/bin/podman stop -t 10 glance_api

How it works underneath:

  • Paunch will run podman run to start the container, during the deployment steps.
  • If there is a restart policy, Paunch will create a SystemD unit file.
  • The SystemD service is named after the container, so if you were used to the old service names from before containerization, you’ll have to refresh your memory. We deliberately chose the container name to avoid confusion with the podman ps output.
  • Once the containers are deployed, they need to be stopped / started / restarted by SystemD. If you use the Podman CLI to do it, SystemD will take over (see the demo).
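Putting the pieces together, the unit-file generation step can be sketched as follows (writing to a temp directory instead of /etc/systemd/system, and assuming the glance_api container name; the real file is written by Paunch):

```shell
# Generate a systemd unit for a container that has a restart policy,
# mirroring the glance_api unit shown in the post. A real deployment
# would write to /etc/systemd/system and then run
# "systemctl daemon-reload && systemctl enable --now <unit>".
container=glance_api
unit_dir=$(mktemp -d)
cat > "${unit_dir}/${container}.service" <<EOF
[Unit]
Description=${container} container

[Service]
Restart=always
ExecStart=/usr/bin/podman start -a ${container}
ExecStop=/usr/bin/podman stop -t 10 ${container}
EOF
```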

Stay in touch for the next post in the series of deploying TripleO and Podman!

by Emilien at October 05, 2018 10:10 PM

October 02, 2018

Cedric Jeanneret

Tripleo-lab, Podman and TripleO, a love story

Still working on the integration of Podman in TripleO. Yeah, it’s a long and tricky road to success. But there are some really nice outcomes.

Tripleo-lab also allows you to build or install custom packages. The build is based on the official OpenStack Gerrit, and uses the official tripleo-ci tools.

How can we use all of that? It’s really simple. Let’s say we want to deploy an undercloud with podman, using custom tripleo-heat-templates and openstack-selinux packages. In Tripleo-lab, create a “local_env” directory, and add some files in it:

Describe what instance you want to build

# local_env/1under.yaml
  - name: undercloud
    cpu: 6
    memory: 20000
    disksize: 100
      - mac: "24:42:53:21:52:15"
      - mac: "24:42:53:21:52:16"
    autostart: yes

Ensure you’re using the latest packages from Master

# local_env/master.yaml
tripleo_version: master

Set podman as container CLI

# local_env/podman.yaml
  - section: DEFAULT
    option: container_cli
    value: podman

Fetch and install custom packages based on changes

# local_env/patches.yaml
  - name: 'tripleo-heat-templates'
    refs: '35/600535/16'

Note that the “local_env” directory is git-ignored by default.

Once you have those files with the wanted content, just launch ansible-playbook:

ansible-playbook builder.yaml -e @local_env/1under.yaml \
	-e @local_env/master.yaml \
	-e @local_env/podman.yaml \
	-e @local_env/patches.yaml \
	-t lab

For now, there is still a “small” issue with podman: apparently some containers want to load kernel modules, and this action requires elevated privileges as well as the absence of SELinux separation. I’m currently working on the removal of those nasty calls, at least for the known modules coming from kolla.

Another batch of patches is being prepared in order to load those modules from within tripleo-heat-templates instead, as “host_prep_tasks”. I just need to make the modules persistent across reboots; for now that’s not the case with the current set of patches.

But in the end, it should all work as expected :). The kolla change isn’t 100% necessary, as the “modprobe” command is smart enough to NOT try to reload an already-loaded module; so if we load a module from the host before the container starts, we’re safe. Still, not having modprobe calls from within official containers is a good thing.

Happy hacking ;).

October 02, 2018 10:00 AM

September 26, 2018

Ben Nemec

OVB 1.0 and Upcoming Changes

The time has come to declare a 1.0 version for OVB. There are a couple of reasons for this:

  1. OVB has been stable for quite a while
  2. It's time to start dropping support for ancient behaviors/clouds

The first is somewhat self-explanatory. Since its inception, I have attempted to maintain backward compatibility to the earliest deployments of OVB. This hasn't always been 100% successful, but when incompatibilities were introduced they were considered bugs that had to be fixed. At this point the OVB interface has been stable for a significant period of time and it's time to lock that in.

However, on that note it is also time to start dropping support for some of those earliest environments. The limitations of the original architecture make it more and more difficult to implement new features and there are very few to no users still relying on it. Declaring a 1.0 and creating a stable branch for it should allow us to move forward with new features while still providing a fallback for anyone who might still be using OVB on a Kilo-based cloud (for example). I'm not aware of any such users, but that doesn't mean they don't exist.

Specifically, the following changes are expected for OVB 2.0:

  • Minimum host cloud version of Newton. This allows us to default to using Neutron port-security, which will simplify the configuration matrix immensely.
  • Drop support for parameters in environment files. All OVB configuration environment files should be using parameter_defaults now anyway, and some upcoming features require us to force the switch. This shouldn't be too painful as it mostly requires s/parameters:/parameter_defaults:/ in any existing environments.
  • Part of the previous point is a change to how ports and networks are created. This means that if users have created custom port or network layouts they will need to update their templates to reflect the new way of passing in network details. I don't know that anyone has done this, so I expect the impact to be small.
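The parameters-to-parameter_defaults switch mentioned above is mechanical. In an OVB environment file it looks roughly like this (the parameter name is illustrative):

```yaml
# Old style (dropped in OVB 2.0):
parameters:
  baremetal_flavor: baremetal

# New style:
parameter_defaults:
  baremetal_flavor: baremetal
```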

The primary motivation for these changes is the work to support routed networks in OVB. It requires customization of some networks that were hard-coded in the initial version of OVB, which means that making them configurable without breaking compatibility would be difficult/impossible. Since the necessary changes should only break very old style deployments, I feel it is time to make a clean cut and move on from them. As I noted earlier, I don't believe this will actually affect many OVB users, if any.

If these changes do sound like they may break you, please contact me ASAP. It would be a good idea to test your use-case against the routed-networks branch to make sure it still works. If so, great! There's nothing to do. That branch already includes most of the breaking changes. If not, we can investigate how to maintain compatibility, or if that's not possible you may need to continue using the 1.0 branch of OVB which will exist indefinitely for users who still absolutely need the old behaviors and can't move forward for any reason. There is currently no specific timeline for when these changes will merge back to master, but I hope to get it done in the relatively near future. Don't procrastinate. :-)

Some of these changes have been coming for a while - the lack of port-security in the default templates is starting to cause more grief than maintaining backward compatibility saves. The routed networks architecture is a realization of the original goal for OVB, which is to deploy arbitrarily complex environments for testing deployment tools. If you want some geek porn, check out this network diagram for routed networks. It's pretty cool to be able to deploy such a complex environment with a couple of configuration files and a single command. Once it is possible to customize all the networks it should be possible to deploy just about any environment imaginable (challenge accepted... ;-). This is a significant milestone for OVB and I look forward to seeing it in action.

by bnemec at September 26, 2018 03:55 PM

Juan Antonio Osorio

Oslo Policy Deep Dive (part 2)

In the previous blog post I covered all you need to know to write your own policies and understand where they come from.

Here, We’ll go through some examples of how you would change the policy for a service, and how to take that new policy into use.

For this, I’ve created a repository to try things out and hopefully get you practicing this kind of thing. Of course, things will be slightly different in your environment, depending on how you’re running OpenStack. But you should get the basic idea.

We’ll use Barbican as a test service to do basic policy changes. The configuration that I’m providing is not meant for production, but it makes it easier to make changes and test things out. It’s a very minimal and simple barbican configuration that has the “unauthenticated” context enabled. This means that it doesn’t rely on keystone, and it will use whatever roles and project you provide in the REST API.

The default policy & how to change it

As mentioned in the previous blog post, nowadays the default policy is shipped as part of the codebase. For some services, folks might still package the policy.json file; however, for our test service (Barbican), this is not the case.

You can easily overwrite the default policy by providing a policy.json file yourself. By default, oslo.policy will read the project’s base directory, and try to get the policy.json file from there. For barbican, this will be /etc/barbican/policy.json. For keystone, /etc/keystone/policy.json.

It is worth noting that this file is configurable by setting the policy_file setting in your service’s configuration, which is under the oslo_policy group of the configuration file.

If you have a running service, and you add or modify the policy.json file, the changes will immediately take effect. No need to restart nor reload your service.

The way this works is that oslo.policy will read the file’s modification time (using os.path.getmtime(filename)) and cache it. If on a subsequent read the modification time has changed, it’ll re-read the policy file and load the new rules.
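That check can be sketched in shell, following the behavior described above (the policy path here is a temp file purely for illustration):

```shell
# Cache the policy file's mtime and "reload" only when it changes,
# mimicking oslo.policy's behavior. Uses GNU stat/touch.
POLICY=$(mktemp)
printf '{"observer": "role:reader"}\n' > "$POLICY"
cached_mtime=$(stat -c %Y "$POLICY")

maybe_reload() {
    current=$(stat -c %Y "$POLICY")
    if [ "$current" != "$cached_mtime" ]; then
        cached_mtime="$current"
        echo "reloading policy"
    else
        echo "using cached policy"
    fi
}

maybe_reload                                   # mtime unchanged: cache hit
touch -d "@$((cached_mtime + 60))" "$POLICY"   # simulate an edit
maybe_reload                                   # mtime changed: reload
```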

It is also worth noting that when using policy.json, you don’t need to provide the whole policy, only the rules and aliases you’re planning to change.
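For instance, a complete and valid override file that only swaps one alias could be as small as this (the rule shown is illustrative):

```json
{
    "observer": "role:reader"
}
```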

If you need to get the policy of a specific service, it’s fairly straightforward given the tools that oslo.policy provides. All you need to do is the following:

oslopolicy-policy-generator --namespace $SERVICE_NAME

It is important to note that this will get you the effective policy that’s being executed. So, any changes that you make to the policy will be reflected in the output of this command.

If you want to get a sample file for the default policy with all the documentation for each option, you’ll do the following:

oslopolicy-sample-generator --namespace $SERVICE_NAME

So, in order to output Barbican’s effective policy, we’ll do the following:

oslopolicy-policy-generator --namespace barbican

Note that this outputs the policy in YAML format, and oslo.policy reads policy.json by default, so you’ll have to transform that file into JSON to take it into use.
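One way to do the conversion, assuming PyYAML is installed (yaml2json is a hypothetical helper name, not an existing tool):

```shell
# Convert the generator's YAML output into the JSON that oslo.policy
# reads by default. Requires python3 with PyYAML available.
yaml2json() {
    python3 -c 'import sys, json, yaml; print(json.dumps(yaml.safe_load(sys.stdin), indent=4))'
}

# Typical use (barbican must be installed for the namespace to exist):
#   oslopolicy-policy-generator --namespace barbican | yaml2json > /etc/barbican/policy.json
printf '"secret:get": "rule:secret_project_admin"\n' | yaml2json
```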

Setting up the testing environment

NOTE: If you just plan to read through this and not actually do the exercises, you may skip this section.

Let’s clone the repository first:

git clone
cd barbican-policy-tests

Now that we’re in the repo, you’ll notice several scripts there. To provide you with a consistent environment, I decided to rely on containeeeers!!! So, in order to continue, you’ll need to have Docker installed on your system.

(Maybe in the future I’ll update this to run with Podman and Buildah)

To build the minimal barbican container, execute the following:


You can verify that you have the barbican-minimal image with the latest tag by running docker images.

To test that the image was built correctly and you can run barbican, execute the following:


You will notice barbican is running, and you can see the name of its container with docker ps. You’ll notice it’s listening on port 9311 on localhost.



In the following exercises, we’ll make some changes to the Barbican policy. To do this, it’s worth understanding a few things about the service and the policy itself.

Barbican is Secret Storage as a service. To simplify things, we’ll focus on the secret storage side of things.

These are the operations you can do on a secret:

  • secrets:get: List all secrets for the specific project.

  • secrets:post: Create a new secret.

  • secret:decrypt: Decrypt the specified secret.

  • secret:get: Get the metadata for the specified secret.

  • secret:put: Modify the specified secret.

  • secret:delete: Delete the specified secret.

Barbican also assumes 5 keystone roles, and bases its policy on the usage of these roles:

  • admin: Can do all operations on secrets (List, create, read, update, delete and decrypt)

  • creator: Can do all operations on secrets. This role is limited on other resources (such as secret containers), but we’ll ignore other resources in these exercises.

  • observer: In the context of secrets, observers can only list secrets and view a specific secret’s metadata.

  • audit: In the context of secrets, auditors can only view a specific secret’s metadata (but cannot do anything else).

  • service_admin: can’t do anything related to secrets. This role is meant for admin operations that change the Barbican service itself (such as quotas).

The Barbican default policy also comes with some useful aliases as defaults:

"admin": "role:admin",
"observer": "role:observer",
"creator": "role:creator",
"audit": "role:audit",
"service_admin": "role:key-manager:service-admin",

So this makes overriding specific roles fairly straightforward.

Scenario #1

The Keystone default roles proposal recommends the usage of three roles, which should also work across all OpenStack services: reader, member and admin.

Let’s take this into use in Barbican, and replace our already-existing observer role with reader.

In this case we can take the alias into use: with very minimal changes, we can replace the usage of observer entirely.

I have already defined this role in the aforementioned repo; let’s take a look:

"observer": "role:reader"

And that’s it!

Now, in the barbican policy, every instance of the “rule:observer” assertion will actually reference the “reader” role.

Testing scenario #1

There is already a script that runs barbican and takes this policy into use. Let’s run it, and verify that we can effectively use the reader role instead of the observer role:

# Run the container

# Create a sample secret

# Attempt to list the available secrets with the reader role. This
# operation should succeed.
./ reader

# Attempt to list the available secrets with the observer role. This
# operation should fail.
./ observer

# Once you're done, you can stop the container

Scenario #2

Barbican’s audit role is meant to read only a very minimal set of things from Barbican’s entities. For some, this role might not be very useful, and it also doesn’t fit with Keystone’s set of default roles, so let’s delete it!

As before, I have already defined a policy for this purpose:

"audit": "!"

As you can see, this replaces the audit alias, and any attempt to use it will be rejected by the policy, effectively disallowing use of the audit role.

Testing scenario #2

There is already a script that runs barbican and takes this policy into use. Let’s run it, and verify that we can effectively no longer use the audit role:

# run the container

# create a secret

# Attempt to view the secret metadata with the creator role. This
# operation should succeed.
curl -H 'X-Project-Id: 1234' -H 'X-Roles: creator' \
    http://localhost:9311/v1/secrets/<some ID> | python -m json.tool

# Attempt to view the secret metadata with the audit role. This
# operation should fail.
curl -H 'X-Project-Id: 1234' -H 'X-Roles: audit' \
    http://localhost:9311/v1/secrets/<some ID> | python -m json.tool

# Once you're done, you can stop the container

Scenario #3

Now that we have tried a couple of things and they have gone fine, let’s put it all together and replicate the Keystone default role recommendation.

Here’s what we’ll do: As before, we’ll replace the observer role with reader. We’ll also replace the creator role with member, and finally, we’ll remove the audit role.

Here’s the policy file:

"observer": "role:reader",
"creator": "role:member",
"audit": "!"

This time, we’ll change the policy file in-place, as this is something you might need to do or automate in your own deployment.

Testing scenario #3

Here, we’ll run a minimal container that doesn’t take any specific policy into use. We’ll log into it, modify the policy.json file, and test out the results.

# Run the container

# Open a bash session in the container
docker exec -ti $(docker ps | grep barbican-minimal | awk '{print $1}') bash

# (In the container) Create the new policy file
cat <<EOF > /etc/barbican/policy.json
{
"observer": "role:reader",
"creator": "role:member",
"audit": "!"
}
EOF

# (In the container) Exit the container

# Attempt to create a sample secret with the creator role. This operation
# should fail
./ creator

# Attempt to create a sample secret with the member role. This operation
# should succeed
./ member

# Attempt to list the available secrets with the observer role. This
# operation should fail.
./ observer

# Attempt to list the available secrets with the reader role. This
# operation should succeed.
./ reader

# Attempt to view the secret metadata with the audit role. This
# operation should fail.
curl -H 'X-Project-Id: 1234' -H 'X-Roles: audit' \
    http://localhost:9311/v1/secrets/<some ID> | python -m json.tool

# Attempt to view the secret metadata with the creator role. This
# operation should fail.
curl -H 'X-Project-Id: 1234' -H 'X-Roles: creator' \
    http://localhost:9311/v1/secrets/<some ID> | python -m json.tool

# Attempt to view the secret metadata with the member role. This
# operation should succeed.
curl -H 'X-Project-Id: 1234' -H 'X-Roles: member' \
    http://localhost:9311/v1/secrets/<some ID> | python -m json.tool

# Once you're done, you can stop the container

Scenario #4

For our last case, let’s assume that for some reason you need a “super-admin” role that is able to read everybody’s secret metadata. There is no equivalent of this role in Barbican, so we’ll have to modify a few more things in order to get this to work.

To simplify things, we’ll only modify the GET operation for secret metadata.

Please note that this is only done for learning purposes, do not try this in production.

The first thing we’ll need is to retrieve the policy line that actually gets executed for secret metadata. In Barbican, it’s the secret:get policy.

From within the container, or anywhere the barbican package is installed, you can do the following in order to get this exact policy:

oslopolicy-policy-generator --namespace barbican | grep "secret:get"

This will get us the following line:

"secret:get": "rule:secret_non_private_read or rule:secret_project_creator or rule:secret_project_admin or rule:secret_acl_read"

Note that in the barbican policy we explicitly check, for most users, that the user is in the same project the secret belongs to. In this case, we’ll omit that check in order to enable the “super-admin” to retrieve any secret’s metadata.

Here is the final policy.json file we’ll use:

{
"super_admin": "role:super-admin",
"secret:get": "rule:secret_non_private_read or rule:secret_project_creator or rule:secret_project_admin or rule:secret_acl_read or rule:super_admin"
}

Testing scenario #4

Here, we’ll run a minimal container that doesn’t take any specific policy into use. We’ll log into it, modify the policy.json file, and test out the results.

# Run the container

# Open a bash session in the container
docker exec -ti $(docker ps | grep barbican-minimal | awk '{print $1}') bash

# (In the container) Let's verify what the current policy is for "secret:get".
# This should output the default rule.
oslopolicy-policy-generator --namespace barbican | grep "secret:get"

# (In the container) Create the new policy file
cat <<EOF > /etc/barbican/policy.json
{
"super_admin": "role:super-admin",
"secret:get": "rule:secret_non_private_read or rule:secret_project_creator or rule:secret_project_admin or rule:secret_acl_read or rule:super_admin"
}
EOF

# (In the container) Let's verify what the current policy is for "secret:get".
# This should output the updated policy.
oslopolicy-policy-generator --namespace barbican | grep "secret:get"

# (In the container) Exit the container

# Let's now create a couple of secrets with the creator role in the default
# project (1234).

# This will be secret #1
./ creator
# This will be secret #2
./ creator

# Let's now create a couple of secrets with the creator role in another project
# (1111).

# This will be secret #3
./ creator 1111

Using the creator role and project ‘1234’, you should only be able to retrieve secrets #1 and #2, but should get an error with secret #3.

# So... this should work
curl -H 'X-Project-Id: 1234' -H 'X-Roles: creator' \
    http://localhost:9311/v1/secrets/<secret #1> | python -m json.tool

# this should work
curl -H 'X-Project-Id: 1234' -H 'X-Roles: creator' \
    http://localhost:9311/v1/secrets/<secret #2> | python -m json.tool

# ...And this should fail
curl -H 'X-Project-Id: 1234' -H 'X-Roles: creator' \
    http://localhost:9311/v1/secrets/<secret #3> | python -m json.tool

Using the creator role and project ‘1111’, you should only be able to retrieve secret #3, but should get an error with secrets #1 and #2

# So... this should fail
curl -H 'X-Project-Id: 1111' -H 'X-Roles: creator' \
    http://localhost:9311/v1/secrets/<secret #1> | python -m json.tool

# this should fail
curl -H 'X-Project-Id: 1111' -H 'X-Roles: creator' \
    http://localhost:9311/v1/secrets/<secret #2> | python -m json.tool

# ...And this should work
curl -H 'X-Project-Id: 1111' -H 'X-Roles: creator' \
    http://localhost:9311/v1/secrets/<secret #3> | python -m json.tool

Finally, let’s try our new super-admin role. As you will notice, you don’t even need to be part of the projects to get the metadata:

# So... this should work
curl -H 'X-Project-Id: POLICY' -H 'X-Roles: super-admin' \
    http://localhost:9311/v1/secrets/<secret #1> | python -m json.tool

# this should work
curl -H 'X-Project-Id: IS' -H 'X-Roles: super-admin' \
    http://localhost:9311/v1/secrets/<secret #2> | python -m json.tool

# ...And this should work too
curl -H 'X-Project-Id: COOL' -H 'X-Roles: super-admin' \
    http://localhost:9311/v1/secrets/<secret #3> | python -m json.tool


You have now learned how to do simple modifications to your service’s policy!

With great power comes great responsibility… And all those things… But seriously, be careful! You might end up with unintended results.

In the next blog post, we’ll cover implied roles and how you can use them in your policies!

September 26, 2018 02:01 PM

Cedric Jeanneret

Working on Podman integration in TripleO: SELinux in da place

Working on the TripleO deploy framework is probably one of the most interesting things you can do. It lets you discover a bunch of new things almost every week, if not every day.

In my case, although I knew the “SELinux” name and its purpose, I never really worked with it. I knew RHEL has it, and enforces the policies, and that it’s the same case for CentOS. But beyond that, I was clueless.

That changed dramatically once I got to work on Podman integration in TripleO.

Some basics: with the current release, we deploy the undercloud and overcloud using containers, with the Docker engine. It does work, but without the SELinux separation we could get from containers.

It was deactivated from the very beginning, meaning docker containers aren’t as isolated as we might think.

This tiny “hack” was applied to the Docker daemon, and lets us avoid any SELinux issues when we bind-mount volumes in one or multiple containers.

“Unfortunately”, this hack doesn’t work with Podman, as that nasty boy doesn’t have a daemon, nor any real way to set a global configuration.

This means we had two choices: either modify all the calls to the container engine in order to add the right option (--security-opt label=disable), or make it work with an enforcing SELinux.

I chose the latter. Of course, it took some time (about 4 weeks), because I had to:

  • understand how SELinux works
  • understand how SELinux works with containers
  • understand what was failing with the deploy
  • step-by-step correct the issues

While the first two steps were easy (a couple of days), the next two were really, really painful, as I had to launch a deploy each time and check in parallel what was going on in the audit.log file.

Also, there is an interesting difference between Podman and Docker regarding volumes: if a directory or file doesn’t exist on the host filesystem, Docker will create it; Podman, on the contrary, will just fail. Unfortunately for me, this Docker capability was widely relied upon, without anyone knowing it was used…

In the end, a few patches were issued, and they are being reviewed as I’m writing this blog post.

With all those patches, we’re able to deploy a complete, working undercloud, with added security, as we get proper SELinux separation for a vast majority of our containers. Some of them can’t currently run with that separation, but we’re still working on them, hoping to get a fine solution.

Of course, other patches were also involved, and we had to report issues to the Podman team - they are really responsive and concerned, meaning we could get a really fast answer and correction for every issue we got.

A really nice thing is, we should be able to re-enable separation with Docker as well, as the SELinux types are the same. Meaning: I’ve improved the overall security of the product. And that’s cool ;).

September 26, 2018 08:00 AM

September 19, 2018

Juan Antonio Osorio

Adding custom databases and database users in TripleO

For folks integrating with TripleO, it has been quite painful to always need to modify puppet in order to integrate with the engine. This has typically been the case for things like adding a HAProxy endpoint, or adding a database and a database user (and grants). As mentioned in a previous post, this is no longer the case for HAProxy endpoints, and that ability has been in TripleO for a couple of releases now.

With the same logic in mind, I added the same functionality for MySQL databases and database users, and this recently landed in Stein. So, all you need to do is add something like this to your service template:

          password: 'myPassword'
          dbname: 'mydatabase'
          user: 'myuser'
          host: {get_param: [EndpointMap, MysqlInternal, host_nobrackets]}
          allowed_hosts:
            - '%'
            - "%{hiera('mysql_bind_host')}"

This will create:

  • A database called mydatabase
  • A user that can access that database, called myuser
  • The user myuser will have the password myPassword
  • And grants will be created so that the user can connect from the hosts specified in the host and allowed_hosts parameters.
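Behind the scenes, this boils down to puppet issuing grants roughly equivalent to the following (illustrative SQL, not the exact statements TripleO generates):

```sql
CREATE DATABASE IF NOT EXISTS mydatabase;
CREATE USER 'myuser'@'%' IDENTIFIED BY 'myPassword';
GRANT ALL PRIVILEGES ON mydatabase.* TO 'myuser'@'%';
```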

Now you don’t need to modify puppet to add a new service to TripleO!

September 19, 2018 04:50 AM

September 17, 2018

Marios Andreou

My summary of the OpenStack Stein PTG in Denver


After only 3 take-offs/landings, I was very happy to participate in the Stein PTG in Denver. This is a brief summary, with pointers, of the sessions and rooms I attended, in the order they happened (Stein PTG Schedule).

Upgrades CI with the stand-alone deployment

We had a productive impromptu round table (weshay++) in one of the empty rooms with the tripleo ci folks present (weshay, panda, sshnaidm, arxcruz, marios), the tripleo upgrades folks present (chem and holser), as well as emeritus PTL mwahaha, around the stand-alone deployment and how we can use it for upgrades ci. We introduced the proposed spec, and one of the main topics discussed was: ultimately, is it worth it to solve all of these subproblems only to end up with some approximation of the upgrade?

The consensus was yes, since we can have 2 types of upgrades jobs: use the stand-alone to ci the actual tasks, i.e. upgrade_tasks and deployment_tasks for each service in the tripleo-heat-templates, and another job (the current job, which will be adapted) to ci the upgrades workflow (tripleoclient, mistral workflows, etc.). There was general consensus on this approach between the upgrades and ci representatives, so that we could try and sell it to the wider team in the tripleo room on Wednesday together.

Upgrades Special Interest Group

Room etherpad.

Monday afternoon was spent in the upgrades SIG room. There was first discussion of the placement api extraction and how this would have to be dealt with during the upgrade, with a solution sketched out around the db migrations required.

This led into discussion around pre-upgrade checks that could deal with things like db migrations (or just check if something is missing and fail accordingly before the upgrade). As I was reminded during the lunchtime presentations, pre-upgrade checks are one of the Stein community goals (together with python 3). The idea is that each service would own a set of checks that should be performed before an upgrade is run, and that they would be invoked via the openstack client (something along the lines of ‘openstack pre-upgrade-check nova’). I believe there is already some implementation from the nova team, but I don’t readily have details.

There was then a productive discussion about the purpose and direction of the upgrades SIG. One of the points raised was that the SIG should not be just about the fast forward upgrade even though that has been a main focus until now. The pre-upgrade checks are a good example of that and the SIG will try and continue to promote these with adoption by all the OpenStack services. On that note I proposed that whilst the services themselves will own the service specific pre-upgrade checks, it’s the deployment projects which will own the pre-upgrade infrastructure checks, such as healthy cluster/database or responding service endpoints.

There was of course discussion around the fast forward upgrade with status updates from the deployment projects present (kolla-ansible, TripleO, charms, OSA). TripleO is the only project with an implemented workflow at present. Finally there was a discussion about whether we’re doing better in terms of operator experience for upgrades in general and how we can continue to improve (e.g. rolling upgrades was one of the discussed points here).

Edge room

Room etherpad Room etherpad2 Use cases Edge primer

I was only in attendance for the first part of this session which was about understanding the requirements (and hopefully continuing to find the common ground). The room started with a review of the various proposed use cases from Dublin and of any work done since then. One of the main points raised by shardy is that in TripleO, whilst we have a number of exploratory efforts ongoing (like split controlplane for example), it would be good to have a specific architecture to aim for, and that is missing currently. It was agreed that the existing use cases will be extended to include the proposed architecture and that these can serve as a starting point for anyone looking to deploy with edge locations.

There are pointers to the rest of the edge sessions in the etherpad above.

TripleO room

Room etherpad Team picture

The order of sessions was slightly revised from that listed in the etherpad above because the East coast storms forced folks to change travel plans. The following order is to the best of my recollection ;)

TripleO and Edge cloud deployments

Session etherpad

There was first a summary from the Edge room from shardy and then tripleo specific discussion around the current work (split controlplane). There was some discussion around possibly using/repurposing “the multinode job” for multiple stacks to simulate the Edge locations in ci. There was also discussion around the networking aspects (though this will depend on the architecture which we don’t yet have fully targeted) with respect to the tripleo deployment networks (controlplane/internalapi etc) in an edge deployment. Finally there was consideration of the work needed in tripleo-common and the mistral workflows needed for the split controlplane deployment.

OS / Platform

(tracked on main tripleo etherpad linked above)

The main items discussed here were Python 3 support, removing instack-undercloud and “that upgrade” to Centos8 on Stein.

For Python3 the discussion included the fact that in TripleO we are bound by whatever python the deployed services support (as well as what the upstream distribution will be i.e. Centos 7/8 and which python ships where).

For the Centos8/Stein upgrade the upgrades folks chem and holser led the discussion, outlining how we will need a completely new workflow, which may be dictated in large part by how Centos8 is delivered. One of the approaches discussed here was to use a completely external/distinct upgrade workflow for the OS, versus the TripleO driven OpenStack upgrade itself. We got into more details about this during the Baremetal session (see below).

TripleO CI

Session etherpad

One of the first items raised was the stand-alone deployment and its use in ci. The general proposal is that we should use a lot more of it! In particular to replace existing jobs (like scenarios 1/2) with a standalone deployment.

There was also discussion around the stand-alone for the upgrades ci as we agreed with the upgrades folks on Monday (spec). The idea of service vs workflow upgrades was presented/solidified here and I have just updated v8 of the spec accordingly to emphasise this point.

Other points discussed in the CI session were testing ovb in infra and how we could make jobs voting. The first move will be towards removing te-broker.

There was also some consideration of the involvement of the ci team with other squads and vice versa. There is a new column in our trello board called “requests from other DFG”.

A further point raised was the reproducer scripts and future directions, including running, and not only generating, these in ci. As a related side note, it sounds like folks are using the reproducer and having some successes.

Ansible / Framework

(tracked on main tripleo etherpad linked above)

In this session an overview of the work towards splitting out the ansible tasks from the tripleo-heat-templates into re-usable roles was given by jillr and slagle. More info and pointers in the main tripleo etherpad above.


Session etherpad

Discussion around the workflow to change overcloud/service passwords (this is currently borked!). In particular problems around trying to CI this since the deploy takes too long to have deploy + stack update for the passwords and validation within the timeout. Possibly could be a 3rd party (but then non voting) job for now. There was also an overview of work towards using Castellan with TripleO, as well as discussion around selinux and locking down ssh.


Session etherpad

CLI/UI feature parity is a main goal for this cycle (and probably beyond, as it seems there is a lot to do), along with plan management operations around this. Also good discussion around validations, with Tengu joining remotely via Bluejeans to champion the effort of providing a nice way to run these via the tripleoclient.


Session etherpad

This session started with discussion around metalsmith vs nova on the undercloud and the required upgrade path to make this so. Also considered were overcloud image customization and network automation (ansible with the python-networking-ansible ml2 driver).

However, unexpectedly, the most interesting part of this session for me personally was an impromptu design session started by ipilcher (prompted by a question from phuongh, who I believe was new to the room). The session was about the upgrade to Centos8, and three main approaches were explored: the “big bang” (everything off, upgrade, everything back on), “some kind of rolling upgrade”, and finally supporting either Centos8/Rocky or Centos7/Stein. The first and third were deemed unworkable, but there was a very lively and well engaged group design session trying to navigate to a workable process for the ‘rolling upgrade’ aka split personality. Thanks to ipilcher (via bandini) the whiteboards looked like this.

September 17, 2018 03:00 PM

July 24, 2018

Carlos Camacho

Vote for the OpenStack Berlin Summit presentations!

¡¡¡Please vote!!!

I pushed some presentations for this year’s OpenStack Summit in Berlin; the presentations are related to updates, upgrades, backups, failures and restores.

Happy TripleOing!

by Carlos Camacho at July 24, 2018 12:00 AM

June 07, 2018

Ben Nemec

Vancouver Summit - Deja Vu Edition

This was the first repeat OpenStack Summit location for me. While there have been repeat locations in the past, I wasn't at the first Summit at any of those locations. I think that means I'm getting old. :-)

There was a lot that had changed, and also a lot that stayed the same. The Vancouver Convention Center is still a fantastic venue, with plenty of space for sessions. And although I did attend all of the Oslo sessions, just like last time, we didn't over-schedule Oslo this time so I had a chance to attend some non-Oslo sessions as well. Since I'm also focusing on Designate these days, I made sure to catch all of those sessions, even the one at 5:30 PM on Thursday when everyone was a bit tired and ready to leave. And it was good - there was useful information in that presentation. I felt more productive at this Summit than last time, which is certainly a good thing.

With the intro out of the way, let's get down to the nuts and bolts of the sessions I attended.

Default Roles

This is a thing that many operators have been asking to have for quite a while. Unfortunately, early attempts were problematic because the state of policy in OpenStack was not good. Fortunately, since then most (if not all) projects have adopted oslo.policy which allows proper deprecation of policy rules so we can actually fix some of the issues. Currently there are three proposed default roles: auditor (aka read-only), member, and admin. While two of those already exist, apparently there are many bugs around how they work which should be cleaned up as part of this work, resulting in three actually usable roles. This will eventually be tested via the Patrole project, but there's no point in writing tests now for broken behavior so the testing will happen in parallel with the implementation.

Chances are that the default roles won't satisfy everyone, but the hope is that they can address the needs of the majority of operators (I think 80% was mentioned as a target) to reduce the pain of configuring a production OpenStack deployment. I know there were a few operators in the room who didn't feel the proposed roles would help them, and there was some post-session discussion that hopefully surfaced their concerns to the Keystone team who are driving this work.

Storyboard Migration

This is something that has been gaining momentum lately, with some larger teams either migrated or in the process of migrating to Storyboard. It was of particular interest to me as the Oslo PTL because I expect Oslo to be a fairly painful migration due to the sheer number of different Launchpad projects under the Oslo umbrella. However, it sounds like the migration process is becoming more mature as more projects make the move so by the time Oslo does migrate I have hope that it will go smoothly.

One significant request that came out of the session was some sort of redirect system that could point people who click on a bug reference in Gerrit to the appropriate location for projects that have migrated. I believe there was a suggestion that a separate server could be set up that had knowledge of which projects tracked bugs where and then could respond with appropriate redirects. Someone obviously would have to find the time to set that up though.

Migration to stestr

The current PTI for unit tests says to use stestr as the test runner. However, quite a few projects have not yet migrated, and this session was essentially to discuss why and how to move forward. One of the big "why"s was that people didn't know about it. A message was sent to openstack-dev, but I'm not sure it was clear to everyone that there was action required. Also, since the existing test setup in most projects still works, there's a natural "if it ain't broke, don't fix it" attitude toward this. Most of the projects that have migrated were converted by mtreinish, who has been driving this initiative.

Moving forward, there will be more of an emphasis on the benefits of moving to stestr and some work to provide a guide for how to do so. That should help get everyone into a consistent place on this.
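For what it's worth, the migration for a typical project is mostly a matter of adding a .stestr.conf at the repository root. A minimal example (the test_path value is a placeholder for wherever the project keeps its unit tests):

```ini
[DEFAULT]
test_path=./myproject/tests
top_dir=./
```

With that in place, stestr run becomes the test entry point in place of the old testr run.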

Ops/Dev: One Community

Generally speaking, everyone recognizes that there needs to be a tighter feedback loop between developers and operators so that the people running OpenStack are getting the features and bugfixes they really need. In the past this hasn't always been the case, but things like co-locating the PTG and Ops meetup are intended to help. This was a discussion of how to further improve that relationship.

There was a point raised that developer events in the past had been perceived as (and perhaps were) hostile to operators on older releases. The good news is that this attitude seems to have softened considerably in the past few cycles, with initiatives like fast-forward upgrades and extended maintenance releases acknowledging that in the real world not everyone can do continuous deployment and always be on the latest bleeding edge code.

A lot of the discussion around future changes to improve this had to do with the current split in mailing lists. Creating a separation between users, operators, and developers of OpenStack is obviously not ideal. The main suggestion was to move to a more traditional -discuss and -announce split, with the latter being a low-traffic list just used for major announcements. There was some concern that even though the development mailing list is not quite as active as it was at its peak, there is still an order of magnitude more traffic there than on the other lists and it might become overwhelming if everyone just got dumped into it. Related to this, there was some discussion of moving to mailman3, which provides a forum-like interface to allow casual contributors to join discussions without subscribing to the full firehose of mailing list traffic. There were a few technical concerns with it, but overall it seemed promising.

Also out of this session came a side topic about part-time contributors and what to do in this new OpenStack world where many of the contributors to projects can't devote 100% of their time to a project. As there wasn't time in the session to cover this adequately, a separate hallway track discussion was scheduled, which I will cover later.

Python 2 Deprecation Timeline

Python 2 is going away. Currently OpenStack still runs primarily on Python 2, which means we have some work to do before early 2020 when upstream support for it ends. The first step will be to re-orient our testing to focus on Python 3. In the past, jobs ran on Python 2 by default and Python 3 jobs were the outliers. That has to change.

The current timeline is to have Python 3 tested as the primary target by the end of the Stein cycle, and then to have Python 3 in a state where it can become the only supported Python version by the end of the T cycle, so that Python 2 support can be dropped early in the U cycle. Assuming we hit these targets, they line up well with the upstream Python plans.

oslo.config drivers hack session

On Tuesday we held an unofficial Oslo hack session to work through some things on the oslo.config driver patches that are in progress. They are intended to add new functionality to oslo.config which will enable features such as storing config in a key-value store like etcd and moving secret data (passwords, tokens, etc.) to a more secure service accessible via Castellan. The details are documented in the link above, but overall I think we made good progress on the plan and identified some concrete actions needed to move the work forward.

Oslo Project Onboarding

This was the first time we've done an Oslo onboarding session, and all things considered I would say it went reasonably well. There was some digression in the discussion, which is understandable in a project as wide-ranging as Oslo. There was also some interest in getting involved though, so I think it was a worthwhile session. Most importantly, it wasn't me talking to an empty room for 40 minutes. :-)

Designate Project Update

For the most part I just recommend you go watch the video of Graham's presentation. He'll do a better job explaining Designate stuff than I can. However, one thing that was sort of specific to me from this session was that I met a couple of Designate deployers who had written Ansible playbooks to deploy it in a TripleO overcloud post-deployment. Obviously for full integration we want it done as part of the main deployment, but their work could definitely come in handy as we move to use more Ansible in TripleO.

OpenStack Messaging at the Edge

This was a followup to a presentation at the Dublin PTG that was exploring the use of different messaging technologies in Edge deployments. It isn't testing full OpenStack edge deployments, but it did simulate the messaging architecture you'd see in such a deployment. The biggest takeaway was that distributed messaging technology like Qpid Dispatch Router can significantly improve performance in a widely distributed system versus a broker-based system like RabbitMQ.

Oslo Project Update

I don't have too much to say about this beyond what is already in the video linked from the session page. I do really need to stop saying "umm" so much when I speak though. ;-)

Mountain Biking Stanley Park

Okay, this wasn't OpenStack-related, but if you're into mountain biking then you know BC is a great place to do it, so it was hard to resist getting in a little time on the dirt when it was walking distance from the conference. I managed to escape with only minor scrapes and bruises after my long lunch ride.

Encouraging Part-time Contributors Hallway Track

As I mentioned earlier, this came up in one of the dev/ops sessions as a pain point. I had the opportunity to sit down with Julia Kreger, Tim Bell, and Thierry Carrez to try to identify some ways we could make it easier for new or occasional contributors to work on OpenStack. This is particularly important in the academic world where research contracts are often for a very specific, fixed period of time. If changes don't make it in during that window, they will tend to be abandoned.

A number of ideas were suggested, and ultimately we decided to focus on what we hoped would be the least controversial option to avoid the boil-the-ocean problem of attacking everything at once. To that end, we decided to propose a new policy targeted at reducing the amount of nit-picking in the review process. -1's over typos in code comments or the use of passive voice in specs do not meaningfully contribute to the quality of OpenStack software, but they can be disproportionately demotivating to both new and experienced developers alike. I know I personally have changed my reviewing style in a big way as a result of my own frustration with being on the receiving end of nit-pick reviews (deepest apologies to everyone I have nit-picked in the past).

This proposal led to a rather long mailing list thread, which I think demonstrates why we decided to stick with one relatively simple change in the beginning. As it was, the discussion tangented into some of the other areas we would like to address eventually but didn't want to get bogged down with right now.

Overall, I have high hopes that this initiative will make OpenStack a more pleasant project to work on while not meaningfully harming the quality of the software.

API Debt Cleanup

I must confess I'm not sure I fully understand the proposed way forward here. It seemed to me that there are two conflicting goals here: 1) To not break existing users of OpenStack APIs and 2) To make it easier for users to consume new functionality added in more recent microversions of OpenStack APIs. The concern seemed to be that many users are not aware of or not able to use new microversions so they are missing out on functionality and improved API designs. However, if we raise the default microversion then we open up the possibility of breaking existing users because new microversions may not be 100% compatible with old ones. As long as microversions are opt-in that's fine, but once you start changing the minimum microversion it becomes a problem.

The proposal was sort of a "big bang" major version bump across OpenStack. Essentially we would pick a cycle and have all of the projects do their desired API cleanup and everyone would tag a new major version of their API at about the same time. I'm still not entirely clear how this solves the problem I mentioned above though. A new default major version still opens up the possibility of breaking users that rely on older behavior, and a new major version that isn't the default still requires users to opt in. Maybe it's just that opting in to a new major version is easier than a new microversion?

I'm hoping that I missed some fundamental key to how this works, or maybe just missed that some of these tradeoffs are considered acceptable. In any case, it will be interesting to see how this work progresses.

OpenStack Maintainers

This ended up being another session that focused heavily on how to keep the OpenStack community healthy. The proposal was that there should be a group of people who are solely concerned with maintaining and cleaning up the code. This group would not be involved in new feature work.

Obviously this is a hard sell, as most developers want to do feature work. In addition, if you're not actively working on the code it's harder to stay up to date on where the project is going so you can provide useful architectural reviews. Overall, I did not feel like the idea of dedicated maintainers gained much traction in the room, but there was a lot of good discussion of how to encourage maintenance-type activities from existing and new contributors. The details can be found on the etherpad.

Stein Release Goals

In the Stein Release Goal session...we identified a whole bunch of goals for the T release. Okay, not entirely true, but there were a number of ideas floated that got nacked because they wouldn't be ready for Stein, but might be for T. I'm not going to try to cover them all here, but you can read more on the etherpad.

The other thing that happened in this session was we got rather side-tracked on the topic of how to select goals and what makes a good goal. The point was made that it's good to have community goals with a "wow" factor. These help the marketing teams and attract people to OpenStack, a good thing for everyone. However, the question was raised as to why we aren't selecting only "wow" goals that immediately address operator needs. It's a valid question, but it's not as simple as it appears on the surface.

See, all of the goals ultimately benefit operators. But the strategy so far with community goals has been to select one directly operator-facing goal, and one indirect goal. The latter is often related to cleaning up tech debt in OpenStack. While that may not have the same kind of immediate impact that something like mutable config does, it can have a huge long-term impact on the health and velocity of the project. Sure, splitting out the Tempest plugins for projects didn't have a direct impact on operators in that cycle, but it freed up bandwidth for everyone to be able to land new features faster. We paid down the debt in the short term to enable larger long term gains.

All of which is basically me saying that I like the idea behind our goal selection up to this point. I think one of each is a good balance of both immediate and longer-term impact.

In this session there was also some post-mortem of the previous goals. The WSGI API deployment goal was pointed out as one that did not go so well. Halfway through the cycle there was a massive shift in direction for that goal which caused a bunch of re-work and bad feelings about it. As a result, there were some recommendations for criteria that goals need to meet going forward to avoid selection of goals that aren't quite fully-baked yet. You can also read more about those on the etherpad.

Unified Limits

I mostly attended this because it involves the new oslo.limit library so I thought I should have some idea of what was going on. I'm really glad I did though because it turned out to be an excellent deep dive into how the unified limits API is going to work and how it could address the needs of some of the operators in the room. I came out of the session feeling very good about where quota management in OpenStack is headed.

Designate Best Practices

The very last session slot of the Summit, and as a result there wasn't a ton of audience participation (although there were still a fair number of people in attendance). However, there was quite a bit of useful information presented so I recommend watching the video if you are interested in operating Designate.

I skipped writing a summary for a few of the sessions that I attended, either because I thought they would be covered elsewhere or because they were simply too much to discuss in this already too-long post. I hope what I wrote above was interesting and maybe even a little helpful though.

by bnemec at June 07, 2018 08:34 PM

June 04, 2018

Steven Hardy

TripleO Containerized deployments, debugging basics

Containerized deployments, debugging basics

Since the Pike release, TripleO has supported deployments with OpenStack services running in containers.  Currently we use docker to run images based on those maintained by the Kolla project.

We already have some tips and tricks for container deployment debugging in tripleo-docs, but below are some more notes on my typical debug workflows.

Config generation debugging overview

In the TripleO container architecture, we still use Puppet to generate configuration files and do some bootstrapping, but it is run (inside a container) via a script, docker-puppet.py.

The config generation usage happens at the start of the deployment (step 1) and the configuration files are generated for all services (regardless of which step they are started in).

The input file used is /var/lib/docker-puppet/docker-puppet.json, but you can also filter this (e.g via cut/paste or jq as shown below) to enable debugging for specific services - this is helpful when you need to iterate on debugging a config generation issue for just one service.

[root@overcloud-controller-0 docker-puppet]# jq '[.[]|select(.config_volume | contains("heat"))]' /var/lib/docker-puppet/docker-puppet.json | tee /tmp/heat_docker_puppet.json
"puppet_tags": "heat_config,file,concat,file_line",
"config_volume": "heat_api",
"step_config": "include ::tripleo::profile::base::heat::api\n",
"config_image": ""
"puppet_tags": "heat_config,file,concat,file_line",
"config_volume": "heat_api_cfn",
"step_config": "include ::tripleo::profile::base::heat::api_cfn\n",
"config_image": ""
"puppet_tags": "heat_config,file,concat,file_line",
"config_volume": "heat",
"step_config": "include ::tripleo::profile::base::heat::engine\n\ninclude ::tripleo::profile::base::database::mysql::client",
"config_image": ""


Then we can run the config generation, if necessary changing the tags (or puppet modules, which are consumed from the host filesystem e.g /etc/puppet/modules) until the desired output is achieved:

[root@overcloud-controller-0 docker-puppet]# export NET_HOST='true'
[root@overcloud-controller-0 docker-puppet]# export DEBUG='true'
[root@overcloud-controller-0 docker-puppet]# export PROCESS_COUNT=1
[root@overcloud-controller-0 docker-puppet]# export CONFIG=/tmp/heat_docker_puppet.json
[root@overcloud-controller-0 docker-puppet]# python /var/lib/docker-puppet/docker-puppet.py
2018-02-09 16:13:16,978 INFO: 102305 -- Running docker-puppet
2018-02-09 16:13:16,978 DEBUG: 102305 -- CONFIG: /tmp/heat_docker_puppet.json
2018-02-09 16:13:16,978 DEBUG: 102305 -- config_volume heat_api
2018-02-09 16:13:16,978 DEBUG: 102305 -- puppet_tags heat_config,file,concat,file_line
2018-02-09 16:13:16,978 DEBUG: 102305 -- manifest include ::tripleo::profile::base::heat::api
2018-02-09 16:13:16,978 DEBUG: 102305 -- config_image


When the config generation is completed, configuration files are written out to /var/lib/config-data/heat.

We then compare timestamps against the /var/lib/config-data/heat/heat.*origin_of_time file (touched for each service before we run the config-generating containers), so that only those files modified or created by puppet are copied to /var/lib/config-data/puppet-generated/heat.

Note that we also calculate a checksum for each service (see /var/lib/config-data/puppet-generated/*.md5sum), which means we can detect when the configuration changes - when this happens we need paunch to restart the containers, even though the image did not change.
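Conceptually, the change detection is just "hash the generated config tree and compare with the stored value". A rough stand-in sketch (the scratch directory and file contents here are made up; the real implementation lives in the TripleO tooling):

```shell
# Build a stable checksum over a directory of generated config files.
dir=$(mktemp -d)
printf 'auth_strategy=keystone\n' > "$dir/heat.conf"
# Hash each file, then hash the sorted list of per-file hashes.
checksum=$(cd "$dir" && find . -type f | sort | xargs md5sum | md5sum | cut -d ' ' -f 1)
echo "$checksum"
```

If a freshly computed checksum differs from the stored one, the configuration changed and the container needs a restart.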

This checksum is added to the /var/lib/tripleo-config/hashed-docker-container-startup-config-step_*.json files, and these files are later used by paunch to decide if a container should be restarted (see below).


Runtime debugging, paunch 101

Paunch is a tool that orchestrates launching containers for each step, and performing any bootstrapping tasks not handled via docker-puppet.py.

It accepts a json format, which are the /var/lib/tripleo-config/docker-container-startup-config-step_*.json files that are created based on the enabled services (the content is directly derived from the service templates in tripleo-heat-templates)

These json files are then modified (as mentioned above) to add a TRIPLEO_CONFIG_HASH value to the container environment - these modified files are written with a different name, see /var/lib/tripleo-config/hashed-docker-container-startup-config-step_*.json

Note this environment variable isn't used by the container directly; it is used as a salt to trigger restarting containers when the configuration files in the mounted config volumes have changed.

As in the docker-puppet case it's possible to filter the json file with jq and debug e.g mounted volumes or other configuration changes directly.

It's also possible to test configuration changes by manually modifying /var/lib/config-data/puppet-generated/, then either restarting the container via docker restart, or modifying TRIPLEO_CONFIG_HASH and re-running paunch.

Note paunch will kill any containers tagged for a particular step e.g the --config-id tripleo_step4 --managed-by tripleo-Controller means all containers started during this step for any previous paunch apply will be killed if they are removed from your json during testing.  This is a feature which enables changes to the enabled services on update to your overcloud but it's worth bearing in mind when testing as described here.

[root@overcloud-controller-0]# cd /var/lib/tripleo-config/
[root@overcloud-controller-0 tripleo-config]# jq '{"heat_engine": .heat_engine}' hashed-docker-container-startup-config-step_4.json | tee /tmp/heat_startup_config.json
"heat_engine": {
"healthcheck": {
"test": "/openstack/healthcheck"
"image": "",
"environment": [
"volumes": [
"net": "host",
"privileged": false,
"restart": "always"
[root@overcloud-controller-0 tripleo-config]# paunch --debug apply --file /tmp/heat_startup_config.json --config-id tripleo_step4 --managed-by tripleo-Controller
stdout: dd60546daddd06753da445fd973e52411d0a9031c8758f4bebc6e094823a8b45

[root@overcloud-controller-0 tripleo-config]# docker ps | grep heat
dd60546daddd "kolla_start" 9 seconds ago Up 9 seconds (health: starting) heat_engine



Containerized services, logging

There are a couple of ways to access the container logs:

  • On the host filesystem, the container logs are persisted under /var/log/containers/<service>
  • docker logs <container id or name>
It is also often useful to use docker inspect <container id or name> to verify the container configuration, e.g the image in use and the mounted volumes etc.


Debugging containers directly

Sometimes logs are not enough to debug problems, and in this case you must interact with the container directly to diagnose the issue.

When a container is not restarting, you can attach a shell to the running container via docker exec:

[root@openstack-controller-0 ~]# docker exec -ti heat_engine /bin/bash
()[heat@openstack-controller-0 /]$ ps ax
1 ? Ss 0:00 /usr/local/bin/dumb-init /bin/bash /usr/local/bin/kolla_start
5 ? Ss 1:50 /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat
25 ? S 3:05 /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat
26 ? S 3:06 /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat
27 ? S 3:06 /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat
28 ? S 3:05 /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat
2936 ? Ss 0:00 /bin/bash
2946 ? R+ 0:00 ps ax


That's all for today, for more information please refer to tripleo-docs, or feel free to ask questions in #tripleo on Freenode!

by Steve Hardy at June 04, 2018 05:09 PM

June 01, 2018

James Slagle

TripleO and Ansible: config-download with Ansible Tower (part 3)

In my 2 previous posts, I’ve talked about TripleO’s config-download. If you
haven’t had a chance to read those yet, I suggest checking them out first.

One of the nice things about config-download is that it integrates nicely with
other Ansible based tooling. In particular, Ansible Tower (or Ansible AWX) can
be used to drive applying the overcloud configuration. For users and
operators who are already familiar with Tower, this provides a nice way to
manage and report on the overcloud deployment status with TripleO and Tower.

At a high level, this integration is broken down into the following steps on
the TripleO undercloud:

  1. Create the Heat stack
  2. Run openstack overcloud config download to download the ansible
    playbooks from Heat
  3. Run tripleo-ansible-inventory to create the Ansible inventory file
  4. Since Ansible Tower uses git (or other SCMs) to synchronize and manage
    Ansible project directories, we create a git repo from the config-download
    directory on the undercloud.

Switching over to Ansible Tower, we then:

  1. Create an organization
  2. Create SCM (git) credentials and machine credentials
  3. Create the Ansible project, pointing it at the git repository we created on
    the undercloud
  4. Create the inventory and inventory source, pointing it at the inventory file
    within the project directory we created with tripleo-ansible-inventory.
  5. Create a Job Template to run deploy_steps_playbook.yaml from the project
  6. Launch the Job Template

When the job finishes, we have a deployed and configured overcloud ready for
use by tenants.

Here’s a video of the demo showing the above steps:

Of course, we wouldn’t want to manually go through those steps every time. We
can instead automate them with an ansible playbook, and then execute the
playbook from the undercloud, or a different management node. An example
playbook that automates all the steps above can be seen here:

by James Slagle at June 01, 2018 09:51 PM

May 31, 2018

Carlos Camacho

TripleO deep dive session #13 (Containerized Undercloud)

This is the 13th release of the TripleO “Deep Dive” sessions

Thanks to Dan Prince & Emilien Macchi for this deep dive session about the next step of the TripleO’s Undercloud evolution.

In this session, they explain in detail the effort to re-architect the Undercloud to use containers, in order to reuse the containerized Overcloud ecosystem.

You can access the presentation or the Etherpad notes.

So please, check the full session content on the TripleO YouTube channel.

Please check the sessions index to have access to all available content.

by Carlos Camacho at May 31, 2018 12:00 AM

March 19, 2018

Giulio Fidente

Ceph integration topics at OpenStack PTG

I wanted to share a short summary of the discussions that happened around the Ceph integration (in TripleO) at the OpenStack PTG.

ceph-{container,ansible} branching

Together with John Fulton and Guillaume Abrioux (and after PTG, Sebastien Han) we put some thought into how to make the Ceph container images and ceph-ansible releases better fit the OpenStack model; the container images and ceph-ansible are in fact loosely coupled (not all versions of the container images work with all versions of ceph-ansible) and we wanted to move from a "rolling release" to a "point release" approach, mainly to permit regular maintenance of the previous versions known to work with the previous OpenStack versions. The plan goes more or less as follows:

  • ceph-{container,ansible} should be released together with the regular ceph updates
  • ceph-container will start using tags and stable branches like ceph-ansible does

The changes for the ceph/daemon docker images are visible already:

Multiple Ceph clusters

To better support the "edge computing" use case, we discussed adding support for the deployment of multiple Ceph clusters in the overcloud.

Together with John Fulton and Steven Hardy (and after PTG, Gregory Charot) we realized this could be done using multiple stacks and by doing so, hopefully simplify management of the "cells" and avoid potential issues due to orchestration of large clusters.

Much of this will build on Shardy's blueprint to split the control plane, see spec at:

The multiple Ceph clusters specifics will be tracked via another blueprint:

ceph-ansible testing with TripleO

We had a very good chat with John Fulton, Guillaume Abrioux, Wesley Hayutin and Javier Pena on how to get new pull requests for ceph-ansible tested with TripleO; basically, trigger an existing TripleO scenario on changes proposed to ceph-ansible.

Given ceph-ansible is hosted on github, Wesley and Javier suggested this should be possible with Zuul v3 and volunteered to help; some of the complications are about building an RPM from uncommitted changes for testing.

Move ceph-ansible triggering from workflow_tasks to external_deploy_tasks

This is a requirement for the Rocky release; we want to migrate away from using workflow_tasks and use external_deploy_tasks instead, to integrate into the "config-download" mechanism.

This work is tracked via a blueprint and we have a WIP submission on review:

We're also working with Sofer Athlan-Guyot on the enablement of Ceph in the upgrade CI jobs and with Tom Barron on scenario004 to deploy Manila with Ganesha (and CephFS) instead of the CephFS native backend.

Hopefully I didn't forget much; to stay updated on the progress join #tripleo on freenode or check our integration squad status at:

by Giulio Fidente at March 19, 2018 02:32 AM

March 06, 2018

Marios Andreou

My summary of the OpenStack Rocky PTG in Dublin

My summary of the OpenStack Rocky PTG in Dublin

I was fortunate to be part of the OpenStack PTG in Dublin this February. Here is a summary of the sessions I was able to attend. In the end, the second day of the TripleO meetup on Thursday was disrupted as we had to leave the PTG venue. However we still managed to cover a wide range of topics, some of which are summarized here.

In short, and in the order attended: FFU, Release cycles, TripleO.


  • session etherpad
  • There are at least 5 different ways of doing FFU! Deployment projects gave an update (tripleo, openstack-ansible, kolla, charms)
  • Some folks trying to do it manually (via operator feedback)
  • We will form a SIG (freenode #openstack-upgrades? ) –> first order of business is documenting something! Agreeing on best practices when FFU. –> meetings every 2 weeks?

Release Cycles

  • session etherpad
  • Release cadence to stay at 6 months for now. Wide discussion about the potential impacts of a longer release cycle including maintenance of stable branches, deployment project/integration testing and d/stream product release cycles, marketing, documentation and others. In the end the merits of a frequent upstream release cycle won, or at least, there was no consensus about getting a longer cycle.
  • On the other hand operators still think upgrades suck and don’t want to do it every six months. FFU is being relied on as the least painful way to do upgrades at a longer cadence than the upstream 6 month development cycle, which for now will stay as is.
  • There will be an extended maintenance tag or policy introduced for projects that will provide long term support (LTS) for stable branches


  • main tracking etherpad

  • retro session (emilienm) session etherpad some main points here are ‘do more and better ci’, communicate more and review at least a bit outside your squad, improve bugs triage, bring back deepdives.

  • ci session (weshay) session etherpad some main points here are ‘we need more attention on promotion’, upcoming features like new jobs (containerized undercloud, upgrades jobs), more communication with squads (upgrades ongoing for ex and continue to integrate the tripleo-upgrade role), python3 testing.

  • config download (slagle) session etherpad some main points are Rocky will bring config download and the ansible-playbook workflow for deployment of the environment, not just upgrades.

  • all in one (dprince) session etherpad some main points: using the containerized undercloud, have an ‘all-in-one’ role with only those services you need for your development at the given time. Some discussion around the potential CLI and pointers to more info.

  • tripleo for generic provisioning (shadower) session etherpad some main points are re-using the config download with external_deploy_tasks (idea is kubernetes or openshift deployed in a tripleo overcloud), some work still needed on the interfaces and discussion around ironic nodes and ansible.

  • upgrades (marios o/, chem, jistr, lbezdick) at session etherpad , some main points are improvements in the ci - tech debt (moving to using the tripleo-upgrade role now), containerized undercloud upgrade is coming in Rocky (emilien investigating), Rocky will be a stabilization cycle with focus on improvements to the operator experience including validations, backup/restore, documentation and cli/ui. Integration with UI might be considered during Rocky, to be revisited with the UI squad.

  • containerized undercloud (dprince, emilienm) session etherpad dprince gave a demonstration of a running containerized undercloud environment and reviewed the current work from the trello board. It is running well today and we can consider switching to default containerized undercloud in Rocky.

  • multiple ceph clusters (gfidente, johfulto), linked bug , discussion around possible approaches including having multiple heat stacks. gfidente or jfulton are better sources of info you are interested in this feature.

  • workflows api (thrash) session etherpad , some main points are fixing inconsistencies in workflows (should all have an output value, and not trying to get that from a zaqar message) and fixing usability, make a v2 tripleo mistral workflows api (tripleo-common) and re-organise the directories moving existing things under v1, look into optimizing the calls to swift to avoid a large number of individual object GET as currently happens.

  • UI (jtomasek) session etherpad some main points here are adding UI support for the new composable networks configuration, integration with the coming config-download deployment, continue to increase UI/CLI feature parity, allow deployment of multiple plans, prototype workflows to derive parameters for the operator based on input for specific scenarios (like HCI), investigate root device hints support and setting physical_network on particular nodes. Florian led a side session in the Hotel on Thursday morning, after we were kicked out of Croke Park stadium (#nodublin), where we discussed allowing operators to upload custom validations and prototyping the use of swift for storing validations.

  • You might note that there are errors in the html validator for this post, but it's late here and I’m in no mood to fight that right now. Yes, I know. cool story bro

March 06, 2018 03:00 PM

February 09, 2018

Steven Hardy

Debugging TripleO revisited - Heat, Ansible & Puppet

Some time ago I wrote a post about debugging TripleO heat templates, which contained some details of possible debug workflows when TripleO deployments fail.

In recent releases (since the Pike release) we've made some major changes to the TripleO architecture - we make more use of Ansible "under the hood", and we now support deploying containerized environments.  I described some of these architectural changes in a talk at the recent OpenStack Summit in Sydney.

In this post I'd like to provide a refreshed tutorial on typical debug workflow, primarily focussing on the configuration phase of a typical TripleO deployment, and with particular focus on interfaces which have changed or are new since my original debugging post.

We'll start by looking at the deploy workflow as a whole, then some heat interfaces for diagnosing the nature of the failure, then we'll look at how to debug directly via Ansible and Puppet.  In a future post I'll also cover the basics of debugging containerized deployments.

The TripleO deploy workflow, overview

A typical TripleO deployment consists of several discrete phases, which are run in order:

Provisioning of the nodes

  1. A "plan" is created (heat templates and other files are uploaded to Swift running on the undercloud
  2. Some validation checks are performed by Mistral/Heat then a Heat stack create is started (by Mistral on the undercloud)
  3. Heat creates some groups of nodes (one group per TripleO role e.g "Controller"), which results in API calls to Nova
  4. Nova makes scheduling/placement decisions based on your flavors (which can be different per role), and calls Ironic to provision the baremetal nodes
  5. The nodes are provisioned by Ironic

This first phase is the provisioning workflow; after it is complete, the nodes are reported ACTIVE by nova (e.g the nodes are provisioned with an OS and running).

Host preparation

The next step is to configure the nodes in preparation for starting the services, which again has a specific workflow (some optional steps are omitted for clarity):

  1. The node networking is configured, via the os-net-config tool
  2. We write hieradata for puppet to the node filesystem (under /etc/puppet/hieradata/*)
  3. We write some data files to the node filesystem (a puppet manifest for baremetal configuration, and some json files that are used for container configuration)

Service deployment, step-by-step configuration

The final step is to deploy the services, either on the baremetal host or in containers, this consists of several tasks run in a specific order:

  1. We run puppet on the baremetal host (even in the containerized architecture this is still needed, e.g to configure the docker daemon and a few other things)
  2. We run "" to generate the configuration files for each enabled service (this only happens once, on step 1, for all services)
  3. We start any containers enabled for this step via the "paunch" tool, which translates some json files into running docker containers, and optionally does some bootstrapping tasks.
  4. We run again (with a different configuration, only on one node the "bootstrap host"), this does some bootstrap tasks that are performed via puppet, such as creating keystone users and endpoints after starting the service.

Note that these steps are performed repeatedly with an incrementing step value (e.g step 1, 2, 3, 4, and 5), with the exception of the "" config generation which we only need to do once (we just generate the configs for all services regardless of which step they get started in).

Below is a diagram which illustrates this step-by-step deployment workflow:
TripleO Service configuration workflow
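In pseudo-Python, the loop looks roughly like this. This is a schematic of the workflow only (the real orchestration lives in the Heat templates and the generated tasks), with the task names invented for illustration:

```python
# Schematic of the step-by-step service configuration described above.
def configure_services(steps=(1, 2, 3, 4, 5)):
    tasks = []
    for step in steps:
        tasks.append(f"step{step}: run host puppet")
        if step == 1:
            # Config generation happens once, for all services, regardless
            # of which step each service is started in.
            tasks.append("step1: generate config files for all services")
        tasks.append(f"step{step}: start containers enabled for this step (paunch)")
        tasks.append(f"step{step}: run bootstrap puppet on the bootstrap host")
    return tasks

tasks = configure_services()
print("\n".join(tasks))
```

The key point the sketch captures is that the host puppet / container start / bootstrap tasks repeat for every step, while config generation does not.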

The most common deployment failures occur during this service configuration phase of deployment, so the remainder of this post will primarily focus on debugging failures of the deployment steps.


Debugging first steps - what failed?

Heat Stack create failed.

Ok something failed during your TripleO deployment, it happens to all of us sometimes!  The next step is to understand the root-cause.

My starting point after this is always to run:

openstack stack failures list --long <stackname>

(undercloud) [stack@undercloud ~]$ openstack stack failures list --long overcloud
resource_type: OS::Heat::StructuredDeployment
physical_resource_id: 421c7860-dd7d-47bd-9e12-de0008a4c106
status_reason: |
Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
deploy_stdout: |

PLAY [localhost] ***************************************************************


TASK [Run puppet host configuration for step 1] ********************************
ok: [localhost]

TASK [debug] *******************************************************************
fatal: [localhost]: FAILED! => {
"changed": false,
"failed_when_result": true,
"outputs.stdout_lines|default([])|union(outputs.stderr_lines|default([]))": [
"Debug: Runtime environment: puppet_version=4.8.2, ruby_version=2.0.0, run_mode=user, default_encoding=UTF-8",
"Error: Evaluation Error: Error while evaluating a Resource Statement, Unknown resource type: 'ugeas' at /etc/puppet/modules/tripleo/manifests/profile/base/docker.pp:181:5 on node overcloud-controller-0.localdomain"
to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/8dd0b23a-acb8-4e11-aef7-12ea1d4cf038_playbook.retry

PLAY RECAP *********************************************************************
localhost : ok=18 changed=12 unreachable=0 failed=1

We can tell several things from the output (which has been edited above for brevity); firstly, the name of the failing resource tells us:

  • The error was on one of the Controllers (ControllerDeployment)
  • The deployment failed during the per-step service configuration phase (the AllNodesDeploySteps part tells us this)
  • The failure was during the first step (Step1.0)
Then we see more clues in the deploy_stdout, ansible failed running the task which runs puppet on the host, it looks like a problem with the puppet code.

With a little more digging we can see which node exactly this failure relates to, e.g we copy the SoftwareDeployment ID from the output above, then run:

(undercloud) [stack@undercloud ~]$ openstack software deployment show 421c7860-dd7d-47bd-9e12-de0008a4c106 --format value --column server_id
(undercloud) [stack@undercloud ~]$ openstack server list | grep 29b3c254-5270-42ae-8150-9fc3f67d3d89
| 29b3c254-5270-42ae-8150-9fc3f67d3d89 | overcloud-controller-0 | ACTIVE | ctlplane= | overcloud-full | oooq_control |

Ok so puppet failed while running via ansible on overcloud-controller-0.


Debugging via Ansible directly

Having identified that the problem was during the ansible-driven configuration phase, one option is to re-run the same configuration directly via ansible-playbook, so you can either increase verbosity or potentially modify the tasks to debug the problem.

Since the Queens release, this is actually very easy, using a combination of the new "openstack overcloud config download" command and the tripleo dynamic ansible inventory.

(undercloud) [stack@undercloud ~]$ openstack overcloud config download
The TripleO configuration has been successfully generated into: /home/stack/tripleo-VOVet0-config
(undercloud) [stack@undercloud ~]$ cd /home/stack/tripleo-VOVet0-config
(undercloud) [stack@undercloud tripleo-VOVet0-config]$ ls
common_deploy_steps_tasks.yaml external_post_deploy_steps_tasks.yaml templates
Compute global_vars.yaml update_steps_playbook.yaml
Controller group_vars update_steps_tasks.yaml
deploy_steps_playbook.yaml post_upgrade_steps_playbook.yaml upgrade_steps_playbook.yaml
external_deploy_steps_tasks.yaml post_upgrade_steps_tasks.yaml upgrade_steps_tasks.yaml

Here we can see there is a "deploy_steps_playbook.yaml", which is the entry point to run the ansible service configuration steps.  This runs all the common deployment tasks (as outlined above) as well as any service specific tasks (these end up in task include files in the per-role directories, e.g Controller and Compute in this example).

We can run the playbook again on all nodes with the tripleo-ansible-inventory from tripleo-validations, which is installed by default on the undercloud:

(undercloud) [stack@undercloud tripleo-VOVet0-config]$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory deploy_steps_playbook.yaml --limit overcloud-controller-0
TASK [Run puppet host configuration for step 1] ********************************************************************
ok: []

TASK [debug] *******************************************************************************************************
fatal: []: FAILED! => {
"changed": false,
"failed_when_result": true,
"outputs.stdout_lines|default([])|union(outputs.stderr_lines|default([]))": [
"Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend",
"exception: connect failed",
"Warning: Undefined variable '::deploy_config_name'; ",
" (file & line not available)",
"Warning: Undefined variable 'deploy_config_name'; ",
"Error: Evaluation Error: Error while evaluating a Resource Statement, Unknown resource type: 'ugeas' at /etc/puppet/modules/tripleo/manifests/profile
/base/docker.pp:181:5 on node overcloud-controller-0.localdomain"


NO MORE HOSTS LEFT *************************************************************************************************
to retry, use: --limit @/home/stack/tripleo-VOVet0-config/deploy_steps_playbook.retry

PLAY RECAP ********************************************************************************************************* : ok=56 changed=2 unreachable=0 failed=1

Here we can see the same error is reproduced directly via ansible, and we made use of the --limit option to only run tasks on the overcloud-controller-0 node.  We could also have added --tags to limit the tasks further (see tripleo-heat-templates for which tags are supported).

If the error were ansible related, this would be a good way to debug and test any potential fixes to the ansible tasks, and in the upcoming Rocky release there are plans to switch to this model of deployment by default.


Debugging via Puppet directly

Since this error seems to be puppet related, the next step is to reproduce it on the host (obviously the steps above often yield enough information to identify the puppet error, but this assumes you need to do more detailed debugging directly via puppet):

Firstly we log on to the node, and look at the files in the /var/lib/tripleo-config directory.

(undercloud) [stack@undercloud tripleo-VOVet0-config]$ ssh heat-admin@
Warning: Permanently added '' (ECDSA) to the list of known hosts.
Last login: Fri Feb 9 14:30:02 2018 from gateway
[heat-admin@overcloud-controller-0 ~]$ cd /var/lib/tripleo-config/
[heat-admin@overcloud-controller-0 tripleo-config]$ ls
docker-container-startup-config-step_1.json docker-container-startup-config-step_4.json puppet_step_config.pp
docker-container-startup-config-step_2.json docker-container-startup-config-step_5.json
docker-container-startup-config-step_3.json docker-container-startup-config-step_6.json

The puppet_step_config.pp file is the manifest applied by ansible on the baremetal host.

We can debug any puppet host configuration by running puppet apply manually. Note that hiera is used to control the step value, this will be at the same value as the failing step, but it can also be useful sometimes to manually modify this for development testing of different steps for a particular service.

[root@overcloud-controller-0 tripleo-config]# hiera -c /etc/puppet/hiera.yaml step
[root@overcloud-controller-0 tripleo-config]# cat /etc/puppet/hieradata/config_step.json
{"step": 1}[root@overcloud-controller-0 tripleo-config]# puppet apply --debug puppet_step_config.pp
Error: Evaluation Error: Error while evaluating a Resource Statement, Unknown resource type: 'ugeas' at /etc/puppet/modules/tripleo/manifests/profile/base/docker.pp:181:5 on node overcloud-controller-0.localdomain

Here we can see the problem is a typo in the /etc/puppet/modules/tripleo/manifests/profile/base/docker.pp file at line 181, I look at the file, fix the problem (ugeas should be augeas) then re-run puppet apply to confirm the fix.

Note that with puppet module fixes you will need to get the fix either into an updated overcloud image, or update the module via deploy artifacts for testing local forks of the modules.

That's all for today, but in a future post, I will cover the new container architecture, and share some debugging approaches I have found helpful when deployment failures are container related.

by Steve Hardy at February 09, 2018 05:04 PM

December 11, 2017

James Slagle

TripleO and Ansible deployment (Part 1)

In the Queens release of TripleO, you’ll be able to use Ansible to apply the
software deployment and configuration of an Overcloud.

Before jumping into some of the technical details, I wanted to cover some
background about how the Ansible integration works along side some of the
existing tools in TripleO.

The Ansible integration goes as far as offering an alternative to the
communication between the existing Heat agent (os-collect-config) and the Heat
API. This alternative is opt-in for Queens, but we are exploring making it the
default behavior for future releases.

The default behavior for Queens (and all prior releases) will still use the
model where each Overcloud node has a daemon agent called os-collect-config
that periodically polls the Heat API for deployment and configuration data.
When Heat provides updated data, the agent applies the deployments, making
changes to the local node such as configuration, service management,
pulling/starting containers, etc.

The Ansible alternative instead uses a “control” node (the Undercloud) running
ansible-playbook with a local inventory file and pushes out all the changes to
each Overcloud node via ssh in the typical Ansible fashion.
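As a toy contrast of the two delivery models (illustrative Python only; the function names and data shapes are invented for this sketch, not TripleO code):

```python
# Toy contrast of the two configuration-delivery models described above.

def polling_pull(heat_deployments, node, applied_version):
    """os-collect-config style: each node polls Heat and applies new data."""
    data = heat_deployments[node]
    if data["version"] != applied_version:
        return data, data["version"]      # the agent applies the deployment
    return None, applied_version          # nothing new, poll again later

def ansible_push(playbook_tasks, inventory):
    """config-download style: the undercloud pushes tasks to every node via ssh."""
    return {node: list(playbook_tasks) for node in inventory}

inventory = ["overcloud-controller-0", "overcloud-novacompute-0"]
pushed = ansible_push(["configure", "restart services"], inventory)
print(pushed["overcloud-controller-0"])
```

The pull model reacts whenever Heat publishes new data; the push model applies everything in one ansible-playbook run driven from the control node.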

Heat is still the primary API, while the parameter and environment files that
get passed to Heat to create an Overcloud stack remain the same regardless of
which method is used.

Heat is also still fully responsible for creating and orchestrating all
OpenStack resources in the services running on the Undercloud (Nova servers,
Neutron networks, etc).

This sequence diagram will hopefully provide a clear picture:

Replacing the application and transport layer of the deployment with Ansible
allows us to take advantage of features in Ansible that will hopefully make
deploying and troubleshooting TripleO easier:

  • Running only specific deployments
  • Including/excluding specific nodes or roles from an update
  • More real time sequential output of the deployment
  • More robust error reporting
  • Faster iteration and reproduction of deployments

Using Ansible instead of the Heat agent is easy. Just include 2 extra cli args
in the deployment command:

-e /path/to/templates/environments/config-download-environment.yaml \

Once Heat is done creating the stack (will be much faster than usual), a
separate Mistral workflow will be triggered that runs ansible-playbook to
finish the deployment. The output from ansible-playbook will be streamed to
stdout so you can follow along with the progress.

Here’s a demo showing what a stack update looks like:

(I suggest making the demo full screen, or watch it here:

Note that we don’t get color output from ansible-playbook since we are
consuming the stdout from a Zaqar queue. However, in my next post I will go
into how to execute ansible-playbook manually, and detail all of the related
files (inventory, playbooks, etc) that are available to interact with manually.

If you want to read ahead, have a look at the official documentation:


by James Slagle at December 11, 2017 03:19 PM

July 19, 2017

Giulio Fidente

Understanding ceph-ansible in TripleO

One of the goals for the TripleO Pike release was to introduce ceph-ansible as an alternative to puppet-ceph for the deployment of Ceph.

More specifically, to put operators in control of the playbook execution as if they were launching ceph-ansible from the commandline, except it would be Heat starting ceph-ansible at the right time during the overcloud deployment.

This demanded some changes in different tools used by TripleO and went through a pretty long review process, eventually putting in place some useful bits for the future integration of Kubernetes and the migration to an ansible-driven deployment of the overcloud configuration steps in TripleO.

The idea was to add a generic functionality allowing triggering of a given Mistral workflow during the deployment of a service. Mistral could have then executed any action, including for example an ansible playbook, provided it was given all the necessary input data for the playbook to run and the roles list to build the hosts inventory.

This is how we did it.

Run ansible-playbook from Mistral (1)
An initial submission added support for the execution of ansible playbooks as workflow tasks in Mistral

A generic action for Mistral which workflows can use to run an ansible playbook. +2 to Dougal and Ryan.

Deploy external resources from Heat (2)
We also needed a new resource in Heat to be able to drive Mistral workflow executions so that we could orchestrate the executions like any other Heat resource. This is described much in detail in a Heat spec.

With these two, we could run an ansible playbook from a Heat resource, via Mistral. +2 to Zane and Thomas for the help! Enough to start messing in TripleO and glue things together.

Describe what/when to run in TripleO (3)
We added a mechanism in the TripleO templates to make it possible to describe, from within a service, a list of tasks or workflows to be executed at any given deployment step.

There aren't restrictions on what the tasks or workflows in the new section should do. These might deploy the service or prepare the environment for it or execute code (eg. build Swift rings). The commit message explains how to use it:

    - name: my_action_name
      action: std.echo output='hello world'

The above snippet would make TripleO run the Mistral std.echo action during the overcloud deployment, precisely at step 2, assuming you create a new service with the code above and enable it on a role.

For Ceph we wanted to run the new Mistral action (1) and needed to provide it with the config settings for the service, normally described within the config_settings structure of the service template.

Provide config_settings to the workflows (4)
The decision was to make all config settings available in the Mistral execution environment so that ansible actions could, for example, use them as extra_vars

Now all config settings normally consumed by puppet were available to the Mistral action and playbook settings could be added too, +2 Steven.

Build the data for the hosts inventory (5)
Together with the above, another small change provided into the execution environment a dictionary mapping every enabled service to the list of IP address of the nodes where the service is deployed

This was necessary to be able to build the ansible hosts inventory.
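For example, a mapping like the one described can be turned into a minimal INI-style inventory. The data and helper below are illustrative only (the real inventory-building code lives in tripleo-common and produces a richer structure):

```python
# Sketch: build a minimal Ansible INI inventory from a service -> IPs map.
# Service names and addresses here are made up for illustration.
service_ips = {
    "ceph_mon": ["192.168.24.10", "192.168.24.11"],
    "ceph_osd": ["192.168.24.20", "192.168.24.21", "192.168.24.22"],
}

def to_ini_inventory(service_ips):
    lines = []
    for service, ips in sorted(service_ips.items()):
        lines.append(f"[{service}]")      # one inventory group per service
        lines.extend(ips)
        lines.append("")                  # blank line between groups
    return "\n".join(lines).rstrip() + "\n"

print(to_ini_inventory(service_ips))
```

With groups keyed by service name, a playbook can then target e.g. `hosts: ceph_mon` directly.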

Create a workflow for ceph-ansible (6)
Having all pieces available to trigger the workflow and pass to it the service config settings, we needed the workflow which would run ceph-ansible plus some new, generic Mistral actions, to run smoothly multiple times (eg. stack updates)

This is the glue which runs a ceph-ansible playbook with the given set of parameters. +2 John.

Deploy Ceph via ceph-ansible (7)
Finally, the new services definition for TripleO to deploy Ceph in containers via ceph-ansible, including a couple of params operators can use to push arbitrary extra_vars for ceph-ansible into the Mistral environment.

The deployment with ceph-ansible is activated with the ceph-ansible.yaml environment file.

Interestingly, the templates to deploy Ceph using puppet-ceph are unchanged and continue to work as they used to, so for new deployments it is possible to use either the new implementation with ceph-ansible or the pre-existing implementation using puppet-ceph. Only ceph-ansible allows for the deployment of Ceph in containers.

Big +2 also to Jiri (who doesn't even need a blog or twitter) and all the people who helped during the development process with feedback, commits and reviews.

Soon another article with some usage examples and debugging instructions!

by Giulio Fidente at July 19, 2017 09:00 AM

July 07, 2017

Julie Pichon

TripleO Deep Dive: Internationalisation in the UI

Yesterday, as part of the TripleO Deep Dives series I gave a short introduction to internationalisation in TripleO UI: the technical aspects of it, as well as a quick overview of how we work with the I18n team.

You can catch the recording on BlueJeans or YouTube, and below's a transcript.


Life and Journey of a String

Internationalisation was added to the UI during Ocata - just a release ago. Florian implemented most of it and did the lion's share of the work, as can be seen on the blueprint if you're curious about the nitty-gritty details.

Addition to the codebase

Here's an example patch from during the transition. On the left you can see how things were hard-coded, and on the right you can see the new defineMessages() interface we now use. Obviously, new patches should look like the right-hand side from the start nowadays.

The defineMessages() dictionary requires a unique id and default English string for every message. Optionally, you can also provide a description if you think there could be confusion or to clarify the meaning. The description will be shown in Zanata to the translators - remember they see no other context, only the string itself.

For example, a string might sound active, as if it were related to an action or button, but actually be a descriptive help string. Some expressions are also known to be confusing in English: "provide a node" has been the source of multiple discussions, on list and live, so we might as well pre-empt questions and offer additional context to help the translators decide on an appropriate translation.

Extraction & conversion

Now we know how to add an internationalised string to the codebase - how do these get extracted into a file that will be uploaded to Zanata?

All of the following steps are described in the translation documentation in the tripleo-ui repository. Assuming you've already run the installation steps (basically, npm install):

$ npm run build

This does a lot more than just extracting strings - it prepares the code for being deployed in production. Once this ends you'll be able to find your newly extracted messages under the i18n directory:

$ ls i18n/extracted-messages/src/js/components

You can see the directory structure is kept the same as the source code. And if you peek into one of the files, you'll note the content is basically the same as what we had in our defineMessages() dictionary:

$ cat i18n/extracted-messages/src/js/components/Login.json
[
  {
    "id": "UserAuthenticator.authenticating",
    "defaultMessage": "Authenticating..."
  },
  {
    "id": "Login.username",
    "defaultMessage": "Username"
  },
  {
    "id": "Login.usernameRequired",
    "defaultMessage": "Username is required."
  }
]

However, JSON is not a format that Zanata understands by default. I think the latest version we upgraded to, or the next one might have some support for it, but since there's no i18n JSON standard it's somewhat limited. In open-source software projects, po/pot files are generally the standard to go with.

$ npm run json2pot

> tripleo-ui@7.1.0 json2pot /home/jpichon/devel/tripleo-ui
> rip json2pot ./i18n/extracted-messages/**/*.json -o ./i18n/messages.pot

> [react-intl-po] write file -> ./i18n/messages.pot ✔️

$ cat i18n/messages.pot
msgid ""
msgstr ""
"POT-Creation-Date: 2017-07-07T09:14:10.098Z\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"MIME-Version: 1.0\n"
"X-Generator: react-intl-po\n"

#: ./i18n/extracted-messages/src/js/components/nodes/RegisterNodesDialog.json
#. [RegisterNodesDialog.noNodesToRegister] - undefined
msgid "No Nodes To Register"
msgstr ""

#: ./i18n/extracted-messages/src/js/components/nodes/NodesToolbar/NodesToolbar.json
#. [Toolbar.activeFilters] - undefined
#: ./i18n/extracted-messages/src/js/components/validations/ValidationsToolbar.json
#. [Toolbar.activeFilters] - undefined
msgid "Active Filters:"
msgstr ""

#: ./i18n/extracted-messages/src/js/components/nodes/RegisterNodesDialog.json
#. [RegisterNodesDialog.addNew] - Small button, to add a new Node
msgid "Add New"
msgstr ""

#: ./i18n/extracted-messages/src/js/components/plan/PlanFormTabs.json
#. [PlanFormTabs.addPlanName] - Tooltip for "Plan Name" form field
msgid "Add a Plan Name"
msgstr ""

This messages.pot file is what will be automatically uploaded to Zanata.
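The conversion itself is done by react-intl-po, as shown above. Purely as an illustration of the mapping (a simplified sketch, not the real tool), turning one extracted message descriptor into a pot entry might look like:

```python
def to_pot_entry(path, descriptor):
    # One gettext pot entry: source file reference, id/description
    # comment ("undefined" when no description was provided, as in
    # the output above), msgid from the default English string,
    # and an empty msgstr for the translator to fill in.
    lines = [
        "#: %s" % path,
        "#. [%s] - %s" % (descriptor["id"],
                          descriptor.get("description", "undefined")),
        'msgid "%s"' % descriptor["defaultMessage"],
        'msgstr ""',
    ]
    return "\n".join(lines)

entry = to_pot_entry(
    "./i18n/extracted-messages/src/js/components/Login.json",
    {"id": "Login.username", "defaultMessage": "Username"})
print(entry)
```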

Infra: from the git repo, to Zanata

The following steps are done by the infrastructure scripts. There's infra documentation on how to enable translations for your project; in our case, as the first internationalised JavaScript project, we had to update the scripts a little as well. This is useful to know if an issue happens with the infra jobs; debugging will probably bring you here.

The scripts live in the project-config infra repo and there are three files of interest for us:

One of these files is of particular interest to us: it simply sets up the project on line 76, then sends the pot file up to Zanata on line 115.

What does "setting up the project" entail? It's a function, defined in another of those files, that pretty much runs the steps we talked about in the previous section, and also creates a config file to talk to Zanata.

Monitoring the post jobs

Post jobs run after a patch has already merged - usually to upload tarballs where they should be, update the documentation pages, etc, and also upload messages catalogues onto Zanata. Being a 'post' job however means that if something goes wrong, there is no notification on the original review so it's easy to miss.

Here's the OpenStack Health page to monitor 'post' jobs related to tripleo-ui. Scroll to the bottom - hopefully tripleo-ui-upstream-translation-update is still green! It's good to keep an eye on it although it's easy to forget. Thankfully, AJaeger from #openstack-infra has been great at filing bugs and letting us know when something does go wrong.

Debugging when things go wrong: an example

We had a couple of issues whereby a linebreak gets introduced into one of the strings, which works fine in JSON but breaks our pot file. If you look at the content from the bug (the full logs are no longer accessible):

2017-03-16 12:55:13.468428 | + zanata-cli -B -e push --copy-trans False
2017-03-16 12:55:15.391220 | [INFO] Found source documents:
2017-03-16 12:55:15.391405 | [INFO]            i18n/messages
2017-03-16 12:55:15.531164 | [ERROR] Operation failed: missing end-quote

You'll notice the first line is the last function we call in the script; for debugging that gives you an idea of the steps to follow to reproduce. The upstream Zanata instance also lets you create toy projects, if you want to test uploads yourself (this can't be done directly on the OpenStack Zanata instance.)

This particular newline issue has popped up a couple of times already. We're treating it with band-aids at the moment, ideally we'd get a proper test on the gate to prevent it from happening again: this is why this bug is still open. I'm not very familiar with JavaScript testing and haven't had a chance to look into it yet; if you'd like to give it a shot that'd be a useful contribution :)

Zanata, and contributing translations

The OpenStack Zanata instance is where the translators do their work. Here's the page for tripleo-ui; you can see there is one project per branch (stable/ocata and master, for now). Sort by "Percent Translated" to see the languages currently translated. Here's an example of the translator's view, for Spanish: you can see the English string on the left, and the translator fills in the right side. No context! Just strings.

At this stage of the release cycle, the focus would be on 'master,' although it is still early to do translations; there is a lot of churn still.

If you'd like to contribute translations, the I18n team has good documentation about how to go about it. The short version: sign up on Zanata, request to join your language team, and once you're approved - you're good to go!

Return of the string

Now that we have our strings available in multiple languages, it's time for another infra job to kick in and bring them into our repository. This is where another of the infra scripts comes in: we pull the po files from Zanata, convert them to JSON, then do a git commit that will be proposed to Gerrit.

The cleanup step does more than it might seem. It checks if files are translated over a certain ratio (~75% for code), which avoids adding new languages when there might only be one or two words translated (e.g. someone just testing Zanata to see how it works). Switching to your language and yet having the vast majority of the UI still appear in English is not a great user experience.

In theory, files that were added but are now below 40% should get automatically removed, however this doesn't quite work for JavaScript at the moment - another opportunity to help! Manual cleanups can be done in the meantime, but it's a rare event so not a major issue.
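The threshold logic could be sketched like this (a hypothetical helper mirroring the ~75% and 40% figures above; the real infra script works on actual po file statistics):

```python
def classify_language(translated, total,
                      add_threshold=0.75, drop_threshold=0.40):
    # Decide what to do with a language's catalogue based on how
    # much of it is translated: well-translated catalogues are
    # kept, nearly-empty ones removed, and those in between are
    # kept only if the language is already in the repository.
    ratio = translated / float(total)
    if ratio >= add_threshold:
        return "keep"
    if ratio < drop_threshold:
        return "remove"
    return "keep-existing-only"

print(classify_language(80, 100))  # prints: keep
```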

Monitoring the periodic jobs

Zanata is checked once a day, every morning; there is an OpenStack Health page for this as well. You can see there are two jobs at the moment (hopefully green!), one per branch: tripleo-ui-propose-translation-update and tripleo-ui-propose-translation-update-ocata. The job should run every day even if there are no updates - it simply means there might not be a git review proposed at the end.

We haven't had issues with the periodic job so far, though the debugging process would be the same: figure out based on the failure if it is happening at the infra script stage or in one of our commands (e.g. npm run po2json), try to reproduce and fix. I'm sure super-helpful AJaeger would also let us know if he were to notice an issue here.

Automated patches

You may have seen the automated translation updates pop up on Gerrit. The commit message has some tips on how to review these: basically don't agonise over the translation contents, as problems there should be handled in Zanata anyway; just make sure the format looks good and is unlikely to break the code. A JSON validation tool runs during the infra prep step in order to "prettify" the JSON blob and limit the size of the diffs, so once the patch makes it out to Gerrit we know the JSON is at least well-formed.

Try to review these patches quickly, out of respect for the translators' work. It's not very nice to spend a lot of time translating a project and yet not have your work included because no one could be bothered to merge it :)

A note about new languages...

If the automated patch adds a new language, there'll be an additional step required after merging the translations in order to enable it: adding a string with the language name to a constants file. Until recently, this took 3 or 4 steps - thanks to Honza for making it much simpler!

This concludes the technical journey of a string. If you'd like to help with i18n tasks, we have a few related bugs open. They go from very simple low-hanging-fruits you could use to make your first contribution to the UI, to weird buttons that have translations available yet show in English but only in certain modals, to the kind of CI resiliency tasks I linked to earlier. Something for everyone! ;)

Working with the I18n team

It's really all about communication. Starting with...

Release schedule and string freezes

String freezes are noted on the main schedule but tend to fit the regular cycle-with-milestones work. This is a problem for a cycle-trailing project like tripleo-ui as we could be implementing features up to 2 weeks after the other projects, so we can't freeze strings that early.

There were discussions at the Atlanta PTG around whether the I18n team should care at all about projects that don't respect the freeze deadlines. That would have made it impossible for projects like ours to ever make it onto the I18n official radar. The compromise was that cycle-trailing projects should have an I18n cross-project liaison who communicates with the I18n PTL and team to inform them of deadlines, and also that they ignore the Soft Freeze and only do a Hard Freeze.

This will all be documented under an i18n governance tag; while waiting for it the notes from the sessions are available for the curious!

What's a String Freeze again?

The two are defined on the schedule: a soft freeze means not allowing changes to existing strings, as changes invalidate the translators' work and force them to retranslate; a hard freeze means no additions, changes or anything else, in order to give translators a chance to catch up.

When we looked at Zanata earlier, there were translation percentages beside each language: the goal is always the satisfaction of reaching 100%. If we keep adding new strings then the goalpost keeps moving, which is discouraging and unfair.

Of course there's also an "exception process" when needed, to ask for permission to merge a string change with an explanation or at least a heads-up, by sending an email to the openstack-i18n mailing list. Not to be abused :)

Role of the I18n liaison

...Liaise?! Haha. The role is defined briefly on the Cross-Projects Liaison wiki page. It's much more important toward the end of the cycle, when the codebase starts to stabilise, there are fewer changes and translators look at starting their work to be included in the release.

In general it's good to hang out on the #openstack-i18n IRC channel (very low traffic), attend the weekly meeting (it alternates times), be available to answer questions, and keep the PTL informed of the I18n status of the project. In the case of cycle-trailing projects (quite a new release model still), it's also important to be around to explain the deadlines.

A couple of examples having an active liaison helps with:

  • Toward the end or after the release, once translations into the stable branch have settled, the stable translations get copied into the master branch on Zanata. The strings should still be fairly similar at that point and it avoids translators having to re-do the work. It's a manual process, so you need to let the I18n PTL know when there are no longer changes to stable/*.
  • Last cycle, because the cycle-trailing status of tripleo-ui was not correctly documented, a Zanata upgrade was planned right after the main release - which for us ended up being right when the codebase had stabilised enough and several translators had planned to be most active. Would have been solved with better, earlier communication :)


After the Ocata release, I sent a few screenshots of tripleo-ui to the i18n list so translators could see the result of their work. I don't know if anybody cared :-) But unlike Horizon, which has an informal test system available for translators to check their strings during the RC period, most of the people who volunteered translations had no idea what the UI looked like. It'd be cool if we could offer a test system with regular string updates next release - maybe just an undercloud on the new RDO cloud? Deployment success/failures strings wouldn't be verifiable but the rest would, while the system would be easier to maintain than a full dev TripleO environment - better than nothing. Perhaps an idea for the Queens cycle!

The I18n team has a priority board on the Zanata main page (only visible when logged in I think). I'm grateful to see TripleO UI in there! :) Realistically we'll never move past Low or perhaps Medium priority which is fair, as TripleO doesn't have the same kind of reach or visibility that Horizon or the installation guides do. I'm happy that we're included! The OpenStack I18n team is probably the most volunteer-driven team in OpenStack. Let's be kind, respect string freezes and translators' time! \o/


by jpichon at July 07, 2017 11:45 AM

March 02, 2017

Julie Pichon

OpenStack Pike PTG: TripleO, TripleO UI | Some highlights

For the second part of the PTG (vertical projects), I mainly stayed in the TripleO room, moving around a couple of times to attend cross-project sessions related to i18n.

Although I always wish I understood more/everything, in the end my areas of interest (and current understanding!) in TripleO are around the UI, installing and configuring it, the TripleO CLI, and the tripleo-common Mistral workflows. Therefore the couple of thoughts in this post are mainly relevant to these - if you're looking for a more exhaustive summary of the TripleO discussions and decisions made during the PTG, I recommend reading the PTL's excellent thread about this on the dev list, and the associated etherpads.

Random points of interest

  • Containers is the big topic and had multiple sessions dedicated to it, both single and cross-projects. Many other sessions ended up revisiting the subject as well, sometimes with "oh that'll be solved with containers" and sometimes with "hm good but that won't work with containers."
  • A couple of API-breaking changes may need to happen in TripleO Heat Templates (e.g. for NFV, passing a role mapping vs a role name around). The recommendation is to get this in as early as possible (by the first milestone) and communicate it well for out-of-tree services.
  • When needing to test something new on the CI, look at the existing scenarios and prioritise adding/changing something there to test for what you need, as opposed to trying to create a brand new job.
  • Running Mistral workflows as part of or after the deployment came up several times and was even a topic during a cross-project Heat / Mistral / TripleO session. Things can get messy switching between Heat, Mistral and Puppet. Where should these workflows live (THT, tripleo-common)? Service-specific workflows (pre/post-deploy) are definitely something people want, and there's a need to standardise how to do that. Ceph's likely to be the first to try their hand at this.
  • One lively cross-project session with OpenStack Ansible and Kolla was about parameters in configuration files. Currently whenever a new feature is added to Nova or whatever service, Puppet and so on need to be updated manually. The proposal is to make a small change to oslo.config to enable it to give an output in machine-readable YAML which can then be consumed (currently the config generated is only human readable). This will help with validations, and it may help to only have to maintain a structure as opposed to a template.
  • Heat folks had a feedback session with us about the TripleO needs. They've been super helpful with e.g. helping to improve our memory usage over the last couple of cycles. My takeaway from this session was "beware/avoid using YAQL, especially in nested stacks." YAQL is badly documented and everyone ends up reading the source code and tests to figure out how to do things. Bringing Jinja2 into Heat, or some kind of way to have repeated patterns from resources (e.g. based on a file), also came up and was cautiously acknowledged.
  • Predictable IP assignment on the control plane is a big enough issue that some people are suggesting dropping Neutron in the undercloud over it. We'd lose so many other benefits though, that it seems unlikely to happen.
  • Cool work incoming allowing built-in network examples to Just Work, based on a sample configuration. Getting the networking stuff right is a huge pain point and I'm excited to hear this should be achievable within Pike.

Python 3

Python 3 is an OpenStack community goal for Pike.

Tripleo-common and python-tripleoclient both have voting unit test jobs for Python 3.5, though I trust them only moderately, for a number of reasons. For example, many of the tests tend to focus on the happy path, and I've seen and fixed Python 3-incompatible code in exception handling several times (the removed 'message' attribute on exceptions seems an easy trap to fall into), despite the unit testing jobs being all green. Apparently there are coverage jobs we could enable for the client, to ensure the coverage ratio doesn't drop.

Python 3 for functional tests was also brought up. We don't have functional tests in any of our projects, and it's not clear what value we would get out of them (mocking servers) compared to the unit testing and all the integration testing we already do. Increasing unit test coverage was deemed a more valuable goal to pursue for now.

There are other issues around functional/integration testing with Python 3 which will need to be resolved (though likely not during Pike). For example our integration jobs run on CentOS and use packages, which won't be Python 3 compatible yet (cue SCL and the need to respin dependencies). If we do add functional tests, perhaps it would be easier to have them run on a Fedora gate (although if I recall correctly gating on Fedora was investigated once upon a time at the OpenStack level, but caused too many issues due to churn and the lack of long-term releases).

Another issue with Python 3 support and functional testing is that the TripleO client depends on Mistral server (due to the Series Of Unfortunate Dependencies I also mentioned in the last post). That means Mistral itself would need to fully support Python 3 as well.

Python 2 isn't going anywhere just yet so we still have time to figure things out. The conclusions, as described in Emilien's email seem to be:

  • Improve the unit test coverage
  • Enable the coverage job in CI
  • Investigate functional testing for python-tripleoclient to start with, see if it makes sense and is feasible

Sample environment generator

Currently environment files in THT are written by hand and quite inconsistent. This is also important for the UI, which needs to display this information. For example currently the environment general description is in a comment at the top of the file (if it exists at all), which can't be accessed programmatically. Dependencies between environment files are not described either.

To make up for this, currently all that information lives in the capabilities map, but it's external to the templates themselves, needs to be updated manually, and gets out of sync easily.

The sample environment generator to fix this has been out there for a year, and currently has two blockers. First, it needs a way to determine which parameters are private (that is, parameters that are expected to be passed in by another template and shouldn't be set by the user).

One way could be to use a naming convention, perhaps an underscore prefix similar to Python. Parameter groups cannot be used because of a historical limitation, there can only be one group (so you couldn't be both Private and Deprecated). Changing Heat with a new feature like Nested Groups or generic Parameter Tags could be an option. The advantage of the naming convention is that it doesn't require any change to Heat.
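As a sketch of how the naming convention could be applied (a hypothetical helper, not an existing tool; the parameter names are made up), a generator or validator could split parameters like so:

```python
def split_parameters(parameters):
    # Split a template's parameters into private and public based
    # on the underscore-prefix naming convention discussed above.
    # Private parameters would be omitted from generated sample
    # environments since users aren't expected to set them.
    private = {n: p for n, p in parameters.items()
               if n.startswith("_")}
    public = {n: p for n, p in parameters.items()
              if not n.startswith("_")}
    return private, public

private, public = split_parameters({
    "_InternalServiceMap": {"type": "json"},
    "CephExtraConfig": {"type": "json"},
})
print(sorted(private), sorted(public))
```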

From the UI perspective, validating whether an environment or template redefines parameters already defined elsewhere also matters: if a parameter has the same name, it needs to be set to the same value everywhere, or it's uncertain what the final value will end up as.

I think the second issue was that the general environment description can only be a comment at the moment, there is no Heat parameter to do this. The Heat experts in the room seemed confident this is non-controversial as a feature and should be easy to get in.

Once the existing templates are updated to match the new format, the validation should be added to CI to make sure that any new patch with environments does include these parameters. Having "description" show up as an empty string when generating a new environment will make it more obvious that something can/should be there, while it's easy to forget about it with the current situation.

The agreement was:

  • Use underscores as a naming convention to start with
  • Start with a comment for the general description

Once we get the new Heat description attribute we can move things around. If parameter tags get accepted, likewise we can automate moving things around. Tags would also be useful to the UI, to determine what subset of relevant parameters to display to the user in smaller forms (easier to understand than one form with several dozen fields showing up all at once). Tags, rather than parameter groups, are required because of the aforementioned issue: groups are already used for deprecation and a parameter can only have one group.

Trusts and federation

This was a cross-project session together with Heat, Keystone and Mistral. A "Trust" lets you delegate or impersonate a user with a subset of their rights. From my experience in TripleO, this is particularly useful with long-running Heat stacks, as an authentication token expires after a few hours, which means you lose the ability to do anything in the middle of an operation.

Trusts have been working very well for Heat since 2013. Before that they had to encrypt the user password in order to ask for a new token when needed, which all agreed was pretty horrible and not anything people want to go back to. Unfortunately with the federation model and using external Identity Providers, this is no longer possible. Trusts break, but some kind of delegation is still needed for Heat.

There were a lot of tangents about security issues (obviously!), revocation, expiration, role syncing. From what I understand Keystone currently validates Trusts to make sure the user still has the requested permissions (that the relevant role hasn't been removed in the meantime). There's a desire to have access to the entire role list, because the APIs currently don't let us check which role is necessary to perform a specific action. Additionally, when Mistral workflows launched from Heat get in, Mistral will create its own Trusts and Heat can't know what that will do. In the end you always kinda end up needing to do everything. Karbor is running into this as well.

No solution was discovered during the session, but I think all sides were happy enough that the problem and use cases have been clearly laid out and are now understood.

TripleO UI

Some of the issues relevant to the UI were brought up in other sessions, like standardising the environment files. Other issues brought up were around plan management, for example why do we use the Mistral environment in addition to Swift? Historically it looks like it's because it was a nice drop-in replacement for the defunct TripleO API and offered a similar API. Although it won't have an API by default, the suggestion is to move to using a file to store the environment during Pike and have a consistent set of templates: this way all the information related to a deployment plan will live in the same place. This will help with exporting/importing plans, which itself will help with CLI/UI interoperability (for instance there are still some differences in how and when the Mistral environment is generated, depending on whether you deployed with the CLI or the UI).

A number of other issues were brought up around networking, custom roles, tracking deployment progress, and a great many other topics, but I think the larger problems around plan management were the only ones expected to turn into a spec, now proposed for review.

I18n and release models

After the UI session I left the TripleO room to attend a cross-project session about i18n, horizon and release models. The release model point is particularly relevant because the TripleO UI is a newly internationalised project as of Ocata and the first to be cycle-trailing (TripleO releases a couple of weeks after the main OpenStack release).

I'm glad I was able to attend this session. For one, it was really nice to collaborate directly with the i18n and release management teams, and catch up with a couple of Horizon people. For another, it turns out tripleo-ui was not properly documented as cycle-trailing (fixed now!) and that led to other issues.

Having different release models is a source of headaches for the i18n community, already stretched thin. It means string freezes happen at different times and stable branches are cut at different points, which creates a lot of tracking work for the i18n PTL to figure out which project is ready and do the required manual work to update Zanata upon branching. One part of the solution is likely to figure out if we can script the manual parts of this workflow so that when the release patch (which creates the stable branch) is merged, the change is automatically reflected in Zanata.

For the non-technical aspects of the work (mainly keeping track of deadlines and string freezes) the decision was that if you want to be translated, then you need to respect the same deadlines as the cycle-with-milestones projects do on the main schedule; if a project doesn't want to - if it doesn't respect the freezes or cut branches when expected - then it will be dropped from the i18n priority dashboard in Zanata. This was particularly relevant for Horizon plugins, as there are about a dozen of them now, with various degrees of diligence when it comes to doing releases.

These expectations will be documented in a new governance tag, something like i18n:translated.

Obviously this would mean that cycle-trailing projects would likely never be able to get the tag. The work we depend on lands late and so we continue making changes up to two weeks after each of the documented deadlines. ianychoi, the i18n PTL, seemed happy to take these projects under the i18n wing and do the manual work required, as long as there is an active i18n liaison for the project communicating with the i18n team to keep them informed about freezes and new branches. This seemed to work ok for us during Ocata so I'm hopeful we can keep that model. It's not entirely clear to me if this will also be documented/included in the governance tag so I look forward to reading the spec once it is proposed! :)

In the case of tripleo-ui we're not a priority project for translations nor looking to be, but we still rely on the i18n PTL to create Zanata branches and merge translations for us, and would like to continue with being able to translate stable branches as needed.


The CI Q&A session on Friday morning was amazingly useful and unanimously agreed it should be moved to the documentation (not done yet). If you've ever scratched your head about something related to TripleO CI, have a look at the etherpad!

by jpichon at March 02, 2017 09:55 AM

Last updated: November 26, 2018 01:32 PM

TripleO: OpenStack Deployment   Documentation | Code Reviews | CI Status | Zuul Queue | Planet