Planet TripleO

July 19, 2017

Giulio Fidente

Understanding ceph-ansible in TripleO

One of the goals for the TripleO Pike release was to introduce ceph-ansible as an alternative to puppet-ceph for the deployment of Ceph.

More specifically, to put operators in control of the playbook execution as if they were launching ceph-ansible from the commandline, except it would be Heat starting ceph-ansible at the right time during the overcloud deployment.

This demanded some changes in different tools used by TripleO and went through a pretty long review process, eventually putting in place some useful bits for the future integration of Kubernetes and the migration to an ansible-driven deployment of the overcloud configuration steps in TripleO.

The idea was to add a generic capability allowing a given Mistral workflow to be triggered during the deployment of a service. Mistral could then execute any action, including for example an ansible playbook, provided it was given all the necessary input data for the playbook to run and the roles list to build the hosts inventory.

This is how we did it.

Run ansible-playbook from Mistral (1)
An initial submission added support for the execution of ansible playbooks as workflow tasks in Mistral https://github.com/openstack/tripleo-common/commit/e6c8a46f00436edfa5de92e97c3a390d90c3ce54

A generic action for Mistral which workflows can use to run an ansible playbook. +2 to Dougal and Ryan.

Deploy external resources from Heat (2)
We also needed a new resource in Heat to be able to drive Mistral workflow executions https://github.com/openstack/heat/commit/725b404468bdd2c1bdbaf16e594515475da7bace so that we could orchestrate the executions like any other Heat resource. This is described in much more detail in a Heat spec.

With these two, we could run an ansible playbook from a Heat resource, via Mistral. +2 to Zane and Thomas for the help! Enough to start messing in TripleO and glue things together.

Describe what/when to run in TripleO (3)
We added a mechanism in the TripleO templates to make it possible to describe, from within a service, a list of tasks or workflows to be executed at any given deployment step https://github.com/openstack/tripleo-heat-templates/commit/71f13388161cbab12fe284f7b251ca8d36f7635c

There aren't restrictions on what the tasks or workflows in the new section should do. These might deploy the service, prepare the environment for it, or execute arbitrary code (e.g. build Swift rings). The commit message explains how to use it:

service_workflow_tasks:
  step2:
    - name: my_action_name
      action: std.echo
      input:
        output: 'hello world'

The above snippet would make TripleO run the Mistral std.echo action during the overcloud deployment, precisely at step 2, assuming you create a new service with the code above and enable it on a role.

For Ceph we wanted to run the new Mistral action (1) and needed to provide it with the config settings for the service, normally described within the config_settings structure of the service template.

Provide config_settings to the workflows (4)
The decision was to make all config settings available in the Mistral execution environment so that ansible actions could, for example, use them as extra_vars https://github.com/openstack/tripleo-heat-templates/commit/8b81b363fd48b0080b963fd2b1ab6bfe97b0c204

Now all config settings normally consumed by puppet were available to the Mistral action and playbook settings could be added too, +2 Steven.

Build the data for the hosts inventory (5)
Together with the above, another small change added to the execution environment a dictionary mapping every enabled service to the list of IP addresses of the nodes where that service is deployed https://github.com/openstack/tripleo-heat-templates/commit/9c1940e461867f2ce986a81fa313d7995592f0c5

This was necessary to be able to build the ansible hosts inventory.
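To give an idea of the shape of this data, the mapping might look roughly like the following (the key names and addresses here are illustrative only, not taken from the actual commit):

ceph_mon_node_ips: [192.168.24.10, 192.168.24.11, 192.168.24.12]
ceph_osd_node_ips: [192.168.24.20, 192.168.24.21]

from which the workflow can derive the host groups (e.g. mons and osds) of the ansible inventory.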

Create a workflow for ceph-ansible (6)
Having all the pieces available to trigger the workflow and pass it the service config settings, we needed the workflow which would run ceph-ansible https://github.com/openstack/tripleo-common/commit/fa0b9f52080580b7408dc6f5f2da6fc1dc07d500 plus some new, generic Mistral actions, so that it runs smoothly multiple times (e.g. on stack updates) https://github.com/openstack/tripleo-common/commit/f81372d85a0a92de455eeaa93162faf09be670cf

This is the glue which runs a ceph-ansible playbook with the given set of parameters. +2 John.

Deploy Ceph via ceph-ansible (7)
Finally, the new services definition for TripleO https://review.openstack.org/#/c/465066/ to deploy Ceph in containers via ceph-ansible, including a couple of params operators can use to push arbitrary extra_vars for ceph-ansible into the Mistral environment.

The deployment with ceph-ansible is activated with the ceph-ansible.yaml environment file.
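For reference, enabling it would look something along these lines, alongside whatever other environment files your deployment already uses (the exact path of the environment file may vary across releases):

openstack overcloud deploy --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml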

Interestingly, the templates to deploy Ceph using puppet-ceph are unchanged and continue to work as they used to, so for new deployments it is possible to use either the new implementation based on ceph-ansible or the pre-existing implementation using puppet-ceph. Only ceph-ansible allows for the deployment of Ceph in containers.

Big +2 also to Jiri (who doesn't even need a blog or twitter) and all the people who helped during the development process with feedback, commits and reviews.

Soon another article with some usage examples and debugging instructions!

by Giulio Fidente at July 19, 2017 09:00 AM

July 14, 2017

Carlos Camacho

Create a TripleO snapshot before breaking it...

The idea of this post is to show how developers can save some time by creating snapshots of their development environments instead of redeploying them each time they break.

So, don’t waste time re-deploying your environment when testing submissions.

I’ll show here how to be a little more agile when deploying your Undercloud/Overcloud for testing purposes.

Deploying a fully working development environment takes around 3 hours with human supervision… And breaking it just after it's deployed is not cool at all…

Step 1

Deploy your environment as usual.

Step 2

Create your Undercloud/Overcloud snapshots. Do this as the stack user, otherwise virsh won't see the VMs.

# The VMs deployed are, for example:
# vms=( "undercloud" "control_0" "compute_0" )
# or they can be detected dynamically:
vms=( $(virsh list --all | grep running | awk '{print $2}') )

# List all VMs
virsh list --all

# List current snapshots
for i in "${vms[@]}"; \
do \
virsh snapshot-list --domain "$i"; \
done

# Dump each VM's XML and check for qemu
for i in "${vms[@]}"; \
do \
virsh dumpxml "$i" | grep -i qemu; \
done

# Create an initial snapshot for each VM
for i in "${vms[@]}"; \
do \
echo "virsh snapshot-create-as --domain $i --name $i-fresh-install --description $i-fresh-install --atomic"; \
virsh snapshot-create-as --domain "$i" --name "$i"-fresh-install --description "$i"-fresh-install --atomic; \
done

# List current snapshots (they should now have been created)
for i in "${vms[@]}"; \
do \
virsh snapshot-list --domain "$i"; \
done

#########################################################################################################
# Current libvirt version does not support live snapshots.
# error: Operation not supported: live disk snapshot not supported with this QEMU binary
# --disk-only and --live not yet available.

# Create the folder for the images
# cd
# mkdir ~/backup_images

# for i in "${vms[@]}"; \
# do \
# echo "<domainsnapshot>" > $i.xml; \
# echo "  <memory snapshot='external' file='/home/stack/backup_images/$i.mem.snap2'/>" >> $i.xml; \
# echo "  <disks>" >> $i.xml; \
# echo "    <disk name='vda'>" >> $i.xml; \
# echo "      <source file='/home/stack/backup_images/$i.disk.snap2'/>" >> $i.xml; \
# echo "    </disk>" >> $i.xml; \
# echo "  </disks>" >> $i.xml; \
# echo "</domainsnapshot>" >> $i.xml; \
# done

# for i in "${vms[@]}"; \
# do \
# echo "virsh snapshot-create $i --xmlfile ~/$i.xml --atomic"; \
# virsh snapshot-create $i --xmlfile ~/$i.xml --atomic; \
# done

Step 3

Break your environment xD

Step 4

Restore your snapshots

# Commented for safety reasons...
# i=compute_0
i=blehblehbleh
virsh list --all
virsh shutdown $i
sleep 120
virsh list --all
virsh snapshot-revert --domain $i --snapshotname $i-fresh-install --running
virsh list --all
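
Once a snapshot is no longer needed, it can be removed with snapshot-delete, for example:

virsh snapshot-delete --domain $i --snapshotname $i-fresh-install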

by Carlos Camacho at July 14, 2017 12:00 AM

July 07, 2017

Julie Pichon

TripleO Deep Dive: Internationalisation in the UI

Yesterday, as part of the TripleO Deep Dives series I gave a short introduction to internationalisation in TripleO UI: the technical aspects of it, as well as a quick overview of how we work with the I18n team.

You can catch the recording on BlueJeans or YouTube, and below's a transcript.

~

Life and Journey of a String

Internationalisation was added to the UI during Ocata - just a release ago. Florian implemented most of it and did the lion's share of the work, as can be seen on the blueprint if you're curious about the nitty-gritty details.

Addition to the codebase

Here's an example patch from during the transition. On the left you can see how things were hard-coded, and on the right you can see the new defineMessages() interface we now use. Obviously new patches should look like the right-hand side directly nowadays.

The defineMessages() dictionary requires a unique id and default English string for every message. Optionally, you can also provide a description if you think there could be confusion or to clarify the meaning. The description will be shown in Zanata to the translators - remember they see no other context, only the string itself.

For example, a string might sound active as if it were related to an action/button but actually be a descriptive help string. Or some expressions are known to be confusing in English - "provide a node" has been the source of multiple discussions on list and live, so we might as well pre-empt questions and offer additional context to help the translators decide on an appropriate translation.

Extraction & conversion

Now we know how to add an internationalised string to the codebase - how do these get extracted into a file that will be uploaded to Zanata?

All of the following steps are described in the translation documentation in the tripleo-ui repository. Assuming you've already run the installation steps (basically, npm install):

$ npm run build

This does a lot more than just extracting strings - it prepares the code for being deployed in production. Once this ends you'll be able to find your newly extracted messages under the i18n directory:

$ ls i18n/extracted-messages/src/js/components

You can see the directory structure is kept the same as the source code. And if you peek into one of the files, you'll note the content is basically the same as what we had in our defineMessages() dictionary:

$ cat i18n/extracted-messages/src/js/components/Login.json 
[
  {
    "id": "UserAuthenticator.authenticating",
    "defaultMessage": "Authenticating..."
  },
  {
    "id": "Login.username",
    "defaultMessage": "Username"
  },
  {
    "id": "Login.usernameRequired",
    "defaultMessage": "Username is required."
  },
[...]

However, JSON is not a format that Zanata understands by default. I think the latest version we upgraded to, or the next one might have some support for it, but since there's no i18n JSON standard it's somewhat limited. In open-source software projects, po/pot files are generally the standard to go with.

$ npm run json2pot

> tripleo-ui@7.1.0 json2pot /home/jpichon/devel/tripleo-ui
> rip json2pot ./i18n/extracted-messages/**/*.json -o ./i18n/messages.pot

> [react-intl-po] write file -> ./i18n/messages.pot ✔️

$ cat i18n/messages.pot 
msgid ""
msgstr ""
"POT-Creation-Date: 2017-07-07T09:14:10.098Z\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"MIME-Version: 1.0\n"
"X-Generator: react-intl-po\n"


#: ./i18n/extracted-messages/src/js/components/nodes/RegisterNodesDialog.json
#. [RegisterNodesDialog.noNodesToRegister] - undefined
msgid ""No Nodes To Register""
msgstr ""

#: ./i18n/extracted-messages/src/js/components/nodes/NodesToolbar/NodesToolbar.json
#. [Toolbar.activeFilters] - undefined
#: ./i18n/extracted-messages/src/js/components/validations/ValidationsToolbar.json
#. [Toolbar.activeFilters] - undefined
msgid "Active Filters:"
msgstr ""

#: ./i18n/extracted-messages/src/js/components/nodes/RegisterNodesDialog.json
#. [RegisterNodesDialog.addNew] - Small button, to add a new Node
msgid "Add New"
msgstr ""

#: ./i18n/extracted-messages/src/js/components/plan/PlanFormTabs.json
#. [PlanFormTabs.addPlanName] - Tooltip for "Plan Name" form field
msgid "Add a Plan Name"
msgstr ""
[...]

This messages.pot file is what will be automatically uploaded to Zanata.

Infra: from the git repo, to Zanata

The following steps are done by the infrastructure scripts. There's infra documentation on how to enable translations for your project, in our case as the first internationalised JavaScript project we had to update the scripts a little as well. This is useful to know if an issue happens with the infra jobs; debugging will probably bring you here.

The scripts live in the project-config infra repo and there are three files of interest for us: upstream_translation_update.sh, propose_translation_update.sh and common_translations_update.sh.

In this case, upstream_translation_update.sh is the file of interest to us: it simply sets up the project on line 76, then sends the pot file up to Zanata on line 115.

What does "setting up the project" entails? It's a function in common_translations_update.sh, that pretty much runs the steps we talked about in the previous section, and also creates a config file to talk to Zanata.

Monitoring the post jobs

Post jobs run after a patch has already merged - usually to upload tarballs where they should be, update the documentation pages, etc, and also upload messages catalogues onto Zanata. Being a 'post' job however means that if something goes wrong, there is no notification on the original review so it's easy to miss.

Here's the OpenStack Health page to monitor 'post' jobs related to tripleo-ui. Scroll to the bottom - hopefully tripleo-ui-upstream-translation-update is still green! It's good to keep an eye on it although it's easy to forget. Thankfully, AJaeger from #openstack-infra has been great at filing bugs and letting us know when something does go wrong.

Debugging when things go wrong: an example

We had a couple of issues whereby a linebreak gets introduced into one of the strings, which works fine in JSON but breaks our pot file. If you look at the content from the bug (the full logs are no longer accessible):

2017-03-16 12:55:13.468428 | + zanata-cli -B -e push --copy-trans False
[...]
2017-03-16 12:55:15.391220 | [INFO] Found source documents:
2017-03-16 12:55:15.391405 | [INFO]            i18n/messages
2017-03-16 12:55:15.531164 | [ERROR] Operation failed: missing end-quote

You'll notice the first line is the last function we call in the upstream_translation_update.sh script; for debugging that gives you an idea of the steps to follow to reproduce. The upstream Zanata instance also lets you create toy projects, if you want to test uploads yourself (this can't be done directly on the OpenStack Zanata instance.)

This particular newline issue has popped up a couple of times already. We're treating it with band-aids at the moment, ideally we'd get a proper test on the gate to prevent it from happening again: this is why this bug is still open. I'm not very familiar with JavaScript testing and haven't had a chance to look into it yet; if you'd like to give it a shot that'd be a useful contribution :)

Zanata, and contributing translations

The OpenStack Zanata instance lives at https://translate.openstack.org. This is where the translators do their work. Here's the page for tripleo-ui, you can see there is one project per branch (stable/ocata and master, for now). Sort by "Percent Translated" to see the languages currently translated. Here's an example of the translator's view, for Spanish: you can see the English string on the left, and the translator fills in the right side. No context! Just strings.

At this stage of the release cycle, the focus would be on 'master,' although it is still early to do translations; there is a lot of churn still.

If you'd like to contribute translations, the I18n team has good documentation about how to go about it. The short version: sign up on Zanata, request to join your language team, and once you're approved - you're good to go!

Return of the string

Now that we have our strings available in multiple languages, it's time for another infra job to kick in and bring them into our repository. This is where propose_translation_update.sh comes in. We pull the po files from Zanata, convert them to JSON, then do a git commit that will be proposed to Gerrit.
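In a simplified form the job does roughly the following (see propose_translation_update.sh for the real steps; the commit message and Gerrit topic shown here are the usual ones for translation imports):

zanata-cli -B -e pull
# cleanup of insufficiently translated files happens here (see below)
npm run po2json
git commit -a -m "Imported Translations from Zanata"
git review -t zanata/translations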

The cleanup step does more than it might seem. It checks if files are translated over a certain ratio (~75% for code), which avoids adding new languages when there might only be one or two words translated (e.g. someone just testing Zanata to see how it works). Switching to your language and yet having the vast majority of the UI still appear in English is not a great user experience.

In theory, files that were added but are now below 40% should get automatically removed, however this doesn't quite work for JavaScript at the moment - another opportunity to help! Manual cleanups can be done in the meantime, but it's a rare event so not a major issue.

Monitoring the periodic jobs

Zanata is checked once a day every morning, there is an OpenStack Health page for this as well. You can see there are two jobs at the moment (hopefully green!), one per branch: tripleo-ui-propose-translation-update and tripleo-ui-propose-translation-update-ocata. The job should run every day even if there are no updates - it simply means there might not be a git review proposed at the end.

We haven't had issues with the periodic job so far, though the debugging process would be the same: figure out based on the failure if it is happening at the infra script stage or in one of our commands (e.g. npm run po2json), try to reproduce and fix. I'm sure super-helpful AJaeger would also let us know if he were to notice an issue here.

Automated patches

You may have seen the automated translations updates pop up on Gerrit. The commit message has some tips on how to review these: basically don't agonise over the translation contents as problems there should be handled in Zanata anyway, just make sure the format looks good and is unlikely to break the code. A JSON validation tool runs during the infra prep step in order to "prettify" the JSON blob and limit the size of the diffs, therefore once the patch  makes it out to Gerrit we know the JSON is well-formed at least.

Try to review these patches quickly to respect the translators' work. It's not very nice to spend a lot of time translating a project and yet not have your work included because no one could be bothered to merge it :)

A note about new languages...

If the automated patch adds a new language, there'll be an additional step required after merging the translations in order to enable it: adding a string with the language name to a constants file. Until recently, this took 3 or 4 steps - thanks to Honza for making it much simpler!

This concludes the technical journey of a string. If you'd like to help with i18n tasks, we have a few related bugs open. They go from very simple low-hanging-fruits you could use to make your first contribution to the UI, to weird buttons that have translations available yet show in English but only in certain modals, to the kind of CI resiliency tasks I linked to earlier. Something for everyone! ;)

Working with the I18n team

It's really all about communication. Starting with...

Release schedule and string freezes

String freezes are noted on the main schedule but tend to fit the regular cycle-with-milestones work. This is a problem for a cycle-trailing project like tripleo-ui as we could be implementing features up to 2 weeks after the other projects, so we can't freeze strings that early.

There were discussions at the Atlanta PTG around whether the I18n team should care at all about projects that don't respect the freeze deadlines. That would have made it impossible for projects like ours to ever make it onto the I18n official radar. The compromise was that a cycle-trailing project should have an I18n cross-project liaison who communicates with the I18n PTL and team to inform them of deadlines, and also that such projects ignore the Soft Freeze and only do a Hard Freeze.

This will all be documented under an i18n governance tag; while waiting for it the notes from the sessions are available for the curious!

What's a String Freeze again?

The two are defined on the schedule: soft freeze means not allowing changes to strings, as it invalidates the translator's work and forces them to retranslate; hard freeze means no additions, changes or anything else in order to give translators a chance to catch up.

When we looked at Zanata earlier, there were translation percentages beside each language: the goal is always the satisfaction of reaching 100%. If we keep adding new strings then the goalpost keeps moving, which is discouraging and unfair.

Of course there's also an "exception process" when needed, to ask for permission to merge a string change with an explanation or at least a heads-up, by sending an email to the openstack-i18n mailing list. Not to be abused :)

Role of the I18n liaison

...Liaise?! Haha. The role is defined briefly on the Cross-Projects Liaison wiki page. It's much more important toward the end of the cycle, when the codebase starts to stabilise, there are fewer changes and translators look at starting their work to be included in the release.

In general it's good to hang out on the #openstack-i18n IRC channel (very low traffic), attend the weekly meeting (it alternates times), be available to answer questions, and keep the PTL informed of the I18n status of the project. In the case of cycle-trailing projects (quite a new release model still), it's also important to be around to explain the deadlines.

A couple of examples having an active liaison helps with:

  • Toward the end or after the release, once translations into the stable branch have settled, the stable translations get copied into the master branch on Zanata. The strings should still be fairly similar at that point and it avoids translators having to re-do the work. It's a manual process, so you need to let the I18n PTL know when there are no longer changes to stable/*.
  • Last cycle, because the cycle-trailing status of tripleo-ui was not correctly documented, a Zanata upgrade was planned right after the main release - which for us ended up being right when the codebase had stabilised enough and several translators had planned to be most active. Would have been solved with better, earlier communication :)

Post-release

After the Ocata release, I sent a few screenshots of tripleo-ui to the i18n list so translators could see the result of their work. I don't know if anybody cared :-) But unlike Horizon, which has an informal test system available for translators to check their strings during the RC period, most of the people who volunteered translations had no idea what the UI looked like. It'd be cool if we could offer a test system with regular string updates next release - maybe just an undercloud on the new RDO cloud? Deployment success/failures strings wouldn't be verifiable but the rest would, while the system would be easier to maintain than a full dev TripleO environment - better than nothing. Perhaps an idea for the Queens cycle!

The I18n team has a priority board on the Zanata main page (only visible when logged in I think). I'm grateful to see TripleO UI in there! :) Realistically we'll never move past Low or perhaps Medium priority which is fair, as TripleO doesn't have the same kind of reach or visibility that Horizon or the installation guides do. I'm happy that we're included! The OpenStack I18n team is probably the most volunteer-driven team in OpenStack. Let's be kind, respect string freezes and translators' time! \o/

</braindump>

Tagged with: open-source, openstack, talk-transcript, tripleo

by jpichon at July 07, 2017 12:45 PM

June 27, 2017

Ben Nemec

TripleO Network Isolation Template Generator Update

Just a quick update on the TripleO Network Isolation Template Generator. A few new features have been added recently that may be of interest.

The first, and most broadly applicable, is that the tool can now generate either old-style os-apply-config based templates, or new-style tripleo-heat-templates native templates. The latter are an improvement because they allow for much better error handling, and if bugs are found it is much easier to fix them. If your deployment is using Ocata or newer TripleO then you'll want to use the version 2 templates. If you need to support older releases, select version 1.

In addition, support for some more object types has been added. In particular, the tool can now generate templates for OVS DPDK deployments. I don't have any way to test these templates, unfortunately, so the output is solely based on the examples in the os-net-config repo. Hopefully those are accurate. :-)

If you do try any of the new (or old) features of the tool and have feedback don't hesitate to let me know. To my knowledge I'm still the primary user of the tool so it would be nice to know what, if anything, other people are doing with it.

by bnemec at June 27, 2017 08:52 PM

June 06, 2017

Ben Nemec

Prevent cloud-init from Changing Your Hostname on Reboot

This is something that periodically bites me when I'm doing deployments in OVB. Because I'm dealing with cloud instances, cloud-init runs on each reboot and one of the things it does is change the hostname to whatever Nova's metadata says it should be. This behavior is very problematic for something like an undercloud VM because it can change your hostname from, say, undercloud.localdomain to undercloud-test.novalocal when you reboot the VM. Since Nova and Neutron use the hostname as a service identifier, that can cause the undercloud to become completely broken until the hostname is changed back.

It's very simple to fix this, but every time it comes up I seem to have trouble finding a resource that just tells me how to do it. I found a couple of posts that discussed the problem but had no solutions, and a few that had very hacky solutions that I didn't want to use. So here's the fix:
echo "preserve_hostname: true" > /etc/cloud/cloud.cfg.d/99_hostname.cfg
It's that simple. With that file in place, cloud-init will not mess with the hostname.
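
If you want to double-check it worked, reboot the VM and confirm the static hostname is still the one you expect:

hostnamectl --static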

by bnemec at June 06, 2017 02:43 PM

May 11, 2017

Steven Hardy

OpenStack Summit - TripleO Project Onboarding

We've been having a productive week here in Boston at the OpenStack Summit, and one of the sessions I was involved in was a TripleO project Onboarding session.

The project onboarding sessions are a new idea for this summit, and provide the opportunity for new or potential contributors (and/or users/operators) to talk with the existing project developers and get tips on how to get started as well as ask any questions and discuss ideas/issues.

The TripleO session went well, and I'm very happy to report it was well attended and we had some good discussions.  The session was informal with an emphasis on questions and some live demos/examples, but we did also use a few slides which provide an overview and some context for those new to the project.

Here are the slides used (also on my github), unfortunately I can't share the Q+A aspects of the session as it wasn't recorded, but I hope the slides will prove useful - we can be found in #tripleo on Freenode if anyone has questions about the slides or getting started with TripleO in general.

by Steve Hardy (noreply@blogger.com) at May 11, 2017 12:28 PM

May 10, 2017

Juan Antonio Osorio

Run ansible playbook on TripleO nodes

Running an ansible playbook on TripleO nodes is fairly simple thanks to the work done by the folks working on tripleo-validations. There’s no need to manually maintain an inventory file with all the nodes as there is already a dynamic inventory script set up for us.

So, running an ansible playbook would look like the following:

$ source ~/stackrc
$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory path/to/playbook.yaml

This will use localhost for the undercloud node and fetch the nodes from nova to get the overcloud nodes. There are also roles already available such as controllers and computes, which can be accessed in your playbooks with the keys “controller” and “compute” respectively. And support is coming for dynamic roles as well.
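
For instance, a quick ad-hoc ping against just the controllers could look like this, using the same dynamic inventory script:

$ ansible -i /usr/bin/tripleo-ansible-inventory controller -m ping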

So, for a simple example, let's say we want to install the Libreswan package on all the overcloud nodes. It would be done with a playbook that looks like the following:

---
- hosts: overcloud
  become: true
  tasks:
  - name: Install libreswan
    yum:
      name: libreswan
      state: latest

May 10, 2017 06:22 AM

April 14, 2017

Emilien Macchi

My Journey As An OpenStack PTL

This story explains why I stopped working in an anarchistic, multi-tasking, schedule-driven way and learnt how to become a good team leader.

How it started

March 2015, Puppet OpenStack project just moved under the Big Tent. What a success for our group!

One of the first steps was to elect a Project Team Lead. Our group was pretty small (~10 active contributors) so we thought that the PTL would be just a facilitator for the group, and the liaison with other projects that interact with us.
I mean, easy, right?

At that time, I was clearly an unconsciously incompetent PTL. I thought I knew what I was doing to drive the project to success.

But the situation evolved. I started to deal with things that I didn't expect to deal with, like making sure our team works together in a way that is efficient and consistent. I also realized nobody knew what a PTL was really supposed to do (at least in our group), so I took care of more tasks, like release management, organizing Summit design sessions, promoting core reviewers, and welcoming newcomers.
That was the time when I realized I had become a consciously incompetent PTL. I was doing things that nobody had taught me before.

In fact, there is no book telling you how to lead an OpenStack project, so I decided to jump into this black hole, hoping I would make mistakes I could learn from.

 

Set your own expectations

I made the mistake of engaging myself in a role where expectations were not clarified with the team. The PTL guide is not enough to clarify what your team will expect from you. This is something you have to figure out with the folks you're working with. You would be surprised by the diversity of expectations that project contributors have for their PTL.
Talk with your team and ask them what they want you to be and how they see you as a team lead.
I don’t think there is a single rule that works for all projects, because of the different cultures in OpenStack community.

 

Embrace changes

… and accept failures.
There is no project in OpenStack that hasn't had outstanding issues (technical and human).
The first step as a PTL is to acknowledge the problem and share it with your team. Most of the conflicts are self-resolved when everyone agrees that yes, there is a problem. It can be a code design issue or any other technical disagreement, but also human complaints, like the difficulty of starting to contribute or the lack of reward for very active contributors who aren't core yet.
Once a problem is resolved: discuss with your team about how we can avoid the same situation in the future.
Make a retrospective if needed but talk and document the output.

I continuously encourage welcoming all kinds of changes in TripleO so we can adopt new technologies that will make our project better.

Keep in mind it has a cost. Some people will disagree but that’s fine: you might have to pick a rate of acceptance to consider that your team is ready to make this change.

 

Delegate

We are humans and have limits. We can’t be everywhere and do everything.
We have to accept that PTLs are not supposed to be online 24/7. They don’t always have the best ideas and don’t always take the right decisions.
This is fine. Your project will survive.

I learnt that when I started to be PTL of TripleO in 2016.
The TripleO team has become so big that I didn’t realize how many interruptions I would have every day.
So I decided to learn how to delegate.
We worked together and created TripleO Squads where each squad focuses on a specific area of TripleO.
Each squad would be autonomous enough to propose their own core reviewers or do their own meetings when needed.
I wanted small teams working together, failing fast and making quick iterations so we could scale the project, accept and share the work load and increase the trust inside the TripleO team.

This is where I started to be a Consciously Competent PTL.

 

Where am I now

I have reached a point where I think that projects wouldn't need a PTL to run fine if they really wanted to.
Instead, I have started to believe in some essential things that would actually help to get rid of this role:

  • As a team, define the vision of the project and document it. It will really help to know where we want to go and clear all expectations about the project.
  • Establish trust in each individual by default and welcome newcomers.
  • Encourage collective and distributed leadership.
  • Try, Do, Fail, Learn, Teach… and start again. Don't go stale.

This long journey helped me to learn many things in both technical and human areas. It has been awesome to work with such groups so far.
I would like to spend more time on technical work (aka coding) but also in teaching and mentoring new contributors in OpenStack.
Therefore, I won’t be PTL during the next cycle and my hope is to see new leaders in TripleO, who would come up with fresh ideas and help us to keep TripleO rocking.

 

Thanks for reading so far, and also thanks for your trust.

by Emilien at April 14, 2017 08:56 PM

April 11, 2017

Juan Antonio Osorio

Using FreeIPA as an LDAP domain backend for keystone in TripleO

Configuring FreeIPA to be the backend of a keystone domain is pretty simple nowadays with recent additions to TripleO.

I took the configuration and several aspects of the setup (such as the users) from RDO VM Factory and used it to create the following environment file which we'll use for TripleO:

parameter_defaults:
  KeystoneLDAPDomainEnable: true
  KeystoneLDAPBackendConfigs:
    freeipadomain:
      url: ldaps://ipa.example.com
      user: uid=keystone,cn=users,cn=accounts,dc=example,dc=com
      password: MySecretPassword
      suffix: dc=example,dc=com
      user_tree_dn: cn=users,cn=accounts,dc=example,dc=com
      user_objectclass: person
      user_id_attribute: uid
      user_name_attribute: uid
      user_mail_attribute: mail
      user_allow_create: false
      user_allow_update: false
      user_allow_delete: false
      group_tree_dn: cn=groups,cn=accounts,dc=example,dc=com
      group_objectclass: groupOfNames
      group_id_attribute: cn
      group_name_attribute: cn
      group_member_attribute: member
      group_desc_attribute: description
      group_allow_create: false
      group_allow_update: false
      group_allow_delete: false
      user_enabled_attribute: nsAccountLock
      user_enabled_default: False
      user_enabled_invert: true

We’ll call this freeipa-ldap-config.yaml.

Note that I set a bind user with the uid keystone. We'll need to create this user on the FreeIPA side. For convenience, we'll also create a demo user. So, with your FreeIPA admin credentials loaded, do the following:

create_ipa_user() {
    echo "$2" | ipa user-add $1 --cn="$1 user" --first="$1" --last="user" --password
}
# Add a keystone user that Keystone will bind as
create_ipa_user keystone MySecretPassword

# Add a demo user
create_ipa_user demo MySecretPassword

Now, having this, we can do an overcloud install adding the configuration to the environments:

./overcloud-deploy.sh -e freeipa-ldap-config.yaml

When the deployment finishes, for convenience, we’ll assign the admin role for our admin user. We already have credentials for this user in the generated overcloudrc file from the deployment. So we’ll source that file, and add the role:

source overcloudrc.v3
openstack role add --domain $(openstack domain show freeipadomain -f value -c id)\
        --user $(openstack user show admin --domain default -f value -c id) \
        $(openstack role show admin -c id -f value)

Note that keystone v3 is needed for this, so we sourced overcloudrc.v3.

Now that we have a role in the FreeIPA-backed domain, we can list its users:

$ openstack user list --domain freeipadomain
+------------------------------------------------------------------+----------+
| ID                                                               | Name     |
+------------------------------------------------------------------+----------+
| 1bf11b164f896bbbaa94c7ca7de6d54fcd49f46e3e0fa452c7334bcd0586aeba | admin    |
| 61673b89cc0f0d50de0e649587c8ef2ecd28e3a029fde529a1db77ed0cf7c1d9 | keystone |
| b16f3fe6a5ffbca9e4fd45131f935dc516a21b597fc894dff4a1290d4ce8c6db | demo     |
+------------------------------------------------------------------+----------+

April 11, 2017 07:16 AM

March 03, 2017

Steven Hardy

Developing Mistral workflows for TripleO

During the newton/ocata development cycles, TripleO made changes to the architecture so we make use of Mistral (the OpenStack workflow API project) to drive workflows required to deploy your OpenStack cloud.

Prior to this change we had workflows defined inside python-tripleoclient, and most API calls were made directly to Heat.  This worked OK, but there was too much "business logic" inside the client, which doesn't work well if non-python clients (such as tripleo-ui) want to interact with TripleO.

To solve this problem, a number of Mistral workflows and custom actions have been implemented, which are available via the Mistral API on the undercloud.  This can be considered the primary "TripleO API" for driving all deployment tasks now.

Here's a diagram showing how it fits together:

Overview of Mistral integration in TripleO


Mistral workflows and actions

There are two primary interfaces to Mistral: workflows, which are a YAML definition of a process or series of tasks, and actions, which are a concrete definition of how to do a specific task (such as calling some OpenStack API).

Workflows and actions can be defined directly via the Mistral API, or via a wrapper called a workbook.  Mistral actions are also defined via a python plugin interface, which TripleO uses to run some tasks such as running jinja2 on tripleo-heat-templates prior to calling Heat to orchestrate the deployment.
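
As an aside, the TripleO custom actions are registered like any other Mistral action, so you can list them on the undercloud to see what is available, for example:

[stack@undercloud ~]$ mistral action-list | grep tripleo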

Mistral workflows, in detail

Here I'm going to show how to view and interact with the mistral workflows used by TripleO directly, which is useful to understand what TripleO is doing "under the hood" during a deployment, and also for debugging/development.

First we view the mistral workbooks loaded into Mistral - these contain the TripleO specific workflows and are defined in tripleo-common


[stack@undercloud ~]$ . stackrc 
[stack@undercloud ~]$ mistral workbook-list
+----------------------------+--------+---------------------+------------+
| Name | Tags | Created at | Updated at |
+----------------------------+--------+---------------------+------------+
| tripleo.deployment.v1 | <none> | 2017-02-27 17:59:04 | None |
| tripleo.package_update.v1 | <none> | 2017-02-27 17:59:06 | None |
| tripleo.plan_management.v1 | <none> | 2017-02-27 17:59:09 | None |
| tripleo.scale.v1 | <none> | 2017-02-27 17:59:11 | None |
| tripleo.stack.v1 | <none> | 2017-02-27 17:59:13 | None |
| tripleo.validations.v1 | <none> | 2017-02-27 17:59:15 | None |
| tripleo.baremetal.v1 | <none> | 2017-02-28 19:26:33 | None |
+----------------------------+--------+---------------------+------------+

The name of the workbook constitutes a namespace for the workflows it contains, so we can view the related workflows using grep (I also grep for tag_node to reduce the number of matches).


[stack@undercloud ~]$ mistral workflow-list | grep "tripleo.baremetal.v1" | grep tag_node
| 75d2566c-13d9-4aa3-b18d-8e8fc0dd2119 | tripleo.baremetal.v1.tag_nodes | 660c5ec71ce043c1a43d3529e7065a9d | <none> | tag_node_uuids, untag_nod... | 2017-02-28 19:26:33 | None |
| 7a4220cc-f323-44a4-bb0b-5824377af249 | tripleo.baremetal.v1.tag_node | 660c5ec71ce043c1a43d3529e7065a9d | <none> | node_uuid, role=None, que... | 2017-02-28 19:26:33 | None | 

When you know the name of a workflow, you can inspect the required inputs and run it directly via a mistral execution. In this case we're running the tripleo.baremetal.v1.tag_node workflow, which modifies the profile assigned in the ironic node capabilities (see tripleo-docs for more information about manual tagging of nodes)


[stack@undercloud ~]$ mistral workflow-get tripleo.baremetal.v1.tag_node
+------------+------------------------------------------+
| Field | Value |
+------------+------------------------------------------+
| ID | 7a4220cc-f323-44a4-bb0b-5824377af249 |
| Name | tripleo.baremetal.v1.tag_node |
| Project ID | 660c5ec71ce043c1a43d3529e7065a9d |
| Tags | <none> |
| Input | node_uuid, role=None, queue_name=tripleo |
| Created at | 2017-02-28 19:26:33 |
| Updated at | None |
+------------+------------------------------------------+
[stack@undercloud ~]$ ironic node-list
+--------------------------------------+-----------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+-----------+---------------+-------------+--------------------+-------------+
| 30182cb9-eba9-4335-b6b4-d74fe2581102 | control-0 | None | power off | available | False |
| 19fd7ea7-b4a0-4ae9-a06a-2f3d44f739e9 | compute-0 | None | power off | available | False |
+--------------------------------------+-----------+---------------+-------------+--------------------+-------------+
[stack@undercloud ~]$ mistral execution-create tripleo.baremetal.v1.tag_node '{"node_uuid": "30182cb9-eba9-4335-b6b4-d74fe2581102", "role": "test"}'
+-------------------+--------------------------------------+
| Field | Value |
+-------------------+--------------------------------------+
| ID | 6a141065-ad6e-4477-b1a8-c178e6fcadcb |
| Workflow ID | 7a4220cc-f323-44a4-bb0b-5824377af249 |
| Workflow name | tripleo.baremetal.v1.tag_node |
| Description | |
| Task Execution ID | <none> |
| State | RUNNING |
| State info | None |
| Created at | 2017-03-03 09:53:10 |
| Updated at | 2017-03-03 09:53:10 |
+-------------------+--------------------------------------+

At this point the mistral workflow is running, and it'll either succeed or fail, and also create some output (which in the TripleO model is sometimes returned to the UI via a Zaqar queue).  We can view the status, and the outputs (truncated for brevity):


[stack@undercloud ~]$ mistral execution-list | grep  6a141065-ad6e-4477-b1a8-c178e6fcadcb
| 6a141065-ad6e-4477-b1a8-c178e6fcadcb | 7a4220cc-f323-44a4-bb0b-5824377af249 | tripleo.baremetal.v1.tag_node | | <none> | SUCCESS | None | 2017-03-03 09:53:10 | 2017-03-03 09:53:11 |
[stack@undercloud ~]$ mistral execution-get-output 6a141065-ad6e-4477-b1a8-c178e6fcadcb
{
"status": "SUCCESS",
"message": {
...

So that's it - we ran a mistral workflow, it succeeded and we looked at the output, and now we can see the result looking at the node in Ironic. It worked! :)


[stack@undercloud ~]$ ironic node-show 30182cb9-eba9-4335-b6b4-d74fe2581102 | grep profile
| | u'cpus': u'2', u'capabilities': u'profile:test,cpu_hugepages:true,boot_o |

 

Mistral workflows, create your own!

Here I'll show how to develop your own custom workflows (which isn't something we expect operators to necessarily do, but it is now part of many developers' workflow during feature development for TripleO).

First, we create a simple yaml definition of the workflow, as defined in the v2 Mistral DSL - this example lists all available ironic nodes, then finds those which match the "test" profile we assigned in the example above:


This example uses the mistral built-in "ironic" action, which is basically a pass-through action exposing the python-ironicclient interfaces.  Similar actions exist for the majority of OpenStack python clients, so this is a pretty flexible interface.

Now we can upload the workflow (not wrapped in a workbook this time, so we use workflow-create), run it via execution-create, then look at the outputs - we can see that the matching_nodes output matches the ID of the node we tagged in the example above - success! :)

[stack@undercloud tripleo-common]$ mistral workflow-create shtest.yaml 
+--------------------------------------+-------------------------+----------------------------------+--------+--------------+---------------------+------------+
| ID | Name | Project ID | Tags | Input | Created at | Updated at |
+--------------------------------------+-------------------------+----------------------------------+--------+--------------+---------------------+------------+
| 2b8f2bea-f3dd-42f0-ad16-79987c75df4d | test_nodes_with_profile | 660c5ec71ce043c1a43d3529e7065a9d | <none> | profile=test | 2017-03-03 10:18:48 | None |
+--------------------------------------+-------------------------+----------------------------------+--------+--------------+---------------------+------------+
[stack@undercloud tripleo-common]$ mistral execution-create test_nodes_with_profile
+-------------------+--------------------------------------+
| Field | Value |
+-------------------+--------------------------------------+
| ID | 2392ed1c-96b4-4787-9d11-0f3069e9a7e5 |
| Workflow ID | 2b8f2bea-f3dd-42f0-ad16-79987c75df4d |
| Workflow name | test_nodes_with_profile |
| Description | |
| Task Execution ID | <none> |
| State | RUNNING |
| State info | None |
| Created at | 2017-03-03 10:19:30 |
| Updated at | 2017-03-03 10:19:30 |
+-------------------+--------------------------------------+
[stack@undercloud tripleo-common]$ mistral execution-list | grep 2392ed1c-96b4-4787-9d11-0f3069e9a7e5
| 2392ed1c-96b4-4787-9d11-0f3069e9a7e5 | 2b8f2bea-f3dd-42f0-ad16-79987c75df4d | test_nodes_with_profile | | <none> | SUCCESS | None | 2017-03-03 10:19:30 | 2017-03-03 10:19:31 |
[stack@undercloud tripleo-common]$ mistral execution-get-output 2392ed1c-96b4-4787-9d11-0f3069e9a7e5
{
"matching_nodes": [
"30182cb9-eba9-4335-b6b4-d74fe2581102"
],
"available_nodes": [
"30182cb9-eba9-4335-b6b4-d74fe2581102",
"19fd7ea7-b4a0-4ae9-a06a-2f3d44f739e9"
]
}

Using this basic example, you can see how to develop workflows which can then easily be copied into the tripleo-common workbooks, and integrated into the TripleO deployment workflow.

In a future post, I'll dig into the use of custom actions, and how to develop/debug those.

by Steve Hardy (noreply@blogger.com) at March 03, 2017 10:51 AM

March 02, 2017

Julie Pichon

OpenStack Pike PTG: TripleO, TripleO UI | Some highlights

For the second part of the PTG (vertical projects), I mainly stayed in the TripleO room, moving around a couple of times to attend cross-project sessions related to i18n.

Although I always wish I understood more/everything, in the end my areas of interest (and current understanding!) in TripleO are around the UI, installing and configuring it, the TripleO CLI, and the tripleo-common Mistral workflows. Therefore the couple of thoughts in this post are mainly relevant to these - if you're looking for a more exhaustive summary of the TripleO discussions and decisions made during the PTG, I recommend reading the PTL's excellent thread about this on the dev list, and the associated etherpads.

Random points of interest

  • Containers is the big topic and had multiple sessions dedicated to it, both single and cross-projects. Many other sessions ended up revisiting the subject as well, sometimes with "oh that'll be solved with containers" and sometimes with "hm good but that won't work with containers."
  • A couple of API-breaking changes may need to happen in TripleO Heat Templates (e.g. for NFV, passing a role mapping vs a role name around). The recommendation is to get this in as early as possible (by the first milestone) and communicate it well for out of tree services.
  • When needing to test something new on the CI, look at the existing scenarios and prioritise adding/changing something there to test for what you need, as opposed to trying to create a brand new job.
  • Running Mistral workflows as part of or after the deployment came up several times and was even a topic during a cross-project Heat / Mistral / TripleO session. Things can get messy, switching between Heat, Mistral and Puppet. Where should these workflows live (THT, tripleo-common)? Service-specific workflows (pre/post-deploy) are definitely something people want and there's a need to standardise how to do that. Ceph's likely to be the first to try their hand at this.
  • One lively cross-project session with OpenStack Ansible and Kolla was about parameters in configuration files. Currently whenever a new feature is added to Nova or whatever service, Puppet and so on need to be updated manually. The proposal is to make a small change to oslo.config to enable it to give an output in machine-readable YAML which can then be consumed (currently the config generated is only human readable). This will help with validations, and it may help to only have to maintain a structure as opposed to a template.
  • Heat folks had a feedback session with us about the TripleO needs. They've been super helpful with e.g. helping to improve our memory usage over the last couple of cycles. My takeaway from this session was "beware/avoid using YAQL, especially in nested stacks." YAQL is badly documented and everyone ends up reading the source code and tests to figure out how to do things. Bringing Jinja2 into Heat or some kind of way to have repeated patterns from resources (e.g. based on a file) also came up and was cautiously acknowledged.
  • Predictable IP assignment on the control plane is a big enough issue that some people are suggesting dropping Neutron in the undercloud over it. We'd lose so many other benefits though, that it seems unlikely to happen.
  • Cool work incoming allowing built-in network examples to Just Work, based on a sample configuration. Getting the networking stuff right is a huge pain point and I'm excited to hear this should be achievable within Pike.

Python 3

Python 3 is an OpenStack community goal for Pike.

Tripleo-common and python-tripleoclient both have voting unit test jobs for Python 3.5, though I trust them only moderately for a number of reasons. For example many of the tests tend to focus on the happy path, and I've seen and fixed Python 3 incompatible code in exceptions several times (the missing 'message' attribute seems an easy one to get caught by), despite the unit testing jobs being all green. Apparently there are coverage jobs we could enable for the client, to ensure the coverage ratio doesn't drop.

Python 3 for functional tests was also brought up. We don't have functional tests in any of our projects and it's not clear what value we would get out of them (mocking servers) compared to the unit testing and all the integration testing we already do. Increasing unit test coverage was deemed a more valuable goal to pursue for now.

There are other issues around functional/integration testing with Python 3 which will need to be resolved (though likely not during Pike). For example our integration jobs run on CentOS and use packages, which won't be Python 3 compatible yet (cue SCL and the need to respin dependencies). If we do add functional tests, perhaps it would be easier to have them run on a Fedora gate (although if I recall correctly gating on Fedora was investigated once upon a time at the OpenStack level, but caused too many issues due to churn and the lack of long-term releases).

Another issue with Python 3 support and functional testing is that the TripleO client depends on Mistral server (due to the Series Of Unfortunate Dependencies I also mentioned in the last post). That means Mistral itself would need to fully support Python 3 as well.

Python 2 isn't going anywhere just yet so we still have time to figure things out. The conclusions, as described in Emilien's email seem to be:

  • Improve the unit test coverage
  • Enable the coverage job in CI
  • Investigate functional testing for python-tripleoclient to start with, see if it makes sense and is feasible

Sample environment generator

Currently environment files in THT are written by hand and quite inconsistent. This is also important for the UI, which needs to display this information. For example currently the environment general description is in a comment at the top of the file (if it exists at all), which can't be accessed programmatically. Dependencies between environment files are not described either.

To make up for this, currently all that information lives in the capabilities map, but it's external to the templates themselves, needs to be updated manually and gets out of sync easily.

The sample environment generator to fix this has been out there for a year, and currently has two blockers. First, it needs a way to determine which parameters are private (that is, parameters that are expected to be passed in by another template and shouldn't be set by the user).

One way could be to use a naming convention, perhaps an underscore prefix similar to Python. Parameter groups cannot be used because of a historical limitation, there can only be one group (so you couldn't be both Private and Deprecated). Changing Heat with a new feature like Nested Groups or generic Parameter Tags could be an option. The advantage of the naming convention is that it doesn't require any change to Heat.
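
To make the idea concrete, a hypothetical private parameter following that convention could look like this in a template (the name and description are made up purely for illustration):

parameters:
  _ServiceNetworkCidr:
    type: string
    description: Set by the parent template, not intended to be set by operators.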

From the UI perspective, validating whether an environment or template is redefining parameters already defined elsewhere also matters: if a parameter has the same name in several places, it needs to be set to the same value everywhere, or it's uncertain what the final value will end up as.

I think the second issue was that the general environment description can only be a comment at the moment, as there is no Heat parameter to do this. The Heat experts in the room seemed confident this is non-controversial as a feature and should be easy to get in.

Once the existing templates are updated to match the new format, the validation should be added to CI to make sure that any new patch with environments does include these parameters. Having "description" show up as an empty string when generating a new environment will make it more obvious that something can/should be there, while it's easy to forget about it with the current situation.
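To give a rough idea of the end goal, a generated sample environment might look something like this (purely illustrative, not actual generator output; the names are made up):

# title: Example Environment
# description: |
#
parameter_defaults:
  NtpServer: ''
resource_registry:
  OS::TripleO::Services::Example: ../puppet/services/example.yaml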

The agreement was:

  • Use underscores as a naming convention to start with
  • Start with a comment for the general description

Once we get the new Heat description attribute we can move things around. If parameter tags get accepted, likewise we can automate moving things around. Tags would also be useful to the UI, to determine what subset of relevant parameters to display to the user in smaller forms (easier to understand than one form with several dozen fields showing up all at once). Tags, rather than parameter groups, are required because of the aforementioned issue: groups are already used for deprecation and a parameter can only have one group.

Trusts and federation

This was a cross-project session together with Heat, Keystone and Mistral. A "Trust" lets you delegate or impersonate a user with a subset of their rights. From my experience in TripleO, this is particularly useful with long-running Heat stacks, as an authentication token expires after a few hours, which means you lose the ability to do anything in the middle of an operation.

Trusts have been working very well for Heat since 2013. Before that they had to encrypt the user password in order to ask for a new token when needed, which all agreed was pretty horrible and not anything people want to go back to. Unfortunately with the federation model and using external Identity Providers, this is no longer possible. Trusts break, but some kind of delegation is still needed for Heat.

There were a lot of tangents about security issues (obviously!), revocation, expiration, role syncing. From what I understand Keystone currently validates Trusts to make sure the user still has the requested permissions (that the relevant role hasn't been removed in the meantime). There's a desire to have access to the entire role list, because the APIs currently don't let us check which role is necessary to perform a specific action. Additionally, when Mistral workflows launched from Heat get in, Mistral will create its own Trusts and Heat can't know what that will do. In the end you always kinda end up needing to do everything. Karbor is running into this as well.

No solution was discovered during the session, but I think all sides were happy enough that the problem and use cases have been clearly laid out and are now understood.

TripleO UI

Some of the issues relevant to the UI were brought up in other sessions, like standardising the environment files. Other issues brought up were around plan management, for example why do we use the Mistral environment in addition to Swift? Historically it looks like it's because it was a nice drop-in replacement for the defunct TripleO API and offered a similar API. Although it won't have an API by default, the suggestion is to move to using a file to store the environment during Pike and have a consistent set of templates: this way all the information related to a deployment plan will live in the same place. This will help with exporting/importing plans, which itself will help with CLI/UI interoperability (for instance there are still some differences in how and when the Mistral environment is generated, depending on whether you deployed with the CLI or the UI).

A number of other issues were brought up around networking, custom roles, tracking deployment progress, and a great many other topics, but I think the larger problem around plan management was the only one expected to turn into a spec, now proposed for review.

I18n and release models

After the UI session I left the TripleO room to attend a cross-project session about i18n, horizon and release models. The release model point is particularly relevant because the TripleO UI is a newly internationalised project as of Ocata and the first to be cycle-trailing (TripleO releases a couple of weeks after the main OpenStack release).

I'm glad I was able to attend this session. For one, it was really nice to collaborate directly with the i18n and release management teams, and catch up with a couple of Horizon people. For another, it turns out tripleo-ui was not properly documented as cycle-trailing (fixed now!), and that led to other issues.

Having different release models is a source of headaches for the i18n community, already stretched thin. It means string freezes happen at different times and stable branches are cut at different points, which creates a lot of tracking work for the i18n PTL to figure out which project is ready and to do the required manual work to update Zanata upon branching. One part of the solution is likely to figure out if we can script the manual parts of this workflow so that when the release patch (which creates the stable branch) is merged, the change is automatically reflected in Zanata.

For the non-technical aspects of the work (mainly keeping track of deadlines and string freezes) the decision was that if you want to be translated, then you need to respect the same deadlines as the cycle-with-milestones projects do on the main schedule, and if a project doesn't want to - if it doesn't respect the freezes or cut branches when expected - then it will be dropped from the i18n priority dashboard in Zanata. This was particularly relevant for Horizon plugins, as there are about a dozen of them now with various degrees of diligence when it comes to doing releases.

These expectations will be documented in a new governance tag, something like i18n:translated.

Obviously this would mean that cycle-trailing projects would likely never be able to get the tag. The work we depend on lands late and so we continue making changes up to two weeks after each of the documented deadlines. ianychoi, the i18n PTL seemed happy to take these projects under the i18n wing and do the manual work required, as long as there is an active i18n liaison for the project communicating with the i18n team to keep them informed about freezes and new branches. This seemed to work ok for us during Ocata so I'm hopeful we can keep that model. It's not entirely clear to me if this will also be documented/included in the governance tag so I look forward to reading the spec once it is proposed! :)

In the case of tripleo-ui we're not a priority project for translations nor looking to be, but we still rely on the i18n PTL to create Zanata branches and merge translations for us, and would like to continue with being able to translate stable branches as needed.

CI Q&A

The CI Q&A session on Friday morning was amazingly useful, and it was unanimously agreed it should be moved to the documentation (not done yet). If you've ever scratched your head about something related to TripleO CI, have a look at the etherpad!

Tagged with: events, openstack, tripleo

by jpichon at March 02, 2017 09:55 AM

January 31, 2017

Dougal Matthews

Interactive Mistral Workflows over Zaqar

It is possible to do some really nice automation with the Mistral Workflow engine. However, sometimes user input is required or desirable. I set about writing an interactive Mistral Workflow, one that could communicate with a user over Zaqar.

If you are not familiar with Mistral Workflows you may want to start here, here or here.

The Workflow

Okay, this is what I came up with.

---
version: '2.0'

interactive-workflow:

  input:
    - input_queue: "workflow-input"
    - output_queue: "workflow-output"

  tasks:

    request_user_input:
      action: zaqar.queue_post
      retry: count=5 delay=1
      input:
        queue_name: <% $.output_queue %>
        messages:
          body: "Send some input to '<% $.input_queue %>'"
      on-success: read_user_input

    read_user_input:
      pause-before: true
      action: zaqar.queue_pop
      input:
        queue_name: <% $.input_queue %>
      publish:
        user_input: <% task(read_user_input).result[0].body %>
      on-success: done

    done:
      action: zaqar.queue_post
      retry: count=5 delay=1
      input:
        queue_name: <% $.output_queue %>
        messages:
          body: "You sent: '<% $.user_input %>'"

Breaking it down...

  1. The Workflow uses two queues, one for input and one for output - it would be possible to use the same queue for both, but this seemed simpler.

  2. On the first task, request_user_input, the Workflow sends a Zaqar message to the user requesting a message be sent to the input_queue.

  3. The read_user_input task pauses before it starts, see the pause-before: true. This means we can unpause the Workflow after we send a message. It would be possible to create a loop here that polls for messages, see below for more on this.

  4. After the input is provided, the Workflow must be un-paused manually. It then reads from the queue and sends the message back to the user via the output Zaqar queue.

See it in Action

We can demonstrate the Workflow with just the Mistral client. First you need to save it to a file and use the mistral workflow-create command to upload it.

First we trigger the Workflow execution.

$ mistral execution-create interactive-workflow
+-------------------+--------------------------------------+
| Field             | Value                                |
+-------------------+--------------------------------------+
| ID                | e8e2bfd5-3ae4-46db-9230-ada00a2c0c8c |
| Workflow ID       | bdd1253e-68f8-4cf3-9af0-0957e4a31631 |
| Workflow name     | interactive-workflow                 |
| Description       |                                      |
| Task Execution ID | <none>                               |
| State             | RUNNING                              |
| State info        | None                                 |
| Created at        | 2017-01-31 08:22:17                  |
| Updated at        | 2017-01-31 08:22:17                  |
+-------------------+--------------------------------------+

The Workflow will complete the first task and then move to the PAUSED state before read_user_input. This can be seen with the mistral execution-list command.

In this Workflow we know there will now be a message in Zaqar. The Mistral action zaqar.queue_pop can be used to receive it...

$ mistral run-action zaqar.queue_pop '{"queue_name": "workflow-output"}'
{"result": [{"body": "Send some input to 'workflow-input'", "age": 4, "queue": {"_metadata": null, "client": null, "_name": "workflow-output"}, "href": null, "ttl": 3600, "_id": "589049397dcad341ecfb72cf"}]}

The JSON is a bit hard to read, but you can see the message body Send some input to 'workflow-input'.

Great. We can do that with another Mistral action...

$ mistral run-action zaqar.queue_post '{"queue_name": "workflow-input", "messages":{"body": {"testing": 123}}}'
{"result": {"resources": ["/v2/queues/workflow-input/messages/589049447dcad341ecfb72d0"]}}

After sending the requested message back to the Workflow, we can unpause it. This can be done like this...

$ mistral execution-update -s RUNNING e8e2bfd5-3ae4-46db-9230-ada00a2c0c8c
+-------------------+--------------------------------------+
| Field             | Value                                |
+-------------------+--------------------------------------+
| ID                | e8e2bfd5-3ae4-46db-9230-ada00a2c0c8c |
| Workflow ID       | bdd1253e-68f8-4cf3-9af0-0957e4a31631 |
| Workflow name     | interactive-workflow                 |
| Description       |                                      |
| Task Execution ID | <none>                               |
| State             | RUNNING                              |
| State info        | None                                 |
| Created at        | 2017-01-31 08:22:17                  |
| Updated at        | 2017-01-31 08:22:38                  |
+-------------------+--------------------------------------+

Finally we can confirm it worked by getting a message back from the Workflow...

$ mistral run-action zaqar.queue_pop '{"queue_name": "workflow-output"}'
{"result": [{"body": "You sent: '{u'testing': 123}'", "age": 6, "queue": {"_metadata": null, "client": null, "_name": "workflow-output"}, "href": null, "ttl": 3600, "_id": "5890494f7dcad341ecfb72d1"}]}

You can see a new message is returned which shows the input we sent.

Caveats

As mentioned above, the main limitation here is that you need to manually unpause the Workflow. It would be nice if there was a way for the Zaqar message to automatically do this.

Polling for messages in the Workflow would be quite easy, with a retry loop and Mistral's continue-on. However, that would be quite resource intensive. If you wanted to do this, a Workflow task like this would probably do the trick.

  wait_for_message:
    action: zaqar.queue_pop
    input:
      queue_name: <% $.input_queue %>
    timeout: 14400
    retry:
      delay: 15
      count: <% $.timeout / 15 %>
      continue-on: <% len(task(wait_for_message).result) > 0 %>
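Note that this sketch references <% $.timeout %> in the retry count, which isn't declared as an input in the Workflow above (only the two queue names are). Presumably the input list would need an extra entry along these lines (the default value is just an assumption):

  input:
    - input_queue: "workflow-input"
    - output_queue: "workflow-output"
    # Assumed extra input so the <% $.timeout %> expression resolves.
    - timeout: 14400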

The other limitation is that this Workflow now requires a specific interaction pattern that isn't obvious and documenting it might be a little tricky. However, I think the flexible execution it provides might be worthwhile in some cases.

by Dougal Matthews at January 31, 2017 07:40 AM

January 25, 2017

Dan Prince

Docker Puppet

Today TripleO leverages Puppet to help configure and manage the deployment of OpenStack services. As we move towards using Docker one of the big questions people have is how will we generate config files for those containers. We'd like to continue to make use of our mature configuration interfaces (Heat parameters, Hieradata overrides, Puppet modules) to allow our operators to seamlessly take the step towards a fully containerized deployment.

With the recently added composable services we've got everything we need. This is how we do it...

Install puppet into our base container image

Turns out the first thing you need if you want to generate config files with Puppet is, well... puppet. TripleO uses containers from the Kolla project and by default they do not install Puppet. In the past TripleO used an 'agent container' to manage the puppet installation requirements. This worked okay for the compute role (a very minimal set of services) but doesn't work as nicely for the broader set of OpenStack services, because packages need to be pre-installed into the 'agent' container in order for config file generation to work correctly (puppet overlays the default config files in many cases). Installing packages for all of OpenStack and its requirements into the agent container isn't ideal.

Enter TripleO composable services (thanks Newton!). TripleO now supports composability and Kolla typically has individual containers for each service so it turns out the best way to generate config files for a specific service is to use the container for the service itself. We do this in two separate runs of a container: one to create config files, and the second one to launch the service (bind mounting/copying in the configs). It works really well.

But we still have the issue of how to get puppet into all of our Kolla containers. We were happy to discover that Kolla supports a template-overrides mechanism (a Jinja template) that allows you to customize how containers are built. This is how you can use that mechanism to add puppet into the CentOS base image used for all the OpenStack docker containers generated by the Kolla build scripts.

$ cat template-overrides.j2
{% extends parent_template %}
{% set base_centos_binary_packages_append = ['puppet'] %}

kolla-build --base centos --template-override template-overrides.j2

Control the Puppet catalog

A puppet manifest in TripleO can do a lot of things like installing packages, configuring files, starting a service, etc. For containers we only want to generate the config files. Furthermore we'd like to do this without having to change our puppet modules.

One mechanism we use is the --tags option for 'puppet apply'. This option allows you to specify which resources within a given puppet manifest (or catalog) should be executed. It works really nicely to allow you to select what you want out of a puppet catalog.

An example of this is listed below where we have a manifest to create a '/tmp/foo' file. When we run the manifest with the 'package' tag (telling it to only install packages) it does nothing at all.

$ cat test.pp 
file { '/tmp/foo':
  content => 'bar',
}
$ puppet apply --tags package test.pp
Notice: Compiled catalog for undercloud.localhost in environment production in 0.10 seconds
Notice: Applied catalog in 0.02 seconds
$ cat /tmp/foo
cat: /tmp/foo: No such file or directory

When --tags doesn't work

The --tags option of 'puppet apply' doesn't always give us the behavior we are after which is to generate only config files. Some puppet modules have custom resources with providers that can execute commands anyway. This might be a mysql query or an openstackclient command to create a keystone endpoint. Remember here that we are trying to re-use puppet modules from our baremetal configuration and these resources are expected to be in our manifests... we just don't want them to run at the time we are generating config files. So we need an alternative mechanism to suppress (noop out) these offending resources.

To do this we've started using a custom built noop_resource function that exists in puppet-tripleo. This function dynamically configures a default provider for the named resource. For mysql this ends up looking like this:

['Mysql_datadir', 'Mysql_user', 'Mysql_database', 'Mysql_grant', 'Mysql_plugin'].each |String $val| { noop_resource($val) }

Running a puppet manifest with this at the top will noop out any of the named resource types and they won't execute. Thus allowing puppet apply to complete and finish generating the config files within the specified manifest.

The good news is most of our services don't require the noop_resource in order to generate config files cleanly. But for those that do the interface allows us to effectively disable the resources we don't want to execute.

Putting it all together: docker-puppet.py

We brought everything together in tripleo-heat-templates to create a single container configuration interface that allows us to generate per-service config files in a configurable way. It looks like this:

  • manifest: the puppet manifest to use to generate config files (Thanks to composable services this is now per service!)
  • puppet_tags: the puppet tags to execute within this manifest
  • config_image: the docker image to use to generate config files. Generally we use the same image as the service itself.
  • config_volume: where to output the resulting config tree (includes /etc/ and some other directories).

And then we've created a custom tool called docker-puppet.py to drive this per-service configuration. The tool consumes the information above in a JSON file and drives generation of the config files in a single action.
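As a rough illustration (not taken from the actual tool), an entry in that JSON file might look something like this, using the four fields described above with made-up values:

[
  {
    "config_volume": "keystone",
    "puppet_tags": "keystone_config",
    "manifest": "include ::keystone",
    "config_image": "tripleoupstream/centos-binary-keystone:latest"
  }
]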

It ends up working like this:

Video demo: Docker Puppet

And that's it. Our config interfaces are intact. We generate the config files we need. And we get to carry on with our efforts to deploy with containers.


by Dan Prince at January 25, 2017 02:00 PM

January 23, 2017

James Slagle

Update on TripleO with already provisioned servers

In a previous post, I talked about using TripleO with already deployed and provisioned servers. Since that was published, TripleO has made a lot of progress in this area. I figured it was about time for an update on where the project is with this feature.

Throughout the Ocata cycle, I’ve had the chance to help make this feature more
mature and easier to consume for production deployments.

Perhaps most importantly, for pulling their deployment metadata from Heat, the servers are now configured to use a Swift Temporary URL instead of having to rely on a Keystone username and password.

Also, instead of having to bootstrap the servers with all the expected packages
and initial configuration that TripleO typically expects from instances that it
has deployed from pre-built images, you can now start with a basic CentOS image
installed with only the initial python-heat-agent packages and the agent
running.

There have also been other bug fixes and enhancements to enable this to work
with things such as network isolation and fixed predictable IPs for all
networks.

I’ve started on some documentation that shows how to use this feature for
TripleO deployments: https://review.openstack.org/#/c/420369/
The documentation is still in progress, but I invite people to give it a try
and let me know how it works.

Using this feature, I’ve been able to deploy an Overcloud on 4 servers in a
remote lab from a virtualized Undercloud running in an entirely different lab.
There’s no L2 provisioning network connecting the 2 labs, and I don’t have
access to run a DHCP server on it anyway. The 4 Overcloud servers were
initially provisioned with the existing lab provisioning system
(cobbler/kickstart).

This flexibility helps build upon the composable nature of the
tripleo-heat-templates framework that we’ve been developing in TripleO
in that it allows integration with already existing provisioning environments.

Additionally, we’ve been using this capability extensively in our
Continuous Integration tests. Since TripleO does not have to be responsible for
provisioning the initial operating system on instances, we’ve been able to make
use of virtual instances provided by the OpenStack Infra project and
their managed Nodepool instance.

Like all other OpenStack CI jobs running in the standard check and gate queues,
our jobs are spread across several redundant OpenStack clouds. That means we
have a lot more virtual compute capacity for running tests than we previously
had available.

We’ve further been able to define job definitions using 2, 3, and 4 nodes in
the same test. These multinode tests, and the increased capacity, allow us to
test different deployment scenarios such as customized composable roles, and
recently, a job upgrading from the previous OpenStack release all the way to
master.

We’ve also scaled out our testing using scenario tests. Scenario tests allow us
to run a test with a specific configuration based on which files are actually
modified by the patch being tested. This allows the project to make
sure that patches affecting a given service are actually tested, since a
scenario test will be triggered deploying that service. This is important to
scaling our CI testing, because it’s unrealistic to expect to be able to deploy
every possible OpenStack service and test that it can be initially deployed, is
functional, and can be upgraded on every single TripleO patch.

If this is something you try out and have any feedback, I’d love to hear it and
see how we could improve this feature and make it easier to use.

by slagle at January 23, 2017 02:05 PM

January 12, 2017

Dougal Matthews

Calling Ansible from Mistral Workflows

I have spoken with a few people that were interested in calling Ansible from Mistral Workflows.

I finally got around to trying to make this happen. All that was needed was a very small and simple custom action that I put together, uploaded to github and also published to pypi.

Here is an example of a simple Mistral Workflow that makes use of these new actions.

---
version: 2.0

run_ansible_playbook:
  type: direct
  tasks:
    run_playbook:
      action: ansible-playbook
      input:
        playbook: path/to/playbook.yaml

Installing and getting started with this action is fairly simple. This is how I did it in my TripleO undercloud.

sudo pip install mistral-ansible-actions;
sudo mistral-db-manage populate;
sudo systemctl restart openstack-mistral*;

There is one gotcha that might be confusing. The Mistral Workflow runs as the mistral user, this means that the user needs permission to access the Ansible playbook files.

After you have installed the custom actions, you can test them with the Mistral CLI. The first command should work without any extra setup; the second requires you to create a playbook somewhere and provide access to it.

mistral run-action ansible '{"hosts": "localhost", "module": "setup"}'
mistral run-action ansible-playbook '{"playbook": "path/to/playbook.yaml"}'

The action supports a few other input parameters, they are all listed for now in the README in the git repo. This is a very young project, but I am curious to know if people find it useful and what other features it would need.

If you want to write custom actions, check out the Mistral documentation.

by Dougal Matthews at January 12, 2017 02:20 PM

December 16, 2016

Giulio Fidente

TripleO to deploy Ceph standalone

Here is a nice Christmas present: you can use TripleO for a standalone Ceph deployment, with just a few lines of YAML. Assuming you have an undercloud ready for a new overcloud, create an environment file like the following:

resource_registry:
  OS::TripleO::Services::CephMon: /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-mon.yaml
  OS::TripleO::Services::CephOSD: /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-osd.yaml

parameters:
  ControllerServices:
    - OS::TripleO::Services::CephMon
  CephStorageServices:
    - OS::TripleO::Services::CephOSD

and launch a deployment with:

openstack overcloud deploy --compute-scale 0 --ceph-storage-scale 1 -e the_above_env_file.yaml

The two lines from the environment file in resource_registry are mapping (and enabling) the CephMon and CephOSD services in TripleO while the lines in parameters are defining which services should be deployed on the controller and cephstorage roles.

This will bring up a two-node overcloud with one node running ceph-mon and the other ceph-osd, but the actual Christmas gift is that it implicitly provides, and allows usage of, all the features we already know about TripleO, like:

  • baremetal provisioning
  • network isolation
  • a web GUI
  • lifecycle management
  • ... containers
  • ... upgrades

For example, you can scale up the Ceph cluster with:

openstack overcloud deploy --compute-scale 0 --ceph-storage-scale 2 -e the_above_env_file.yaml

and this will provision a new Ironic node with the cephstorage role, configuring the required networks on it and updating the cluster config for the new OSDs. (Note the --ceph-storage-scale parameter going from 1 to 2 in the second example).

Even more interesting is that the above will work for any service, not just Ceph: new services can be added to TripleO with just some YAML and puppet (a rough sketch of what such a service template might look like follows the list below), letting TripleO take care of a number of common issues found in any deployment tool, for example:

  • supports multinode deployments
  • synchronizes and orders the deployment steps across different nodes
  • supports propagation of config data across different services
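Here is that sketch of the general shape of a composable service template; it is not a real template from the repository, and the service name and puppet profile are hypothetical:

heat_template_version: 2016-10-14

description: Example composable service (hypothetical)

parameters:
  ServiceNetMap:
    type: json
    default: {}
  DefaultPasswords:
    type: json
    default: {}
  EndpointMap:
    type: json
    default: {}

outputs:
  role_data:
    description: Role data for the example service
    value:
      service_name: my_example_service
      config_settings:
        tripleo::profile::base::my_example_service::some_option: true
      step_config: |
        include ::tripleo::profile::base::my_example_service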

Time to try it and join the fun in #tripleo :)

by Giulio Fidente at December 16, 2016 10:00 PM

September 08, 2016

Emilien Macchi

Scaling-up TripleO CI coverage with scenarios

TripleO CI up to eleven!


When the OpenStack project started, it was "just" a set of services with the goal of spawning a VM. I remember you could run everything on your laptop and test things really quickly.
The project has now grown: thousands of features have been implemented, more backends / drivers are supported and new projects have joined the party.
It makes testing very challenging because not everything can be tested in a CI environment.

TripleO aims to be an OpenStack installer that takes care of service deployment. Our CI was only testing a set of services and a few plugins/drivers.
We had to find a way to test more services, more plugins and more drivers, in an efficient way and without wasting CI resources.

So we thought that we could create some scenarios with a limited set of services, configured with a specific backend / plugin, and one CI job would deploy and test one scenario.
Example: scenario001 would be the Telemetry scenario, testing required services like Keystone, Nova, Glance, Neutron, but also Aodh, Ceilometer and Gnocchi.

Puppet OpenStack CI has been using this model for a while and it works pretty well. We're going to reproduce it in TripleO CI for consistency.

 

How scenarios are run when patching TripleO?

We are using a feature in Zuul that allows us to select which scenario we want to test, depending on the files touched by a commit.
For example, if I submit a patch to TripleO Heat Templates that modifies "puppet/services/ceilometer-api.yaml", which is the composable service for Ceilometer-API, Zuul will trigger scenario001. See the Zuul layout:

- name : ^gate-tripleo-ci-centos-7-scenario001-multinode.*$
  files:
    - ^puppet/services/aodh.*$
    - ^manifests/profile/base/aodh.*$
    - ^puppet/services/ceilometer.*$
    - ^manifests/profile/base/ceilometer.*$
    - ^puppet/services/gnocchi.*$
    - ^manifests/profile/base/gnocchi.*$

 

How can I bring my own service in a scenario?

The first step is to look at the Puppet CI matrix and see if we already test the service in a scenario. If yes, please keep this scenario number consistent with the TripleO CI matrix. If not, you'll need to pick a scenario, usually the least loaded one to avoid performance issues.
Now you need to patch openstack-infra/project-config and specify the files that deploy your service.
For example, if your service is “Zaqar”, you’ll add something like:

- name : ^gate-tripleo-ci-centos-7-scenario002-multinode.*$
  files:
    ...
    - ^puppet/services/zaqar.*$
    - ^manifests/profile/base/zaqar.*$

Everytime you’ll send a patch to TripleO Heat Templates in puppet/services/zaqar* files or in puppet-tripleo manifests/profile/base/zaqar*, scenario002 will be triggered.

Finally, you need to send a patch to openstack-infra/tripleo-ci:

  • Modify README.md to add the new service in the matrix.
  • Modify templates/scenario00X-multinode-pingtest.yaml and add a resource to test the service (in Zaqar, it could be a Zaqar Queue; see the sketch after this list).
  • Modify test-environments/scenario00X-multinode.yaml and add the TripleO composable services and parameters to deploy the service.
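For instance, a pingtest resource for Zaqar might look something like this (a hedged sketch, not taken from the actual pingtest templates; the resource name is made up):

resources:
  zaqar_queue:
    type: OS::Zaqar::Queue
    properties:
      name: pingtest_queue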

Once you send the tripleo-ci patch, you can block it with a -1 workflow to avoid an accidental merge. Now go to openstack/tripleo-heat-templates and try to modify the zaqar composable service by adding a comment, or something you actually want to test. In the commit message, add "Depends-On: XXX" where XXX is the Change-Id of the tripleo-ci patch. When you send the patch, you'll see that Zuul triggers the appropriate scenario and your service will be tested.

 

 

What’s next?

  • Allow testing to extend beyond pingtest. Some services, for example Ironic, can't be tested with pingtest. Maybe running Tempest for a set of services would be something to investigate.
  • Zuul v3 is the big thing we're all waiting for to extend the granularity of our matrix. A limitation of the current Zuul version (2.5) is that we can't run scenarios in the Puppet OpenStack modules CI because we don't have a way to combine the files rules we saw before AND running the jobs for a specific project without files restrictions (ex: puppet-zaqar for scenario002). In other words, our CI will be better with Zuul v3 and we'll improve our testing coverage by running the right scenarios on the right projects.
  • Extend the number of nodes. We currently use multinode jobs which deploy an undercloud and a subnode for the overcloud (all-in-one). Some use-cases might require a third node (for example with Ironic).

Any feedback on this blog post is highly welcome; please let me know if you want me to cover something in more detail.

by Emilien at September 08, 2016 10:52 PM

August 04, 2016

Dan Prince

TripleO: onward dark owl

Onward dark owl

I was on PTO last week and started hacking on the beginnings of what could be a new Undercloud installer that:

  • Uses a single process Heat (heat-all)
  • It does not require MySQL, Rabbit
  • Uses noauth (no Keystone)
  • Drives the deployment locally via os-collect-config

The prototype ends up looking like this:

openstack undercloud deploy --templates=/root/tripleo-heat-templates

A short presentation of the reasons behind this and demo of the prototype is available here:

Video demo: TripleO onward dark owl

An etherpad with links to the code/patches is here:

Etherpad

by Dan Prince at August 04, 2016 10:00 PM

June 16, 2016

Marios Andreou

Deploying a stable/mitaka OpenStack with tripleo-docs (and grep, git-blame and git-log).



This post is about how I was able to mostly successfully follow the tripleo-docs, to deploy a stable/mitaka 3-control 1-compute development (virt) setup so I can ultimately test upgrading this to Newton.

I wasn’t sure there was something worth writing here, but then the same tools I used to address the two issues I hit deploying mitaka kept coming up during the week when trying to upgrade that environment. I’ve had to use a lot of grep and git blame/log to get to the bottom of issues I’m seeing trying to upgrade the undercloud from stable/mitaka to latest/newton.

The Newton upgrade work is ongoing and possibly worthy of a future post.

I guess this post is mostly about git blame, and about munging the Change-Id into a gerrit review URL to get from an error/issue you are seeing to the actual code review.

For the record I deployed stable/mitaka following the instructions at tripleo-docs and setting stable/mitaka repos in appropriate places. For example, during the virt-setup and the undercloud installation I followed the ‘Stable Branch’ admonition and enabled mitaka repos like:

sudo curl -o /etc/yum.repos.d/delorean-mitaka.repo http://trunk.rdoproject.org/centos7-mitaka/current/delorean.repo
sudo curl -o /etc/yum.repos.d/delorean-deps-mitaka.repo http://trunk.rdoproject.org/centos7-mitaka/delorean-deps.repo

Then when building images I enabled the mitaka repo like:

export NODE_DIST=centos7
export USE_DELOREAN_TRUNK=1
export DELOREAN_TRUNK_REPO="http://trunk.rdoproject.org/centos7-mitaka/current/"
export DELOREAN_REPO_FILE="delorean.repo"

The two issues I hit:


The pebcak issue.

This issue is the pebcak issue because whilst there is indeed a bona-fide bug that I hit here, I only hit that because I had a nit in my deployment command.

My deployment command looked like this:

openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 \
  --libvirt-type qemu \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
  -e network_env.yaml --ntp-server "pool.ntp.org"

Deploying like that ^^^ got me this:

The files ('overcloud-without-mergepy.yaml', 'overcloud.yaml') not found
in the /usr/share/openstack-tripleo-heat-templates/ directory

Err.. no I’m pretty sure those files are there (!)

# [stack@instack ~]$ ls -l /usr/share/openstack-tripleo-heat-templates/overcloud-without-mergepy.yaml
  lrwxrwxrwx. 1 root root 14 Jun 17 08:55 /usr/share/openstack-tripleo-heat-templates/overcloud-without-mergepy.yaml -> overcloud.yaml

So I know that message is very likely from the tripleoclient so I traced it. The code has actually already been fixed on master so grep gave me nothing there. However when I also tried against stable/mitaka:

[m@m python-tripleoclient]$ git checkout stable/mitaka
Switched to branch 'stable/mitaka'
[m@m python-tripleoclient]$ grep -rni "not found in the" ./*
./tripleoclient/v1/overcloud_deploy.py:414:  message = "The files {0} not
found in the {1} directory".format(

So now we can use git blame to get to the code review that fixed it. Since we know which file that error message comes from, we can use git blame against the master branch. Since it is fixed on master, something must have fixed it:

[m@m python-tripleoclient]$ git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
[m@m python-tripleoclient]$ git blame tripleoclient/v1/overcloud_deploy.py

1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  382)     def _try_overcloud_deploy_with_compat_yaml(self, tht_root, stack,
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  383)                                                stack_name, parameters,
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  384)                                                environments, timeout):
7a05679e tripleoclient/v1/overcloud_deploy.py        (James Slagle               2016-04-01 08:57:41 -0400  385)         messages = ['The following errors occurred:']
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  386)         for overcloud_yaml_name in constants.OVERCLOUD_YAML_NAMES:
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  387)             overcloud_yaml = os.path.join(tht_root, overcloud_yaml_name)
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  388)             try:
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  389)                 self._heat_deploy(stack, stack_name, overcloud_yaml,
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  390)                                   parameters, environments, timeout)
7a05679e tripleoclient/v1/overcloud_deploy.py        (James Slagle               2016-04-01 08:57:41 -0400  391)             except six.moves.urllib.error.URLError as e:
7a05679e tripleoclient/v1/overcloud_deploy.py        (James Slagle               2016-04-01 08:57:41 -0400  392)                 messages.append(str(e.reason))
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  393)             else:
1077cf13 tripleoclient/v1/overcloud_deploy.py        (Juan Antonio Osorio Robles 2015-12-04 09:29:16 +0200  394)                 return
7a05679e tripleoclient/v1/overcloud_deploy.py        (James Slagle               2016-04-01 08:57:41 -0400  395)         raise ValueError('\n'.join(messages))

So the git blame may not display great above, but I see the following line as particularly interesting since it is different to stable/mitaka:

7a05679e tripleoclient/v1/overcloud_deploy.py        (James Slagle               2016-04-01 08:57:41 -0400  392)                 messages.append(str(e.reason))

So now we can use git log to see the actual commit and check it is the one we are looking for:

[m@m python-tripleoclient]$ git log 7a05679e
commit 7a05679ebc944e3bec6f20c194c40fae1cf39d8d
Author: James Slagle <jslagle@redhat.com>
Date:   Fri Apr 1 08:57:41 2016 -0400

Show correct missing files when an error occurs

This function was swallowing all missing file exceptions, and then
printing a message saying overcloud.yaml or
overcloud-without-mergepy.yaml were not found.

The problem is that the URLError could occur for any missing file, such
as a missing environment file, typo in a relative patch or filename,
etc. And in those cases, the error message is actually quite misleading,
especially if the overcloud.yaml does exist at the exact shown path.

This change makes it such that the actual missing file paths are shown
in the output.

Closes-Bug: 1584792
Change-Id: Id9a70cb50d7dfa3dde72eefe0a5eaea7985236ff

Now that sounds promising! So not only do we have the actual bug number, but we have the Change-Id. We can use that to get to the gerrit code review:

[m@m ~]$ gimmeGerrit Id9a70cb50d7dfa3dde72eefe0a5eaea7985236ff

Where gimmeGerrit is a bash alias in my .profile:

gimme_gerrit() {
    gerrit_url="http://review.openstack.org/#q,$1,n,z"
    firefox $gerrit_url
}
alias gimmeGerrit=gimme_gerrit

So from the review to master I just made a cherry-pick to stable/mitaka.

Now the reason I was seeing this issue in the first place was that my deploy command was indeed wrong (it's just that the error message was eaten by this particular bug). I was using 'network_env.yaml' but I had actually created network-env.yaml. Yes, much palmface, but if I hadn't I wouldn't have backported the fix so meh.


The overcloud needs moar memory bug.

It is more or less well known in the tripleo community that 4GB overcloud nodes will no longer cut it even in a virt environment, which is why we default to 5GB on current master instack-undercloud.

I was seeing OOM issues on the overcloud nodes with current stable/mitaka like:

16021:Jun 14 10:53:07 overcloud-controller-0 os-collect-config[2330]: u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mError: /Stage[main]/Main/Pacemaker::Constraint::Base[storage_mgmt_vip-then-haproxy]/Exec[Creating order constraint storage_mgmt_vip-then-haproxy]: Could not evaluate: Cannot allocate memory - fork(2)\u001b[0m\n\u001b[1;31mError: /Stage[main]/Main/Pacemaker::Resource::Service[openstack-nova-novncproxy]/Pacemaker::Resource::Systemd[openstack-nova-novncproxy]/Pcmk_resource[openstack-nova-novncproxy]: Could not evaluate: Cannot allocate memory - /usr/sbin/pcs resource show openstack-nova-novncproxy > /dev/null 2>&1 2>&1\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Base[nova-vncproxy-then-nova-api-constraint]/Exec[Creating order constraint nova-vncproxy-then-nova-api-constraint]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Colocation[nova-api-with-nova-vncproxy-colocation]/Pcmk_constraint[colo-openstack-nova-api-clone-openstack-nova-novncproxy-clone]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Base[nova-consoleauth-then-nova-vncproxy-constraint]/Exec[Creating order constraint nova-consoleauth-then-nova-vncproxy-constraint]: Skipping because of failed dependencies\u001b[0m\n\u001b[1;31mWarning: /Stage[main]/Main/Pacemaker::Constraint::Colocation[nova-vncproxy-with-nova-consoleauth-colocation]/Pcmk_constraint[

16313:Jun 14 10:53:07 overcloud-controller-0 os-collect-config[2330]:
Error: /Stage[main]/Sahara::Service::Api/Service[sahara-api]: Could not
evaluate: Cannot allocate memory - fork(2)
16314:Jun 14 10:53:07 overcloud-controller-0 os-collect-config[2330]:
Error: /Stage[main]/Haproxy/Haproxy::Instance[haproxy]/Haproxy::Config[haproxy]/Concat[/etc/haproxy/haproxy.cfg]/Exec[concat_/etc/haproxy/haproxy.cfg]:
Could not evaluate: Cannot allocate memory - fork(2)

Suspecting from previous experience this would be defaulted in instack-undercloud:

[m@m instack-undercloud]$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
[m@m instack-undercloud]$ grep -rni 'NODE_MEM' ./*
./scripts/instack-virt-setup:89:export NODE_MEM=${NODE_MEM:-5120}

[m@m instack-undercloud]$ git blame scripts/instack-virt-setup | grep  NODE_MEM
2dec7d75 (Carlos Camacho  2016-03-30 09:17:44 +0000  89) export NODE_MEM=${NODE_MEM:-5120}

So using git log to see more about 2dec7d75:

[m@m instack-undercloud]$ git log 2dec7d75
commit 2dec7d7521799c0323d076cd66ba71ebb444c706
Author: Carlos Camacho <ccamacho@redhat.com>
Date:   Wed Mar 30 09:17:44 2016 +0000

    Overcloud is not able to deploy with the default 4GB of RAM using instack-undercloud

    When deploying the overcloud with the default value of 4GB of RAM the overcloud fails throwing "Cannot allocate memory" errors.
    By increasing the default memory to 5GB the error is solved in instack-undercloud

    Change-Id: I29036edeebefc1959643a04c5396e72863fdca5f
    Closes-Bug: #1563750

So as in the case of the pebcak issue, gimmeGerrit yields the review so I then just cherrypicked that to stable/mitaka too.

June 16, 2016 03:00 PM

