TelekomCloud DevOps team2019-01-02T14:53:17+00:00http://telekomcloud.github.io/Happy New Year2019-01-02T11:13:00+00:00http://telekomcloud.github.io/2019/01/02/happy-new-year<p>We start the year with cleanup and refresh.</p>
Frank KloekerAn automat OpenStack deployment for develop and test system2014-05-07T16:00:00+00:00http://telekomcloud.github.io/2014/05/07/vagrant-openstack<p>When I joined Deutsche Telekom two years ago, I had to share a common reference test system with everyone in the room, including all operators and developers. This was quite troublesome when you had new ideas to test: you had to avoid interfering with anyone and make sure that your experiments would not break things and make your colleagues angry.</p>
<p><img src="/images/2014-05-07-vagrant-openstack/test_system.jpg" alt="Test system" />
Figure 1: a local integration test system for experimenting with new features</p>
<p>Like any development process, ours requires a local integration test system. It must support developers editing and debugging OpenStack on the fly, as well as operators or package managers testing a release. It is also nice to be able to reset the test system after dirty changes and provision it again as fast as possible. This post introduces such a system, which is now available upstream in <a href="https://github.com/TelekomCloud/vagrant_openstack">our repository</a>.</p>
<h1 id="1-overview">1. Overview</h1>
<p><img src="/images/2014-05-07-vagrant-openstack/vagrant_project.jpg" alt="vagrant project" />
Figure 2: deployment of OpenStack by vagrant</p>
<p>Vagrant is responsible for bringing the VMs up (step 1), setting up host-only networks within VirtualBox (step 2) and installing base packages. From there, OpenStack can be deployed in two ways, depending on your needs (step 5). For development purposes, OpenStack is deployed by devstack. For testing release packages, puppet is used. Which of the two deployments is used is configurable in a global file. Their git modules are cloned with authentication (in case your company requires SSL for the connection) and dropped into the VM for deployment (step 4).</p>
<p>In my personal use case, I constantly need to switch between the two deployments: devstack for coding and puppet for testing packages. Switching between the two is therefore supported, and the previous deployment is kept safe, separate and reusable.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> # set environment in config.file, i.e puppet or devstack:
env: puppet
</code></pre></div></div>
<h2 id="11-networking">1.1 Networking</h2>
<p>Back then I only found projects that deploy all OpenStack components in one VM. This does not satisfy our needs, because an all-in-one deployment does not reflect the behavior of the GRE data network between the different OpenStack components. Figure 2 above shows a multi-node setup: the control, compute and neutron nodes, along with the three host-only networks for management, GRE data and public traffic, are brought up automatically.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> # supports multi nodes to enable/disable
$ vagrant status
Cache enabled: /home/tri/git/vagrant_openstack/cache
Working on environment deployed by puppet branch icehouse...
compute2.pp disabled in config.yaml
neutron2.pp disabled in config.yaml
Current machine states:
puppet not created (virtualbox)
control.pp not created (virtualbox)
compute1.pp not created (virtualbox)
neutron1.pp not created (virtualbox)
</code></pre></div></div>
<p>In such a testing environment, we also need to test the floating IPs of the VMs over the public network, because it would be extremely boring if the VMs booted by nova could not connect to the Internet. Figure 3 shows how packets from inside the neutron node go out and back. Packets coming from br-tun and br-int go to br-ex on the neutron node, are forwarded to the NAT interface (vboxnet0) and are SNATed so that replies can find their way back.</p>
<p><img src="/images/2014-05-07-vagrant-openstack/vagrant_net2.jpg" alt="SNAT for testing floating ips" />
Figure 3: SNAT for testing floating ips</p>
<h2 id="12-storage">1.2 Storage</h2>
<p>For a simple nova volume setup, iSCSI is chosen by default. The VBoxManage command [3] is very useful here: our vagrant setup uses it to create a virtual storage device and attach it to the control node. For those who are interested in the details, the VBoxManage commands are as follows:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> # create virtual storage
VBoxManage createhd --filename <vdi> --size <cinder_storage_size>
# and attach it to a vm
VBoxManage storageattach <vm_id> --storagectl "SATA Controller" --port 1 \
    --device 0 --type hdd --medium <vdi>
</code></pre></div></div>
<p>The system also formats the new virtual storage and creates a volume group <code class="highlighter-rouge">cinder-volumes</code> for cinder. It is also worth mentioning that, in order to keep the data of the two deployment environments separate, a separate virtual storage device is created for each environment.</p>
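<p>As a rough illustration of the per-environment storage handling, the sketch below only builds (and prints) the two VBoxManage command lines shown above, once for each environment. The disk naming scheme, the VM name <code class="highlighter-rouge">control</code> and the size are assumptions made for this example and are not taken from the repository.</p>

```python
import shlex

# Sketch only: builds (does not run) the VBoxManage commands from the post,
# once per deployment environment so that each keeps its own virtual disk.
# The file naming scheme, VM name and size used below are hypothetical.


def storage_commands(vm_id, env, size_mb):
    """Return the create and attach command lines as argument lists."""
    vdi = f"cinder_{env}.vdi"  # hypothetical naming: one disk per environment
    create = ["VBoxManage", "createhd", "--filename", vdi, "--size", str(size_mb)]
    attach = [
        "VBoxManage", "storageattach", vm_id,
        "--storagectl", "SATA Controller",
        "--port", "1", "--device", "0",
        "--type", "hdd", "--medium", vdi,
    ]
    return [create, attach]


for env in ("puppet", "devstack"):
    for command in storage_commands("control", env, 2048):
        # shlex.join keeps "SATA Controller" quoted as a single argument
        print(shlex.join(command))
```

<p>Passing argument lists rather than one shell string avoids quoting problems with the controller name if the commands are later handed to a process runner.</p>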
<h1 id="2-deployment-environments">2. Deployment environments</h1>
<h2 id="21-puppet">2.1 puppet</h2>
<p>A puppetmaster VM is brought up with puppetdb installed. It pulls manifests from a configurable git repository into a directory inside the VM and uses these manifests to deploy OpenStack on the other VMs. By default, the manifests in <a href="https://github.com/TelekomCloud/vagrant_openstack_puppet">[2]</a> are provided as an example to try out the new Icehouse release with the ML2 plugin and l2 population. You can also provide your own manifests by configuring a puppet repository and the site.pp to use for the node definitions:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code> puppet_giturl: git@your.repository.git
puppet_branch: your_branch
puppet_site_pp: manifests/your_site.pp
</code></pre></div></div>
<h2 id="22-devstack">2.2 devstack</h2>
<p>I like deployments where the provisioning script is provided directly inside the VM. For this reason, a puppet master is not needed for the devstack deployment. Instead, devstack is cloned and set up directly inside all VMs. It is also configured to use the pip repository of OpenStack [4]. Pydev is included in the VMs as well, so that remote debugging from the host machine is supported.</p>
<p><img src="/images/2014-05-07-vagrant-openstack/test_system2.jpg" alt="vagrant project" />
Figure 4: Remote debugging with MySQL Workbench & Eclipse</p>
<h1 id="3-performance-boost">3. Performance boost</h1>
<p>One issue is the long deployment time, especially if you have a slow connection or the connection drops in the middle of the deployment. So I tried out every small possibility to reduce the time needed.</p>
<h2 id="31-caching">3.1 Caching</h2>
<p>When a VM is destroyed and brought up again, it must download all packages from scratch. A simple caching solution is implemented, which cuts the deployment time roughly in half. A second deployment is faster still, since all packages and the glance image are already cached, so internet access is no longer necessary.</p>
<p>Caching is supported for both environments: all .deb packages installed by puppet, as well as all pip packages installed by devstack, are cached and shared between VMs. The tables below give an idea of how much time can be saved when bringing up the machines with the cache enabled, measured with a fairly fast internet connection (4 Mbit/s) and one CPU and 1024 MB of RAM per VM.</p>
<p><em>a) Puppet deployment in secs</em></p>
<table>
<thead>
<tr>
<th>Nodes</th>
<th>no cache</th>
<th>with cache</th>
</tr>
</thead>
<tbody>
<tr>
<td>Control</td>
<td>312</td>
<td>227</td>
</tr>
<tr>
<td>Compute</td>
<td>110</td>
<td>83</td>
</tr>
<tr>
<td>Neutron</td>
<td>109</td>
<td>62</td>
</tr>
<tr>
<td>Total</td>
<td>532</td>
<td>230</td>
</tr>
</tbody>
</table>
<p>Time saved: about 5 min</p>
<p><em>b) Devstack deployment in secs</em></p>
<table>
<thead>
<tr>
<th>Nodes</th>
<th>no cache</th>
<th>with cache</th>
</tr>
</thead>
<tbody>
<tr>
<td>Control</td>
<td>764</td>
<td>655</td>
</tr>
<tr>
<td>Compute</td>
<td>764</td>
<td>341</td>
</tr>
<tr>
<td>Neutron</td>
<td>224</td>
<td>208</td>
</tr>
<tr>
<td>Total</td>
<td>1754</td>
<td>660</td>
</tr>
</tbody>
</table>
<p>Time saved: about 18 min</p>
<p>To test a custom package, simply replace it in the cache folder and bring up new VMs.</p>
<h2 id="32-customize-vagrant-box">3.2 Customize vagrant box</h2>
<p>To reduce the vagrant up time, a customized vagrant box with pre-installed packages is used. The box is based on precise64 and has packages such as VBox Guest Additions 4.3.8, puppet, dnsmasq, r10k, vim, git, rubygems, msgpack and lvm2 pre-installed. All empty space in the box is zeroed out and all logs are wiped to keep it as small as possible (378 MB) for distribution on <a href="https://vagrantcloud.com/TelekomCloud/">vagrant cloud</a>. This saves about another 70 secs for each VM brought up (from 79 secs down to 8 secs).</p>
<p>Time saved: about 4.7 min (4 VMs x 70 secs)</p>
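<p>The savings quoted in this section can be reproduced from the numbers given. The short sketch below uses only the values of the "Total" rows of the two tables and the 70 secs per VM quoted for the customized box; the helper name is made up for this example.</p>

```python
# Totals in seconds, copied from the "Total" rows of the two tables.
totals = {
    "puppet": {"no_cache": 532, "with_cache": 230},
    "devstack": {"no_cache": 1754, "with_cache": 660},
}


def saved_minutes(no_cache, with_cache):
    """Time saved by the cache, in minutes."""
    return (no_cache - with_cache) / 60


for env, t in totals.items():
    print(env, round(saved_minutes(t["no_cache"], t["with_cache"]), 1), "min saved")

# Customized box: roughly 70 seconds saved per VM, for 4 VMs.
box_saved_minutes = 4 * 70 / 60
print("box", round(box_saved_minutes, 1), "min saved")
```

<p>This gives roughly 5 minutes for puppet, 18.2 minutes for devstack and 4.7 minutes for the customized box.</p>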
<h1 id="references">REFERENCES</h1>
<ol>
<li><a href="https://github.com/TelekomCloud/vagrant_openstack">https://github.com/TelekomCloud/vagrant_openstack</a></li>
<li><a href="https://github.com/TelekomCloud/vagrant_openstack_puppet">puppet for Icehouse</a></li>
<li><a href="http://www.virtualbox.org/manual/ch08.html">VBoxManage</a></li>
<li><a href="http://pypi.openstack.org/openstack">OpenStack pypi</a></li>
</ol>
Tri Hoang VoCeph Performance Analysis: fio and RBD2014-02-26T22:42:00+00:00http://telekomcloud.github.io/ceph/2014/02/26/ceph-performance-analysis_fio_rbd<p>With this blog post we want to share insights into how the Platform Engineering team for the <a href="https://portal.telekomcloud.com/en/">Business Marketplace</a> at Deutsche Telekom AG analyzed a Ceph performance issue. Ceph is used for both block storage and object storage in our cloud production platform.</p>
<h1 id="background">Background</h1>
<p>While the most common IO workload patterns of web applications were not causing issues on our Ceph clusters, serving databases or other IO-demanding applications with high IOPS requirements (with 8K or 4K blocksize) turned out to be more challenging.</p>
<p>Recently, we got a report from a colleague discussing performance regressions on our Ceph cluster for rados block devices. We were presented results of <code class="highlighter-rouge">dd if=/dev/zero of=/dev/vdx bs=1k count=100M</code>.</p>
<p>We were not very happy about getting a report with blocksize of 1k, synthetic sequential writes and /dev/zero as input. But it turned out that even with 4k or 8k, we didn’t get great numbers.</p>
<p>The cluster at that time was dealing fine with blocksizes of 32k and higher. But blocksizes below 32k indeed resulted in a performance regression compared to an older
generation of our Ceph cluster deployment.</p>
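<p>To see why small blocksizes are the hard case, it helps to look at how many IO operations per second are needed to sustain the same throughput at different blocksizes. The following sketch is plain arithmetic; the 100 MiB/s target is a made-up figure for illustration and not a measurement from our cluster.</p>

```python
# Sketch: IOPS needed to sustain a given throughput at different block sizes.
# The 100 MiB/s target is a made-up figure for illustration, not a measurement.

KIB = 1024
MIB = 1024 * KIB


def required_iops(throughput_bytes_per_s, block_size_bytes):
    """Number of IO operations per second needed for the given throughput."""
    return throughput_bytes_per_s / block_size_bytes


target = 100 * MIB
for bs_kib in (1, 4, 8, 32):
    print(f"bs={bs_kib}k needs {required_iops(target, bs_kib * KIB):.0f} IOPS")
```

<p>Going from 32k down to 4k blocks multiplies the number of requests by eight for the same amount of data, so per-request latency dominates the result.</p>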
<h1 id="analysis">Analysis</h1>
<p>First we reproduced the issue with <code class="highlighter-rouge">fio</code> inside an OpenStack instance with a Cinder-attached RBD, even with pure random write.
Sequential and random reads of the entire RBD were not affected inside the VM.</p>
<p>Also, the average results of <code class="highlighter-rouge">rados bench</code> were not perfect but not bad either; let’s say they were OK. But they were not a good indicator of whether this was a pure Ceph problem or maybe something else within our network.</p>
<p>Initially we spent some time analyzing the traffic <code class="highlighter-rouge">librbd</code> was seeing with <code class="highlighter-rouge">tcpdump</code> and <code class="highlighter-rouge">systemtap</code> scripts. Indeed, we found situations where the send buffer of the VM host got full while performing 4K random write stress loads inside a guest. (For this we used the example <code class="highlighter-rouge">systemtap</code> script <code class="highlighter-rouge">sk_stream_wait_memory.stp</code>.)</p>
<p>But this only happened on very intense 4k write loads inside the VM.</p>
<p>Was this IO pattern a sane one to test, or were we looking at a corner case?</p>
<p>We decided to look for the right tool to measure detailed latency and IOPS, which is able to reproduce the same load pattern.
Bonus point: replaying real-life workloads, that come close to production workload - so we can tune for the right workload (and not for dd bs=4k).</p>
<p>Here the challenge started, since we were looking for a tool that is able to generate the same load on each of the following layers:</p>
<ul>
<li>inside the VM guest (to test the RBD QEMU driver, which is using librbd)</li>
<li>on the VM host (using the same code: librbd. We hesitated to use the RBD kernel module to not miss potential issues inside librbd - if any)</li>
<li>on the Ceph storage node: testing the ceph-osd code (<code class="highlighter-rouge">FileStore</code> implementation. Covering OSD-disk and Journal-disk writes)</li>
<li>on the Ceph storage node: testing the filesystem (XFS) with the format and mount options in use</li>
<li>on the Ceph storage node: testing the dmcrypt block device (yep, we do this.)</li>
<li>on the Ceph storage node: testing the block device directly (through storage/RAID controller. RAID0, write-through)</li>
</ul>
<p>Otherwise, we would have had to use different tools that might produce different workloads, which could lead to results that are not comparable across the layers.</p>
<h1 id="the-right-tool-fio">THE right tool: <code class="highlighter-rouge">fio</code></h1>
<p><code class="highlighter-rouge">fio</code> was pretty much the perfect match for our cases - it was only missing the capability to talk to Ceph <code class="highlighter-rouge">RBD</code> and the Ceph internal <code class="highlighter-rouge">FileStore</code> directly.</p>
<p>Fortunately, fio supports various IO engines. So we decided to add support for <code class="highlighter-rouge">librbd</code> and for the Ceph internal <code class="highlighter-rouge">FileStore</code> implementation, in order to have an artificial OSD process for benchmarking the OSD implementation via <code class="highlighter-rouge">fio</code>.</p>
<h1 id="fio-librbd-ioengine-support"><code class="highlighter-rouge">fio</code> <code class="highlighter-rouge">librbd</code> ioengine support</h1>
<p>With the latest master you get <code class="highlighter-rouge">fio librbd</code> support and can test your Ceph RBD cluster with IO patterns of your choice, using nearly all of the fio functionality (some features are not supported yet by the <code class="highlighter-rouge">RBD engine</code> - not fio’s fault. Patches are welcome!). All you need is to install the <code class="highlighter-rouge">librbd</code> development package (e.g. <code class="highlighter-rouge">librbd-dev</code> or <code class="highlighter-rouge">librbd-devel</code> and dependencies) or have the library and its headers at the designated location.</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git clone git://git.kernel.dk/fio.git
$ cd fio
$ ./configure
[...]
Rados Block Device engine yes
[...]
$ make
</code></pre></div></div>
<h1 id="first-run-with-fio-with-rbd-engine">First run with <code class="highlighter-rouge">fio</code> with <code class="highlighter-rouge">rbd</code> engine</h1>
<p>The <code class="highlighter-rouge">rbd</code> engine will read <code class="highlighter-rouge">ceph.conf</code> from the default location of your Ceph build.</p>
<p>A valid RBD client configuration in <code class="highlighter-rouge">ceph.conf</code> is required. Authentication and key handling also need to be done via <code class="highlighter-rouge">ceph.conf</code>.
If <code class="highlighter-rouge">ceph -s</code> is working on the designated RBD client (e.g. OpenStack compute node / VM host), the <code class="highlighter-rouge">rbd</code> engine is nearly good to go.</p>
<p>One preparation step left: You need to create a test rbd in advance. WARNING: do not use existing RBDs which might hold valuable data!</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rbd -p rbd create --size 2048 fio_test
</code></pre></div></div>
<p>Now one can use the <code class="highlighter-rouge">rbd</code> engine job file shipped with <code class="highlighter-rouge">fio</code>:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>./fio examples/rbd.fio
</code></pre></div></div>
<p>The example <code class="highlighter-rouge">rbd.fio</code> in detail looks like this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>######################################################################
# Example test for the RBD engine.
#
# Runs a 4k random write test agains a RBD via librbd
#
# NOTE: Make sure you have either a RBD named 'fio_test' or change
# the rbdname parameter.
######################################################################
[global]
#logging
#write_iops_log=write_iops_log
#write_bw_log=write_bw_log
#write_lat_log=write_lat_log
ioengine=rbd
clientname=admin
pool=rbd
rbdname=fio_test
invalidate=0 # mandatory
rw=randwrite
bs=4k
[rbd_iodepth32]
iodepth=32
</code></pre></div></div>
<p>This will perform a 100% random write test across the entire RBD size (which is determined via <code class="highlighter-rouge">librbd</code>), as Ceph user <code class="highlighter-rouge">admin</code>
using the Ceph pool <code class="highlighter-rouge">rbd</code> (default) and the just created and empty RBD <code class="highlighter-rouge">fio_test</code>, with a 4k blocksize and an iodepth of 32 (number
of IO requests in flight). The engine makes use of asynchronous IO.</p>
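<p>Since the analysis above compared several blocksizes, it can be handy to generate one job per blocksize from the same template. The sketch below only renders job file text and uses nothing but the options that appear in the example job; the helper name and the selection of blocksizes are assumptions made for this illustration.</p>

```python
# Sketch: render variants of the example job for several block sizes.
# Only options that appear in the example job file are used; the helper
# name and the choice of block sizes are assumptions for illustration.

JOB_TEMPLATE = """[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname={rbdname}
invalidate=0
rw=randwrite
bs={bs}

[rbd_iodepth{iodepth}]
iodepth={iodepth}
"""


def render_job(bs, iodepth=32, rbdname="fio_test"):
    """Return the text of a fio job file for one block size."""
    return JOB_TEMPLATE.format(bs=bs, iodepth=iodepth, rbdname=rbdname)


for bs in ("4k", "8k", "32k"):
    print(f"# ---- job for bs={bs} ----")
    print(render_job(bs))
```

<p>Each rendered text can be saved to a file and passed to <code class="highlighter-rouge">fio</code> in the same way as the shipped example.</p>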
<p>Current implementation limits:</p>
<ul>
<li><code class="highlighter-rouge">invalidate=0</code> is mandatory for now. The engine simply fails without it.</li>
<li>The <code class="highlighter-rouge">rbd</code> engine will not clean up once the test is done. The given RBD is filled up after a complete test run. (We currently use this to run tests on prefilled RBDs, and recreate the RBD if required.)</li>
</ul>
<h1 id="results">Results</h1>
<p>Here are some carefully selected results from one of our development environments, taken while investigating the actual performance issue.</p>
<p>The following plot shows the original situation we were facing: the result of the fio example RBD job run against a 2 GB RBD which was initially empty:
<img src="/images/2014-02-26-ceph-performance-analysis_fio_rbd/2G-RBD.4k.randwrite.iodepth32-iops.low.png" alt="Drawing" style="width: 650px;" /></p>
<p>After analyzing the Ceph OSD nodes with the fio <code class="highlighter-rouge">FileStore</code> implementation, followed by further analysis and tuning, we managed to get this result:
<img src="/images/2014-02-26-ceph-performance-analysis_fio_rbd/2G-RBD.4k.randwrite.iodepth32-iops.good.png" alt="Drawing" style="width: 650px;" /></p>
<p>A more detailed analysis, further results and configuration/setup details will follow in the next posts on the TelekomCloud Blog.</p>
<h1 id="conclusion">Conclusion</h1>
<p><code class="highlighter-rouge">fio</code> is THE flexible IO tester - now even for Ceph RBD tests!</p>
<h1 id="outlook">Outlook</h1>
<p>We are looking forward to going into more details in the next post on our performance analysis story with our Ceph RBD cluster performance.</p>
<p>In case you are at the <a href="http://www.inktank.com/cephdays/frankfurt/">Ceph Day</a> tomorrow in Frankfurt, look out for Danny to get some more insight into our efforts around Ceph and fio here at Deutsche Telekom.</p>
<h1 id="acknowledgements">Acknowledgements</h1>
<p>First of all, we want to thank Jens Axboe and all <code class="highlighter-rouge">fio</code> contributors for this awesome swiss-knife-IO-tool! Secondly, we thank Wolfgang Schulze, his Inktank Service team and our Inktank colleagues for their intense support. Even when it turned out pretty early in the analysis that Ceph was not causing the issue, they still teamed up with us to figure out what was going on.</p>
<h1 id="references">References</h1>
<p><a href="http://git.kernel.dk/?p=fio.git;a=summary">http://git.kernel.dk/?p=fio.git;a=summary</a></p>
Danny Al-Gaaf & Daniel GollubNew Ways of Tempest Stress Testing2013-09-11T00:00:00+00:00http://telekomcloud.github.io/2013/09/11/new-ways-of-tempest-stress-testing<h1 id="overview">Overview</h1>
<p>There are many stress test frameworks for OpenStack that are all pretty similar in nature. They follow fixed scenarios and fork many worker processes. Often they are difficult to enhance, since they were written for a single purpose.</p>
<p>With <a href="https://blueprints.launchpad.net/tempest/+spec/stress-tests">blueprint stress-test</a> the community of Tempest developers focused on building a single and very flexible stress test framework inside of Tempest.</p>
<h1 id="a-stress-test-is-not-a-test-domain">A Stress Test is not a Test Domain</h1>
<p>In the past, stress tests had their own area inside Tempest. New tests were introduced mainly as clones of an existing API test, a scenario test or a mixture of both. But do stress tests really have their own testing domain?</p>
<p>Two main purposes of stress test are obvious:</p>
<ul>
<li>Having a framework to find/reproduce race conditions</li>
<li>Having a framework to simulate real-life load with a mixture of load profiles</li>
</ul>
<p>Those two topics are already covered by Tempest tests today: grouping API tests enables us to detect race conditions, and with scenario tests we can simulate load profiles.</p>
<h1 id="tempest-stress-test-core">Tempest Stress Test Core</h1>
<p>The core of the stress test framework is quite simple: It’s responsible for forking worker processes and summarizing results. How many processes should be forked is configurable in a JSON configuration file, which also can provide multiple arguments for each stress test.</p>
<h1 id="integration-of-existing-tests">Integration of Existing Tests</h1>
<p>In order to stop duplicating code, we started to integrate existing Tempest tests into the stress test framework. A wrapper was built to call any kind of unit test and make it available to the framework. With that, it’s easy to group existing tests. Here is an example of how this is done for two unit tests:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[{"action": "tempest.stress.actions.unit_test.UnitTest",
  "threads": 8,
  "kwargs": {"test_method": "tempest.cli.simple_read_only.test_glance.SimpleReadOnlyGlanceClientTest.test_glance_fake_action",
             "class_setup_per": "process"}
 },
 {"action": "tempest.stress.actions.unit_test.UnitTest",
  "threads": 8,
  "kwargs": {"test_method": "tempest.api.volume.test_volumes_actions.VolumesActionsTest.test_attach_detach_volume_to_instance",
             "class_setup_per": "process"}
 }
]
</code></pre></div></div>
<p>This will fork in total 16 processes that will conduct glance and cinder stress testing.</p>
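<p>The number of worker processes follows directly from the configuration: it is the sum of the <code class="highlighter-rouge">threads</code> values over all entries. The sketch below is not part of Tempest; it just parses a configuration with the same values as the example and adds them up. The helper name <code class="highlighter-rouge">total_workers</code> is made up for this illustration.</p>

```python
import json

# Two separate entries in one list; the values are taken from the example.
CONFIG = """
[
  {"action": "tempest.stress.actions.unit_test.UnitTest",
   "threads": 8,
   "kwargs": {"test_method": "tempest.cli.simple_read_only.test_glance.SimpleReadOnlyGlanceClientTest.test_glance_fake_action",
              "class_setup_per": "process"}},
  {"action": "tempest.stress.actions.unit_test.UnitTest",
   "threads": 8,
   "kwargs": {"test_method": "tempest.api.volume.test_volumes_actions.VolumesActionsTest.test_attach_detach_volume_to_instance",
              "class_setup_per": "process"}}
]
"""


def total_workers(config_text):
    """Sum the 'threads' value over all entries of a stress test config."""
    return sum(entry["threads"] for entry in json.loads(config_text))


print(total_workers(CONFIG))
```

<p>Running a configuration through a JSON parser like this is also a quick way to catch syntax errors before starting a long stress run.</p>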
<h2 id="the-stress-test-discovery">The Stress Test Discovery</h2>
<p>Test discovery is done like this:</p>
<p><img src="/images/2013-09-11-new-ways-of-tempest-stress-testing/tempest_stress_discovery.png" alt="Drawing" style="width: 650px;" /></p>
<p>Instead of manually adding existing tests to the stress test framework, the next logical step was to introduce a decorator. It allows test developers to decide if a test is made available to the stress test framework. Or, to be more precise, it marks tests as being meaningful stress tests. The decorator can be used like this:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@stresstest(class_setup_per='process')
@attr(type='smoke')
def test_attach_detach_volume_to_instance(self):
</code></pre></div></div>
<p>It can simply be added to any existing unit test, or a test can be written solely as a stress test. Internally it is based on the existing mechanism of attribute discovery of <code class="highlighter-rouge">testtools</code> and adds the attribute <code class="highlighter-rouge">stress</code>. It has one mandatory parameter <code class="highlighter-rouge">class_setup_per</code>, since it must be decided when the <code class="highlighter-rouge">setUpClass</code> function should be called: for every process, for every action or just globally. This depends on the content of <code class="highlighter-rouge">setUpClass</code> and must be decided by the developer. In many cases a call on a per process level is sufficient.</p>
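<p>To make the idea more concrete, here is a strongly simplified sketch of how a marking decorator and a matching discovery step can work together. This is not Tempest’s actual code: the attribute names <code class="highlighter-rouge">_stress</code> and <code class="highlighter-rouge">_class_setup_per</code>, the <code class="highlighter-rouge">discover</code> helper and the second test method are invented for this example, and the real implementation builds on the <code class="highlighter-rouge">testtools</code> attribute mechanism instead.</p>

```python
# Simplified sketch, not Tempest's actual code: a decorator that tags a test
# function so that a discovery step can find it later. The attribute names
# (_stress, _class_setup_per) are invented for this example.


def stresstest(*, class_setup_per):
    """Mark a test as a stress test; class_setup_per is mandatory."""
    def decorator(func):
        func._stress = True
        func._class_setup_per = class_setup_per
        return func
    return decorator


def discover(test_class):
    """Return the names of all methods marked with @stresstest."""
    return sorted(
        name for name, member in vars(test_class).items()
        if getattr(member, "_stress", False)
    )


class VolumesActionsTest:
    @stresstest(class_setup_per="process")
    def test_attach_detach_volume_to_instance(self):
        pass

    def test_list_volumes(self):  # hypothetical test, not marked, so skipped
        pass


print(discover(VolumesActionsTest))
```

<p>Because the decorator only attaches metadata and returns the original function, a marked test still runs unchanged as a normal unit test.</p>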
<p>All existing test attributes like <code class="highlighter-rouge">smoke</code> or <code class="highlighter-rouge">gate</code> can also be used as filter within the discovery function.</p>
<h2 id="which-tests-are-good-candidates">Which Tests are Good Candidates?</h2>
<p>In fact, it’s often easier to identify test cases that aren’t good candidates:</p>
<ul>
<li>Negative tests</li>
<li>Single unit test functions that cover only one small aspect (like listing volumes)</li>
<li>Tests that interfere with each other (like changing quotas)</li>
</ul>
<p>All other tests might be interesting candidates to be integrated into and used by the stress test framework. So please feel free to identify new cases and contribute them to OpenStack/Tempest.</p>
Marc KodererOpenStack Networking High Availability concept2013-06-10T19:42:00+00:00http://telekomcloud.github.io/2013/06/10/openstack-networking-high-availability<p>Getting OpenStack highly available has been a hot topic for us at Deutsche Telekom AG. With the upstream <a href="http://docs.openstack.org/trunk/openstack-ha/content/ch-intro.html">OpenStack High Availability documentation</a>, it’s already well documented how to configure supporting services like MySQL and RabbitMQ in highly available setups.</p>
<p>A challenging area with regards to high availability has been OpenStack Networking. For our OpenStack Grizzly-based cloud, we have made great progress with our solution that we would like to share: how to implement OpenStack Networking (L3 agent) highly available in active/active mode without pacemaker, using traditional routing/balancing functionality.</p>
<p>Our primary goal is to keep VMs available/reachable from the internet via a redundant network. I’ll focus on the high-level idea today and post later about the detailed implementation.</p>
<p>Let’s assume we have OpenStack up and running, including OpenStack Networking with the Open vSwitch plug-in. The traffic will usually flow from internet -> Network node -> Open vSwitch -> full-mesh GRE tunnel -> Compute node -> Open vSwitch on Compute node -> VM:</p>
<p><img src="/images/2013-06-10-openstack-networking-high-availability/diagram_1.jpg" alt="Diagram 1" /></p>
<p>Next, we need a second Network node with Open vSwitch and GRE tunnels to connect to the same Compute nodes:</p>
<p><img src="/images/2013-06-10-openstack-networking-high-availability/diagram_2.jpg" alt="Diagram 2" /></p>
<p>OpenStack Networking gained an important feature in Grizzly, which allows us to <a href="https://blueprints.launchpad.net/quantum/+spec/quantum-scheduler">schedule to multiple network nodes</a>.</p>
<p>Now we can create a VM on a compute node with two NICs and assign two different IP addresses to them. We need to make sure to use IPs that are part of each of the subnets that are mapped to the network nodes / L3 agents.</p>
<p>With this, we now have multiple paths out of the VM. In order to avoid any kind of routing problems, you need to configure PBR (policy-based routing) inside the VM. (Details will be provided in the next post.)</p>
<p>The goal is to make sure that packets arriving on interface ethX are replied to or sent back via the same interface. This requires two routing tables and two default routes, one for each interface.</p>
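<p>Until the detailed post is available, the following sketch gives a rough idea of what "two routing tables and two default routes" can look like with the standard <code class="highlighter-rouge">ip</code> tool. It only builds and prints the command strings. All addresses, gateways and table numbers are hypothetical, and this is a common source-based routing recipe rather than the exact configuration we use.</p>

```python
# Sketch: build the iproute2 commands for source-based policy routing
# inside the VM. All addresses, gateways and table numbers are hypothetical.


def pbr_commands(interfaces):
    """One routing table, default route and rule per interface."""
    commands = []
    for table, (dev, address, gateway) in enumerate(interfaces, start=101):
        # Default route for this interface, kept in its own table.
        commands.append(f"ip route add default via {gateway} dev {dev} table {table}")
        # Traffic sourced from this interface's address uses that table.
        commands.append(f"ip rule add from {address} table {table}")
    return commands


vm_interfaces = [
    ("eth0", "10.0.1.5", "10.0.1.1"),
    ("eth1", "10.0.2.5", "10.0.2.1"),
]
for command in pbr_commands(vm_interfaces):
    print(command)
```

<p>With one rule per source address, a reply always leaves through the interface that owns the address the request was sent to.</p>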
<p>Having done this, we now have multiple paths to the same VM, with different IP addresses.</p>
<p>If one of the network nodes (L3 agents) is not available, the GRE tunnel is down, or for any other reason you cannot access the VM via the first Network node, you can still reach the same VM via the second Network node, but with a different IP address.</p>
<p>Finally, we need to set up a load balancer in front of the Network nodes to manage the IP swapping and to hide any changes to the IP addresses from the public network.</p>
<p>Summarizing, we now have the networking nodes working in active/active mode and the traffic is load-balanced, which also allows us to double the throughput. The traffic flow now looks like this: Internet -> load balancer -> Network node 1 or 2 -> Open vSwitch -> full-mesh GRE tunnel -> Compute node -> Open vSwitch -> VM:</p>
<p><img src="/images/2013-06-10-openstack-networking-high-availability/diagram_3.jpg" alt="Diagram 3" /></p>
<p>The redundancy of the load balancer is out of scope for this post. You’ll have to choose a load balancer that best meets your requirements. It could be a hardware appliance or software-based.</p>
<p>If you don’t want to implement a load balancer in front of the network nodes, but still want to keep the VM highly available, you need to take care of the IP address change yourself. One option would be to use the IP SLA feature of Cisco routers for this. It can monitor the availability of the path to the VM and immediately switch to the next path with NAT if one path becomes unavailable:</p>
<p><img src="/images/2013-06-10-openstack-networking-high-availability/diagram_4.jpg" alt="Diagram 4" /></p>
<p>I’m looking forward to your comments and questions!</p>
Arsen GhevondyanHello World!2013-06-10T18:42:00+00:00http://telekomcloud.github.io/2013/06/10/hello-world<p>Did you know Deutsche Telekom AG has been using and developing OpenStack-based clouds for the last 1.5 years? We first publicly talked about our OpenStack efforts at <a href="http://www.telekom.com/media/media-kits/104982">CeBIT 2012</a> in March 2012 and at the OpenStack Folsom Design Summit in San Francisco. A lot has happened since then! While OpenStack Essex was not fully ready to meet our requirements, we have been in production with OpenStack Folsom for a while now. Currently our Cloud Development and Operations team is busy preparing the launch of our OpenStack Grizzly-based cloud.</p>
<p>With the creation of this team blog, we want to share our ideas and findings in and around OpenStack and discuss them with the community. We are looking forward to the conversation!</p>
Christoph Thiel