
OpenStack Swift offers Object Storage, a fundamental infrastructure service in any cloud offering. Our IaaS team recently worked hard to bring Swift to all our cloud locations. This means customers can now, for example, easily create cross-site backups of their database services in Cloud Foundry using Swift Object Storage.

In this post, we share the story of making our Swift service production-ready and how we tracked down a gnarly issue in our HAProxy setup.

HAProxy for Swift using radosgw

Our Swift implementation is based on Ceph radosgw, which we operate in a dedicated cluster. Radosgw only provides the object storage API on top of Ceph, which ultimately stores and replicates the objects to ensure durability. In front of this radosgw cluster sits a triplet of HAProxy load balancers to ensure high availability. Besides a few other things like terminating SSL, this HAProxy cluster needs to handle CORS requests, since radosgw has no such functionality built in.

CORS setup for radosgw

At meshcloud, we require our cloud services to properly implement CORS. For example, CORS allows our control panel to interface directly with the Swift API.

Our initial research dug up a few options for setting up CORS with HAProxy. However, most were focused either on handling CORS preflight requests only or on just adding CORS headers (such as Access-Control-Allow-Origin) to all responses. We actually needed HAProxy to do both for us, so our initial solution was to use HAProxy's Lua integration to handle CORS preflight requests and traditional rspadd statements to add CORS headers to all other responses.

The Swift API uses headers quite heavily, for example to configure Access Control Lists (ACLs) on containers or to communicate metadata about objects. Browsers require explicit whitelisting of all headers that should be exposed to a CORS requester. So the first thing we did was scan the Swift API documentation and extract all commonly used headers that we would need to expose for CORS. This is the initial configuration that we started with:
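The original config embed is not reproduced here; the sketch below condenses the approach described in this section. File paths, server addresses, the Lua service name, and the (heavily abbreviated) header list are illustrative, not our verbatim settings:

```
# Sketch: a Lua service answers CORS preflights, rspadd appends
# CORS headers to every other response (names/paths illustrative).
global
    lua-load /etc/haproxy/cors.lua

frontend objectstore
    bind :443 ssl crt /etc/haproxy/certs/swift.pem
    acl is_preflight method OPTIONS
    http-request use-service lua.cors-response if is_preflight
    default_backend radosgw

backend radosgw
    rspadd Access-Control-Allow-Origin:\ *
    rspadd Access-Control-Expose-Headers:\ Etag,\ X-Timestamp,\ X-Trans-Id
    server ceph00 10.0.0.10:7480 check
```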

The config above makes use of a Lua script to answer CORS preflight requests.

Testing in Pre-Production

At meshcloud, we are very serious about thoroughly testing new cloud services internally before offering them as a public beta and finally to customers with a production SLA. So to test our OpenStack Swift implementation, we configured one of our dev installations of Cloud Foundry to use Swift as its blobstore. We deploy hundreds of apps per day to this dev installation, so this put some realistic load on our Swift cluster.

Within hours of switching over to Swift, we started noticing issues. Cloud Foundry apparently had trouble accessing Swift.

Error processing app files: Error uploading application.
Failed to perform blobstore operation after three retries.
error running command: exit status 1

Our monitoring indicated that Cloud Foundry received an HTTP 502 Bad Gateway response from Swift. An HTTP 502 error indicates that a proxy (in this case HAProxy) had an issue processing a request with its upstream server. So the first thing we checked were the radosgw logs. However, analyzing the radosgw logs at debug level gave us no clue as to why radosgw would fail a request.

So the next thing we checked were the HAProxy logs. These contained a few log lines like this when the error occurred: [05/Sep/2017:08:53:31.723] objectstore~ radosgw/ceph00 35/0/1/-1/1333 502 15573 - - PH-- 1/1/1/1/0 0/0 {} "GET /swift/v1/cf-dev-darz-cc-resources/app_bits_cache%2F43%2Fb8%2F43b85355ee7d6262ee6c0e01b8e2f1f8c46a5e7d HTTP/1.1"

The interesting bit here is the termination state "PH--". According to the HAProxy documentation for termination states, this indicates that HAProxy had an issue processing request headers. Further analyzing the logs, we also noticed that only some requests were failing, usually fewer than one in 1,000. Testing various workloads, we also identified that the issue appeared to occur more frequently when the system was under high load. To isolate the issue, we started hammering our Swift cluster with curl, repeating the same request thousands of times over. Still, only some requests failed with HTTP status 502, while all others passed with HTTP 200 OK.

tcpdump to the rescue?

The next step in our debugging adventure was to capture a tcpdump at HAProxy for ingoing and outgoing traffic. We also disabled SSL to make it easier to analyze.
Some digging in the captured dump with the excellent Wireshark led us to isolate the TCP streams of two HTTP requests between HAProxy and radosgw. One belonged to a session for which HAProxy returned HTTP status 502 to the client, whereas HAProxy processed the other request successfully with HTTP status 200.
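The capture step can be sketched roughly as follows. The port numbers are assumptions (80 for the frontend with SSL disabled, 7480 for radosgw's default listener), not necessarily what we used:

```shell
# Capture both legs of a proxied request on the HAProxy host:
# client -> HAProxy (port 80) and HAProxy -> radosgw (port 7480).
tcpdump -i any -s 0 -w swift-debug.pcap 'tcp port 80 or tcp port 7480'
```

Opening the resulting pcap in Wireshark and using "Follow TCP Stream" on a failed and a successful request makes them easy to compare side by side.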

Since HAProxy indicated that it had a problem processing the HTTP headers it received from radosgw, we looked at the payload data of both TCP streams and did a binary comparison.

From this, we could see that the headers and the first payload packet were identical (except for a request id header, which was expected). Nonetheless, HAProxy decided to abort one stream after the first payload segment by sending a TCP RST, ACK to radosgw. This made it look like HAProxy was at fault, so we decided to dig deeper and try to better understand what causes the "PH" termination state.

The PH termination state

A first pass of re-reading HAProxy's documentation did not yield any additional hints. So we thought: when the documentation fails us, let's read the source. Tracing our way backwards from the logging code, we found the constants corresponding to the log line we were seeing.

#define SF_ERR_SRVCL    0x00005000   /* server closed (connect/read/write error) */
#define SF_FINST_H  0x00030000  /* stream ended during server headers */

Unfortunately, these constants are set in far too many places in the code, and after a few hours we concluded that there are too many possible exit paths leading to a PH termination state to continue down this route.

Divide and Conquer

The next phase of debugging started with eliminating custom settings from our haproxy.cfg to see whether one of them was the culprit. To reliably test HAProxy's stability, we fired 100k requests at it in parallel at around 40 req/s, using curl and a little bash and tmux magic. After six iterations of carefully commenting out sections of the config, we nailed the issue down to the offending rspadd lines responsible for adding the CORS headers.
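A minimal load generator along these lines can be sketched as below. The URL, request count, and parallelism are illustrative, and this substitutes xargs for our actual tmux setup:

```shell
# Fire the same GET many times in parallel and tally the status codes.
# Any intermittent 502 shows up immediately in the summary.
URL="https://swift.example.com/swift/v1/test-container/test-object"
seq 1 100000 \
  | xargs -P 8 -n 1 -I{} curl -k -s -o /dev/null -w '%{http_code}\n' "$URL" \
  | sort | uniq -c
```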

Quite surprised, we further looked at the documentation and found this gem:

Using "rspadd"/"rspdel"/"rsprep" to manipulate request headers is discouraged
in newer versions (>= 1.5).

Was there a bug hidden in rspadd, and is that why it's discouraged? We didn't know, but sure enough, replacing the rspadd lines with http-response set-header appeared to fix the intermittent HTTP 502 responses. That is, until we started a final test hitting HAProxy with > 100 req/s and saw the issue appear again.
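Since rspadd operates on response headers, its modern equivalent is the http-response directive. The replacement for one header looks roughly like this (header value illustrative):

```
# deprecated form:
#   rspadd Access-Control-Allow-Origin:\ *
# modern equivalent:
http-response set-header Access-Control-Allow-Origin *
```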

HAProxy Buffer Overflow Handling

Since we now knew that our CORS header processing was at fault, we dug for more information in the code, which finally led us to the tune.maxrewrite config setting.

Sets the reserved buffer space to this size in bytes. The reserved space is
used for header rewriting or appending. The first reads on sockets will never
fill more than bufsize-maxrewrite.

Finally, we had a hypothesis that could explain the mysteriously failing requests: under increased system load, the buffer used for response headers risks overflowing due to our header rewriting.

HAProxy detects this situation and does the only right thing it can: "gracefully" abort the connection to the backend server. It does not write memory past the end of the buffer, which would make for a classic buffer overflow bug. Disaster averted.

Since Swift's API uses a large number of headers, our CORS headers are accordingly very large (> 1 KB), which makes this issue more likely to occur. It also seems that http-response set-header is quite a bit faster than rspadd, which is why the issue occurred less often after switching to the former. In effect, this is a classic producer/consumer issue: under load, the consumer (HAProxy) cannot complete header rewriting before the buffer reserved for it fills up.

To fully eliminate the issue, we set tune.maxrewrite to 4096 bytes (4 KB). After running our Swift setup this way throughout the private and public beta, we can now finally say that our HAProxy setup for OpenStack Swift is rock solid. For your reference, here's our final config:
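The full config embed is not reproduced here; the sketch below condenses its key elements as described in this post. Paths, addresses, the Lua service name, and the exposed header list are illustrative:

```
# Sketch of the final setup: tune.maxrewrite reserves buffer space for
# header rewriting, and http-response replaces the deprecated rspadd.
global
    tune.maxrewrite 4096          # reserve 4 KB for header rewriting
    lua-load /etc/haproxy/cors.lua

frontend objectstore
    bind :443 ssl crt /etc/haproxy/certs/swift.pem
    acl is_preflight method OPTIONS
    http-request use-service lua.cors-response if is_preflight
    default_backend radosgw

backend radosgw
    http-response set-header Access-Control-Allow-Origin *
    http-response set-header Access-Control-Expose-Headers "Etag, X-Timestamp, X-Trans-Id"
    server ceph00 10.0.0.10:7480 check
    server ceph01 10.0.0.11:7480 check
    server ceph02 10.0.0.12:7480 check
```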

You like going deep and fixing stuff? We're always looking for great engineers! Check out our Job Openings.
