mauro-stettler

Monday, February 18, 2013

Splitting one MongoDB ReplicaSet into two

The Problem

We are running a cluster of 4 MongoDB servers. They are all joined into one replicaset called "rs1", which holds a lot of user data. All of the data is organized in a UserID-oriented structure.
The load on these servers is getting higher and higher; most of it is caused by the write IO generated by thousands of updates per second. Additionally, the disks are filling up. The project manager is complaining almost daily that some page loads take too long because MongoDB responds slowly to the queries from PHP.

The Solution

Since most of the load on these servers is caused by write queries, not by read queries, adding more servers to the cluster will not really solve the problem, because every node in a replicaset replicates every update. So we decided that the only sensible thing to do is to introduce sharding.
Sharding our data would be relatively easy, because all of its queries and updates are by UserID anyway. So basically we could distribute all the data by UserID % 2 into two replicasets, and then decide inside the client code (PHP), based on the UserID, to which replicaset each query has to go.
Since, in our case, it's so simple to implement sharding on the client side (PHP), we decided not to use MongoDB's own sharding feature. It would only be an additional point of failure and add complexity, without giving us an advantage.
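To illustrate the idea, here is a minimal sketch of what such client-side routing could look like with the legacy PHP MongoClient driver. The host names, database, collection and helper function are placeholders for illustration, not our actual code.
<?php
// Client-side sharding sketch: route every query by UserID % 2.
function getShardForUser($userId) {
    static $shards = null;
    if ($shards === null) {
        $shards = array(
            0 => new MongoClient('mongodb://node1:27017,node2:27017', array('replicaSet' => 'rs1')),
            1 => new MongoClient('mongodb://node3:27017,node4:27017', array('replicaSet' => 'rs2')),
        );
    }
    return $shards[$userId % 2];   // even UserIDs go to rs1, odd ones to rs2
}

// Every read and write for a user goes to the replicaset his UserID maps to.
$userId = 12345;
$users  = getShardForUser($userId)->selectDB('mydb1')->selectCollection('users');
$doc    = $users->findOne(array('user_id' => $userId));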


By implementing this configuration, the number of updates that goes to each data node drops to 50% of the previous configuration, while the number of reads per node does not change. The amount of disk space used is also reduced by almost half (except for the oplog).
The arbiter nodes are necessary because every MongoDB replicaset needs at least 3 active nodes to perform elections, but they don't hold any data and can basically run on any tiny machine without consuming many resources.

Coming up with a Migration procedure

So we thought we had the perfect solution for our problem, but it seems that MongoDB replicasets are not really made to be split into multiple replicasets. We also weren't successful in googling for other people who had done something similar.
Of course it would be possible to mongodump all data, break the replicaset, create two new ones, and then mongorestore half of the data into each replicaset. But these tools take many hours to work through big amounts of data (>200G in our case), and during the migration the whole cluster would need to go offline. We can't afford many hours of downtime on this cluster, so we were looking for the migration procedure that involves the least downtime possible, no matter how nice or ugly the procedure itself is.
Coming up with this solution took quite some research time, lots of trial and error in the lab, and the resulting procedure is not exactly simple.
A short explanation of the migration would be as follows. We remove two nodes from the replicaset, change their configs to turn them into a new one, start them as a new replicaset, add an arbiter to each of the replicasets to fulfill the requirement of having 3 nodes, change the client code to implement sharding, and then remove the unused data from each replicaset.
I'm sharing the exact process of how we did it here in the hope of saving somebody else the research. In total it includes around 12 steps, each of which I'm going to describe in detail.
This whole procedure should be done during a time when the cluster is less busy than at the maximum peak. Although there is -no downtime- (yay!), the resources which are available to handle queries are decreased by 50% during part of the migration.

Preparing a Test Environment

To demonstrate the procedure I'm first creating a test environment of 4 data nodes and joining them into a replicaset called "rs1".
mst@mst-gentoo-nb ~/ $ for i in 1 2 3 4; do mongod --dbpath ~/mongodb/instance${i} --nojournal --port 270${i}7 --logpath ~/mongodb/instance${i}/mongo.log --logappend --rest --replSet rs1 --oplogSize 128 --maxConns 20 --fork; done
forked process: 14736
all output going to: /home/mst/mongodb/instance1/mongo.log
child process started successfully, parent exiting
forked process: 14780
all output going to: /home/mst/mongodb/instance2/mongo.log
child process started successfully, parent exiting
forked process: 14824
all output going to: /home/mst/mongodb/instance3/mongo.log
child process started successfully, parent exiting
forked process: 14868
all output going to: /home/mst/mongodb/instance4/mongo.log
child process started successfully, parent exiting
Now I join all of them into the replicaset "rs1". To do that I connect to one of them, create the replicaset config, and do an rs.initiate(conf).
mst@mst-gentoo-nb ~/ $ mongo --port 27017
MongoDB shell version: 2.2.3
connecting to: 127.0.0.1:27017/test
> conf={
...         "_id" : "rs1",
...         "version" : 1,
...         "members" : [
...                 {
...                         "_id" : 0,
...                         "host" : "localhost:27017"
...                 },
...                 {
...                         "_id" : 1,
...                         "host" : "localhost:27027"
...                 },
...                 {
...                         "_id" : 2,
...                         "host" : "localhost:27037"
...                 },
...                 {
...                         "_id" : 3,
...                         "host" : "localhost:27047"
...                 }
...         ]
... }
> rs.initiate(conf)
{
        "info" : "Config now saved locally.  Should come online in about a minute.",
        "ok" : 1
}
To be sure that during the migration procedure no data gets lost, I will add some test data to the replicaset.
 rs1:PRIMARY> use mydb1
 switched to db mydb1
 rs1:PRIMARY> db.createCollection("mycol1")
 { "ok" : 1 }
 rs1:PRIMARY> db.mycol1.insert({"_id":0, "key1":"value1"})
 rs1:PRIMARY> db.mycol1.insert({"_id":1, "key2":"value2"})
 rs1:PRIMARY> db.mycol1.insert({"_id":2, "key3":"value3"})
 rs1:PRIMARY> db.mycol1.insert({"_id":3, "key4":"value4"})

Each migration step in detail

Prepare 2 new arbiter nodes

We will need two arbiter nodes because each replicaset needs at least 3 nodes to hold elections. Currently we have only 4 active nodes to split into 2 replicasets, so we will add 1 arbiter to each of the 2 replicasets to reach a total of 3 nodes.
In the test environment I'll simply create a new empty directory for each of them to use as their dbpath.
mst@mst-gentoo-nb ~/mongodb $ mkdir arb1 arb2

Stop data node 3 and 4

It's best not to do this during the busiest peak times, because the number of read queries that can still be handled is now drastically reduced.
mst@mst-gentoo-nb ~/mongodb $ cat instance3/mongod.lock instance4/mongod.lock 
14824
14868
mst@mst-gentoo-nb ~/mongodb $ kill 14824 14868

Add 1 new arbiter node to the currently running cluster

One of the 2 new arbiter nodes can now be started and added to the currently running cluster.
mst@mst-gentoo-nb ~/ $ mongod --dbpath ~/mongodb/arb1 --nojournal --port 27057 --logpath ~/mongodb/arb1/mongo.log --logappend --rest --replSet rs1 --oplogSize 128 --maxConns 20 --fork                                              
forked process: 15988
all output going to: /home/mst/mongodb/arb1/mongo.log
child process started successfully, parent exiting
Then I connect to the current primary and add the arbiter.
rs1:PRIMARY> rs.addArb("localhost:27057")
{ "ok" : 1 }

Deploy a version of the client code

To make sure that during the next few steps nodes 3 and 4 are not being used by the client, in our case PHP, we need to deploy a new version of the code which only uses nodes 1 and 2. To do that we simply change its configuration to use only the IPs of these 2 nodes, instead of all 4, as replicaset "rs1".
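In our test environment that configuration change might look something like the following sketch, again using the legacy PHP MongoClient driver; the database and collection names are just examples, not our actual code.
<?php
// Intermediate client config: connect to "rs1" using only nodes 1 and 2
// as seed hosts, so nodes 3 and 4 are never contacted anymore.
$rs1 = new MongoClient(
    'mongodb://localhost:27017,localhost:27027',
    array('replicaSet' => 'rs1')
);
$users = $rs1->selectDB('mydb1')->selectCollection('mycol1');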

Remove node 3 and 4 from the replicaset configuration

Data nodes 3 and 4 are currently still in the config of replicaset "rs1". They will simply not be used because the replicaset detects that they are down. In the future, these 2 nodes will become the data nodes of the new replicaset "rs2", so we can remove them from the config of "rs1" on the primary node.
rs1:PRIMARY> rs.conf()
{
        "_id" : "rs1",
        "version" : 2,
        "members" : [
                {
                        "_id" : 0,
                        "host" : "localhost:27017"
                },
                {
                        "_id" : 1,
                        "host" : "localhost:27027"
                },
                {
                        "_id" : 2,
                        "host" : "localhost:27037"
                },
                {
                        "_id" : 3,
                        "host" : "localhost:27047"
                },
                {
                        "_id" : 4,
                        "host" : "localhost:27057",
                        "arbiterOnly" : true
                }
        ]
}
rs1:PRIMARY> rs.remove("localhost:27037")
rs1:PRIMARY> rs.remove("localhost:27047")

Start node 3 and 4 without the --replSet parameter

Next we are going to manually edit the definition of the replicaset inside nodes 3 and 4. To do that we want to start them without any replicaset being active, so we just remove the --replSet parameter.
for i in 3 4; do mongod --dbpath ~/mongodb/instance${i} --nojournal --port 270${i}7 --logpath ~/mongodb/instance${i}/mongo.log --logappend --rest --oplogSize 128 --maxConns 20 --fork; done
forked process: 24482
all output going to: /home/mst/mongodb/instance3/mongo.log
child process started successfully, parent exiting
forked process: 24492
all output going to: /home/mst/mongodb/instance4/mongo.log
child process started successfully, parent exiting
OK, nodes 3 and 4 are started. Now we need to store a definition of the new replicaset "rs2" in their local.system.replset. Let's first look at what's there already.
> use local
switched to db local
> db.system.replset.find()
{ "_id" : "rs1", "version" : 2, "members" : [   {       "_id" : 0,      "host" : "localhost:27017" },   {       "_id" : 1,       "host" : "localhost:27027" },   {       "_id" : 2,      "host" : "localhost:27037" },   {       "_id" : 3,       "host" : "localhost:27047" },   {       "_id" : 4,      "host" : "localhost:27057",     "arbiterOnly" : true } ] }
Obviously that's still the old config, including nodes 3 and 4. That's because at the moment we removed them they were down, so they couldn't replicate that change yet.
Build the config for "rs2" and store it in the variable "conf". Note that port 27067 is the port which the second new arbiter will use once it is started.
> conf = { "_id" : "rs2", "version" : 2, "members" : [
... { "_id" : 0, "host" : "localhost:27037" },
... { "_id" : 1, "host" : "localhost:27047" },
... { "_id" : 2, "host" : "localhost:27067", "arbiterOnly" : true }
... ]}
{
        "_id" : "rs2",
        "version" : 2,
        "members" : [
                {
                        "_id" : 0,
                        "host" : "localhost:27037"
                },
                {
                        "_id" : 1,
                        "host" : "localhost:27047"
                },
                {
                        "_id" : 2,
                        "host" : "localhost:27067",
                        "arbiterOnly" : true
                }
        ]
}
Insert the new config into db.system.replset
> db.system.replset.insert(conf)
Finally remove the old config for the replicaset "rs1"
> db.system.replset.remove({"_id":"rs1"})
These steps have to be repeated exactly the same way on each of nodes 3 and 4.

Activate the replicaset on node 3 and 4

Stop nodes 3 and 4 again. Note that their PIDs have changed since we restarted them without --replSet.
mst@mst-gentoo-nb ~/mongodb $ cat instance3/mongod.lock instance4/mongod.lock 
24482
24492
mst@mst-gentoo-nb ~/mongodb $ kill 24482 24492
Restart them with the --replSet parameter. Make sure that --replSet specifies the name of the new replicaset, which in our case is "rs2". Now they will try to resync the replicaset, but realize that the new arbiter isn't reachable yet.
for i in 3 4; do mongod --dbpath ~/mongodb/instance${i} --nojournal --port 270${i}7 --logpath ~/mongodb/instance${i}/mongo.log --logappend --rest --replSet rs2 --oplogSize 128 --maxConns 20 --fork; done
forked process: 24947
all output going to: /home/mst/mongodb/instance3/mongo.log
child process started successfully, parent exiting
forked process: 25002
all output going to: /home/mst/mongodb/instance4/mongo.log
child process started successfully, parent exiting

Start the second arbiter node

Now nodes 3 and 4 are using their new replicaset "rs2", so they are already trying to connect to the second arbiter node. As soon as we start it on an empty dbpath, it should automatically join "rs2" and synchronize the config with nodes 3 and 4.
mst@mst-gentoo-nb ~/ $ mongod --dbpath ~/mongodb/arb2 --nojournal --port 27067 --logpath ~/mongodb/arb2/mongo.log --logappend --rest --replSet rs2 --oplogSize 128 --maxConns 20 --fork
forked process: 25840
all output going to: /home/mst/mongodb/arb2/mongo.log
child process started successfully, parent exiting
By running rs.status() on either node 3 or 4 we can see that they have automatically added the new arbiter after a few seconds:
rs2:PRIMARY> rs.status()
{
        "set" : "rs2",
        "date" : ISODate("2013-02-17T16:29:12Z"),
        "myState" : 1,
        "members" : [
                {
                        "_id" : 0,
                        "name" : "localhost:27037",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 92,
                        "optime" : Timestamp(1361117147000, 1),
                        "optimeDate" : ISODate("2013-02-17T16:05:47Z"),
                        "lastHeartbeat" : ISODate("2013-02-17T16:29:10Z"),
                        "pingMs" : 0,
                        "errmsg" : "syncing to: localhost:27047"
                },
                {
                        "_id" : 1,
                        "name" : "localhost:27047",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 92,
                        "optime" : Timestamp(1361117147000, 1),
                        "optimeDate" : ISODate("2013-02-17T16:05:47Z"),
                        "self" : true
                },
                {
                        "_id" : 2,
                        "name" : "localhost:27067",
                        "health" : 1,
                        "state" : 7,
                        "stateStr" : "ARBITER",
                        "uptime" : 66,
                        "lastHeartbeat" : ISODate("2013-02-17T16:29:10Z"),
                        "pingMs" : 0
                }
        ],
        "ok" : 1
}
Tadaa! We have created a functioning second replicaset, which still contains all of the data that "rs1" contained.

Deploy the final client code

Since we now have 2 running replicasets which both contain all the data, we can deploy the client code that implements sharding. This code simply computes UserID % 2 before every MongoDB query and, based on the result, decides whether it should use "rs1" or "rs2" (as sketched earlier in this post). As soon as the new code is active, only half of the data in each of the replicasets will be used.
To verify that all the data is still there, I connect to "rs1" and query it.
rs1:PRIMARY> use mydb1
switched to db mydb1
rs1:PRIMARY> db.mycol1.find()
{ "_id" : 0, "key1" : "value1" }
{ "_id" : 1, "key2" : "value2" }
{ "_id" : 2, "key3" : "value3" }
{ "_id" : 3, "key4" : "value4" }
Same on "rs2"
rs2:PRIMARY> use mydb1
switched to db mydb1
rs2:PRIMARY> db.mycol1.find()
{ "_id" : 0, "key1" : "value1" }
{ "_id" : 1, "key2" : "value2" }
{ "_id" : 2, "key3" : "value3" }
{ "_id" : 3, "key4" : "value4" }

Remove the unused half of data from both replicasets

Now we can safely remove the unused half of the data from each replicaset. The implementation of this is done on the client side. In our case it was really easy to implement, because the PHP already knows which data has to be on which replicaset, so we could simply go through all UserIDs and remove each one from the replicaset where it is not used.
In order not to cause any additional load, we did the removal very slowly, over a period of around 3 months. Each night, when most of the users are sleeping anyway and the DB wasn't so busy, around 50k records were removed.
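A minimal sketch of such a nightly cleanup job could look like this; the hosts match the test environment, while the "users" collection, the user_id field and the getUserIdBatch() helper are made up for illustration.
<?php
// Nightly cleanup sketch: remove a limited batch of users from the
// replicaset they no longer belong to, throttled to avoid extra load.
$shards = array(
    0 => new MongoClient('mongodb://localhost:27017,localhost:27027', array('replicaSet' => 'rs1')),
    1 => new MongoClient('mongodb://localhost:27037,localhost:27047', array('replicaSet' => 'rs2')),
);

foreach (getUserIdBatch(50000) as $userId) {      // assumed helper returning the next batch of UserIDs
    $wrongShard = $shards[($userId % 2) ^ 1];     // the replicaset this user does NOT live on
    $wrongShard->selectDB('mydb1')
               ->selectCollection('users')
               ->remove(array('user_id' => $userId));
    usleep(10000);                                // sleep 10ms between removes to keep the load low
}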
If at some time in the future we run into similar trouble again, with too much load caused by updates, we can even repeat the same procedure. We would have to first double the amount of hardware, grow the two replicasets into the new hardware, and then repeat this procedure on each of the replicasets.

Conclusion

The result of this whole migration was as expected. The load on each of the data nodes has decreased a lot, because the number of updates per node has been cut in half. The updates have also become faster, so the new performance numbers make the project manager happy.

Thursday, August 18, 2011

PECL-Memcache standard loadbalancer for Nginx

In a previous post I already introduced my Nginx loadbalancer, which is compatible with the PECL-Memcache consistent hashing. Unfortunately we discovered that the implementation of the consistent hashing on the PHP side is quite inefficient, since it rebuilds the hashring on every request, and in the case of our PHP framework even for each Memcache server on each request. So we had to decide between a Memcache proxy like Moxi to implement the consistent hashring, or just forgetting about the advantages of a consistent hash and saving the additional hop through a Moxi. The conclusion was to just use the standard PECL-Memcache loadbalancer, since this is probably the most resource-efficient way to implement a cache, and then live with having to regenerate some caches in case we add/remove Memcache instances.
So in order to make Nginx find cache values which PHP has stored in Memcache, I wrote another Nginx loadbalancer which simply copies the behaviour of the PECL-Memcache standard loadbalancer.
Example:
upstream profilememcache {
    hash_key $memcached_key;
    server 10.20.0.27:11211;
    server 10.20.0.26:11211;
}

-----

location /profile {
    set $memcached_key $uri;
    error_page      500 404 405 = /fallback_php;
    memcached_pass  profilememcache;
}
In the above example you can see that memcached_pass can be used just as always; in the upstream definition you simply need to add the "hash_key" parameter, which tells the loadbalancer which value to hash.
Unfortunately it's really important that the order of the Memcache servers in the upstream definition is the same as the order in which the Memcache servers are added to the Memcache object in the PHP code. If the positions of the servers are not the same, the whole thing won't work.
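For illustration, this is roughly what the matching PHP side could look like; the IPs mirror the upstream block above, but the key and value are made-up examples.
<?php
// The servers must be added in the same order as they appear in the
// Nginx upstream block, otherwise keys map to different servers.
$memcache = new Memcache();
$memcache->addServer('10.20.0.27', 11211);
$memcache->addServer('10.20.0.26', 11211);

// PHP writes the cache entry, Nginx later serves it via memcached_pass.
$memcache->set('/profile/someuser', '<html>...</html>');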

Wednesday, June 22, 2011

PHP session parser in production

Last year I posted about the Nginx ngx_http_php_session module that I wrote, which is able to parse PHP sessions and extract certain values from them. Unfortunately this took quite a long time, but now I can finally announce that I am actually using it on a production system, and so far it seems stable.
In the current case where I'm using the module in production, it optimizes the way Nginx uses the Memcaches. For quite some time I have had a Memcache in production which is fed plain HTML by PHP and then read out and delivered by Nginx. This works well as long as the content is more or less static and independent of the user who accesses it, and exactly this second requirement is now not necessary anymore. Using the "ngx_http_php_session" module, Nginx can check whether a requesting user has a valid session, and if he does, it can check things like payment status, gender, his preferences, or whatever your site content depends on. Once Nginx knows this, the correct cache can be retrieved accordingly. As a result there are many more pages that are cachable in a very efficient way.
Here I show an example of how this module is used to decide whether Nginx should use the profile cache key for paying users or the one for non-paying users.
location / {

    if ($cookie_session_id = "") {
        rewrite . /login redirect;
    }

    eval $session {
        set $memc_key $cookie_session_id;
        memc_pass session_memcache;
    }

    php_session_parse $authenticated $session "symfony/user/sfUser/authenticated";

    if ($authenticated != "b:1") {
        rewrite . /login redirect; #the user is logged out
    }

    php_session_parse $is_paying $session "symfony/user/sfUser/attributes|s:10:\"subscriber\";s:9:\"is_paying\"";
    php_session_parse $user $session "symfony/user/sfUser/attributes|s:10:\"subscriber\";s:11:\"getNickname\"";

    php_session_strip_formatting $stripped_user $user;

    if ($is_paying = "i:1") {
        rewrite . profile_cache$uri:paying last; # the user is paying
    }

    rewrite ^/(.+)$ $1;
    set $tmp_compare "$stripped_user:$uri";

    if ($tmp_compare ~* "^(.*):\1$") {
        rewrite . profile_cache$uri:paying last; # the user is viewing his own profile
    }

    rewrite . profile_cache$uri:non-paying last; # the user is non-paying
}

location profile_cache {
    default_type    text/html;
    add_header      "Content" "text/html; charset=utf8";
    lower           $uri_low $uri;
    set $memc_key $uri_low;
    memcached_next_upstream error timeout not_found;
    error_page      500 404 405 = /fallback_to_php;
    memc_pass  profilememcache;
}
In the previous example you can also see the directive "php_session_strip_formatting". This is used because a string which is extracted from a PHP session is formatted, like for example s:7:"astring". By using the "php_session_strip_formatting" directive the actual string "astring" gets extracted and the formatting is stripped away.
Another directive from this example is "lower". This directive comes from another module I wrote, named "lower_upper_case". It simply uppercases or lowercases a string in the Nginx config and stores the result in another variable. In the above example I'm using it to make the Memcache keys case insensitive. To achieve this I just lowercase all keys in Nginx, and also on the PHP side when it fills the Memcache.
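On the PHP side that could look roughly like this sketch; the key and HTML value are made-up examples, not our actual code.
<?php
// Lowercase the cache key before writing, so it matches what the
// "lower" directive produces on the Nginx side.
$memcache = new Memcache();
$memcache->addServer('10.20.0.27', 11211);

$key  = strtolower('/Profile/SomeUser');
$html = '<html>...</html>';   // the page rendered by PHP
$memcache->set($key, $html);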
Both modules can be retrieved from github where I'm hosting the repositories:
https://github.com/replay/ngx_http_lower_upper_case
https://github.com/replay/ngx_http_php_session

Tuesday, July 27, 2010

Generate secured download links on the fly in Nginx

I created a module which can be used to generate secure download links directly in the Nginx. Those links can then be verified by the NginxHttpSecureDownload module.
I call the new module NginxHttpGenerateSecureDownloadLinks, and it is supposed to be used _only_ through the Nginx SSI module. This works as in the following example:
location / {
    ssi on;
    root   html;
}

location /gen_sec_link {
    internal;
    rewrite /gen_sec_link(.*)$ $1 break;
    generate_secure_download_link_expiration_time 3600;
    generate_secure_download_link_secret $remote_addr;
    generate_secure_download_link_url $uri;
    generate_secure_download_link on;
}
If you now access, for example, /test.html and the output should contain secured download links, then test.html needs to contain SSI tags like in this example:
thisisatest
<a href="<!--# include virtual="/gen_sec_link/this_is_my_link" -->">this_is_my_link</a>
some text abc
<a href="http://somewhateverhost.com<!--# include virtual="/gen_sec_link/this_is_another_link" -->">this_is_another_link</a>
more text
If you paid attention you might already expect that those SSI tags get replaced by secured links; the HTML output will look like this:
thisisatest
<a href="http://www.blogger.com/this_is_my_link/509325bc5fac6e4e42687fe096d67a9d/4C4EC7C3">this_is_my_link</a>
some text abc
<a href="http://somewhateverhost.com/this_is_another_link/badbcb4d20500cca464c609da41001b2/4C4EC7C3">this_is_another_link</a>
more text
I will give you the link to the module here. But please keep in mind that I uploaded this today and it's NOT well tested; I'm glad about feedback though.

Thursday, June 3, 2010

Nginx does not know about the DNS names of specified backend servers?

I already mentioned this problem in a previous post, but I would like to talk about it one more time because it kind of bothers me.
While writing an Nginx loadbalancer module I discovered that when you define a backend server using the server directive in an upstream block in the config file, even if you specify the upstream by a DNS name, Nginx doesn't seem to keep the actual DNS name available anywhere. It only keeps the resolved IPs after doing the name resolution once at startup.
The reason why this is a problem in my case is that I'm writing a loadbalancer which should always find exactly the same backend server as the PHP-Memcache loadbalancer does. Since the PHP-Memcache loadbalancer decides based on a hash of the actual strings that got passed to the addServer method, it is crucial for my Nginx loadbalancer to be able to do the same. This means that if somebody specifies an IP in the Nginx config, I will have to hash the string of the IP, and if somebody enters the hostname of the backend server, I will have to hash the hostname.
To make the whole thing work as I wanted, I had to do what I really hated to do: I created a DNS-supporting and a non-DNS branch of my loadbalancer module on GitHub. While the hairs on my neck stood on end at the mere thought of what I was about to do, I added a patch for the upstream module to the DNS-aware branch which does nothing more than add a string to the ngx_http_upstream_server_t struct:
typedef struct {
 +    ngx_str_t                        name;
     ngx_addr_t                      *addrs;
     ngx_uint_t                       naddrs;
     ngx_uint_t                       weight;
     ngx_uint_t                       max_fails;
     time_t                           fail_timeout;
 
     unsigned                         down:1;
     unsigned                         backup:1;
 } ngx_http_upstream_server_t;
Plus I added one line to the ngx_http_upstream_server function which stores the DNS name of this backend in the newly added ngx_str_t:
+    us->name = u.url;
    us->addrs = u.addrs;
    us->naddrs = u.naddrs;
    us->weight = weight;
    us->max_fails = max_fails;
    us->fail_timeout = fail_timeout;
After applying this patch to the Nginx source, only minor changes to the loadbalancer module were needed to make it use the DNS names instead of the resolved IPs. What I don't understand is why the author of the upstream module decided to spare those few bytes and not store the actual DNS names that are specified in the config file.
However, I hereby would like to ask the author of the upstream module, or whoever has the power to do this, to add this simple ngx_str_t to ngx_http_upstream_server_t.
Thanks

Tuesday, February 23, 2010

Variables from PHP sessions in Nginx config

Recently I stumbled over this really cool Nginx module, mod_eval:
http://github.com/vkholodkov/nginx-eval-module

It allows you to store responses from Nginx upstreams (backends) into variables which can be reused inside the Nginx configuration syntax. This offers a whole lot of new possibilities.
So it made me think what I could use that for...

One problem that we often face is that we would like to move more things from PHP into Nginx, since Nginx is simply waaay more efficient than our PHP framework. After considering a lot of possible enhancements this module could give us, I came to the conclusion that it would be really useful if Nginx knew details about the users which usually only PHP knows about.

Nginx itself can already extract cookie values and store them in $cookie_* variables, and there already is a Memcache module for Nginx which I can use in combination with mod_eval to retrieve Memcache values. Combined, I can use those modules to do the following:

  • Get the user's session id out of his session_id cookie using the Nginx internal header parsing
  • Retrieve the user's serialized PHP session from Memcache using the Nginx Memcache module
  • Store the serialized PHP session in an Nginx variable using mod_eval

Now the only piece missing is a parser for the serialized PHP sessions, so that's what I wrote.
Unfortunately the Nginx configuration syntax doesn't support multidimensional array structures, only simple variables, so I couldn't implement this in a way that really represents the whole session. I had to implement it as a kind of string scanner which takes a search path describing the value that should be extracted from the serialized multidimensional session array.

I guess that sounds quite complicated now... Well, I can't say it isn't, but an example should help:

In PHP I stored this structure in my session:
$_SESSION['symfony/user/sfUser/attributes'] => Array
  (
    [users_dynamic] => Array
      (
        [get_last_online_state] =>
        [update_counter_time] => 1266041164
      )
    [subscriber] => Array
      (
        [user_actual_culture] => de
        [lastURI] => http://dev.poppen.lab/frontend_dev.php/home
        [invisibility] => 0
        [getGender] => m
      )
  )

My goal is to extract the user's gender, so I specify the search path
symfony/user/sfUser/attributes|s:10:"subscriber";s:9:"getGender"
The return value will then use the PHP serialize syntax like
s:1:"m"
which means it is of type string, has length 1, and value "m".

And now the same thing in the Nginx:

location / {
  eval $session {
    # store the retrieved memcache value into the variable $session

    set $memcached_key $cookie_session_id; # extract the value of the cookie session_id
    # extract the value of the cookie "session_id" and use it as memcache key

    memcached_pass 1.2.3.4:11211;
    # get the serialized session from memcache
  }

  php_session_parse $result $session "symfony/user/sfUser/attributes|s:10:\"subscriber\";s:9:\"getGender\"";
  # extract the gender from the serialized session in $session and store the return value into $result

  if ($result = "s:1:\"w\"")
  {
    # for girls
  }

  if ($result = "s:1:\"m\"")
  {
    # for boys
  }

  # for logged out users and transsexuals
}
This whole thing seems to be working quite well for me and I can't find problems with it at the moment, but I have to admit that we don't have it running in any production environment yet, with emphasis on yet. Once it's running in some prod env I will post again about what we used it for, and whether it works or not, not necessarily in that order.

For the ones who dare to take the risk already, I'd be glad about comments, use cases, and murder threats from admins who got fired because I killed their production sites: