2010/07/27

Generate secured download links on the fly in Nginx

I created a module that generates secure download links directly in Nginx. Those links can then be verified by the NginxHttpSecureDownload module.
I call the new module NginxHttpGenerateSecureDownloadLinks, and it is supposed to be used _only_ through the Nginx SSI module. This works as in the following example:
location / {
    ssi on;
    root   html;
}

location /gen_sec_link {
    internal;
    rewrite /gen_sec_link(.*)$ $1 break;
    generate_secure_download_link_expiration_time 3600;
    generate_secure_download_link_secret $remote_addr;
    generate_secure_download_link_url $uri;
    generate_secure_download_link on;
}
If you now request, for example, /test.html and its output should contain secured download links, then test.html has to contain SSI tags like in this example:

If you paid attention you might already expect that those SSI tags get replaced by secured links. The HTML output will look like this:
thisisatest
<a href="http://www.blogger.com/this_is_my_link/509325bc5fac6e4e42687fe096d67a9d/4C4EC7C3">this_is_my_link</a>
some text abc
<a href="http://somewhateverhost.com/this_is_another_link/badbcb4d20500cca464c609da41001b2/4C4EC7C3">this_is_another_link</a>
more text
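The shape of such a link (path, MD5 digest, expiration time as a hex timestamp) can be sketched in Python. This is only an illustration under assumptions: the helper names make_secure_link and verify_secure_link are made up, and the way URL, secret and expiration time are joined before hashing is a guess, not taken from the module's source. Only the /path/digest/hex-timestamp layout comes from the output above.

```python
import hashlib
import hmac


def make_secure_link(url: str, secret: str, expires_at: int) -> str:
    """Build url/<md5 hex digest>/<expiry as uppercase hex>."""
    expiry_hex = format(expires_at, "X")
    # Assumption: the digest covers url, secret and expiry joined by "/".
    payload = f"{url}/{secret}/{expiry_hex}".encode("utf-8")
    digest = hashlib.md5(payload).hexdigest()
    return f"{url}/{digest}/{expiry_hex}"


def verify_secure_link(link: str, secret: str, now: int) -> bool:
    """Recompute the link from its parts and check that it has not expired."""
    try:
        url, _digest, expiry_hex = link.rsplit("/", 2)
        expires_at = int(expiry_hex, 16)
    except ValueError:
        return False
    expected = make_secure_link(url, secret, expires_at)
    same = hmac.compare_digest(link.encode("utf-8"), expected.encode("utf-8"))
    return same and now <= expires_at
```

With $remote_addr as the secret, as in the config above, a link built this way would only verify for the client IP it was generated for, and only until the expiration time has passed.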
I will give you the link to the module here. But please keep in mind that I uploaded this today and it is NOT well tested. I'd be glad about feedback, though.

2010/06/03

Nginx does not know about the DNS names of specified backend servers?

I already mentioned this problem in a previous post, but I would like to talk about it once more because it kind of bothers me.
While writing an Nginx load balancer module I discovered that when you define a backend server using the server directive in an upstream block of the config file, Nginx doesn't seem to keep the actual DNS name available anywhere, even if you specified the backend by its DNS name. It only keeps the resolved IPs after doing the name resolution once on startup.
This is a problem in my case because I am writing a load balancer which should always pick exactly the same backend server as the PHP-Memcache module's load balancer does. Since the PHP-Memcache load balancer decides based on a hash of the actual strings that were passed to the addServer method, it is crucial for my Nginx load balancer to be able to do the same. That means that if somebody specifies an IP in the Nginx config I have to hash the string of the IP, and if somebody enters the hostname of the backend server I have to hash the hostname.
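Why the literal string matters can be seen in a tiny Python sketch. It uses CRC32 purely for illustration, and the helper name server_hash as well as the hostname and address are made up; the point is only that a hash over the configured string cannot be reproduced from the resolved IP.

```python
import zlib


def server_hash(server: str) -> int:
    """Hash the server string exactly as it was written in the config."""
    return zlib.crc32(server.encode("ascii")) & 0xFFFFFFFF


# The same machine, once configured by name and once by address.
by_name = server_hash("cache1.example.com")
by_ip = server_hash("192.0.2.10")

# Both sides only agree if they hash the identical string.
print(by_name == by_ip)
```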
To make the whole thing work as I wanted I had to do something I really hated to do: I created a DNS-supporting and a non-DNS branch of my load balancer module on github. While my neck hairs stood on end at the mere thought of it, I added a patch for the upstream module to the DNS-aware branch. The patch does nothing more than add a string to the struct ngx_http_upstream_server_t:
typedef struct {
 +    ngx_str_t                        name;
     ngx_addr_t                      *addrs;
     ngx_uint_t                       naddrs;
     ngx_uint_t                       weight;
     ngx_uint_t                       max_fails;
     time_t                           fail_timeout;
 
     unsigned                         down:1;
     unsigned                         backup:1;
 } ngx_http_upstream_server_t;
In addition, I added one line to the ngx_http_upstream_server function, which stores the DNS name of the backend in the newly added ngx_str_t:
+    us->name = u.url;
    us->addrs = u.addrs;
    us->naddrs = u.naddrs;
    us->weight = weight;
    us->max_fails = max_fails;
    us->fail_timeout = fail_timeout;
After applying this patch to the Nginx source, only minor changes to the load balancer module were required to make it use the DNS names instead of the resolved IPs. What I don't understand is why the author of the upstream module decided to save those few bytes and not store the actual DNS names that are specified in the config file.
So I would hereby like to ask the author of the upstream module, or whoever has the power to do this, to add this simple ngx_str_t to ngx_http_upstream_server_t. Thanks

2010/02/23

Variables from PHP sessions in Nginx config

Recently I found this really cool Nginx module, mod_eval: http://github.com/vkholodkov/nginx-eval-module
It allows you to store responses from Nginx upstreams (backends) in variables which can be reused inside the Nginx configuration. This offers a whole lot of new possibilities, so it made me think about what I could use it for... One problem we often face is that we would like to move more logic from PHP into Nginx, since Nginx simply is waaay more efficient than our PHP framework. After considering a lot of possible enhancements this module could give us, I came to the conclusion that it would be really useful if Nginx knew details about the users which usually only PHP knows about.
Nginx itself can already extract cookie values and store them in $cookie_* variables, and there already is a Memcache module for Nginx which I can use in combination with mod_eval to retrieve Memcache values. Combined, I can use those modules to do the following:
  • Get the user's session id out of his session_id cookie using the Nginx internal header parsing
  • Retrieve the user's serialized PHP session out of Memcache using the Nginx Memcache module
  • Store the serialized PHP session in an Nginx variable using mod_eval
Now the only missing piece is a parser for the serialized PHP sessions, so that's what I wrote. Unfortunately the Nginx configuration syntax doesn't support multidimensional array structures, only simple variables, so I couldn't implement this in a way which really represents the whole session. I had to implement it as a kind of string scanner which takes a search path describing the value that has to be extracted from the serialized multidimensional session array. I guess that sounds quite complicated now... Well, I can't say it isn't, but an example should help:
In PHP I stored this structure in my session:
$_SESSION['symfony/user/sfUser/attributes'] => Array
  (
    [users_dynamic] => Array
      (
        [get_last_online_state] =>
        [update_counter_time] => 1266041164
      )
    [subscriber] => Array
      (
        [user_actual_culture] => de
        [lastURI] => http://dev.poppen.lab/frontend_dev.php/home
        [invisibility] => 0
        [getGender] => m
      )
  )
My goal is to extract the user's gender, so I specify the search path symfony/user/sfUser/attributes|s:10:"subscriber";s:9:"getGender". The return value then uses the PHP serialize syntax, like s:1:"m", which means it is of type string, has length 1, and value m.
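To make the notation more tangible, here is a small Python sketch that decodes such serialized strings and takes a search path apart. It is not the module's parser: the function names are made up, only the string type is handled, and reading the part before the "|" as the session key followed by ";"-separated nested keys is my interpretation of the example above.

```python
import re

# s:<byte length>:"<value>" as produced by PHP's serialize() for strings.
_PHP_STRING = re.compile(r's:(\d+):"(.*)"', re.S)


def decode_php_string(serialized: str) -> str:
    """Decode one serialized PHP string and check its declared length."""
    match = _PHP_STRING.fullmatch(serialized.rstrip(";"))
    if match is None:
        raise ValueError(f"not a serialized PHP string: {serialized!r}")
    length, value = int(match.group(1)), match.group(2)
    # PHP counts bytes, not characters.
    if len(value.encode("utf-8")) != length:
        raise ValueError(f"declared length {length} does not match {value!r}")
    return value


def split_search_path(path: str):
    """Split 'session_key|s:..;s:..' into the key and the nested key names."""
    session_key, _, nested = path.partition("|")
    # Naive split: assumes no key contains a ";" itself.
    keys = [decode_php_string(part) for part in nested.split(";") if part]
    return session_key, keys
```

Applied to the search path above, split_search_path returns the session key symfony/user/sfUser/attributes and the nested keys subscriber and getGender, and decode_php_string turns the result s:1:"m" into a plain m.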
And now the same thing in the Nginx:
location / {
  eval $session {
    # store the retrieved memcache value into the variable $session

    set $memcached_key $cookie_session_id;
    # extract the value of the cookie "session_id" and use it as memcache key

    memcached_pass 1.2.3.4:11211;
    # get the serialized session from memcache
  }

  php_session_parse $result $session "symfony/user/sfUser/attributes|s:10:\"subscriber\";s:9:\"getGender\"";
  # extract the gender from the serialized session in $session and store the return value into $result

  if ($result = "s:1:\"w\"")
  {
    # for girls
  }

  if ($result = "s:1:\"m\"")
  {
    # for boys
  }

  # for logged out users
}
This whole thing seems to be working quite well for me and I can't find any problems with it at the moment, but I have to admit that we don't have it running in any production environment yet, with emphasis on yet. Once it's running in some prod env I will post again about what we used it for and whether it works or not, not necessarily in that order.
For those who dare to take the risk already, I'd be glad about comments, use cases and murder threats from admins who got fired because I killed their production sites: http://github.com/replay/ngx_http_php_session

2010/02/21

PECL-Memcache compatible consistent hashing loadbalancer for Nginx

During the past year we often saved a lot of resources by adding caches to all kinds of layers of our infrastructure. Some of the most important ones are Memcache instances which are filled by PHP and read by Nginx.

Now, as long as you have only one Memcache this works pretty well. PHP stores values and Nginx reads values, very simple... But what if you need to grow your Memcache cluster? You will run into the problem that Nginx doesn't know where in the cluster PHP stored the values.
Our first solution to this problem was the Moxi Memcache proxy, which is a really great piece of software that I can recommend:
http://code.google.com/p/moxi/

The only solution which might be even better than Moxi would be if Nginx itself could predict where PHP would store a certain key/value pair. That kind of prediction is easy to implement if you use consistent hashing on the Memcache keys, a method that is already implemented in the PECL-Memcache PHP module, but not activated by default.
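For readers who have not met consistent hashing yet, here is a minimal Python sketch of the idea. It is not the PECL-Memcache algorithm: the class name, the use of CRC32, the number of points per server and the way point labels are built are all assumptions for illustration, so it will not pick the same servers as PHP does.

```python
import bisect
import zlib


def _hash(text: str) -> int:
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


class HashRing:
    """Map keys to servers via points placed on a 32-bit hash ring."""

    def __init__(self, servers, points_per_server=160):
        if not servers:
            raise ValueError("at least one server is required")
        ring = []
        for server in servers:
            # Each server string is hashed verbatim into several ring points.
            for i in range(points_per_server):
                ring.append((_hash(f"{server}-{i}"), server))
        ring.sort()
        self._points = [point for point, _ in ring]
        self._servers = [server for _, server in ring]

    def lookup(self, key: str) -> str:
        """Return the server owning the first point at or after the key's hash."""
        index = bisect.bisect_left(self._points, _hash(key))
        # Wrap around to the first point when the hash is past the last one.
        return self._servers[index % len(self._servers)]
```

Two rings built from the same server strings agree on every key, which is exactly what lets one side predict where the other side stored a value. When a server is added, only the keys that now fall on the new server's points move; all other keys stay where they were.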

Understanding consistent hashing

So the only thing I had to do was read through the code of the PHP Memcache module to check the details of its implementation and then implement the same thing as an upstream balancer for Nginx.
Unfortunately there is a minor flaw in the implementation of the Nginx upstream module which makes it impossible for an upstream load balancer to detect whether the parameter to the "server" directive in the Nginx configuration file is a DNS name or an IP. In the case of consistent hashing this is important because the PECL-Memcache implementation uses exactly the given string to build the hash ring, which means that the results differ depending on whether you use DNS names or IPs in your PHP code.
I personally really hate having to patch the Nginx upstream module, so I created two branches on github:
  1. The "master" branch only works in environments where PHP uses IPs to connect to the Memcache servers. This branch doesn't require you to patch Nginx.
  2. The "dns" branch requires you to patch Nginx, but it also works if you use DNS names in your PHP code.

As far as I know this module is in production on multiple websites, and I've received a lot of feedback that it's working well.

And that's where you get it from:

http://wiki.nginx.org/NginxHttpUpstreamConsistentHash
http://github.com/replay/ngx_http_consistent_hash

I have to warn you, though. The PECL Memcache implementation of consistent hashing can add considerable overhead on the PHP servers. This overhead comes from the Memcache->addServer method, which rebuilds its whole internal hash ring structure every time you add a server to the Memcache object. If you have to execute the addServer method very often you might want to consider somehow caching the Memcache objects after adding the servers (don't ask me how).