Ruby on Rails, Io, Lisp, JavaScript, Dynamic Languages, Prototype-based programming and more...

Technoblog reader special: $10 off web hosting by FatCow!

Monday, June 25, 2007

Advanced Concepts in Ruby on Rails Hosting Part II

Last week, we compared serving websites to running a translation company. A request comes in as a document, is handed to an application server acting as a translator, and the result is returned to the client. We left off with a scenario of three translation offices with 10 translators in each office. One of the simplest ways to distribute work among these translators is to hand out documents one at a time in a round-robin way. However, because some documents are inherently longer than others and some translators are faster than others, piles build up for some of the translators, leading to unpredictable lag and customer complaints.

Rather than the brute force method of adding more offices and translators, can you think of a better way to distribute resources?

The bottleneck in the scenario sketched above is management. Our translation company still has only one manager, thus limiting his ability to distribute resources more effectively. If we hire office managers and let the manager hand documents to the office managers, this lets us think of more interesting distribution techniques. For example, instead of overwhelming our translators with a growing pile of documents, and thus a growing pile of responsibilities, the office manager can wait until each translator has finished their job before handing them a new document.

Let us think about the consequences of this change. First some assumptions. Assume John is faster at translating than Susie because he has less on his mind (in computer lingo this would mean that Susie is experiencing a memory leak, possibly due to a bad programming library). Further assume a pile of documents comes in with this order: a 10 page document, a 2 pager, a 20 pager, a 1 pager, a 3 pager. In our original setup we could easily find ourselves in the situation where Susie gets a pile with the 10 pager, the 20 pager, and finally the 3 pager; whereas John only got the 2 pager and 1 pager. You can see that Susie's 3 pager should have been easy and fast, but was stuck behind a few bigger documents and is in the hands of the slower translator.

With the new distribution algorithm, the worst case scenario would be that Susie is chugging away at the 20 pager, but since John quickly made short work of the other documents, he can turn over the 3 pager before Susie even finishes the 20 pager. This is much more streamlined because the queue is processed as quickly as the translators free up as a group, rather than relying on each individual translator to handle the concurrency.
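To put rough numbers on the two approaches, here is a minimal sketch in Python. The speeds (Susie needs 12 minutes per page, John 6) are my own assumption for illustration, as is the simplification that all five documents arrive at once. The function names are hypothetical too.

```python
import heapq

# Assumed speeds (not from the scenario): minutes each translator needs per page.
MINUTES_PER_PAGE = {"Susie": 12, "John": 6}
DOCUMENTS = [10, 2, 20, 1, 3]  # page counts, in arrival order


def round_robin(documents, speeds):
    """Deal documents out in turn; each translator works through a private pile."""
    names = list(speeds)
    busy_until = {name: 0 for name in names}
    finished = []
    for i, pages in enumerate(documents):
        name = names[i % len(names)]
        busy_until[name] += pages * speeds[name]
        finished.append((pages, name, busy_until[name]))
    return finished


def hand_out_when_free(documents, speeds):
    """Keep one shared queue; the next document goes to whoever frees up first."""
    # Heap entries are (minute the translator becomes free, tie-break order, name).
    free_at = [(0, order, name) for order, name in enumerate(speeds)]
    heapq.heapify(free_at)
    finished = []
    for pages in documents:
        start, order, name = heapq.heappop(free_at)
        done = start + pages * speeds[name]
        finished.append((pages, name, done))
        heapq.heappush(free_at, (done, order, name))
    return finished


for label, plan in (("round-robin", round_robin),
                    ("hand out when free", hand_out_when_free)):
    print(label)
    for pages, name, done in plan(DOCUMENTS, MINUTES_PER_PAGE):
        print(f"  {pages:>2}-page document -> {name}, finished at minute {done}")
```

Under these assumed speeds, the 3 pager is done at minute 396 with round-robin and at minute 168 when documents are handed out as translators free up. The exact split of who gets which document depends on the tie-breaking and the speeds you pick, but the small documents no longer wait behind a private pile.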

The typical Rails setup of a reverse proxy handing requests to mongrel is not the most efficient use of resources, so I built a load balancer I call drproxy, which sits between the reverse proxy and the Rails dispatchers and queues up requests, handing them out more efficiently as each dispatcher frees up. Furthermore, I built drproxy in Erlang, a language built from the ground up to excel at concurrency. Ruby is a slug when it comes to handling concurrency and multi-threaded environments. Erlang is like a Porsche.
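The general shape of such a queueing balancer can be sketched in a few lines of Python. To be clear, this is not drproxy's code (which is written in Erlang) and says nothing about its internals. It only illustrates the idea of one shared queue that idle backends pull from. The backend names, request costs, and the sleep standing in for page rendering are all made up.

```python
import queue
import threading
import time


def run_balancer(requests, backends):
    """Queue every request; each backend pulls the next one only when it is idle.

    `requests` is a list of (request_id, cost) pairs and `backends` maps a
    backend name to seconds per unit of cost. Both are invented for this sketch.
    """
    pending = queue.Queue()
    for request in requests:
        pending.put(request)

    handled = []  # (request_id, backend_name), in completion order
    lock = threading.Lock()

    def backend_loop(name, seconds_per_unit):
        while True:
            try:
                request_id, cost = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            time.sleep(cost * seconds_per_unit)  # stand-in for rendering a page
            with lock:
                handled.append((request_id, name))

    threads = [
        threading.Thread(target=backend_loop, args=(name, seconds_per_unit))
        for name, seconds_per_unit in backends.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled


# Hypothetical workload: five requests, one slow backend and one fast backend.
handled = run_balancer(
    requests=[(1, 10), (2, 2), (3, 20), (4, 1), (5, 3)],
    backends={"mongrel-1": 0.002, "mongrel-2": 0.001},
)
for request_id, name in handled:
    print(f"request {request_id} handled by {name}")
```

Which backend ends up with which request varies from run to run, since it depends on timing. The point is only that no backend is ever handed a second request while it is still busy with the first.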

There are, however, even more ways to make the system more efficient algorithmically. Think about it for a while, and next week I will tell you what I did.


Tuesday, June 19, 2007

Advanced Concepts in Ruby on Rails Hosting Part I

Let us imagine how a translation company starts out. Let's call this company MOG Translation, Inc. At first, there might be one translator and one manager. The manager receives a document from a client and hands it to his translator. The translator might turn around the document in 1 hour, after which it makes its way back to the manager and then into the customer's hands. Those are also the fundamentals behind hosting a website. In the simplest form, a web server acts like the manager: it takes in a request (http://mog.com/ for example) and hands it to the application server. The application server acts like a translator: it receives the request and turns it into HTML code that is passed back to the web server and shows up in your web browser.

Soon, MOG Translation, Inc. gets a good reputation and the translation documents come in faster than one per hour. Suddenly our poor lonely translator can't keep up and the papers keep piling up. We all know what to do: hire more translators. Now MOG Translation, Inc. has 10 translators on staff and the manager gets to pick how to hand out the work. One of the simplest ways to hand out documents is to pass them out one at a time to each translator. "One document for Nancy, one document for Drew ...", then when you get to the end of the line, go back and start handing out to Nancy again. Whenever the translators finish, they hand the translation back to the manager, who makes sure it goes to the right client. In computer lingo, this system is called round-robin. You have one web server that distributes requests to 10 application servers in a round-robin way.
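For the programmers in the audience, the manager's routine boils down to something like the following Python sketch. The function name and the document labels are hypothetical, and I use three translators instead of ten to keep the output short ("Ann" is invented; Nancy and Drew are from the example above).

```python
from itertools import cycle


def deal_round_robin(documents, translators):
    """Hand out documents one at a time, wrapping around at the end of the line."""
    piles = {name: [] for name in translators}
    # cycle() repeats the translator list forever; zip() stops when documents run out.
    for document, name in zip(documents, cycle(translators)):
        piles[name].append(document)
    return piles


piles = deal_round_robin(
    ["doc 1", "doc 2", "doc 3", "doc 4", "doc 5"],
    ["Nancy", "Drew", "Ann"],
)
for name, pile in piles.items():
    print(name, pile)
```

Note that the manager never looks at how long a document is or how busy a translator already is. That simplicity is the appeal of round-robin.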

MOG Translation, Inc. gets such a good reputation that 10 translators is simply not enough. Unfortunately, the office is already a bit cramped so it is time for MOG to open a new branch, another office with another 10 translators. This is equivalent to adding a new server to handle more load. In order to make this change invisible to the customer, we do not want to change the face of our company, so the manager stays the same. But now when he gets to the end of the line in office 1, he faxes the next documents directly to the desks of the translators at office 2. This system is still round-robin, and works fine when adding even more offices.
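Adding an office only makes the line of desks longer; the rotation itself does not change. A small Python sketch of that idea, with hypothetical office labels and desk numbers:

```python
from itertools import cycle, product

OFFICES = ["office 1", "office 2"]
DESKS_PER_OFFICE = 10

# product() lists every desk in office 1 first, then every desk in office 2.
# cycle() then repeats that combined line of 20 desks forever.
rotation = cycle(product(OFFICES, range(1, DESKS_PER_OFFICE + 1)))


def next_desk():
    """Return the (office, desk number) that receives the next document."""
    return next(rotation)


assignments = [next_desk() for _ in range(21)]
print(assignments[9])   # last desk in office 1
print(assignments[10])  # first desk in office 2
print(assignments[20])  # back to the first desk in office 1
```

In server terms, each office is a machine and each desk is one application server instance on it.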

This is the standard way that many web sites' infrastructures grow. In order to handle more requests, they get more servers with more instances of the application running to handle more people visiting the site. The vast majority of web sites never need to grow beyond this point, but popular ones like MOG do. Imagine a talented manager who sends requests to three offices with 10 people in each office in a round-robin way. Problems arise because some documents are longer than others, and some translators are faster than others. This leads to congestion: even though the manager is handing the documents out evenly, piles build up on some translators' desks. When a pile builds up, even a very easy document that comes in might take a lot longer because of the other documents ahead of it. Sure, you could just get more translators, but can you think of a better way to utilize the resources you already have?

Next week I will tell you what I did to better utilize our resources. For the more technical of my readers, the "manager" web server software we use is nginx, a very fast reverse proxy from Russia, and the "translator" application software is mongrel, which runs our Ruby on Rails application. Using a reverse proxy and mongrel is the canonical way to serve Rails web sites.
