routing data-source

I am having a challenge in my project – this is a Saas (Software As A Service) project. The “software” is called “Bime” to serve the Bi-Medical lab, it is a J2EE web application with typical crud operations to manage lab members, research activities. It is all done. One thing about this Bime application is it was designed with a single lab in mind – there is no “lab” table or “lab-id” in data model. This is partially intentionl because 1)it makes the data model much simpler 2)the clients take the security seriously and they dont want “their” data to mess with others 3)it is an option for clients to export their whole database to own their own copy of the software.

Anyway, we end up with one MySql database per client (N databases on same MySql server with client name as database name). Now to serve N clients(labs), one straightforward way is to have N tomcat servers, connected to N MySQL database; each client gets their own sub-domain like “lab1.bime.com”, “lab2.bime.com”, and we configure haproxy to forward requests to appropriate tomcat instance.

This works, except one problem…..the cost. The MySql is not problem, even to host N databases (the data is relatively small); It is the Tomcat or the J2EE container, specially the Memory. Even we dont predict huge traffic from those doctors and researchers, with all the good and free open source frameworks like Spring, Hibernate etc etc we will still need 512M memory to have decent performance (without out-of-memory-error). And Java hosting is NOT cheap. Take Slicehost.com as eample, one 512M slice is 38$ per month; With Amazon EC2, reserved instance, a small 1.7G memory, it is about 40$ per month. Basically one tomcat per client makes this project unsustainablecost-wise.

So, we need to make one application to be able to serve multiple clients (labs), connected to multiple MySql database.  It may be a shard thing or scalability thing, and we achieved it through leveraging the routing datasource feature of Spring.

In J2EE application, normally we define and use fixed data sources (url, user, password). But Spring has also a special “AbstractRoutingDataSource” – which can be used to decide dynamically a data source to use based on looking up any parameter. Its setting is like below

With help from UserRoutingDataSource, Spring can look up the key first, if it is “lab1” then use the “lab1DataSource” as underlying data source.

Now our new structure becomes

http://lab1.bime.com lab1 database

http://lab2.bime.com ——-> Haproxy/apache —–> bime on tomcat — lab2 database

http://lab3.bime.com lab3 database

Now we can use ONE bime application to serve multiple clients.

Below are what we do to achieve the routing data source.

1, set up a request filter (which will execute per request before the servlet). It will do one thing – find out the client name from the URL “lab1.bime.com”, and put it into a thread local variable. ) The thread local context is thread-safe. As long as our server stays with the one-thread-per-request model, we are safe here.

2, later in the web application code, when we are doing any database operation, the Spring/Hibernate will use the UserRoutingDataSource to get database connections. The UserRoutingDataSource will then get the client name from the thread local context and return it as “key”, and the appropriate “real” data source will be found.

In general, it is sort of cheating, but solve our problem.

Because we are talking about “data” here, we need to make sure it works as expected. We then do two things as safeguards

First, disable the Hibernate cache, esepcially the second-level cache.

Second, because we have “id” primary key in every table (auto-increment), we make sure the “id” of same table will begin with different value for different clients.

For example, for lab1, the “id” of table “Project” will begin from 1,000,000; for lab2, the “id” begin from 2,000,000

This way, when lab1 user is requesting http://lab1.bime.com/project?id=1000001, and “for some unknown reason”, the routing data source routes to lab2 data source, then,

the user will get a 500 error because there is no record in lab2 table with id 1000001.

There is still space to improve though. In above example application context file, we were hard-coding lab1, lab2 data sources and the routing table. That means, when

a new client register with our service, we have to update this file and maybe re-start the application too. Can we make it more automatic too?

Here is our solution –

First on our Linux server, we have below directory structure

Bime-Home/

Bime/

Lab1/

to store anything specific to lab 1

Lab2/

Lab3/

Basically for each client we have a dedicated directory for it under the “Bime-Home” root directory.

When a client register successfully to our service, we (the Bime Controller application) will create new directory for the client and create a file “jdbc.properties” under

the client directory. It has below content

jdbc.labName=test555

jdbc.driverClassName=com.mysql.jdbc.Driver

jdbc.url=jdbc:mysql://localhost/test555

jdbc.username=test555

jdbc.password=a94a8fe5ccb19ba61c4c0873d391e987982fbbd3

This is the data source file for this particular client “test555”.

Then we create a “DynDataSourceRegister” in our application which will be called after application starts. It will scan the directories under “Bime-Home”, and looking for

the file “jdbc.properties”. If it finds one, it will read and create a “data source bean” in Application Context with the parameters loaded from this file; It will then call the “UserRoutingDataSource” to add this new bean (along with its KEY) to the routing table. This way, we don’t need ever to change the application context file and all the client data sources will be loaded automatically when application starts.

To make this adding-data-source also happen in runtime, we create a new Spring MVC controller “AddDataSourceController”, to call the “UserRoutingDataSource” to add data source on the fly, by giving the client directory name.

So, assume one client “lab123” just registered on our web, and the “Bime Controller” web app will do below

1, create a directory “lab123” under “Bime-Home”

2, create jdbc.properties file

3, send http request to http://www.bime.com/AddDataSource?dir=lab123. And the new data source is automatically loaded to runtime. It will also be loaded automatically next time

when application restarts.

Advertisements

3 thoughts on “routing data-source

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s