'Is it OK to specify a schema in `table_name_prefix`?

TL;DR: Is it OK to specify a schema in table_name_prefix?

We have a large Rails application that is not quite a traditional multi-tenant app. We have a hundred clients, all supported by one app, and that number will never grow more than 1-2 per year. Currently, every client has their own Postgresql database.

We are addressing some infrastructure concerns of having so many distinct databases...most urgently, a high number of simultaneous database connections when processing many clients' data at the same time.

The app is not visible, even to clients, so a lot of traditional multi-tenant web site philosophies don't apply here neatly.

  • Each tenant has a distinct Postgres database, managed in database.yml.
  • Each database has a schema, named for the tenant.
  • We have a model specific to each tenant with notably different code.
  • Each model uses establish_connection to select a different database and schema.
  • Each model uses a distinct table_name_prefix with the client's unique name.

The tables vary extensively for each tenant. There is no hope or desire to normalize the clients together. Clients are not provisioned dynamically -- it is always a new code release with migrations.

We intend to move each of the client schemas into one database, so fewer distinct connection pools are required. The unique names we currently have at the database, schema, and table names mean there is no possibility of name collisions.

We've looked at the Apartment gem, and decided it is not a good fit for what we're doing.

We could add all hundred schemas to schema_search_path, so all clients could share the same connection pool and still find their schema. We believe this would reduce our db connection count one-hundred-fold. But we're a bit uneasy about that. I've found no discussions of how many are too many. Perhaps that would work, and perhaps there would not have a performance penalty finding tables.

We've found a very simple solution that seems promising, by adding the schema in the table_name_prefix. We're already setting this like:

def self.table_name_prefix
  'client99_'
end

Through experimenting and looking within Rails 4 (our current version) and Rails 5 source code, this works to specify the schema ('tenant_99') as well as the traditional table prefix ('client99') :

def self.table_name_prefix
  'tenant_99.client99_'
end

Before that change, queries looked like this:

SELECT COUNT(*) FROM 'client99_products'

After, they include the schema, as desired:

SELECT COUNT(*) FROM 'tenant_99.client99_products'

This seems to answer our needs, with no downsides. I've searched the Interwebs for people encouraging or discouraging this practice, and found no mention of it either way.

So through all this, here are the questions I haven't found definitive answers for:

  • Is there a concern of having too many schemas listed in schema_search_path?
  • Is putting a schema name in table_name_prefix okay?


Solution 1:[1]

To address your concerns in reverse order:

  • Is putting a schema name in table_name_prefix okay?

There are no problems with this just as long as the names are unique(internal and external).

  • Is there a concern of having too many schemas listed in schema_search_path?

The answer is maybe, any non-fully qualified request(asking for a table by name only) will have to search each of the schemas in the order listed in schema_search_path If it is cached in memory there is little penalty; an on-disk search of all schemas will be slow(proportional to its location in the list.) Be sure to list the most active schemas first.

A fully qualified request should take no longer than the separated database solution.

Assuming all of your calls are fully qualified, this technique should provide the full advantages of connection pooling, when possible.

Remember that connection pooling only minimizes the overhead of the setup and tear-down times of the connections, by taking advantage of "gaps" during communication.

For example:

  • You have four clients and three of them are making near constant requests, you will still have four connections to the server, even with pooling.

  • The advantage comes when you have four clients each utilizing a quarter of the resources, pooled over a single connection.

The underlying(excluding connection overhead) database utilization will remain the same, whether pooling with a single database or separate connections to separate databases.

The drawback/advantage to combining the databases into a single one is this: it is not possible to move individual databases to another server for load balancing purposes outside of PostgreSQLs methods for load balancing.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Strom