On this page

Pgpool-II for PostgreSQL add-ons

Pgpool-II is a tool available on application instances to help manage your connection pool and load balancing of your PostgreSQL add-ons.

What is Pgpool-II?

Pgpool-II is a proxy software that sits between PostgreSQL servers and a PostgreSQL database client.

You can read the official project page for more information.

Why use Pgpool-II?

Connection pooling: Pgpool-II maintains established connections to the PostgreSQL servers, and reuses them whenever a new connection with the same properties (i.e. user name, database, protocol version, and other connection parameters if any) comes in. It reduces the connection overhead, and improves the system’s overall throughput.

Load balancing: If a database is replicated (because running in either replication mode or native replication mode), performing a SELECT query on any server will return the same result. Pgpool-II takes advantage of the replication feature in order to reduce the load on each PostgreSQL server. It does that by distributing SELECT queries among available servers, improving the system’s overall throughput. In an ideal scenario, read performance could improve proportionally to the number of PostgreSQL servers. Load balancing works best in a scenario where a lot of users execute many read-only queries at the same time.

Automated fail over: If one of the database servers goes down or becomes unreachable, Pgpool-II will detach it and will continue operations by using the rest of the database servers. Some sophisticated features that help for automated failovers including timeouts and retries.

Limiting Exceeding Connections: PostgreSQL puts a limit on the amount of concurrent connections, so that new connections are rejected when this number is reached. Raising this maximum number of connections, however, increases resource consumption and has a negative impact on overall system performance. Pgpool-II also limits the number of concurrent connections, but extra connections will be queued instead of returning an error immediately. Or you can configure it to return an error when the connection limit is exceeded (4.1 or later).

In Memory Query Cache: In memory query cache allows to save a SELECT statement with its result. If an identical SELECT comes in, Pgpool-II returns the value from cache. Since neither SQL parsing nor access to PostgreSQL are involved, using in memory cache is extremely fast. On the other hand, it might be slower than the normal path in some cases, because it adds some overhead of storing cache data.

The various features are available in the official Pgpool documentation.

How to configure Pgpool-II

Generate direct variables for your PostgreSQL add-ons.

Go to the “Add-on dashboard” tab of your PostgreSQL add-ons and click on the “Generate direct hostname and port” button.

Enable Pgpool-II for your application

In order to configure Pgpool-II, the first thing to do is to link your PostgreSQL add-on. To do that, you can go to the Service Dependencies page of your application and select your PostgreSQL add-on.

Once linked, you can enable the Pgpool-II feature by defining the following environment variable: CC_ENABLE_PGPOOL=true. If you ever need to stop using Pgpool-II, you can remove this variable or set it to false.

Your application will use a Unix Domain Socket to connect to Pgpool-II. Unix domain sockets are faster than TCP sockets because there is no handshake and the connection is made locally on the same machine. A special environment variable will be injected to your environment variables: CC_PGPOOL_SOCKET_PATH. This variable contains the path to the Unix Domain Socket you have to connect to. See Usages below for some examples on how to use it.

The available Pgpool-II variables and their descriptions are available on our environment variables reference page.

Concurrent session and pool size

You can use the CC_PGPOOL_NUM_INIT_CHILDREN and CC_PGPOOL_MAX_POOL environment variables to set the number of concurrent sessions and the number of connection pool caches per connection.

Let’s take an example where you have 3 PostgreSQL servers, and the CC_PGPOOL_NUM_INIT_CHILDREN variable configured with a value of 32.

On startup, Pgpool-II will start 32 processes, one process per connection. The CC_PGPOOL_MAX_POOL variable is used to configure the number of cached connections per connection (process). This option is mostly used when you have different credentials for the same PostgreSQL server (database, user…). But in the case of Streaming mode on Clever Cloud, you have access to only one database, and the credentials are the same on all add-ons with binary replication. You should not need to change the default value of this variable (1 by default).

An example of the SHOW POOL_POOLS; command with CC_PGPOOL_NUM_INIT_CHILDREN=32 and CC_PGPOOL_MAX_POOL=1:

psql -U u5dh6v2ymn********** -d bjzfhkl5qc**********

bjzfhkl5qc**********=> show pool_pools;
 pool_pid |     start_time      | pool_id | backend_id |       database       |       username       |     create_time     | pool_backendpid |
----------+---------------------+---------+------------+----------------------+----------------------+---------------------+-----------------+
 3526     | 2021-06-02 15:22:12 | 0       | 0          | bjzfhkl5qc********** | u5dh6v2ymn********** | 2021-06-02 15:22:35 | 1520665         | 
 3526     | 2021-06-02 15:22:12 | 0       | 1          | bjzfhkl5qc********** | u5dh6v2ymn********** | 2021-06-02 15:22:35 | 1665071         | 
 3526     | 2021-06-02 15:22:12 | 0       | 2          | bjzfhkl5qc********** | u5dh6v2ymn********** | 2021-06-02 15:22:35 | 531483          | 
 3527     | 2021-06-02 15:22:12 | 0       | 0          |                      |                      |                     | 0               | 
 3527     | 2021-06-02 15:22:12 | 0       | 1          |                      |                      |                     | 0               | 
 3527     | 2021-06-02 15:22:12 | 0       | 2          |                      |                      |                     | 0               | 
 3528     | 2021-06-02 15:22:12 | 0       | 0          |                      |                      |                     | 0               | 
 3528     | 2021-06-02 15:22:12 | 0       | 1          |                      |                      |                     | 0               | 
 3528     | 2021-06-02 15:22:12 | 0       | 2          |                      |                      |                     | 0               | 
 ...      | ...                 | .       | .          |                      |                      |                     | .               | 

We can see the 32 available processes (3526, 3527, 3528…), and active connections for process 3526, with an open connection for each PostgreSQL server (1520665, 1665071 and 531483).

If a process remains inactive for more than 300 seconds, it will be deleted and a new process will take its place. This value can be configured with the CC_PGPOOL_CHILD_LIFE_TIME variable. This is a measure to prevent memory leaks and other unexpected errors in Pgpool-II children.

Cached connections for each process do not have a default expiration time, but you can set one if needed with the CC_PGPOOL_CONNECTION_LIFE_TIME variable.

It’s also possible to specify the lifetime of a Pgpool-II child process (CC_PGPOOL_CHILD_MAX_CONNECTIONS) and the time in seconds to disconnect a client if it remains inactive since the last request (CC_PGPOOL_CLIENT_IDLE_LIMIT).

Replication and load balancing

Use Pgpool-II to distribute the load and separate WRITE and READ queries between your PostgreSQL servers.

When using the Streaming mode, it’s not Pgpool-II that manages the replication of your databases. Replication must be set up on the PostgreSQL side, otherwise known as binary, Hot Standby or Streaming replication. To schedule this procedure, you can contact our support at support@clever-cloud.com or make a request via our sales form.

Once replication is in place, you can use the CC_PGPOOL_FOLLOWERS environment variable to add the follower(s) to your Pgpool-II configuration. This variable is in JSON format, and must contain the host, port and weight of each follower.

An example of the CC_PGPOOL_FOLLOWERS variable with two followers:

[
  {
    "hostname": "<HOST>",
    "port": "<PORT>",
    "weight": "1"
  },
  {
    "hostname": "<HOST>",
    "port": "<PORT>",
    "weight": "1"
  }
]

The weight is used to specify the load balance ratio of the backends, you can also configure the weight of the leader with the CC_PGPOOL_LEADER_WEIGHT variable.

Many other load balancing options are configurable with environment variables. You can refer to our environment variables reference page and the official Pgpool-II documentation for more information.

Health check

Pgpool-II periodically connects to the configured PostgreSQL backends to detect any error on the servers or networks. If an error is detected, Pgpool-II performs failover or degeneration depending on the configurations.

The health check is not activated by default, but you can activate it with the CC_PGPOOL_HEALTH_CHECK_PERIOD variable.

Many other health check options are configurable with environment variables, such as:

  • CC_PGPOOL_HEALTH_CHECK_TIMEOUT: Specifies the timeout in seconds to give up connecting to the backend PostgreSQL if the TCP connect does not succeed within this time.
  • CC_PGPOOL_HEALTH_CHECK_MAX_RETRIES: Specifies the maximum number of retries before giving up and initiating failover when health check fails.
  • CC_PGPOOL_HEALTH_CHECK_RETRY_DELAY: Specifies the amount of time (in seconds) to sleep before retrying after a failed health check retries.
  • CC_PGPOOL_CONNECT_TIMEOUT: Specifies the amount of time in milliseconds before giving up connecting to backend using connect() system call.

You can consult the official Pgpool-II documentation to get more information about the health check.

In memory cache

In memory cache saves the pair of SELECT statement and its result. If the same SELECT comes in, Pgpool-II returns the value from cache. This feature can be activated with the CC_PGPOOL_MEMORY_CACHE_ENABLED=on variable.

Many other in memory cache options are configurable with environment variables, such as:

  • CC_PGPOOL_MEMQCACHE_TOTAL_SIZE: Specifies the shared memory cache size in bytes.
  • CC_PGPOOL_MEMQCACHE_MAX_NUM_CACHE: Specifies the number of cache entries. This is used to define the size of cache management space.
  • CC_PGPOOL_MEMQCACHE_CACHE_BLOCK_SIZE: Specifies the cache block size.
  • CC_PGPOOL_MEMQCACHE_EXPIRE: Specifies the life time of query cache in seconds.
  • CC_PGPOOL_MEMQCACHE_AUTO_CACHE_INVALIDATION: Automatically deletes the cache related to the updated tables.
  • CC_PGPOOL_MEMQCACHE_MAXCACHE: Specifies the maximum size in bytes of the SELECT query result to be cached.
  • CC_PGPOOL_CACHE_SAFE_MEMQCACHE_TABLE_LIST: Specifies a comma-separated list of table names whose SELECT results should be cached by Pgpool-II.
  • CC_PGPOOL_CACHE_UNSAFE_MEMQCACHE_TABLE_LIST: Specifies a comma-separated list of table names whose SELECT results should NOT be cached by Pgpool-II.

You can consult the official Pgpool-II documentation to get more information about the in memory cache.

Scalability

When the scalability of your application is enabled, you may need to tweak the CC_PGPOOL_NUM_INIT_CHILDREN value. This is what we are going to see here.

Auto scalability will add or remove instances following your needs. This means that you may have multiple instances running at the same time for a certain period of time. If your application restarts, you will have your current instance number * 2 instances running in parallel while the deployment finishes and the old instances are stopped. A new deployment starts new instances alongsides the old ones.

This means that have up to maximum scalability * 2 instances can run at the same time. And if all of your instances open the maximum connections they are allowed to, it means there will be up to maximum scalability * 2 * CC_PGPOOL_NUM_INIT_CHILDREN connections at the same time. We will call this result MaxCon.

Example

Let’s say that I’m hosting a simple PHP application on 1 XS instance, with a scalability that can go up to 4 M instances. And I’m using the PostgreSQL S Small Space plan which has a connection limit of 125.

Now my application receives a lot of traffic and scales up to 4 M instances. But at the same time, I also need to deploy a hot fix. This means that 4 new M instances will be started, alongside the already existing 4 M instances. I need to make sure that MaxCon doesn’t go above 125.

Here is the summary:

  • 8 instances: 4 currently running, 4 currently deploying my hot fix
  • 125 max connections: the maximum number of connections of my PostgreSQL add-on’s plan
  • 5 other connections: I want to be able to use PG Studio or any CLI tool at the same time, in case I need it

CC_PGPOOL_NUM_INIT_CHILDREN = ( (125 - 5) / 8 ) = 15. If I set CC_PGPOOL_NUM_INIT_CHILDREN to 15, each instance should have enough connections to query the database without errors. And it also leaves me some spare connections (5) available if anything goes wrong and I need to connect to my database using a CLI tool or PG Studio.

Tips / debug

How can I access Pgpool-II from my application?

Connect via ssh to your application and use the psql command.

ssh -t ssh@sshgateway-clevercloud-customers.services.clever-cloud.com <app_id>
psql

How can I see the configured nodes?

You can use the SHOW POOL_NODES; command.

bjzfhkl5qc**********=> SHOW POOL_NODES;
 node_id |                         hostname                          | port | status | lb_weight |  role   | select_cnt | load_balance_node |
---------+-----------------------------------------------------------+------+--------+-----------+---------+------------+-------------------+
 0       | bjzfhkl5qc**********-postgresql.services.clever-cloud.com | 57** | up     | 0.333333  | primary | 0          | false             |
 1       | b3diajq7iy**********-postgresql.services.clever-cloud.com | 57** | up     | 0.333333  | standby | 2          | false             |
 2       | bmtwtemn40**********-postgresql.services.clever-cloud.com | 57** | up     | 0.333333  | standby | 1          | true              |

How can I see the Pgpool-II processes waiting for connections and processing a connection?

You can use the SHOW POOL_PROCESSES; command.

bjzfhkl5qc**********=> SHOW POOL_PROCESSES;
 pool_pid |     start_time      |       database       |       username       |     create_time     | pool_counter
----------+---------------------+----------------------+----------------------+---------------------+--------------
 21554    | 2021-06-03 11:28:36 | bjzfhkl5qc********** | u5dh6v2ymn********** | 2021-06-03 11:28:48 | 1
 11409    | 2021-06-02 19:54:53 |                      |                      |                     |
 11410    | 2021-06-02 19:54:53 |                      |                      |                     |
 21566    | 2021-06-03 11:29:20 |                      |                      |                     |

How can I see the list of pools managed by Pgpool-II?

You can use the SHOW POOL_POOLS; command.

bjzfhkl5qc**********=> SHOW POOL_POOLS;
 pool_pid |     start_time      | pool_id | backend_id |       database       |       username       |     create_time     | pool_backendpid |
----------+---------------------+---------+------------+----------------------+----------------------+---------------------+-----------------+
 21554    | 2021-06-03 11:28:36 | 0       | 0          | bjzfhkl5qc********** | u5dh6v2ymn********** | 2021-06-03 11:28:48 | 1549123         |
 21554    | 2021-06-03 11:28:36 | 0       | 1          | bjzfhkl5qc********** | u5dh6v2ymn********** | 2021-06-03 11:28:48 | 1695967         |
 21554    | 2021-06-03 11:28:36 | 0       | 2          | bjzfhkl5qc********** | u5dh6v2ymn********** | 2021-06-03 11:28:48 | 562378          |
 11409    | 2021-06-02 19:54:53 | 0       | 0          |                      |                      |                     | 0               |
 11409    | 2021-06-02 19:54:53 | 0       | 1          |                      |                      |                     | 0               |
 11409    | 2021-06-02 19:54:53 | 0       | 2          |                      |                      |                     | 0               |
 11410    | 2021-06-02 19:54:53 | 0       | 0          |                      |                      |                     | 0               |
 11410    | 2021-06-02 19:54:53 | 0       | 1          |                      |                      |                     | 0               |
 11410    | 2021-06-02 19:54:53 | 0       | 2          |                      |                      |                     | 0               |
 21566    | 2021-06-03 11:29:20 | 0       | 0          |                      |                      |                     | 0               |
 21566    | 2021-06-03 11:29:20 | 0       | 1          |                      |                      |                     | 0               |
 21566    | 2021-06-03 11:29:20 | 0       | 2          |                      |                      |                     | 0               |

How can I see the statistics of SQL commands?

You can use the SHOW POOL_BACKEND_STATS; command.

bjzfhkl5qc**********=> SHOW POOL_BACKEND_STATS;
 node_id |                         hostname                          | port | status |  role   | select_cnt | insert_cnt | update_cnt | delete_cnt |
---------+-----------------------------------------------------------+------+--------+---------+------------+------------+------------+------------+
 0       | bjzfhkl5qc**********-postgresql.services.clever-cloud.com | 57** | up     | primary | 0          | 0          | 0          | 1          |
 1       | b3diajq7iy**********-postgresql.services.clever-cloud.com | 57** | up     | standby | 2          | 0          | 0          | 0          |
 2       | bmtwtemn40**********-postgresql.services.clever-cloud.com | 57** | up     | standby | 1          | 0          | 0          | 0          |

Is it possible to use the PCP command?

Yes, the PCP tool is preconfigured and available on your application.

pcp_pool_status -h /tmp -U pcp -w

Attach or detach a given node to Pgpool-II:

pcp_pool_status -h /tmp -U pcp -w -n 0
pcp_pool_status -h /tmp -U pcp -w -n 0

Display the statistical data of the health check on the given node ID:

pcp_pool_status -h /tmp -U pcp -w -n 0 -v
Node Id                       : 0
Host Name                     : bjzfhkl5qc**********-postgresql.services.clever-cloud.com
Port                          : 57**
Status                        : up
Role                          : primary
Last Status Change            : 2021-06-02 19:54:53
Total Count                   : 0
Success Count                 : 0
Fail Count                    : 0
Skip Count                    : 0
Retry Count                   : 0
Average Retry Count           : 0.000000
Max Retry Count               : 0
Max Health Check Duration     : 0
Minimum Health Check Duration : 0
Average Health Check Duration : 0.000000
Last Health Check             :
Last Successful Health Check  :
Last Skip Health Check        :
Last Failed Health Check      :

You can find all PCP commands in the official Pgpool-II documentation.

Usage

PHP using PDO

Using PDO, you have to use the unix_socket option in your DSN:

<?php
// This variable is injected during deployment
$host = getenv("CC_PGPOOL_SOCKET_PATH");
// Get the database name from the environment
$database = getenv("POSTGRESQL_ADDON_DB");
// Get the database user from the environment
$user = getenv("POSTGRESQL_ADDON_USER");
// Get the database password from the environment
$password = getenv("POSTGRESQL_ADDON_PASSWORD");

$dsn = "pgsql:host=$host;dbname=$database;user=$user;password=$password";

try {
    $conn = new PDO($dsn);
    if($conn){
	echo "Successfully connected to $dbname!";
     }
} catch (PDOException $e){
     echo $e->getMessage();
}
?>

Wordpress

For WordPress, you can change the DB_HOST variable in your wp-config.php:

// To connect using a socket, the syntax is: `localhost:/path/to/socket`
define( 'DB_HOST', "localhost:" . getenv("CC_PGPOOL_SOCKET_PATH") );

Node.js

On Node.js, using the pg npm package:

const { Client } = require('pg');
client = new Client({
    // This variable is injected during the deployment
    host: process.env["CC_PGPOOL_SOCKET_PATH"],
    // Get the database name from the environment
    database: process.env["POSTGRESQL_ADDON_DB"],
    // Get the database user from the environment
    user: process.env["POSTGRESQL_ADDON_USER"],
    // Get the database password from the environment
    password: process.env["POSTGRESQL_ADDON_PASSWORD"],
});

client.connect();

Comments