Google wants to add an analytical turbo to PostgreSQL


Google has added yet another cloud database to its portfolio. The company presented the AlloyDB preview at its virtual Google I/O conference, held May 11-12.

The American group has not yet announced a date for general availability.

AlloyDB is an on-demand managed database built on the open source PostgreSQL RDBMS. Google already has several cloud services that support PostgreSQL, including Cloud Spanner and Cloud SQL for PostgreSQL.

“AlloyDB fills an important gap in Google’s database offering,” said Carl Olofson, analyst at IDC. “It is a fully relational DBMS capable of handling both analytics and transactions, as well as mixed operations, which we at IDC call analytical transaction processing.”

Google is also integrating its Vertex AI service with AlloyDB to allow users to use machine learning directly with the database.

Combine analytical and transactional processing

Vendors have been embracing this trend for a few years. IDC calls this capability ATP, Forrester Research speaks of translytical databases, and vendors like PingCAP refer to it as “hybrid transactional and analytical processing.” Google Cloud uses the abbreviation HTAP.

Carl Olofson considers that AlloyDB’s functionality stands out from Google’s other database offerings, including Cloud Spanner and BigQuery. He finds BigQuery ideal for querying large tables. Spanner aims to run distributed processing on databases deployed across multiple cloud regions. AlloyDB, finally, is cut out for the HTAP approach.

As to why Google decided to build yet another database that supports PostgreSQL, Andi Gutmans, General Manager and VP of Engineering, Databases at Google Cloud, explains in an interview that the offering stems from customer demand. It is an argument the executive already made when he worked at AWS.

According to Andi Gutmans, Cloud SQL for PostgreSQL customers are happy with the service, but need more security, performance, scalability, and availability.

To meet these needs, the executive points out that AlloyDB is compatible with the open source version of PostgreSQL. According to Andi Gutmans, this will allow customers to adopt it and migrate their workloads more easily.

Also according to the executive, Google is focusing on the quality of the service, its performance and its scalability.

In addition to its distributed architecture, AlloyDB benefits from Colossus, Google’s cluster-scale file system. Colossus, however, already underpins Spanner, Cloud SQL and Firestore.

Mechanisms to differentiate AlloyDB

Although it seems common to a good number of services, the decoupling of compute and storage, together with the multizone replication system, is specific to AlloyDB. This is how the cloud giant chose to counter a major limitation of PostgreSQL: at some point, to scale out a DBMS deployment, administrators create read-only copies of the database. Although standard, this technique lengthens failover time and adds latency.

Instead of copying the database, Google Cloud proposes adding several read-only replica instances alongside the primary DBMS instance that handles query processing.

This architecture relies on a storage layer distributed across a cloud region. It includes a service that writes WALs (write-ahead logs, which record INSERT/UPDATE/DELETE change sets) from the primary DBMS instance to a low-latency store. From this log store, WAL processing services produce database “blocks” placed in regional, sharded storage. These blocks represent the state of the database content at a given time T. The operations are replayed very quickly each time data is modified. If the primary node crashes or a data block is lost, the blocks can be “re-served” to the primary node and to the replicas to avoid losing information. According to GCP, this optimizes I/O paths and makes it possible to create replicas without multiplying copies unnecessarily.
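The idea of turning a write-ahead log into blocks can be illustrated with a minimal Python sketch. It is not AlloyDB's implementation: the names WalRecord, BlockStore and replay are hypothetical, and a "block" is reduced to a small dictionary. It only shows that replaying the log in order rebuilds the same block state, which is why a lost block can be regenerated from the log.

```python
from dataclasses import dataclass, field


@dataclass
class WalRecord:
    """One change-set entry (hypothetical, simplified shape)."""
    lsn: int        # log sequence number, strictly increasing
    block_id: int   # block touched by the change
    key: str
    value: object   # None marks a DELETE in this toy model


@dataclass
class BlockStore:
    """Toy stand-in for the regional block storage (an assumption)."""
    blocks: dict = field(default_factory=dict)  # block_id -> {key: value}
    applied_lsn: int = 0

    def apply(self, record: WalRecord) -> None:
        # Skip records that were already applied, so replay is idempotent.
        if record.lsn <= self.applied_lsn:
            return
        block = self.blocks.setdefault(record.block_id, {})
        if record.value is None:
            block.pop(record.key, None)
        else:
            block[record.key] = record.value
        self.applied_lsn = record.lsn


def replay(log, store):
    """Apply WAL records in LSN order and return the store."""
    for record in sorted(log, key=lambda r: r.lsn):
        store.apply(record)
    return store


log = [
    WalRecord(1, 0, "alice", 100),  # INSERT
    WalRecord(2, 0, "bob", 50),     # INSERT
    WalRecord(3, 0, "alice", 70),   # UPDATE
    WalRecord(4, 1, "carol", 10),   # INSERT into another block
    WalRecord(5, 0, "bob", None),   # DELETE
]

store = replay(log, BlockStore())
rebuilt = replay(log, BlockStore())  # e.g. after losing the blocks
print(store.blocks)  # {0: {'alice': 70}, 1: {'carol': 10}}
```

Because the log is the source of truth in this sketch, a replica or a recovering node does not need its own full copy of the data: it only needs the log and the ability to replay it.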

To read data in PostgreSQL, the primary instance and the replicas use an in-memory buffer shared between processes, which holds tables, indexes and execution plans. With this architecture, GCP asserts that the database does not need to interact with the storage layer until a storage block is dropped. And if queries are too heavy for this buffer, Google Cloud has added an additional cache layer at the DBMS level.
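The read path described here can be sketched as a two-level lookup in Python. This is only an illustration under stated assumptions: the LruCache and TieredReader classes are hypothetical, both levels use a simple LRU policy, and blocks evicted from the buffer are moved to the second cache. The article does not say how Google actually manages either layer.

```python
from collections import OrderedDict


class LruCache:
    """Small least-recently-used cache (assumed eviction policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        """Insert a block; return the evicted (key, value) pair, if any."""
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            return self.items.popitem(last=False)
        return None


class TieredReader:
    """Shared buffer first, then the extra cache, then the storage layer."""

    def __init__(self, storage, buffer_size, cache_size):
        self.storage = storage
        self.buffer = LruCache(buffer_size)
        self.cache = LruCache(cache_size)
        self.storage_reads = 0

    def read(self, block_id):
        block = self.buffer.get(block_id)
        if block is None:
            block = self.cache.get(block_id)
            if block is None:
                # Only a miss at both levels reaches the storage layer.
                self.storage_reads += 1
                block = self.storage[block_id]
            evicted = self.buffer.put(block_id, block)
            if evicted is not None:
                self.cache.put(*evicted)  # keep evicted blocks close by
        return block


storage = {i: f"block-{i}" for i in range(4)}
reader = TieredReader(storage, buffer_size=2, cache_size=4)
for block_id in [0, 1, 2, 0, 1, 3, 2]:
    reader.read(block_id)
print(reader.storage_reads)  # 4
```

In this run, seven reads trigger only four trips to storage, one per distinct block: the blocks pushed out of the small buffer are still served from the second cache.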

This supposedly more efficient architecture does not solve all of PostgreSQL’s drawbacks. By default, the DBMS stores data in rows. However, column-oriented storage remains more efficient for analytical uses: it speeds up queries and compresses data better, whereas the open source RDBMS struggles to handle intensive workloads unless it is backed by much more expensive infrastructure.
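A small Python sketch shows why a column layout suits analytics. The table, the to_columns helper and the run-length encoding are illustrative assumptions and say nothing about how any particular engine stores its data. An aggregate only has to scan the one column it needs, and a column of repeated values compresses well.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "country": "FR", "amount": 120},
    {"order_id": 2, "country": "FR", "amount": 80},
    {"order_id": 3, "country": "FR", "amount": 50},
    {"order_id": 4, "country": "DE", "amount": 200},
    {"order_id": 5, "country": "DE", "amount": 40},
    {"order_id": 6, "country": "US", "amount": 10},
]


def to_columns(rows):
    """Pivot row records into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress consecutive repeats into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return [tuple(pair) for pair in encoded]


columns = to_columns(rows)

# Row layout: every full record is visited to read a single field.
row_total = sum(row["amount"] for row in rows)

# Column layout: only the "amount" column is scanned.
column_total = sum(columns["amount"])

compressed_country = run_length_encode(columns["country"])
print(row_total, column_total)  # 500 500
print(compressed_country)       # [('FR', 3), ('DE', 2), ('US', 1)]
```

Both layouts give the same answer. The difference lies in how much data has to be read and how compactly it can be stored, which matters as tables grow.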

Fortunately, PostgreSQL is good at supporting extensions. GCP has developed a column-oriented accelerator (or engine). It includes storage space and an optimized query engine that promises performance “up to 100 times” better than standard Postgres. This is the mechanism that delivers HTAP capabilities and lets Google compete with the services of AWS, Oracle and Microsoft. For example, Microsoft supports the Citus project (since acquiring the eponymous startup in 2019), which has been integrated into Azure Database for PostgreSQL and offers almost the same functionality. This is also the specialty of Swarm64, acquired by ServiceNow.

Looking to the future, Andi Gutmans said Google’s AlloyDB development team has many ideas on how to continue improving query processing and optimization.

“I think we’re off to a good start, but there are lots of other ideas about things we could do to make the customer experience even easier,” he says.
