DISCLAIMER: This post intended to aggregate research performed while evaluating various DB solution fitting specific requirements. While it may be a bit lengthy, I hope it would give the reader good starting point while deciding which path to take.
SkySQL was reviewed in more details, and this article presents in-depth review of “last-mile” considerations. The intent is not to force reader into particular solution, but to share all accumulated information.
While deciding about changes to the database layer at the core of your system many questions and concerns are being raised.
- Will my system perform as expected after migration?
- How much effort is required to migrate existing solution?
- Will it require serious code refactoring?
- How a new DB solution will fit existing CI/CD practices?
- What would be impact on product budget in the near future and a long term?
- Is it available on current cloud platform? Can it be cloud provider agnostic?
- Can my DevOps team support it and/or will require extensive additional training?
Team needs to take a few things into considerations to ensure a smooth transition and minimize downtime while gaining as much from the migration.
Let’s establish more narrow goals and assess our milestones:
- Evaluate a Database-as-a-Service vs database as an appliance in the cloud.
- Ensure High Availability for the database layer.
- Add Horizontal Scalability and Multi-Primary support.
- Minimize impact on existing application while minimizing adjustments to the code (in-place drop-in is ideal, i.e. keep standard connectivity to database through non-API DB access layer, if possible).
- Ability to integrate into existing CI/CD workload.
For purpose of our specific research a few additional requirements also apply:
- Existing solution uses MariaDB database engine with Primary and Replicas in place.
- Downtime should be no more than few hours.
Let’s review each requirement, its benefits, and then options available with MariaDB engine.
“Database-as-a-Service (DBaaS) is one of the fastest-growing cloud markets. According to Gartner, 75% of all databases will be running in the cloud by 2022. The first relational DBaaS offerings were created by cloud service providers – Amazon RDS, Microsoft Azure Database and Google Cloud SQL. These services started with MariaDB, MySQL and/or PostgreSQL.”
- Save money by combining hardware, hosting, software and support cost into one package.
- Rapid Provisioning of DB instances takes shorter time due to automation and streamlining of the associated activities.
- Outsourcing of administration and monitoring reduces on-premise IT team activity requirements and optimizes time to market and results in further cost savings.
- Enhanced Security created through “by-design” security practices helping with compliance and security breaches. Off-site hosting lowers complexity of on-site security requirements.
- Improved monitoring and tracking of time, space and resource consumption.
- Server Space requirements are reduced and optimized.
- Scalability is achieved through advanced provisioning via provider’s offering.
High availability / Always-On Architecture
“High availability (HA) is a characteristic of a system which aims to ensure an agreed level of operational performance, usually uptime, for a higher-than-normal period.”
To facilitate High Availability and Always-On Architecture number of considerations need to be taken.
MariaDB already provides several options (reference):
- MariaDB Replication – MaxScale’s MariaDB Monitor (mariadbmon) detects primary server failure and promotes the most up-to-date replica based on Global Transaction ID (GTID), waits for that replica to execute any transactions in its relay log, and begins routing queries to it.
- MaxScale’s Galera Monitor (galeramon) is used to minimize application impact upon server failure. Additionally, MaxScale may be used to assign primary and replica roles to database instances within a cluster to support read/write traffic splitting and to minimize the risk of certification failures.
- Multi-Node Enterprise ColumnStore – MaxScale’s MariaDB Monitor (mariadbmon) determines which node is the primary server, and automatically performs fail-over upon primary server failure.
- MariaDB Xpand – Xpand divides tables and indexes into slices, and it distributes those slices among all the nodes. Xpand maintains replicas of each slice, so that it can recover from a node failure without loss of data.
The replication approach is the most common one today and works well with non-distributed processing. However, it puts certain limitations on how architecture can scale horizontally.
Multi Master Architecture / Horizontal Scalability / Continuous Availability
Horizontal scalability: The ability to increase capacity of the server by connecting multiple hardware or software entities so that they work as a single logical unit. HS can be achieved with the help of clustering, distributed file system, and load balancing.
Continuous availability: The infrastructure (or the applications running on it) cannot be interrupted at all. Essentially, there is no allowance for any outage, either planned or unplanned.
This is where the multi-master approach (ex: MaxScale’s Galera Cluster – option #2 above) comes in.
“Galera Cluster turns the popular MySQL and MariaDB server into a multi-master replication enabled solution with an automated member provisioning and joining process and no-data-loss guarantee with a proper configuration”.
With the maturity of MariaDB Xpand Distributed SQL architecture, MaxScale can be substituted to achieve even greater MMA flexibility.
The concept of MMA is not new and is used with distributed AD controllers, multi-zone data centers, etc.
Here is an old introduction video that provides some top-level details about how Galera architecture extends the classical Replication approach – Galera Cluster for MySQL introduction – YouTube
Official information on Galera can be found here – https://mariadb.com/kb/en/what-is-mariadb-galera-cluster/
All MariaDB Galera features listed below are now rolled into the Enterprise cluster setup and are supported natively.
Let’s review 3 workflows:
MMR can be used to protect the availability of a mission-critical database. For example, a multi-master replication environment can replicate data in your database to establish a fail-over site should the primary site become unavailable due to system or network outages. Such a fail-over site can also serve as a fully functional database to support application access when the primary site is concurrently operational.
The database can be configured for automatic fail-over, which enables the main node to switch over to a different master site if the first master site fails.
Load Balancing and Multi-Master Replication
MMR is useful for transaction processing applications that require multiple points of access to database information with the following goals:
- Distributing a heavy application load.
- Ensuring continuous availability.
- Providing support for edge computing efficiency.
- Protect from geographical impacts.
The main goal of MMR is to ensure that data is available when and where you need it. The following describes several different scenarios that have different information delivery requirements.
Applications that have application load distribution requirements commonly include customer service-oriented applications.
What is MaxScale
MariaDB MaxScale is a “DB conductor” which enables database access orchestration and automatic load balancing and has the following characteristics:
- High availability, scalability, and security.
- Simplifying application development by providing unified direct client connections, native MariaDB look & feel.
- Transparent load balancer with support of “Write to primary, read through replicas” distribution.
- Automatic membership control, and primary recovery on primary loss.
When comparing various DB options, it is important to note that there could be simply a difference in DB engines that will drive your decision. As an example, while MySQL and MariaDB are close in theory, and MariaDB supports MySQL backward compatibility mode, engines are different and software must be tested to ensure a smooth transition between the two. Furthermore, after version 10.2 MariaDB is no longer considered a drop-in replacement for MySQL.
- Incompatibilities and Feature Differences Between MariaDB 10.4 and MySQL 8.0
- Benchmark: MariaDB vs MySQL on Commodity Cloud Hardware (provided by MariaDB / Sept 15, 2020).
- While Maria DB is younger product and some concerns could be raised about its future, many hosting providers are switching to MariaDB, and there is now much more diverse community behind it as compare to mySQL.
We have established the foundation, let’s look at what is offered in a cloud. There are more options available, but we keep the list short for the purposes of this research.
- Cloud SQL – mySQL / SQL Server / PostgeSQL – hosted in GCP by Google team. HA* options. No multi-master support. No Maria DB support.
- SkySQL MariaDB – hosted in GCP/AWS by MariaDB managed services. HA and MMS/MMR.
- AWS Aurora – mySQL / PostgeSQL – hosted in AWS by AWS team. HA and MMS.
- Azure Database for MariaDB/MySQL – MariaDB/MySQL – hosted in AWS by AWS team. HA enabled. Scalable.
- Azure Cosmos – fully managed NoSQL DB – hosted in Azure. HA and MMS.
- Google Cloud Spanner – non-traditional RDMS as data access requires API abstraction. HA enabled.
MariaDB/MySQL based DBaaS Solutions Side-by-Side
Cloud SQL Features
Cloud SQL’s second generation was announced on Aug 1, 2016, for general availability and introduced some changes not applicable in the previous iteration.
- Fully managed with 99.95% availability
- Based on MySQL 8.0.18 / 5.7 – slightly behind from official but not terribly (8.0.23 is officially available). As of time of this research, CloudSQL no longer includes information about current build used in official release notes.
- Reliable – SSAE 16, ISO 27001, and PCI DSS compliant and supports HIPAA compliance
- Offered by Google, but is lesser offering than SkySQL or Cloud Spanner.
- No Memcache support
- Read replicas are supported, but not as clustered solution. Application must maintain direct connection to a read replica and perform load balancing internally.
- CloudSQL High Availability configuration
Note: The HA standby instance cannot be used for read queries. Read replicas cannot have HA mode enabled.
HA is achieved by disk replication between Zones to a standby instance. In order for standby instance be brought online, engine on standby would be brought online and would process any pending binlog transactions which may take up to 3 minutes.
HA in both Amazon RDS and GCP CloudSQL implemented via Shared Storage topology (DISK level synchronous replication across different zones). Actually, on the failure of the Primary server watchdog will switch to the instance which is in the standby state. During boot, MariaDB/MySQL service will analyze binary logs and roll back all uncommitted transactions. So, that 2-3 minutes are the time required for boot and applying binary log to data. In my opinion, 2-3 minutes is a very optimistic time. Probably GCP is using Shared Storage topology because it is much cheaper (MaxScale analog – MySQL Router is available under a dual license, probably this obstacle don’t let to integrate it as a solution for cloud providers).
HA in SkySQL implemented via MaxScale. So it is not correct to compare CloudSQL and SkySQL side-by-side. Also additional MaxScale layer means that if it fails no matter how many replicas are in place, DB will stay unavailable until MaxScale instance is recovered. SkySQL now offers HA option for MaxScale now to address the issue.
SkySQL Features and Benefits
SkySQL is MariaDB’s answer to CloudSQL and builds on top of extensive enterprise knowledge the MariaDB team acquired over the years.
SkySQL is managed solution which expands MariaDB, MaxScale, and Galera Cluster offerings to the cloud and includes all features from individual products. It is available in GCP and AWS allowing for easy hosting migration.
The solution is managed by the SkySQL team – the professional services arm of MariaDB Corporation.
First announced on March 31, 2020. GA announced on May 12.
- MariaDB Enterprise Server – 10.4.14-8
- MaxScale – 2.5.3
- ColumnStore – 5.4.1
- Xpand – used in Distributed SQL configuration. Would resemble CloudSpanner in behaviour without API overhead
- Galera 4.0+ – offered as an option but can be overlooked by proper tuning of MaxScale cluster or via Distributed SQL configuration.
- Fully managed with 99.95% – 99.995% availability
- Cluster Nodes split across zones within the same region with DR nodes in different region if necessary.
- TLS 1.2/1.3 required for all connections
Please refer to the attached October 2020 SkySQL Announcement for additional details.
SkySQL usage is reviewed below in Transactional usage and offered in 4 configurations:
- Transactional Standalone – 1 node
- HA – primary and 2 replicas – 3 nodes
- Galera – 3 nodes
- Always up to date: SkySQL is kept current and updated to stay in line with current products available
- Cloud Distributed: SkySQL includes a fully distributed cloud data warehouse that now provides massively parallel processing (MPP) for scalability and high availability on large datasets.
- End-to-end security: All SkySQL databases are secure by default, built from the ground up to provide the ultimate security in the cloud. SkySQL now also enforces secure SSL/TLS connections for any database access, avoiding exposure of data due to insecure defaults or configuration choices. AD integration is available on request.
- Disaster Recovery: SkySQL combines nightly backups, persistent disks, point-in-time rollback and cross-region replication to minimize the recovery point and recovery time objectives (RPO/RTO) of mission-critical, production databases.
- Expanded monitoring: SkySQL monitoring shows the status and all vital metrics for database instances and is highly customizable. Health state of the nodes within defined scalability strategies.
- Radically reduced complexity for application development: SkySQL provides a single connection point for applications rather than exposing individual database instances, primaries or high availability replicas. SkySQL manages read/write-splitting, seamless fail-over and application session migration. The result is fault tolerance and efficient use of resources that is completely transparent to developers and end users.
- SkySQL is provided and supported by MariaDB team as a cloud solution
- Based on MariaDB Platform X5
- Runs on Kubernetes Containers and ServiceNow Workflow Engine
- Implements a reference cloud security architecture – ISO/IEC 27001:2013 certification achieved Feb 1, 2021
- No SOC 2 certification at this time, but artifacts could be generated per client’s request. SOC2 is targeted for EOY to align with ISO 27001 renewal
- Support for AD integration is included
- MariaDB Xpand for distributed SQL (youtube)
- Optionally can include DBA support as well – premium service available for engagement
SkySQL for Distributed SQL
Alternative to SkySQL cluster and Galera is SkySQL for Distributed SQL.
- Based on Xpand technology.
- Requires min 3 nodes.
- Single node failure without loss of data. Auto healing without intervention.
- Elastic scale-up/scale-down infrastructure topology.
SkySQL SLA Considerations
Official Service Downtime metrics and Service Credits
- 99.95% service availability in multi node configuration – ex: 22 min max over 30 days
- if exceeded 99.95% Service Credits will be applied per calculated monthly uptime percentage prorate.
- Excludes impact of GCP and GKE engine.
SkySQL Support Options
As presented in the Pricing section several tiers are available:
- Normal response time 4-8 hours for Standard tier.
- SkyDBA / Premium – in minutes / self managed / self driven – additional cost.
- DBA backup recovery – normal – severity 1 – 8 min avg, 30 min SLA – immediate action upon reporting / detection.
- GCP Kybernetes engine takes on recovery of the lost node and will rejoin the cluster – does not require ticket.
- Patches are applied every 3 months with 2 weeks advanced notice with ability to delay.
- Patches are first applied to replicas. Time can be agreed on.
- Testing instance can be spin off for advance evaluation.
- 7 days daily backup option is included. Backup retention policy can be extended for additional pass-through cost of GCS storage. Nearline storage class is used.
- AD authentication for access can be implemented during on-boarding.
SkySQL vs CloudSQL
SkySQL vs CloudSQL comparison provided by (by SkySQL team)
Light recap – read above for a more detailed comparison.
- CloudSQL is mySQL based vs MariaDB in SkySQL
- CloudSQL caps write throughput forcing to upgrade instance type.
- CloudSQL only allows Read replicas
- In addition to Primary/replica supported by CloudSQL, SkySQL offers Multi-master and Distributed SQL configuration based on MariaDB Enterprise Server nodes for transactional access.
- CloudSQL HA option is based on Hot/Spare two node configuration. While “Hot” node is alive, all traffic routed to it. Spare is used for auto healing when main node fails. Per documentation failover can take up 2 minutes to complete transition. May require transaction replay if caught in-flight.
SkySQL failover takes less than a second. No transaction replay if caught in-flight required.
- CloudSQL does not offer automatic load balancing.
Expect things to be slightly slower
While analyzing stats from your staging setup before making a final decision, do not overlook the fact that you are looking at Apples and Oranges. While the DBaaS engine may be a drop-in for your project and not many physical changes are required to keep it running, one thing is important – you are changing IT architecture. Even with GCP dedicated interconnect, you are still looking at virtualized environments and new considerations need to be made for network pipes length, edge networks, regions, and zones. Don’t be dissuaded, by making a few important, and maybe postponed, changes to your current topology and data patterns, you should be able to regain “lost” speed.
Said that after careful review and tuning comparable speeds were achieved. Adding to its ability to distribute processing load between nodes, the final result should be satisfactory.
This article makes attempt to give the reader enough information and references to make the best choice suitable for your application configuration while satisfying your CI/CD requirements and I hope you find it useful. Google Cloud, Azure, and AWS offer many different database options and it is important to choose a DBaaS solution that makes your project a success. Good luck.