Unboxing Stelo V6.1: MERGE Support
Stelo Data Replicator is known for robust, real-time data replication, and V6.1 is no exception. With additional features, V6.1 is poised to support evolving data management strategies long into the future. With over 30 years of experience, Stelo has developed best practices for moving information that guide us in every version update.
In the final installment of our three-part blog series, we’ll break down one more new feature in V6.1. For an overview of Stelo’s high-performance support for non-SQL destinations, read part one. For more detail on PowerShell scripting, support for Linux, and container-based deployment, read part two.
Here, we’ll cover MERGE SQL support. With MERGE SQL functionality, compressed change data can be delivered using micro-batches, which reduce latency and exploit Relational Database Management System (RDBMS) optimizations like column-store indexing.
Introduced in the 2003 SQL standard, MERGE is just starting to make its way into modern data management deployments. How does MERGE work? And what kinds of databases does MERGE best serve?
MERGE offloads logic from the client to the server.
MERGE SQL is a compound data manipulation language (DML) statement. It combines insert, update, and delete actions into a single statement, choosing among them based on predicates (conditional expressions evaluated against the current data).
Traditionally, deletions, updates, and insertions are issued individually, but with MERGE, the action taken depends on the state of the database at that moment. MERGE allows for selective decision-making. For example, if you try to insert a row and a row with the same key already exists, MERGE turns the statement into an update rather than an insertion. Ordinarily, trying to update a row that doesn't exist is an error; MERGE is intelligent enough to respond appropriately to the conditions it finds.
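The insert-or-update behavior described above can be sketched in standard SQL. The table and column names here (target_customers, id, name, state) are hypothetical, not from Stelo's product, and exact syntax varies slightly by RDBMS:

```sql
-- Hypothetical example: apply one change row to a target table.
-- If a row with the same key exists, update it; otherwise insert it.
MERGE INTO target_customers AS t
USING (VALUES (42, 'Ada Lovelace', 'WI')) AS s (id, name, state)
    ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET name = s.name, state = s.state
WHEN NOT MATCHED THEN
    INSERT (id, name, state) VALUES (s.id, s.name, s.state);
```

The ON predicate is evaluated by the server, so the client never has to ask whether the row exists before deciding which statement to send.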
Zooming out, when faced with requests, either the server or the client can be tasked with figuring out what to do. If the client is making the decisions, it may have to communicate with the database more than usual, and more round trips between client and server create more latency. The data may even change between the client's query and its follow-up request (a race condition). Since MERGE offloads logic to the server, programming is simpler and latency drops dramatically.
MERGE encourages the use of micro-batches.
By shifting decision-making to the server, MERGE incentivizes transferring a mass of data, called a micro-batch, rather than transferring data row-by-row. Micro-batches are usually 10 to 100 times faster to transfer than individual requests because the server can perform logic operations without having to communicate with the client about each row.
There’s an art to deciding how long it takes to accumulate enough changes to form an efficient micro-batch, and it’s largely dependent on application-based data replication requirements. At Stelo, most of our timers are set to fire every thirty seconds to three minutes, but we have customers who prefer to fire their micro-batches every hour or even once or twice a day. As the change data capture apply component, we construct micro-batches and assess whether there’s enough data to send based on the customer’s needs. It's worth noting that extremely low-latency applications are a better fit for individual DML statements, because sending micro-batches with a one-second delay doesn’t take advantage of the efficiency benefits of MERGE.
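A common way to realize the micro-batch pattern is to accumulate captured change rows in a staging table, then apply them to the target in a single server-side MERGE. This sketch is illustrative, not Stelo's internal implementation; the table names (orders, order_changes) and the op column recording the change type are hypothetical:

```sql
-- Hypothetical micro-batch apply: staged change rows are merged
-- into the target in one server-side pass instead of row-by-row.
-- The op column ('I', 'U', or 'D') records the captured change type.
MERGE INTO orders AS t
USING order_changes AS s
    ON t.order_id = s.order_id
WHEN MATCHED AND s.op = 'D' THEN
    DELETE
WHEN MATCHED THEN
    UPDATE SET status = s.status, amount = s.amount
WHEN NOT MATCHED AND s.op <> 'D' THEN
    INSERT (order_id, status, amount)
    VALUES (s.order_id, s.status, s.amount);

-- Clear the staging table once the batch has been applied.
TRUNCATE TABLE order_changes;
```

One statement replaces what could be thousands of individual inserts, updates, and deletes, each of which would otherwise incur its own client-server round trip.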
Since databases like Snowflake and PostgreSQL-based analytic warehouses aren’t optimized for high-volume transactional workloads, they’re not meant to handle a large stream of high-speed individual inserts, updates, and deletions, but they can efficiently merge batches into existing tables. MERGE was made for these kinds of databases, but it’s considered an add-on for query-based, large-scale data repositories like SQL Server and Oracle. That said, these databases can also benefit from Stelo micro-batch compression innovations in situations where the amount of data to be transferred and applied is greatly reduced. In that case, the result is reduced host resource utilization and improved latency, especially when transiting wide area networks (WAN).
MERGE plays better with column-store indexing.
Column-store indexing is a complex way for databases to organize information based on a simple concept: data should only be stored once. For example, if a database stores a field for state of residence on every record, with column-store indexing each state might be assigned a number between 1 and 50. If you wanted to search for all records with state number 30 (i.e., Wisconsin), it’s far more efficient to compare numbers than state names because the values are smaller. In this case, a separate reference file dictates which number corresponds to which state.
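This dictionary-encoding idea can be made concrete with a small relational sketch. Column-store engines perform the encoding internally and automatically; the explicit lookup table here (state_lookup, residents — both hypothetical names) just makes the mechanism visible:

```sql
-- Hypothetical illustration of dictionary encoding: each row stores
-- a small code, and the full string is stored once in a lookup table.
CREATE TABLE state_lookup (
    state_code SMALLINT PRIMARY KEY,
    state_name VARCHAR(20) NOT NULL
);

CREATE TABLE residents (
    resident_id INT PRIMARY KEY,
    name        VARCHAR(100),
    state_code  SMALLINT REFERENCES state_lookup (state_code)
);

-- Filtering compares small integers; the join to the lookup table
-- happens only when the full state name is actually needed.
SELECT r.resident_id, r.name, s.state_name
FROM residents r
JOIN state_lookup s ON s.state_code = r.state_code
WHERE r.state_code = 30;
```

Because every resident row carries a two-byte code instead of a variable-length string, scans touch far less data, which is exactly the saving a column store realizes at scale.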
In the absence of row-based indexing (the traditional way of implementing indexes), databases with column-store indexing don’t handle large volumes of individual updates well. Column-store indexing is advantageous for databases geared toward data retrieval, but it poses problems for applications that involve a lot of updating, such as change data capture. Combining the strengths of MERGE and individual DML operations allows the database to handle action-related optimizations on its own. That said, MERGE is better for applications focused on net results rather than on reconstructing the exact sequence in which operations occurred.