Avrorouter Tutorial

This tutorial is a short introduction to the Avrorouter, how to set it up and how it interacts with the binlogrouter.

The avrorouter can also be deployed directly on the master server which removes the need to use the binlogrouter. This does require a lot more disk space on the master server as both the binlogs and the Avro format files are stored there.

The first part configures the services and sets them up for the binary log to Avro file conversion. The second part of this tutorial uses the client listener interface for the avrorouter and shows how to communicate with the the service over the network.

Binlog-Avro Translator

Configuration

Preparing the master server

The master server where we will be replicating from needs to have binary logging enabled, the binary log format set to row based replication and the binary log row image needs to contain all the changed. These can be enabled by adding the two following lines to the my.cnf file of the master.

binlog_format=row
binlog_row_image=full

You can find out more about replication formats from the MariaDB Knowledge Base

Configuring MaxScale

We start by adding two new services into the configuration file. The first service is the binlogrouter service which will read the binary logs from the master server. The second service will read the binlogs as they are streamed from the master and convert them into Avro format files.

# The Replication Proxy service
[replication-service]
type=service
router=binlogrouter
router_options=server-id=4000,
               master-id=3000,
               binlogdir=/var/lib/maxscale/binlog/,
               mariadb10-compatibility=1,
user=maxuser
passwd=maxpwd

# The Avro conversion service
[avro-service]
type=service
router=avrorouter
source=replication-service
router_options=avrodir=/var/lib/maxscale/avro/,
               filestem=binlog

# The listener for the replication-service
[replication-listener]
type=listener
service=replication-router
protocol=MySQLClient
port=4000

# The client listener for the avro-service
[avro-listener]
type=listener
service=avro-service
protocol=CDC
port=4001

You can see that the source parameter in the avro-service points to the replication-service we defined before. This service will be the data source for the avrorouter. The filestem is the prefix in the binlog files and the additional avrodir router_option is where the converted Avro files are stored. For more information on the avrorouter options, read the Avrorouter Documentation.

After the services were defined, we added the listeners for the replication-service and the avro-service. The CDC protocol is a new protocol added with the avrorouter and, at the time of writing, it is the only supported protocol for the avrorouter.

Preparing the data in the master server

Before starting the MaxScale process, we need to make sure that the binary logs of the master server contain the DDL statements that define the table layouts. What this means is that the CREATE TABLE statements need to be in the binary logs before the conversion process is started.

If the binary logs contain data modification events for tables that aren't created in the binary logs, the Avro schema of the table needs to be manually created. There are two ways to do this:

All Avro file schemas follow the same general idea. They are in JSON and follow the following format:

{
    "Namespace": "MaxScaleChangeDataSchema.avro",
    "Type": "record",
    "Name": "ChangeRecord",
    "Fields":
    [
        {
            "Name": "name",
            "Type": "string"
        },
        {
            "Name":"address",
            "Type":"string"
        },
        {
            "Name":"age",
            "Type":"int"
        }
    ]
}

The avrorouter uses the schema file to identify the columns, their names and what type they are. The Name fiels contains the name of the column and the Type contains the Avro type. Read the Avro specification for details on the layout of the schema files.

All Avro schema files for tables that are not created in the binary logs need to be in the location pointed by the avrodir router_option and must use the following naming: <database>.<table>.<schema_version>.avsc. For example, the schema file name of the test.t1 table would be test.t1.0000001.avsc.

Starting MariaDB MaxScale

The next step is to start MariaDB MaxScale and set up the binlogrouter. We do that by connecting to the MySQL listener of the replication_router service and executing a few commands.

CHANGE MASTER TO MASTER_HOST='172.18.0.1',
       MASTER_PORT=3000,
       MASTER_LOG_FILE='binlog.000001',
       MASTER_LOG_POS=4,
       MASTER_USER='maxuser',
       MASTER_PASSWORD='maxpwd';

START SLAVE;

This will start the replication of binary logs from the master server at 172.18.0.1:3000. For more details about the details of the commands, refer to the Binlogrouter documentation.

After the binary log streaming has started, the avrorouter will automatically start converting the binlogs into Avro files.

For the purpose of this tutorial, create a simple test table using the following statement and populated it with some data.

CREATE TABLE test.t1 (id INT);
INSERT INTO test.t1 VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);

This table will be replicated through MaxScale and it will be converted into an Avro file, which you can inspect by using the maxavrocheck utility program.

[markusjm@localhost avrodata]$ ../bin/maxavrocheck test.t1.000001.avro
File sync marker: caaed7778bbe58e701eec1f96d7719a
/home/markusjm/build/avrodata/test.t1.000001.avro: 1 blocks, 1 records and 12 bytes