Publisher-related Properties

The following table lists the publisher-related properties, which control how data is processed as it is written to the target. Set these properties in the bdglue.properties file.

Property | Required | Type | Default | Notes
bdglue.publisher.class | Yes | String | bdglue2.publisher.console.ConsolePublisher | The fully qualified class name (FQCN) of the class that will be called to publish the data. The built-in publishers, and any that are custom built, implement the interface bdglue2.publisher.BDGluePublisher. Built-in options are:
        * bdglue2.publisher.console.ConsolePublisher : writes the encoded data to the console. Useful for smoke testing upstream configurations before worrying about actually delivering data to a target. JSON encoding is perhaps most useful for this.
        * bdglue2.publisher.flume.FlumePublisher : delivers encoded data to Flume.
        * bdglue2.publisher.hbase.HBasePublisher : delivers data to HBase. The NullEncoder should be used for this publisher.
        * bdglue2.publisher.nosql.NoSQLPublisher : delivers to Oracle NoSQL. Use the AvroEncoder for the KV API, and the NullEncoder for the Table API.
        * bdglue2.publisher.kafka.KafkaPublisher : delivers to Kafka. The AvroEncoder and JsonEncoder are perhaps most useful for this publisher. Note: this publisher uses an older Kafka API and is included for compatibility.
        * bdglue2.publisher.kafka.KafkaRegistryPublisher : delivers to Kafka using the newer Kafka API. This publisher is also compatible with the Confluent "schema registry", although interfacing with the registry is not strictly required to use this publisher.
        * bdglue2.publisher.cassandra.CassandraPublisher : delivers data to Cassandra. The NullEncoder should be used for this publisher.
        * bdglue2.publisher.bigquery.BigQueryPublisher : delivers data to Google's BigQuery. The NullEncoder should be used for this publisher.
bdglue.publisher.threads | No | Integer | 2 | The number of publishers to run in parallel.
bdglue.publisher.hash | No | String | rowkey | Selects the publisher thread that an encoded event is passed to, based on a hash of either the table name ("table") or the row key ("rowkey"). This ensures that changes made to the same row are always handled by the same publisher, which avoids race conditions.
bdglue.nosql.host | No | String | localhost | The hostname to connect to for NoSQL.
bdglue.nosql.port | No | String | 5000 | The port number where the NoSQL KVStore is listening.
bdglue.nosql.kvstore | No | String | kvstore | The name of the NoSQL KVStore to connect to.
bdglue.nosql.durability | No | String | WRITE_NO_SYNC | The NoSQL durability model for these transactions. Options are: SYNC, WRITE_NO_SYNC, NO_SYNC.
bdglue.nosql.api | No | String | kv_api | Specifies whether to use the kv_api or the table_api when writing to Oracle NoSQL.
bdglue.kafka.topic | No | String | goldengate | The name of the Kafka topic that GoldenGate will publish to.
bdglue.kafka.batchSize | No | Integer | 100 | The number of Kafka events to queue before publishing. The default value should be reasonable for most scenarios, but should be decreased for low-volume situations and perhaps increased in extremely high-volume situations. This property applies only to the KafkaPublisher, because that publisher handles batching directly. For the KafkaRegistryPublisher, use bdglue.kafka.producer.batch.size instead, because batching is handled by the actual Kafka producer logic in that case.
bdglue.kafka.flushFreq | No | Integer | 500 | The number of milliseconds to allow events to queue before forcing them to be written to Kafka when batchSize has not been reached.
bdglue.kafka.serializer.class | No | String | kafka.serializer.DefaultEncoder | The serializer to use when writing the event to Kafka. The DefaultEncoder passes the encoded data it receives to Kafka verbatim, byte for byte. The default value is unlikely to need overriding.
bdglue.kafka.key.serializer.class | No | String | kafka.serializer.StringEncoder | The serializer to use when encoding the topic "key". The default value is unlikely to need overriding.
bdglue.kafka.metadata.broker.list | Yes | String | localhost:9092 | A comma-separated list of host:port pairs of Kafka brokers that may be published to. Note that this is for the Kafka broker, not for Zookeeper.
bdglue.kafka.metadata.helper.class | No | String | bdglue2.publisher.kafka.KafkaMessageDefaultMeta | A simple class that implements the bdglue2.publisher.kafka.KafkaMessageHelper interface. Its purpose is to allow customization of message "topic" and message "key" behavior. Current built-in options are:
        * bdglue2.publisher.kafka.KafkaMessageDefaultMeta : writes all messages to a single topic specified in the properties file, and the key is the table name.
        * bdglue2.publisher.kafka.KafkaMessageTableKey : publishes each table to a separate topic, where the topic name is the table name, and the message key is a concatenated version of the key columns from the table in this format: /key1/key2/…
bdglue.kafka.request.required.acks | No | Integer | 1 | Options are: 0 (write and assume delivery without waiting for a response; potentially unsafe), 1 (write and wait for the event to be accepted by at least one broker before continuing), -1 (write and wait for the event to be accepted by all brokers before continuing).
bdglue.cassandra.node | No | String | localhost | The Cassandra node to connect to.
bdglue.cassandra.batch-size | No | Integer | 5 | The number of operations to group together with each call to Cassandra.
bdglue.cassandra.flush-frequency | No | Integer | 500 | Force writing of any queued operations that have not been flushed due to batch-size after this many milliseconds.
bdglue.cassandra.insert-only | No | Boolean | false | Convert update and delete operations to inserts. Note that the default key generated by SchemaDef may need to be changed to include operation type and timestamp if this is set to "true".
bdglue.flume.host | Yes | String | localhost | The name of the target host to connect to.
bdglue.flume.port | Yes | Integer | 5000 | The port number on the host where the target is listening.
bdglue.flume.rpc.retries | No | Integer | 5 | The number of times to retry a connection after encountering an issue before aborting.
bdglue.flume.rpc.retry-delay | No | Integer | 10 | The number of seconds to delay after each attempt to connect before trying again.
bdglue.flume.rpc.type | No | String | avro-rpc | Currently only pertinent for Flume. Defines the type of event RPC protocol used for communication. Options are avro-rpc and thrift-rpc. Avro is most common. Do not confuse Avro RPC communication with Avro encoding of data: they share a name but are entirely different things, and one does not require the other.
bdglue.bigquery.dataset | Yes | String | default_dataset | The BigQuery dataset name to connect to.
bdglue.bigquery.batch-size | No | Integer | 5 | The size of the batch to commit. The default value is for testing. BigQuery wants a much larger number for production loads. Try 500 to start.
bdglue.bigquery.flush-frequency | No | Integer | 500 | The number of milliseconds to wait before forcing a write even if the specified batch size has not been reached.
bdglue.bigquery.insert-only | No | Boolean | true | True to convert deletes and updates into inserts. Assumes that inclusion of operation type and timestamp has been specified in the properties. Note that the data streaming API used by BDGlue does not currently support updates or deletes, so at present this value should always be set to "true". Reconciliation of these operations should be done periodically downstream via an ETL job.
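
As a rough illustration of how these properties fit together, the fragment below sketches what the publisher portion of a bdglue.properties file might look like for a Kafka target. The property names and most values are taken from the table above. The broker host names are placeholders, and the key=value layout assumes the standard Java properties file syntax. Encoder settings, which are configured separately, are omitted.

```properties
# Hypothetical publisher settings for a Kafka target (sketch only).
# broker1/broker2.example.com are placeholder host names, not real defaults.
bdglue.publisher.class=bdglue2.publisher.kafka.KafkaPublisher
bdglue.publisher.threads=2
bdglue.publisher.hash=rowkey

# Kafka brokers (not Zookeeper), as host:port pairs.
bdglue.kafka.metadata.broker.list=broker1.example.com:9092,broker2.example.com:9092
bdglue.kafka.topic=goldengate
bdglue.kafka.metadata.helper.class=bdglue2.publisher.kafka.KafkaMessageDefaultMeta

# Batching handled by the KafkaPublisher itself.
bdglue.kafka.batchSize=100
bdglue.kafka.flushFreq=500

# Wait for at least one broker to accept each event.
bdglue.kafka.request.required.acks=1
```

Only bdglue.publisher.class and bdglue.kafka.metadata.broker.list are marked as required in this sketch's scenario; the remaining lines restate defaults and could be left out.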