Encoder-related Properties

The following table lists the properties related to encoders. They specify how data is to be formatted, what meta-information about a record should be included, etc. These are specified in the bdglue.properties file.

Property Required Type Default Notes
bdglue.encoder.threads No Integer 2 The number of encoder threads to run in parallel.
bdglue.encoder.class Yes String bdglue2.encoder.JsonEncoder The fully qualified class name (FQCN) of the class that will be called to encode the data. These Encoders, and any that are custom built, implement the interface bdglue2.encoder.BDGlueEncoder. Built-in options are:
        * bdglue2.encoder.AvroEncoder : encode in an Avro formatted byte array
        * bdglue2.encoder.AvroGenericRecordEncoder : encode an instance of an Avro GenericRecord
        * bdglue2.encoder.DelimtedTextEncoder : encode in delimited text format
        * bdglue2.encoder.JsonEncoder : encode in JSON format
        * bdglue2.encoder.NullEncoder : does not encode the data. This is used when the publisher will not pass along the data as encoded, and instead will apply the data to the target “column-by-column”. Example targets that approach things this way include HBase, Oracle NoSQL Table API, Cassandra, and others.
bdglue.encoder.delimiter No Integer 1 Default is ^A (001). Enter the numeric representation of the desired character (i.e. a semicolon is 073 in octal, 59 in decimal).
bdglue.encoder.tx-optype No Boolean true Include the transaction operation type in a column in the encoded data. Note that this configuration must match the corresponding schemadef.tx-optype property in the schemadef.properties file.
bdglue.encoder.tx-optype-name No String txoptype The name of the column to populate the operation type value in. Note that this configuration must match the corresponding schemadef.tx-optype-name property in the schemadef.properties file.
bdglue.encoder.tx-timestamp No Boolean true Include the transaction operation type in a column in the encoded data. Note that this configuration must match the corresponding schemadef.tx-timestamp property in the schemadef.properties file.
bdglue.encoder.tx-timestamp-name No String txtimestamp The name of the column to populate the transaction timestamp value in. Note that this configuration must match the corresponding schemadef.tx-timstamp-name property in the schemadef.properties file.
bdglue.encoder.tx-position No Boolean true Include information pertaining to the position of this operation in the transaction flow. This is used to allow sorting of operations when they are occurring more frequently than the granularity of the tx-timestamp. Note that this configuration must match the corresponding schemadef.tx-position property in the schemadef.properties file.
bdglue.encoder.tx-position-name No String txposition The name of the column to populate the transaction position value in. Note that this configuration must match the corresponding schemadef.tx-position-name property in the schemadef.properties file.
bdglue.encoder.user-token No Boolean true Populate a field that will contain a comma delimited list of any user tokens that accompany the record in the form of “token1=value, token2=value, …”. This property must be the same as the corresponding schemadef.user-token property found for schemadef.
bdglue.encoder.user-token-name No String usertokens The name of the field that will contain the list of user-defined tokens. This property must be the same as the corresponding schemadef.user-token-name property found for schemadef.
bdglue.encoder.tablename No Boolean false Populate a field with the name of the source table. This will be the “long” table name in schema.table format.
bdglue.encoder.tablename-col No String tablename The name of the field to populate with the name of the source table.
bdglue.encoder.txid No Boolean false Populate a field with a transaction identifier.
bdglue.encoder.txid-col No String txid The name of the field to populate with the transaction identifier.
bdglue.encoder.replace-newline No Boolean false Replace newline characters found in string fields with another character. This is needed because newlines can cause problems in some downstream targets.
bdglue.encoder.newline-char No String <space> The character to substitute for newlines in string fields. The default is “ “ (a space). Override with another character if needed.
bdglue.encoder.json.text-only No Boolean true Whether or not to represent all column values as quoted text strings. When ‘true’, a numeric field would be represented as “ID”:”789”. When false, that same field would be represented as “ID”:789, (no quotes around the value), which allows the downstream JSON parser to know to parse this as a number.
bdglue.encoder.include-befores No Boolean false Include the before images representation of all columns when encoding an operation. This option is only supported for JSON encoding at this time and will be ignored by other encoders.
bdglue.event.header-optype No Boolean true Include the operation type in the Flume event header
bdglue.event.header-timestamp No Boolean true Include the transaction timestamp in the Flume event header.
bdglue.event.header-rowkey No Boolean true Boolean as to whether or not to include a value for the row’s key as a concatenation of the key columns in the event header information. HBase and NoSQL KV API need this. It is also needed if the publisher hash is based on key rather than table name.
bdglue.event.header-longname No Boolean true Boolean as to whether or not to include the “long” table name in the header. The long name is normally in the form of “schema.tablename”. FALSE will cause the “short” name (table name only) to be included. Most prefer the long name. HBase and NoSQL prefer the short name.
bdglue.event.header-columnfamily No Boolean true Boolean as to whether or not to include a “columnFamily” value in the header. This is needed for Hbase.
bdglue.event.header-avropath No Boolean false Boolean as to whether or not to include the path to the Avro schema file in the header. This is needed for Avro encoding where Avro-formatted files are created in HDFS, including those that will be leveraged by Hive.
bdglue.event.avro-hdfs-schema-path No String hdfs:///user/flume/gg-data/avro-schema/ The URI in HDFS where Avro schemas can be found. This information is passed along as the header-avropath and is required by Flume when writing Avro-formatted files to HDFS.
bdglue.event.generate-avro-schema No Boolean false Boolean on whether or not to generate the avro schema on the fly. This is really intended for testing and should likely always be false. It might be useful at some point in the future to use to support Avro schema evolution. Note that current built-in schema generation capabilities are not on par with those in schemadef.
bdglue.event.avro-namespace No String default The namespace to use in avro schemas if the actual table schema name is not present. The table schema name will override.
bdglue.event.avro-schema-path No String ./gghadoop/avro The path on local disk where we can find the avro schemas and/or where they will be written if we were to generate them on the fly.