SchemaDef Properties

The following table lists the properties that can be specified in the schemadef.properties file.

| Property | Required | Type | Default | Notes |
|---|---|---|---|---|
| schemadef.jdbc.driver | Yes | String | com.mysql.jdbc.Driver | The fully qualified class name of the JDBC driver. |
| schemadef.jdbc.url | Yes | String | jdbc:mysql://localhost/bdglue | The JDBC connection URL. |
| schemadef.jdbc.username | Yes | String | root | The database user to connect as. |
| schemadef.jdbc.password | Yes | String | prompt | The database user’s password. If this property is set to the value “prompt”, SchemaDef will prompt the user to enter the password from the command line. |
| schemadef.jdbc.tables | Yes | String | N/A | A whitespace-delimited list of schema.table pairs to generate schema/DDL information for. More than one table may be specified per line, and a line may be continued by placing a backslash (‘\’) as the last character of the current line in the file. |
| schemadef.output.format | No | String | avro | The type of metadata / DDL to generate. Options are: avro, hive_avro, and nosql. |
| schemadef.output.path | No | String | ./output | The directory where the generated files are stored. |
| schemadef.numeric-encoding | No | String | double | How to encode non-integer numeric fields (decimal and numeric types) in the schema. Options are: string, double, float. |
| schemadef.set-defaults | No | Boolean | true | Whether or not to set default values in the generated Avro schema. |
| schemadef.tx-optype | No | Boolean | true | Include the transaction operation type in a column in the encoded data. Note that this configuration must match the corresponding bdglue.encoder.tx-optype property in the bdglue.properties file. |
| schemadef.tx-optype-name | No | String | txoptype | The name of the column to populate with the operation type value. Note that this configuration must match the corresponding bdglue.encoder.tx-optype-name property in the bdglue.properties file. |
| schemadef.tx-timestamp | No | Boolean | true | Include the transaction timestamp in a column in the encoded data. Note that this configuration must match the corresponding bdglue.encoder.tx-timestamp property in the bdglue.properties file. |
| schemadef.tx-timestamp-name | No | String | txtimestamp | The name of the column to populate with the transaction timestamp value. Note that this configuration must match the corresponding bdglue.encoder.tx-timestamp-name property in the bdglue.properties file. |
| schemadef.tx-position | No | Boolean | true | Include details of the operation’s position in the replication flow in a column in the encoded data. This allows sorting when transactions occur more rapidly than the granularity of the transaction timestamp can support. Note that this configuration must match the corresponding bdglue.encoder.tx-position property in the bdglue.properties file. |
| schemadef.tx-position-name | No | String | txposition | The name of the column to populate with the transaction position information. Note that this configuration must match the corresponding bdglue.encoder.tx-position-name property in the bdglue.properties file. |
| schemadef.user-token | No | Boolean | true | Populate a field that will contain a comma-delimited list of any user tokens that accompany the record, in the form “token1=value, token2=value, …”. Note that this configuration must match the corresponding bdglue.encoder.user-token property in the bdglue.properties file. |
| schemadef.user-token-name | No | String | usertokens | The name of the field that will contain the list of user-defined tokens. Note that this configuration must match the corresponding bdglue.encoder.user-token-name property in the bdglue.properties file. |
| schemadef.tablename | No | Boolean | false | Populate a field that will contain the long version of the table name (schema.table format). |
| schemadef.tablename-col | No | String | tablename | The name of the field that will contain the table name. |
| schemadef.txid | No | Boolean | false | Populate a field that will contain a transaction identifier. |
| schemadef.txid-col | No | String | txid | The name of the field that will contain the transaction identifier. |
| schemadef.avro-url | No | String | /path/to/avro/schema | Tells the Hive Avro SerDe where to find the Avro schema for this table. Required for hive_avro schema generation. |
| schemadef.data-location | No | String | /path/to/avro/data | Tells the Hive Avro SerDe where to find the Avro-encoded data files for this table. Required for hive_avro schema generation. |
| schemadef.cassandra.replication-strategy | No | String | { 'class' : 'SimpleStrategy', 'replication_factor' : 1 } | The replication strategy for the table. Note that this string is passed verbatim into SchemaDef and into the corresponding CQL that is generated, so it must be syntactically correct. |
| schemadef.replace.invalid_char | No | String | _ (underscore) | Replace non-alphanumeric ‘special’ characters that are supported in table and column names in some databases with the specified character or characters. This is needed because most big data targets support a much more limited set of characters. This value must be the same as the value specified for the equivalent schemadef.replace.invalid_char property in bdglue.properties. |
| schemadef.replace.invalid_first_char | No | String | x | Prepend this string to table and column names that begin with anything other than an alpha character. This is needed because of naming limitations in big data targets. Set to a null value to disable this behavior. This value must be the same as the value specified for the equivalent schemadef.replace.invalid_first_char property in bdglue.properties. |
| schemadef.replace.regex | No | String | [^a-zA-Z0-9_\\.] | A regular expression that contains the characters that are supported in the target. (Note: the ^ is required, just as in the default.) All characters not in this list will be replaced by the character or characters specified in schemadef.replace.invalid_char. This value must be the same as the value specified for the equivalent schemadef.replace.regex property in bdglue.properties. |
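
As an illustration of how these properties fit together, a minimal schemadef.properties might look like the sketch below. It assumes standard Java properties syntax (key = value, with a trailing backslash for line continuation). Most values are simply the defaults from the table; the table names under schemadef.jdbc.tables are hypothetical placeholders and would need to be replaced with real schema.table pairs.

```properties
# schemadef.properties (illustrative sketch, not a shipped sample file)

# JDBC connection settings (all required)
schemadef.jdbc.driver = com.mysql.jdbc.Driver
schemadef.jdbc.url = jdbc:mysql://localhost/bdglue
schemadef.jdbc.username = root
# The value "prompt" makes SchemaDef ask for the password on the command line
schemadef.jdbc.password = prompt

# Whitespace-delimited schema.table pairs; a trailing backslash continues the line.
# These table names are hypothetical.
schemadef.jdbc.tables = bdglue.customers bdglue.orders \
                        bdglue.order_items

# Output settings
schemadef.output.format = avro
schemadef.output.path = ./output
schemadef.numeric-encoding = double

# Metadata columns; each must match the corresponding
# bdglue.encoder.* property in bdglue.properties
schemadef.tx-optype = true
schemadef.tx-optype-name = txoptype
schemadef.tx-timestamp = true
schemadef.tx-timestamp-name = txtimestamp

# Table and column name clean-up (defaults shown)
schemadef.replace.invalid_char = _
schemadef.replace.invalid_first_char = x
schemadef.replace.regex = [^a-zA-Z0-9_\\.]
```

Under that assumption, the doubled backslash in the regex is read as a single backslash when the file is loaded, so the expression actually applied is [^a-zA-Z0-9_\.].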