Schema File Configuration
This document explains how to use the schema URL configuration for data types that do not carry their own schema.
Overview
Writing data to the sink in Avro or Parquet format requires a schema.
If the incoming data is one of the following types, there is no need to specify the schema explicitly, because the connector already gets the schema from the Schema Registry or from the message itself:
- Avro
- Protobuf
- Json with Schema Registry
- Json with embedded schema
For the formats below, however, the connector has no information about the schema:
- Json without schema
- Json-String
To be able to write data in either Avro or Parquet format we need to explicitly specify the schema to the connector.
There are two ways you can specify the schema:
- Pass path to the Avro schema file
- Pass the URL to the Avro schema file
Writing schema file
There are a few rules to follow before using this feature:
- For the field type, null should precede any other data type. For example:
  {"name": "uid", "type": ["null", "string"], "default": null}
- The default should always be null for all optional fields. For example:
  {"name": "isSatisfied", "type": ["null", "boolean"], "default": null}
Here is a sample Avro schema file:
{"namespace": "topics.avro",
 "type": "record",
 "name": "sample_schema",
 "fields": [
   {"name": "timestamp", "type": ["null", "long"], "default": null},
   {"name": "uid", "type": ["null", "string"], "default": null},
   {"name": "isConditionSatisfied", "type": ["null", "boolean"], "default": null},
   {"name": "partition", "type": ["null", "int"], "default": null},
   {"name": "locationDetails", "type": {"name": "locationDetailsObj", "type": "record", "fields": [
     {"name": "city", "type": ["null", "string"], "default": null},
     {"name": "state", "type": ["null", "string"], "default": null},
     {"name": "latitude", "type": ["null", "double"], "default": null},
     {"name": "longitude", "type": ["null", "double"], "default": null},
     {"name": "country", "type": ["null", "string"], "default": null},
     {"name": "timezone", "type": ["null", "string"], "default": null}
   ]}},
   {"name": "analytics", "type": {"name": "analyticsObj", "type": "record", "fields": [
     {"name": "analyticsChildObj", "type": ["null", "string"], "default": null}
   ]}}
 ]
}
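Before handing a schema file to the connector, it can help to check it against the two rules above. The following is a minimal sketch in Python using only the standard library. The function name check_optional_fields is hypothetical and not part of the connector. It assumes the schema is a record whose optional fields are written as unions containing "null", and it descends into nested records such as locationDetails.

```python
import json


def check_optional_fields(schema, path=""):
    """Return a list of rule violations found in a parsed Avro record schema.

    Hypothetical helper, not part of the connector. It checks only the two
    rules described in this document:
      1. in a union containing "null", "null" must come first
      2. such optional fields must declare "default": null
    """
    problems = []
    # Only record schemas have fields; primitives and other types are skipped.
    if not isinstance(schema, dict) or schema.get("type") != "record":
        return problems
    for field in schema.get("fields", []):
        name = f"{path}{field['name']}"
        ftype = field["type"]
        if isinstance(ftype, list):  # a union type
            if "null" in ftype:
                if ftype[0] != "null":
                    problems.append(f"{name}: 'null' must be the first type in the union")
                if "default" not in field or field["default"] is not None:
                    problems.append(f"{name}: optional field must have default null")
            # A union member may itself be a nested record.
            for member in ftype:
                problems.extend(check_optional_fields(member, name + "."))
        else:
            # A nested record given directly as the field type.
            problems.extend(check_optional_fields(ftype, name + "."))
    return problems


# Small example with two deliberate mistakes (fields "city" and "flag").
example = json.loads("""
{"type": "record", "name": "sample_schema", "fields": [
  {"name": "uid", "type": ["null", "string"], "default": null},
  {"name": "city", "type": ["string", "null"], "default": null},
  {"name": "flag", "type": ["null", "boolean"]}
]}
""")
for problem in check_optional_fields(example):
    print(problem)
```

An empty result means the schema follows both rules. It does not guarantee that the schema is otherwise valid Avro.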
Passing schema file
If you deploy using the Docker image, you can copy the schema file into the image while building it.
For detailed instructions, refer to this doc.
Passing URL to schema
If your schema is hosted on a remote service, you can pass the URL at which the schema is served.
For example:
topic-name.schema.url: https://some.host.com/path/to/schema
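Before putting a URL into the configuration, it may be worth confirming that it returns a parseable schema. The sketch below is one way to do that in Python with the standard library. The helper names fetch_schema and summarize are hypothetical. It assumes the endpoint answers a plain GET without authentication and returns the Avro schema as JSON. How the connector itself fetches the URL is not described here. The demo reads a temporary local file through a file:// URL so that it runs offline; replace it with your own schema URL.

```python
import json
import pathlib
import tempfile
import urllib.request


def fetch_schema(url, timeout=10):
    """Download a schema document from the URL and parse it as JSON.

    Hypothetical helper; assumes no authentication is required.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(schema):
    """Return the record name and its top-level field names."""
    return schema["name"], [field["name"] for field in schema["fields"]]


if __name__ == "__main__":
    # Stand-in for a remote schema: a temporary file served via a file:// URL.
    with tempfile.TemporaryDirectory() as tmp:
        schema_path = pathlib.Path(tmp) / "sample_schema.avsc"
        schema_path.write_text(
            '{"type": "record", "name": "sample_schema", "fields": ['
            '{"name": "uid", "type": ["null", "string"], "default": null}]}'
        )
        name, fields = summarize(fetch_schema(schema_path.as_uri()))
        print(name, fields)
```

If the URL is unreachable or the response is not valid JSON, fetch_schema raises an exception, which points to a problem worth fixing before the connector is started.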