A schema store service that tracks and manages all the schemas used in the Data Pipeline

Statistics on schematizer

Number of watchers on Github 50
Number of open issues 2
Main language Python
Open pull requests 2+
Closed pull requests 0+
Last commit almost 3 years ago
Repo Created about 3 years ago
Repo Last Updated over 1 year ago
Size 17.9 MB
Organization / Author yelp
What is it?

The Schematizer is a schema store service that tracks and manages all the schemas used in the Data Pipeline and provides features like automatic documentation support. We use Apache Avro to represent our schemas.
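As a rough illustration of what an Avro record schema with documentation looks like, the sketch below writes the test schema used in this README's examples as a Python dict and checks that every field carries a `doc` string. The helper `undocumented_fields` is hypothetical and is not part of Schematizer; it only shows the kind of check that documentation support implies.

```python
# Avro record schema expressed as a Python dict, mirroring the test schema
# used in this README's registration examples.
EXAMPLE_SCHEMA = {
    "type": "record",
    "namespace": "test_namespace",
    "source": "test_source",
    "name": "test_name",
    "doc": "test_doc",
    "fields": [
        {"type": "string", "doc": "test_doc1", "name": "key1"},
        {"type": "string", "doc": "test_doc2", "name": "key2"},
    ],
}


def undocumented_fields(schema):
    """Return the names of fields whose `doc` is missing or blank.

    Hypothetical helper for illustration only; not a Schematizer API.
    """
    return [
        field["name"]
        for field in schema.get("fields", [])
        if not str(field.get("doc", "")).strip()
    ]


if __name__ == "__main__":
    print(undocumented_fields(EXAMPLE_SCHEMA))
```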

How to download

git clone


Running unit tests

make -f Makefile-opensource test

Running integration tests

make -f Makefile-opensource itest

Setup and Configuration

  1. Create a MySQL database for Schematizer Service::

  2. Create MySQL tables in <db_name> database for Schematizer Service::

    cat schema/tables/*.sql | mysql <db_name>

  3. Create a topology.yaml file::

    topology:
    -   cluster: <schematizer_cluster_name>
        replica: master
        entries:
            - charset: utf8
              use_unicode: true
              host: <db_ip>
              db: <db_name>
              user: <db_user>
              passwd: <db_password>
              port: <db_port>

  4. In config.yaml assign values to the following configs::

    schematizer_cluster: <schematizer_cluster_name>
    topology_path: /path/to/topology.yaml

Use `serviceinitd/` to start the Schematizer service.

### Interacting directly with the Schematizer Service

Registering a schema::

curl -X POST --header 'Content-Type: application/json' --header 'Accept: text/plain' -d '{
    "namespace": "test_namespace",
    "source_owner_email": "<source_owner_email>",
    "source": "test_source",
    "contains_pii": false,
    "schema": "{\"type\":\"record\",\"namespace\":\"test_namespace\",\"source\":\"test_source\",\"name\":\"test_name\",\"doc\":\"test_doc\",\"fields\":[{\"type\":\"string\",\"doc\":\"test_doc1\",\"name\":\"key1\"},{\"type\":\"string\",\"doc\":\"test_doc2\",\"name\":\"key2\"}]}"
}' ''

Getting Schema By ID::

curl -X GET --header 'Accept: text/plain' ''
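The registration request body appears to carry the Avro schema as a JSON-encoded string nested inside the outer JSON object, which is easy to get wrong when escaping by hand. The sketch below builds such a body with the standard library only. The function name `build_register_payload` is hypothetical, and the field names are taken from the curl example above; the service endpoint is not shown in this README, so the sketch stops short of sending the request.

```python
import json


def build_register_payload(namespace, source, source_owner_email,
                           contains_pii, avro_schema):
    """Build the JSON body for a schema registration request.

    Hypothetical helper: the field names follow the curl example in this
    README. The Avro schema dict is serialized to a string first, so it
    ends up as an escaped JSON string inside the outer JSON object.
    """
    body = {
        "namespace": namespace,
        "source_owner_email": source_owner_email,
        "source": source,
        "contains_pii": contains_pii,
        "schema": json.dumps(avro_schema, separators=(",", ":")),
    }
    return json.dumps(body)


if __name__ == "__main__":
    schema = {
        "type": "record",
        "namespace": "test_namespace",
        "source": "test_source",
        "name": "test_name",
        "doc": "test_doc",
        "fields": [
            {"type": "string", "doc": "test_doc1", "name": "key1"},
            {"type": "string", "doc": "test_doc2", "name": "key2"},
        ],
    }
    # "<source_owner_email>" is a placeholder, not a real address.
    print(build_register_payload(
        "test_namespace", "test_source", "<source_owner_email>", False, schema
    ))
```

The printed string can be passed to `curl -d` or to any HTTP client as the request body.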

### Interacting with the Schematizer Service using the Schematizer Client Lib

Registering a schema::

from data_pipeline.schematizer_clientlib.schematizer import get_schematizer

test_avro_schema_json = {
    "type": "record",
    "namespace": "test_namespace",
    "source": "test_source",
    "name": "test_name",
    "doc": "test_doc",
    "fields": [
        {"type": "string", "doc": "test_doc1", "name": "key1"},
        {"type": "string", "doc": "test_doc2", "name": "key2"}
    ]
}
schema_info = get_schematizer().register_schema_from_schema_json(
    namespace="test_namespace",
    source="test_source",
    schema_json=test_avro_schema_json,
    source_owner_email="<source_owner_email>",
    contains_pii=False
)

Getting Schema By ID::

from data_pipeline.schematizer_clientlib.schematizer import get_schematizer

schema_info = get_schematizer().get_schema_by_id( schema_id=schema_info.schema_id )

We're still in the process of setting up this service as a stand-alone. There may be additional work required to run a Schematizer instance and integrate with other applications.

Schematizer is licensed under the Apache License, Version 2.0:

Everyone is encouraged to contribute to Schematizer by forking the GitHub repository and making a pull request or opening an issue.