Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Support for Schema References. #92

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

whizzzkid
Copy link

@whizzzkid whizzzkid commented Feb 8, 2021

Namaste Team

In this PR:

  • I am adding support for SchemaReferences using a combination of typeHook and logicalType
  • This seems to work in out internal testing, but I'll need to figure out how to write unit tests for this.
  • This issue is being talked in avsc: Support for external schema references mtth/avsc#309 but they will not have a solution unless they would like to make api calls to the registry.

Issues Touched:

Nishant Arora and others added 7 commits February 8, 2021 10:52
* master:
  Fix test failures
  tweak error message to be more explicit
  Add tests for schemas without a namespace
  Clean up and differentiate Schema type
  Treat namespace as required only when necessary
@whizzzkid whizzzkid marked this pull request as ready for review February 8, 2021 20:44
@alonpi
Copy link

alonpi commented Mar 15, 2021

@whizzzkid, is there an ETA on when it will be possible to use schema references? :)

@whizzzkid
Copy link
Author

@alonpi we are using schemaReferences as of today based on my fork. I am not sure how to get this PR reviewed, they don't a have a contribution doc, nor am I able to assign reviewers for this PR. If @Nevon can take a look, we can go from there.

PS: I noticed the v2 release, will resolve the conflicts shortly.

@Nevon
Copy link
Member

Nevon commented Mar 15, 2021

I can try to help, but I have never heard of schema references until now, so I have some catching up to do. One thing I can say for sure is that any contribution that adds functionality like this needs to be tested. We generally favor integration tests, so if you can describe the expected usage, it should be pretty straightforward to write a test that does exactly that.

From my very brief reading of confluentinc/schema-registry#1280, this seems to be a feature that exists for not just Avro, but also JSON Schema and Protobuf. We haven't really decided on a policy of how to keep the different schema types in feature parity. Would you be able to drive this change also for Protobuf or JSON Schema? Without having delved into it, I don't know if doing this is a large or small effort.

@whizzzkid
Copy link
Author

@Nevon Schema References were added in v5.5 of Confluent Platform: https://docs.confluent.io/platform/current/schema-registry/serdes-develop/index.html#referenced-schemas

So essentially any newer implementation should support this. This was a blocker for our (AppDirect) kafka deployment, and this was the easiest fix. The ask for the support has been since November in #82.

I can look into implementing this change for protobuf and JSON schemas too and fix the test cases as you mentioned. Before that I will need to understand what changes does v2 bring in.

I'll get back to you in a few days!

@tejans24
Copy link

tejans24 commented Apr 15, 2021

@whizzzkid are you still working on this PR? We are trying to use the lib and would love to use schema references as part of schema creation through this package. Thanks for working on adding this feature.

@whizzzkid
Copy link
Author

@tejans24 this is ready to merge if @Nevon agrees. I can work on the protobuf/JSON ask in a seperate pr.

We published an internal version using my fork and seems to work well for us. So far no complaints received.

@tejans24
Copy link

@tejans24 this is ready to merge if @Nevon agrees. I can work on the protobuf/JSON ask in a seperate pr.

We published an internal version using my fork and seems to work well for us. So far no complaints received.

@whizzzkid: As @Nevon mentioned earlier, is there way to help add some integration testing for this? I guess it can get slightly complex because of the nature of schema references.

@300LiterPropofol
Copy link

300LiterPropofol commented Apr 19, 2021

@whizzzkid Thank you a lot for this schema reference AVRO contribution! I wonder is there a description of how to use your commit? How should I register my producer on a nesting schema, is it still

const registry = new SchemaRegistry({});
const kafka = new Kafka({});
const producer = kafka.producer();
const subject = "";
const version = 1;
const valueId = await registry.getRegistryId(subject, version);
producer.send({
        topic: '',
        messages: [
            {
                value: await registry.encode(valueId, payload)
            },
        ],
    });

? Thank you! I also noticed that the code based on which you modified on is not the current code in master branch. Thus I cannot simply copy paste the your "changed line" under my node_modules folder. Can you tell about which version of @kafkajs/confluent-schema-registry you used to write your fork branch?

@300LiterPropofol
Copy link

@whizzzkid Hi! I couldn't figure out how to use your fork.
I mean, if I npm install git+https://github.com/whizzzkid/confluent-schema-registry.git\#feature/schema-references, your module seems to have a dependency compatibility issue. This command will give error as below:

npm ERR! code 1
npm ERR! git dep preparation failed
npm ERR! command /home/.nvm/versions/node/v15.1.0/bin/node /home/.nvm/versions/node/v15.1.0/lib/node_modules/npm/bin/npm-cli.js install --only=dev --prod --ignore-prepublish --no-progress --no-save --cache=/home/.npm/_cacache --prefer-offline=false --prefer-online=false --offline=false --before=
npm ERR! npm WARN invalid config before="" set in command line options
npm ERR! npm WARN invalid config Must be one of: null, valid Date string
npm ERR! npm ERR! code ERESOLVE
npm ERR! npm ERR! ERESOLVE unable to resolve dependency tree
npm ERR! npm ERR!
npm ERR! npm ERR! While resolving: @kafkajs/[email protected]
npm ERR! npm ERR! Found: [email protected]
npm ERR! npm ERR! node_modules/jest
npm ERR! npm ERR!   dev jest@"^25.2.7" from the root project
npm ERR! npm ERR!
npm ERR! npm ERR! Could not resolve dependency:
npm ERR! npm ERR! peer jest@">=24 <25" from [email protected]
npm ERR! npm ERR! node_modules/ts-jest
npm ERR! npm ERR!   dev ts-jest@"^24.0.2" from the root project
npm ERR! npm ERR!
npm ERR! npm ERR! Fix the upstream dependency conflict, or retry
npm ERR! npm ERR! this command with --force, or --legacy-peer-deps
npm ERR! npm ERR! to accept an incorrect (and potentially broken) dependency resolution.
npm ERR! npm ERR!

If I changed ts-jest version to 25.2.1, still when I try to const { SchemaRegistry } = require('@kafkajs/confluent-schema-registry');
There throws an error:

node:internal/modules/cjs/loader:335
      throw err;
      ^

Error: Cannot find module '/home/dishaw/serial_number/confluent_kafka/node_modules/@kafkajs/confluent-schema-registry/dist/index.js'. Please verify that the package.json has a valid "main" entry

According to your reply above, your fork seems to work for you. Can you describe more about how you installed your fork? Thank you very much.

@whizzzkid
Copy link
Author

whizzzkid commented Apr 22, 2021

@300LiterPropofol that will not work because you need to build the module first, This project uses Azure to build the module. Internally we added a manual build step in our infrastructure to publish this to our own internal npm instance which works. You can build the fork locally and then use: npm i <local_path> to run this.

@300LiterPropofol
Copy link

@300LiterPropofol that will not work because you need to build the module first, This project uses Azure to build the module. Internally we added a manual build step in our infrastructure to publish this to our own internal npm instance which works. You can build the fork locally and then use: npm i <local_path> to run this.

Thank you very much for your response. @whizzzkid , sorry that I am not very familiar with nodejs. Could you explain more about how to build the packages locally?

@whizzzkid
Copy link
Author

@300LiterPropofol

  • Clone the fork.
  • Install Deps: yarn install
  • Build: yarn build
  • Use it in your project like: npm i -S ../path/to/fork

@300LiterPropofol
Copy link

@whizzzkid Thank you very much! I will give a try.

@300LiterPropofol
Copy link

300LiterPropofol commented Apr 23, 2021

@whizzzkid Sorry to disturb you again. I tried to build the module locally and installed from local. I hit an error that

Error: undefined type name: selfdefinednamespace.selfdefinedname
 at Function.Type.forSchema (/home/node_customize/confluent-schema-registry/node_modules/avsc/lib/types.js:177:11)

I wonder how deep your schema reference feature can go, I have a schema structure like:
schema A (refer to schema B)
schema B(refer to schema C)
schema C (refer to schema D)
schema D (refer to schema E)
and the error undefined type name: selfdefinednamespace.selfdefinedname is the fullname of schema E.
Is there any limitation about how deep the feature can retrieve? Thanks!

Update on the above comment:
There is no limitation depth in this add-on feature. I finally found out what caused this kind of confusing error.
For anyone who is upcoming and may be stuck in the same issue, in case you need it,
To be more specific, I am using Confluent Cloud, so the schema created can be observed in the Confluent Cloud webpage UI.
If you are using references, you can see three columns named "Reference name", "subject", "version" under the "Schema References" text block.
The "Reference name" column MUST be filled with the fullname, i.e. namespace.name, of the referred schema. If you fill the column with something else. It will stilln pass the schema validation and probably do not have effect on the nesting schema registry producer (for example I was using python client, this field does not have impact.) But for whizzzkid's add-on, that field has to be the fullname.
Now I have it working! Thank you a lot again for your amazing job!!

@whizzzkid
Copy link
Author

@300LiterPropofol There shouldn't be a limit on the depth of the schema, glad the naming worked out for you. We can fix the naming, but I am not sure if that would be a good idea. I am looking into fixing the 👇🏽 conflicts first.

Thanks for validating :)

@overbit
Copy link

overbit commented Jul 15, 2021

Is there any update on this?
We are really keen in using this library but all our Avro Schemas contain nested schemas and this is not currently supported

@brinchj
Copy link

brinchj commented Jul 30, 2021

Just a heads up: I've started a discussion around references in the related issue here: #82

I think we are a bunch who are interested in this and I think we could team up and produce the changes needed together in uniform 👍

@WhereIsMyRum
Copy link

I guess this is dead by now, is it? 🤔

@whizzzkid
Copy link
Author

@WhereIsMyRum this was working as is in production without any failures. Then I switched jobs, had some personal commitments and got covid, I hope it's still working there. I'll see if I can find sometime to get this through in the next few weeks, or will close this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants