Fully understand `enableSchemaHooks`

tayloruk · September 5, 2022, 7:31pm

We are currently performing a 4.x to 5.x update, and everything seems fine in terms of breaking changes for our app, except for enableSchemaHooks.

Noting the changes listed on the changelog, I see that this option is recommended for multiple servers which connect to a single database. Below is a network diagram of our setup for prod and staging.

Questions

Is it okay to set the enableSchemaHooks option for staging and prod? Staging is a single server, and prod has several instances.
Could anyone explain (in simple terms) what is happening under the hood of this new option and if there is anything we need to be cautious of?
MongoDB Atlas v5.0.11 with a replica set seems to have Change Stream functionality - please can someone confirm this is the case.

Many thanks for your time.

Regards

Simon

Manuel · October 12, 2022, 6:17pm

Is it okay to set the enableSchemaHooks option for staging and prod? Staging is a single server, and prod has several instances.

Yes, it doesn’t matter how many server instances there are, you can always enable the feature.
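For reference, here is a minimal sketch of where the option might go in a Parse Server 5 configuration object. As far as I know, it lives under `databaseOptions`. The `buildParseConfig` helper, the database URI, the keys and the server URL are placeholders made up for illustration.

```javascript
// Hypothetical helper: builds a Parse Server options object.
// Every value except the `databaseOptions.enableSchemaHooks` key is a placeholder.
function buildParseConfig(databaseURI) {
  return {
    databaseURI,
    appId: 'myAppId',
    masterKey: 'myMasterKey',
    serverURL: 'http://localhost:1337/parse',
    databaseOptions: {
      // Ask Parse Server to listen for schema changes in the database.
      enableSchemaHooks: true,
    },
  };
}

const stagingConfig = buildParseConfig('mongodb://localhost:27017/staging');
console.log(stagingConfig.databaseOptions);
```

The resulting object would be passed to the `ParseServer` constructor in the usual way, and the same setting can be used for both staging and prod.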

Could anyone explain (in simple terms) what is happening under the hood of this new option and if there is anything we need to be cautious of?

The _SCHEMA collection contains a definition of the Parse Server classes (i.e. MongoDB collections) and Parse Object fields (i.e. MongoDB document keys or properties). When you add a new Parse Server class, the server instance that created the class knows that there is an additional class. But if you run multiple server instances behind a load balancer, how would the other instances know? In Parse Server 4 they regularly fetched the schema from the database and cached it. If they fetch it too often, they cause unnecessary additional read load on the database. If they fetch it less often, they won’t know about the new collection for a longer period of time. If an instance doesn’t know about a collection, but the code tries to query it or modify it, it can cause app errors.
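To illustrate the polling tradeoff, here is a simplified model of a schema cache with a time-to-live. It is not Parse Server's actual code. The `PolledSchemaCache` class, the 5000 ms TTL and the `Invoice` class name are assumptions made up for this sketch, and the clock is passed in by hand to keep the example deterministic.

```javascript
// Simplified model of a polled schema cache (not Parse Server's implementation).
class PolledSchemaCache {
  constructor(fetchSchema, ttlMs) {
    this.fetchSchema = fetchSchema; // reads the class names from the database
    this.ttlMs = ttlMs;             // how long a cached copy is trusted
    this.cached = null;
    this.fetchedAt = -Infinity;
    this.reads = 0;                 // database reads caused by this instance
  }

  knowsClass(name, nowMs) {
    // Refetch only when the cached copy is older than the TTL.
    if (nowMs - this.fetchedAt >= this.ttlMs) {
      this.cached = new Set(this.fetchSchema());
      this.fetchedAt = nowMs;
      this.reads += 1;
    }
    return this.cached.has(name);
  }
}

// Shared "database" state that another server instance modifies.
const classesInDb = ['_User'];
const cache = new PolledSchemaCache(() => [...classesInDb], 5000);

cache.knowsClass('_User', 0);  // first call loads the schema
classesInDb.push('Invoice');   // another instance creates a new class
console.log(cache.knowsClass('Invoice', 1000)); // false: cached copy is still trusted
console.log(cache.knowsClass('Invoice', 6000)); // true: TTL expired, schema refetched
```

A shorter TTL shrinks the window in which `Invoice` is unknown, but raises `reads`; a longer TTL does the opposite.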

Parse Server 5 has the option to use MongoDB change streams. Instead of pulling the schema from the database at regular intervals, any change there is now pushed to the server instance - in fact to every server instance. This works by Parse Server telling the MongoDB driver to tell the MongoDB database to watch the _SCHEMA collection for any changes. If a change happens, the MongoDB database notifies the MongoDB driver, which notifies Parse Server to fetch the schema collection.
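The push model can be sketched roughly as follows. This is an illustration of the idea, not Parse Server's internal code. `watchSchema` and `reloadSchema` are hypothetical names. The only driver behaviour assumed is that a MongoDB `Collection` has a `watch()` method returning a change stream that emits `'change'` events.

```javascript
// Sketch only: `collection` is expected to behave like a MongoDB driver
// Collection, whose watch() returns a change stream emitting 'change' events.
function watchSchema(collection, reloadSchema) {
  const changeStream = collection.watch();
  // Any write to the watched collection triggers a schema reload.
  changeStream.on('change', () => reloadSchema());
  return changeStream; // the caller should close() it on shutdown
}
```

With the real driver, `collection` would be something like the `_SCHEMA` collection obtained from a connected client, and every server instance would run this once at startup.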

Therefore you can enable the enableSchemaHooks option even if you have only 1 server instance. Even though the server instance knows that the schema has changed (because it changed it), it will be notified about the schema change right afterwards and fetch the schema - that’s somewhat unnecessary, but it’s technically possible and won’t cause any issues.

All nice, but there is a tradeoff. A MongoDB change stream works by setting a cursor on the MongoDB oplog collection. The oplog is a history of every write operation in your database. The higher the write frequency in your cluster, the more oplog data the database generates per hour. For a busy cluster that can easily reach 10 to 100 GB per hour. If you run many Parse Server instances, each one will open a change stream and set an oplog cursor. The cursor runs down the oplog and notifies the server when there has been a change in the schema collection. Each cursor adds overhead to your MongoDB cluster resources in terms of CPU and RAM consumption. You can see how this can easily get tricky if you have many servers watching a busy oplog. If your schema doesn’t change much, the constant overhead on the MongoDB resources may also be nonsensical.
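As a very rough back-of-envelope estimate, assume each change stream cursor has to read through all new oplog entries, so the oplog volume read per hour grows linearly with the number of server instances. This is a deliberate simplification, not a measurement. The function name and the instance count of 8 are made up for illustration.

```javascript
// Crude model: every Parse Server instance holds one oplog cursor,
// and each cursor reads the full stream of new oplog entries.
function oplogReadPerHourGb(oplogGbPerHour, serverInstances) {
  return oplogGbPerHour * serverInstances;
}

// Example: 10 GB of oplog per hour watched by 8 server instances.
console.log(oplogReadPerHourGb(10, 8)); // 80
```

The actual CPU and RAM cost depends on your cluster and workload, so treat the result only as a starting point for capacity planning.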

Before you upgrade, you should therefore look into what that means for your MongoDB resources. Make an estimate and be safe by scaling up your cluster vertically (more CPU, RAM). You can always scale down afterwards.

MongoDB Atlas v5.0.11 with a replica set seems to have Change Stream functionality - please can someone confirm this is the case.

That will work. Change streams are also available in MongoDB 4; however, see the docs regarding availability.

On a general note

The feature was merged before we introduced alpha / beta testing, and frankly, we still haven’t received much feedback on it. We already opened an issue to re-add the previous schema pull functionality, in addition to the schema hooks. There are scenarios in which schema hooks are disadvantageous and impractical. In hindsight, we wouldn’t merge such a significant change anymore without alpha / beta testing and without making it optional for a while.

tayloruk · October 12, 2022, 7:02pm

Perfect, @Manuel I really appreciate you taking the time to write that comprehensive explanation!

Thanks again for this and all your work on Parse.