Share Your Performance Tips

I wanted to create a topic where the community could share their tips on getting better Parse Server performance. Posts could include the type of infrastructure, workloads, and tips for improving performance. Here I'll share some of our experience:

Infrastructure

  • configuration: AKS with our own custom helm chart, official docker parse-server images:
    ** dev/stage/prod namespaces
    *** app server - multiple parse-server instances to handle client requests
    *** job server - a dedicated parse-server where we run our jobs
    *** redis-cache
    *** dashboard
  • Database: MongoDB Atlas 4.4, tens of millions of objects < 1kb, 1-20k objects per user; our indexes don't fit in memory but we still get great performance
  • Usage Patterns: cyclical with pronounced daily/weekly patterns
  • Request types: heavy cloud code with multiple queries per request; custom search queries and indexing in cloud code and MongoDB
  • Push-notifications: silent as well as scheduled (with daily peaks); custom schedule and dispatch queue in cloud code

Performance Tips:

  • Load balancing
    ** Using a redis-cache instance is a must for horizontal scaling.
    ** We route requests from app server instances back to the load-balancer - our heavy cloud code that issues multiple concurrent queries gets properly balanced.
    ** Using a dedicated parse-server instance as our job server made it easy to monitor/troubleshoot jobs and kept job resource-usage spikes from affecting the client-serving parse-server instances.
  • Push notifications
    ** Switch from certificate-based to API key authenticated notifications - we saw more consistent memory usage; this also eliminated the need for us to manually renew expiring certificates.
  • Traffic
    ** We enabled brotli with gzip fallback on the ingress controller - big savings for our workloads, especially when clients load batches of objects.
    ** We enabled zlib and snappy compression for the mongodb driver (in the database uri string) with good results.
    ** We profiled batch size (query limit) and saw an 11% traffic decrease as well as faster loads when we went from a limit of 100 to 250 on the batch load queries.
  • Database
    ** Don't use query skip/limit for paging! MongoDB can't use skip efficiently - all the skipped keys/documents still get loaded and examined. We page on updatedAt and objectId (with proper indexes created to support that); see the paging sketch after this list.
    ** Use query.select() to load only the fields you need. When appropriate, create an index so that the query is covered in MongoDB and no documents get loaded.
    ** We favor locality in indexes to reduce paging (and memory pressure) - many of the objects we query have a natural grouping by user account, and creating compound indexes with the account as a prefix key increases locality (less of the index needs to be resident in memory for the query to scan); it will also allow us to easily shard on the account key in the future.
    ** We don't index on _rperm as it is an array field, takes a lot of space, and most of the time it's faster for the query planner to just fetch the document and filter. Disclaimer: we create a unique role per user account (shared by multiple users), and most workloads query either 'public' or the 'account role', where the other keys on the query make the permissions almost always guaranteed to match (keys examined and docs returned are almost always 1:1 for us).
    ** Be careful with query.include() - it triggers a subsequent query on the objectId field (hitting the _id index). In WiredTiger the _id index is huge for huge collections, and queries on it may cause more paging depending on its distribution and your workload. For our workload, where we load both old and new objects (as opposed to mostly newish ones), we get better performance when we create a separate query, constrain on the user account (a natural grouping for locality), and force the query to execute on the compound index.
  • Client caching
    ** Super important - our apps are optimized to use the cache extensively and fetch only what is needed. The general strategy is to keep a version and a changelog: load state from the cache, compare with the version from the server, and fetch and apply only the changes during the client session.
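
To make the paging and select() tips concrete, here is a minimal sketch of keyset paging on (updatedAt, objectId) in cloud code - not our exact code, just an illustration. The 'Item' class, the 'account' pointer field and the selected fields are placeholder names, and a compound index on (account, updatedAt, _id) is assumed to exist:

async function fetchPage(account, cursor, pageSize = 250) {
  let query;
  if (cursor) {
    // Resume strictly after the last object of the previous page.
    const newer = new Parse.Query('Item')
      .greaterThan('updatedAt', cursor.updatedAt);
    const sameTimestamp = new Parse.Query('Item')
      .equalTo('updatedAt', cursor.updatedAt)
      .greaterThan('objectId', cursor.objectId);
    query = Parse.Query.or(newer, sameTimestamp);
  } else {
    query = new Parse.Query('Item');
  }
  query.equalTo('account', account);        // constrain by account for index locality
  query.select('name', 'status');           // load only the fields that are needed
  query.ascending('updatedAt', 'objectId'); // stable sort so the cursor is unambiguous
  query.limit(pageSize);

  const results = await query.find({ useMasterKey: true });
  const last = results[results.length - 1];
  return {
    results,
    // Pass nextCursor back as `cursor` to fetch the following page; undefined means done.
    nextCursor: last ? { updatedAt: last.updatedAt, objectId: last.id } : undefined,
  };
}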

Some performance wishlists/questions:

  • It would be nice to be able to add additional conditions (as well as index hints) to the query that query.include() generates. One apparent use case would be sharding - if the collection that objects are included from is sharded, we'd need to specify the shard key as a condition on that query to make it efficient.
  • Does the current mongodb driver in parse-server support zstandard compression setting?

This is one of the most complete lists of common Parse Server / MongoDB optimizations I have seen so far. I would say a must-read for beginning developers. Well composed, hats off!

One thing I may add:

  • Use short class and field names; they contribute to data traffic costs.

If the working set does not fit into RAM, upgrading the MongoDB instance is usually required due to how MongoDB works internally. It may seem to run fine for some time, but one needs a trained eye for MongoDB metrics to recognize the early signs that it's about to crash. MongoDB doesn't allow much leeway between running optimally and going under, unlike other DBMSs that show a slow, steady performance degradation.

A word of caution: load balancer traffic is usually a distinct cost factor for cloud service providers. When routing requests from a server instance not to itself but back to the load balancer for distribution, keep an eye on the costs; this can be significantly more expensive. The alternative to redistribution would be adapting the load balancer algorithm to something other than round-robin.

Parse Server uses the latest MongoDB Node.js driver, version 3.6.6, and according to the docs it does not support zstd compression. It is possible that version 4.0.0, which is currently in beta, will support it.

A big +1 for extending sharding support in Parse Server. Please feel free to open a new topic here or a new issue on GitHub to lay out which obstacles/scenarios you discovered and how Parse Server can improve in that regard.


Thank you @Manuel, I'll file a feature request for the additional sharding support once we finish our preliminary testing.

Yes, great point about costs and load balancers, something to watch out for, for sure! We actually route the requests to an internal-only load-balancer that sits in front of the app servers, so that traffic stays internal to the cluster and does not go through the load balancer that handles requests to/from the cluster IP.

As a side note, we currently have compression set up at the ingress controller level, so the internal sub-requests that go between the parse-server instances do not actually use compression. An alternative would be to enable compression on each parse-server instance, but we haven't looked into what kind of savings that would bring (is it worth it for our workloads?) or even how to configure it… perhaps an npm compression middleware (compress-brotli or similar) would be one way to go - see the sketch below.
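
For reference, here is a rough sketch of what per-instance compression could look like when running parse-server inside a custom Express app. We haven't actually tried this; it uses the well-known `compression` package (gzip only) as a stand-in, and a brotli-capable middleware would be a drop-in alternative. Names, ports and environment variables are placeholders:

const express = require('express');
const compression = require('compression');
const { ParseServer } = require('parse-server');

const app = express();
// Compress responses from this instance; skip very small payloads.
app.use(compression({ threshold: 1024 }));

const api = new ParseServer({
  databaseURI: process.env.DATABASE_URI, // e.g. with compressors=zlib,snappy in the URI
  appId: process.env.APP_ID,
  masterKey: process.env.MASTER_KEY,
  serverURL: 'http://localhost:1337/parse',
});

// Mount Parse Server behind the compression middleware so its responses get compressed too.
app.use('/parse', api);
app.listen(1337);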

Regarding the mongo working set - absolutely. Once the working set doesn't fit in memory, everything hangs. I guess the point I was alluding to is that the working set heavily depends on access patterns and index locality, as opposed to simply having enough RAM to fit everything/all indexes. Getting into the mindset of thinking about access patterns, data locality, traffic/load patterns and data growth patterns is also a must when considering horizontal scaling. Some of the metrics we closely monitor to ensure we have adequate MongoDB resources are:

  • Page faults - this is our most important metric - we are looking for very low numbers and consistent patterns with no big spikes - it's OK for this metric to slightly track traffic (ideally we only page fault every once in a while, when a user comes in and their cold data gets paged in). Our current baseline is 0.02 page faults/sec with a gradual creep up to 0.08 during the heaviest traffic. We do get momentary spikes, even up to 4 faults/sec, when we add a new index to the mix and the working set gets adjusted, and that's OK.
  • Disk IOPS - similar to page faults - we want a consistent metric with plenty of headroom to handle unexpected spikes in traffic - we aim to stay below 50% of max IOPS during our heaviest times.
  • CPU IOWait % - we are looking for a steady low value here, indicating that nothing is stalled waiting for data to load.
  • Operation execution time - we monitor for consistently low times, with no spikes.
  • System restarts - if nodes restart unexpectedly, there's an issue to be investigated.
  • We have alerts on all of these and address issues accordingly (a spot-check sketch follows this list).
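
As a side note, if you want to spot-check some of these counters yourself outside the Atlas UI, several of them are exposed by the serverStatus command. A rough sketch with the Node.js driver - the connection string and privileges are assumptions, and page_faults is only reported on some platforms:

const { MongoClient } = require('mongodb');

async function spotCheck(uri) {
  const client = await MongoClient.connect(uri, { useUnifiedTopology: true });
  try {
    const status = await client.db('admin').command({ serverStatus: 1 });
    // page_faults is a cumulative counter; sample twice and diff to get a rate.
    console.log('page faults:', status.extra_info && status.extra_info.page_faults);
    console.log('current connections:', status.connections.current);
    console.log('uptime (s):', status.uptime);
  } finally {
    await client.close();
  }
}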

I think there is still quite some optimization potential when it comes to data traffic between Parse Server and the DB. That traffic is often a major cost driver, especially when scaling up and when using DBaaS providers, who often charge a premium for it.

I am currently looking into:

  • reducing the number of requests
  • removing extraneous data transmitted as part of a request

I will open an issue and maybe you want to join in, since you already have a focus on optimization.


Thinking about performance - here are my dream performance-related feature requests :smiley:

  • Parse-server-specific performance metrics gathering/profiling (low overhead), akin to the MongoDB profiler in Atlas. Maybe add a Prometheus exporter (or others) for easy viewing/analysis/alerting. Example metrics that would be useful:
    ** Query timing. Could be grouped by class and maybe source code location/originating cloud function? E.g. a query on User in userAccount.js is taking longer than 1s.
    ** Cloud Function timing.
    ** Data in/out by Cloud Function / Query / endpoint
  • Anomaly detection on above metrics

The reason these are my dream requests is that I want a detailed picture of how everything is performing, so I know what to spend my optimization time on.

Perhaps the approach for implementing something like this would be some sort of performance/metrics/profiling adapter. We could start with something basic like measuring queries and cloud functions and exposing that data in the dashboard?


We have had this discussion before and my position is still against a built-in metrics feature.

The reason is that metrics are only really valuable in the architectural context of a distributed system. We would compete with sophisticated tools such as NewRelic, which cover all of the features you mentioned. There are some open source projects available as well, such as netdata or prometheus. There is usually a lot more to it, such as anomaly detection (which involves ML) and alert notifications, for it to be practically usable beyond manual investigation.

I have shared a NewRelic configuration for Parse Server on StackOverflow, so it should be easy to try out.

Parse Server would have a hard time competing with these tools, and in my view it would distract and divert resources from the core product.

I am already using "express-prometheus-middleware": "^0.8.5" in our custom-built parse-server docker image; it was fairly trivial to integrate and now collects a ton of useful metrics from each of our instances. It works like a charm and can remain external to parse-server - it's just an Express middleware after all. A rough sketch of the wiring is below.
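
For anyone curious, the wiring looks roughly like this (a sketch only - the option names come from the package's README at the time, so verify them against the version you install):

const express = require('express');
const promMid = require('express-prometheus-middleware');

const app = express();

// Expose request counters/histograms plus default Node.js process metrics on /metrics.
app.use(promMid({
  metricsPath: '/metrics',
  collectDefaultMetrics: true,
  requestDurationBuckets: [0.1, 0.5, 1, 5],
}));

// ...then mount Parse Server as usual (e.g. app.use('/parse', api)) so every
// Parse endpoint shows up in the HTTP request duration histogram.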


This is super interesting, thank you so much for taking time to share this @evtimmy!

Could you expand a little on how you’re running a separate Redis cache instance? What service are you using and how is it configured?

Our setup is:

Cloudflare LB to application servers which connect to Atlas.

We’re starting to feel some performance issues as we start performing more complex queries, which funnily enough use Includes!!

Good points @Manuel, no, I definitely don't want to reinvent the wheel here. I'll take a look at NewRelic, but since we already run our own monitoring/logging infrastructure, we are more interested in getting additional metrics out of parse-server at this point. Currently we are using the official docker images, but this may be the reason we switch to a custom docker image and add express-prometheus-middleware as @enodev pointed out.

Sure @Taylorsuk, we run in Azure, on Kubernetes (AKS). We wrote a custom helm chart that describes all our orchestration, but basically we run Redis (we use the bitnami chart for that - https://charts.bitnami.com/bitnami), and it is accessible with a URL and password (internal to our cluster, so not reachable from outside). Then, to make the parse-server containers talk to the Redis container, we configure the cacheAdapter in the config.json that we use to initialize parse-server like this:

"cacheAdapter": {
    "module": "../Adapters/Cache/RedisCacheAdapter/index.js",
    "options": {
      "url": "redis://redis-endpoint:port",
      "password": "redis-password"
    }
}

@Taylorsuk Atlas and MongoDB also offer a lot of performance tuning options.

  • One tip for profiling queries - on your dev server you can do query.explain(true) - this is the same as 'explain plan' - the result of the query will be a JSON blob, the same one you get with the MongoDB Compass explain plan. In there you can find the 'parsed query', which is the MongoDB query your parse-server query got translated into. You can intercept and debug queries from the client by taking advantage of parse-server's Parse.Cloud.beforeFind().
  • When developing new complex queries, I would first try them out in MongoDB Compass, using the ‘explain plan’ feature and make sure they have the appropriate indexes to run.
  • For existing queries that are getting slower as more data accumulates, I'd first look at the Profiler tab in Atlas; there you can drill down into slow operations and see what the query is - copy it and run it in MongoDB Compass 'explain plan' to get the winning and rejected plans from the query planner.
  • Sometimes the query planner will run your queries on the wrong index - you can add a hint on the index to use (caution: this is advanced, the query will fail if the index doesn't exist, and it makes your code more brittle). In parse-server this is done with query.hint(). You can even override/mutate queries issued by the client in your server when intercepting them with Parse.Cloud.beforeFind() - see the sketch below.
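
Here is a hypothetical beforeFind sketch that ties the last two points together - intercepting client queries and forcing an index hint for a known access pattern. The 'Item' class, the 'account' constraint and the index name are placeholders, and hint() will make the query fail if the named index doesn't exist:

Parse.Cloud.beforeFind('Item', (req) => {
  const query = req.query;
  const where = query.toJSON().where || {};

  // Only hint queries constrained by account, where we know the compound
  // index is the right choice for the workload.
  if (where.account) {
    query.hint('account_1_updatedAt_1'); // must match an existing index name
  }
  return query;
});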

@evtimmy that’s great to know and clears up some of my other questions around the mystery of what indexes are being hit (or not).

Once again, thank you for taking the time to write the list :point_up_2: and also for expanding on a few things. I'm sure there are many companies (including mine) who would pay handsomely for consultancy in this area.

For readers interested in the MongoDB aspects of these optimizations, which seem to be a focus here, MongoDB themselves offer many articles and webinars about how to do that, for example:

Hi @evtimmy I just want to catch up on sharding. Did you encounter any limitations in Parse Server that need to be removed when working with sharded MongoDB?

Here is a PR to add a Best Practice page to Parse Server.

If anyone wants to open another PR to amend that page and maybe add some of the points mentioned here, that would be great!


@evtimmy This was a great post, thanks for sharing with the community.

multiple parse-server instances to handle client requests

Do you achieve this via Node clustering or with added server instances? Based on the low memory & CPU footprint of our Parse server, we found clustering to be much more economical and extremely effective.

Hi @Manuel, unfortunately I haven’t completed the sharding investigation as we had to shift priorities and we were already getting good perf with a few optimizations. I’ll be sure to post here and open an issue if I find the time to resume that and find anything useful.

@MobileVet We run on Kubernetes with multiple parse-server pods. Parse-server fits very well with k8s and is easy to orchestrate in a scalable way; this is roughly how we have it currently, per namespace (dev, stage, production):

  1. Application server deployment (parse-server talking to clients through load-balancer ingress) - multiple pods (scale with load)
  2. Job server deployment (parse-server talking to application server, used to run regular jobs) - single pod
  3. Redis cache for the parse-server deployment - single pod
  4. LiveQuery server deployment - multiple pods
  5. Redis cache for the live query server deployment - multiple pods
  6. Dashboard deployment - single pod
  7. File servers for static content deployment - multiple pods
  8. Other microservices…

Then, resource-wise, we monitor the resources for the cluster as a whole and scale VMs as needed. We have found performance to be good; we fit quite a few pods on a VM.

Probably mentioned here already - but if you are using multiple instances / clustering, a global cache adapter (such as the RedisCacheAdapter) is necessary, because by default Parse Server caches the session token / user per request in memory. If instance A receives a request from a user, it will cache that user. Now, if instance B receives a write operation on the user and the next request hits instance A, instance A will still have the user cached in its old state.


Does the default implementation of the Redis cache adapter make it instance-specific or global? How would you go about implementing a global cache?