Strange behaviour in Parse server

We’re observing really strange behaviour in Parse-server.

It works perfectly between 10:30 am and around 3:30 pm, and again from midnight to 7:30 am.
By that I mean we get close to 200,000 - 900,000 logs per half hour in those time periods;
however, in the periods it’s not working, the numbers drop to only 20,000 odd.

During the periods it doesn’t work, we keep getting the following errors -
Nginx reverse proxy -
[error] 1107#0: *20723 recv() failed (104: Connection reset by peer) while reading response header from upstream, client
Parse server throws the following error -
{"name":"MongoError","level":"error","message":"Uncaught internal server error. Cannot use a session that has ended","stack":"MongoError: Cannot use a session that has ended\n at applySession (/usr/lib/node_modules/parse-server/node_modules/mongodb/lib/core/sessions.js:695:12)\n at _command (/usr/lib/node_modules/parse-server/node_modules/mongodb/lib/core/wireprotocol/command.js:58:17)\n at command (/usr/lib/node_modules/parse-server/node_modules/mongodb/lib/core/wireprotocol/command.js:28:5)\n at writeCommand (/usr/lib/node_modules/parse-server/node_modules/mongodb/lib/core/wireprotocol/write_command.js:47:3)\n at Object.insert (/usr/lib/node_modules/parse-server/node_modules/mongodb/lib/core/wireprotocol/index.js:6:5)\n at Connection.insert (/usr/lib/node_modules/parse-server/node_modules/mongodb/lib/cmap/connection.js:187:8)\n at /usr/lib/node_modules/parse-server/node_modules/mongodb/lib/core/sdam/server.js:483:13\n at Object.callback (/usr/lib/node_modules/parse-server/node_modules/mongodb/lib/cmap/connection_pool.js:345:7)\n at processWaitQueue (/usr/lib/node_modules/parse-server/node_modules/mongodb/lib/cmap/connection_pool.js:468:23)\n at /usr/lib/node_modules/parse-server/node_modules/mongodb/lib/cmap/connection_pool.js:261:28","timestamp":"2021-04-12T08:00:32.083Z"}

This leads to a 502 (Bad Gateway) after about a minute.

Our architecture is -
  • parse-sdk on a mobile app
  • AWS Application Load Balancer
  • Nginx reverse proxy
  • Parse-server
  • MongoDB
  • ELK stack

Our parse-servers are in an Auto Scaling Group with a minimum of 2 instances running constantly.

If someone could help with this issue it would be a huge help.
If any more details are needed, please let me know.

Do you have any customization around the connection between Parse Server and MongoDB, specifically around the MongoDB driver / client?

You may want to post your MongoDB connection string (removing any sensitive info) and how you configured the DB adapter in Parse Server.

Does the reverse proxy have anything to do with the connection between Parse Server and MongoDB or are you only using it to route incoming requests?

This means the MongoDB driver closed the connection (voluntarily or involuntarily) before Parse Server tried to send a DB request. My first assumption would be that you are interacting with the MongoDB driver directly (and incorrectly) in Parse Server.

It depends on what “upstream” here is in your architecture. If upstream is MongoDB, that means the DB server closed the connection while the requesting server assumed the connection was still open. This could indicate a timeout misconfiguration that led to the involuntary closing of the DB connection. You could post and review any relevant timeout settings for all components in the path between Parse Server and MongoDB.

Also, some basic questions:

  • What version of Parse Server?
  • What version of MongoDB driver?
  • What version of MongoDB?
  • Have you forked Parse Server or manually changed any dependencies?

Hi @Manuel ,

No customizations at all, I’m using a config.json file.

DB String -
"databaseURI": "mongodb://<username>:<URL-encoded password>@vimongo1:30000/<db name>?authSource=admin&tls=true&tlsCertificateKeyFile=<PEM file path>&tlsCAFile=<CA file path>"
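For context, the rest of the config.json is just the standard top-level options, roughly like this (placeholder/assumed values):

{
  "appId": "<app id>",
  "masterKey": "<master key>",
  "serverURL": "http://localhost:<port number>/parse",
  "port": <port number>,
  "databaseURI": "<the connection string above>"
}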

The reverse proxy is only for incoming requests.

But then the strange thing is that (a) we’re only using a config.json, and (b) it works at specific times.

The only timeout we’ve configured is for Nginx; in /etc/nginx/nginx.conf we’ve set:
keepalive_timeout 65;

Parse server version - 4.5.0
The MongoDB driver is whatever is bundled with parse-server; we’ve not installed anything extra.
MongoDB version - 4.2.8
We’ve simply used the node module for parse-server

Are you saying that you are directly interacting with the MongoDB driver?

What is your Parse Server config for the DB adapter?

Also, see my edit about the upstream error above.

In our case the “upstream” is the parse-server.
We’re using the proxy_pass directive -
proxy_pass http://127.0.0.1:[port number];
We’re using Nginx reverse proxies only for incoming requests, not to connect to MongoDB.

I’m not sure I understand the question; doesn’t parse-server use mongoose to connect to MongoDB?
And we haven’t changed anything in the npm module for Parse-Server, if that’s your question.

By directly interacting I mean whether you are calling any MongoDB driver APIs directly in your custom code, compared to only calling Parse Server APIs and letting Parse Server interact with the DB for you. The difference would be that direct DB driver interaction requires you to manage the connections yourself, which could lead to the error you described if not done correctly.
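To make the distinction concrete, here is a hypothetical sketch (not something I’m saying you have, just an illustration; the collection and DB names are made up, and the second snippet assumes the Parse global available in Cloud Code). In the first function you own the MongoClient and its sessions yourself; in the second, Parse Server and the SDK own the connection pool.

// (a) Direct driver interaction in custom code – you manage the client lifecycle yourself:
const { MongoClient } = require('mongodb');

async function directDriverExample(uri) {
  const client = new MongoClient(uri);
  await client.connect();
  try {
    return await client.db('mydb').collection('GameScore')
      .find({ score: { $gt: 100 } })
      .toArray();
  } finally {
    // mismanaging this lifecycle is the typical source of “session has ended” errors
    await client.close();
  }
}

// (b) Going through Parse Server – the server manages the DB connections for you:
async function parseApiExample() {
  const query = new Parse.Query('GameScore');
  query.greaterThan('score', 100);
  return query.find({ useMasterKey: true }); // e.g. in Cloud Code, where Parse is a global
}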

Absolutely not, we let parse-server handle everything.
We haven’t got anything custom in terms of Parse-Server; it’s just the default settings.

My assumption at this point is:

  • Request comes in from client to server
  • Server sends request to DB
  • Something happens on the DB side (DB does not send response, DB goes down trying to run a very resource intensive op, scaling op going on in a non-rolling fashion, etc)
  • DB does not send response but closes connection
  • Parse Server tries to send another request
  • DB driver throws because the connection has already been closed by DB

I don’t see a relation between the MongoDB error and the 502 error though. Unless during the 502 errors the request volume falls to very low levels that come into the range of timeout durations, like 1 request per 30/60 seconds per ALB instance. Instead, this may be another issue with a timeout mismatch between the ALB and the Nodejs http server instance (not Parse Server).

If you experience more 502s during off-peak time or low request levels, that is a usual timeout misconfiguration pattern. Timeout mismatches mostly occur in the last seconds of a timeout period. During high request load, the chances of a connection running into those last seconds are lower than during low request load, so you’d see more errors.
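As a concrete (illustrative) example of that pattern: if the component in front keeps idle connections around for up to 60 seconds but the component behind it closes them after 5 seconds – which happens to be Node’s default keepAliveTimeout if you never set it – then any request forwarded over a connection that has sat idle for more than ~5 seconds lands on a socket the backend has already closed, and surfaces as a connection reset / 502. Under heavy traffic connections are rarely idle that long, so the errors only show up when load drops.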

What are the timeout settings for

  • ALB
  • nginx
  • Nodejs http server

What we’ve noticed is that only the parse-server goes down.
We can still run operations on MongoDB.
While we were diagnosing, I had a mongo shell open and would regularly find the last document; at the same time, Parse Server was also open and would intermittently send data.

No, the request volumes don’t really drop during the 502s, if we go by the trends from when parse-server was working.

I actually did see a similar post on Stack Overflow about this issue, but I couldn’t see an option to configure a keep-alive timeout in parse-server.
Out of desperation I used a cloned image to try and add server.keepAliveTimeout and server.headersTimeout from this link -

I went to /usr/lib/node_modules/parse-server/lib/cli/ParseServer.js and tried to set them there, but I’m not even sure it worked properly.
We still got logs for 30-60 seconds before we started to see the 104 connection reset errors and then the 502.

Even at our lowest volume we get around 100,000 logs per half hour between midnight and 7:30 am.
So it’s unlikely that requests will drop to 1 request per 30/60 seconds.

ALB - Idle timeout 60 seconds
Nginx - KeepAliveTimeout 65 Seconds
Nodejs http server - No timeout configured

Nodejs http server - No timeout configured

That may be the issue.

I wrote an extensive post regarding timeouts in relation to load balancers. It’s in a Heroku context, but the same applies in any architecture.
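Rather than editing files inside /usr/lib/node_modules, the more maintainable way is to start Parse Server via Express yourself and set the timeouts on the http.Server instance that listen() returns. A minimal sketch, untested, with assumed port and timeout values and placeholder env vars (pick values that fit your 60 s ALB / 65 s nginx settings):

const express = require('express');
const ParseServer = require('parse-server').ParseServer;

const app = express();
const api = new ParseServer({
  databaseURI: process.env.PARSE_SERVER_DATABASE_URI,
  appId: process.env.PARSE_SERVER_APPLICATION_ID,
  masterKey: process.env.PARSE_SERVER_MASTER_KEY,
  serverURL: 'http://localhost:1337/parse', // assumed port
});
app.use('/parse', api);

const httpServer = app.listen(1337, () => {
  console.log('parse-server running on port 1337');
});

// Keep-alive / header timeouts on the Node HTTP server itself.
// Assumed values: keep-alive longer than the 65 s nginx / 60 s ALB timeouts in front of it.
httpServer.keepAliveTimeout = 70 * 1000;
httpServer.headersTimeout = 75 * 1000; // must be greater than keepAliveTimeout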

Hi Manuel,

We configured the following file for parse-server -
/usr/lib/node_modules/parse-server/lib/ParseServer.js

With the following code -

static createLiveQueryServer(httpServer, config, options) {
    if (!httpServer || config && config.port) {
      var app = express();
      httpServer = require('http').createServer(app);
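      // timeouts we added on the HTTP server created in this branch
      // (note: createLiveQueryServer sets up the LiveQuery server, not the main Parse API server)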
      httpServer.timeout = 60 * 1000;
      httpServer.keepAliveTimeout = 70 * 1000;
      httpServer.headersTimeout = 120 * 1000;
      httpServer.listen(config.port);
    }

However, it still didn’t make any difference.
After around 30 seconds the 104 errors started again, and after around a minute we got the 502.
At around 10:14 IST the logs started coming through properly again.

Could it be some kind of issue in the Parse SDK where it’s constantly sending packets to parse-server, causing parse-server to work for only a minute at a time?

You could easily answer that by looking at the traffic.

You could make a diagram of the infrastructure and jot down all the timeouts to reconstruct how requests and responses travel through the components and which timeouts apply. This looks to me very much like a timeout misconfiguration. Each timeout has to be adjusted according to the whole context of timeouts. Remember, there are different types of timeouts a component may apply (nginx has a bunch of timeout types), and there may be default timeouts that you don’t see in the configuration if you don’t override them.
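From what you’ve posted so far, it would look roughly like this (values taken from this thread; everything else is a default unless you’ve overridden it somewhere):

mobile app (Parse SDK)
  → AWS ALB                  idle timeout: 60 s
  → nginx reverse proxy      keepalive_timeout: 65 s; proxy_connect/send/read timeouts: 60 s defaults
  → Node http server         keepAliveTimeout / headersTimeout: not configured (Node defaults)
  → Parse Server 4.5.0
  → MongoDB driver           connectTimeoutMS / socketTimeoutMS / maxIdleTimeMS: driver defaults
  → MongoDB 4.2.8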

If you can reliably reproduce the issue, you could inspect the network traffic using a tool like wireshark. That usually gives you direct insight into the scenario that causes the 502, but it may take some time to learn how to use the tool and interpret the logs.
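For example, you could capture on the Parse Server host with tcpdump and open the file in Wireshark afterwards (assuming Parse Server listens on port 1337 – adjust the port and interface to your setup):

sudo tcpdump -i any -w parse-502.pcap 'tcp port 1337'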

Wireshark and the diagram are excellent ideas!
The diagram might actually give us a clear understanding of the timeouts at each step.
Wireshark should be able to provide reliable proof of whatever the issue is.
However the problem is reliably reproducing the error.

Hi Manuel,

We’ve tried isolating the timeout issue, but no matter what we set, it just returns the same error:

MongoError: Cannot use a session that has ended
    at applySession (/usr/lib/node_modules/parse-server/node_modules/mongodb/lib/core/sessions.js:695:12)
    at _command (/usr/lib/node_modules/parse-server/node_modules/mongodb/lib/core/wireprotocol/command.js:58:17)
    at command (/usr/lib/node_modules/parse-server/node_modules/mongodb/lib/core/wireprotocol/command.js:28:5)
    at Object.query (/usr/lib/node_modules/parse-server/node_modules/mongodb/lib/core/wireprotocol/query.js:57:3)
    at Connection.query (/usr/lib/node_modules/parse-server/node_modules/mongodb/lib/cmap/connection.js:175:8)
    at /usr/lib/node_modules/parse-server/node_modules/mongodb/lib/core/sdam/server.js:309:12
    at Object.callback (/usr/lib/node_modules/parse-server/node_modules/mongodb/lib/cmap/connection_pool.js:345:7)
    at processWaitQueue (/usr/lib/node_modules/parse-server/node_modules/mongodb/lib/cmap/connection_pool.js:468:23)
    at /usr/lib/node_modules/parse-server/node_modules/mongodb/lib/cmap/connection_pool.js:261:28
    at processTicksAndRejections (internal/process/task_queues.js:79:11)
  

Is there anything we should do at the MongoDB level?

One of our devs found the following -
Here is the file that throws the error
https://mongodb.github.io/node-mongodb-native/3.1/api/node_modules_mongodb-core_lib_sessions.js.html

We need to handle proper DB client connection closure on the SDK side, i.e. close the client connection in the callback:

db.collection(collection)
  .find(query)
  .project(project)
  .sort(sort)
  .limit(limit)
  .toArray()
  .then(records => callback(records, null))
  .catch(err => callback(null, err))
  .finally(() => client.close()); // <=============== close the client connection here

Is this handled by parse-server/parse-sdk or is there a way to code this somewhere?

Please do let me know.

PS - We have also increased the DB instance size to a c5.xlarge (4 vCPUs, 8 GB RAM); the volume is a gp3 (SSD, 300 MB/s throughput and 6000 IOPS).

Did you add code like db.collection... client side or server side?

No, it’s just POSTs to the parse-server URL.
We were just wondering if there’s a way to adapt this to parse-sdk or parse-server to hopefully get it working?

Just another update -
I added maxPoolSize to the DB URI in config.json, and while we still get the 502 after around a minute, there are no more errors in either Nginx or Parse Server.
I don’t know if that helps with the diagnosis. This is what I appended to the URI:
&maxPoolSize=1000&serverSelectionTimeoutMS=20000
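So the full databaseURI now looks roughly like this (same placeholders as before):

"databaseURI": "mongodb://<username>:<URL-encoded password>@vimongo1:30000/<db name>?authSource=admin&tls=true&tlsCertificateKeyFile=<PEM file path>&tlsCAFile=<CA file path>&maxPoolSize=1000&serverSelectionTimeoutMS=20000"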

Hi @Manuel,

I think I’ve kind of got it working; I’ve kept maxPoolSize in the DB connection string.
I’ve also changed how I start parse server -
node --max-old-space-size=1500 /bin/parse-server /parse/config.json

It’s been stable since yesterday.
However, some logs still aren’t getting captured.
Our current rate is 900,000 records per 30 minutes, while we should be getting at least 1,6xx,xxx.

Any idea on what I should tune further on node or parse server or Mongo?

Hard to tell from here. A “higher number of logs” is quite an abstract expectation; I think you’d need to pin it down to a specific issue.

From what I can understand, it’s an issue with the connection between Parse-server and MongoDB.

After adding maxPoolSize=1000 to the DB connection string in the parse-server config file, it has started generating more logs.
At the same time, the node --max-old-space-size setting gives the parse-server process more heap to work with before it hits V8’s memory limit.