ParseSwift SDK: Observe LiveQuery WebSocket status

Hm, that makes sense, and I guess it will be the same in case of a restart or connection throttling. So if there is no way to handle connection problems through that delegate method, the clients should detect connection problems in some other way. One idea would be to ignore connectivity altogether and just react to the response from the object’s .save(), which could be rejected by a cloud code function if the object in the server database is newer. In other words, if the server rejected a .save() request, the client app would realise that the data on the device might be obsolete and would try to refetch the actual state and restart LiveQuery. But here I got stuck again.

First I tried:

        let client = ParseLiveQuery.getDefault()
        client?.close()
        client?.open(completion: { error in
            print("error opening LQ: \(error)")
        })

The client?.close() call seems to close the task, but it keeps client?.isSocketEstablished = true, which might be correct behaviour. I would, however, expect it to be set to false in this case (at least by the URLSessionWebSocketDelegate, but there the didCloseWith method does not get called either, even when the connection to the server is live). Is there a reason why status(.closed) is not set here? Or should there be some URLSession invalidation in there?

Because when client?.open(completion:...) tries to open the client again, the task receives an error on line 59:

Optional

  • some : Error Domain=NSURLErrorDomain Code=-999 "cancelled" UserInfo={NSErrorFailingURLStringKey=https://felse.b4a.io/, NSLocalizedDescription=cancelled, NSErrorFailingURLKey=https://felse.b4a.io/}

This error does not occur on the fresh LQ connection during app launch, and as there is no error handling after line 506, the code just falls through without setting isConnecting.

On a fresh app start the task values are the same as on the later open attempt:

Printing description of task:
LocalWebSocketTask <9A3BC5D5-…7F05C1967>.<1>

Printing description of encodedAsString:
"{"op":"connect","applicationId":"coYfu…Ug4p","clientKey":"L8PDq…87gl","sessionToken":"r:5a…baf8","installationId":"7ec2…c9ca"}"

Perhaps .close() does not let the server know that the socket is closed, and that’s why it is being cancelled during the later .open()…? As I am not experienced enough, I am not sure if that is desired or a bug.

Even if I would not worry about .close() and .open() and would just unsubscribe and subscribe again with

try subscription!.query.unsubscribe()

the .send() function receives an error for both unsubscribe and subscribe on line 73/77:

Optional

  • some : Error Domain=NSPOSIXErrorDomain Code=57 "Socket is not connected" UserInfo={NSErrorFailingURLStringKey=https://felse.b4a.io/, NSErrorFailingURLKey=https://felse.b4a.io/}

As it has no error handling, this again falls through:

Printing client booleans shows as previously that socket is established:

testing subscription: Optional(ParseSwift.SubscriptionCallback<Felse.PrsProfile>)
testing LiveQuery isConnected: Optional(true)
testing LiveQuery isConnecting: Optional(false)
testing LiveQuery isSocketEstablished: Optional(true)
testing LiveQuery isSubscribed: Optional(true)
testing LiveQuery isPendingSubscription: Optional(true)

That’s why I believe there is no way around solving the socket status. And when the URLSessionWebSocketDelegate doesn’t call didCloseWith (which, as I understood, it cannot when the connection is dead or the server is down), then only a manual reset would help, right? In that case I would need to clarify whether there should be a manual status change to closed, or an invalidation of the URL session, below line 130 here:

There can be multiple LiveQuery connections through one socket, closing 1 connection shouldn’t close the socket itself.

The following PR may help with determining the status of a parse server being available:

LiveQuery ping pong will be a great feature to confirm the connection status. Nevertheless without resolving the issue of not being able to restart the LiveQuery/subscription I cannot take much advantage of ping-pong.

I noticed one more thing in the Xcode debug navigator… When I launched the app for the first time today, it had one active connection that sent/received a few kB every 10 s. When I uploaded the cloud code, the debug console printed an error and the active connection disappeared immediately, along with the sent/received kB. Still no notification in the URLSessionWebSocketDelegate that the connection was closed:

2021-06-24 11:27:58.758572+0200 Felse[2054:92561] Connection 3: encountered error(1:53)

The next launch opens 3 connections, of which only one is sending/receiving traffic:

When I reset the server / uploaded cloud code, only that one active connection disappeared immediately again, after sending/receiving a few kB:

A few seconds later the two other connections also closed, having only sent traffic:

So I tried launching the app a third time, and again 3 connections were open. After about 95 s of letting the app do nothing, the two connections closed again, having only sent traffic, and only the one active LiveQuery connection remained:

For now I will ignore the fact that there are 3 active connections until Back4App support gets back to me, but I wonder… Shouldn’t the URLSessionWebSocketDelegate call didCloseWith when it seems that Xcode knows that the connection was closed?

I’ve looked further into ParseLiveQuery and fixed a bug where a web socket task was being reused after it was closed. I also addressed some of the error handling you mentioned. The fixes are in the PR I mentioned earlier.

You can try out the PR and test out the updated playgrounds:

I can confirm that with the fix I am able to reset LiveQuery successfully and also in the cloud code info log the messages appear immediately now (not sure if Back4App has done anything on their side as I got no feedback yet).

First I tried only to .unsubscribe() and .subscribe() after a broken connection (cloud code upload, server reset,…) but that was not successful as it gives the error for both functions:

Error Domain=NSPOSIXErrorDomain Code=57 "Socket is not connected" UserInfo={NSErrorFailingURLStringKey=https://felse.b4a.io/, NSErrorFailingURLKey=https://felse.b4a.io/}

Just using .unsubscribe() and .subscribe() works well before the connection is broken. So I adopted the .closeAll() function:

ParseLiveQuery.getDefault().closeAll()

Here I noticed there are 3 ways to do it:

#1

  1. .unsubscribe()
  2. .closeAll()
  3. var subscription = query.subscribeCallback! with setting completion handlers

#2

  1. .closeAll()
  2. var subscription = query.subscribeCallback! !without! setting completion handlers, otherwise it doubles the event handlers

#3 ← I ended up using this and it works perfectly

  1. ParseLiveQuery.getDefault().closeAll()
  2. ParseLiveQuery.getDefault().open(completion: { error in … })
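
Approach #3 could be wrapped in a small helper so that the app has one place to trigger a reset. The sketch below is only an illustration: `LiveQueryResettable` and `LiveQueryResetter` are hypothetical names, not part of ParseSwift, and the protocol merely mirrors the closeAll() and open(completion:) calls used above (the `(Error?) -> Void` completion type is an assumption). It also skips a reset request that arrives while another one is still in flight.

```swift
import Foundation

/// Hypothetical stand-in for the LiveQuery client, so the sketch runs without the SDK.
/// It only mirrors the two calls used in approach #3.
protocol LiveQueryResettable: AnyObject {
    func closeAll()
    func open(completion: @escaping (Error?) -> Void)
}

/// Runs "close everything, then reopen" and ignores overlapping requests.
final class LiveQueryResetter {
    private let client: LiveQueryResettable
    private let lock = NSLock()
    private var isResetting = false

    init(client: LiveQueryResettable) {
        self.client = client
    }

    /// Note: the completion of a request that is ignored is never called.
    func reset(completion: @escaping (Error?) -> Void = { _ in }) {
        lock.lock()
        let alreadyRunning = isResetting
        isResetting = true
        lock.unlock()
        guard !alreadyRunning else { return }

        client.closeAll()
        client.open { error in
            // Allow the next reset only once this open attempt has finished.
            self.lock.lock()
            self.isResetting = false
            self.lock.unlock()
            completion(error)
        }
    }
}
```

In an app, a thin adapter conforming to the hypothetical protocol would forward both calls to the client returned by ParseLiveQuery.getDefault().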

What would be the correct way from the server side? In the cloud code info log I see that it disconnects even using the second way (not calling .unsubscribe()):

2021-06-26T11:28:31.866Z - Client disconnect: ac12dfda-d1ad-4999-a08d-6effaf448a05

As the connection is already broken at that point I believe there is no advantage of calling .unsubscribe() on the server side, right? But can there be any zombie subscriptions in the LiveQuery server hanging?

Thank you for solving this issue! I believe, now there is a robust way to reset LQ connection without restarting the client app!

I don’t know much about how live query works on the server side to answer what you should do there. My guess is that once a connection is closed the server will discard the subscriptions, but @davimacedo might have more info here.

I’ll point out that now on the client side, after you unsubscribe from all of your subscriptions, it will automatically close the connection.

The second and third ways you mentioned seem reasonable to me. It just depends on your scenario. If you are subscribing to a new query in scenario 2, it should reconnect and also resubscribe to any previous queries as well. Scenario 3 should reconnect and resubscribe to all previous queries.

Note that you can also use: ParseLiveQuery.client?.open, ParseLiveQuery.client?.openPublisher, ParseLiveQuery.client?.close(), ParseLiveQuery.client?.closeAll()

Thank you for the clarification. In my case #3 is the most elegant, and it works after a broken connection (and possibly also with a live connection).

As you mentioned, I noticed that .unsubscribe() closes the connection (if not broken before), which I can confirm in Xcode and via the cloud code info log. Just out of curiosity, shouldn’t the URLSessionWebSocketDelegate call didCloseWith at that moment? I see this line in the debug console:

2021-06-26 19:20:25.070351+0200 Felse[6674:997878] [websocket] Read completed with an error Operation canceled

But when I put a breakpoint on the following lines, the function does not get called, although that is what I would expect (with my limited knowledge)…

    func urlSession(_ session: URLSession,
                    webSocketTask: URLSessionWebSocketTask,
                    didCloseWith closeCode: URLSessionWebSocketTask.CloseCode,
                    reason: Data?) {
        self.delegates.forEach { (_, value) -> Void in
            value.status(.closed)
        }
    }

When you unsubscribe from all subscriptions or call close or closeAll the socket is being closed from the client side (see Apple documentation for cancel()), not the server side. I’ll defer to the Apple documentation for the delegate, but my interpretation is the delegate method gets called when the server requests to close the connection which will then send a close frame to tell the client to gracefully close:
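
To see the difference in practice, a delegate can log both callbacks side by side. This is a minimal sketch using only Foundation; `SocketCloseObserver` is a hypothetical name and not part of ParseSwift. The assumption being illustrated is that didCloseWith fires for a close frame received from the server, while a client-side cancel or a dropped connection should surface through the general task-completion callback instead.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Hypothetical observer that records how a WebSocket task ended.
final class SocketCloseObserver: NSObject, URLSessionWebSocketDelegate, @unchecked Sendable {
    enum Event: Equatable {
        case closedByServer(code: Int)
        case completed(failed: Bool)
    }

    private let lock = NSLock()
    private var storage: [Event] = []

    /// Thread-safe snapshot; delegate callbacks arrive on the session's delegate queue.
    var events: [Event] {
        lock.lock(); defer { lock.unlock() }
        return storage
    }

    private func record(_ event: Event) {
        lock.lock(); defer { lock.unlock() }
        storage.append(event)
    }

    // Called when a close frame arrives from the server endpoint.
    func urlSession(_ session: URLSession,
                    webSocketTask: URLSessionWebSocketTask,
                    didCloseWith closeCode: URLSessionWebSocketTask.CloseCode,
                    reason: Data?) {
        record(.closedByServer(code: closeCode.rawValue))
    }

    // Called when the task finishes for any reason, with or without an error.
    func urlSession(_ session: URLSession,
                    task: URLSessionTask,
                    didCompleteWithError error: Error?) {
        record(.completed(failed: error != nil))
    }
}
```

Passing an instance as the delegate of the URLSession that creates the task, and then breaking the connection in different ways, shows which of the two callbacks actually arrives in each case.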

I see, thank you for the patience and great clarification!

In addition you can also receive connection metrics on the client side and make decisions from there. You can do that by becoming a receiveDelegate:

http://parseplatform.org/Parse-Swift/api/Classes/ParseLiveQuery.html#/s:10ParseSwift0A9LiveQueryC15receiveDelegateAA0acdF0_pSgvp

And then using received(_ metrics: URLSessionTaskTransactionMetrics)

with this snippet:

let client = ParseLiveQuery.getDefault()
client?.receiveDelegate = self

I tried the ParseLiveQueryDelegate as you proposed and it indeed notifies me that there was a disconnection:

func received(_ error: ParseError) {
    print("received: \(error)")
}

This is received 2x, no matter whether I subscribe to 1 or 2 queries:

received: ParseError code=-1 error=The operation couldn’t be completed. Socket is not connected

Digging a bit deeper in the LiveQuerySocket function…

…I see that line 97 calls receive(task) again, which is a bit confusing to me. Perhaps you could shed some light on that. But as it is not related to the disconnection, I went further to lines 103 and 104. I printed the error before it gets translated to a ParseError (line 103):

Error Domain=NSPOSIXErrorDomain Code=57 "Socket is not connected" UserInfo={NSErrorFailingURLStringKey=https://felse.b4a.io/, NSErrorFailingURLKey=https://felse.b4a.io/}

So the error code that should trigger a query reset is 57 “Socket is not connected”, and at that point ParseLiveQuery should close itself (which it currently doesn’t). We can see that line 104 passes it to ParseLiveQuery line 473 and then to the receiveDelegate.

Why it gets called 2x I could not understand, even after setting a lot of breakpoints. Let’s assume I ignore the second call with some Bool flag. I would like to implement a reconnecting feature in the receive delegate. Here I noticed:

  1. that all functions are mandatory, which makes the delegate look like this:
extension ParseService: ParseLiveQueryDelegate {
    
    func received(_ challenge: URLAuthenticationChallenge, completionHandler: @escaping (URLSession.AuthChallengeDisposition, URLCredential?) -> Void) {
        
    }
    
    func received(_ error: ParseError) {
        print("received: \(error)")
    }
    
    func receivedUnsupported(_ data: Data?, socketMessage: URLSessionWebSocketTask.Message?) {
        
    }

    func received(_ metrics: URLSessionTaskTransactionMetrics) {
        
    }

    func closedSocket(_ code: URLSessionWebSocketTask.CloseCode?, reason: Data?) {
        
    }
        
}

Do you think that the functions could be made optional, or it would break some logic?

  2. func received(_ error: ParseError) receives an already translated ParseError and therefore hides the WebSocket error code 57. As ParseError does not yet have a code 57, I think that it could be added and then passed through this function. Or, on the other hand, the function could pass the original Error instead of a ParseError.

  3. Or maybe the SDK itself could react to code 57 and try to reconnect, so that the client would not need to implement receiveDelegate to handle this?
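
For illustration, recognising that condition on the raw error (before it is wrapped in a ParseError) could look like the sketch below. `isSocketNotConnected` is a hypothetical helper, not SDK code, and it assumes the original Error is available at the call site.

```swift
import Foundation

/// Returns true when `error` is the POSIX "Socket is not connected" error.
/// ENOTCONN has the raw value 57 on Apple platforms, matching the log above.
func isSocketNotConnected(_ error: Error) -> Bool {
    let nsError = error as NSError
    return nsError.domain == NSPOSIXErrorDomain
        && nsError.code == Int(POSIXErrorCode.ENOTCONN.rawValue)
}
```

Comparing against POSIXErrorCode.ENOTCONN rather than the literal 57 keeps the check readable and avoids hard-coding a platform-specific number.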

What do you think? Thank you!

This is a requirement of URLSessionWebSocketTask. Apple has a video describing how URLSessionWebSocketTask works and there’s a blog that discusses:

This was already implemented, the extension just wasn’t public. The PR below makes it public.

You can test out the branch to see if it works: LiveQuery socket should always continue receiving by cbaker6 · Pull Request #204 · parse-community/Parse-Swift · GitHub

Great! Thank you for clarification!

As I am testing it out, it calls self.open(isUserWantsToConnect: false) { _ in } on line 489

But I believe that passing the parameter isUserWantsToConnect: false does not set isConnected = false, and therefore it stays true (as my debug prints show).

And stepping through with breakpoints revealed that open(isUserWantsToConnect:) returns on line 540

Calling self.open(isUserWantsToConnect: true) { _ in } feels incorrect, so what if isConnected were set to false right before calling self.open(isUserWantsToConnect: false) { _ in }?

func receivedError(_ error: Error) {
        guard let posixError = error as? POSIXError else {
            notificationQueue.async {
                self.receiveDelegate?.received(error)
            }
            return
        }
        if posixError.code == .ENOTCONN {
            if attempts + 1 >= ParseLiveQueryConstants.maxConnectionAttempts + 1 {
                let parseError = ParseError(code: .unknownError,
                                            message: """
Max attempts (\(ParseLiveQueryConstants.maxConnectionAttempts) reached.
Not attempting to connect to LiveQuery server anymore.
""")
                self.receiveDelegate?.received(parseError)
            }
            self.isConnected = false   //<---- setting is connected to false here
            self.open(isUserWantsToConnect: false) { _ in }
        } else {
            notificationQueue.async {
                self.receiveDelegate?.received(error)
            }
        }
    }

Another state that could be covered is a failed self.open(isUserWantsToConnect: false) { _ in } attempt. Since it has an empty completion block and open(isUserWantsToConnect:) can fail with an error, returning that error to the empty completion block would not inform the receiveDelegate that the reconnection failed. But perhaps this is covered by the status(_ status: LiveQuerySocket.Status, closeCode: URLSessionWebSocketTask.CloseCode?, reason: Data?) protocol function; I will have a look at how that behaves…

Let me know how the PR below works:

This recovers the connections successfully. It just doesn’t work when the server has not booted up yet. In my example, when I upload new cloud code, the server hard-disconnects and resumeTask() tries to open the connection again, but I guess that the server (or the client?) is not ready yet. So in this example it works only if I hold the execution for a few seconds with a breakpoint, giving the server some time.

func resumeTask() {
        synchronizationQueue.sync {
            switch self.task.state {
            case .suspended:
                isSocketEstablished = false
                task.resume()
                URLSession.liveQuery.receive(self.task)
                URLSession.liveQuery.delegates[self.task] = self
            case .completed, .canceling:
                URLSession.liveQuery.delegates.removeValue(forKey: self.task)
                isSocketEstablished = false 
/* -----> */ task = URLSession.liveQuery.createTask(self.url) //<---- Breakpoint 5-10 seconds
                task.resume()
                URLSession.liveQuery.receive(self.task)
                URLSession.liveQuery.delegates[self.task] = self
            case .running:
                isConnected = false
                isSocketEstablished = true
                open(isUserWantsToConnect: false) { _ in }
            @unknown default:
                break
            }
        }
    }

With the help of that breakpoint I see in the debug console:

Successfully subscribed to new query Inbox ({"limit":100,"skip":0,"_method":"GET","where":{"rid":"3xwiNx3zsU"}})
Successfully subscribed to new query Group ({"limit":100,"skip":0,"_method":"GET","where":{"objectId":{"$in":["08BWZVHzES"]}}})
2021-08-01 18:24:52.717398+0200 Felse[20203:1835312] Connection 3: missing error, so heuristics synthesized error(1:53)
2021-08-01 18:24:52.717694+0200 Felse[20203:1835312] Connection 3: encountered error(1:53)
Successfully subscribed to new query Group ({"limit":100,"skip":0,"_method":"GET","where":{"objectId":{"$in":["08BWZVHzES"]}}})
Successfully subscribed to new query Inbox ({"limit":100,"skip":0,"_method":"GET","where":{"rid":"3xwiNx3zsU"}})

But without the breakpoint it does not recover.

The latest commit may help as it should add some delay before attempting to reconnect.

If the problem still is there, I recommend using the delegates to handle your custom situations. If you see a place in the SDK to improve, feel free to submit a PR.

Ah, great, thank you for implementing the delay there. Unfortunately the reconnection interval is too short and it does not help in this case.

I had a look, and the very first breakpoint showed that the number of attempts was 4, so I put the calculation of the reconnection interval into a playground and found out that it mostly generates 0 seconds:

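import Foundation  // pow(_: Decimal, _: Int) and NSDecimalNumber

for _ in 1...5 {
    // Repeat the experiment 5x for attempt counts 1 through 4.
    var intervals: [Int] = []
    for i in 1..<5 {
        // Upper bound in seconds: min(30, 2^i - 1)
        let min = NSDecimalNumber(decimal: Swift.min(30, pow(2, i) - 1))
        // The range excludes the bound itself, so attempt 1 (bound 1) always yields 0.
        intervals.append(Int.random(in: 0 ..< Int(truncating: min)))
    }
    print(intervals)
}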

It seems that this random Int causes the reconnectionInterval to often not wait at all, as it gives back 0 seconds:

[0, 1, 2, 14]
[0, 0, 4, 3]
[0, 1, 1, 4]
[0, 0, 0, 13]
[0, 2, 6, 14]

Without Int.random(in: 0 ..< Int(truncating: min)), using only Int(truncating: min), it gives back seemingly more reasonable intervals.

[1, 3, 7, 15]
[1, 3, 7, 15]
[1, 3, 7, 15]
[1, 3, 7, 15]
[1, 3, 7, 15]

What is the idea behind the random integer there? When I tried to understand the behaviour, I noticed that even during app launch resumeTask() is triggered many times, mostly with a random reconnectionInterval. So I made a fork and tested it with the second (non-random) variant, and I can confirm that it reconnects successfully after I upload new cloud code. I submitted my first PR ever, so let me know if I should adjust anything.
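
Randomising a retry delay is commonly done as "jitter", so that many clients do not all reconnect at the same instant; whether that was the intent here is not stated in this thread. A possible middle ground keeps the randomness but enforces a lower bound, so the delay can never be 0. The sketch below is a hypothetical helper, not SDK code; it reuses the min(30, 2^attempt - 1) ceiling from the playground snippet, and the "half the ceiling" floor is an arbitrary choice for illustration.

```swift
import Foundation

/// Hypothetical helper: exponential backoff with jitter and a non-zero floor.
/// Returns a delay in whole seconds for the given attempt number (1-based).
func reconnectInterval(attempt: Int, cap: Int = 30) -> Int {
    // Clamp the exponent so the shift below cannot overflow.
    let exponent = max(0, min(attempt, 16))
    // Same ceiling as the playground snippet, but never below 1 second.
    let ceiling = max(1, min(cap, (1 << exponent) - 1))
    // Floor at half the ceiling (at least 1) so the client always waits.
    let lower = max(1, ceiling / 2)
    return Int.random(in: lower...ceiling)
}
```

With these bounds, attempts 1 through 5 draw from 1...1, 1...3, 3...7, 7...15 and 15...30 seconds respectively.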

For reference for when others see this, linking to your last comment on Github where you mention this is solved: Removing random Int in the reconnection interval of ParseLiveQuery and added warning in playgrounds by lsmilek1 · Pull Request #208 · parse-community/Parse-Swift · GitHub

I am afraid I have to come back to this never-ending topic. After further working with live query and cloud code I noticed that the LQ reconnects:

  • always when back4app container goes to sleep. It wakes it up again and reconnects
  • only sometimes (actually almost never) when I upload cloud code and the container restarts.

Implementing receiveDelegate, I can see that the delegate receives only one error, which consistently reports a count of 4 time(s):

ParseError code=-1 error=ParseLiveQuery Error: attempted to open socket 4 time(s)

And as previously mentioned, if I use a breakpoint and hold the execution a bit, it always reconnects successfully. So I still suspect that there is a race condition that removes the task delegate and quits the reconnection loop. Shouldn’t the completion on line 587 be called inside the completion block of self.resumeTask { _ in } on line 583?

so that the loop waits on potential task creation (236-241)? I’m not sure about the default break though.

Another improvement could be to not set the reconnection interval to 0, but that is a bit hacky, as we discussed.
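
The ordering being asked about can be shown in isolation. In the sketch below, `Connector`, its `resumeTask(completion:)` and its `open(completion:)` are hypothetical stand-ins, not the SDK's implementation; the delay only simulates asynchronous task creation. The point is that the caller's completion runs inside the resumeTask callback, so it cannot fire before the task exists.

```swift
import Foundation

/// Hypothetical connector; names are illustrative and not taken from ParseSwift.
final class Connector {
    private let queue = DispatchQueue(label: "connector.sync")
    /// Order of events, appended only on `queue`.
    private(set) var log: [String] = []

    /// Stand-in for asynchronous task creation.
    func resumeTask(completion: @escaping (Error?) -> Void) {
        queue.asyncAfter(deadline: .now() + 0.05) {
            self.log.append("task ready")
            completion(nil)
        }
    }

    /// Reports back only after resumeTask has finished.
    func open(completion: @escaping (Error?) -> Void) {
        resumeTask { error in
            // Runs on `queue`, after the task has been created.
            self.log.append("open finished")
            completion(error)
        }
    }
}
```

If the completion were instead called right after resumeTask returns, "open finished" could be reported while the task is still being created, which is the kind of ordering problem a breakpoint delay would hide.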

Can you try out:

Update: SDK version 1.9.4 should address the reconnection issues: