Complex queries and need for ElasticSearch

Hello,

Please pardon my limited experience: this is my first project with Parse Server, and my first in programming overall (I am a mechanical engineer…).

I am trying to build a mobile app that connects people, mainly for language tandems, fellow travellers, and compatriots in a foreign country, and that also supports dating (as we all know from Tinder). The visual side is pretty well defined, and you can have a look at the prototype here: Felse

I decided to go with Parse so as not to lock the project into Google Firebase. As I am already at the phase where I need to define the back end and freeze the data structure, I would like to kindly ask for your opinions and experience. If this topic should be opened elsewhere, I apologise; feel free to point me to other forums.

As you might have noticed, there are multiple types of search, and each has slightly different criteria. Apart from the “travel” one, I got them working, and fetching is fast enough while I have only the first 10 fake profiles in my prototype database. I constructed the following query function in Swift (Xcode):

func fetchProfiles(completion: (([Profile]) -> Void)? = nil) {
    
    guard let userController = self.userController else { return }
    let currentProfile = userController.currentProfile
    let currentUser = userController.currentPrivateUser
    let currentGender = Gender(rawValue: currentProfile.gender)
    guard currentGender != .notDefined else {
        if let topVC = UIApplication.topViewController() {
            UIAlerts.setGenderAlert(viewController: topVC)
        }
        return
    }

    //create basic query with values that are common for all specific queries
    var basicConstraints = [QueryConstraint]()
    //restrict age to preferred range
    basicConstraints.append(contentsOf: ["age" > currentUser.minAge, "age" < currentUser.maxAge])
    if currentProfile.sameSexualityOnly {
        basicConstraints.append("orie" == currentProfile.orientation)
    }
    //if user prefers only same sexuality partners, set criteria for the inversed values so that it finds also profiles with not specified partners orientation (nil)
    for sexuality in SexOrientation.allCases {
        if sexuality.rawValue != currentProfile.orientation {
            basicConstraints.append("pOri" != sexuality.rawValue)
        }
    }
    //TODO: add a constraint that prevents fetching the same profile over and over again
    //perhaps "objectId" notContainedBy, but then the profile would need to store a huge array of user connections/interactions/"swipes"
    let basicQuery = PrsProfile.query(basicConstraints)
    
    //create specific queries for each search type (this should extend to 6+ later)
    var specificQueries: [Query<PrsProfile>] = []
    // ----- date partners -----
    //TODO: add check --> only if location is defined
    if currentProfile.date {
        var constraints = [QueryConstraint]()
        //look for people that are looking for other gender or notDefined. If field "date" is nil, then the users do not look for date
        constraints.append(currentGender == .woman ? "date" < Gender.man.rawValue : "date" > Gender.woman.rawValue)
        //if device user specify certain gender, look only for users of that gender
        if currentProfile.dateData != Gender.notDefined.rawValue {
            constraints.append("gndr" == currentProfile.dateData)
        }
        //restrict search only within given geoBox. Query type "near" cannot be used in combine query (must be top query)
        //TODO: investigate performance difference between "near" and "withinGeoBox"
        constraints.append(withinGeoBox(key: "loc", fromSouthWest: ParseGeoPoint(latitude: 45, longitude: 45), toNortheast: ParseGeoPoint(latitude: 55, longitude: 55)))
        //if user is interested in one-night-stand only, filter that field also
        if currentProfile.ons {
            constraints.append("ons" == true)
        }

        //append query to array for combined query call
        let dateQuery = PrsProfile.query(constraints)
        specificQueries.append(dateQuery)
    }


    // ----- language tandem partners -----
    if currentProfile.tandem, currentProfile.learnLanguages.count > 0 {
        var constraints = [QueryConstraint]()

        //look for people that are looking for other gender or notDefined. If field "tndm" is nil, then the users do not look for language tandem partners
        constraints.append(currentGender == .woman ? "tndm" < Gender.man.rawValue : "tndm" > Gender.woman.rawValue)
        //if device user specify certain gender, look only for users of that gender
        if currentProfile.tandemData != Gender.notDefined.rawValue {
            constraints.append("gndr" == currentProfile.tandemData)
        }
        //check for people speaking or learning any of the languages that the device user is learning. It has to be "or" to fetch users that speak at least one of them
        var languageQueries = [Query<PrsProfile>]()
        for language in currentProfile.learnLanguages {
            languageQueries.append(PrsProfile.query(containsAll(key: "nlg", array: [language])))
            languageQueries.append(PrsProfile.query(containsAll(key: "slg", array: [language])))
        }
        constraints.append(or(queries: languageQueries))

        //append query to array for combined query call
        let tandemQuery = PrsProfile.query(constraints)
        specificQueries.append(tandemQuery)
    }


    // ----- travel buddy query -----
    if currentProfile.travel, currentProfile.tripsGeos.count > 0 {
        var constraints = [QueryConstraint]()

        //look for people that are looking for other gender or notDefined. If field "trvl" is nil, then the users do not look for travel tandem buddy
        constraints.append(currentGender == .woman ? "trvl" < Gender.man.rawValue : "trvl" > Gender.woman.rawValue)
        //if device user specify certain gender, look only for users of that gender
        if currentProfile.travelData != Gender.notDefined.rawValue {
            constraints.append("gndr" == currentProfile.travelData)
        }
        //go through the current user's travel destinations and search for users that have the same geohash in their profile and:
        // 1) end date of their trip is larger than current user start date
        // 2) start date of their trip is smaller than current user end date
        //that guarantee to fetch only people that intersect with current user trip
        var travelDestinationsQueries = [Query<PrsProfile>]()
        for geoHash in currentProfile.tripsGeos {
            let startDateString = geoHash.geohash + String(describing: geoHash.startDate)
            let endDateString = geoHash.geohash + String(describing: geoHash.endDate)
            //TODO: How to fetch?
            //profile can have two arrays "startGeos" & "endGeos"
            //but then I would need to combine somehow "hasPrefix" with "largerThan"/"smallerThan"
            //Saving tripGeos as a separate object type is an option, but assuming that each user has 5 tripGeos in average, this creates another huge table eventually
        }

        //append query to array for combined query call
        let travelQuery = PrsProfile.query(constraints)
        specificQueries.append(travelQuery)
    }


    // ----- compatriots query -----
    // TODO: guard that user is not living in a country where his native language is official language --> could be misused for more match count
    if currentProfile.compatriots, currentProfile.nativeLanguages.count > 0 {
        var constraints = [QueryConstraint]()

        //look for people that are looking for other gender or notDefined. If field "comp" is nil, then the users do not look for compatriots
        constraints.append(currentGender == .woman ? "comp" < Gender.man.rawValue : "comp" > Gender.woman.rawValue)
        //if device user specify certain gender, look only for users of that gender
        if currentProfile.compatriotsData != Gender.notDefined.rawValue {
            constraints.append("gndr" == currentProfile.compatriotsData)
        }
        //check for people speaking the same native language. It has to be "or" to fetch users that speak at least one of nativeLanguages
        var nativeLanguageQueries = [Query<PrsProfile>]()
        for language in currentProfile.nativeLanguages {
            nativeLanguageQueries.append(PrsProfile.query(containsAll(key: "nlg", array: [language])))
        }
        constraints.append(or(queries: nativeLanguageQueries))
        //restrict search only within given geoBox. Query type "near" cannot be used in combine query (must be top query)
        //TODO: investigate performance difference between "near" and "withinGeoBox"
        constraints.append(withinGeoBox(key: "loc", fromSouthWest: ParseGeoPoint(latitude: 45, longitude: 45), toNortheast: ParseGeoPoint(latitude: 55, longitude: 55)))

        //append query to array for combined query call
        let compatriotsQuery = PrsProfile.query(constraints)
        specificQueries.append(compatriotsQuery)
    }

    guard specificQueries.count > 0 else {
        print("cannot run combined query on empty Queries array")
        return
    }

    //generates combined query where it returns profiles that match any of the 4 query types
    let combinedSpecificQuery = PrsProfile.query(or(queries: specificQueries))
    
    //creates final query
    let finalQuery = PrsProfile.query(and(queries: [basicQuery, combinedSpecificQuery]))
    print("---> \n \(combinedSpecificQuery) \n <----")
    finalQuery.find { result in
        switch result {
        case .success(let profiles):
            print("parse profiles found: \(profiles)")
        case .failure(let error):
            fatalError(error.localizedDescription)
        }
    }
    
    
}

Printing the query shows the size of the request, which I do not find huge:

Query(method: “GET”, limit: 100, skip: 0, keys: nil, include: nil, order: nil, isCount: nil, explain: nil, hint: nil, where: ParseSwift.QueryWhere(constraints: ["$or": [ParseSwift.QueryConstraint(key: “$or”, value: [ParseSwift.OrAndQuery<Felse.PrsProfile>(query: ParseSwift.Query<Felse.PrsProfile>(method: “GET”, limit: 100, skip: 0, keys: nil, include: nil, order: nil, isCount: nil, explain: nil, hint: nil, where: ParseSwift.QueryWhere(constraints: [“ons”: [ParseSwift.QueryConstraint(key: “ons”, value: true, comparator: nil)], “date”: [ParseSwift.QueryConstraint(key: “date”, value: 0, comparator: Optional(Comparator(stringValue: “$gt”, intValue: nil)))], “gndr”: [ParseSwift.QueryConstraint(key: “gndr”, value: 0, comparator: nil)], “loc”: [ParseSwift.QueryConstraint(key: “loc”, value: ["$box": [GeoPoint ({"__type":“GeoPoint”,“longitude”:45,“latitude”:45}), GeoPoint ({"__type":“GeoPoint”,“longitude”:55,“latitude”:55})]], comparator: Optional(Comparator(stringValue: “$within”, intValue: nil)))]]), excludeKeys: nil, readPreference: nil, includeReadPreference: nil, subqueryReadPreference: nil, distinct: nil, fields: nil)), ParseSwift.OrAndQuery<Felse.PrsProfile>(query: ParseSwift.Query<Felse.PrsProfile>(method: “GET”, limit: 100, skip: 0, keys: nil, include: nil, order: nil, isCount: nil, explain: nil, hint: nil, where: ParseSwift.QueryWhere(constraints: [“trvl”: [ParseSwift.QueryConstraint(key: “trvl”, value: 0, comparator: Optional(Comparator(stringValue: “$gt”, intValue: nil)))]]), excludeKeys: nil, readPreference: nil, includeReadPreference: nil, subqueryReadPreference: nil, distinct: nil, fields: nil))], comparator: nil)]]), excludeKeys: nil, readPreference: nil, includeReadPreference: nil, subqueryReadPreference: nil, distinct: nil, fields: nil)

I understand that this is a very broad topic and that there may be many ways to achieve the desired behaviour. The reason I ask is to avoid “technical debt” and to keep my project from falling apart once it reaches a certain number of users (I have invested 2 years of my free time in it so far). To make a long story short, here are my questions:

  1. Any hints on how to search through the “tripGeos” and return only profiles that intersect with my tripGeo (geohash, startDate, endDate)?

  2. How can I prevent refetching the same profiles again and again? I would need to pass an array to basicConstraints.append(notContainedIn(key: <String>, array: <[Encodable]>)), but this array might grow to 1000+ objectIds over time.

  3. Is it realistic to use such a query on a database that grows larger? It works for now, with 10 entries in the table. Does anyone have experience with 10,000, 100,000, or more entries?

  4. As a beginner, I can’t clearly decide whether I should implement ElasticSearch to handle this functionality, since it adds a lot of complexity (I am currently using back4app) and I have no experience with ElasticSearch.

Any opinion or comment is welcome!

Thank you kindly!

Congrats on your app! While I won’t directly answer your questions, I will share the experience I have gained over the last 2 years.

I’m developing a social media app. You can see it on Google Play Store. While I developed this app I also worked on complex queries.

I should say that Parse doesn’t always handle queries in the most efficient way. For example, Parse has the matchesQuery option.

Let’s say you have users, and users can create trips. You will have two tables, _User and Trip, and the Trip table will have a field called user, so you know which user created each trip.
And let’s say users can set their profile to private, so that only their friends can see their upcoming trips.
Imagine you want to search trips, and you may only show trips from public profiles. How do you do it?

The Parse way is to create two queries.

//First create User Query
const userQuery = new Parse.Query("_User");
userQuery.equalTo("is_public",true); //You only get public profiles

//Then Create Trip Query
const tripQuery = new Parse.Query("Trip");
tripQuery.matchesQuery("user", userQuery);

This will work at the beginning. But every time you run this query, all users with a public profile will be fetched. If you have a couple of hundred users this will work, but what if you have 100k users? The query will time out, and you will load the server for nothing. So for a large database this is not an ideal solution.

My solution is to make my database similar to a graph database, meaning every object carries the relevant information from its related objects.
In this case, Trip objects know whether the User profile is public or not.

You can do this in a beforeSave trigger.

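Parse.Cloud.beforeSave("Trip", async (request) => {
  // request.original is undefined only when the Trip is being created;
  // request.user is undefined when the save is made with the master key
  if (!request.original && request.user) {
    const user = request.user;
    const trip = request.object;
    // copy the owner's visibility flag onto the trip
    trip.set("is_public", user.get("is_public"));
    return trip;
  }
});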

This way you can redesign your query, and the final query will look like this:

const tripQuery = new Parse.Query("Trip");
tripQuery.equalTo("is_public",true);

This query doesn’t fetch every user. It returns only the relevant objects, and it is fast.
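
One caveat with the copied flag: the beforeSave trigger above only runs when a Trip is created, so trips go stale if the user later toggles is_public. A possible way to keep them in sync is an afterSave trigger on _User. This is only a sketch under the same schema assumptions (Trip.user, is_public); publicFlagChanged is a hypothetical helper, and the trigger registration is guarded so the helper can also be run outside Parse Server.

```javascript
// True when is_public differs between the previous and the saved user.
// `original` is undefined for a newly created user, who has no trips yet.
function publicFlagChanged(original, updated) {
  if (!original) return false;
  return original.get("is_public") !== updated.get("is_public");
}

// Only register the trigger when running inside Parse Server Cloud Code.
if (typeof Parse !== "undefined" && Parse.Cloud) {
  Parse.Cloud.afterSave(Parse.User, async (request) => {
    if (!publicFlagChanged(request.original, request.object)) return;
    const trips = new Parse.Query("Trip");
    trips.equalTo("user", request.object);
    // each() walks through all matching trips in batches
    await trips.each((trip) => {
      trip.set("is_public", request.object.get("is_public"));
      return trip.save(null, { useMasterKey: true });
    }, { useMasterKey: true });
  });
}
```

For users with many trips this update can itself be slow, so it may be better suited to a background job; that trade-off is the price of the denormalized design.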

My second piece of advice is to move your app logic to Cloud Code. For an app like yours, I think it is a must. I moved my logic to Cloud Code, and if something doesn’t work or I want to change the app’s behaviour, I just update the Cloud Code and users don’t have to update the app.
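
As a rough illustration of what moving the query logic could look like, here is a minimal sketch that rebuilds only the shared age/orientation part of the Swift query on the server. The function name fetchProfiles, the class name "PrsProfile", and the parameter names (minAge, maxAge, sameSexualityOnly, orientation) are assumptions taken from the Swift code in the question, not a fixed API; buildBasicWhere is a hypothetical helper that produces a where clause in Parse's REST format.

```javascript
// Builds the constraints shared by all search types, as a REST-style where clause.
function buildBasicWhere(prefs) {
  const where = { age: { $gt: prefs.minAge, $lt: prefs.maxAge } };
  if (prefs.sameSexualityOnly) {
    where.orie = prefs.orientation;
  }
  return where;
}

// Only register the function when running inside Parse Server Cloud Code.
if (typeof Parse !== "undefined" && Parse.Cloud) {
  Parse.Cloud.define("fetchProfiles", async (request) => {
    const where = buildBasicWhere(request.params);
    const query = Parse.Query.fromJSON("PrsProfile", { where });
    // Run the query on behalf of the calling user so ACLs still apply.
    const sessionToken = request.user ? request.user.getSessionToken() : undefined;
    return query.find({ sessionToken });
  });
}

console.log(JSON.stringify(buildBasicWhere({ minAge: 20, maxAge: 30, sameSexualityOnly: false })));
// {"age":{"$gt":20,"$lt":30}}
```

The client then only calls the function by name, so the constraints can be changed later without shipping a new app version.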

Uzaysan,

many thanks for your answer. It gives me valuable input to consider. I am already trying to embed all the necessary data into the object to avoid querying multiple object types. Reading through the Parse and MongoDB documentation, I have not found out whether it is possible to solve the travel buddy query with Parse without relations to other objects…

  • According to the MongoDB documentation on nested documents and arrays, it seems possible to query embedded objects in an array with “lt” and “gt”, and also to combine the two conditions (“lt” AND “gt”)…
  • Unfortunately, I have not found any example of this in the Parse documentation, so it is pretty difficult for me to guess the Swift (client side) or JavaScript (Cloud Code) syntax.

If such a query option existed, I could design the schema like this:

PrsProfile {
"various fields" : ... ,
"startGeohash": [String],
"endGeohash": [String],
}

If I set each entry to a string combining the geohash and a timestamp, then let’s say another user has:

“startGeohash”: [axyz_2021-04-01 12:00:00, bxyz_2021-04-01 12:00:00, cxyz_2021-07-01 12:00:00]
“endGeohash”: [axyz_2021-04-10 12:00:00, bxyz_2021-04-10 12:00:00, cxyz_2021-07-10 12:00:00]

and my profile has:

“startGeohash”: [bxyz_2021-04-04 14:00:00]
“endGeohash”: [bxyz_2021-04-05 19:00:00]

and then use string comparators:

query.find( { startGeohash: { $elemMatch: { $gt: "bxyz", $lt: "bxyz_2021-04-05 19:00:00" } } , endGeohash: { $elemMatch: { $gt: "bxyz_2021-04-04 14:00:00", $lt: "bxyz_2021-04-04 23:59:59" } } } )

Limiting the comparison range with the geohash as a prefix should check whether the array contains that geohash, and then also confirm that my startDate is earlier than the other user’s endDate AND my endDate is later than the other user’s startDate, so that only profiles whose dates intersect my stay at that destination are returned.
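
The string-range idea can be checked in plain JavaScript before committing to a schema. This is a minimal in-memory sketch, not Parse or MongoDB API: tripKey, overlapBounds, elemMatch, and matches are hypothetical helpers. It assumes all geohashes have the same length, and it uses a "~" sentinel as the upper bound for end keys (an assumption: "~" sorts after "_" and digits in ASCII) so that any later end date at the same geohash still falls inside the range.

```javascript
// Builds "geohash_timestamp" keys like those stored in startGeohash/endGeohash.
function tripKey(geohash, timestamp) {
  return `${geohash}_${timestamp}`;
}

// Range bounds for my trip at `geohash` lasting from myStart to myEnd.
// Another trip overlaps when its start < myEnd and its end > myStart.
function overlapBounds(geohash, myStart, myEnd) {
  return {
    start: { $gt: geohash, $lt: tripKey(geohash, myEnd) },
    end: { $gt: tripKey(geohash, myStart), $lt: `${geohash}_~` },
  };
}

// In-memory stand-in for $elemMatch: any element strictly inside the range.
function elemMatch(values, range) {
  return values.some((v) => v > range.$gt && v < range.$lt);
}

function matches(profile, bounds) {
  return elemMatch(profile.startGeohash, bounds.start) &&
         elemMatch(profile.endGeohash, bounds.end);
}

const other = {
  startGeohash: ["axyz_2021-04-01 12:00:00", "bxyz_2021-04-01 12:00:00", "cxyz_2021-07-01 12:00:00"],
  endGeohash: ["axyz_2021-04-10 12:00:00", "bxyz_2021-04-10 12:00:00", "cxyz_2021-07-10 12:00:00"],
};
const bounds = overlapBounds("bxyz", "2021-04-04 14:00:00", "2021-04-05 19:00:00");
console.log(matches(other, bounds)); // true
```

Note that the two arrays are matched independently, so a start key from one trip and an end key from a different trip at the same geohash can satisfy both conditions together.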

As I have not found any embedded-document queries in the Parse documentation, I am not sure how to build such a query, or whether it is even possible. Again, any hints are highly appreciated!

After some investigation I found that it is indeed possible to build such a complex query with the above-mentioned “startGeo” and “endGeo”. Unfortunately, there seems to be no way to build efficient indexes for such a query: using .explain(), at best only 2-50% of the scanned documents are returned. This does not look like a good scenario at a scale of 1+ million users, even if I limit the number of query results.
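
For reference, the ratio mentioned above can be read from the executionStats section of MongoDB’s explain output, which reports nReturned and totalDocsExamined. A small sketch; scanEfficiency is a hypothetical helper name and the sample numbers are made up for illustration:

```javascript
// Share of examined documents that actually ended up in the result set.
// A value near 1 means the index narrows the scan well; a value near 0
// means most examined documents were read and then discarded.
function scanEfficiency(executionStats) {
  const { nReturned, totalDocsExamined } = executionStats;
  // No documents examined: nothing was wasted, so treat it as fully efficient.
  if (totalDocsExamined === 0) return 1;
  return nReturned / totalDocsExamined;
}

console.log(scanEfficiency({ nReturned: 20, totalDocsExamined: 1000 })); // 0.02
```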

Does anyone have experience with implementing ElasticSearch and could give me a hint whether it would perform better or be more flexible for future adaptations?