A definitive topic for the "Followers"/"Following" problem

BobyIlea · December 4, 2024, 1:41pm

After searching for best practices for the classic “Followers”/“Following” problem using Parse I ended up with more questions than answers. So, I am hoping that by listing here all the potential solutions we can come up with a definitive summary of the upsides and downsides of every approach and figure out which would be the best way to implement this very common scenario.

I have to say that I am not a database expert, so some of my assumptions might be stupid, but that would hopefully help in clearing things up by people who understand much much more than I do.

So, from what I found so far there are a few approaches that can be used to implement a system like this, basically an Instagram clone with potential hundred of thousands or even millions of users, followers and followees.

1. Put everything in the _User table

This is probably the worst option and I don’t expect many explanations as to why, I think most people agree to this. Nevertheless, let’s explore the ways this could be achieved:
a) Use two Array columns

Table _User {
/* Everything that comes out of the box */
followers: Array<String(objectId)>;
following: Array<String(objectId)>;
}

b) Use two Relation columns

Table _User {
/* Everything that comes out of the box */
followers: Relation<_User>;
following: Relation<_user>;
}

2. Create a “One user one line” type of relation table

I know that “One user one line” sounds like a very noob-ish way to put it, but I believe it’s an easier way for people to understand. Anyway, this would looks like this:

Table Following {
user: USER_TYPE;
followers: FOLLOW_TYPE;
following: FOLLOW_TYPE;
}

Now, here we have a few options for the USER_TYPE and FOLLOW_TYPE:
a) USER_TYPE: String(objectId) and FOLLOW_TYPE: Array<String(objectId)>
b) USER_TYPE: String(objectId) and FOLLOW_TYPE: Relation<_User>
c) USER_TYPE: Pointer<_User> and FOLLOW_TYPE: Array<String(objectId)>
d) USER_TYPE: Pointer<_User> and FOLLOW_TYPE: Relation<_User>

3. Create a “Multiline” type of relation table

Table Following {
user: USER_TYPE;
follower: FOLLOW_TYPE;
following: FOLLOW_TYPE;
}

a) USER_TYPE: String(objectId) and FOLLOW_TYPE: String(objectId)
b) USER_TYPE: String(objectId) and FOLLOW_TYPE: Pointer<_User>
c) USER_TYPE: Pointer<_User> and FOLLOW_TYPE: String(objectId)
d) USER_TYPE: Pointer<_User> and FOLLOW_TYPE: Pointer<_User>

Conclusion

I think there is a clear difference between option 1. and the other two.
I also think it is clear how the coding would differ if we went with either options 2. or 3.
But, it is not clear for many people what are the downsides and upsides when choosing between options 2. and 3. when it comes to performance.

Also, going a bit deeper, which is the better/most performant approach of the a), b), c) and d) “sub-options” when going for options 2. and 3.? Topics like this one(a fairly recent one) Querying a “Following” Table seem to introduce even more confusion as it implies that using Pointer <> type, even if at first glance it seems to be the preferred type in Parse, is less performant than simply querying against strings of objectIds.

Manuel · December 4, 2024, 1:46pm

Data schema design heavily depends on how you will later query the data. You’d want to optimize for data extraction costs. Then there are secondary considerations like security (e.g. don’t mix different data tiers in a single class like user login credentials and follower information like in your first suggestion).

BobyIlea · December 4, 2024, 1:59pm

Yep, that is why I mentioned from the start that I’m mostly focusing between the second and third options. I think that part is clear. What’s not clear though is what are the advantages and disadvantages between using, for example parse pointers or direct "objectId"s as strings. I know that the code design impacts this decision, but if you’re at a point where your architecture is flexible(basically during the design phase, not during implementation), towards which solution would you gravitate?

Same goes for the use of Parse.Relation vs. manually mimicking a join table.

Manuel · December 4, 2024, 3:03pm

If you use a Pointer, then it’s easy to reference the document in queries. If you use a field of type String, then you’d have to do the conversion manually. A Pointer takes slightly more space for data storage and transfer as well, as it’s of syntax <CLASSNAME>$<OBJECTID>, as you can see in the database. When using Parse Dashboard, you can simply click on a Pointer and it will take you to the document, which wouldn’t be the case for a String.

BobyIlea · December 4, 2024, 3:32pm

Got it. So, in term of querying speed there shouldn’t be a notable difference between doing a pointer-based query or manually feeding the strings for “objectId” and “className” to a query. But we pay some extra storage and network traffic for the benefits we get when coding or browsing through the dashboard.

The reason why this got me confused, even though I was expecting this to be the case, is because in the JS documentation, in the Performance section, there is this mention:
Screenshot 2024-12-04 at 17.16.07
This kind of states that performance-wise Pointers are better than Strings. Or I’m getting the statement completely wrong…

And what about my #2 option, the Parse.Relation use case?
For the ‘simple’ operations of get ALL followers/following, follow/unfollow someone it seems to me that you would need at least 3 columns for the best query performance using Parse.Relation(the user column, the following column and the followers column).
In this case every time someone follows/unfollows you would have to alter two lines in the table while option #3 would require the removal of just one table line.
But again, my main concern is not actually how the coding would differ in complexity and style, but rather what approach, with the correct code behind it, has the potential of maximum performance.

Thanks for the answers so far, much appreciated!

Manuel · December 5, 2024, 1:42pm

The part in the documentation that you’re referring to is related to database index effectiveness. It simply means that when you create a database index, you’d want to use fields with high entropy, i.e. a high number of different values.

Index creation is a topic on its own, but in short: an index is a sorted representation of your data, like a phone book where people are ordered by names. Names have a reasonable entropy to make it practical to find the person you’re looking for. If the phone book was only ordered by district, it would be more difficult to find a person, as there are fewer districts than names, therefore district having a lower entropy. In other words, the database engine would need to search through more records to find a person, increasing the metric “browsed records per returned records”, resulting in higher query costs.

Regarding Relations there is a significant caveat, which is that you cannot use Parse.Query.include and filter by fields of the included object, all in a single query. Getting the object of a Relation requires an additional query.