How a rating system is and how it should be - my personal thoughts. And shortcomings of Tophs existing system

It’s been a while since I have seen a rating system being added in Toph. I guess it’s an attempt to make Toph more like a sports programming platform. However, a question arises what makes a platform a good sports programming platform. While the answer may vary from person to person I also have my own version of the answer. In my opinion, a sports programming platform should have the following traits:

A. It should host its own set of regular contests.
B. It should have a good rating system.

Very common right? Well, that’s all I think.

Actually, even though it seems easy, continuous hosting regular contests with a decent problem set keeping users engaged in the community are not as easy as it seems. Especially a good rating system is important cause it reflects the amount of time and effort a contestant has spent behind the platform.

Well yes, my real intention behind this thread is to talk about the rating system in Toph. Now since we just had a big intro it’s time to dive a little deeper into the actual topic. How a rating system should be. To get that answer let’s first ask how rating systems are. In chess, the rating systems are constructed by keeping in mind certain common facts:

  1. If you win a game your rating should increase while if you lose a game it should be decreased.
  2. Defeating a stronger opponent is of more merit than defeating a weaker opponent. Thus you would get more ratings by defeating stronger opponents than defeating a weaker opponent.
  3. No. 2 can also happen oppositely. If you lose to an opponent whose rating is less than you more rating than what you would lose to a stronger opponent.
  4. Just because you have defeated someone stronger than you once doesn’t mean you should be rated higher than your opponent. Cause a rating only reflects the chance of a player winning. It doesn’t mean that a rating is absolute.
  5. Gaining a higher should be exponentially harder.

These are generally the basic rules when developing a rating system in chess. Now in competitive programming how a contestant should be evaluated? In my opinion, it should depend on Statistics.

  1. If you solve more problems than average then it means that you should gain some points. If you solve less your points will be deducted.
  2. It is expected that you perform as your rating. If you can’t perform better than a person rated less then your rating should decrease. If you perform better than expected then your rating should also increase.
  3. Getting higher points should be exponentially harder. So, the more you are rated the less you get points. And It’s easier for you to lose points.

But in every system, there is a problem that is discussed heavily. The question is simple. How would you judge a new user against the system of whose you have no data you have read until here you should have guessed that a rating is supposed to reflect likely it is for a user to perform better or worse than another user. It’s like probability. So, the more data you have, the better the prediction. And the less data you have the more the uncertainty. Now how will you judge a new user? Typically it is done in the following way. Since we don’t have any data about a new user the system assigns a less than average rating to the user and tries to calculate the rating Against the existing users. But again since we don’t have enough data a new user gains and lose more points by doing better or worse than similarly rated users until a certain amount of data about the user is collected. What I mean by that unless the user has played a certain amount of chess games or has done a certain amount of contest.

Yes here is where the current rating system is lacking. The system just gives an insane amount of rating to a new user even if he/she had not attended enough contests. Although it being uncertain exactly how good a new user is just by analyzing a very short amount of data Against a short amount of user.

Anyways, I think the Tophs rating system should evolve to provide a better competitive environment for Its users.

@touhidur I don’t understand your point.

You mentioned that in your opinion ratings should depend on statistics and that:

… the more you are rated the less you get points. …

Which can also be interpreted as: if someone has a lower rating, he/she should be rewarded more.

But then you are saying:

The system just gives an insane amount of rating to a new user even if he/she had not attended enough contests.

Do you mean new users should not be rated until they have participated in a predefined number of contests? If so, why (in a lot less words)?

I should have said it like this. The less you are rated it should be easier to increase your rating. Cause in real-life situations it’s harder to improve your ability exponentially. So, a rating system should reflect this.
For example, assume that 100 students attended an exam. Now 30 of them got more than 80. So what will be the number of students getting above 95? If you want to calculate this linearly you can do something like (30 * 80) / 95 which is ~25. Here we assumed that the number of students getting a mark linearly decreases depending on the increased mark. However, in real-life situations its not true. In real life, most 5 students would get such a mark which is actually not linear but exponential.

Let’s think this another way. Assume that a student got 60 in the previous exam and wants to get 70 in the next exam. Another student got 89 in this exam so he wants to get a 99 in the next exam. So, both of them want to increase their mark by 10. But if you think about the amount of effort they would need to put in to achieve this in real life then the effort needed to increase your mark from 60 to 70 and the effort needed to do it From 89 to 99 is not equal. In fact, the amount of effort required to do that doesn’t even increase linearly but exponentially. If the 1st student needs to study 2 hours more in a day then the second student would need to study 8 hours more a day to achieve the same increase in mark.

That is what I mean by getting more rating should be exponentially harder. If you have ever played a competitive game with a rating system you might have seen this already. For example PUBG, Fortnite every popular game Does it.

Another reason to do it is of course evil. Its to make the users spent more time behind the product. You will see that you only need to get a few points to achieve another tier or title not knowing that you have to put exponentially more effort and time to get that

Anyways, my reason to say that gaining more rating should be exponentially harder was to make the system seem more realistic.

Well yes. For example, some popular chess sites don’t place a new user in the leaderboard unless he/she has played at least 20 games or so. In the site I play, they have a more complex way to do that. They literally have an uncertainty value for every user (they call it Ratings Deviation. Which is a mathematical assumption of how your rating deviates from your absolute rating). If this uncertainty value is not lower than a certain predefined amount then you would not be placed on the leaderboard.

Glicko-2 does this better. Which is what Toph uses.

https://help.toph.co/toph/rating-system/

And your suggestions (other than not showing new entries on the leaderboard) are already covered by this algorithm.

I was actually talking about Glicko-2 when I mentioned uncertainty vaule or ratings deviation. However, Glicko-2 suggests that users should not be included in the leaderboard unless their uncertainty value is low. At least the implementation I was talking about did it. Well although I personally think that Glicko-2 is not perfect for programming platforms.

Let’s talk about Toph. If some new user ranks 1st in the leaderboard should you place him in the top of the leaderboard immediately? At least every the people in top in Toph leaderboard all have one or two contest records. CF has a better implementation. The give a dummy average rating to new users. (It’s something like 1400 maybe) and then increase or decrease it depending on activity. Maybe it uses ELO. Yes continuously doing good will certainly put you on top at a point but you need to have a minimum amount of contests before that. At least that’s what I see CF doing. It more like the International chess rating system.

Anyways, Tophs implementation of Glicko-2 feels somewhat weird.

Just to make sure, do you confirm that Toph is already doing what you were trying to suggest about using statistics to calculate rating in your very first post?

I still don’t get your point:

Are you suggesting that Toph should not use Glicko-2? Should Toph use Elo? Or, some other algorithm?

Or, should Toph just hide some people from the top of the leaderboard?

My personal thought is Elo is a better system than Glicko-2 in programming contest. Well, Glicko-2 is also an interesting system.

About the current rating system of Toph. I was just saying that how all Top rankers are insanely rated by doing only one or two contests. And how Tophs implementation of Glicko-2 feels a little weird and if Toph wants to use it, it should carefully think how to do it. I just pointed out the things I thought were not right. That’s why I put this thread in the Social category for people to discuss.

When the rating system was first implemented I thought that it was only an average of a user’s contest performance. That’s why users who had one or two good contests were on top. So, I created a thread on the social category about my ideal rating system. Thanks.

But, both are essentially the same rating system: both converge to the same right rating. A difference between Glicko and Elo is that Glicko reaches the right rating faster.

The social category is meant for getting to know people in the community. Topics for features and bugs have dedicated sections that everyone also sees and can discuss about there.

Hey, @hjr265! I got an idea / suggestion today. Yesterday I was talking about uncertainty value or ratings deviation (RD). Now what exactly this RD means? If your RD is 300 and Rating is 1700, it would mean that your real rating is assumed to be somewhere between 1400 and 2000. The more you do contest and the more frequently you do contests the less the value of your RD becomes. Less RD means more stable Rating.

Now yesterday I was also talking about how to treat new users. New users or irregular users would mean their Ratings Deviation would be high. Now instead of hiding them form leaderboard which I didn’t want to suggest cause everyone wants see them in the leaderboard I am suggesting that the rating displayed in the leaderboard to be the least assumed rating assumed by the Glicko-2. So, if a persons rating is 1700 and RD is 200, the displayed rating would be 1500.

And it would not be only for new users but for every user. Cause the users who do contests more frequently would naturally have less RD. I will leave it to the Tophs consideration.

1 Like

That is a good suggestion! Let me try this out locally and see what the leaderboard would look like.