New rating formula
Ratings are a good way to judge how good a video is based on other people's votes. The problem is bias when there are only a few votes. For example, a new video with just one upvote has a 100% rating, but it is probably worse than a video with 9 upvotes and 1 downvote (90%).
I suggest a new formula that I have been working on. Although the proof is complex, the resulting formula is quite simple.
rating = (#likes+1) / (#likes+#dislikes+2)
(Just multiply by 100 to get a percentage.)
I ran some simulations, and this system improves the rating accuracy SO much. If someone is not convinced, please test it.
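To make the formula concrete, here is a minimal sketch in Python applied to the two videos from the example above. The function name `smoothed_rating` is just an illustrative choice, not something the site uses.

```python
def smoothed_rating(likes, dislikes):
    """Rating in [0, 1] computed as (likes + 1) / (likes + dislikes + 2)."""
    return (likes + 1) / (likes + dislikes + 2)


# The two videos from the example: 1 upvote vs. 9 upvotes and 1 downvote.
new_video = smoothed_rating(1, 0)     # 2 / 3
older_video = smoothed_rating(9, 1)   # 10 / 12

print(f"{100 * new_video:.1f}% vs {100 * older_video:.1f}%")  # 66.7% vs 83.3%
```

With this formula the video with 9 upvotes and 1 downvote ranks above the video with a single upvote, and a video with no votes at all starts at 50%.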
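For anyone who wants to test it, a simulation along these lines is one way to do so. This is only a sketch and not the exact simulation mentioned above: it assumes each video has a hidden "true" like-probability drawn uniformly between 0 and 1, gives every video the same small number of votes, and compares the mean squared error of the plain like ratio against the new formula. The function name `simulate` and all parameter values are illustrative assumptions.

```python
import random


def simulate(num_videos=20000, votes_per_video=5, seed=0):
    """Return (raw_mse, smoothed_mse) over simulated videos.

    Assumption: the true like-probability of each video is uniform on [0, 1].
    """
    rng = random.Random(seed)
    raw_err = 0.0
    smooth_err = 0.0
    for _ in range(num_videos):
        p = rng.random()  # hidden true quality of this video
        likes = sum(rng.random() < p for _ in range(votes_per_video))
        dislikes = votes_per_video - likes
        raw = likes / votes_per_video                    # plain like ratio
        smooth = (likes + 1) / (likes + dislikes + 2)    # proposed formula
        raw_err += (raw - p) ** 2
        smooth_err += (smooth - p) ** 2
    return raw_err / num_videos, smooth_err / num_videos


raw_mse, smooth_mse = simulate()
print(f"raw MSE: {raw_mse:.4f}, smoothed MSE: {smooth_mse:.4f}")
```

Under these assumptions the smoothed estimate should come out with the lower error. Changing `votes_per_video` shows how the gap shrinks as videos collect more votes.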
-
Adminlynx (Admin, xHamster) commented
@Ricardo
Thank you for the suggestion - duly noted and sent to the Product team.
Yours,
xHamster
-
Ricardo commented
Laplace was once confronted with the question:
"How to estimate the probability that the sun will rise tomorrow?"
Even though the sun has risen for as long as we can remember, we can't infer that the probability of the sun rising tomorrow is 100%.
This problem is known as the "Sunrise Problem", and the solution is exactly the formula (k+1)/(k+2), where k is the number of days the sun has been seen to rise. It is a special case of the formula I recommended xhamster start using (k likes and no dislikes).
The more general form of the formula is known as "Additive Smoothing", or Laplace Smoothing, and is meant to reduce the effect of sampling error when the amount of data is small.
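As a rough sketch of that general form: additive smoothing estimates a probability as (count + alpha) / (total + alpha * categories). The function and parameter names below are illustrative assumptions. With alpha = 1 and two categories (like, dislike) it reduces to the formula from the original post, and with no dislikes it reduces to the Sunrise Problem formula.

```python
def additive_smoothing(count, total, alpha=1.0, categories=2):
    """Smoothed estimate: (count + alpha) / (total + alpha * categories).

    count      -- observations of the outcome of interest (e.g. likes)
    total      -- all observations (e.g. likes + dislikes)
    alpha      -- smoothing strength; alpha = 1 gives Laplace smoothing
    categories -- number of possible outcomes (2 for like/dislike)
    """
    return (count + alpha) / (total + alpha * categories)


# Like/dislike case: 9 likes out of 10 votes -> (9 + 1) / (10 + 2)
print(additive_smoothing(9, 10))

# Sunrise Problem: the sun rose on all k = 7 observed days -> (7 + 1) / (7 + 2)
print(additive_smoothing(7, 7))
```

A larger `alpha` pulls ratings with few votes more strongly toward the middle, so it acts as a knob for how much evidence is needed before a rating is trusted.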
Estimating video ratings wrongly is very damaging to the xhamster community, since videos with 2-3 likes and no dislikes overshadow videos with a good like/dislike ratio.
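The overshadowing effect can be seen in a small sketch. The three videos and their vote counts below are made-up illustrative numbers, not real site data.

```python
# Hypothetical videos as (name, likes, dislikes); the numbers are made up.
videos = [("A", 3, 0), ("B", 95, 5), ("C", 2, 0)]


def raw_ratio(likes, dislikes):
    return likes / (likes + dislikes)


def smoothed(likes, dislikes):
    return (likes + 1) / (likes + dislikes + 2)


by_raw = sorted(videos, key=lambda v: raw_ratio(v[1], v[2]), reverse=True)
by_smoothed = sorted(videos, key=lambda v: smoothed(v[1], v[2]), reverse=True)

print([name for name, _, _ in by_raw])       # ['A', 'C', 'B']
print([name for name, _, _ in by_smoothed])  # ['B', 'A', 'C']
```

With the plain ratio, the two videos with 2-3 likes sit at 100% above the video with 95 likes and 5 dislikes. With the smoothed formula the well-supported video moves to the top.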
Since xhamster has a huge collection of videos, many of them will have only a few likes and dislikes, so it is important to use the best estimator available for limited data.
-
Ricardo commented
I wrote some Python code to try out the formula.
I selected 1000 random videos (videos from number 7169000 to 7170000)
The resulting scores are in the pastebin:
PS: If a video seems to have an inconsistent rating, it is because it has an inconsistently high number of likes. The formula just gives the best rating estimate given the like/dislike data.