# What dictates popularity in search results?



## Konda (Jan 29, 2011)

When you use the search function on FA you can choose to display results by relevancy, popularity, and date. I've been using popularity a lot as I found that it's reasonably accurate (taking into account faves/views) but I wasn't able to figure out the exact criteria the site goes by in determining the order of search results.

Does anyone know? One of the FA staff perhaps?


----------



## Monster. (Jan 29, 2011)

It's usually based on how many views, favs, comments, etc. the piece has gotten (I think). There are many factors in why something would be more popular than others, I'm sure.


----------



## LizardKing (Jan 29, 2011)

After looking at the favs/comments/views of the top 7 results in a search, it certainly seems to take all 3 into account. I can't figure out a weighting system that gets them to line up, though, unless it's using the dates as well. That seems unlikely, as #6 is a year older than #7 while having only 3 more favs but 15 fewer comments and ~500 fewer views.


```
RANK	FAV	COMM	VIEW
1	1248	127	32580	
2	1759	185	15841
3	991	123	14688
4	908	257	8786
5	1034	168	8435
6	592	62	11271
7	589	77	11768
```

The closest I could get was something like Fav * 20 + Comment * 10 + views.
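
To make that guess easy to check, here is a rough Python sketch that scores the seven rows from the table with Fav * 20 + Comment * 10 + views and re-sorts them. The function names are made up for illustration; only the numbers and the formula come from this post.

```python
# (favs, comments, views) for the top 7 results, copied from the table above.
TOP_RESULTS = [
    (1248, 127, 32580),
    (1759, 185, 15841),
    (991, 123, 14688),
    (908, 257, 8786),
    (1034, 168, 8435),
    (592, 62, 11271),
    (589, 77, 11768),
]


def guessed_score(favs, comments, views):
    """The guess from this post: Fav * 20 + Comment * 10 + views."""
    return favs * 20 + comments * 10 + views


def predicted_order(rows, score):
    """Return the observed 1-based ranks, re-sorted by descending score."""
    ranked = sorted(
        enumerate(rows, start=1),
        key=lambda item: score(*item[1]),
        reverse=True,
    )
    return [rank for rank, _ in ranked]


print(predicted_order(TOP_RESULTS, guessed_score))  # [1, 2, 3, 5, 4, 7, 6]
```

Rows 4/5 and 6/7 come out swapped, so the guess is close but not an exact match.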


----------



## Konda (Feb 18, 2011)

That equation is not correct. But back when I first started using popularity search I tried to figure out the formula myself, and this is what I ended up with:
[views + (faves x 5) + (comments x 5) = popularity]

It wasn't correct either, although like yours I felt it was in the ballpark.
Dunno what causes some of the submissions to appear "out of place".
There must be some reason for it, but what...


----------



## Summercat (Feb 18, 2011)

The magic 8-ball. It, combined with Captcha, tells the server what to do.

In all seriousness? No clue, but posting this for Yak. n.n


----------



## yak (Feb 18, 2011)

FA uses Sphinx for its search backend, the documentation for which can be found here: http://sphinxsearch.com/docs/manual-0.9.9.html#sorting-modes.
The sorting mode in use is SPH_SORT_EXTENDED (no strategically placed hole jokes please), which lets me write an expression that produces a number by which the search results end up being ordered. The expressions are the following:

"date" = date
"popularity" = views*0.5+favorites*2+comments*4
"relevance" = @relevance + ln(views*0.5+favorites*4+comments)*100

Admittedly, there was little thought behind those formulas. They were written without any kind of statistical analysis, by trial and error, until they more or less worked.  Feel free to suggest a better replacement.
The variables that can take part in the expression are the following:
* relevance
* user_id
* favorites
* comments
* views
* width
* height
* adultsubmission (0-general, 1-mature, 2-adult)
* type (3-flash, 4-music, 5-story, 6-poetry, 2-photography)
* date
* isscrap

You can also suggest new sorting modes if you like.
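
For anyone who wants to try out replacements offline, here is a rough Python sketch of the "popularity" and "relevance" expressions listed above. It is not FA's or Sphinx's actual code: the function names are made up, and `sphinx_relevance` is just a stand-in for `@relevance`, which Sphinx computes per query.

```python
import math


def popularity(views, favorites, comments):
    """views*0.5 + favorites*2 + comments*4, as listed above."""
    return views * 0.5 + favorites * 2 + comments * 4


def relevance(sphinx_relevance, views, favorites, comments):
    """@relevance + ln(views*0.5 + favorites*4 + comments) * 100.

    sphinx_relevance is an assumed input standing in for @relevance.
    math.log raises ValueError when all three counts are 0; how the real
    backend treats that case is not covered in this thread.
    """
    return sphinx_relevance + math.log(views * 0.5 + favorites * 4 + comments) * 100


# Example: 32580 views, 1248 favorites, 127 comments.
print(popularity(32580, 1248, 127))  # 19294.0
```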


----------



## BRN (Feb 18, 2011)

Yak, would it be possible to work with ([Number of days material has been uploaded]/{Number of views}) for view 'density', then [(View density)*{Matched tags/2}] for relevance? Popularity (Comments and favourites) shouldn't come into relevance.


----------



## LizardKing (Feb 18, 2011)

yak said:


> "popularity" = views*0.5+favorites*2+comments*4



I think that would be better if the views were weighted even lower, maybe down to 0.2 or 0.1, since most seem to get at least 10 times as many views as anything else, or even more. Simply getting something seen is far easier than getting people to favourite or comment on it. 
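
As a quick illustration, here is a small Python sketch that applies yak's popularity expression to rows 5 and 6 of the table earlier in the thread, with the view weight made adjustable. `view_weight` is a hypothetical parameter added for this comparison, not something the site exposes.

```python
def popularity(views, favorites, comments, view_weight=0.5):
    """yak's popularity expression with the view weight made adjustable."""
    return views * view_weight + favorites * 2 + comments * 4


# Rows 5 and 6 from the earlier table (favs / comments / views).
row5 = dict(views=8435, favorites=1034, comments=168)
row6 = dict(views=11271, favorites=592, comments=62)

for w in (0.5, 0.2, 0.1):
    print(w, popularity(**row5, view_weight=w), popularity(**row6, view_weight=w))
```

At 0.5 the row with more views but far fewer favs and comments (row 6) scores higher; at 0.2 or 0.1 row 5 comes out ahead.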

Being able to search by size might be cool, to look for suitable candidates for desktop backgrounds :3



SIX said:


> Popularity (Comments and favourites) shouldn't come into relevance.


 
I think it should be there to at least some degree. No one wants to see a bunch of 'relevant' results that are just some crap in MS Paint with 4 views.


----------



## yak (Feb 18, 2011)

"relevance" is a dynamic variable of its own, calculated by Sphinx, that reflects how closely the result matches the search terms provided. With the matching modes FA uses, it is assumed that all search results match the search terms at least somehow. I enhanced that number by a small fraction (~10% max deviance) with a product of favorites, views and comments to make older works get a bit of priority.

But I understand what you're saying. Yes, it's possible to do what you're saying and I'll put that in as "relevance2" for now.


----------



## Bluefluffy (Feb 10, 2022)

yak said:


> FA uses Sphinx for its search backend, the documentation for which can be found here: http://sphinxsearch.com/docs/manual-0.9.9.html#sorting-modes.
> The sorting mode in use is SPH_SORT_EXTENDED (no strategically placed hole jokes please), which lets me write an expression that produces a number by which the search results end up being ordered. The expressions are the following:
> 
> "date" = date
> ...


Hmmm, personally I think the way popularity is calculated is still rather lacking. I see posts with hardly any views on, like, page 3. That's good for the artist, but the art itself wasn't what I would call super good. Personally, I would do it more like this:

Let's go for something that separates the heavily viewed posts from the little-viewed ones. You might have a post with 60 views and 10 favs; that's great, but it doesn't mean it's popular. Popular means many people like it, so it needs both "like" and "many". Maybe something like (views/100)*(favs/100).

*Example 1:* (25000/100)*(650/100) = 250*6.5 = 1625
*Example 2:* (650/100)*(53/100) = 6.5*0.53 = 3.445
*Example 3:* (63/100)*(6/100) = 0.63*0.06 = 0.0378
*Example 4:* (6000/100)*(430/100) = 60*4.3 = 258

Then sort from high to low
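
A minimal Python sketch of that idea, using the four examples above as input. The name `simple_popularity` is made up for illustration; only the formula comes from this post.

```python
def simple_popularity(views, favs):
    """(views/100) * (favs/100), as proposed above."""
    return (views / 100) * (favs / 100)


# (views, favs) pairs taken from Examples 1-4.
examples = [(25000, 650), (650, 53), (63, 6), (6000, 430)]

# Sort from high to low score.
ranked = sorted(examples, key=lambda e: simple_popularity(*e), reverse=True)

for views, favs in ranked:
    print(views, favs, round(simple_popularity(views, favs), 4))
```

Because the two factors are multiplied, a post needs both views and favs to score well; a post strong in only one of them stays near the bottom.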

You could always add comments and age in days to the equation. However, comments don't necessarily indicate popularity: there are amazing pieces with 20 comments, and weaker YCH pieces with 200 comments. It will still make a difference, though, as popular, less polished but cheap YCHs will also be easy to find for people with a small budget.

It would look something like this:
(((comments/100)+1)*(((views/1000)+1)*((favs/100)+1))/((age in days/100)+1))
The +1 is there because, let's say there are 6 comments, the value would be very low and drag the overall result down (you would be multiplying by 0.x). Adding 1 avoids this without affecting high-comment posts a lot, and it also avoids a divide-by-zero error.
For simplicity, I set ((age in days/100)+1) to 1 in examples 1 through 6.
*Example 1 (a lot of views, favs and comments):* (((570/100)+1)*(((27632/1000)+1)*((3500/100)+1))/1)= 6906.0384
*Example 2 (a lot of views and favs but not many comments):* (((23/100)+1)*(((24653/1000)+1)*((1957/100)+1))/1)= 649.0491
*Example 3 (a raffle: many comments, many views but not insane, and not many favs):* (((510/100)+1)*(((3500/1000)+1)*((98/100)+1))/1)= 54.351
*Example 4 (a post with not much of any of the 3):* (((12/100)+1)*(((145/1000)+1)*((43/100)+1))/1)= 1.8338
*Example 5 (alteration of example 1 with 0 comments to show how it affects the result):* (((0/100)+1)*(((27632/1000)+1)*((3500/100)+1))/1)= 1030.752
*Example 6 (to prove ex. 2 can beat ex. 1 if 1 has 0 comments and 2 has 230 comments):* (((230/100)+1)*(((24653/1000)+1)*((1957/100)+1))/1)= 1741.351293

Example 1 being old vs. being new:
(((570/100)+1)*(((27632/1000)+1)*((3500/100)+1))/((3564/100)+1))= 188.483580 (this post is about 10 years old)
(((570/100)+1)*(((27632/1000)+1)*((3500/100)+1))/((436/100)+1))= 1288.44 (this post is a little over a year old)
(((570/100)+1)*(((27632/1000)+1)*((3500/100)+1))/((123/100)+1))= 3096.878206 (this post is only a few months old)
As you can see, age affects the outcome, which means very old submissions with insane view counts will no longer beat new but less-viewed submissions.
Let's go nuts and test this by making the view count of the 10-year-old one something like 150k:
(((570/100)+1)*(((153632/1000)+1)*((3500/100)+1))/((3564/100)+1))= 1017.93772 (as you can see, the score is higher now because of the view count; this usually happens with submissions that remain popular over the years)
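
Here is the same formula as a rough Python sketch, so the numbers above can be reproduced. The function and parameter names are made up for illustration; `age_days=0` corresponds to the simplified examples where the age divisor is 1.

```python
def popularity_score(comments, views, favs, age_days=0):
    """Proposed score from this post.

    The +1 terms keep small counts from shrinking the product and keep
    the age divisor from being zero for a brand-new post.
    """
    engagement = ((comments / 100) + 1) * ((views / 1000) + 1) * ((favs / 100) + 1)
    return engagement / ((age_days / 100) + 1)


# Example 1 at the three ages used above (in days).
for age_days in (3564, 436, 123):
    print(age_days, round(popularity_score(570, 27632, 3500, age_days), 2))

# The 10-year-old post again, this time with 153632 views.
print(round(popularity_score(570, 153632, 3500, 3564), 2))
```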

I hope this formula will serve well, as long as there is a way to keep the decimals. Otherwise, submissions with close results, like 5.166 and 5.499 or 1.0123 and 1.586, will suddenly both be 5 or both be 1, which defeats the purpose of the formula.
I deeply hope this comment gets seen by someone who can notify the developers of the site, and that this formula can be integrated, as it took me a good hour to come up with it from thin air.
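
To show the decimal concern concretely, here is a tiny Python sketch. Whether the search backend actually truncates scores is not confirmed anywhere in this thread; the scaling step is only a hypothetical workaround for the case where it does.

```python
scores = [5.499, 5.166, 1.586, 1.0123]

# Truncating to whole numbers collapses distinct scores into ties.
truncated = [int(s) for s in scores]
print(truncated)  # [5, 5, 1, 1]

# Hypothetical workaround: multiply by 1000 and round, so three decimals
# survive as an integer and the original ordering is preserved.
scaled = [round(s * 1000) for s in scores]
print(scaled)  # [5499, 5166, 1586, 1012]
```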


----------

