# Search Engines banned from indexing /view?



## kshade (Nov 14, 2007)

Hi,

I just noticed that I can't search FA with Google anymore because its (and other search engines') spiders are banned from indexing most of the site. A few weeks ago this was still possible, and my roommate and I really miss this way of searching FA for species that aren't listed.

Greetings,

kshade


----------



## supercutefurri58 (Nov 14, 2007)

why fa?  google only wants to be your friend!


----------



## codewolf (Nov 14, 2007)

I don't know the proper reason, but to be perfectly honest I think it is better to just browse through the site. Who knows, you might find some really good art on pages you've never been to before.


----------



## imnohbody (Nov 14, 2007)

I have even less insight on the issue than codewolf, since I'm not anywhere near being on the staff (nor want to be, frankly), but at a guess it has something to do with the bajillion hits from search engines taking a chunk out of the bandwidth that could be better put towards delivering content for users to view/hear/etc.

Each individual hit may not be much, but when you get into millions of hits (more common than one may think; IME a lot of people underestimate how much actually goes on "in the background" for the internet in general), that "not much" adds up really quickly.


----------



## Stratelier (Nov 14, 2007)

Blocking search-engine robots is harmless, at least when FA has an in-site search function...

.
.
.

Er, nevermind.


----------



## nrr (Nov 14, 2007)

imnohbody said:
			
		

> I have even less insight on the issue than codewolf, since I'm not anywhere near being on the staff (nor want to be, frankly), but at a guess it has something to do with the bajillion hits from search engines taking a chunk out of the bandwidth that could be better put towards delivering content for users to view/hear/etc.


Bandwidth is not the main concern here. If I recall correctly, a lot of FA is still very much I/O-bound, and each request to some resource that FA hosts is another few things to add to the I/O queue.

I will agree, however, that fewer requests are better.


----------



## kshade (Nov 15, 2007)

codewolf said:
			
		

> I don't know the proper reason, but to be perfectly honest I think it is better to just browse through the site. Who knows, you might find some really good art on pages you've never been to before.


I already do that frequently, but sometimes I want something specific or search for a picture I've seen some weeks ago to show somebody.



			
nrr said:

> I will agree, however, that fewer requests are better.


Well, if there's no option to search the site, there will be fewer requests by users, too.


----------



## nrr (Nov 15, 2007)

kshade said:
			
		

> Well, if there's no option to search the site, there will be fewer requests by users, too.


Well, that's true, but if the site doesn't want to grow and prosper...


----------



## Janglur (Nov 15, 2007)

Someday, FA will have basic functionality.


----------



## TehSean (Nov 15, 2007)

Janglur said:
			
		

> Someday, FA will have basic functionality.



We need paid coders. Perhaps that's what the next shameless Donation Drive should be for.


----------



## Eevee (Nov 15, 2007)

Sure, I'm all in favor of being paid  8)


----------



## yak (Nov 15, 2007)

I'd be happy if I got a cent every time I tell people to clear their cache.


----------



## codewolf (Nov 15, 2007)

yak said:
			
		

> I'd be happy if I got a cent every time I tell people to clear their cache.


Wouldn't we all.


----------



## TehSean (Nov 15, 2007)

Of the dollar times it was recommended in the past, I'd have about 3 cents for every time re-caching worked. )


----------



## Dragoneer (Nov 16, 2007)

Y'know, if you guys wanted, I could start a donation drive so that, y'know, people could donate to the coders. I'd be happy to split up all the money between them so you could get the features you guys want, improvements you need. I'd love to pay them myself, but... y'know, I don't make that much money. =P


----------



## nrr (Nov 16, 2007)

Preyfar said:
			
		

> Y'know, if you guys wanted, I could start a donation drive so that, y'know, people could donate to the coders.


... a sane idea from Preyfar?  Can it be true?


----------



## yak (Nov 16, 2007)

TehSean said:
			
		

> Of the dollar times it was recommended in the past, I'd have about 3 cents for every time re-caching worked. )


Hey, I don't work for tech support here. I actually _investigate_ the problem before telling people to clear their cache.
I mean, it's been the most common problem on the internet for ages, and people still haven't learned.


----------



## kshade (Nov 16, 2007)

Uhm, that's a nice idea, but I'd still like to know why search engines are banned


----------



## yak (Nov 16, 2007)

Oh, and Google was banned from FA for a reason. I can't remember exactly which, but there actually was a reason for it at some point.


----------



## Dragoneer (Nov 16, 2007)

yak said:
			
		

> Oh, and Google was banned from FA for a reason. I can't remember exactly which, but there actually was a reason for it at some point.


Yeah, I banned portions of google.com. Basically, Google was doing robot searches every day. This was back when FA was still on a single server and was having massive issues even generating a single page (late 2006/early 2007). Every time Googlebot came to visit, the site... stopped.

I guess Google was going through every single page on FA to categorize all the information, and I mean EVERY SINGLE PAGE. Over half a million entries in the span of a day. When FA ran slow, the Googlebot crawl ran slow too, which further contributed to the slowdowns FA was experiencing.

Blocking spiders reduced the load quite dramatically. The block on Google has since been removed, but Google doesn't appear to have picked its crawling back up.
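
For a sense of scale, the "half a million entries in a day" figure can be turned into a rough average request rate. This is only a back-of-the-envelope sketch: it assumes the crawl was spread evenly over 24 hours, which the post does not say, and real crawls tend to come in bursts.

```python
# Rough average crawl rate implied by "half a million entries in a day".
# Assumption: requests are spread evenly over 24 hours (not stated in the post).
PAGES_PER_DAY = 500_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_requests_per_second(pages: int, seconds: int = SECONDS_PER_DAY) -> float:
    """Return the mean number of page requests per second."""
    return pages / seconds


rate = average_requests_per_second(PAGES_PER_DAY)
print(f"{rate:.1f} requests per second on average")
```

That is several extra dynamically generated pages every second, on top of normal user traffic, for a single server that was already struggling to generate one page.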


----------



## yak (Nov 16, 2007)

Well no, for the sake of consistency I added the missing robots.txt back when I saw it was gone.


----------



## kshade (Nov 16, 2007)

yak said:
			
		

> Well no, for the sake of consistency I added the missing robots.txt back when I saw it was gone.



Um, could you remove /view, /user and probably /browse from it? I really miss searching FA


----------



## yak (Nov 16, 2007)

At this point I don't see this as a reasonable thing to do.

/browse - it has new content each time, so indexing it makes no sense: when you come back to it from Google, the content of the page will be different.

/view  - could work, but will provide very little relevance because of the comment noise. You'd probably want to search by submission title and description, but most likely you'll hit matching text in its comments.

/user  - again, too much text noise to make searches with decent relevance. The only benefit I see is being able to find an artist by their name or other site aliases, if they've given them in their "description".

Normally, it's not a bad idea to allow Google to crawl your website. But at this time, and given the current circumstances, the main one being high server load, it is not reasonable to allow Google to leech our bandwidth and induce an even higher server load.


----------



## kshade (Nov 16, 2007)

yak said:
			
		

> /browse - it has new content each time, so indexing it makes no sense: when you come back to it from Google, the content of the page will be different.


Yes, but it has links to /view, that's why I listed it.



			
yak said:

> /view  - could work, but will provide very little relevance because of the comment noise. You'd probably want to search by submission title and description, but most likely you'll hit matching text in its comments.


If I want to search by submission title, I just add intitle:"What I want".



			
yak said:

> Normally, it's not a bad idea to allow Google to crawl your website. But at this time, and given the current circumstances, the main one being high server load, it is not reasonable to allow Google to leech our bandwidth and induce an even higher server load.


Okay.


----------



## net-cat (Nov 16, 2007)

Of course, you could always detect when googlebot hits the page and make it spit out a stripped down version of the page...


```
<html>
<head>
<title>{TITLE} by {AUTHOR} -- Fur Affinity [dot] net</title>
</head>
<body>
<h1>{TITLE}</h1>
<h2>by {AUTHOR}</h2>
<p>{DESCRIPTION}</p>
</body>
</html>
```
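
A minimal sketch of how that detection could look, assuming the crawler is recognised by a `Googlebot` substring in the `User-Agent` header. The names `is_googlebot`, `render_view`, and `render_full_page`, and the `submission` dictionary, are hypothetical and not part of FA's code. Note that the header is easy to spoof, so this only decides which markup to send and is not a security check.

```python
from html import escape
from string import Template

# Stripped-down page for crawlers, mirroring the template above.
STRIPPED_PAGE = Template("""<html>
<head>
<title>$title by $author -- Fur Affinity [dot] net</title>
</head>
<body>
<h1>$title</h1>
<h2>by $author</h2>
<p>$description</p>
</body>
</html>""")


def is_googlebot(user_agent: str) -> bool:
    """Guess whether the request comes from Googlebot (header can be spoofed)."""
    return "googlebot" in user_agent.lower()


def render_view(user_agent: str, submission: dict, render_full_page) -> str:
    """Return the stripped page for Googlebot, the normal page otherwise.

    `render_full_page` is a hypothetical callable that builds the regular
    page, comments included.
    """
    if is_googlebot(user_agent):
        # Escape user-supplied text so it cannot inject markup.
        return STRIPPED_PAGE.substitute(
            title=escape(submission["title"]),
            author=escape(submission["author"]),
            description=escape(submission["description"]),
        )
    return render_full_page(submission)
```

The crawler's copy skips the comments entirely, which addresses the relevance concern, though the page still has to be looked up and generated for every crawler request.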


----------



## Stratelier (Nov 16, 2007)

Net-cat has an intriguing idea there.  It would require a little extra page-generation work, and of course the copy the search engine caches wouldn't totally match what you see when viewing the same page yourself.

So unless there is a way to "anti-robot" comments from being search indexed...


----------



## kshade (Nov 17, 2007)

Stratadrake said:
			
		

> Net-cat has an intriguing idea there.


Indeed he does, it would be nice to have this feature.



			
Stratadrake said:

> So unless there is a way to "anti-robot" comments from being search indexed...


AFAIK there is none.


----------



## fastturtle (Nov 23, 2007)

Why not?
Simply put them into a separate directory and tell robots in robots.txt not to crawl that directory.

Which is what robots.txt is supposed to be for.

Of course any search bot that's not honoring the robots.txt gets banned for bad behaviour then taken out back and tarred/feathered on General Principles.
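
As a rough illustration of the idea, the sketch below feeds a small robots.txt to Python's standard `urllib.robotparser` and checks which paths a well-behaved crawler may fetch. The `/comments/` path is hypothetical: it assumes comments are served from their own URL rather than inline on the `/view` page, which is not how the thread describes the site today.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block only the directory that would hold comments.
ROBOTS_TXT = """\
User-agent: *
Disallow: /comments/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Submission pages stay crawlable; the comments directory does not.
print(parser.can_fetch("Googlebot", "/view/12345/"))
print(parser.can_fetch("Googlebot", "/comments/12345/"))
```

The first check allows the fetch and the second refuses it. robots.txt is only a request, so this helps with crawlers that honor it and does nothing against the ones that don't.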


----------



## Andromalius (Nov 23, 2007)

Honestly, I find it a good thing Google can't index us. When I was in school this site would have been blocked in every which way if it was found in the Google index. Aside from that, I'm sure some folks don't want to be global about FA. Helps with the privacy and exclusivity I think.


----------

