# Download an entire artist's gallery in a few clicks (not spam, srsly)



## benanderson (Sep 27, 2011)

I've been working on an image scraper to download an entire person's gallery in a few clicks. It's very rough around the edges, and so far it only handles one gallery page (36 images) at a time.

Basically: paste in the source code of a gallery page, type in the artist's user name (since automatic discovery isn't implemented yet), press go, and it downloads all the images to the images directory on the server.
http://www.youtube.com/watch?v=-4EqpkRC84w
It requires a server or desktop computer with Apache2, PHP5 and a *nix OS (Mac OS X, Linux, BSD, et al.). I'm currently running it on Ubuntu Server 10.04 LTS.
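The paste-parse-download workflow could be sketched roughly like this (Python here for brevity; the actual tool is PHP). The HTML snippet and the thumbnail filename pattern below are made up for illustration, the real FA markup differs:

```python
import re

# Hypothetical snippet of pasted gallery-page HTML; the real FA
# markup and filename scheme differ. This only illustrates the idea.
html = """
<a href="/view/123"><img src="//t.facdn.net/123@200-artist_picture.jpg"></a>
<a href="/view/456"><img src="//t.facdn.net/456@200-artist_other.png"></a>
"""

def extract_image_urls(html, artist):
    # Find every src attribute whose filename mentions the artist's
    # user name, then fully qualify protocol-relative URLs.
    pattern = r"src=[\"']([^\"']*%s[^\"']*)[\"']" % re.escape(artist)
    return ["https:" + u if u.startswith("//") else u
            for u in re.findall(pattern, html)]

urls = extract_image_urls(html, "artist")
print(urls)
```

Each extracted thumbnail URL would then be rewritten to point at the full-size file before downloading.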

I was bored, and I thought it would be something people on FA would find useful, since the d.facdn.net domain throws a 403 when you try to scrape it for image files directly.

*Future plans:*
- Auto discovery and download of MP3 and SWF files (currently skipped).
- Some god damn CSS.
- Auto discovery and scraping of multiple gallery pages for a given artist.
- Auto discovery of the artist's name.
- Auto discovery of the source code, so only the artist's name is needed to start downloading.
- Ability to limit scraping to a set number of gallery pages.




Let me know what you think so far!
I've definitely gotten use out of it: 108 images in about 10-15 minutes.


----------



## Accountability (Sep 28, 2011)

You should know that the technical staff's *top* priority (above fixing bugs and coding new things) is blocking scrapers. This is probably the last place you should post about this.


----------



## DarkMettaur (Sep 28, 2011)

Accountability said:


> You should know that the technical staff's *top* priority (above fixing bugs and coding new things) is blocking scrapers. This is probably the last place you should post about this.



There are several scrapers out there that aren't open to the public and/or are kept secret between a few circles because of Yak going insane on them.


----------



## benanderson (Sep 28, 2011)

Scraper is too strong a word, IMHO. All it actually does is parse text (hence the copy-and-paste source code step). It's completely harmless and puts next to no load on FA's servers, since images are downloaded one by one. Simple.


----------



## DarkMettaur (Sep 28, 2011)

benanderson said:


> Scraper is too strong a word, IMHO. All it actually does is parse text (hence the copy-and-paste source code step). It's completely harmless and puts next to no load on FA's servers, since images are downloaded one by one. Simple.



Well good luck, you'll need it.


----------



## Verin Asper (Sep 28, 2011)

We've had folks in the past talk about their scrapers, even ones saying it won't bog down the site... but to FA it's still BADONG.


----------



## benanderson (Sep 28, 2011)

DarkMettaur said:


> Well good luck, you'll need it.


Why? Will Yak visit my house and bean me with a cricket bat? XD
All he has to do is ask politely and I'll keep it for personal use only.



Crysix Fousen said:


> We've had folks in the past talk about their scrapers, even ones saying it won't bog down the site... but to FA it's still BADONG.


I can understand blocking spiders and scrapers, but this is nothing more than a simple text parser that builds a list of image URLs. The downloading is just standard HTTP GET requests, the same as any web browser makes.


----------



## Summercat (Sep 28, 2011)

I'm not a tech, but until Yak or Carenath or net-cat say anything, I'm just going to leave this here.

Ben, is it possible to add a delay to the downloading? I'm not going to ask you to cripple the software, just to set it up so it doesn't download all the images at once and become a burden (maybe a sequential list?)


----------



## benanderson (Sep 28, 2011)

Summercat said:


> I'm not a tech, but until Yak or Carenath or net-cat say anything, I'm just going to leave this here.
> 
> Ben, is it possible to add a delay to the downloading? I'm not going to ask you to cripple the software, just to set it up so it doesn't download all the images at once and become a burden (maybe a sequential list?)



It's already sequential.
It guesses the location of each image and puts all the URLs into an array.
It then downloads one image, then moves on to the next in the array.

Basically, it works like this:

```
// Pseudocode
$array = find instances of "src='domain.filenumber.thumbnail.artistname_filename.extension'" in $html
loop through $array as $img:
    strip the HTML code from $img
    download $img and write it to disk
    move on to the next entry in $array
```

I could put in a pause before it starts the next entry in the array:

```
// Pseudocode
$array = find instances of "src='domain.filenumber.thumbnail.artistname_filename.extension'" in $html
loop through $array as $img:
    strip the HTML code from $img
    download $img and write it to disk
    wait 2 seconds
    move on to the next entry in $array
```
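As a rough sketch of that delayed loop (Python here for brevity; in the PHP tool itself, `sleep(2)` between iterations would do the same job):

```python
import time
import urllib.request

def fetch_sequential(urls, delay=2.0):
    # Download each URL in order, one at a time, sleeping between
    # requests so the server never handles more than one transfer
    # from us at once.
    results = []
    for i, url in enumerate(urls):
        with urllib.request.urlopen(url) as resp:
            results.append(resp.read())
        if i < len(urls) - 1:
            time.sleep(delay)  # pause before moving to the next entry
    return results
```

Writing each result to disk would then be a separate, equally sequential step.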


----------

