February 17, 2006

Twiddling the FPU

I wrote an article for a website a friend of mine keeps. I'll warn you now: it's preeeeettttyyy geeky. It had been sitting in a state of partial completion for years, but I've been in a lot of meetings lately, and so I finally got a chance to finish it :P

It's definitely in the "See how smart I am/educational" category, and not a commercially material type of thing, but, still, I think there are some nice tips there.

Embarrassingly - in the course of writing the article - I discovered a long-standing bug (*cough* - my bug) that I'd been blaming on the Boxely team. You can see the bug in action pretty easily in AIM Triton 1.0, AOL Explorer 1.2, AOL Safety and Security Center, and the AOL Suite Preview.

What's the bug, you say?


And sadly, you have to get all the way to the bottom to see what it is. See? Even us Programmers can learn from Programmers :)


Anonymous said...

hey Sree ...'nuf on OCP...let's hear your thoughts on the combined Platform ...or are you not in a position to say anything yet? ;-)

Sree Kotay said...

Not to worry - I started with the OCP, because, well, that's where I started.

There will be a LOT I'll be saying about Open Services and AOL in the very near future.

But not just yet :)

And in any case, "Twiddling the FPU" wasn't an OCP post?

Anonymous said...

lol - yeah commented on the wrong post. FPU article was "fascinating" - to think that something as trivial as converting 64-bit to 16-bit numbers could have caused the Ariane-5 crash! Anyways...we all learn about IEEE specs for floating point numbers in Digital Logic 101, but I never realized casting them is so slow in compiler generated code.

Would love to read more about your 3D renderer ... how did you do hidden surface removal? I vaguely recall using BSP trees in some Graphics assignment.

Sree Kotay said...

3D Rendering? I used a custom HSR technique I called a "Slat buffer" - basically it was sort of like what came to be known as a hierarchical z-buffer - but per row hierarchy (i.e. "slats" instead of "tiles").

But truthfully, being fill-bound wasn't really the issue for most of the scenes I was doing: it was usually geometry bound, especially given the procedural antialiasing technique I favored.

There's a tiny bit more info here, and you can still see a (slightly munged) version in action.

I have a software rendered Quake3 renderer somewhere I should dig up and post...

Jason Doucette said...

Hi Sree Kotay, great FPU article! A few comments: I've read both the old and new versions, and I think it may need another update, as MSVC++ 8.0 has deprecated the /QIfist (Suppress _ftol) compiler option because they have "made significant improvements in float to int conversion speed." Depending on the test, your functions are sometimes much faster, but also sometimes slightly slower. I am most interested in xs_FloorToInt(), since I'm a graphics programmer. Also, a minor issue is that they have accuracy issues when dealing with numbers > 2^28 magnitude, and near integers, which makes unit testing (to ensure they actually work) rather cumbersome, but floating point inaccuracies are expected anyway, so any code using your functions should be able to deal with them. Also, in the #else clause of xs_FloorToInt(), you need to convert the result of floor() to an int. Nonetheless, it's an amazing article, with a ton of detail, and it's helped explain a lot to me. Thanks for your time in writing it, and thanks for listening! Take care,
P.S. You have no contact info in your article.

Waterbug said...

You'll be happy to know that bug in xs_RoundToInt is still kicking people in the nut sack 7 years later. A few days out of my life to track it down. Thanks, I guess.

Sree Kotay said...

@Jason sreekotay on Twitter. And yes -- probably should update...
@Waterbug --- sorry; should update that. Same type of graphics issue? Or different use case?

Sree Kotay said...

Also @Jason -- the performance value is primarily in conversion to fixed directly -- but will take a look at times with latest MSVC...
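
For anyone following the thread without the article open, here is a minimal sketch of the general "magic number" idea behind converting a double straight to an int or to 16.16 fixed point. It is not the article's xs_ code: the names sketch_round_to_int and sketch_to_fixed16 are made up for illustration, and it assumes IEEE 754 doubles, the default round-to-nearest mode, arithmetic carried out in double precision (e.g. SSE2 rather than x87 extended precision), and inputs small enough to fit the 32-bit result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: add a large bias so the wanted bits land in the
   low 32 bits of the double's mantissa, then read those bits back. */
static int32_t sketch_bias_and_extract(double value, double magic)
{
    double biased = value + magic;   /* the FPU does the rounding here */
    uint64_t bits;
    uint32_t low;
    int32_t result;

    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&bits, &biased, sizeof bits);    /* well-defined type pun */
    low = (uint32_t)bits;                   /* keep the low 32 mantissa bits */
    memcpy(&result, &low, sizeof result);   /* reinterpret as signed */
    return result;
}

/* Round to nearest integer (ties go to even). Assumes |value| < 2^31. */
static int32_t sketch_round_to_int(double value)
{
    /* 1.5 * 2^52: at this magnitude one mantissa step is exactly 1.0 */
    return sketch_bias_and_extract(value, 6755399441055744.0);
}

/* Convert directly to 16.16 fixed point. Assumes |value| < 2^15. */
static int32_t sketch_to_fixed16(double value)
{
    /* 1.5 * 2^36: at this magnitude one mantissa step is 1/65536 */
    return sketch_bias_and_extract(value, 103079215104.0);
}
```

Under those assumptions, sketch_round_to_int(3.7) gives 4, sketch_round_to_int(2.5) gives 2 because ties round to even, and sketch_to_fixed16(1.5) gives 98304 (0x18000). The fixed-point version costs the same as the integer one, since no separate multiply by 65536 is needed. A floor variant is deliberately left out, because the behavior near exact integers is the delicate part Jason describes above.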