July 30, 2007
Filling the Gap (AKA Ideas are cheap)
So betwixt my last work environment and my current one, I had begun going down the path of (again) building my own software. I had (have) a number of ideas - but that's really the easy part. And I'd say "good ideas", but, in my humble opinion, the difference between a "good idea" and a "bad idea" is in the execution.
Which is to say, it's value you accrue in arrears, not a judgement you can really make a priori.
I suppose - best case - you hope it's a good idea before you start doing it.
In any case, I had 3 things I did actual work on to a real degree (in terms of execution): some middleware, a web app-y thing, and a tool. They shared some technology components, but targeted reasonably distinct market segments. Given where I am now, I'm thinking I'll release them each in some form over the next few months - no sense letting them rot on my hard drive.
So - check back; more info soon.
April 10, 2007
Shift Registers and De Bruijn Sequences
I tell ya' - every time you think you've got a novel idea, it turns out somebody's been there already... doesn't even seem to matter how small it might be.
For example: I have this minor function I wrote ages ago, which I had recently rewritten/had to re-derive. Its purpose was fairly mundane - I wanted to compute the (integer) log base 2 of a power-of-2 integer value. A pretty simple and esoteric problem, but it pops up fairly often in graphics/sound programming (especially 3d stuff).
There are lots of solutions, and it's not really a significant performance bottleneck or anything, but I always thought my solution was rather novel. The code looks like this: (link)
The idea is pretty simple, but clever (er... if I do say so myself, I suppose :)).
Very (very) briefly: multiplying by a power of 2 is the same as left-shifting by that power (and determining that power is exactly the log we're looking for). So we can construct a 32-bit pattern in which every 5-bit window is unique - one window for each of the 32 possible shift amounts. Multiplying that pattern by a power of two then puts a unique 5-bit sequence into the top 5 bits of the result, which we can shift down and use as a table index to "reverse" into our result. We need the table because, of course, the encoding isn't linear with respect to the domain.
There are a number of constants that fulfill this criterion, and I always thought of it as a compression/Huffman-tree encoding problem. The constant I computed is 0x04ad19df.
Turns out not only is this a well-known idea, but it's actually an instance of a special class of combinatorial sequences (closely tied to shift registers, amongst other things) known as De Bruijn sequences. See the great "Bit Twiddling Hacks" page by Sean Eron Anderson for more information. The constant they computed, which of course pairs with a different table, is 0x077CB531.
Results are the same.
Ah, well. Time to move on, I think..... :)
December 4, 2006
Unicode is a pain in the @$$
I was playing around with some code-y things, and wanted to find a simple, cross platform way to deal with Unicode strings. Good luck!
The closest thing I could find for C/C++ was the ICU library from IBM. While it's VERY complete and functionally rich - oh my god is it huge and complicated!
In the past, I've had to write my own (simple) functions to deal with really exactly the cases I needed, but, this being 2006, I thought I could find some nice publicly available routines to deal with this.
Not so much.
There's definitely code out there, but either by license or implementation, it's bound up in ways that make it not so useful... so...
...I think I'm going to have to roll my own on this (again), for which I'll then release source code publicly - something simple, but hopefully more useful than what I've seen. Anybody have/use anything they like better that's already out there, let me know please... save me the hassle :)
As far as I can tell, here's what it takes to decode a single UTF-8 character:
(code snippet)
Ouy!! ...and it's not even "fully" right for the invalid/strict case - there are invalid bit distributions within the byte group ranges.
And UTF-16 isn't much better... it's a little less code, but FAR more complicated in logic. Originally, it was intended to be 1 code point per unit (an unsigned short), but that became untenable over (a short amount of) time, and so they introduced support for multi-unit surrogate pairs, just like UTF-8's multibyte sequences. But because surrogates hadn't been part of UTF-16's original design, some odd ranges were carved out to support them (one of many reasons I don't like it). So THAT code, to decode a code point from a UTF-16 stream, looks like this:
(code snippet)
And all this just to decode ONE codepoint... of course, if you've ever actually looked at the algorithmic complexity of converting a floating point number to a string (even most veteran code monkeys haven't)... well, you start to appreciate the value of a good code library.
Anyway, I want length, count, cat, copy, compare (case sensitive and not), at least, and probably one or two other classes of functions (conversions betwixt UTF types certainly, but that's reasonably there already, if a little "stricter" than necessary). Locale code pages I can do without... I'm looking for "pure" Unicode functionality in a lightweight package.
Pointers welcome.