Updated: Fast(est?) portable simple solution outlined in Part 3
Continued in Part 2.
Updated: I see Linus Torvalds (of ye verily Linux fame) crying about the same problem way back when - did he ever find a better solution than the MS one?
I was checking out a new blog site that the Microsoft Vista Shell team has (a former friend of mine runs that group there), and, digging around, happened upon an old post from Raymond Chen (that guy is amazing - LOVE his blog)
This particular post was about integer overflow and resultant subtle problems. Its a trickier thing than most people think, and has all sorts of well documented long standing "famous" bugs.
That led me to Microsoft's very nice SafeInt class - but yowza - safe addition takes 3 compares and 3 branches? In the trivial case? (i.e. positive number plus a positive number)
I saw that there's been an update that deals with this by casting to int64, doing the add, and then two compares and 1 branch, and a cast into a int32, but yoinks... someone must have a better solution than that.
I'm going to do a little research - but someone? anyone? Faster/better solution?
(My timing tests show - I wrote a tiny benchmark - with the "safe int" exception throwing on, that simple addition takes twice as long (almost) - that really the state of the art?)
13 comments:
carry flag?
It is late and I'm just writing this by memory, but:
x = a + b;
__asm jae nofault
// uh oh it broke?
__asm nofault:
I'm using jae instead of jb so that "assume branch taken" is the common case. This is sometimes faster than the other way around, but I can't remember the details.
yeah - so two problems (I *must* be doing something wrong):
a) its x86 assembly - I was looking a more portable code path (not necessarily "MAXIMALLY" portable - you can presume wrapping math and two's complement pretty well everywhere)
b) I tried the x86 asm, and it was wierdly slow - I'll post a benchmark (or just send it you), but I tried
__forceinline long asmAdd(long a, long b)
{
long c=a+b;
__asm jo OVERFLOW_ERROR;
return c;
OVERFLOW_ERROR:
SafeIntOnOverflow();
}
and
__forceinline long asmAdd(long a, long b)
{
long c;
__asm
{
mov eax, a
mov ecx, b
add eax, ecx
jo OVERFLOW_ERROR
mov c, eax
}
return c;
OVERFLOW_ERROR:
SafeIntOnOverflow();
}
and BOTH were slower than the MS SafeInt.
Basically, times were:
raw add: 550ms
ms safeint: 900ms
asm safeadd: 1200ms
go figure...
forceinline is breaking for that case (calling a function, hmm)
Here's another branchless version (should be faster than this but inline asm makes it harder).
(This is 1.5x slower than add.)
__forceinline uint32 __safeAdd(uint32 a, uint32 b, uint32 *ov)
{
uint32 c = a + b;
__asm mov ebx, ov
__asm mov eax, 0
__asm adc eax, 0
__asm mov [ebx], eax
return c;
}
hmmm - its WAAAY faster (naturally :P) but I'm not sure its right? sending you a testbed right now... along with my C version (you and your asm! :D)
After being piqued and standing on your shoulders, and with a little overflow/opcode sleuthing, I got:
__forceinline long safeAdd ( long a, long b, unsigned char *ov )
{
long c = a + b;
__asm mov ebx, ov
__asm seto [ebx]
return c;
}
This appeared to "work" and to be about 14% slower at times, sometimes the same, on my Dell with a QueryPerformanceCounter() check. BTW, Sree, these posts could use the angle-code-angle tag. :)
Okay...time to get a life again.
also appears (somewhat) incorrect - I'll post my test shortly... :)
plus - I was kinda hoping for a C solution, though the asm one as "baseline" for performance is good to have...
You know, *if* you're going to throw an exception anyway, it is really hard to beat a single compare in C (at least for the unsigned case).
In that case, you're doing one more add than the best assembly you can write, and it's portable.
If you're willing to have an error variable for several ops that you check later, avoiding the branch can be nice.
Unfortunately, seto can't give an error for multiple ops because it will zero out previous errors. adc is better because it can report errors (and the number!) for multiple ops.
mike
Its not so much that I want to throw an exception as take some action on overflow... exception's sort of a proxy for this benchmark (as that's what the MS one does).
Why not just use the compilers built-in 64 bit type. On msvc its
#ifdef GCC
typedef unsigned long long u64;
#else
typedef unsigned __int64 u64;
#endif
u32 safeAdd( u32 a, u32 b, u32& ov )
{
u64 val = u64(a) + u64( b );
ov = ( val >> 32 );
return u32( val )
}
Well (and I don't mean this to be sound snippy) but that will be both slow AND incorrect :)
just shifting the large result won't tell if you there was overflow (and you'd like to clamp or do some other action)
the MS technique (which is pretty reasonable - but also NOT the one documented in their article; you'll see it if you read their code) does indeed use 64 bith math, and then test the 64 bit result against 32-bit limits to determine if overflow occurred.
That's good - but seemed like it could be beat for performance.
Trying to be impressive!deeply wonderful here!
g2gmart/rsloads is an professional store for
runescape gold
runescape items
runescape money
runescape accounts
runescape powerleveling
runescape accounts
runescape gold
runescape items
runescape powerleveling
Maple Story Mesos
MapleStory Gold
Maple Story Items
guildwars gold
gw gold
guildwars items
gw items
runes and some other goods with fast delivery and world class service.
aaaapowerleveling wow powerleveling world of warcraft power leveling 汽车空调 runescape money rs2 money wow gold 中国福利彩票 双色球 印刷机械 液压机 涂布机 分切机 粉末冶金 plastic machine packing machine
flexible connectors powerleveling wow powerleveling world of warcraft powerleveling briefcase 包装机械 液压机 铝型材 活塞 激光礼品 鞋业 环保空调 吹膜机 锻件 点火开关 工装夹具
香炉 上海翻译公司 google排名 google左侧排名 google排名 餐饮软件 embroidery machines runescape gold rs2 gold thermoforming machine bag making machine
tungsten carbide tungsten plate tungsten electrode tungsten wire tungsten alloy tungsten rod tungsten product molybdenum sheet molybdenum product molybdenum wire molybdenum rod thermoforming machine thermoforming Equipment Plastic Machinery Plastic Thermoforming Machine Plastic Thermoforming Machinery Plastic Machine jordan shoes prada shoes Gucci shoes adidas shoes nike shoes ugg boots evisu jeans true religion jeans Gucci handbags lv handbags ed hardy new era
包装带设备 模切机 压痕机 切纸机 压纹机 上光机,过油上光机,开槽机,V槽机,折盒机 开槽机 V槽机 折盒机 覆膜机 覆面机 气动马达 气动搅拌机 制袋机 手套机 收卷机 吹膜机 连线机 粉碎机 脱水机 搅拌机 造粒机 团粒机 卷绕机 拉丝机 织带机 包覆丝机 圆织机 裁料机 冲口机 下料机 压合机 纸杯机 纸碗机 纸碟机 热成型机 片材机 制杯机 牵引机 压底机 挤出机 冲压机 包装机 贴窗机 涂胶机 信封机 捆扎机 打包机 切袋机 喷码机 刻字机 打标机 标示机 缠绕机 灌装机 封箱机 丝印机 封口机 裹包机 整理机 滚齿机 封面机 包边机 折入机 整平机 冷压机 镂铣机 贴角机 贴膜机 纸巾机 湿巾机 折叠机 充填机 抛光机 装盒机 调头机 折边机 修边机 上光机 压光机 压纹机 压花机 分切机 分条机 涂布机 覆面机 裱纸机 除粉机 糊盒机 打孔机 磨刀机 切割机 钻孔机 胶水机 圆角机 压平机 划线机 纠编机 插边机 淋膜机 切片机 开槽机,V槽机 底封机 上糊机
本日は専務からどやされました。ほんと上ってやつは、
新橋 風俗
小岩 風俗
なんにも理解していないのに口出しだけはよくするもんなあ。ほんと腹が立つ。
栄町 風俗
大塚 風俗
話題は変わって、最近興味があるのが風俗。
厚木 風俗
吉祥寺 風俗
気になったものは早速調べるというわけでパソコン使っていろいろ調べていました。
鶯谷 デリヘル
巣鴨 風俗
いやあしかしネットっていい。風俗だけでなく、
品川 デリヘル
船橋 風俗
たいがい調べられます。風俗のページが多数発掘できました。この件はまた今度。
錦糸町 デリヘル
八王子 風俗
蒲田 風俗
柏 風俗
日暮里 風俗
池袋 デリヘル
上野 デリヘル
成田 風俗
五反田 デリヘル
葛西 デリヘル
市川 風俗
町田 デリヘル
秋葉原 風俗
松戸 風俗
池袋駅 風俗
立川 デリヘル
目黒 風俗
吉原 ソープ
北千住 風俗
Post a Comment