Archive for October, 2006

this one might be a doozy

Monday, October 9th, 2006

I work on my thesis project on both my laptop (a three+ year old 12″ PB G4 @ 867 MHz, 640 MB of RAM) and my desktop (a dual-core 2 GHz G5, 2.5 GB of RAM). The difference in performance is, eh, a little compelling. But the laptop is ever so slightly more portable, so I’m usually on it.

Anyway. I sync the two via Subversion. Yesterday I updated my G5’s copy, built* and launched and … crash! EXC_BAD_ACCESS. Okay, I must’ve done something dumb ’cause this latest revision was fine on my lapt … wait, WTF? Here’s the stack trace:



We’re** crashing in -[NSOpenGLContext setView:], and eventually in gldGetString (which is apparently being called recursively, which seems a bit suspicious on its own).

That’s all very odd because I don’t have any control over that code and I don’t remember changing any of the set-up leading up to that call. AND it works on my laptop.

To be sure, I checked out revision 16 on my desktop (from several days ago, the last time I had been working on it), built and ran … and no crash. So clearly something has changed in the code or its resources. But the app still launches fine on my PowerBook. Crap.

A diff of (what I think would be the) relevant code between revision 16 and the latest revision reveals nothing that (I would expect) could possibly cause this crash. It can’t possibly be crashing. Except that it is crashing.

One possibility is that there is a timing bug of some sort which is now being exposed on the dual-core machine. My app itself is single-threaded, but the services it uses likely spawn additional threads. In this case, I’ve only been lucky that the app has been running at all so far. To test this, I delayed the final NSOpenGLContext setup until well into the app’s main event loop. Nope: the setView: call still crashes. This doesn’t rule out some sort of threading/timing issue, but makes it less likely.

A Google search has not turned up anything obviously helpful, which makes this post a little beacon for others who may run into something like this. Will report back when I make progress.

Update: Okay, this turned out to be something dumb. I had been experiencing some infinite recursion at one point and had lowered the stack size of my app (the figuring out of which is possibly worthy of its own problem+solution post, being more than an obvious Google search away) so that I wouldn’t have to wait for ten bazillion stack frames to load after a crash. This was mostly for my poor under-PoweredBook’s sake. But the setting got transferred to the G5 and, for some (still mysterious) reason, was causing stack corruption there.

I had in fact thought of this early on and turned the setting off and re-built, but it didn’t have any effect. I had not, however, built clean. Cleaning the target and re-building with the default stack size seemed to do the trick.


* A topic for another problems+solutions post: A couple of static libraries I’m using for the app must be run through ranlib whenever I switch machines, or the linker complains. (Or else take them out of the repository, but I’d rather not.) This is kind of a bummer. I’m guessing I need to build them with Xcode instead of via their make scripts. Getting that to happen sounds like a potentially great big pain in the ass.
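Until those libraries get built properly, the re-ranlib chore could at least be scripted. What follows is only a rough sketch, not what I actually run: the function names `find_static_libs` and `refresh_indexes` are made up for illustration, and it assumes the libraries are ordinary `.a` files somewhere under one directory and that `ranlib` is on the PATH.

```python
import subprocess
from pathlib import Path


def find_static_libs(root):
    """Return every *.a file under root, sorted for stable output."""
    return sorted(p for p in Path(root).rglob("*.a") if p.is_file())


def refresh_indexes(root, tool="ranlib", dry_run=False):
    """Run `tool` once per static library found under root.

    Returns the list of commands, whether or not they were executed.
    """
    commands = [[tool, str(lib)] for lib in find_static_libs(root)]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if the tool fails
            subprocess.run(cmd, check=True)
    return commands
```

Calling `refresh_indexes(some_dir, dry_run=True)` just lists what would be run; drop `dry_run` to actually invoke ranlib on each library after an update.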

** I was going to comment on how I always use the royal we when talking about code, but it turns out to be pluralis modestiae, not majestatis. Cool. I do not have tapeworms, so far as I know.