
XML in Python – What If You Need XPath?

One of the lovely things about Python is that there are so many free libraries to choose from. But sometimes that’s a bad thing, because people love to reinvent the wheel, thinking that they can make it somehow rounder and more efficient. This of course results in a lot of dead code and modules that haven’t been updated since the Stone Age.

Recently at work I found myself looking at a new package to replace one really old dead one: PyXML. We’ve got a good chunk of code that reads and writes data to an XML file basically as a flat-file database for when we (or our customers) are not using PostgreSQL, Oracle, etc.

Normally that wouldn’t be such a big deal, except that we have one requirement: XPath.

There are an awful lot of good XML parsers out there in the world. CPython now even comes with one built in: ElementTree. …And cElementTree (the compiled version of ElementTree), which has the same API, just a whole lot faster – unless you’re using PyPy, which we’re not – but I digress.

The problem with ElementTree however is that it doesn’t fully support XPath. In fact, it barely does at all. It’d be nice if it did, but it doesn’t much, so there you are.

The following is a run-down of my research into a few other Python libraries that do fully support the XPath standard. I wish I could release the benchmark code that my performance evaluations are based on, but it is built on real use cases of proprietary code. That also means the numbers don’t necessarily represent the on-paper, perfect-world performance of these libraries, but rather more useful real-world use cases. These were run under CPython 2.7.3 on a Windows 7 64-bit Intel Xeon workstation.

PyXML: The Baseline

You’d think that PyXML, having last been updated in 2004, would be long dead. And it is. But if you don’t mind fixing some minor things in this all-Python XML library, it actually does still work. You just have to find-and-replace the two places where “as” is used as a variable name, since Python now protects keywords with a vengeance. Simply changing the name to “_as” is enough to fix the problem, and then you can continue to use the long-dead PyXML on Python 2.7.3. Since this is the library that our code used in the past for Python XPath XML support, it’s what I used as a baseline to compare other libraries to. We also used Python’s minidom with PyXML, which is not exactly known for speed…
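If you want to see the keyword problem for yourself, here is a minimal sketch. The helper name compiles is mine, made up for illustration, and the two one-line snippets merely stand in for the affected PyXML lines, which I’m not reproducing here.

```python
def compiles(source):
    """Return True if `source` is valid syntax for the running interpreter."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False

# "as" is a reserved word (Python 2.6 and later), so it cannot be assigned to.
old_style = compiles("as = 1")

# Renaming the variable is all the fix requires.
renamed = compiles("_as = 1")
```

The first check fails and the second passes, which is the whole fix in miniature.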

FourThought 4Suite: 2X

This is another dead project, having last been updated in 2006 as far as I can tell. It’s written by the same company that wrote most of what is in PyXML. (Despite the similar name, it’s not the company that brought you PowerPoint; that was Forethought.) Unfortunately their corporate website seems to be down, meaning that they’re likely just as dead as their 4Suite package. Fortunately the great thing about places like SourceForge, besides the whole open-source thing, is that they’re also a great repository for dead packages and code.

The advantage of 4Suite is that because it was written by the same people as PyXML, 4Suite can be used with minimal code changes. It contains the PyXML API with very few differences. It just adds a whole lot more. But the one big difference is that you don’t use Python’s minidom, you use 4Suite’s cDomlette. Cute. And yes, it’s compiled code. And at least on CPython, it runs faster. It’s a little more than twice as fast as PyXML.

libxml2: 2.5X

Finally, a living breathing project! Based on a Python wrapping of the Gnome XML parser written in C, libxml2 is a breath of fresh air. Err … sort of. There’s a nice object-oriented wrapper written in Python. Which would be good … if it were documented. But darned if I can find any API documentation. And since the libxml2 wrapper changes the API dramatically from the original C code that it came from, it takes a bit of figuring out to use. It’s also slow. Oh, sure, the two-and-a-half times the performance of PyXML seems great. It’s even better than 4Suite. Barely. But if you’ll read through the rest, you’ll see it’s not so impressive after all, and there’s actually a very useful alternative right under its nose.

libxml2.libxml2mod: 6X

Yes, that’s right, it’s still technically the same libxml2 module as above. But if you load the libxml2mod.pyd file directly and skip the object-oriented Pythonic wrapper, going straight to the literal Gnome libxml2 APIs, you’ll have a lot more programming work (as the API is a lot more effort to code to) with a much better performance of six times the speed of PyXML. And it fully supports XPath. Who could ask for more?

Well, I could, actually. I don’t know if it’s the distribution that I got, or if it’s just not fully wrapped, or what, but there were some pieces of the Gnome C code’s API missing from the Python libxml2mod.pyd file. The largest omission to me was that XPath’s compile operation was completely missing. Since compiling can be vital to improving the performance of evaluating the same query across multiple nodes, it makes the 6X speed improvement even more impressive, as I was forced to do things the slow way, without compile. Which can of course be done. But it just makes you wonder, because the Gnome library definitely has this API, so it’s a mystery why the libxml2mod.pyd file didn’t.

PyQt4: -2.5X

If you’re using Qt4 as your Python GUI, you might as well use the Qt4 XML parser … right?

Well, maybe not.

Now don’t get me wrong. I love Qt.

Or at least, I loved the Qt that Trolltech put out.

But ever since Nokia bought Qt, it’s gone downhill. Fast. And this is a perfect example, right here.

The Qt4 XML parser is the darndest most complicated pile of API I’ve ever run across. Oh, it’s highly flexible. In theory. And it fully supports XPath. … In theory. (I certainly haven’t tested every last feature.) But darned if I didn’t run into all sorts of mess just trying to convert the benchmark code to using Qt4’s XML parser. It was even worse when a bug (I don’t know if it’s in Qt4 or PyQt4) prevented me from evaluating to a QString, like you’re supposed to be able to do. So simple property lookups required the full QXmlResultItems overkill where I resolve the first result item from my results class instance, get the model index from that item, then use the model pointer from the index with the index to resolve it into a string. Instead of just getting the first string, like I’d wanted and like it should have been able to do. And not only is the API a mess (a highly flexible mess, but still a mess all the same), but it’s also two and a half times slower than PyXML. I honestly didn’t even think that it would be possible to write an XML parser for CPython that is slower than a pure-Python implementation that uses a DOM, no less. Surely the C++ compiled-code PyQt4 would have a much faster XML parser than PyXML, right?

Well, apparently not!  As my benchmarks showed.

It was slow. Really slow.

Three-legged horse at the racetrack slow!

So I would highly suggest, to anyone using Qt4, DO NOT USE QT4’s XML PARSER! It’s that bad. To code for, and in performance. Find yourself another library for your XML needs. Trust me, you’ll be much happier that way.

I can only hope now that Digia owns Qt that some of these horrendous trainwrecks that have plagued Qt4 can finally be sorted out over time. Not likely to be seen in Qt5 though, as that’s still Nokia’s aborted afterbirth. Digia probably won’t get things straightened out until Qt6. And goodness knows how many years away that could end up being! :(

It’s hard to believe that with these lovely landmines in Qt that I still love it.  But the thing is, as bad as some parts of Qt are, no one has ever come close to doing anything better as an all-around solution to platform independent computer programming.  I just wish the original integrity of Trolltech had even remotely carried on to Nokia.  I just hope that Digia can give back some of the polish that Qt once had.

ElementTree: 3X

So I know, I already said that ElementTree doesn’t really support XPath properly yet. I really wish that it did. It’d be nice if I could just use the libraries built into Python for everything, and a good XML parser seems like a no-brainer. But for whatever reason, XPath is not really a part of ElementTree. They have added rudimentary support for XPath-style expressions to the ElementTree find/findall queries, but a full implementation of XPath it is not. It doesn’t even support the full XPath expression syntax there.
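For a feel of what that partial support does cover, here is a minimal sketch of the kinds of path expressions that find/findall understand. The document and its element and attribute names (db, record, kind and so on) are invented for illustration and have nothing to do with our real data.

```python
import xml.etree.ElementTree as ET

# Made-up sample data standing in for a small flat-file "database".
doc = ET.fromstring("""
<db>
  <record id="1" kind="customer"><name>Ada</name></record>
  <record id="2" kind="vendor"><name>Bob</name></record>
  <group><record id="3" kind="customer"><name>Cy</name></record></group>
</db>
""")

# Plain child path: only the records directly under the root.
direct = [r.get("id") for r in doc.findall("record")]

# Descendant search: records at any depth, in document order.
everywhere = [r.get("id") for r in doc.findall(".//record")]

# Attribute-equality predicate.
customers = [r.get("id") for r in doc.findall(".//record[@kind='customer']")]

# Positional predicate (1-based) and a text lookup along a path.
second = doc.find("record[2]").get("id")
first_name = doc.findtext("record/name")
```

Paths, descendant searches, attribute tests and positions like these work. XPath functions such as count(), and most of the rest of the XPath language, are not part of this subset, which is where a true XPath library still earns its keep.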

Still, at least for enough of our use case, I was able to code for ElementTree. Converting the code from a full on XPath PyXML implementation to ElementTree and its lame partial implementation of XPath-based queries wasn’t as much work as it could have been. It’s nowhere near as much work as, say, converting the code to PyQt4, or even to libxml2. Which was pleasantly surprising. It’s a nice simple API, so I can see why people love it.

And the performance? It’s about three times faster than PyXML, making it a fair improvement. For a pure-Python implementation it’s actually quite amazing to squeeze that much out. But then, there’s a reason people don’t use DOM anymore. But the real treat comes next.

cElementTree: 18X

And here we have a real winner! Also included in Python, it’s the same API as ElementTree, just a very well written compiled-code implementation wrapped for Python. The same code that ran my ElementTree port also ran cElementTree with only the library name changing. Exactly like it should.
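As a rough sketch of what “only the library name changing” looks like, the import below prefers the compiled module and falls back to the plain one. The tiny document is made up for the example. One thing worth knowing beyond the Python 2.7 I tested with: cElementTree was removed in Python 3.9, and since Python 3.3 the plain ElementTree import picks up the C accelerator by itself.

```python
try:
    # Compiled implementation (Python 2.x; removed in Python 3.9).
    import xml.etree.cElementTree as ET
except ImportError:
    # Same API; on Python 3.3+ this uses the C accelerator automatically.
    import xml.etree.ElementTree as ET

# Everything after the import is identical for both implementations.
root = ET.fromstring("<db><record id='1'/><record id='2'/></db>")
ids = [r.get("id") for r in root.findall("record")]
```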

And the results were astounding. The real-use-case benchmark of XML parsing was a whopping eighteen times faster than our old PyXML code. Ding-dong, the DOM is dead!

Of course the problem is, all of our existing code is written for DOM using PyXML, so it’ll take a while to convert all of that to cElementTree.

As a side note, if any PyPy enthusiasts want to know why CPython programmers can’t convert to PyPy just yet (maybe not ever) here’s the reason why. A well-wrapped compiled code library runs like a champ in CPython. As a result, a lot of us big data/number crunchers have lots of compiled code in our Python projects. And since PyPy only just barely even runs compiled code, slowing things down far worse in that than native Python code, this leaves a lot of us CPython folks out in the cold. If you want the serious data crunchers to switch to PyPy then you have to start taking that compiled code lag more seriously or else we’re never going to be able to join you in your fancy little JIT Python interpreter’s dance.

Conclusion

So if you’re a Python 2.7 programmer looking for the best XPath XML parser ever, well, if you’re staying true to XPath at least, I’d say go with libxml2.

However, if you can swing it (and you’ll need to really evaluate your code to determine this) you might be able to get away with the rather unfinished XPath implementation in cElementTree. In which case you won’t need to install any third-party package for XML parsing and you’ll get blinding performance out the asterisk. (And obviously, if you’re coming at a whole new XML parser, and you don’t need XPath at all, then go with cElementTree since it’s what everyone in Python land is using and it’s got great performance.)

Hopefully the all-Python ElementTree runs just as great on PyPy, giving the world a pretty well rounded solution.

If any ElementTree authors catch this, hey, could you please work on supporting XPath a little more seriously?

And finally, dear god of all things software, whatever you do, avoid Qt4’s XML parser like the plague! Unfortunately I can’t speak to Qt5 yet as there’s still a lot of untested theory there that, professionally, we just don’t want to even approach mucking about with yet. Let things get a few more minor version numbers under the hood and then we can re-evaluate a PyQt5 upgrade path. (Or maybe even PySide.) But even if Qt’s XML parsing gets a major performance improvement, the API is still just as likely to suck wet donkey fur for being so “flexible”. Seriously, what committee designed that API? It’s everything that you could ever need … without being anything that you’d ever want! Yeesh!

Today’s Tech Tip – When Losing It Is Winning

Tired of dialing a phone number only to be assailed by some business’s idea of The Most Useless Menu System Ever? Are you put off by endless levels of “Press ___ for ___,” madlibs that often don’t even fit whatever it is you called for? Well, here are some simple solutions that’ll keep you from dialing M for Murder:

1 ) Try dialing 0. You remember 0, don’t you? Back when dinosaurs roamed the Earth you could dial this simple number to get ahold of the operator. And in some systems this actually still works!

2 ) If 0 doesn’t do the trick, try saying “human being”. A number of phone systems will recognize these key words as meaning you’re not impressed.

3 ) And if all else fails? Swear! That’s right, dropping the F-bomb instead of dialing or saying anything menu-related will do the trick in many of these Annoyance Systems these days. The addition of “indicators of stress” is a helpful technique to get you to a real live person instead of Yet Another Layer Of Useless Menu Options. And in theory, yelling it angrily also helps.

But remember, just because swearing at the automated menu system might get you help sooner, swearing at the real person it connects you to likely will not.

Have your own tip for quickly navigating useless phone menus? Leave a comment to let us know!

Python – Transitioning from Numeric to NumPy Part 2 – What Exactly IS The Point Of Oldnumeric Again?

Okay, so here’s an update for you. In theory most of the Numeric to NumPy conversion using NumPy’s oldnumeric compatibility layer works just as detailed previously. There are however two exceptions / problems that I’ve found since then.

Problem 1 – Savespace

If you used Numeric array.savespace, you’re SoL. NumPy supports that like it supports returning the USA to British rule. Or in other words, not at all. Not even slightly. And if you ask why, you’ll no doubt catch all sorts of flak for even daring to ask, apparently. As if consuming half of the memory isn’t desirable when you don’t need the accuracy. And as if using graphics cards as GPGPUs isn’t catching on in exactly the kinds of fields where you’d want a numerical library, and where single precision FLOPs still greatly outperform double precision FLOPs to sickening proportions. So no, no reason whatsoever for NumPy to support savespace. (And yes, I am indeed rolling my eyes here.)

If you heavily used the savespace feature your spacesaver arrays are all upconverted to double now, regardless. Which might not matter to you, other than consuming vast amounts of memory.

Or … if you dared to ever typecheck (I know it’s not really a “Python thing” to typecheck, but sometimes needs must, especially if you dared to C++ Boost your Python library), it just might mean banging your head against a wall for who knows how many lines of code.

If you’re 1) insane and 2) a Python flexibility extremist, you can theoretically create a workaround for this problem, assuming you don’t mind performance penalties. By inheriting from the numpy.ndarray class you can create your own class that does support savespace and spacesaver properly. Which is a lot of work, frankly. Because every math operator needs to be overridden, since spacesaver is like a virus, infecting any arrays it comes into contact with. And then all you’ve really done is just fixed the class. The extreme part comes next: You also rewrite every single function that creates an array (though really if you’re converting from Numeric to NumPy, you probably only need to do the ones that Numeric had, and only the arguments that Numeric supported) to return your class instead of a straight ndarray. Oh, and the fun Pythonic step: then replace the class and function references in oldnumeric with your fixed versions. If you do that at the very beginning of your code (say, in your own module where you import numpy.oldnumeric), then as long as that modified version of oldnumeric stays in Python’s memory, Python will “cheat” by handing back the already-loaded module instead of reloading it, so your fix will affect all of your application. Or you could just fix the oldnumeric import on the NumPy side. Or, if you were especially daring, you could fix NumPy itself to add back this feature that just about anyone, except for the NumPy authors, seems to comprehend as having value not just in the past, but also forward-looking towards the days when you CUDA had a V8.
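The module-patching “cheat” is easier to see in miniature. The sketch below assumes nothing about NumPy: fake_oldnumeric is a hypothetical stand-in module built on the spot so the example runs anywhere, and zeros_single_precision is an invented replacement factory. A real version would patch numpy.oldnumeric and return your ndarray subclass instead.

```python
import sys
import types

# Hypothetical stand-in for numpy.oldnumeric, registered like a loaded module.
compat = types.ModuleType("fake_oldnumeric")
compat.zeros = lambda n: [0.0] * n          # the "original" array factory
sys.modules["fake_oldnumeric"] = compat

def zeros_single_precision(n):
    """Invented replacement; a real one would return your ndarray subclass."""
    return ("savespace", [0.0] * n)

# Patch once, early, in the module where you first import the library.
import fake_oldnumeric
fake_oldnumeric.zeros = zeros_single_precision

# A later import anywhere else gets the cached, already patched module.
import fake_oldnumeric as Numeric
result = Numeric.zeros(2)
```

One caveat with this approach: code that ran from-imports of individual names before the patch keeps its reference to the original function, so the patch really does have to happen first.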

Problem 2 – Contiguous

Here’s another one that’ll catch you by surprise, but probably only matter if you foolishly wrote compiled code such as in C++ with Boost to speed up your Python. For some reason there seems to be a bug in NumPy ndarray where even though array data should be contiguous … it just sometimes isn’t. So if you actually check for that / require that in your compiled code, you just might be surprised at failures that by all reasoning shouldn’t be failing that particular check.

Of course there is a nice easy way to work with non-contiguous array data using PyArray_ContiguousFromObject. It’s pretty simple, but does require cleaning up your new Python object with Py_DECREF if you want to prevent memory leaks. Which could mean restructuring your whole method, depending on if you used return to exit before the end of the function. On the plus side though, if your array is contiguous (which if you’re running into this bug, it should have been in the first place) then there’s only an almost negligible performance hit as PyArray_ContiguousFromObject won’t actually copy your data into a new array. Of course if you did run into this bug, then you’re right, you’re going to somewhere hit the performance penalty as this approach copies your array data. But hey, at least it’ll still run. Whereas not doing this workaround could leave you in all sorts of trouble if your NumPy ndarray data isn’t contiguous when it should have been.
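The copy-only-when-needed behaviour can be seen from the Python side too. This is only a sketch of the idea, using numpy.ascontiguousarray as a Python-level analogue; it is not the PyArray_ContiguousFromObject C call itself, and the array shapes are arbitrary.

```python
import numpy as np

a = np.arange(12, dtype="d").reshape(3, 4)   # freshly built: C-contiguous
view = a[:, ::2]                             # strided view: not contiguous

# Already contiguous input: no copy, the result still shares a's memory.
same = np.ascontiguousarray(a)

# Non-contiguous input: the data is copied into a new contiguous buffer.
fixed = np.ascontiguousarray(view)

view_was_contiguous = bool(view.flags["C_CONTIGUOUS"])
fixed_is_contiguous = bool(fixed.flags["C_CONTIGUOUS"])
```

On the Python side the garbage collector handles the cleanup that Py_DECREF does by hand in compiled code.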

So in conclusion … WTF?!?!?!

Can you upgrade your Python code from Numeric to NumPy easily? Err … maybe. Hopefully? Kind of. It all depends on just what features of Numeric you used, because NumPy (and with it oldnumeric) clearly does not encapsulate 100% of Numeric. In the end, you might just find that rewriting a million lines of mixed Python and C++ code to use straight NumPy is about as much work as trying to cheat by using NumPy’s rather sorely incompatible oldnumeric. Why oldnumeric wasn’t written to be a lot more compatible with Numeric is beyond me. You’d expect that kind of incompatibility with straight NumPy, but not from a layer whose sole purpose is to offer you backward compatibility. :( What was even the point?

Still, someone who didn’t use Numeric to extremes might find NumPy’s oldnumeric an easy solution. Maybe. I guess. Though I can’t imagine why you’d have been using Numeric in the first place instead of straight-up Python arrays … err … lists and tuples … if that’s the case.

Python and C++ Boost – Tips For A Numeric To NumPy Conversion

Chances are that if this matters to you, it’s something that you’ve already gone through. After all, anyone still using Numeric in Python in this day and age is working with an incredibly outdated environment. Still, sometimes it happens. Sometimes in business settings validating a new environment is not such an easy thing to do as it is in academic or hobby worlds. So just in case, here are my experiences of upgrading from Numeric to NumPy:

Tip 1 ) Replace Numeric with NumPy’s Old Numeric

The first trick is that NumPy contains most of what you need already in the numpy.oldnumeric module. This saves an awful lot of effort as you don’t have to rewrite random portions of some million lines of code. The vast majority of the work involved is one simple Pythonic twist:

    import numpy.oldnumeric as Numeric

And if you’re concerned about remaining backward compatible with your old environment, then you can even add an exception handler to choose which is the right one to use:

    try:
        import Numeric
    except ImportError:
        import numpy.oldnumeric as Numeric

Now, similarly, if you used some of the modules in Numeric, such as LinearAlgebra, it’s still almost as simple. You can do:

    from numpy.oldnumeric import linear_algebra as LinearAlgebra

Or, again, if you need backward compatibility:

    try:
        import Numeric
        import LinearAlgebra
    except ImportError:
        import numpy.oldnumeric as Numeric
        from numpy.oldnumeric import linear_algebra as LinearAlgebra

Tip 2 ) Fix minor type inconsistencies

There are some differences between Numeric and NumPy’s Old Numeric, and those are primarily in how Old Numeric doesn’t handle types in quite the same way. The biggest is character versus string arrays and floating point arrays. Now, say you used a string as data for your array. In Numeric this results in an array of a character type, AKA Numeric.Character or ‘c’ with a size of the number of characters in your string. But in NumPy this results in an array of a string type with a size of 1! That’s not very compatible. The solution? Just specify the type when constructing your array. So instead of Numeric.array(“datadatadata”) use Numeric.array(“datadatadata”, Numeric.Character). Yep, that one is really that simple.

Slightly less simple, though you may not even realize it is happening, is a related problem. Say you had a Numeric array of a float type using “f” to define it. Something like Numeric.array([1., 2., 3.], “f”). In Numeric this “f” type specifier results in a match to Float64. Something you may or may not have expected. This is because Numeric has an interesting string matching algorithm. In Numeric you can have a Float0, a Float8, a Float16, a Float32, a Float64, or even a Float128. Each could be specified in a string if you desired instead of using Numeric’s constants. Which means that if you specify a string of “float”, it leaves Numeric to try and decide which length float you want. And Numeric, thinking smart, matches the default float type used by Python, so it’s a nice match to your data. Which, by the way, is a Float64, otherwise known as a double. And so in the above example, if you specify type “f” to Numeric, yep, you guessed it, you end up with an array of Float64.

Where things get tricky is that NumPy doesn’t have the same float types like Numeric does. It just has float32 and float64, AKA, “f” and “d” respectively. So the same “f” that gave you a Float64 in Numeric will give you a Float32 in NumPy!

Now this might not be much of a problem to you if you’re sticking with Python-only code. Then again, with less accuracy it might. But where it can really kick your asterisk is if you did something silly like wrote a compiled module (Who would do a silly thing like that?) and pass it your array as one of the arguments. If you added type checking into your compiled code, if for no other reason than just good coding practices, this can end up throwing you for a loop when suddenly your arrays are no longer matching a Float64 type! The fix, of course, is simply to use the more descriptive Numeric.Float64 type instead of “f”. Or if you’re lazy and that’s too much typing, at least switch to “d” which both Numeric and numpy.oldnumeric will interpret as Float64.
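Here is a small sketch of the NumPy side of that story, using plain NumPy since Numeric itself is long gone. The require_float64 function is a made-up stand-in for the sort of strict type check a compiled module might perform; it is not code from our project.

```python
import numpy as np

def require_float64(arr):
    """Mimic the strict type check a compiled extension might do."""
    if arr.dtype != np.float64:
        raise TypeError("expected float64, got %s" % arr.dtype)
    return arr

single = np.array([1., 2., 3.], "f")            # NumPy: "f" means float32
double = np.array([1., 2., 3.], "d")            # "d" means float64
explicit = np.array([1., 2., 3.], np.float64)   # spelled out, no guessing

try:
    require_float64(single)
    single_passes = True
except TypeError:
    single_passes = False
```

The “f” array fails the check while the “d” and explicit float64 arrays pass, which is exactly the surprise described above.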

Assuredly, if you’re really into variable typing, chances are there are other places where NumPy’s Old Numeric did not match Numeric as closely as perhaps it should have. For example Numeric.Int16 is “s”, whereas numpy.oldnumeric.Int16 is “h”. I’m not sure what effect that has on anything, having not used it. But I noticed it. Goodness knows what else there may be too. Your best bet is to first not use strings to define your types, but to go to the defined constants, and second to test test test.

Tip 3 ) Fix Numeric / NumPy inconsistencies

Besides just minor type inconsistencies when creating arrays are the bigger inconsistencies, namely in the method of types. In Numeric type constants are characters. Your Numeric.Int32 is literally the character ‘i’. Whereas in NumPy a dtype class was created for handling types. You can construct a numpy.dtype(‘i’), but it’s nothing as simple as just a character. But NumPy is even worse than that in practice, because there’s also a ‘type’ type used in NumPy, and that’s what the numpy.int32 constant is a type of, for example. It’s not the same as a dtype. And as you can see, it’s already getting messy when it comes to choosing your type constants.

Oh, wait, it gets worse.

Because in Numeric an array has a .typecode() that defines what type the array is of. And because Numeric just uses characters as type definitions, it simply returns a character of the type. It’s easy and straightforward.

NumPy ndarrays have no such method. Oh, they do have a type specifier built in. But it’s not a method named .typecode(). It’s a variable named .dtype. And it is a numpy.dtype class instance. But the NumPy type constants are ‘type’, not dtype. Yes, NumPy just by itself is messy. But now add in that all of your Numeric code is looking to compare “if array.typecode() == ‘i’:” and you just walked into a whole mess of incompatibility.

If you don’t care at all about backward compatibility, then you’re fairly well set. You can just replace array.typecode() with array.dtype.char. Yes, that’s right, the dtype class has a char member variable that (mostly, except for differences outlined in tip 2) you can compare against. If you’re more brave you can try even replacing that character with a constant so “if array.dtype == numpy.int32:” is a little more descriptive and cleaner. If you’re just moving forward.

However, for those who need to continue to support both environments with Numeric and environments where NumPy has replaced Numeric, using numpy.oldnumeric or not, you’ve walked into a world of hurt where you’re best making your own module of helper functions, because this gross incompatibility will make doing a simple type comparison in a backward-compatible way very messy, especially if your old environment doesn’t have any version of NumPy to help bridge the gap.
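A helper of that sort might look like the sketch below. The function name typecode and the two Fake classes are my own inventions for illustration: the fakes just mimic the one attribute each library exposes, so the example runs without either Numeric or NumPy installed.

```python
def typecode(arr):
    """Return a one-character type code for a Numeric-style or NumPy-style array."""
    legacy = getattr(arr, "typecode", None)
    if callable(legacy):
        return legacy()          # Numeric arrays: a method returning a character
    return arr.dtype.char        # NumPy ndarrays: a dtype attribute with .char


# Hypothetical stand-ins, only here so the sketch is self-contained.
class FakeNumericArray(object):
    def typecode(self):
        return "i"

class FakeDtype(object):
    char = "i"

class FakeNdarray(object):
    dtype = FakeDtype()


numeric_code = typecode(FakeNumericArray())
numpy_code = typecode(FakeNdarray())
```

With that in one shared module, comparisons become typecode(array) == ‘i’ in both environments, subject to the character differences from tip 2.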

And this is especially true because NumPy is compiled code, so you can’t just sneakily add a .typecode() method to the ndarray class like you normally could in Python. I suppose you could try to do so to the base code and recompile all of NumPy for it. But the bigger question remains why in the world it wasn’t put there in the first place just to remain backward compatible? Welcome to a minor headache. But still, all considered, a pretty small problem compared to what you could be going through right now.

Tip 4 ) Want C++ NumPy? How about a Boost?

If you’ve got that nasty compiled code mixed in with your Python, you just may be using C++, and that means you might even be using Boost. (That’s what I’ve been using anyway.) If so, you might have been disappointed to find that Boost has no native NumPy support built in. And that the authors of NumPy seem to have no interest whatsoever in Boost, preferring Fortran to C++. I guess for a numerical package, I can’t really blame them for that, as that is Fortran’s forte. But it can be awfully inconvenient to the other 99.99% of the world who have forgotten that Fortran is even a programming language. (If they ever even knew it.) Well, cry not, for an unofficial Boost layer for NumPy exists. Enter a GitHub project for ndarray.

Mostly, it’s pretty straightforward and you’ll barely have to change your C++ code at all to use it. One important difference however is that in your module export you’ll have to add a boost::numpy::initialize() call immediately at the top so that Boost knows how to template NumPy in order to match your Python to your C++.

And if you’re maintaining backward compatibility, well, it’s not the cleanest thing to do with C++ being not quite as flexible about that as Python. I think the best way to go about it is to turn a function into a template function on the cpp side, double up on an overloaded declaration on the h side, and then in the cpp side again have the overloaded implementations just call the template function with the array type. But wait, the pain isn’t over yet, because then you have to change your export definitions to specify which of the overloaded methods to use, which gets a wee bit messy. Or if you don’t want that mess, simply append a _NUMERIC and _NUMPY to the ends of your declarations to keep them separate instead of overloading. That makes things a lot easier in the export, but doesn’t look quite as smooth.

But then there’s one more monkey wrench in the works if you use Microsoft Visual C++, in that the Boost.NumPy layer needs some tweaking to work in Microsoft Land because M$ has yet to implement templates correctly. So you’ll have to add a cxxflags=/DBOOST_ALL_NO_LIB to your Boost build or else you’ll get multiply defined functions in your libraries because Microsoft still isn’t smart enough to weed out duplicate definitions when using templates, so Boost.NumPy ends up fighting with Boost.Python on MSVC. Doh!

Oh, wait, it actually gets worse because I almost forgot that you won’t even get that far in the first place. Because Microsoft also hasn’t implemented variable length arrays yet. So you’ll have to fix a couple of places in the Boost.NumPy code where they do Py_intptr_t dims[nd]; to become Py_intptr_t* dims = new Py_intptr_t[nd]; And, of course, not immediately return because if you don’t sneak that delete[] dims; line in there you’re going to have memory leaks. Yay!

The problem being that, well, sadly, not many Python users are on Windows, so the Boost.NumPy authors of that GitHub project just haven’t tested it on MSVC, apparently. But it all can be made to work. Honest.

You can even go the extra step and add the Boost.NumPy code straight into the Boost codebase locally before you build it. I mean you’re going to build it anyway, right? Might as well. To get it to work with the Boost build system I had to rename the src directory to build so that bjam could find the Jamfile in there. No biggie.

Of course if you do try to use the Boost.NumPy on Windows, don’t even bother trying to use SCons. It’s not that SCons won’t work on Windows, because it will. That’s the point. It’s that Boost.NumPy’s SCons script won’t. Why doesn’t it work on Windows? Well, you can kind of put that on Python, and kind of on the Boost.NumPy authors. They chose to use the one part of distutils that doesn’t have full functionality on Windows: distutils.sysconfig. Now, looking at the latest documentation on Python, you wouldn’t even know that there’s a problem with distutils.sysconfig on Windows. But if you look at the base sysconfig Python module that distutils (strangely) gets distutils.sysconfig from (why the same Python module is basically duplicated is beyond me) you find this almost non-existent warning in the sysconfig documentation about configuration variables, “Notice that on Windows, it’s a much smaller set.” What that almost impossible to find warning means is basically that on Windows pretty much None of the configuration variables exist, so sysconfig.get_config_var and sysconfig.get_config_vars are pretty much useless on Windows. Thereby causing the Boost.NumPy SCons script to fail horribly. So just don’t even try. Use Boost’s bjam instead on Windows.
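If you ever have to write a build script that does need to survive this, a defensive lookup is one option. This is just a sketch using the base sysconfig module; the helper name config_var and the variable name in the demo are made up, and this is not what the Boost.NumPy script does.

```python
import sysconfig

def config_var(name, default=None):
    """Look up a build configuration variable, tolerating its absence."""
    value = sysconfig.get_config_var(name)   # returns None when undefined
    return default if value is None else value

# A variable that does not exist simply yields the fallback instead of
# letting a None leak into later string handling.
missing = config_var("NO_SUCH_VARIABLE_FOR_DEMO", "fallback")
```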

Conclusion

So, it’s “just” that easy. Uh-huh. In that there’s pretty much nada in the way of documentation, you can imagine it took me a while to sort some of this out. Hence why I’m putting it on Ye Olde Interwebs now, so that hopefully if you’re stuck doing the same thing you won’t waste nearly so much time coming up with answers. Numeric may be dead, but with NumPy’s Old Numeric your Python code doesn’t need to go through a massive rewrite. A bit more work though if you had C++ code too, but it can be done, and almost as cleanly.

The Bard’s Tale – Windows 7 Or Bust!

inXile - The Bard's Tale - CD version - on Windows 7

Speaking recently of not re-paying for video games, music, etc. that you’ve already paid for just because someone doesn’t feel like supporting them on a new platform that should support them, that brings us to this Thanksgiving’s strangest epic adventure: the battle to install inXile’s The Bard’s Tale on my Windows 7 PC.

You see, many a year ago, when The Bard’s Tale came out, it was Windows XP that PCs used.  But as we all know, Windows 7, whilst theoretically supporting all kinds of backward compatibility options, doesn’t necessarily make things all that easy in reality.  (And Windows Vista, being, well, basically, Win 7 Alpha, has the same problems.  But I like to pretend that Windows Vista AKA Win ME2 doesn’t exist any more than Windows Millennium Edition did.  Some versions of Windows are just a bane upon the world, and Win ME and Win Vista are the two worst, by far.)

So hearken, fellow readers, to this the unofficial penultimate guide to playing The Bard’s Tale on a modern PC.

So what, exactly, is the problem with getting The Bard’s Tale to run on Windows 7?  To start with?  Installing it!

You see, in their infinite wisdom, Microsoft seems to have blocked some good old Windows Media Player APIs / DLLs from Windows 7. The Bard’s Tale installer checks for them, cannot find them, and so claims that you don’t have the right version of Windows Media Player 9 needed. It helpfully offers you an install on the CD, but of course that won’t do you any good whatsoever. We’re up to Windows Media Player 12 on Windows 7, and Microsoft actually blocks those old versions that it doesn’t like anymore.

Not that you actually need the old version of Windows Media Player.  Oh no.  The few .WMV movies that The Bard’s Tale contains will run just fine with the latest version.  It’s just that the installer for The Bard’s Tale is too stupid to know that.  And inXile is too lazy to fix it.

You may have noticed, for a mere ten bucks, you can get The Bard’s Tale on Steam.  And presumably this will install on Windows 7.  But why re-pay for something you already bought?  Surely there’s a solution, right?

Well, there are two roads that you can go down.  (Three, I guess, if you count repaying for what you already own to download it on Steam.)

The key is that there are no registry settings or any such magic (like some programs need) necessary to get The Bard’s Tale running.  All that you actually need are the files.  Files files files.  The installer is supposed to give you those files, but if the installer won’t run, then we just have to do it ourselves.

Method 1) Copy An Install Off Of A Windows XP Box

I know, I know.  That requires you to have a Windows XP box handy.  I verified that this works quite well in a very painful way.  I shared the Blu-Ray drive on my PC over the network to my Viliv S5 UMPC “laptop” so that I could run the installer.  Painful, but doable.  And then having successfully installed The Bard’s Tale I copied the files back over to my Windows 7 box and nuked the install on my Windows XP laptop.  Slow and painful, but successful.  If you have an old Windows XP box around somewhere, you can do it this way too.  (Even more easily if you don’t have to share your CD drive because your Windows XP box doesn’t have one.)

OR there’s always Microsoft’s ultimate solution to Windows XP compatibility:  Windows XP Mode.  It’s essentially a virtual PC running a copy of Windows XP.  Slow and painful, but you don’t need another computer if you have this set up.

Method 2) Unpackage The Files From The CDs Manually

You don’t actually need a Windows XP box or Windows XP Mode to get The Bard’s Tale installed though.  Because you can do the work of the installer yourself!  Each CD has on it a Disk1.cab file.  (Well, okay, technically CD1 has Disk1.cab, CD2 has Disk2.cab, etc.)  And you can extract the files from these cab files easily enough.  I used 7-Zip, which is a great free tool that I use all the time, but I’m sure there are other options too.  Just create your The Bard’s Tale directory and extract the CAB files from each CD into it and you’re almost good to go.
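If you’d rather script the extraction than click through it, here is a rough sketch in Python. It assumes the 7-Zip command line tool is on your PATH as 7z, and that you’ve already copied the CAB file from each CD to a folder on your hard drive. The function names and the paths in the usage note are made up for illustration. The x command and the -o and -y switches are standard 7-Zip.

```python
import subprocess


def build_extract_commands(cab_paths, dest_dir, seven_zip="7z"):
    """Build one '7z x' command per CAB file, all extracting into dest_dir."""
    # "x" extracts with full paths, "-o<dir>" sets the output directory
    # (no space after -o), and "-y" answers yes to overwrite prompts.
    return [[seven_zip, "x", cab, "-o" + dest_dir, "-y"] for cab in cab_paths]


def extract_all(cab_paths, dest_dir, seven_zip="7z"):
    """Run the extraction commands in order, stopping at the first failure."""
    for command in build_extract_commands(cab_paths, dest_dir, seven_zip):
        subprocess.run(command, check=True)
```

Something like extract_all([r"C:\cabs\Disk1.cab", r"C:\cabs\Disk2.cab"], r"D:\Games\The Bard's Tale") would then unpack everything into one directory, with the rest of the CAB files added to the list the same way.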

Almost.

First, to help things along, make your The Bard’s Tale directory somewhere not in the typical Program Files structure, because Windows 7 likes to be very protective of these and that can cause problems both with extracting your files there and with getting The Bard’s Tale to run. (Though running The Bard’s Tale in Administrator mode fixes that. Actually, so does running 7-Zip in Administrator mode too, which is easy enough. But so is just not using the Program Files directory structure in the first place.)

And second, there’s still a vital step missing.  The Bard’s Tale will run, but most of the voices will be missing.  (Even though the sound effects and music are there.)  And since most of what makes The Bard’s Tale entertaining are the conversations, you’ll want to go through this next step.

On Disk 6 is another folder as well, named Sounds.  You have to manually copy these .VWB files into your “The Bard’s Tale\Res\Sounds” folder.  And the WBCWin.exe into it as well.  And then the even more annoying part, these .VWB files are compressed versions that you use WBCWin.exe to uncompress.  You have to go into a command prompt window to do it.  (You’ll want to start your command prompt in administrator mode too, most likely.)  CD to the “The Bard’s Tale\Res\Sounds” directory where you dumped the .VWB files and WBCWin.exe, and then enter the command “WBCWin.exe *.vwb .\” … without the quotes, obviously.  This will take a good while as it uncompresses the .VWB files into much larger .XWB files.
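The copy-then-decompress dance above can be sketched in Python too. This is only a sketch under a few assumptions: the function names and folder arguments are mine, and the only thing I know about WBCWin.exe is the one command line given above, so that is all the second function runs. It is Windows only, since WBCWin.exe is a Windows program.

```python
import os
import shutil
import subprocess


def stage_voice_files(disk_sounds_dir, game_sounds_dir):
    """Copy the .VWB files and WBCWin.exe from the CD's Sounds folder."""
    os.makedirs(game_sounds_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(disk_sounds_dir)):
        lower = name.lower()
        # Match case-insensitively, since CD file names are often upper case.
        if lower.endswith(".vwb") or lower == "wbcwin.exe":
            shutil.copy2(os.path.join(disk_sounds_dir, name),
                         os.path.join(game_sounds_dir, name))
            copied.append(name)
    return copied


def decompress_voice_files(game_sounds_dir):
    """Run the same command as above from inside the Sounds folder."""
    exe = os.path.join(game_sounds_dir, "WBCWin.exe")
    # The wildcard is passed through as-is, just as cmd.exe would pass it.
    subprocess.run([exe, "*.vwb", ".\\"], cwd=game_sounds_dir, check=True)
```

Call stage_voice_files first with the Sounds folder on the CD and your “The Bard’s Tale\Res\Sounds” folder, then decompress_voice_files on that same game folder, and go get a coffee while it churns.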

And there you have it, all of the voiceovers for the many many many in-game cutscenes.  You’re now “installed”.

DVD Version Note:

Supposedly the DVD disk version of The Bard’s Tale is much easier to use, as it just has all of the files there for you to copy, flat out.  No painful de-CABing.  No even more painful manual .VWB to .XWB decompression.  Just copy and go.  Not having the DVD version I can’t verify this personally, but that’s the scuttlebutt on Ye Olde Interwebs.

But what else do I need to know?

There are a few other important steps to playing The Bard’s Tale now that you’ve installed it. The first is that you’ll probably want to right-click on the EXE and set it up to run in Windows XP SP3 compatibility mode. Just in case. And heck, while you’re at it, you’ll probably want to create some desktop and/or Start Menu shortcuts or something for yourself too.

The second is that in the “The Bard’s Tale\Config” directory is The Bard’s Setup.exe, which you will most definitely want to run. This is where you change the resolution used by The Bard’s Tale. The default is a meager 640×480, which looks like crap. You’ll want to up it as close to your resolution as you can get, most likely. (Keeping in mind the aspect ratio of your native resolution if you can’t match yours exactly.)
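If you can’t eyeball which of the offered resolutions matches your monitor’s aspect ratio, the arithmetic is simple enough to sketch in Python. The candidate list in the usage note is hypothetical, since I’m not reproducing the actual list the setup program offers, and the rule used here (closest aspect ratio first, then biggest picture, never larger than native) is just one reasonable way to choose.

```python
def closest_resolution(native, candidates):
    """Pick the candidate nearest the native aspect ratio, preferring larger sizes."""
    native_ratio = native[0] / native[1]

    def score(resolution):
        width, height = resolution
        # Smaller ratio difference wins; ties go to the larger pixel count.
        return (abs(width / height - native_ratio), -(width * height))

    # Only consider modes that fit on the screen, if there are any.
    fitting = [r for r in candidates
               if r[0] <= native[0] and r[1] <= native[1]]
    return min(fitting or candidates, key=score)
```

For example, closest_resolution((1920, 1080), [(640, 480), (1024, 768), (1280, 720), (1600, 1200)]) picks (1280, 720), because it is the only widescreen mode in that list that fits.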

In game you’ll also want to crank down the particles in the video options.  There are even manual hex fixes for this out there, supposedly, as even the lowest particle setting can still grind some scenes down to some very low frames per second.  For the most part though you’ll be fine with just simply reducing this to the lowest setting.

Another important thing to note is that if you play the game and it feels like it’s in slow motion, it is!  But don’t worry, it’s easily fixed.  You can quickly test this by listening to the drunks sing their beer song in the very beginning.  If “follow the bouncing ball” is choppy because it can’t keep up, you’re in slow motion.  The fault of this is … V-Sync.  It reportedly happens more with GeForce video cards than any others.  Fortunately it should be easy enough to turn off.  (These days you should even be able to manually turn V-Sync off just for The Bard’s Tale if it’s something you want on all the rest of the time.)

There’s also a patch for The Bard’s Tale floating about.  Does it do anything?  I don’t know.  I’ve tried it.  Meh.  Heck if I can tell if it actually even fixes anything.  If it even installs.  It’s not very informative.

And last, but certainly not least, there is also floating about the internet a no-CD crack for The Bard’s Tale.  Now being unofficial I can’t say as I support using it.  Because, you know, in theory it violates your warranty?  (On your obviously already out-of-warranty game since it’s six years old by this point.)  Your mileage may vary.  The choice is up to you.  And such crap.  No idea what legality issues using the no-CD crack may constitute, but technologically it’s there, and I’m sure it works just fine.

So there you have it!  The Bard’s Tale on Windows 7!

Even after all these years, it’s still a great game.  You too can install and play The Bard’s Tale on Windows 7, even though it isn’t even remotely supported, and the installer won’t run, and inXile won’t help you in the slightest.  (Let alone actually spend ten seconds fixing their installer with some kind of simple downloadable installer that looks at your CDs/DVDs the same as the original installer should.)  You don’t have to pay on Steam to download the game if you already bought it years ago and still have the disks around.  You just have to be a little more hands-on to get it installed.  Once done, it works just fine.  :)