Chances are that if this matters to you, it’s something that you’ve already gone through. After all, anyone still using Numeric in Python in this day and age is working with an incredibly outdated environment. Still, sometimes it happens. In business settings, validating a new environment is not always as easy as it is in the academic or hobby worlds. So just in case, here are my experiences of upgrading from Numeric to NumPy:
Tip 1 ) Replace Numeric with NumPy’s Old Numeric
The first trick is that NumPy contains most of what you need already in the numpy.oldnumeric module. This saves an awful lot of effort as you don’t have to rewrite random portions of some million lines of code. The vast majority of the work involved is one simple Pythonic twist:
import numpy.oldnumeric as Numeric
And if you’re concerned about remaining backward compatible with your old environment, then you can even add an exception handler to choose which is the right one to use:
try:
    import Numeric
except ImportError:
    import numpy.oldnumeric as Numeric
Now, similarly, if you used some of the modules in Numeric, such as LinearAlgebra, it’s still almost as simple. You can do:
from numpy.oldnumeric import linear_algebra as LinearAlgebra
Or, again, if you need backward compatibility:
try:
    import Numeric
    import LinearAlgebra
except ImportError:
    import numpy.oldnumeric as Numeric
    from numpy.oldnumeric import linear_algebra as LinearAlgebra
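If that try/except ends up copied into dozens of files, one option is to put the fallback logic in a single place. Here is a minimal sketch of that idea. The name import_first is made up for this example and is not part of Numeric or NumPy, and the demonstration uses standard library module names as stand-ins so it runs even where neither package is installed. Bear in mind that numpy.oldnumeric only ships with older NumPy releases.

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that can be imported.

    Hypothetical helper for illustration; not part of Numeric or NumPy.
    """
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: " + ", ".join(names))


# Stand-in demonstration: the first name does not exist, so the second is used.
# In real code the call would look like:
#     Numeric = import_first("Numeric", "numpy.oldnumeric")
chosen = import_first("module_that_does_not_exist_xyz", "json")
print(chosen.__name__)
```

Each of your own modules can then import Numeric from that one compatibility module instead of repeating the exception handler.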
Tip 2 ) Fix minor type inconsistencies
There are some differences between Numeric and NumPy’s Old Numeric, primarily in how Old Numeric handles types. The biggest are character versus string arrays, and floating point arrays. Say you used a string as data for your array. In Numeric this results in an array of a character type, AKA Numeric.Character or ‘c’, with a size equal to the number of characters in your string. But in NumPy this results in an array of a string type with a size of 1, because the whole string becomes a single element! That’s not very compatible. The solution? Just specify the type when constructing your array. So instead of Numeric.array(“datadatadata”) use Numeric.array(“datadatadata”, Numeric.Character). Yep, that one is really that simple.
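To see the difference for yourself, here is a small sketch. It is written against plain numpy rather than numpy.oldnumeric, and it assumes that the “c” type code is a fair stand-in for Numeric.Character.

```python
import numpy as np

text = "datadatadata"

# Without a type, NumPy wraps the whole string in a single element.
as_string = np.array(text)
print(as_string.shape, as_string.size)

# With the character type code, each character becomes its own element.
as_chars = np.array(text, dtype="c")
print(as_chars.shape, as_chars.size)
```

The second array has one element per character, which is the behavior the old Numeric code was counting on.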
Slightly less simple, though you may not even realize it is happening, is a related problem. Say you had a Numeric array of a float type using “f” to define it. Something like Numeric.array([1., 2., 3.], “f”). In Numeric this “f” type specifier results in a match to Float64. Something you may or may not have expected. This is because Numeric has an interesting string matching algorithm. In Numeric you can have a Float0, a Float8, a Float16, a Float32, a Float64, or even a Float128. Each could be specified in a string if you desired instead of using Numeric’s constants. Which means that if you specify a string of “float”, it leaves Numeric to try and decide which length float you want. And Numeric, thinking smart, matches the default float type used by Python, so it’s a nice match to your data. Which, by the way, is a Float64, otherwise known as a double. And so in the above example, if you specify the type “f” to Numeric, yep, you guessed it, you end up with an array of Float64.
Where things get tricky is that NumPy doesn’t have the same set of float types that Numeric does. The two you will normally deal with are float32 and float64, AKA “f” and “d” respectively. So the same “f” that gave you a Float64 in Numeric will give you a float32 in NumPy!
Now this might not be much of a problem to you if you’re sticking with Python-only code. Then again, with less precision it might. But where it can really kick your asterisk is if you did something silly like write a compiled module (Who would do a silly thing like that?) and pass it your array as one of the arguments. If you added type checking into your compiled code, if for no other reason than just good coding practices, this can end up throwing you for a loop when suddenly your arrays are no longer matching a Float64 type! The fix, of course, is simply to use the more descriptive Numeric.Float64 type instead of “f”. Or if you’re lazy and that’s too much typing, at least switch to “d”, which both Numeric and numpy.oldnumeric will interpret as Float64.
Assuredly, if you’re really into variable typing, chances are there are other places where NumPy’s Old Numeric did not match Numeric as closely as perhaps it should have. For example Numeric.Int16 is “s”, whereas numpy.oldnumeric.Int16 is “h”. I’m not sure what effect that has on anything, having not used it. But I noticed it. Goodness knows what else there may be too. Your best bet is to first not use strings to define your types, but to go to the defined constants, and second to test test test.
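The NumPy side of this is easy to check. The sketch below uses plain numpy (not numpy.oldnumeric) and only shows what “f” and “d” mean to NumPy. It says nothing about what your Numeric install does with them.

```python
import numpy as np

single = np.array([1.0, 2.0, 3.0], "f")
double = np.array([1.0, 2.0, 3.0], "d")

# "f" is a 4-byte float in NumPy, "d" is an 8-byte float.
print(single.dtype, single.itemsize)
print(double.dtype, double.itemsize)

# The lost precision shows up even on a simple value like 0.1.
print(float(np.array([0.1], "f")[0]) == 0.1)
print(float(np.array([0.1], "d")[0]) == 0.1)
```

If a compiled module insists on 8-byte floats, checking itemsize or dtype like this before the call is a cheap way to catch the mismatch on the Python side.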
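As a small illustration of preferring constants over one-letter strings, here is a sketch in plain numpy. It uses numpy.int16 where old code would have used Numeric.Int16, which is my assumption about the equivalent constant.

```python
import numpy as np

# Ask for the type by name rather than by a one-letter code.
values = np.zeros(3, dtype=np.int16)

print(values.dtype)       # the named type
print(values.dtype.char)  # the one-letter code NumPy uses for it
print(values.itemsize)    # bytes per element
```

Written this way, the code never depends on which letter a particular library happens to use for a 16-bit integer.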
Tip 3 ) Fix Numeric / NumPy inconsistencies
Besides just minor type inconsistencies when creating arrays, there are bigger inconsistencies, namely in how types themselves are represented. In Numeric, type constants are characters. Your Numeric.Int32 is literally the character ‘i’. Whereas in NumPy a dtype class was created for handling types. You can construct a numpy.dtype(‘i’), but it’s nothing as simple as just a character. But NumPy is even worse than that in practice, because there’s also a ‘type’ type used in NumPy, and that’s what the numpy.int32 constant is, for example: a Python type (a class), not a dtype instance. And as you can see, it’s already getting messy when it comes to choosing your type constants.
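The two kinds of NumPy type object can be told apart with a few isinstance checks. This is just a sketch in plain numpy to make the distinction concrete.

```python
import numpy as np

from_char = np.dtype("i")  # a dtype instance built from a type character
scalar_type = np.int32     # a Python class, not a dtype instance

print(isinstance(from_char, np.dtype))
print(isinstance(scalar_type, type))
print(isinstance(scalar_type, np.dtype))

# A dtype can be built from the class, and the class recovered from the dtype.
from_class = np.dtype(scalar_type)
print(from_class.type is scalar_type)

# The two kinds of object still compare equal when they describe the same type.
print(from_class == scalar_type)
```

So the constants and the dtype instances are interchangeable in comparisons, even though they are different kinds of object.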
Oh, wait, it gets worse.
Because in Numeric an array has a .typecode() method that tells you what type the array is. And because Numeric just uses characters as type definitions, it simply returns the character for that type. It’s easy and straightforward.
NumPy ndarrays have no such method. Oh, they do have a type specifier built in. But it’s not a method named .typecode(). It’s an attribute named .dtype. And it is a numpy.dtype class instance. But the NumPy type constants are ‘type’, not dtype. Yes, NumPy just by itself is messy. But now add in that all of your Numeric code is looking to compare “if array.typecode() == ‘i’:” and you just walked into a whole mess of incompatibility.
If you don’t care at all about backward compatibility, then you’re fairly well set. You can just replace array.typecode() with array.dtype.char. Yes, that’s right, the dtype class has a char attribute that (mostly, except for differences outlined in tip 2) you can compare against. If you’re braver you can even try replacing that character with a constant, so “if array.dtype == numpy.int32:” is a little more descriptive and cleaner. If you’re just moving forward.
However, for those who need to continue to support both environments with Numeric and environments where NumPy has replaced Numeric, using numpy.oldnumeric or not, you’ve walked into a world of hurt where you’re best off making your own module of helper functions, because this gross incompatibility will make doing a simple type comparison in a backward-compatible way very messy, especially if your old environment doesn’t have any version of NumPy to help bridge the gap.
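Here is roughly what the forward-only version looks like, as a sketch in plain numpy. I compare against numpy.intc here because that is the constant NumPy defines for the C int that “i” names. On most platforms numpy.int32 describes the same type.

```python
import numpy as np

array = np.array([1, 2, 3], dtype="i")

# Numeric style was: if array.typecode() == "i":
if array.dtype.char == "i":
    print("matched by type character")

# Comparing against a constant instead of a character.
if array.dtype == np.intc:
    print("matched by type constant")
```

Both checks pass for this array. The second one reads better and doesn’t depend on remembering which letter means what.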
And this is especially true because NumPy is compiled code, so you can’t just sneakily add a .typecode() method to the ndarray class like you normally could in Python. I suppose you could try to do so to the base code and recompile all of NumPy for it. But the bigger question remains why in the world it wasn’t put there in the first place just to remain backward compatible? Welcome to a minor headache. But still, all considered, a pretty small problem compared to what you could be going through right now.
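For what it’s worth, the helper function approach can be quite small. The sketch below is one way to do it, not the only way. The function name typecode and the Fake classes are made up for this example, and the stand-in classes exist only so the snippet runs without Numeric or NumPy installed.

```python
def typecode(array):
    """Return the one-character type code of a Numeric or NumPy array.

    Hypothetical helper for illustration; not part of either library.
    """
    if hasattr(array, "typecode"):
        # Numeric-style arrays expose the code through a method.
        return array.typecode()
    # NumPy arrays expose it through the dtype attribute.
    return array.dtype.char


# Minimal stand-ins so the sketch runs without either library installed.
class FakeNumericArray:
    def typecode(self):
        return "i"


class FakeDType:
    char = "i"


class FakeNumPyArray:
    dtype = FakeDType()


print(typecode(FakeNumericArray()) == typecode(FakeNumPyArray()))
```

Your comparisons then become “if typecode(array) == ‘i’:” everywhere, and the differences in type characters from tip 2 can be handled inside that one function.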
Tip 4 ) Want C++ NumPy? How about a Boost?
If you’ve got that nasty compiled code mixed in with your Python, you just may be using C++, and that means you might even be using Boost. (That’s what I’ve been using anyway.) If so, you might have been disappointed to find that Boost has no native NumPy support built in. And that the authors of NumPy seem to have no interest whatsoever in Boost, preferring Fortran to C++. I guess for a numerical package, I can’t really blame them for that, as that is Fortran’s forte. But it can be awfully inconvenient to the other 99.99% of the world who have forgotten that Fortran is even a programming language. (If they ever even knew it.) Well, cry not, for an unofficial Boost layer for NumPy exists. Enter a GitHub project for ndarray.
Mostly, it’s pretty straightforward and you’ll barely have to change your C++ code at all to use it. One important difference, however, is that in your module export you’ll have to add a boost::numpy::initialize() call immediately at the top, so that the NumPy layer is set up before Boost tries to match your Python arrays to your C++ types.
And if you’re maintaining backward compatibility, well, it’s not the cleanest thing to do with C++ being not quite as flexible about that as Python. I think the best way to go about it is to turn a function into a template function on the cpp side, double up on an overloaded declaration on the h side, and then in the cpp side again have the overloaded implementations just call the template function with the array type. But wait, the pain isn’t over yet, because then you have to change your export definitions to specify which of the overloaded methods to use, which gets a wee bit messy. Or if you don’t want that mess, simply append a _NUMERIC and _NUMPY to the ends of your declarations to keep them separate instead of overloading. That makes things a lot easier in the export, but doesn’t look quite as smooth.
But then there’s one more monkey wrench in the works if you use Microsoft Visual C++, in that the Boost.NumPy layer needs some tweaking to work in Microsoft Land because M$ has yet to implement templates correctly. So you’ll have to add a cxxflags=/DBOOST_ALL_NO_LIB to your Boost build or else you’ll get multiply defined functions in your libraries because Microsoft still isn’t smart enough to weed out duplicate definitions when using templates, so Boost.NumPy ends up fighting with Boost.Python on MSVC. Doh!
Oh, wait, it actually gets worse because I almost forgot that you won’t even get that far in the first place. Because Microsoft also hasn’t implemented variable length arrays yet. So you’ll have to fix a couple of places in the Boost.NumPy code where they do Py_intptr_t dims[nd]; to become Py_intptr_t* dims = new Py_intptr_t[nd]; And, of course, not immediately return because if you don’t sneak that delete[] dims; line in there you’re going to have memory leaks. Yay!
The problem being that, well, sadly, not many Python users are on Windows, so the Boost.NumPy authors of that GitHub project just haven’t tested it on MSVC, apparently. But it all can be made to work. Honest.
You can even go the extra step and add the Boost.NumPy code straight into the Boost codebase locally before you build it. I mean you’re going to build it anyway, right? Might as well. To get it to work with the Boost build system I had to rename the src directory to build so that bjam could find the Jamfile in there. No biggie.
Of course if you do try to use the Boost.NumPy on Windows, don’t even bother trying to use SCons. It’s not that SCons won’t work on Windows, because it will. That’s the point. It’s that Boost.NumPy’s SCons script won’t. Why doesn’t it work on Windows? Well, you can kind of put that on Python, and kind of on the Boost.NumPy authors. They chose to use the one part of distutils that doesn’t have full functionality on Windows: distutils.sysconfig. Now, looking at the latest documentation on Python, you wouldn’t even know that there’s a problem with distutils.sysconfig on Windows. But if you look at the base sysconfig Python module that distutils (strangely) gets distutils.sysconfig from (why the same Python module is basically duplicated is beyond me) you find this almost non-existent warning in the sysconfig documentation about configuration variables, “Notice that on Windows, it’s a much smaller set.” What that almost impossible to find warning means is basically that on Windows pretty much None of the configuration variables exist, so distutils.sysconfig.get_config_var and distutils.sysconfig.get_config_vars are pretty much useless on Windows. Thereby causing the Boost.NumPy SCons script to fail horribly. So just don’t even try. Use Boost’s bjam instead on Windows.
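If you ever have to write a build script of your own that reads these variables, it pays to assume any of them can be missing. Here is a minimal sketch using the standard sysconfig module rather than distutils.sysconfig. The helper name config_var and the particular variable names are just examples I picked, not what the Boost.NumPy script actually queries.

```python
import sysconfig


def config_var(name, default=None):
    """Look up a build configuration variable, falling back to `default`.

    Hypothetical helper; sysconfig.get_config_var returns None when the
    variable is not defined on the current platform.
    """
    value = sysconfig.get_config_var(name)
    return default if value is None else value


# Variables a build script might want; which ones exist depends on the platform.
for name in ("LIBDIR", "LDLIBRARY", "EXT_SUFFIX"):
    print(name, "=", config_var(name, "<not defined here>"))

# Installation paths are looked up separately from configuration variables.
print(sysconfig.get_paths()["include"])
```

Running this on both Linux and Windows is a quick way to see just how much smaller that “much smaller set” really is.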
Conclusion
So, it’s “just” that easy. Uh-huh. Given that there’s pretty much nada in the way of documentation, you can imagine it took me a while to sort some of this out. Hence why I’m putting it on Ye Olde Interwebs now, so that hopefully if you’re stuck doing the same thing you won’t waste nearly so much time coming up with answers. Numeric may be dead, but with NumPy’s Old Numeric your Python code doesn’t need to go through a massive rewrite. It’s a bit more work if you had C++ code too, but it can be done, and almost as cleanly.