Wednesday, May 26, 2010

The Trouble With Satchmo

Conversations on the web about Satchmo, the most popular open source Django e-commerce platform, tend to be between two types of developers. The first kind encounters some problems with installing Satchmo, or hits a snag in deploying one of the features. Naturally, they pose the question: does it have to be this hard? Are there any alternatives?

At this point, the second type of developer pipes up and says something like, "If you think it's too hard, then you're not trying hard enough. Learn more and it'll be easier for you. Satchmo is just fine, so quit your whining."

I can see where both sides of these kinds of debates are coming from. I've gone through the work of getting Satchmo running on Linux machines, deploying it to servers, and even gotten it running on Windows boxes. And it is really a pain in the tush. For all of its features, there are a ton of dependencies that have to be installed and properly configured on your system before you can get it up and running. So I can empathize with the first kind of developer.

When you get right down to it, though, the second type of developer mentioned above is also correct. As arrogant as they can be, if you spend some time learning how to set up Django development environments, how to configure your PYTHONPATH, and so on, then it becomes much easier to install and configure Satchmo. You still run into problems, but you know how to handle them, and you know how to reduce the amount of repetitious work that needs to be done.

Satchmo is an e-commerce solution that was created by programmers almost exclusively for use by other programmers. If you're an experienced Django programmer, and your spouse wants to set up an online store to sell toys for dogs, then Satchmo is the logical choice. All of Satchmo's installation headaches, catalog management difficulties through the admin interface, and documentation shortcomings can be dealt with if you know Django or you've got a Django programmer helping you out. (Maybe.)

Don't get me wrong; I like Satchmo just fine. It's got a very large community of developers behind it, and it's certainly better than anything that one individual could create by themselves. But it's definitely not a great way to start learning Django.

The kind of developer who looks at Satchmo and thinks, "Man, can't we make this easier?" recognizes that while programmers can learn programming solutions in order to make their jobs easier, not everyone who wants to do e-commerce online is a programmer capable of learning and applying those solutions.

That's not to say that we can get to the point where programmers are unnecessary. Consider the PHP e-commerce community: for many years, PHP developers wrestled with the decision of creating their own codebase, or going with one of the many open source solutions that were available. Many of them were fraught with the same kinds of installation and maintenance headaches that plague Satchmo, most of which could be alleviated if you became more experienced with PHP.

But that wasn't quite good enough for the people who developed Magento, which, while still not perfect, is an extremely popular e-commerce framework among both developers and people in business. One of the main reasons? It's easy to install and setup. Which brings up a critical observation: I don't think ease of use is appealing only to non-programmers.

Here's the important thing to note: Magento might be open source, but like MySQL, they manage to make money from the software using a freemium business model. "Open source" does not equate to a lack of business sense or profit opportunities.

I'm not completely satisfied with Satchmo. Personally, if you like it and want to join the community that is helping develop it, don't let me stop you. Those developers are doing great work, and they deserve all the help they can get.

In the Django community, Satchmo is almost certainly the elephant in the room, but I don't think that should stop anyone who doesn't like it from creating their own e-commerce platform. If Satchmo is too much of a pain for you, why not try creating something of your own? And that was part of my motivation for writing my book about Django e-commerce: for those developers for whom Satchmo just isn't cutting it. I wanted to empower the readers of my book (who are probably way smarter than me, and better programmers) to scratch their own itch if they so desired, to run off and start their own projects. If nothing else, I hope it gives programmers the ability to get a handle on doing e-commerce with Django so that they have an easier time with Satchmo.

But I think the opportunities are many, and the potential for innovation is huge, in the Django e-commerce realm. I'm really can't wait to see in a few months' time what awesome projects Django programmers are creating in someone's basement right now.

Monday, May 24, 2010

Misspellings In Search

Ever heard of a word ladder? It's a game invented by the Lewis Carroll, the author of Alice in Wonderland. Here's how it works: you're given two words, and you start with one of them. Change one letter in the word at a time, or add or remove a letter, to arrive at the second word. At each step along the way, the changes you make have to result in valid words.

Here's an example: how do you turn "LEAD" into "GOLD"? Try it yourself right now if you're interested in solving the puzzle.

Here's the solution I devised back when I tackled this problem in middle school, changing one letter at a time:


The key here is that I changed the beginning "L" to an "H", since "LELD" and "LOLD" are not real words. It's an extra step that's needed to conform to the rules of the game. But as you can see, it takes four "steps" on the word ladder in order to get from the starting word to the ending one.

Counting the number of steps from the first word to the second can be a useful number to calculate when you want to offer spelling corrections to users who are doing internal searches on your site. You know how Google offers you polite suggestions when you misspell a word? (e.g. "Did you mean: 'electric guitar'")

The number of character insertions, deletions, and substitutions required to convert one string to another is known as the Levenshtein distance. Unlike word ladders, the Levenshtein distance doesn't care if each step along the way results in an actual word. So, the steps between "stretch" and "straight" would be as follows:


The Levenshtein distance between the two string is 4. The closer two words are together, the lower the distance score between the two of them will be.

What does this mean for misspelled search terms? If someone enters some search text on your site, and there are zero matching results returned, you might consider sweeping through your database and looking for strings or phrases with a very low Levenshtein distance, in an effort to catch misspelled words.

If someone enters "accoustic guitar", they might get no matching results. But, if your product data contains the phrase "acoustic guitar", you might be able to catch that and display a hyperlink offering the alternative "Did you mean...?" text that will repeat the search with the new word if clicked.

So how do you calculate the Levenshtein distance in Python? There are a few ready-to-use Python functions listed on this Levenshtein Distance wiki page. I've been using the first one listed in a couple of my projects and it seems to work very well.

Naturally, this is not a substitution for a complete full-text search package like Django Sphinx or your actual search algorithm; it's just something you might consider falling back on. Also, you want to be very careful about how you implement this on a live site where you're searching through thousands of records. Computing the Levenshtein distance between some search text and the words and phrases in a TextField (like, for example, in a product description field), would be a very slow, inefficient way to search for possible spelling corrections. If you do implement something, make sure you test its performance before releasing it into the wild.

However, if you have a small site in development, and want to try your hand at helping guide the user if they've possibly misspelled a search term, I found the Levenshtein distance is a great quick-and-dirty way to get started.