20.5.09

Natural Language

I took a look at Wolfram|Alpha last night. It's a new "computational knowledge engine" developed by Stephen Wolfram, British child prodigy physicist, mathematician and now entrepreneur.

I must admit I was disappointed: it works well, but only for limited purposes.

Here's a review by CNET's Rafe Needleman:



He got a sneak preview earlier this month, and his comments match well with what I found. I've had the site bookmarked for a while, checking it every now and then to see if it was live. So, like many things, the idea was better than the reality.

The idea is nothing new.

The idea is that there's an amazing amount of information on the internet, and it would be wonderful if you could simply ask a question and get an answer! Search engines (for which Google has become the benchmark as well as the shorthand) just give a list of locations that may contain the information you want. Some do it visually and others try to organise sites thematically, but they basically all do the same thing. From the results, you have to find your answer and sometimes it's not easy.

What WikiAnswers and other similar sites (such as yahoo!Xtra in New Zealand) try to do is link you to people who can answer your question. The answers are as fallible as the humans who provide them, but the idea is that, as time passes, a big enough database of answers accrues and you can start to sort the wheat from the chaff, helped by a friendly answer rating system. It's Darwinian in its simplicity.

But if your question has a factual answer (some do!), surely there's a way of getting a straight answer without the need for a social network. Well, it's not as simple as you might think! Language is complex, and this requires the computer to understand your question, not just dredge up locations which contain your phrase. (Read this Wikipedia article on natural language processing for a quick overview of some of the difficulties involved.) START, a "natural language question answering system" set up by a crew at MIT, has been online since 1993. START's aim is "to supply users with 'just the right information' instead of merely providing a list of hits." You can ask questions like "What is the population of France?" and it will give a single answer, along with a source. The trouble is, Google does pretty much the same thing, and you get to choose your source.

Wolfram|Alpha is in a different league. Take a look at Stephen Wolfram's video intro on the site and have a play around! Nova Spivack put a trial version through its paces a while ago and made extensive comments on it, if you're interested in reading those. (He has a site/product called Twine, which is one of these Web 2.0 things designed to make the web more user-friendly by linking you to new content based on what your interests are.) Wolfram|Alpha can compare data sets and, in that sense, it can provide answers to questions that have never been asked before.

The impression I'm left with, though, is that there's too strong a human hand in Wolfram|Alpha. The data is curated and, therefore, limited. Instead of taking you straight to the internet, it takes you to a database built up from selected information that's available online (and possibly some that's not available online). The curators will be chasing their tails trying to keep up with everything that users could possibly want to find out about. And the curators' biases inevitably come out strongly. Perhaps I would find it better if I were more mathematically inclined, for example.

Surely, the point is that the internet is a big, free-for-all, self-regulating entity and that's precisely its strength. No one can collate or organise the internet, let alone the sum of human knowledge.