
October 2007 Archives

October 22, 2007

Columbia State Historic Park

This weekend, Betsy wanted to look at a couple of horses out in Gold Country. We stayed at an old hotel in Columbia State Historic Park, which is a mostly functional gold-rush style town. They're definitely more interested in convenience than accuracy (e.g., they have indoor plumbing and electricity) but the boys enjoyed it and it at least shows you what it might look like at a high level.

I brought along my new EF 70-200mm f2.8 lens, which is big and heavy but takes good pictures. I especially like the f2.8 aperture, which allows you to have an extremely shallow depth of field. I did get about fifteen minutes to wander around and do kid-free photography, and took this picture:

(Canon Rebel XTI, EF 70-200mm/f2.8 lens, ISO 100, 1/15th sec., f2.8)

Definitely learning a lot about depth of field, especially with autofocus. My old lens opened up to f3.5 at its widest, but zoomed in to its maximum of 135mm (which was most of the time), the widest it could manage was f5.6. I'm also very happy that the extra two stops mean even more low-light shots are possible.
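For anyone who wants to check the "two stops" figure: the light a lens lets in goes with the square of the f-number ratio, and each stop is a doubling. Here's a rough Python sketch of that arithmetic (the function name is just something I made up for illustration):

```python
import math

def stops_between(slower_f: float, faster_f: float) -> float:
    """Stops of light gained by opening up from slower_f to faster_f."""
    # Light gathered scales with 1 / f_number**2, and one stop is a factor of 2.
    return 2 * math.log2(slower_f / faster_f)

gain = stops_between(5.6, 2.8)   # old lens at 135mm vs. the new lens
light_ratio = 2 ** gain          # how many times more light gets through

# The picture above was 1/15 sec at f2.8; the same exposure at f5.6
# needs the shutter open longer by the same ratio (same ISO assumed).
shutter_at_f56 = (1 / 15) * light_ratio

print(f"{gain:.1f} stops, {light_ratio:.0f}x the light, {shutter_at_f56:.2f} sec at f5.6")
```

In other words, that 1/15 sec shot would have needed roughly a quarter of a second on the old lens, which is not something I can hand-hold.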

Bottom line on this lens is that it's really heavy, but takes great pictures. Now, though, I need to get a good smaller lens (maybe the EF-S 17-55?). My old lens went down to 28mm, so I never felt like I had to carry two lenses. Personally speaking, I think Canon's missing a slot in their f2.8 lens lineup - I'd love to have something more like my old 28-135mm lens in an f2.8 for when I just want to grab the camera and go without hauling a bunch of other stuff.

October 25, 2007

Found in Translation

Many years ago - I'd guess four or five - I read that Google was developing a new technology for machine translation of human languages. This is a famously difficult task, one that people were initially optimistic would be quite tractable, but which has turned out not to be. Google's new approach was to take "billions of words" of source material that had been painstakingly translated by humans, and have the computer build statistical rules for translation. It turns out there's a lot of this source material out there - things like UN internal documents and EU laws, to say nothing of famous old novels like Moby Dick.
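To give a feel for what "build statistical rules" from human translations could mean, here's a toy sketch in Python. To be clear, this is not Google's method (I have no idea what theirs looks like inside); it's just about the simplest thing I can think of: count which words show up together in translated sentence pairs, and guess that the best-matching one is the translation. The three sentence pairs and the function name are made up for illustration.

```python
from collections import Counter
from itertools import product

def learn_word_table(pairs):
    """Guess a word-for-word table from (source, target) sentence pairs."""
    src_count, tgt_count, both = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)                 # sentences containing each source word
        tgt_count.update(t_words)                 # sentences containing each target word
        both.update(product(s_words, t_words))    # sentence pairs containing both

    table = {}
    for s in src_count:
        # Dice score: high when the two words nearly always appear together.
        table[s] = max(
            tgt_count,
            key=lambda t: 2 * both[(s, t)] / (src_count[s] + tgt_count[t]),
        )
    return table

# Made-up miniature "parallel corpus"; the real thing would be billions of words.
toy_corpus = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("a house", "une maison"),
]
word_table = learn_word_table(toy_corpus)
print(word_table)
```

Nobody told the program that "house" is "maison"; it falls out of the counts. A real system obviously has to cope with word order, phrases, and words it has never seen, which is where it gets hard.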

I heard nothing else about it, and had almost forgotten about it. It seems I'd missed the announcement in spring of 2006 that they'd gone live with a version of this for English:Arabic and vice-versa. Today, they switched to the new system for all languages.

Their old system was provided by Systran. Like MapQuest, Systran is now in the unenviable position of having Google suddenly put out a product that's noticeably better than its own. Since Systran is still used by the best-name-in-class Babel Fish, it's quite possible to compare the two methods.

I've eschewed automatic translation in the past as being something that, if you tried to read Moby Dick with it, would probably be able to let you know it was about a whale, and not much more. As a result, some of the features I'm about to describe in the new Google Translate may not actually be new, but I'm impressed with them.

I do not, in any way, know French. I found a Le Monde article about the California wildfires, and ran it through both Google Translate and Babel Fish.

Unfortunately, both stumble out of the gate. The story tries to emulate the look of a print publication by making the first letter of the first sentence a large capital. They do this not with a larger font, but with a picture of a capital "C". This makes the opening phrase, "Cinq morts" ("Five dead"), appear to the translators as "inq morts". Neither knows the French word "inq" (as far as I can tell, there is no such word), so both simply leave it in place, rendering the phrase as "Cinq dead", with the "C" being an actual picture of a "C". This underscores a lot of the difficulties this field faces. Giving both the benefit of the doubt, I re-ran the opening sentence through each by hand with "inq" replaced by "Cinq", and this is what I got:

Original: Cinq morts et 500 000 personnes évacuées : le bilan des incendies qui ravagent le sud de la Californie depuis trois jours ne cesse de s'alourdir, alors que le présdident Bush est attendu dans la région mercredi 24 octobre.

Babel Fish: Five died and 500 000 evacuated people: the assessment of the fires which have devastated the south of California for three days does not cease being weighed down, whereas Bush présdident it is awaited in the area Wednesday October 24.

Google: Five people dead and 500,000 evacuees an assessment of fires ravaging southern California for three days continues to grow, while the présdident Bush is expected in the region Wednesday, October 24.

The Babel Fish translation is quite difficult to follow - you can figure out that five are dead and 500k displaced, that Bush is coming on Wednesday, and that the fires are in southern California. I'm pretty sure the original sentence was saying something more like "Five people dead and 500,000 evacuees: the toll of the fires that have ravaged southern California for three days continues to grow, while President Bush is expected in the region Wednesday, October 24." The Google translation is much better, but far from perfect.

Further in the article, from Babel Fish, we learn, "The zone around San Diego was touched hard by the flames, which were propagated with whole districts. The localities of Rancho Bernardo, Fallbrook and Ramona presented, Tuesday, of the scenes of apocalypse, with houses reduced in ashes and carcasses of cars strewing the streets.

Lodging houses were installed, including one in a stage where were, Tuesday, some 20 000 moved. More than 1 660 km2 on the whole left in smoke, according to the Californian administration." I love the image of lodging houses being installed, including one in a stage. As always, it's quite possible to puzzle this all out - San Diego was "hit hard" (not "touched hard") by fire, which spread across whole districts. Particular places looked apocalyptic, and they set up emergency shelters for the evacuated, while 1,660 square kilometres - I mean, kilometers - went up in smoke.

Google handles this far from flawlessly, but better: "The area around San Diego has been the hardest hit by the flames, which have spread to entire neighborhoods. The locations in Rancho Bernardo, Fallbrook and Ramona showed on Tuesday, scenes of apocalypse, with houses reduced to ashes and carcasses of cars jonchant the streets.

The shelters were set up, including one in a stadium whereabouts, Tuesday, some 20,000 displaced people. More than 1660 km2 in total went up in smoke, according to the administration of California."

This does much better with idioms! "Up in smoke" and "hit hard" are both translated idiomatically, as opposed to Babel Fish's "left in smoke" and "touched hard," respectively. Google rather unaccountably misses the word "jonchant", which, in context, does seem to mean what Babel Fish suggests, "strewing". This isn't a completely bizarre word, either; Google finds 48,200 web pages using it (now, 48,201). Indeed, this French dictionary, which I can read only because of Google's translation tool, defines it as "present participle of the verb joncher," and lists "covering, disseminating, parsemant, covering lining" as synonyms. Google doesn't know "joncher," either. I've noticed this seems to be a general failing in the current Google system - a number of words come through untranslated. The Systran system underpinning Babel Fish seems to have a more thorough simple word-to-word dictionary.

This may be a side effect of Google's training documents. If your sources are all boring legal documents, you may not get much in the way of poetic words like "strewing". Presumably, this is improvable with more documents.

Google also has an amazing interface to translated pages. I could see this being a real boon to those trying to learn a language. If you hover your mouse over a sentence, it pops up a bubble with the original language in it. This allows you to see the structure of the original sentence, which can be helpful in places where significant reordering of words has been performed by the software. Additionally, for those fluent (or for extremely obvious mistakes), Google provides a "Suggest a better translation" link. If they're proactive about getting those improvements in, the system could get a lot better, quickly.

One reasonably obvious test of automatic systems is to run the output as the input - translate from English to French, and back again. Google, again, does noticeably better. The first lines of Mr. Melville's novel in the original English are "Call me Ishmael. Some years ago - never mind how long precisely - having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world." After a round trip, Babel Fish renders this as "Call Me Ishmael. A few years ago - never the spirit how long with precision not having little or not money in my purse, and nothing particular to interest me on the shore, I thought myself would not sail about and would see the aqueous part of the world." In addition to being almost unreadable, a gratuitous "not" has been thrown in, making it seem as though the narrator had decided not to sail the seas! Google's rendering isn't perfect, but is a lot more palatable: "Call me Ishmael. A few years ago, never mind how long precisely, having little or no money in my purse, and nothing particular interest to me on the shore, I thought I would sail a little and see liquid part of the world." I think it is quite possible to figure out what this says. It's, in fact, getting on towards not being unpleasant to read. Not completely there, yet, but a great leap forward from Systran's technology.
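If you'd rather put a number on "noticeably better" than take my word for it, here's a rough Python sketch that scores each round trip by how much of the original word sequence survives. The three passages are the ones quoted above; the scoring (word-level matching via the standard library's difflib) is just my own crude stand-in, not a real translation-quality metric, and the function names are made up.

```python
import re
from difflib import SequenceMatcher

def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def round_trip_score(original, round_tripped):
    """Similarity between 0 and 1; 1.0 means the word sequence came back intact."""
    return SequenceMatcher(None, words(original), words(round_tripped)).ratio()

original = (
    "Call me Ishmael. Some years ago - never mind how long precisely - having "
    "little or no money in my purse, and nothing particular to interest me on "
    "shore, I thought I would sail about a little and see the watery part of "
    "the world."
)
babel_fish = (
    "Call Me Ishmael. A few years ago - never the spirit how long with "
    "precision not having little or not money in my purse, and nothing "
    "particular to interest me on the shore, I thought myself would not sail "
    "about and would see the aqueous part of the world."
)
google = (
    "Call me Ishmael. A few years ago, never mind how long precisely, having "
    "little or no money in my purse, and nothing particular interest to me on "
    "the shore, I thought I would sail a little and see liquid part of the world."
)

babel_score = round_trip_score(original, babel_fish)
google_score = round_trip_score(original, google)
print(f"Babel Fish: {babel_score:.2f}  Google: {google_score:.2f}")
```

A score like this only measures overlap in wording, so it can't tell a harmless synonym from a stray "not" that reverses the meaning. It's a sanity check, not a substitute for reading the output.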

Of course, where Google wants this to be useful is in search. And, already, you can see the results. If you search for something that primarily returns results in a language other than your own, Google will helpfully, right now, include a "Translate this page" link next to each result. Unfortunately, this is a bit of a crapshoot, as the summaries are still in the page's original language (e.g., French), which you can't read to see if the linked page is interesting. However, they have a fascinating new feature, "Translated Search." This lets you search for, say, "Cancun restaurants" in Spanish, and see the results, translated, in English. To be more explicit, it translates "Cancun Restaurants" into "Restaurantes Cancún", performs a Google search on that, and then translates the individual results.
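The three steps just described (translate the query, search, translate the results back) are simple enough to sketch in Python. Everything here is a stand-in: the lookup tables play the part of the translator and the search engine, the one search result is invented, and none of the names correspond to a real Google API. It's only meant to show the shape of the pipeline.

```python
from typing import Callable, List

def translated_search(
    query: str,
    translate_out: Callable[[str], str],
    search: Callable[[str], List[str]],
    translate_back: Callable[[str], str],
) -> List[str]:
    """Translate the query, search in the foreign language, translate the hits."""
    foreign_query = translate_out(query)
    hits = search(foreign_query)
    return [translate_back(hit) for hit in hits]

# Toy stand-ins (hypothetical data) so the sketch runs without any network access.
EN_TO_ES = {"Cancun restaurants": "Restaurantes Cancún"}
FAKE_INDEX = {"Restaurantes Cancún": ["Los mejores restaurantes de Cancún"]}
ES_TO_EN = {"Los mejores restaurantes de Cancún": "The best restaurants in Cancun"}

results = translated_search(
    "Cancun restaurants",
    translate_out=lambda q: EN_TO_ES[q],
    search=lambda q: FAKE_INDEX.get(q, []),
    translate_back=lambda hit: ES_TO_EN.get(hit, hit),  # leave unknown text as-is
)
print(results)
```

Notice that there are two chances for the translation to go wrong, once on the way out and once on the way back, so a bad query translation gives you well-translated results for the wrong search.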

I read someone from Google a while back opining that they were hopeful that automated translation could make the web truly language-neutral. I scoffed. Il semble soudain beaucoup plus proche.

October 29, 2007

Just installed Leopard...

Had two panics, already. In Apple's defense, that's the first time I've seen a panic, ever, but, still. Not the best experience, so far.

Spaces is nice - it works a lot better than the old Virtue Desktops, which was always a bit of a hack. And, I'm glad iChat finally lets me have more than one Jabber account, though it'd be nice if there were a way to aggregate all the buddy list panes together...

Anyway, all in all, I like it better than Tiger, I just hope the two kernel panics I've seen aren't a trend (i.e., that they were related to one-time things happening just after the install)...

Lime Steak

Made this the other day after I'd had it at my friend Jeff's. It's pretty easy. The recipe suggests tri-tip; I did flank steak and it was marvelous. I did not try the tomato relish with it. I'm not a big fan of cilantro, but wasn't able to detect it in the final product. Definitely recommend it as a quick way to make a Mexicany steak.

October 30, 2007

But For the Grace of God...

About two and a half years ago (I think - it seems to me it was just as Vindicia moved into its current office, which would've been then) I got a call from a recruiter. He was looking for someone to be VP of Technology and/or CTO for Yahoo! Music. This isn't that surprising - I was EVP of Tech at EMusic, and, if you're looking for software development executives with downloadable music experience, there's like maybe three people with that on their resume, and I'm one of them.

Ian C. Rogers, late of Nullsoft (the guys who did Winamp), ended up taking some sort of management role with Yahoo! Music (though I'm not sure precisely what). He has updated his personal blog with an extended anti-music-industry rant, in which he declares he's done with the DRM crap Yahoo! Music has been shoving down customers' throats:

I'm here to tell you today that I for one am no longer going to fall into this trap. If the licensing labels offer their content to Yahoo! put more barriers in front of the users, I'm not interested. Do what you feel you need to do for your business, I'll be polite, say thank you, and decline to sign. I won't let Yahoo! invest any more money in consumer inconvenience. I will tell Yahoo! to give the money they were going to give me to build awesome media applications to Yahoo! Mail or Answers or some other deserving endeavor. I personally don't have any more time to give and can't bear to see any more money spent on pathetic attempts for control instead of building consumer value. Life's too short. I want to delight consumers, not bum them out.

The start of this rant is an overview of digital music history - one which touches on neither GoodNoise, which was selling unencumbered MP3s for $0.99 in 1998, nor the company it became, EMusic, which was selling $9.99 subscriptions, again to unencumbered MP3s, in 2000. Ian just tried Amazon's new digital download service, which is unencumbered, as well. It beats GoodNoise's offering by having the content of two major labels - we never thought it would take nine years to convince them this was inevitable. Ian wonders what took so long: "But now, eight years later, Amazon's finally done what was clearly the right solution in 1999. Music in the format that people actually want it in, with a Web-based experience that's simple and works with any device... PRAISE JESUS. It only took 8 years." Well, to some of us, it was obvious in 1998, but I'm glad everyone else is finally coming around.

Ian's bottom line is right on, though: "Convenience wins, hubris loses."

Oh, and that recruiter - I had a vision pop into my head of endlessly trying to convince major labels, yet again, that they should drop DRM because it was killing their sales while not preventing piracy. I told him, "There's not enough money." He was taken aback. "We're Yahoo! It's never been a problem, before."

"No," I explained, "I didn't say you don't have enough money. I said there isn't enough."

About October 2007

This page contains all entries posted to baz.com - Brett Thomas' Blog in October 2007. They are listed from oldest to newest.

