or, stuff that I dragged out of my head

Location: Moncton, New Brunswick, Canada

Friday, November 02, 2007

The Machinery

Translation is a difficult art to master. You can't just know what all the words are: you have to understand how they fit together, and you have to have a strong grasp of idiom, too.

Machine translation, or MT, has come a long way since the 1950s, but it still has a long, long way to go. Human language is just too variable, too hit-or-miss, too flexible to easily yield its secrets to a set of algorithms, however complex.

Here's a case in point. An online fragrance shop called Beautycafe sells a line of scents called Comptoir Sud Pacifique, of which I own ten or so (you can read various reviews such as this one over on my other blog if you have a mind to). French fragrance advertising is just dementedly expressive: it employs the loftiest turns of phrase to make each new scent seem miraculously different and desirable. Maybe it reads better in French, but in accurate English translation it just sounds silly, as I noted here. (They seem to use the word "vibrating" a lot.)

But when you take the French and just throw it into a machine like Google Translate or Babelfish, you get something completely insane, and not in a good way.

Here's a direct machine translation as found on Beautycafe's page for a scent called Mage d'Orient:

Of this travel by East, Comptoir Sud Pacifique captured rare and warm grades, unexpected impressions, agreements astonishing and surprising, to the accents a not very wooded, delicious spiced ones with chili, all in plumpness … Terribly enchanting as the east earths…
Un perfume racé, boisé aux multiples facettes.
An unpublished agreement of exotic fruity grades on a flower heart to the sunny accents.  The more masculine facets are given by the Sandalwood, the Vétyver and the Foams Oak, enveloped of a soft agreement crème and voluptuous, punctuated of a key enchanting Broad Beans Tonka, Vanilla and an amber breath and of Musc. 
Top Notes : Bergamote orange, Limette, Green Lemon, Pineapple, Lychee, Apple, Coconut Walnut Fresh.   
Middle Notes : Sea spray, Geranium, Jasmine, Lavender, Muguet, orange-tree Flower
Base Notes : Sandalwood, Vétyver, Cedar, Foam Oak, Pine Resin, Milk and Coconut Walnut, Broad Beans Tonka, Amber, Musc, Vanilla.

Ridiculous! Shameful!

"Coconut walnut fresh" happens for two reasons. First, "noix" in French refers to the walnut (it also means any nut), but "noix de coco" means "coconut", which evidently the algorithm doesn't know. Second, French generally puts its adjectives after its nouns: "noix de coco fraîche" actually means "fresh coconut". The machine looked at it, thought "walnut of coconut fresh", and then collapsed the genitive into "coconut walnut", as we often do in English when we turn "shards of metal" into "metal shards", while leaving the adjective stranded at the end.

"The foams oak" is equally risible: "mousse de chêne" doesn't mean "foam of oak", but "oakmoss", since "mousse" means both "foam" and "moss" in French. Oakmoss is a lichen widely used in perfumery to create scents called chypres, which have a honey-earthy-woody character.
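
To make the failure concrete, here's a toy sketch in Python. It is emphatically not how Google Translate or Babelfish work internally (I have no idea what they do under the hood); the tiny glossaries and the function names word_by_word and phrase_first are my own inventions for illustration. It just contrasts translating one word at a time with checking for known multi-word terms first.

```python
# Toy glossaries: hypothetical entries, invented for this illustration only.
WORDS = {
    "noix": "walnut",
    "de": "of",
    "coco": "coconut",
    "fraîche": "fresh",
    "mousse": "foam",
    "chêne": "oak",
}

# Multi-word terms that only make sense when translated as a unit.
PHRASES = {
    ("noix", "de", "coco"): "coconut",
    ("mousse", "de", "chêne"): "oakmoss",
}


def word_by_word(text):
    """Translate each word in isolation, ignoring multi-word terms."""
    return " ".join(WORDS.get(w, w) for w in text.lower().split())


def phrase_first(text):
    """Greedy longest match: try known phrases before single words."""
    tokens = text.lower().split()
    longest = max(len(p) for p in PHRASES)
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, down to two words.
        for n in range(min(longest, len(tokens) - i), 1, -1):
            chunk = tuple(tokens[i:i + n])
            if chunk in PHRASES:
                out.append(PHRASES[chunk])
                i += n
                break
        else:
            # No phrase starts here, so fall back to the single word.
            out.append(WORDS.get(tokens[i], tokens[i]))
            i += 1
    return " ".join(out)


print(word_by_word("noix de coco fraîche"))  # walnut of coconut fresh
print(phrase_first("noix de coco fraîche"))  # coconut fresh
print(word_by_word("mousse de chêne"))       # foam of oak
print(phrase_first("mousse de chêne"))       # oakmoss
```

Notice that even the phrase-aware version gives "coconut fresh" rather than "fresh coconut": knowing the vocabulary isn't enough, because something still has to move the adjective in front of the noun.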

The whole thing is nonsense. Online machine translators have their place, but only as a starting point: you then have to go over the resulting text and turn it from MT-English into English English. You can't just shove the text into them and then publish the results. You'll only end up embarrassing yourself.


Frank said...

I'm actually doing freelance work with a group at the University of Pennsylvania that's part of a Department of Defense initiative to improve machine translation, mostly of Chinese and Arabic. It's really amazing how variable it is: some translations the MT comes up with are almost perfect, but others are sheer gibberish. Since my job is to make the gibberish comprehensible, it's a bit of a challenge.

Friday, November 02, 2007 10:33:00 PM  
pyramus said...

That sounds like an extremely cool job!

Saturday, November 03, 2007 12:27:00 PM  
