Wednesday, May 18, 2011

What kind of linguist did they say they were?

In Montreal, "linguist" means "linguiste", i.e. a translator. It turns out there are other people who specialize in human language.

Someone on the ling department mailing list sent around this Foxtrot comic. The classic reaction from most linguists is the first one below; for fun, I added some others from linguists I've met in the field :)

Ling 101 teacher:
This comic merely promotes polyglotism. Many very influential linguists never needed to speak any language other than English.

Field Linguist:
Hmm, let's see: the first three are clearly French, Spanish and Italian. Hmmm, based on the vowel diphthong and the lack of an orthographic h, the fourth might be Portuguese. The fifth is clearly different; it must be an isolated cousin like Romanian. Candidate sound correspondences with Spanish:
  • /a/ <-> /ju/,

  • /m/ <-> /b/,

  • /o/ <-> /i/ /____r.

  • The final vowels might just be a suffix.

It's possible that the guy making the comic fudged the data to create "a hard one" so that laypeople feel a sense of achievement. Those postverbal pronouns look fishy compared to other Romance languages. I wonder if Romanians usually say "love of mine", or if this is a fail on the cartoon author's part from using Google Translate...
  • A quick chat with a Romanian informant (if no informant is available, log into an online dating website with a Romanian profile and wait for a fish to bite) reveals that the hunch is on the right track: the guy did change it from salut to /buna ziwa/, probably because salut is also French.

The sixth looks like a mixture of Spanish and Portuguese; it's a toss-up. Maybe some reverse engineering with Google Translate will tell me. I bet they started with Spanish... yep, it's Catalan...

10 minutes later: all languages are identified, Google Translate is discovered, and new correspondence rules for Romanian <-> Spanish are added to mental storage (Spanish was previously chosen as the generic Romance language because it was the field linguist's first Romance language, and there has been no obvious need to switch to another to date).

Computational Linguist:
Ah, cool! I can send this to my computer science/SOEN buds and they'll think I'm pretty cool for being a linguist :) I'll hack together a Python script that uses the Google Translate API on my Android/Ubuntu machine and generate a few other languages to stump them even more. 3 minutes later, all languages identified; 3 hours later, a new API learned.

Text Engineer:
Fun data! I'll run OCR on the GIF from the command line, rip out the text and send it through my language identification pipeline. 3 minutes later, all languages are identified (Catalan misidentified as Spanish), and it's off to a bar for happy hour.
