Showing posts from May, 2011

Crowd-captioning, yes please!

In 2009 YouTube announced their captioning API for developers who wanted to create tools to help users add captions to their videos. I've tried YouTube's captioning utility a few times on my videos. Of course the speech recognition service had trouble with words like Quechua and morpheme, but from my perspective it took me only 20 minutes to correct the captions and thereby give the Google Speech Recognizer more data about my domain-specific terms. I know it took 5 recordings of "such that" in logical denotations on my Android phone for it to recognize what I was saying. 5 recordings isn't bad for training. It also provided a much more accurate transcription on the second video, hopefully because it had the first video to train the model on, both in terms of phonetic transcriptions for the typical participants in my username's videos and in terms of the lexicon of my username. While watching the Ali G video posted earlier, I was thinking about how our L

Open Source software makes a difference

In this video series on linguists in the field collecting data and providing language revival materials, we can see that they are using Audacity to record their informants :) Kudos to the Audacity developers and contributors! One might observe that it announces a potential future rise in Sierra Leone nationalism if not accompanied by encouraging the kids to think about iLanguage and Sociolinguistics. Child's choice of Hebrew as an example of a successful revived language is a very interesting one...

A classic intro to Linguistics: Ali G interviews Chomsky

I still love this video: 3 minutes and digestibly informative. In Quebec "linguistique" seems to mean that you've memorized lots of long proper words, odd conjunctions and opaque grammar that no one uses. It's very, very hard to explain what linguistics is, and why it could be useful to view language in that way... let's let the dynamic duo of Ali G and Chomsky explain it to us!

My Gtablet goes Gingerbread!

Thanks to installing the sample speech recognition service from the Android SDK on my gTablet (which was running a hacked version of Android, because the software ViewSonic provides only allows English keyboards and no Market!), I started getting android.process.acore errors, so I decided it was time for an upgrade: if not to Honeycomb, why not to Gingerbread? I followed this tutorial, and was up and running in less than ten minutes! Gingerbread is so much fun, or at least the vegan tab version is. It has a thumb keyboard that I'm using to write this. I don't know if I'll keep using it, but it's fun for now ;) I had been planning on skipping from 2.2 to 3.0, but I'm glad I had the opportunity to try it... Posted using my Android

What kind of linguist did they say they were?

In Montreal "linguist" means "linguiste", i.e. a translator. It turns out there are others who specialize in human language. Someone on the ling department mailing list sent this around from Foxtrot. The classic reaction from most linguists is the first below; I added some others from other linguists I've met in the field for fun :) Ling 101 teacher: This comic merely promotes polyglot-ism. Many very influential linguists didn't need to speak any language other than English. Field Linguist: Hmm, let's see, the first three are clearly French, Spanish, Italian. Hmmm, the fourth, based on vowel diphthong and no orthographic h, might be Portuguese; the fifth is clearly different. It must be an isolated cousin like Romanian. /a/ <-> /ju/, /m/ <-> /b/, /o/ <-> /i/ /____r. The final vowels might just be a suffix. It's possible that the guy making the comic fudged the data to make "a hard one" so laypersons feel some ac

Soon all linguists will be first generation

At a Machine Learning talk last year the speaker was presenting his neural net which, if given Wikipedia data, could learn to tag parts of speech, disambiguate words and even give pretty OK parse trees for any language. My thought (as a linguist who deals with lots of data in many languages) was "Cool! Bet it's not great, but at least it's less bootstrapping that I have to do when I start navigating a new language's data." But the speaker said in question period that "linguists" disapprove of his approach, generally with some concerns that we cannot determine what his model is doing. Yeah, that's certainly true, since the nodes don't correspond to explicit models of human knowledge. But I don't need the computer to model the data, I just want some rough clusters and classification so that I can go through the data and do the fine-grained analysis using my human brain. We should also be careful with our human brains, we often over-generalize, see

The Fresh Meat Principle

Recently I convinced one of my friends to go Salsa dancing on the beginners night, alone. I was operating on five years of observations which I would formalize below as the "Fresh Meat Principle."

The Fresh Meat Principle

Given any x such that x is female, x measures between 5'0'' and 5'4'', x is of medium to athletic build, and x is wearing flat practical non-salsa shoes,

There will be a y such that y is male, y is aged between 25 and 45, y is a reformed geek, and either y is mid intermediate to mid advanced or y is interested in becoming a salsa instructor,

In any w such that w is a beginner salsa night or w is an intermediate salsa night,

Then y will monopolize x the entire evening in w.

I'm very happy to say the Fresh Meat Principle was upheld in a recent study.
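For anyone who prefers their dance-floor observations in first-order logic, here is one possible rendering of the principle. It is only a sketch: the predicate names are shorthand invented for this formula, and the quantifier scope is an assumption (the partner y is allowed to differ from one salsa night w to the next, whereas a strictly left-to-right reading of the prose would put the existential above the universal over w). The definitions use `align*`, which needs the amsmath package.

```latex
\[
\forall x\, \forall w\, \Bigl[ \bigl( X(x) \land W(w) \bigr)
  \rightarrow \exists y\, \bigl( Y(y) \land \mathrm{Monopolize}(y, x, w) \bigr) \Bigr]
\]
where
\begin{align*}
X(x) &:= \mathrm{Female}(x) \land 5'0'' \le \mathrm{height}(x) \le 5'4''
        \land \mathrm{MediumToAthletic}(x) \land \mathrm{FlatShoes}(x) \\
Y(y) &:= \mathrm{Male}(y) \land 25 \le \mathrm{age}(y) \le 45
        \land \mathrm{ReformedGeek}(y)
        \land \bigl( \mathrm{MidLevel}(y) \lor \mathrm{AspiringInstructor}(y) \bigr) \\
W(w) &:= \mathrm{BeginnerNight}(w) \lor \mathrm{IntermediateNight}(w)
\end{align*}
```

Here Monopolize(y, x, w) stands for "y monopolizes x the entire evening in w", and MidLevel(y) abbreviates "mid intermediate to mid advanced".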

English Noun Incorporation?

I was at a talk today with some Ojibwe data where, invariably, the claim that "English doesn't have incorporation", or at least incorporation of objects, came up. We have "vacuum clean", but generally we only incorporate the instrument. I remember a similar discussion coming up a few years ago in 2007, and I asked myself about apple picking. My interlocutors said, sure, but you can't say "apple pick", right? I thought about it a bit and came up with a linear string of words that might get Google results. I remember I searched for "we apple picked" and found a few results, indicating to me that some people say it, generally when discussing their weekends. So, having my Android with me at the talk, I googled again. This time I found a lot more examples than before, 394 to be exact, all of the first page clear examples with native speakers, speaking naturally. I've heard this claim can be traced back to Baker 1988. When I got home I googled the claim &quo