Wednesday, September 28, 2011

I'm just a dude like any other programmer

After listening to "Should Google+ require you to use your real name?" on one fine sunny bike ride, I was left wondering if maybe my justification for anonymity might be more common than the authors might think. My wondering stopped there, until this evening when I giggled at one of my GitHub messages.

There are many nefarious reasons to use a handle. Some people hide behind anonymity to post nasty comments on YouTube, troll in general, say abusive things or start mass riots in countries where freedom of speech isn't common. But not all reasons for anonymity are nefarious; some are just about having a level playing field. As "cesine" I've quietly listened on user groups while others suggested new Barbie wallpapers and flirty penguins to bring some of the female persuasion over to Linux, etc. I once made an eye-fluttering Tux and put it on my website, wondering if they might catch on that there was something different about the operator of that server.

Anonymity for me meant having people treat me like any other programmer/geek. My real name isn't at all gender neutral like Tony, Alex or Jesse. But my handle, "cesine", which has been my web identity since 1996, is everything neutral. And apparently, it's working. I think this speaks for itself.

Now that I'm officially "out of the closet" in the blogosphere, it's only a question of 10 minutes' research to find I'm not your typical dude, but still that's 10 minutes most people won't take. As long as I don't have to use my real name, that's a 10-minute cushion of unbiased respect.

Monday, September 26, 2011

Precision vs. Recall defined for linguists

Precision and recall are two terms from computer science that can help linguists decide which tasks are best done by a linguist and which are best done by a script or some other automation; in other words, when it needs to be perfect, and when good enough is good enough.

Recall means getting back all the examples in your data which display the phenomenon you are looking for. You can get high recall by writing a script which returns a lot of results. There is always a second step: going through the examples yourself, as a human, to filter out the extraneous ones. Getting high recall is generally a good first step when you start your research (think of a Google web search: you really want to know all the authors that have written on your topic...)

Precision means getting data that you can run stats on and get statistical significance. High precision means all the results are what you were looking for. Getting high precision is important before making any claims or generalizations; it's generally the last step (and highly valued) in your research.
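
To make the two definitions concrete, here is a minimal sketch in Python. The function name precision_recall and the sentence IDs are made up for illustration; the only assumption is that you know which examples in your data really display the phenomenon (the "relevant" set) and which ones your script returned.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a set of returned examples.

    returned: examples your script gave back
    relevant: examples that really display the phenomenon
    """
    returned = set(returned)
    relevant = set(relevant)
    hits = returned & relevant  # returned examples that are truly relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical sentence IDs: sentences 1-4 contain the construction,
# and a permissive script returned sentences 1-8.
relevant = {1, 2, 3, 4}
returned = {1, 2, 3, 4, 5, 6, 7, 8}

precision, recall = precision_recall(returned, relevant)
print(precision, recall)  # 0.5 1.0
```

The permissive script found every relevant sentence (recall of 1.0), but only half of what it returned was relevant (precision of 0.5), which is why a human filtering step comes next.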

How do you get high recall or high precision?

You get high precision by using rules or by having a human check your data (or multiple humans if the phenomenon is hard to detect or the classification is tricky). You get high recall by writing a simple script, or by using statistics and setting your threshold to be more permissive.
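
The effect of a more permissive threshold can be sketched roughly like this. The scores and labels are invented for illustration, and precision_recall_at is a hypothetical helper, not part of any library; it assumes each example has a score from some statistical model plus a human judgement of whether it is a true match.

```python
def precision_recall_at(scored, threshold):
    """scored: list of (score, is_match) pairs; keep examples with score >= threshold."""
    returned = [is_match for score, is_match in scored if score >= threshold]
    total_relevant = sum(1 for _, is_match in scored if is_match)
    hits = sum(returned)  # True counts as 1
    precision = hits / len(returned) if returned else 0.0
    recall = hits / total_relevant if total_relevant else 0.0
    return precision, recall


# Invented data: (model score, human says it is a real match)
scored = [
    (0.9, True), (0.8, True), (0.6, False),
    (0.5, True), (0.3, False), (0.2, True),
]

for threshold in (0.8, 0.5, 0.2):
    precision, recall = precision_recall_at(scored, threshold)
    print(threshold, round(precision, 2), round(recall, 2))
```

With this toy data, the strict threshold (0.8) returns only true matches but misses half of them, while the permissive threshold (0.2) finds all of them at the cost of letting in extraneous examples.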

How do you know which one you need depending on the context?

When you are working on theory, ideally you want both high recall and high precision (it's basically the equivalent of necessary and sufficient conditions to define a set). Having high recall but low precision is okay, as long as your goal is to share your research and data and get feedback on the categorization of your data.

Wednesday, September 7, 2011

Watchmes for AuBlog

I made some quick-n-dirty Watchmes

How to use AuBlog for blogging via typing

How to use AuBlog for blogging via dictations

The machine transcriptions are hilarious, and not very useful. AuBlog uses open source machine transcription software (Sphinx). It needs to be trained on your "iLanguage" (vocabulary) to return quality results...

Feature Algebra in a Nutshell

Feature Algebra is an "algebraic form of representation that allows the use of variables and indices for the purposes of identity checking" (Reiss 2002).

