
Precision vs. Recall defined for linguists

Precision and recall are two terms from computer science that can help linguists decide which tasks are best done by a linguist and which are best done by a script or some other kind of automation; in other words, when the result needs to be perfect, and when good enough is good enough.

Recall means getting back all the examples in your data that display the factor you are studying. You can get high recall by writing a script that returns a lot of results. There is then always a second step: going through the results yourself, as a human, to filter out the extraneous examples. Getting high recall is generally a good first step when you start your research (think of a Google web search: you really want to know about all the authors who have written on your topic).

Precision means that the results you get back are actually what you were looking for; high precision means that all of them are. This is the kind of data you can run stats on and test for statistical significance. Getting high precision is important before you make any claims or generalizations; it's generally the last (and highly valued) step in your research.
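To make the two definitions concrete, here is a minimal Python sketch. The sentence IDs and both result sets are made up for illustration (they do not come from a real corpus), and `precision_recall` is a hypothetical helper name, not part of any library.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for what a search returned vs. what is truly relevant."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant  # results that really display the factor
    # Precision: what fraction of the returned results are hits?
    precision = len(hits) / len(returned) if returned else 0.0
    # Recall: what fraction of the relevant examples did we get back?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: IDs of sentences in a small corpus.
relevant = {1, 2, 3, 4}                  # sentences a linguist confirmed show the factor
permissive = {1, 2, 3, 4, 5, 6, 7, 8}    # what a simple, permissive script returned
strict = {1, 2}                          # what a strict, rule-based search returned

print(precision_recall(permissive, relevant))  # (0.5, 1.0)
print(precision_recall(strict, relevant))      # (1.0, 0.5)
```

In this toy case the permissive script finds every relevant sentence (recall 1.0), but half of what it returns has to be filtered out by hand (precision 0.5). The strict search returns only good examples, but misses half of them.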

How do you get high recall or high precision?

You get high precision by writing rules or by having a human check your data (or multiple humans, if the factor is hard to detect or the classification is tricky). You get high recall by writing a simple script, or by using statistics and setting your threshold to be more permissive.
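The effect of a more permissive threshold can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the scores stand in for whatever number your statistical method assigns to each sentence, and the sentence names, the set of relevant sentences, and `evaluate_at_threshold` are all hypothetical.

```python
def evaluate_at_threshold(scores, relevant, threshold):
    """Keep every item scoring at or above the threshold; return (precision, recall)."""
    returned = {item for item, score in scores.items() if score >= threshold}
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical scores from some statistical method (higher = more likely a real example).
scores = {"s1": 0.9, "s2": 0.8, "s3": 0.6, "s4": 0.4, "s5": 0.3, "s6": 0.1}
relevant = {"s1", "s2", "s4"}  # sentences a human judged to be real examples

for threshold in (0.7, 0.2):
    precision, recall = evaluate_at_threshold(scores, relevant, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

With the strict threshold (0.7) only s1 and s2 come back, so everything returned is a real example but s4 is missed. With the permissive threshold (0.2) all three real examples come back, along with two extraneous ones that a human would then have to filter out.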

How do you know which one you need in a given context?

When you are working on theory, ideally you want both high recall and high precision (it's basically the equivalent of necessary and sufficient conditions for defining a set). Having high recall but low precision is okay, as long as your goal is to share your research and data and get feedback on the categorization of your data.

