Almost any time I have to build a new corpus, the Unicode Fight returns. I lived many Unicode-Fight-free years once Linux became 100% Unicode, but now I'm using Mac OS X.
The default file.encoding on the Mac is MacRoman. I tried all sorts of Google searches to find the proper way (using the System Preferences) to set the default to UTF-8, to no avail. I really hate Google's new (~6 months ago) search algorithm, which tries to guess what we mean to ask and doesn't include all the keywords in the query. It makes it nearly impossible to find anything long-tail-ish.
This is when it started working in Java/Groovy:
created a file /etc/launchd.conf and put this into it:
setenv JAVA_TOOL_OPTIONS -Dfile.encoding=UTF-8
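A quick way to check whether the setting actually took is to ask the JVM what it thinks the default is. The following is only a minimal sketch: the class name EncodingCheck and the helper decode are made up for illustration, and the sample string is arbitrary. It prints the file.encoding property and the default charset, then shows that UTF-8 bytes only decode back correctly when the charset is UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
public class EncodingCheck {

    // Decode bytes with an explicitly named charset instead of the platform default.
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // What the JVM believes the default is; this is what the setting is meant to change.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        // "cafe" with an acute accent, written as an escape so the source file stays ASCII.
        String sample = "caf\u00e9";
        byte[] utf8 = sample.getBytes(StandardCharsets.UTF_8);

        // The same bytes decoded with the right and the wrong charset.
        System.out.println("decoded as UTF-8 matches:      "
                + decode(utf8, StandardCharsets.UTF_8).equals(sample));
        System.out.println("decoded as ISO-8859-1 matches: "
                + decode(utf8, StandardCharsets.ISO_8859_1).equals(sample));
    }
}
```

If the first two lines still report MacRoman, the option has not reached the JVM. Passing the charset explicitly, as decode does, sidesteps the default entirely, which is probably the safer habit when reading corpus files.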
For general purposes:
added this to my ~/.vimrc:
set encoding=utf-8
set fileencoding=utf-8
added this to my /etc/bashrc:
export LC_CTYPE=en_CA.UTF-8
export JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8
Changed my Terminal > Preferences > Encodings to only UTF-8, and…