How to convert between file encoding schemes on Linux

November 9th, 2017

While I was working with a client, he sent me a CSV file. Every time I tried to parse it, Python threw an exception because it could not decode certain bytes. After checking the file's encoding with file -i $FILENAME, I saw that it was not saved in UTF-8. I then researched the best way to convert it from its current encoding to UTF-8.

My process is below:

file -i $FILENAME
iconv -f $FROM_FORMAT -t utf-8 $FILENAME -o $FILE_OUT

The first command reports the charset the file is currently in. The second converts the file from that charset (passed as $FROM_FORMAT) to UTF-8 and writes the result to $FILE_OUT.
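
Since the file was headed for Python anyway, the same conversion can also be sketched in Python itself. This is only a minimal illustration, not the process I used above: the function name convert_to_utf8, the sample file names, and the iso-8859-1 source encoding are all assumptions made for the example. In practice the source encoding would be whatever file -i reports.

```python
from pathlib import Path


def convert_to_utf8(src, dst, from_encoding):
    """Re-encode the text file at src from from_encoding to UTF-8 at dst.

    Hypothetical helper for illustration only.
    """
    # Work on raw bytes so newlines are not translated along the way.
    raw = Path(src).read_bytes()
    # Strict decoding raises UnicodeDecodeError if the bytes are invalid
    # for from_encoding. Note that some single-byte encodings (such as
    # iso-8859-1) accept every byte, so a wrong guess there yields
    # garbled text rather than an error.
    text = raw.decode(from_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # Assumed sample data: a small CSV saved as ISO-8859-1.
        src = Path(tmp) / "client.csv"
        dst = Path(tmp) / "client.utf8.csv"
        src.write_bytes("name,city\nJosé,Málaga\n".encode("iso-8859-1"))

        convert_to_utf8(src, dst, "iso-8859-1")
        print(dst.read_text(encoding="utf-8"))
```

The whole file is read into memory here, which is fine for a typical CSV. For very large files, iconv is probably the better tool.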