How to convert between file encoding schemes on Linux
November 9th, 2017
A client sent me a CSV file, and every time I tried to parse it Python threw an
exception because it could not decode certain bytes. Checking the file with
file -i $FILENAME showed that it was not saved in UTF-8, so I researched the best
way to convert it from its current encoding to UTF-8.
My process is below:
file -i $FILENAME
iconv -f $FROM_FORMAT -t utf-8 $FILENAME -o $FILE_OUT
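The first command reports the file's current charset (the charset= value in its output), which is what goes in $FROM_FORMAT. The second command converts the file from that charset to UTF-8 and writes the result to $FILE_OUT.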
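
The two steps can be combined into one small shell function. This is only a sketch, not something from my original process: the function name to_utf8 is made up for illustration, and it assumes a version of file that supports --mime-encoding (which prints just the charset) and that iconv recognizes the charset name file reports. Detection is a guess based on the file's bytes, so check the result before trusting it.

```shell
#!/bin/sh
# to_utf8 is a hypothetical helper name, not a standard command.
# Usage: to_utf8 INPUT OUTPUT   (OUTPUT must be a different path from INPUT)
to_utf8() {
    in="$1"
    out="$2"

    # -b drops the "filename:" prefix; --mime-encoding prints only the
    # charset, for example "iso-8859-1".
    charset=$(file -b --mime-encoding "$in")

    case "$charset" in
        utf-8|us-ascii)
            # ASCII is a subset of UTF-8, so no conversion is needed.
            cp "$in" "$out"
            ;;
        binary|unknown-8bit)
            # file could not identify a text encoding; refuse to guess.
            echo "cannot determine charset of $in" >&2
            return 1
            ;;
        *)
            # Convert from the detected charset to UTF-8.
            iconv -f "$charset" -t UTF-8 "$in" > "$out"
            ;;
    esac
}
```

For example, to_utf8 client.csv client-utf8.csv would write a converted copy next to the original. Writing to a separate output file matters here: redirecting onto the input path would truncate the file before iconv reads it.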