Character Encoding
Character encoding is a set of rules that maps each character to a numeric value stored as one or more bytes, so that computers can store and process text.
Major Encoding Methods
- UTF-8: Variable-length encoding (1 to 4 bytes per character) that covers every Unicode character. The de facto standard on the web.
- EUC-KR: Korean-specific encoding used in legacy Korean systems.
- UTF-16: Represents most characters in 2 bytes and the rest (such as emoji) in 4 bytes. Used internally by Windows.
- ISO-8859-1: Single-byte encoding for Western European languages, also known as Latin-1.
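To make the size differences concrete, the following minimal Python sketch encodes the same characters with each encoding and reports the byte count. The helper name `byte_length` is hypothetical, and `utf-16-le` is used instead of `utf-16` only so that Python does not prepend a byte order mark to the result.

```python
def byte_length(text: str, encoding: str) -> int:
    """Return how many bytes `text` occupies in the given encoding."""
    return len(text.encode(encoding))


# One Korean syllable, one accented Latin letter, one emoji.
samples = [
    ("한", ["utf-8", "euc-kr", "utf-16-le"]),
    ("é", ["utf-8", "iso-8859-1", "utf-16-le"]),
    ("😀", ["utf-8", "utf-16-le"]),
]

for char, encodings in samples:
    for name in encodings:
        print(f"{char!r} in {name}: {byte_length(char, name)} byte(s)")
```

The same character can take a different number of bytes in each encoding, which is why a reader must know which encoding was used before it can interpret the bytes.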
Why Encoding Problems Occur
When the encoding used to save a file differs from the encoding used to open it, characters appear garbled. This is especially common with non-Latin characters like Korean, Chinese, and Japanese.
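The mismatch can be reproduced in a few lines of Python. This is an illustrative sketch, not part of any tool: it saves Korean text as EUC-KR and then reads the same bytes back with the wrong encoding.

```python
original = "한글"

# "Save" the text as EUC-KR bytes.
saved = original.encode("euc-kr")

# Open the bytes with the wrong encoding: every byte is valid ISO-8859-1,
# so decoding succeeds silently but yields the wrong characters.
garbled = saved.decode("iso-8859-1")
print(garbled)  # ÇÑ±Û

# Opening the same bytes as strict UTF-8 fails outright instead.
try:
    saved.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True
print(utf8_failed)  # True

# The damage is reversible as long as the garbled text was not modified.
recovered = garbled.encode("iso-8859-1").decode("euc-kr")
print(recovered == original)  # True
```

Note the two failure modes: a wrong single-byte encoding produces garbled text without any error, while a wrong multi-byte encoding often raises a decoding error.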
Encoding Handling in DiffMate
DiffMate automatically detects encoding in this order: BOM detection → UTF-8 → EUC-KR → ISO-8859-1 → UTF-16. This ensures most files with non-Latin characters are compared correctly.
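A detection order like this can be sketched in Python as follows. This is not DiffMate's actual implementation: the function name `detect_encoding` and the use of strict trial decoding are assumptions made for illustration. One caveat of the sketch is that ISO-8859-1 accepts every possible byte sequence in strict decoding, so the final UTF-16 step is never reached here; a real tool would presumably need additional heuristics to distinguish those cases.

```python
import codecs

# Byte order marks checked first (hypothetical subset for this sketch).
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Fallback order taken from the text above.
FALLBACKS = ["utf-8", "euc-kr", "iso-8859-1", "utf-16"]


def detect_encoding(data: bytes) -> str:
    """Guess an encoding: BOM first, then the first codec that decodes cleanly."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for name in FALLBACKS:
        try:
            data.decode(name)  # strict mode raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        return name
    return "utf-8"  # unreachable in practice, kept as a safe default


print(detect_encoding("한글".encode("utf-8")))
print(detect_encoding("한글".encode("euc-kr")))
print(detect_encoding("café".encode("iso-8859-1")))
```

Trying UTF-8 before the legacy encodings is a reasonable design choice because valid UTF-8 has a strict byte structure, so text in another encoding rarely passes as UTF-8 by accident.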