DiffMate


Character Encoding

A character encoding is a set of rules that maps characters to sequences of bytes, the numeric form that computers can store and process.
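As a minimal illustration of that mapping, the Python sketch below prints the Unicode code point of a character and the bytes UTF-8 assigns to it. The helper name `describe` is hypothetical and exists only for this example.

```python
def describe(ch: str) -> str:
    """Show a character's Unicode code point and its UTF-8 bytes in hex."""
    code_point = ord(ch)            # the character's number in Unicode
    utf8_bytes = ch.encode("utf-8")  # the bytes that are actually stored
    return f"{ch!r}: U+{code_point:04X} -> {utf8_bytes.hex(' ')}"


for ch in ["A", "한"]:
    print(describe(ch))
```

The Latin letter occupies a single byte, while the Korean syllable needs three bytes in UTF-8.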

Major Encoding Methods

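  • UTF-8: Variable-length encoding (1 to 4 bytes per character) that can represent every Unicode character. The web standard.
  • EUC-KR: Korean-specific encoding used in legacy Korean systems. Korean characters take 2 bytes.
  • UTF-16: Represents most characters in 2 bytes; less common characters such as emoji take 4 bytes. Used internally by Windows.
  • ISO-8859-1: Single-byte encoding for Western European languages, also known as Latin-1.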
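To make the size differences concrete, here is a small Python sketch that counts how many bytes each encoding needs for a few sample characters. The helper `encoded_length` and the sample list are assumptions made for illustration; the codec names are the ones Python's standard library uses.

```python
SAMPLES = ["A", "é", "한", "😀"]
ENCODINGS = ["utf-8", "euc-kr", "utf-16-le", "iso-8859-1"]


def encoded_length(text: str, encoding: str):
    """Return the byte count, or None if the encoding cannot represent text."""
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


for ch in SAMPLES:
    sizes = {enc: encoded_length(ch, enc) for enc in ENCODINGS}
    print(ch, sizes)
```

A `None` entry means the encoding has no representation for that character at all, which is why a Korean file cannot be saved as ISO-8859-1 without loss.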

Why Encoding Problems Occur

When the encoding used to save a file differs from the encoding used to open it, the same bytes are interpreted as different characters and the text appears garbled. This is especially common with non-Latin scripts such as Korean, Chinese, and Japanese, whose characters span multiple bytes.
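The mismatch can be reproduced in a few lines of Python. This sketch saves Korean text as UTF-8, then reads the bytes back with the wrong encoding; the variable names are illustrative only.

```python
original = "한글"

# "Save" the text as UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# "Open" the same bytes as ISO-8859-1: every byte becomes one wrong character.
garbled = utf8_bytes.decode("iso-8859-1")
print(len(original), len(garbled))

# No bytes were lost, so reversing the wrong step recovers the text.
restored = garbled.encode("iso-8859-1").decode("utf-8")

# Some mismatches fail outright instead of producing garbled text.
try:
    original.encode("euc-kr").decode("utf-8")
    strict_failure = False
except UnicodeDecodeError:
    strict_failure = True
```

Reading UTF-8 as ISO-8859-1 silently produces the wrong characters, whereas reading EUC-KR bytes as UTF-8 raises an error, because those bytes are not a valid UTF-8 sequence.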

Encoding Handling in DiffMate

DiffMate automatically detects encoding by trying candidates in this order: BOM detection → UTF-8 → EUC-KR → ISO-8859-1 → UTF-16. The first candidate that matches is used, so most files containing non-Latin characters are compared correctly.
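One way to picture this ordering is a trial-decoding loop. The Python sketch below is an assumption about how such a detector could work, not DiffMate's actual implementation; the function name `detect_encoding` and the returned codec names are hypothetical. Note that Python's ISO-8859-1 codec accepts every byte sequence, so in this sketch the final UTF-16 candidate is only reached through the BOM check.

```python
import codecs

# Candidates tried in the order listed above when no BOM is present.
FALLBACK_ORDER = ["utf-8", "euc-kr", "iso-8859-1", "utf-16"]


def detect_encoding(data: bytes) -> str:
    """Guess an encoding: BOM first, then the first codec that decodes cleanly."""
    # Step 1: a byte order mark identifies the encoding directly.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"

    # Step 2: try each candidate and keep the first that raises no error.
    for encoding in FALLBACK_ORDER:
        try:
            data.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue

    # Unreachable in practice here, since ISO-8859-1 never fails in Python.
    return "iso-8859-1"


print(detect_encoding("한글".encode("utf-8")))
print(detect_encoding("한글".encode("euc-kr")))
```

Trying UTF-8 before EUC-KR is a reasonable choice in a scheme like this because UTF-8 validation is strict: text saved in another encoding rarely forms a valid UTF-8 sequence by accident.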
