Are you happy with your logging solution? Would you help us out by taking a 30-second survey? Click here

utf8.js

A robust JavaScript implementation of a UTF-8 encoder/decoder, as defined by the Encoding Standard.

Subscribe to updates I use utf8.js


Statistics on utf8.js

Number of watchers on Github 294
Number of open issues 17
Average time to close an issue 2 months
Main language JavaScript
Open pull requests 9+
Closed pull requests 7+
Last commit almost 2 years ago
Repo Created over 6 years ago
Repo Last Updated over 1 year ago
Size 61 KB
Homepage https://git.io/ut...
Organization / Authormathiasbynens
Contributors1
Page Updated
Do you use utf8.js? Leave a review!
View open issues (17)
View utf8.js activity
View on github
Fresh, new opensource launches 🚀🚀🚀
Trendy new open source projects in your inbox! View examples

Subscribe to our mailing list

Evaluating utf8.js for your project? Score Explanation
Commits Score (?)
Issues & PR Score (?)

utf8.js Build status Code coverage status Dependency status

utf8.js is a well-tested UTF-8 encoder/decoder written in JavaScript. Unlike many other JavaScript solutions, it is designed to be a proper UTF-8 encoder/decoder: it can encode/decode any scalar Unicode code point values, as per the Encoding Standard. Heres an online demo.

Feel free to fork if you see possible improvements!

Installation

Via npm:

npm install utf8

In a browser:

<script src="utf8.js"></script>

In Node.js:

const utf8 = require('utf8');

API

utf8.encode(string)

Encodes any given JavaScript string (string) as UTF-8, and returns the UTF-8-encoded version of the string. It throws an error if the input string contains a non-scalar value, i.e. a lone surrogate. (If you need to be able to encode non-scalar values as well, use WTF-8 instead.)

// U+00A9 COPYRIGHT SIGN; see http://codepoints.net/U+00A9
utf8.encode('\xA9');
//  '\xC2\xA9'
// U+10001 LINEAR B SYLLABLE B038 E; see http://codepoints.net/U+10001
utf8.encode('\uD800\uDC01');
//  '\xF0\x90\x80\x81'

utf8.decode(byteString)

Decodes any given UTF-8-encoded string (byteString) as UTF-8, and returns the UTF-8-decoded version of the string. It throws an error when malformed UTF-8 is detected. (If you need to be able to decode encoded non-scalar values as well, use WTF-8 instead.)

utf8.decode('\xC2\xA9');
//  '\xA9'

utf8.decode('\xF0\x90\x80\x81');
//  '\uD800\uDC01'
//  U+10001 LINEAR B SYLLABLE B038 E

utf8.version

A string representing the semantic version number.

Support

utf8.js has been tested in at least Chrome 27-39, Firefox 3-34, Safari 4-8, Opera 10-28, IE 6-11, Node.js v0.10.0, Narwhal 0.3.2, RingoJS 0.8-0.11, PhantomJS 1.9.0, and Rhino 1.7RC4.

Unit tests & code coverage

After cloning this repository, run npm install to install the dependencies needed for development and testing. You may want to install Istanbul globally using npm install istanbul -g.

Once thats done, you can run the unit tests in Node using npm test or node tests/tests.js. To run the tests in Rhino, Ringo, Narwhal, PhantomJS, and web browsers as well, use grunt test.

To generate the code coverage report, use grunt cover.

FAQ

Why is the first release named v2.0.0? Havent you heard of semantic versioning?

Long before utf8.js was created, the utf8 module on npm was registered and used by another (slightly buggy) library. @ryanmcgrath was kind enough to give me access to the utf8 package on npm when I told him about utf8.js. Since there has already been a v1.0.0 release of the old library, and to avoid breaking backwards compatibility with projects that rely on the utf8 npm package, I decided the tag the first release of utf8.js as v2.0.0 and take it from there.

Author

twitter/mathias
Mathias Bynens

License

utf8.js is available under the MIT license.

utf8.js open issues Ask a question     (View All Issues)
  • over 3 years bower.json is not up-to-date
  • almost 5 years Codepoint arrays and binary strings
  • over 5 years Performance?
  • over 6 years Add error-tolerant mode
  • over 6 years Add more tests
utf8.js open pull requests (View All Pulls)
  • utf8 encoder speedup for large amounts of data processing
  • Fix loading as browserified in AMD style
  • Created forceFinish Param as to bypass decoding problems.
  • Fixed decoding of four-byte code points
  • Add error-tolerant mode
  • decoder option to ignore truncated symbol on the end of input
  • enable es2015 modules
  • remove redundant version information
  • Add support for encoding to and decoding from byte arrays
utf8.js questions on Stackoverflow (View All Questions)
  • character get garbled with utf8 JS ajax and spring - only in chrome
utf8.js list of languages used
Other projects in JavaScript