
he

A robust HTML entity encoder/decoder written in JavaScript.


Statistics on he

Number of watchers on Github 1423
Number of open issues 8
Average time to close an issue 15 days
Main language JavaScript
Open pull requests 3+
Closed pull requests 8+
Last commit about 3 years ago
Repo Created almost 7 years ago
Repo Last Updated about 2 years ago
Size 625 KB
Homepage https://mths.be/he
Organization / Author mathiasbynens
Contributors 2

he

he (for HTML entities) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would, has an extensive test suite, and contrary to many other JavaScript solutions he handles astral Unicode symbols just fine. An online demo is available.
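To illustrate why astral symbols trip up many encoders, here is a minimal sketch in plain JavaScript. It is a simplified illustration only, not he's actual implementation, and both function names are hypothetical. A loop over UTF-16 code units splits a symbol such as U+1D306 into two surrogate halves, while a code-point-aware loop keeps it whole.

```javascript
// Simplified illustration only; this is not he's actual implementation.

// Naive approach: walks UTF-16 code units, so an astral symbol such as
// U+1D306 is split into its two surrogate halves.
function encodeByCodeUnit(text) {
  let output = '';
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    output += unit > 0x7F
      ? '&#x' + unit.toString(16).toUpperCase() + ';'
      : text[i];
  }
  return output;
}

// Code-point-aware approach: for...of iterates over whole code points.
function encodeByCodePoint(text) {
  let output = '';
  for (const symbol of text) {
    const codePoint = symbol.codePointAt(0);
    output += codePoint > 0x7F
      ? '&#x' + codePoint.toString(16).toUpperCase() + ';'
      : symbol;
  }
  return output;
}

console.log(encodeByCodeUnit('a \u{1D306} b'));  // 'a &#xD834;&#xDF06; b'
console.log(encodeByCodePoint('a \u{1D306} b')); // 'a &#x1D306; b'
```

The first result references two lone surrogates, which is not a valid way to reference the symbol in HTML; the second matches the form shown in he's own examples.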

Installation

Via npm:

npm install he

Via Bower:

bower install he

Via Component:

component install mathiasbynens/he

In a browser:

<script src="he.js"></script>

In Node.js, io.js, Narwhal, and RingoJS:

var he = require('he');

In Rhino:

load('he.js');

Using an AMD loader like RequireJS:

require(
  {
    'paths': {
      'he': 'path/to/he'
    }
  },
  ['he'],
  function(he) {
    console.log(he);
  }
);

API

he.version

A string representing the semantic version number.

he.encode(text, options)

This function takes a string of text and encodes (by default) any symbols that aren't printable ASCII symbols, as well as &, <, >, ", ', and `, replacing them with character references.

he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

As long as the input string contains allowed code points only, the return value of this function is always valid HTML. Any (invalid) code points that cannot be represented using a character reference in the input are not encoded:

he.encode('foo \0 bar');
// → 'foo \0 bar'

However, enabling the strict option causes invalid code points to throw an exception. With strict enabled, he.encode either throws (if the input contains invalid code points) or returns a string of valid HTML.

The options object is optional. It recognizes the following properties:

useNamedReferences

The default value for the useNamedReferences option is false. This means that encode() will not use any named character references (e.g. &copy;) in the output; hexadecimal escapes (e.g. &#xA9;) will be used instead. Set it to true to enable the use of named references.

Note that if compatibility with older browsers is a concern, this option should remain disabled.

// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly disallow named references:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': false
});
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly allow named references:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': true
});
// → 'foo &copy; bar &ne; baz &#x1D306; qux'

decimal

The default value for the decimal option is false. If the option is enabled, encode will generally use decimal escapes (e.g. &#169;) rather than hexadecimal escapes (e.g. &#xA9;). Apart from this substitution, the basic behavior remains the same when combined with other options. For example, if both useNamedReferences and decimal are enabled, named references (e.g. &copy;) take precedence over decimal escapes, and symbols without a named reference are encoded using decimal escapes.

// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly disable decimal escapes:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'decimal': false
});
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly enable decimal escapes:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'decimal': true
});
// → 'foo &#169; bar &#8800; baz &#119558; qux'

// Passing an `options` object to `encode`, to explicitly allow named references and decimal escapes:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'useNamedReferences': true,
  'decimal': true
});
// → 'foo &copy; bar &ne; baz &#119558; qux'
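The hexadecimal and decimal escapes above are two spellings of the same code point. The following sketch shows the correspondence in plain JavaScript; `toNumericReference` is a hypothetical helper written for illustration and is not part of he's API.

```javascript
// Hypothetical helper for illustration; not part of he's API.
// Builds a numeric character reference for the first code point of `symbol`.
function toNumericReference(symbol, useDecimal) {
  const codePoint = symbol.codePointAt(0);
  return useDecimal
    ? '&#' + codePoint + ';'
    : '&#x' + codePoint.toString(16).toUpperCase() + ';';
}

console.log(toNumericReference('\u00A9', false));    // '&#xA9;'
console.log(toNumericReference('\u00A9', true));     // '&#169;'
console.log(toNumericReference('\u2260', true));     // '&#8800;'
console.log(toNumericReference('\u{1D306}', true));  // '&#119558;'
```

Both forms decode to the same symbol, so the choice mostly affects readability and output size.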

encodeEverything

The default value for the encodeEverything option is false. This means that encode() will not use any character references for printable ASCII symbols that don't need escaping. Set it to true to encode every symbol in the input string. When set to true, this option takes precedence over allowUnsafeSymbols (i.e. setting the latter to true in such a case has no effect).

// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly encode all symbols:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'encodeEverything': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&#xA9;&#x20;&#x62;&#x61;&#x72;&#x20;&#x2260;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'

// This setting can be combined with the `useNamedReferences` option:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'encodeEverything': true,
  'useNamedReferences': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&copy;&#x20;&#x62;&#x61;&#x72;&#x20;&ne;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'

strict

The default value for the strict option is false. This means that encode() will encode any HTML text content you feed it, even if it contains any symbols that cause parse errors. To throw an error when such invalid HTML is encountered, set the strict option to true. This option makes it possible to use he as part of HTML parsers and HTML validators.

// Using the global default setting (defaults to `false`, i.e. error-tolerant mode):
he.encode('\x01');
// → '&#x1;'

// Passing an `options` object to `encode`, to explicitly enable error-tolerant mode:
he.encode('\x01', {
  'strict': false
});
// → '&#x1;'

// Passing an `options` object to `encode`, to explicitly enable strict mode:
he.encode('\x01', {
  'strict': true
});
// → Parse error
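Because strict mode throws, callers usually wrap the call in try/catch. The sketch below shows one possible pattern. `encodeStrictOrNull` is a hypothetical helper, not part of he, and it takes the he module as a parameter so that the snippet stays self-contained. Since the exact error type is not specified here, it catches any exception.

```javascript
// Hypothetical wrapper; not part of he. `he` is the module object,
// e.g. the result of `require('he')` after `npm install he`.
function encodeStrictOrNull(he, text) {
  try {
    // In strict mode, he.encode throws on invalid code points.
    return he.encode(text, { strict: true });
  } catch (error) {
    // The error type is not assumed here; any exception maps to null.
    return null;
  }
}

// Usage (assumes he is installed):
//   const he = require('he');
//   encodeStrictOrNull(he, '\x01'); // null, since strict mode throws for this input
```

Returning null is just one design choice; rethrowing with extra context would suit a validator better.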

allowUnsafeSymbols

The default value for the allowUnsafeSymbols option is false. This means that characters that are unsafe for use in HTML content (&, <, >, ", ', and `) will be encoded. When set to true, only non-ASCII characters will be encoded. If the encodeEverything option is set to true, this option will be ignored.

he.encode('foo © and & ampersand', {
  'allowUnsafeSymbols': true
});
// → 'foo &#xA9; and & ampersand'

Overriding default encode options globally

The global default setting can be overridden by modifying the he.encode.options object. This saves you from passing in an options object for every call to encode if you want to use the non-default setting.

// Read the global default setting:
he.encode.options.useNamedReferences;
// → `false` by default

// Override the global default setting:
he.encode.options.useNamedReferences = true;

// Using the global default setting, which is now `true`:
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &copy; bar &ne; baz &#x1D306; qux'
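Global overrides affect every later call to encode, including calls made by other modules that share the same he instance. If that is a concern, one option is to scope the override and restore the previous values afterwards. The following is a sketch under that assumption: `withEncodeOptions` is a hypothetical helper, not part of he, and it relies only on the he.encode.options object described above.

```javascript
// Hypothetical helper; not part of he. Temporarily overrides entries in
// `he.encode.options`, runs `callback`, then restores the previous values.
function withEncodeOptions(he, overrides, callback) {
  const options = he.encode.options;
  const previous = Object.assign({}, options);
  Object.assign(options, overrides);
  try {
    return callback();
  } finally {
    for (const key of Object.keys(overrides)) {
      if (key in previous) {
        options[key] = previous[key];
      } else {
        delete options[key];
      }
    }
  }
}

// Usage (assumes he is installed):
//   const he = require('he');
//   withEncodeOptions(he, { useNamedReferences: true }, function () {
//     return he.encode('foo © bar');
//   });
```

The finally block restores the defaults even if the callback throws.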

he.decode(html, options)

This function takes a string of HTML and decodes any named and numerical character references in it using the algorithm described in section 12.2.4.69 of the HTML spec.

he.decode('foo &copy; bar &ne; baz &#x1D306; qux');
// → 'foo © bar ≠ baz 𝌆 qux'

The options object is optional. It recognizes the following properties:

isAttributeValue

The default value for the isAttributeValue option is false. This means that decode() will decode the string as if it were used in a text context in an HTML document. HTML has different rules for parsing character references in attribute values; set this option to true to treat the input string as if it were used as an attribute value.

// Using the global default setting (defaults to `false`, i.e. HTML text context):
he.decode('foo&ampbar');
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly assume an HTML text context:
he.decode('foo&ampbar', {
  'isAttributeValue': false
});
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly assume an HTML attribute value context:
he.decode('foo&ampbar', {
  'isAttributeValue': true
});
// → 'foo&ampbar'
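The difference comes from an HTML parsing rule: in an attribute value, a legacy named reference that lacks its closing semicolon is left alone when the next character is = or an ASCII alphanumeric. The sketch below models that rule for &amp only. It is a deliberately narrow illustration, not how he is implemented, and `decodeAmpSketch` is a hypothetical name.

```javascript
// Simplified illustration of the attribute-value rule for `&amp` only;
// this is not he's implementation and handles no other references.
function decodeAmpSketch(input, isAttributeValue) {
  return input.replace(/&amp(;?)([=A-Za-z0-9]?)/g, function (match, semicolon, next) {
    // Unterminated reference followed by `=` or an alphanumeric character:
    // in an attribute value it is kept verbatim.
    if (!semicolon && isAttributeValue && next) {
      return match;
    }
    return '&' + next;
  });
}

console.log(decodeAmpSketch('foo&ampbar', false));  // 'foo&bar'
console.log(decodeAmpSketch('foo&ampbar', true));   // 'foo&ampbar'
console.log(decodeAmpSketch('foo&amp;bar', true));  // 'foo&bar'
```

This rule is what keeps query strings such as ?a=1&amp=2 intact inside href attributes.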

strict

The default value for the strict option is false. This means that decode() will decode any HTML text content you feed it, even if it contains any entities that cause parse errors. To throw an error when such invalid HTML is encountered, set the strict option to true. This option makes it possible to use he as part of HTML parsers and HTML validators.

// Using the global default setting (defaults to `false`, i.e. error-tolerant mode):
he.decode('foo&ampbar');
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly enable error-tolerant mode:
he.decode('foo&ampbar', {
  'strict': false
});
// → 'foo&bar'

// Passing an `options` object to `decode`, to explicitly enable strict mode:
he.decode('foo&ampbar', {
  'strict': true
});
// → Parse error
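As a rough sketch of the validator use case, strict decoding can be turned into a pass/fail check. `checkCharacterReferences` is a hypothetical helper, not part of he. It takes the he module as a parameter and assumes only that strict mode throws an error object with a message, as the example above suggests.

```javascript
// Hypothetical validator helper; not part of he. `he` is the module object,
// e.g. the result of `require('he')`.
function checkCharacterReferences(he, html) {
  try {
    he.decode(html, { strict: true });
    return { valid: true, message: null };
  } catch (error) {
    // Surface whatever message the thrown error carries.
    return { valid: false, message: String(error && error.message) };
  }
}

// Usage (assumes he is installed):
//   const he = require('he');
//   checkCharacterReferences(he, 'foo&ampbar').valid; // false, since strict mode throws here
```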

Overriding default decode options globally

The global default settings for the decode function can be overridden by modifying the he.decode.options object. This saves you from passing in an options object for every call to decode if you want to use a non-default setting.

// Read the global default setting:
he.decode.options.isAttributeValue;
// → `false` by default

// Override the global default setting:
he.decode.options.isAttributeValue = true;

// Using the global default setting, which is now `true`:
he.decode('foo&ampbar');
// → 'foo&ampbar'

he.escape(text)

This function takes a string of text and escapes it for use in text contexts in XML or HTML documents. Only the following characters are escaped: &, <, >, ", ', and `.

he.escape('<img src=\'x\' onerror="prompt(1)">');
// → '&lt;img src=&#x27;x&#x27; onerror=&quot;prompt(1)&quot;&gt;'
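Conceptually, escaping is a fixed six-entry lookup. The following plain-JavaScript sketch reproduces the output above. It is an illustration rather than he's source code, `escapeSketch` is a hypothetical name, and the replacement chosen for the backtick (&#x60;) is an assumption, since the example above does not show it.

```javascript
// Illustrative lookup table; the backtick mapping is an assumption.
const ESCAPE_MAP = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  '\'': '&#x27;',
  '`': '&#x60;'
};

// Hypothetical stand-in for illustration; not he's implementation.
function escapeSketch(text) {
  return text.replace(/[&<>"'`]/g, function (symbol) {
    return ESCAPE_MAP[symbol];
  });
}

console.log(escapeSketch('<img src=\'x\' onerror="prompt(1)">'));
// '&lt;img src=&#x27;x&#x27; onerror=&quot;prompt(1)&quot;&gt;'
```

Unlike encode, this leaves non-ASCII symbols untouched, so the document must be served with a character encoding that can represent them.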

he.unescape(html, options)

he.unescape is an alias for he.decode. It takes a string of HTML and decodes any named and numerical character references in it.

Using the he binary

To use the he binary in your shell, simply install he globally using npm:

npm install -g he

After that you will be able to encode/decode HTML entities from the command line:

$ he --encode 'föo ♥ bår 𝌆 baz'
f&#xF6;o &#x2665; b&#xE5;r &#x1D306; baz

$ he --encode --use-named-refs 'föo ♥ bår 𝌆 baz'
f&ouml;o &hearts; b&aring;r &#x1D306; baz

$ he --decode 'f&ouml;o &hearts; b&aring;r &#x1D306; baz'
föo ♥ bår 𝌆 baz

Read a local text file, encode it for use in an HTML text context, and save the result to a new file:

$ he --encode < foo.txt > foo-escaped.html

Or do the same with an online text file:

$ curl -sL "http://git.io/HnfEaw" | he --encode > escaped.html

Or, the opposite: read a local file containing a snippet of HTML in a text context, decode it back to plain text, and save the result to a new file:

$ he --decode < foo-escaped.html > foo.txt

Or do the same with an online HTML snippet:

$ curl -sL "http://git.io/HnfEaw" | he --decode > decoded.txt

See he --help for the full list of options.

Support

he has been tested in at least:

  • Chrome 27-50
  • Firefox 3-45
  • Safari 4-9
  • Opera 10-12, 15-37
  • IE 6-11
  • Edge
  • Narwhal 0.3.2
  • Node.js v0.10, v0.12, v4, v5
  • PhantomJS 1.9.0
  • Rhino 1.7RC4
  • RingoJS 0.8-0.11

Unit tests & code coverage

After cloning this repository, run npm install to install the dependencies needed for he development and testing. You may want to install Istanbul globally using npm install istanbul -g.

Once that's done, you can run the unit tests in Node using npm test or node tests/tests.js. To run the tests in Rhino, Ringo, Narwhal, and web browsers as well, use grunt test.

To generate the code coverage report, use grunt cover.

Acknowledgements

Thanks to Simon Pieters (@zcorpan) for the many suggestions.

Author

twitter/mathias
Mathias Bynens

License

he is available under the MIT license.
