nom

Rust parser combinator framework



Statistics on nom

Number of watchers on Github: 1907
Number of open issues: 124
Average time to close an issue: 24 days
Main language: Rust
Average time to merge a PR: 11 days
Open pull requests: 105+
Closed pull requests: 41+
Last commit: 8 months ago
Repo created: almost 4 years ago
Repo last updated: 8 months ago
Size: 7.49 MB
Organization / Author: geal
Contributors: 35

nom, eating data byte by byte


nom is a parser combinator library written in Rust. Its goal is to provide tools to build safe parsers without compromising speed or memory consumption. To that end, it makes extensive use of Rust's strong typing and memory safety to produce fast and correct parsers, and provides macros and traits to abstract most of the error-prone plumbing.

Example

Hexadecimal color parser:

#[macro_use]
extern crate nom;

#[derive(Debug,PartialEq)]
pub struct Color {
  pub red:   u8,
  pub green: u8,
  pub blue:  u8,
}

fn from_hex(input: &str) -> Result<u8, std::num::ParseIntError> {
  u8::from_str_radix(input, 16)
}

fn is_hex_digit(c: char) -> bool {
  let c = c as u8;
  (c >= 0x30 && c <= 0x39) || (c >= 0x41 && c <= 0x46) || (c >= 0x61 && c <= 0x66)
}

named!(hex_primary<&str, u8>,
  map_res!(take_while_m_n!(2, 2, is_hex_digit), from_hex)
);

named!(hex_color<&str, Color>,
  do_parse!(
           tag!("#")   >>
    red:   hex_primary >>
    green: hex_primary >>
    blue:  hex_primary >>
    (Color { red, green, blue })
  )
);

#[test]
fn parse_color() {
  assert_eq!(hex_color("#2F14DF"), Ok(("", Color {
    red: 47,
    green: 20,
    blue: 223,
  })));
}

Documentation

If you need any help developing your parsers, please ping geal on IRC (mozilla, freenode, geeknode, oftc), go to #nom on Mozilla IRC, or on the Gitter chat room.

Why use nom

If you want to write:

Binary format parsers

nom was designed to properly parse binary formats from the beginning. Compared to the usual handwritten C parsers, nom parsers are just as fast, free from buffer overflow vulnerabilities, and handle common patterns for you:

  • TLV
  • bit level parsing
  • hexadecimal viewer in the debugging macros for easy data analysis
  • streaming parsers for network formats and huge files
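
To make the TLV (type-length-value) pattern concrete, here is a minimal std-only sketch. It does not use nom's API: the `Tlv` struct, the `parse_tlv` function, and the layout (one byte of type, one byte of length) are assumptions chosen for illustration. Like a nom parser, it returns the remaining input next to the parsed value, and the value is a borrowed slice of the input rather than a copy.

```rust
/// One type-length-value record, borrowed from the input buffer.
/// (Hypothetical type for this sketch, not part of nom.)
#[derive(Debug, PartialEq)]
pub struct Tlv<'a> {
    pub tag: u8,
    pub value: &'a [u8],
}

/// Parses one record laid out as: 1 byte type, 1 byte length, `length` bytes of value.
/// Returns the unconsumed input and the record, or `None` if the buffer is too short.
pub fn parse_tlv(input: &[u8]) -> Option<(&[u8], Tlv<'_>)> {
    let (&tag, rest) = input.split_first()?;
    let (&len, rest) = rest.split_first()?;
    let len = len as usize;
    if rest.len() < len {
        // The declared length is checked against the buffer, so we never read past its end.
        return None;
    }
    let (value, rest) = rest.split_at(len);
    Some((rest, Tlv { tag, value }))
}
```

Calling `parse_tlv` repeatedly on the returned remainder walks a sequence of records without copying any payload.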

Example projects:

Text format parsers

While nom was made for binary formats at first, it soon grew to work just as well with text formats. From line-based formats like CSV to more complex, nested formats such as JSON, nom can manage it, and provides you with useful tools:

  • fast case insensitive comparison
  • recognizers for escaped strings
  • regular expressions can be embedded in nom parsers to represent complex character patterns succinctly
  • special care has been given to managing non ASCII characters properly
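
As an illustration of case-insensitive matching on text input, the following std-only sketch recognizes a keyword at the start of a `&str` while ignoring ASCII case. The function name `match_ascii_no_case` is hypothetical and this is not how nom implements it; in particular it only folds ASCII letters, whereas full Unicode case folding is more involved.

```rust
/// Matches `keyword` at the start of `input`, ignoring ASCII case.
/// On success returns (remaining input, matched slice); both borrow from `input`.
/// (Hypothetical helper for this sketch, not a nom function.)
pub fn match_ascii_no_case<'a>(input: &'a str, keyword: &str) -> Option<(&'a str, &'a str)> {
    // `get` returns None when the input is too short or when the cut would
    // fall inside a multi-byte UTF-8 character, so this never panics.
    let head = input.get(..keyword.len())?;
    if head.eq_ignore_ascii_case(keyword) {
        Some((&input[keyword.len()..], head))
    } else {
        None
    }
}
```

The matched slice keeps the original casing from the input, which is usually what you want when reporting or re-emitting the data.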

Example projects:

Programming language parsers

While programming language parsers are usually written manually for more flexibility and performance, nom can be (and has been successfully) used as a prototyping parser for a language.

nom will get you started quickly with powerful custom error types, which you can combine with nom_locate to pinpoint the exact line and column of an error. There is no need for separate tokenizing, lexing and parsing phases: nom can automatically handle whitespace parsing and construct an AST in place.
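
To show how a line and column can be recovered from a parser's position, here is a small std-only sketch. It is not nom_locate's API; `line_col` is a hypothetical helper that assumes the parser hands back the unparsed remainder as a sub-slice of the original input, so the error position is simply the offset of that remainder.

```rust
/// Computes the 1-based (line, column) at which `rest` starts inside `full`.
/// Returns `None` if `rest` does not start inside `full`.
/// (Hypothetical helper for this sketch, not part of nom or nom_locate.)
pub fn line_col(full: &str, rest: &str) -> Option<(usize, usize)> {
    // Offset of `rest` within `full`, derived from the slice start addresses.
    let offset = (rest.as_ptr() as usize).checked_sub(full.as_ptr() as usize)?;
    // `get` rejects offsets past the end or inside a multi-byte character.
    let consumed = full.get(..offset)?;
    let line = consumed.matches('\n').count() + 1;
    let column = match consumed.rfind('\n') {
        // Count characters (not bytes) since the last newline.
        Some(i) => consumed[i + 1..].chars().count() + 1,
        None => consumed.chars().count() + 1,
    };
    Some((line, column))
}
```

Columns are counted in characters here; a real tool has to decide between bytes, characters, and grapheme clusters.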

Example projects:

Streaming formats

While a lot of formats (and the code handling them) assume that they can fit the complete data in memory, there are formats for which we only get a part of the data at once, like network formats, or huge files. nom has been designed for a correct behaviour with partial data: if there is not enough data to decide, nom will tell you it needs more instead of silently returning a wrong result. Whether your data comes entirely or in chunks, the result should be the same.

It allows you to build powerful, deterministic state machines for your protocols.
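
The idea of asking for more data instead of guessing can be sketched without nom. The `Step` enum and the `frame` function below are hypothetical and std-only; the frame layout (a 2-byte big-endian length followed by the payload) is an assumption made for the example. nom's own result and error types differ, but the principle is the same: a short buffer yields "incomplete", never a wrong answer.

```rust
/// Outcome of running the parser on a possibly partial buffer.
/// (Hypothetical type for this sketch, not nom's result type.)
#[derive(Debug, PartialEq)]
pub enum Step<'a> {
    /// One frame was parsed: (remaining input, payload).
    Done(&'a [u8], &'a [u8]),
    /// Not enough data yet; carries the number of bytes still missing.
    Incomplete(usize),
}

/// Parses a frame made of a 2-byte big-endian length followed by that many payload bytes.
pub fn frame(input: &[u8]) -> Step<'_> {
    if input.len() < 2 {
        return Step::Incomplete(2 - input.len());
    }
    let len = u16::from_be_bytes([input[0], input[1]]) as usize;
    let body = &input[2..];
    if body.len() < len {
        return Step::Incomplete(len - body.len());
    }
    let (payload, rest) = body.split_at(len);
    Step::Done(rest, payload)
}
```

A caller would keep appending network chunks to its buffer and retry until `frame` returns `Step::Done`; the parsed payload is the same whether the bytes arrived at once or in pieces.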

Example projects:

Parser combinators

Parser combinators are an approach to parsers that is very different from software like lex and yacc. Instead of writing the grammar in a separate file and generating the corresponding code, you use very small functions with very specific purposes, like take 5 bytes, or recognize the word 'HTTP', and assemble them in meaningful patterns like recognize 'HTTP', then a space, then a version. The resulting code is small, and looks like the grammar you would have written with other parser approaches.

This has a few advantages:

  • the parsers are small and easy to write
  • the parser components are easy to reuse (if they're general enough, please add them to nom!)
  • the parser components are easy to test separately (unit tests and property-based tests)
  • the parser combination code looks close to the grammar you would have written
  • you can build partial parsers, specific to the data you need at the moment, and ignore the rest
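
The combinator idea can be sketched in plain Rust with no library at all. The functions below (`tag`, `take_while1`, `is_version_char`, `http_version`) and the `PResult` alias are hand-written for this illustration and only mimic the style; they are not nom's implementations. Each small parser takes the input and returns the remaining input plus what it recognized, and the combined parser just chains them: 'HTTP', then a space, then a version.

```rust
/// A parser result: the remaining input plus the recognized slice, or None on failure.
type PResult<'a> = Option<(&'a str, &'a str)>;

/// Small building block: recognize the literal `expected` at the start of `input`.
fn tag<'a>(input: &'a str, expected: &str) -> PResult<'a> {
    let rest = input.strip_prefix(expected)?;
    Some((rest, &input[..expected.len()]))
}

/// Small building block: take one or more leading characters accepted by `pred`.
fn take_while1(input: &str, pred: fn(char) -> bool) -> PResult<'_> {
    let end = input.find(|c: char| !pred(c)).unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((&input[end..], &input[..end]))
    }
}

/// Characters allowed in the version for this sketch: digits and dots.
fn is_version_char(c: char) -> bool {
    c.is_ascii_digit() || c == '.'
}

/// Combination: recognize "HTTP", then a space, then a version; returns the version.
fn http_version(input: &str) -> PResult<'_> {
    let (rest, _) = tag(input, "HTTP")?;
    let (rest, _) = tag(rest, " ")?;
    take_while1(rest, is_version_char)
}
```

Each building block can be unit-tested on its own, and the body of `http_version` reads almost like the grammar rule it implements.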

Technical features

nom parsers are for:

  • byte-oriented: the basic type is &[u8] and parsers will work as much as possible on byte array slices (but are not limited to them)
  • bit-oriented: nom can address a byte slice as a bit stream
  • string-oriented: the same kind of combinators can apply on UTF-8 strings as well
  • zero-copy: if a parser returns a subset of its input data, it will return a slice of that input, without copying
  • streaming: nom can work on partial data and detect when it needs more data to produce a correct result
  • macro-based syntax: easier parser building through macro usage
  • descriptive errors: the parsers can aggregate a list of error codes with pointers to the offending input slice. Those error lists can be pattern matched to provide useful messages.
  • custom error types: you can provide a specific type to improve errors returned by parsers
  • safe parsing: nom leverages Rust's safe memory handling and powerful types, and parsers are routinely fuzzed and tested with real-world data. So far, the only flaws found by fuzzing were in code written outside of nom
  • speed: benchmarks have shown that nom parsers often outperform many parser combinator libraries such as Parsec and attoparsec, some regular expression engines and even handwritten C parsers
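
To illustrate what addressing a byte slice as a bit stream means, here is a std-only sketch. The `read_bits` function is hypothetical and is not nom's bit-level API; it assumes bits are numbered from the most significant bit of the first byte and that at most 32 bits are read at once.

```rust
/// Reads `count` bits (at most 32), most significant bit first, starting at
/// bit position `bit_offset` in `input`. Returns None if there are not enough bits.
/// (Hypothetical helper for this sketch, not a nom function.)
pub fn read_bits(input: &[u8], bit_offset: usize, count: usize) -> Option<u32> {
    if count > 32 || bit_offset + count > input.len() * 8 {
        return None;
    }
    let mut value = 0u32;
    for i in bit_offset..bit_offset + count {
        let byte = input[i / 8];
        // Bit 0 of the stream is the most significant bit of the first byte.
        let bit = (byte >> (7 - i % 8)) & 1;
        value = (value << 1) | u32::from(bit);
    }
    Some(value)
}
```

With the input `[0b1010_1100, 0b0101_0000]`, reading 4 bits at offset 0 gives `0b1010`, and reading 8 bits at offset 4 crosses the byte boundary and gives `0b1100_0101`.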

Some benchmarks are available on Github.

Installation

nom is available on crates.io and can be included in your Cargo-enabled project like this:

[dependencies]
nom = "^4.0"

Then include it in your code like this:

#[macro_use]
extern crate nom;

NOTE: if you have existing code using a version of nom below 4.0, please take a look at the upgrade documentation to handle the breaking changes.

There are a few compilation features:

  • std: (activated by default) if disabled, nom can work in no_std builds
  • regexp: enables regular expression parsers with the regex crate
  • regexp_macros: enables regular expression parsers with the regex and regex_macros crates. Regular expressions can be defined at compile time, but this requires a nightly version of rustc
  • verbose-errors: accumulates error codes and input positions as you backtrack through the parser tree. This gives you precise information about which part of the parser was affected by which part of the input

You can activate those features like this:

[dependencies.nom]
version = "^4.0"
features = ["regexp"]

Parsers written with nom

Here is a list of known projects using nom:

Want to create a new parser using nom? A list of not yet implemented formats is available here.

Want to add your parser here? Create a pull request for it!
