Lexing experiments in PHP

Author: nikic
Latest release: v0.1


This project is a follow-up to my post on fast lexing in PHP. It contains a few lexer implementations (both stateless and stateful) and related performance tests.


Lexers are created from a lexer definition using a factory class.

For example, to create a preg_replace-based stateless CSV lexer, you can use the following code:

require 'path/to/lib/Phlexy/bootstrap.php';

$factory = new Phlexy\LexerFactory\Stateless\UsingPregReplace(
    new Phlexy\LexerDataGenerator
);

$lexer = $factory->createLexer(array(
    '[^",\r\n]+'                     => 0, // 0, 1, 2, 3 are the tokens
    '"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"' => 1, // they should really be constants
    ','                              => 2,
    '\r?\n'                          => 3,
));

$tokens = $lexer->lex("hallo world,foo bar,more foo,more bar,\"rare , escape\",some more,stuff\n...");
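To see what each of the four CSV rules matches, here is a standalone check that uses plain preg_match with the same patterns, with no Phlexy classes involved. The helper classify() is hypothetical and exists only for this illustration. Because the quoted-string pattern sits in a single-quoted PHP string, each `\\\\` reaches PCRE as `\\`, which matches one literal backslash.

```php
<?php
// Standalone check of the four CSV rules from the definition above, using
// plain preg_match. classify() is a hypothetical helper, not part of Phlexy.
$rules = [
    '[^",\r\n]+'                     => 0, // unquoted field text
    '"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"' => 1, // quoted field; \" escapes allowed inside
    ','                              => 2, // field separator
    '\r?\n'                          => 3, // line break
];

// Returns the token of the first rule matching the *entire* input, or null.
function classify(array $rules, string $input): ?int
{
    foreach ($rules as $regex => $token) {
        // \A and \z anchor the rule to the whole string; ~ is an unused delimiter
        if (preg_match('~\A(?:' . $regex . ')\z~', $input) === 1) {
            return $token;
        }
    }
    return null;
}

var_dump(classify($rules, 'hallo world'));     // int(0)
var_dump(classify($rules, '"rare , escape"')); // int(1): the comma stays inside the field
var_dump(classify($rules, '"say \"hi\""'));    // int(1): escaped quotes don't end the field
var_dump(classify($rules, "\r\n"));            // int(3)
```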

Similarly a stateful lexer:

require 'path/to/lib/Phlexy/bootstrap.php';

$factory = new Phlexy\LexerFactory\Stateful\UsingCompiledRegex(
    new Phlexy\LexerDataGenerator
);

// The "i" is an additional modifier (all createLexer methods accept it)
$lexer = $factory->createLexer($lexerDefinition, 'i');

For an example of a stateful lexer definition, you can look at the definition for lexing PHP source code.
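Phlexy's actual stateful definition format is not reproduced here. The following self-contained sketch shows only the general idea and assumes a hypothetical format: each state has its own ordered rule list, and a matching rule can push a new state or pop back to the previous one. The function lexStateful() and all state and token names are invented for illustration.

```php
<?php
// Hypothetical stateful lexer sketch; NOT Phlexy's definition format or API.
// Each state has an ordered list of [regex, token, action] rules, where action
// is null, ['push', STATE] or ['pop'].
function lexStateful(array $definition, string $initialState, string $input): array
{
    $stateStack = [$initialState];
    $tokens = [];
    $offset = 0;
    $length = strlen($input);

    while ($offset < $length) {
        $state = end($stateStack);
        foreach ($definition[$state] as [$regex, $token, $action]) {
            // \G anchors the rule at the current offset; empty matches are ignored
            if (preg_match('~\G(?:' . $regex . ')~', $input, $m, 0, $offset) === 1 && $m[0] !== '') {
                $tokens[] = [$token, $m[0]];
                $offset += strlen($m[0]);
                if ($action !== null && $action[0] === 'push') {
                    $stateStack[] = $action[1];
                } elseif ($action !== null && $action[0] === 'pop') {
                    array_pop($stateStack);
                }
                continue 2; // next token, possibly in a new state
            }
        }
        throw new RuntimeException("No rule matches at offset $offset in state $state");
    }
    return $tokens;
}

// Toy definition: plain text outside "<?php ", a few PHP-like tokens inside.
$definition = [
    'INITIAL' => [
        ['<\?php\s', 'OPEN_TAG', ['push', 'IN_SCRIPTING']],
        ['(?:[^<]|<(?!\?php\s))+', 'INLINE_HTML', null],
    ],
    'IN_SCRIPTING' => [
        ['\?>', 'CLOSE_TAG', ['pop']],
        ['\s+', 'WHITESPACE', null],
        ['\$[a-zA-Z_]\w*', 'VARIABLE', null],
        ['[a-zA-Z_]\w*', 'STRING', null],
        ['[;=()]', 'CHAR', null],
    ],
];

print_r(lexStateful($definition, 'INITIAL', 'Hi <?php echo $x; ?> bye'));
```

The state stack lets the same characters be tokenized differently depending on context. For example, `$x` is only recognized as a VARIABLE after the open tag has pushed IN_SCRIPTING.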


The different lexer implementations can be compared using the performance testing script:

$ /c/php-5.4.1/php examples/performanceTests.php

Timing lexing of CVS data:
Took 0.33259892463684 seconds (Phlexy\Lexer\Stateless\Simple)
Took 0.28691792488098 seconds (Phlexy\Lexer\Stateless\WithCapturingGroups)
Took 0.26784682273865 seconds (Phlexy\Lexer\Stateless\WithoutCapturingGroups)
Took 0.22256088256836 seconds (Phlexy\Lexer\Stateless\UsingPregReplace)

Timing alphabet lexing of all "a":
Took 0.30809283256531 seconds (Phlexy\Lexer\Stateless\Simple)
Took 0.40949702262878 seconds (Phlexy\Lexer\Stateless\WithCapturingGroups)
Took 0.38628792762756 seconds (Phlexy\Lexer\Stateless\WithoutCapturingGroups)
Took 0.31351900100708 seconds (Phlexy\Lexer\Stateless\UsingPregReplace)

Timing alphabet lexing of all "z":
Took 0.62087893486023 seconds (Phlexy\Lexer\Stateless\Simple)
Took 0.23668503761292 seconds (Phlexy\Lexer\Stateless\WithCapturingGroups)
Took 0.22538208961487 seconds (Phlexy\Lexer\Stateless\WithoutCapturingGroups)
Took 0.18682312965393 seconds (Phlexy\Lexer\Stateless\UsingPregReplace)

Timing alphabet lexing of random string:
Took 0.94398212432861 seconds (Phlexy\Lexer\Stateless\Simple)
Took 0.42041087150574 seconds (Phlexy\Lexer\Stateless\WithCapturingGroups)
Took 0.40309715270996 seconds (Phlexy\Lexer\Stateless\WithoutCapturingGroups)
Took 0.37058591842651 seconds (Phlexy\Lexer\Stateless\UsingPregReplace)

Timing PHP lexing of this file:
Took 0.098251104354858 seconds (Phlexy\Lexer\Stateful\Simple)
Took 0.020735025405884 seconds (Phlexy\Lexer\Stateful\UsingCompiledRegex)

Timing PHP lexing of larger TestAbstract file:
Took 0.268701076507570 seconds (Phlexy\Lexer\Stateful\Simple)
Took 0.065788984298706 seconds (Phlexy\Lexer\Stateful\UsingCompiledRegex)
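The performance script itself is not reproduced here. As a rough illustration of this kind of measurement, the sketch below times a closure with microtime(true). The helper name timeIt(), the repetition count and the preg_match_all stand-in workload are all assumptions and are not taken from examples/performanceTests.php.

```php
<?php
// Hypothetical timing helper; not taken from examples/performanceTests.php.
function timeIt(callable $fn, int $repetitions): float
{
    $start = microtime(true);
    for ($i = 0; $i < $repetitions; $i++) {
        $fn();
    }
    return microtime(true) - $start; // wall-clock seconds for all repetitions
}

// Stand-in workload: CSV-like input matched with a single combined regex.
$input = str_repeat("hallo world,foo bar,\"rare , escape\"\n", 1000);
$seconds = timeIt(function () use ($input) {
    preg_match_all('~[^",\r\n]+|"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|,|\r?\n~', $input, $m);
}, 100);

printf("Took %.6f seconds (preg_match_all stand-in)\n", $seconds);
```

For a fair comparison, run every lexer on the same input with the same repetition count. Absolute numbers will differ from machine to machine.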

Stateless\Simple and Stateful\Simple are trivial lexer implementations that try each regular expression in turn at the current position.
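To make the Simple strategy concrete, here is a minimal sketch of it for a stateless lexer. The function name lexSimple() and the [token, text] token format are assumptions for illustration and do not correspond to Phlexy's actual classes. Each rule is anchored with \G, so preg_match only accepts a match that starts exactly at the current offset.

```php
<?php
// Sketch of the "Simple" approach: at each offset, try every rule regex in
// order until one matches there. lexSimple() is a hypothetical name.
function lexSimple(array $rules, string $input): array
{
    // Pre-wrap each rule: \G anchors it at the offset passed to preg_match.
    $regexes = [];
    foreach ($rules as $regex => $token) {
        $regexes['~\G(?:' . $regex . ')~'] = $token;
    }

    $tokens = [];
    $offset = 0;
    $length = strlen($input);
    while ($offset < $length) {
        foreach ($regexes as $regex => $token) {
            if (preg_match($regex, $input, $m, 0, $offset) === 1 && $m[0] !== '') {
                $tokens[] = [$token, $m[0]];
                $offset += strlen($m[0]);
                continue 2; // resume the while loop at the new offset
            }
        }
        throw new RuntimeException("No rule matches at offset $offset");
    }
    return $tokens;
}

$rules = [
    '[^",\r\n]+'                     => 0,
    '"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"' => 1,
    ','                              => 2,
    '\r?\n'                          => 3,
];
print_r(lexSimple($rules, "foo bar,\"rare , escape\"\n"));
```

Each token can cost up to one preg_match call per rule, so this approach slows down as the number of rules grows.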

Stateless\WithoutCapturingGroups, Stateless\WithCapturingGroups and Stateful\UsingCompiledRegex use the compiled regex approach described in the blog post mentioned above.
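Below is a rough sketch of the compiled regex idea. It assumes the rule regexes contain no capturing groups of their own. All rules are joined into a single alternation with one capturing group per rule, so each token needs only one preg_match call. The names compileRules() and lexCompiled() are hypothetical, and this is not Phlexy's exact code.

```php
<?php
// Compiled-regex sketch: one alternation, one capturing group per rule.
// Assumes rule regexes have no capturing groups of their own.
function compileRules(array $rules): array
{
    $parts = [];
    $tokenByGroup = [];
    $group = 1;
    foreach ($rules as $regex => $token) {
        $parts[] = '(' . $regex . ')';
        $tokenByGroup[$group++] = $token;
    }
    // \G anchors each match at the offset passed to preg_match.
    return ['~\G(?:' . implode('|', $parts) . ')~', $tokenByGroup];
}

function lexCompiled(array $rules, string $input): array
{
    [$regex, $tokenByGroup] = compileRules($rules);
    $tokens = [];
    $offset = 0;
    $length = strlen($input);
    while ($offset < $length) {
        if (preg_match($regex, $input, $m, 0, $offset) !== 1 || $m[0] === '') {
            throw new RuntimeException("No rule matches at offset $offset");
        }
        // preg_match drops trailing unmatched groups, so the last entry of $m
        // belongs to the group (= rule) that matched.
        $tokens[] = [$tokenByGroup[count($m) - 1], $m[0]];
        $offset += strlen($m[0]);
    }
    return $tokens;
}

$rules = [
    '[^",\r\n]+'                     => 0,
    '"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"' => 1,
    ','                              => 2,
    '\r?\n'                          => 3,
];
print_r(lexCompiled($rules, "foo bar,\"rare , escape\"\n"));
```

PCRE tries the alternatives from left to right, so the first matching rule wins, just as in a loop over the rules. If the rules had capturing groups of their own, the group numbers would have to be remapped to rules.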

Stateless\UsingPregReplace is an extension of the compiled regex approach. Instead of looping over the input in PHP code, it (mis)uses preg_replace_callback to apply the compiled regular expression repeatedly.
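The preg_replace_callback variant can be sketched roughly as follows. Like the previous idea, it assumes capture-group-free rules, and it uses a hypothetical lexWithReplace() helper rather than Phlexy's real class. The callback records a token for every match and replaces the match with an empty string. Whatever remains afterwards was not matched by any rule.

```php
<?php
// preg_replace_callback sketch: PCRE walks the input, the callback records
// tokens. lexWithReplace() is a hypothetical name, not Phlexy's class.
function lexWithReplace(array $rules, string $input): array
{
    $parts = [];
    $tokenByGroup = [];
    $group = 1;
    foreach ($rules as $regex => $token) {
        $parts[] = '(' . $regex . ')'; // assumes rules have no capturing groups
        $tokenByGroup[$group++] = $token;
    }
    $regex = '~' . implode('|', $parts) . '~';

    $tokens = [];
    $rest = preg_replace_callback($regex, function (array $m) use (&$tokens, $tokenByGroup) {
        // Trailing unmatched groups are dropped, so the last entry is the rule.
        $tokens[] = [$tokenByGroup[count($m) - 1], $m[0]];
        return ''; // remove matched text; unmatched characters survive
    }, $input);

    if ($rest !== '') {
        throw new RuntimeException('Input contains characters no rule matches');
    }
    return $tokens;
}

$rules = [
    '[^",\r\n]+'                     => 0,
    '"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"' => 1,
    ','                              => 2,
    '\r?\n'                          => 3,
];
print_r(lexWithReplace($rules, "a,\"b , c\"\n"));
```

Unlike the loop-based versions, preg_replace_callback silently skips characters that no rule matches. The leftover-string check is therefore what detects invalid input, although it does not report where the problem occurred. Because a single call scans the whole input with one fixed pattern, switching rule sets on a state change does not fit naturally into this scheme.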

As the above performance measurements show, the Simple approach is generally a good bit slower than using compiled regexes. For the CSV data the compiled approach is only about 1.2 times faster, but the difference grows significantly with the number of regular expressions. Lexing the alphabet on a random string is more than twice as fast, and lexing PHP is roughly four to five times as fast.
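The speedups mentioned here can be rechecked directly from the timings in the output. The short script below divides the Simple time by each compiled-regex time and re-measures nothing. For the stateless cases it uses WithCapturingGroups and WithoutCapturingGroups. UsingPregReplace is left out because it is discussed separately.

```php
<?php
// Speedups of the compiled-regex lexers over the Simple ones, recomputed from
// the timings printed by the performance script (seconds; nothing re-measured).
$timings = [
    'CSV data'               => ['simple' => 0.33259892463684, 'compiled' => [0.28691792488098, 0.26784682273865]],
    'alphabet, random'       => ['simple' => 0.94398212432861, 'compiled' => [0.42041087150574, 0.40309715270996]],
    'PHP, this file'         => ['simple' => 0.098251104354858, 'compiled' => [0.020735025405884]],
    'PHP, TestAbstract file' => ['simple' => 0.268701076507570, 'compiled' => [0.065788984298706]],
];

$speedups = [];
foreach ($timings as $label => $t) {
    foreach ($t['compiled'] as $compiledTime) {
        $speedups[$label][] = $t['simple'] / $compiledTime;
    }
    // e.g. "CSV data  1.16x / 1.24x"
    printf("%-22s %s\n", $label, implode(' / ', array_map(function ($s) {
        return sprintf('%.2fx', $s);
    }, $speedups[$label])));
}
```

This gives about 1.16x to 1.24x for the CSV data, 2.25x to 2.34x for the random alphabet string, and 4.08x to 4.74x for the two PHP files.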

The preg_replace trick makes the whole thing a bit faster still. Sadly, preg_replace can't be used for stateful lexers; at least, I couldn't figure out a fast way to handle the state transitions.

Phlexy latest release notes
v0.1 Phlexy 0.1

Just so we have a release to reference in Composer :)
