When linear search O(N) is faster than hash table O(1) lookup

Consider quite popular in programming task of string-to-enum value parsing.

You have enum declared in your code (C/C++ but can be any modern programming language)

enum FormatterType {
  simple_formatter,
  integer_formatter,
  decimal_formatter,
  currency_formatter,
  date_formatter,
  date_local_formatter,
  time_formatter,
  time_local_formatter,
  enum_formatter,
  duration_formatter
};

And text representation of these values that are coming as tokens – strings:

"text",
"integer",
"decimal",
"currency",
"date",
"date-local",
"time", 
"time-local", 
"enum", 
"duration"

The task

To write a function that returns one of enum FormatterType constants when given a token – let it be std::string for brevity :

FormatterType t2e(std::string s) { ... }

In this article I will test three straightforward implementations of such string2enum function:

  1. linear lookup – that supposedly shall be slowest as it has O(N) performance.
  2. std::map lookup – binary search tree that is known of having O(log(N)) lookup performance.
  3. std::unordered_map lookup – hash table, should be fastest as we’ve been told it has O(1) complex lookup.

Linear search

The function is just a chain of if-elif-elif-else’es (I am using ternary operator here that is conceptually the if):

FormatterType if_t2e(std::string name)
{
  aux::chars name_chars = aux::chars_of(name);
  return
    name_chars == CHARS("text") ? simple_formatter
     : name_chars == CHARS("integer") ? integer_formatter
     : name_chars == CHARS("decimal") ? decimal_formatter
     : name_chars == CHARS("currency") ? currency_formatter
     : name_chars == CHARS("date") ? date_formatter
     : name_chars == CHARS("date-local") ? date_local_formatter
     : name_chars == CHARS("time") ? time_formatter
     : name_chars == CHARS("time-local") ? time_local_formatter
     : name_chars == CHARS("enum") ? enum_formatter
     : name_chars == CHARS("duration") ? duration_formatter
     : simple_formatter;
}

Where aux::chars is simple slice structure { const char* start; unsigned int length; } that has operator== defined as:

      bool aux::chars::operator == ( const slice& r ) const
      {
        if( length != r.length )
          return false;
        for( unsigned int i = 0; i < length; ++i )
          if( start[i] != r.start[i] )
            return false;
        return true;
      }

As you see it is dead simple – nothing in it to write home about.

Binary search tree lookup

std::map<std::string, FormatterType> map_index = {
  { "text", simple_formatter },
  { "integer", integer_formatter },
  { "decimal", decimal_formatter },
  { "currency", currency_formatter },
  { "date", date_formatter },
  { "date-local", date_local_formatter },
  { "time", time_formatter },
  { "time-local", time_local_formatter },
  { "enum", enum_formatter },
  { "duration", duration_formatter }
};

FormatterType map_t2e(std::string name)
{
  auto it = map_index.find(name);
  return (it != map_index.end()) ? it->second : simple_formatter;
}

Simple and obvious. And so is…

Hash table lookup

std::unordered_map<std::string, FormatterType> unordered_map_index = {
  { "text", simple_formatter },
  { "integer", integer_formatter },
  { "decimal", decimal_formatter },
  { "currency", currency_formatter },
  { "date", date_formatter },
  { "date-local", date_local_formatter },
  { "time", time_formatter },
  { "time-local", time_local_formatter },
  { "enum", enum_formatter },
  { "duration", duration_formatter }
};

FormatterType unordered_map_t2e(std::string name)
{
  auto it = unordered_map_index.find(name);
  return (it != unordered_map_index.end()) ? it->second : simple_formatter;
}

Testing

We will test these functions on random set of input strings that contains our tokens:

std::vector input;

void generate_input() {
  for (int n = 0; n < TOTAL_RUNS; ++n)
    input.push_back(std::string(input_tokens[std::rand() % ITEMS_IN(input_tokens)]));
}

To show the tendency I will test these functions on different sizes of enum sets : 4, 5, 6, 7, 8, 9 and 10 items per enum:

The result

As smaller the number below as faster the function performs. Yellow cells mark best cases.

And here is the surprise – linear lookup, that is supposed to be the slowest, is actually the fastest on small enums:

enum items std::map (sec) std::u_map (sec) linear (sec)
4 3.44 2.88 2.13
5 3.6 2.93 2.36
6 3.52 3.13 2.57
7 3.58 3.18 2.64
8 3.74 3.23 2.78
9 3.76 2.76 2.81
10 3.73 2.76 2.83

And the chart to visualize the results:

As lowest the line – the better. That gray line above is our linear lookup results. As you see on small enums it is the champion.

Resume

  1. On small enum sets linear lookup is better – that is O(N) outperforms O(1) !
  2. On larger enum sets hash table will work better.
  3. std::map (binary search tree) is the worst possible tool for implementation of such things.
  4. The best one is so called perfect hash function like the one produced by the  gperf . It generates FSA (finite state automata) unique to the given set of items in enum. gperf based function takes 1.86 seconds to perform lookup on enum having 9 items.

 

So why is that?

The reason is that each lookup method has its own hidden cost of lookup preparation.

  1. My linear lookup implementation has simply no preparation code – comparisons starts immediately.
  2. For the hash map to start doing lookup it needs to compute hash value of given string – O(n) task, where n is the length of the string.
  3. In binary search tree each left/right comparison is a O(n) task in worst case.
  4. And perfect hash (FSA) does not require preparation code, yet input string is scanned strictly once.

Afterwords

You can take test application terrainformatica.com/experiments/t2e.zip

aux::chars primitive is a reduced version of aux-slice.h file aux-slice.h file from Sciter SDK. Sciter uses slices instead of strings pretty much everywhere. Strings are used too but only as a storage of characters.

In fact that slice construct is what makes D language of Walter Bright the Great so fast – arrays and strings there are slices – structures containing only {start,length} fields.

Tokenizer + ::mark() = syntax colorizer

Here is selfie of syntax (tiscript) colorizer – the text below is a full source code of syntax highlighting routine.
The code has colorized itself:

syntax colorizer
syntax colorizer

Can your browser do that in 40 lines of code?

And here are styles that define style of tokens:

plaintext > text::mark(number) { color: brown; }
plaintext > text::mark(number-unit) { color: brown; }
plaintext > text::mark(string) { color: teal; }
plaintext > text::mark(keyword) { color: blue; }
plaintext > text::mark(symbol) { color: brown; }
plaintext > text::mark(literal) { color: brown; }
plaintext > text::mark(comment) { color: green; }

Easy, no?

And even shorter selfie, colorizer wrapped as an aspect component (referenced from colorizer.css):
colorizer

Repeatable: simple jQuery plugin for rendering lists, tables, etc.

While ago I’ve published simple and compact (90 lines of code) plugin for rendering, well, repeatables.

Here is its documentation and here is live demo.

The Repeatable is a mechanism of DOM population from array of objects. Repeatable template is defined directly in markup:

<ul id="people">
    <li><a href="mailto:{{this.email}}">{{this.name}}</a> <b if="this.age > 18">18+</b> </li> 
    <li>No data available</li>
</ul>

and that template gets cloned and instantiated for each record in the array.

Model-View-Whatever, the Plus engine for Sciter.

Preface

I would say that human history is a history of reinventing "wheels" of various kinds.

This time we see concept of data binding reincarnated with slightly pathetic name Model-View-Controller. Of course, as many people as many meaning they give to the MVC abbreviation but, nevertheless, it is all around basic idea of data binding – you have data (the Model these days) declaratevily bound with UI "controls" (the View). I believe Microsoft’s VisualBasic 4 and its IDE was the very first usable implementation of the idea. There was no Controller concept at that moment so their implementation was quite limiting – while you can implement 90% of your data editing needs using simple declarations you will spend 90% of your design time fighting with the rest of 10% of needed functionality.

The Plus framework for Sciter.

The Plus framework you can find in Sciter SDK is quite compact (400 LOC) and relatively simple implementation of that old data binding concept with controller means.

Note, the Plus is not an attempt to solve every html/css/script UI problem as AngularJS does. It is just a data binding mechanism with the concept of @observing functions (controllers in my interpretation).

Basics

Model in Plus interpretation is some tiscript namespace object that contains data variables (and optionally functions) to be bound with particular container in HTML DOM.

For example if you declare this script:

namespace Data {
  var correspondent = "world"; // variable to be bound
} 

and corresponding markup:

<section model="Data">
   Whom to greet: <input name="correspondent"> ?
   <p>The greeting: Hello <output name="correspondent">!</p>
</section>

and include in your document "plus.css" file you will get live data binding between Data.correspondent data variable and two DOM elements: two ways with input[name=correspondent] and one way (only view) binding with output[name=correspondent]. So when you type something in that input you will see the data also rendered in output element.  To see this alive load sdk/samples/+plus/demos/0-basic-variable-binding.htm in sciter.exe from its SDK.

The model and name DOM attributes.

Note that <section> element contains model="Data" attribute. It instructs the Plus engine to establish data binding between content of this section and members of namsepace Data {} in script. Name of the bound namespace can be any suitable, not just Data.

Any DOM element inside that section[model] may have name attribute defined. On binding initialization the Plus will try to find data element in the model with that name and if such data variable is found it will made two or one way (for <output> elements) binding between .value of that DOM element and the data variable. The name can be compound – may contain ‘.‘ (dot)-separated list of names. This way  you can bind DOM elements with object fields inside the model:

namespace Contact {
  var name = { first: "Albert", last: "Einshtein" };
  var phone = "....";
  ... 
}

and markup:

<form model="Contact" id="contact-details"> 
  <label for="name.first">First name></label> <input name="name.first">
  <label for="name.last">Last name></label> <input name="name.last">
  ...
</form>

Celsius to Fahrenheit convertor.

Controllers, the @observing function decorator.

File plus.tis (the Plus engine implementation) contains declaration of function decorator named @observing. With that decorator you can define functions that will be triggered (called by the engine) when variable(s) they are observing change.

As an example let’s define simple Celsius to Fahrenheit conversion tool that should work in two directions – when you define celcius value it will calculate its fahrenheit representation. And vice versa. Something similar to the form on the right:

First we will define our Data namespace:

      include "../plus.tis"; // model below uses @observing decorator defined in plus.tis  
      namespace Data // our model
      {  
        var celsius = 0; 
        var fahrenheit = 32;
        
        // this function is observing 'celsius' and calculates 'fahrenheit'
        @observing "celsius"
          function toFahrenheit() {
            fahrenheit = celsius * 9 / 5 + 32;
          }
        // this function is observing 'fahrenheit' and calculates 'celsius'
        @observing "fahrenheit"
          function toCelcius() {
            celsius = (fahrenheit - 32) * 5 / 9;
          }        
      }    

Note two functions above: function toFahrenheit() is observing celcius variable. When celcius variable will change, for example as a result of changes in <input|number(celsius)> field, the toFahrenheit() function will be triggered and will set value of fahrenheit variable. As we have another input bound with the fahrenheit variable:

<body model="Data">
  <p><input|number(celsius)>°C and <input|number(fahrenheit)>°F</p>
</body>

we will see in it results of calculation. This works in both directions – from fahrenheit to celcius and from celcius to fahrenheit.

To see this alive load sdk/samples/+plus/demos/1-basic-function-binding.htm sample in sciter.exe.

That’s it for now. In the next article I’ll explain use of repeatable attribute to bind arrays of objects with rpepatable sections and other samples. If you don’t want to wait check other samples in sdk/samples/+plus/demos/ folder of the SDK. They are self descriptive.

MVC or not MVC? The Formation Engine for jQuery

Various JavaScript frameworks provide data binding facilities (Knockout, AngularJS, etc.) these days.

They are based on kind-of-MVC concept: you have some data (structure, model), view rendering that data in html and some code in between that commonly named as controllers.

How successful/convenient those frameworks are subject of separate topic. For me they are too intrusive I would say. On some views/pages they make sense, on others final solution looks too ugly.

Anyway, here is an alternative idea…

Instead of separating data structure and its view we simply can create construction that is a data model and its view at the same time 🙂

I named that magic “construction” as Formation. Formation is essentially a collection of DOM elements organized in a tree that reproduces structure of the data (or model if you wish). Value of the formation is JSON data structure – the model per se.

Consider this example

Here you see collection of inputs and subsections (on the top).
On the bottom/right corner you see the Formation tree created of <section#inputs> container.

On the left you see live data of the Formation (editable textarea). Changes there reflect state of elements. In the same way changes in inputs reflect text in this textarea.

The Formation implementation does two major things:

  • creates formation trees and
  • initializes custom DOM elements (check that in HTML source – that friends list).

Couple of words about custom elements support in formations:

When formation sees custom DOM element ( any DOM element with tag name containing ‘-‘ inside ) it tries to find its initializer in registry of jQuery plugins – that famous $.fn collection. And calls it if it was found. You can check js/jquery.list-input.js – it is normal jQuery plugin with the name matching that custom element tag name: “INPUT-LIST”.

To create/get formation of some container you can call either

  1. global [window.] formation( domel_or_$_wrapper ) function or
  2. $(selector).formation() plugin.

You can store created formation in some variable and access DOM elements in it in quite effective manner:

var inputs = $("section#inputs").formation();
$(inputs.firstName).on("change", function() {...});

Accessing formation members directly is faster than accessing them using jQuery selectors as these are just direct references.

You can download complete sample to play with from here.

Future Formation plans: to implement so called repeatable formations, so if you have this markup:

<ul repeatable name="stockItems">
   <li><output name="name">  <output name="price" type="currency"></li>
</ul>

and will feed it (through formation) by this data:

[
  {name:"Apple", price: 1.05 },
  {name:"Orange", price: 0.52 } 
]

it will render two <li>s in the list.

Another ng-inspired idea is to have class switches as Formation elements, this class declaration:

<div class="{someSwitch:collapsed|expanded}" >...</div>

will cause corresponding formation to have element named “someSwitch” that can be assigned false/true or 0|1 values to change class to either <div class=”collapsed”> or to <div class=”expanded”>…

UPDATE: see discussion about the Formation on jQuery forum : forum.jquery.com/topic/mvc-or-not-mvc-the-formation-engine

Promises/A+ implementation in Sciter2

The Promises, as a concept, is generalization of callback mechanism. This pattern is quite popular these days so Sciter2 SDK contains now (sdk/samples/+promise/) pretty simple (60 lines of code) implementation of the Promises.

The promise is an object that:

  1. maintins list/chain of callback function pairs [onsuccess:function, onfailure:function] by providing .then(onsuccess, onfailure) method;
  2. promise object provides the way to "execute" the chain, either succes or failure callbacks (if an error occurs);
  3. each callback function in the chain receives input (parameters) from output (return [values]) of previous callback in the chain.

To create the promise in Sciter simply do this:

var oath = promise();

The promise() function and promise object

The promise() function in my implementation returns function/object that has .then() method defined on it. So to attach callback functions to the promise you will do this:

oath.then( function( data ) { return [data+1] } ) // #1
    .then( function( data ) { return [data+2] } ) // #2
    .then( function( data ) { stdout.println("success:", data)}, // #3 
           function( reason ) { stderr.println("error:", reason)} );

Now we have promise in variable oath that has three onsuccess functions assigned to it.

When time comes for the promise to be fulfilled, our code will do it by invoking the promise (as it is a function) with its first parameter set to true and with additional parameters that will be passed to the first callback in the chain:

oath(true, 1);

This will call first callback with 1 in data. It will return 1 + 1 -> 2.
That 2 value will be passed to second callback that will return 2 + 2 -> 4.
And finally last callback will just do println:

success:4

To reject the promise we just need to call it with first parameter set to false:

oath(false, "something went wrong!");

This will call our sole onerror callback and we will get:

error: something went wrong!

The promise.when() function, parallel execution

The promise has also defined static function promise.when(...)  that accepts list of promises and return another promise that will be fullfilled/rejected when all input promises will be completed.

function printBandC(b,c) { stdout.println(b,c) }

var BandC = 
    promise.when( self.request(#get-json, urlB),
                  self.request(#get-json, urlC)).then(printBandC);

There are quite many articles about the subject, just google for “Promises JavaScript”

Here is the full source of promise.tis module:

Continue reading “Promises/A+ implementation in Sciter2”

Sciter UI, basic principles. Calling code behind UI from worker threads.

I’ve updated the SDK with new sample: /sdk/demos/ui-framework .
This sample demonstrates principles outlined in “Sciter UI, application architecture” article.

Sciter demo screenshot

In particular it demonstrates definition of native function (view.execTask() -> window::exec_task()) and call of UI methods (callbacks in this sample) from worker threads.

I’ve explained idea of calling UI code from worker threads in this article but in this demo I am using slightly different mechanism.

The gui_exec function looks like this:

// this function is called from worker threads to 
// execute the gui_block in GUI thread
inline void gui_exec( std::function<void()> gui_block )
{
  sciter::sync::event evt;
  PostThreadMessage(gGUIThreadId, WM_EXEC, WPARAM(&evt),LPARAM(&gui_block));
  evt.wait(); // suspend worker thread until GUI will execute the block.
}

It posts the message into GUI thread input queue. Receiver of WM_EXEC message is Windows WH_GETMESSAGE hook function:

// message hook to handle WM_EXEC in GUI thread
static LRESULT CALLBACK gui_exec_hook_GetMsgProc(int code, WPARAM wParam, LPARAM lParam )
{
  MSG* pmsg = reinterpret_cast<MSG*>(lParam);
  if(pmsg->message == WM_EXEC)
  {
    sciter::sync::event* pe = reinterpret_cast<sciter::sync::event*>(pmsg->wParam);
    gui_block* pf = reinterpret_cast<gui_block*>(pmsg->lParam);
    (*pf)();      // execute the block in this GUI thread
    pe->signal(); // signal that we've done with it
                  // this will resume execution of worker thread.
  }
  return CallNextHookEx(0,code, wParam,lParam);
}

Using hooks allows this mechanism to work reliably even when application is running modal dialog loops.

Q.tis – micro port of jQuery for Sciter.

I’ve published today Sciter 2.0.2.2 with q.tis – micro-port of essential jQuery features.
Here is the list of supported functions.

It is just enough to put include "t.tis"; in your code and any existing DOM function that returns array of elements will
“automagically” produce the q-collection.

In my implementation I am using the fact that functions like Element.selectAll("selector") return array object that is instanceof ElementList.
ElementList as any class is extensible in run-time. So I’ve just added bunch of function ElementList.jqueryMethod() {} to bring that functionality to the Sciter.

The beauty of the approach is that you can use as the q() function (analog of $() in jQuery) as Sciter’s standard $$() to write something like this:

var itemsWithLinks = $$(ul.topics > li).$has(a:link);

to get list of list items that have <a>’s inside.
The same but in classic style and without use of “stringizers”:

var itemsWithLinks = q("ul.topics > li").has("a:link");

You may also find handy jquery-alike event handling. Methods element.on(), element.off(), element.one() and element.trigger() are available as for q-collections as for DOM element instances.