When linear search O(N) is faster than hash table O(1) lookup

Consider quite popular in programming task of string-to-enum value parsing.

You have enum declared in your code (C/C++ but can be any modern programming language)

enum FormatterType {
  simple_formatter,
  integer_formatter,
  decimal_formatter,
  currency_formatter,
  date_formatter,
  date_local_formatter,
  time_formatter,
  time_local_formatter,
  enum_formatter,
  duration_formatter
};

And text representation of these values that are coming as tokens – strings:

"text",
"integer",
"decimal",
"currency",
"date",
"date-local",
"time", 
"time-local", 
"enum", 
"duration"

The task

To write a function that returns one of enum FormatterType constants when given a token – let it be std::string for brevity :

FormatterType t2e(std::string s) { ... }

In this article I will test three straightforward implementations of such string2enum function:

  1. linear lookup – that supposedly shall be slowest as it has O(N) performance.
  2. std::map lookup – binary search tree that is known of having O(log(N)) lookup performance.
  3. std::unordered_map lookup – hash table, should be fastest as we’ve been told it has O(1) complex lookup.

Linear search

The function is just a chain of if-elif-elif-else’es (I am using ternary operator here that is conceptually the if):

FormatterType if_t2e(std::string name)
{
  aux::chars name_chars = aux::chars_of(name);
  return
    name_chars == CHARS("text") ? simple_formatter
     : name_chars == CHARS("integer") ? integer_formatter
     : name_chars == CHARS("decimal") ? decimal_formatter
     : name_chars == CHARS("currency") ? currency_formatter
     : name_chars == CHARS("date") ? date_formatter
     : name_chars == CHARS("date-local") ? date_local_formatter
     : name_chars == CHARS("time") ? time_formatter
     : name_chars == CHARS("time-local") ? time_local_formatter
     : name_chars == CHARS("enum") ? enum_formatter
     : name_chars == CHARS("duration") ? duration_formatter
     : simple_formatter;
}

Where aux::chars is simple slice structure { const char* start; unsigned int length; } that has operator== defined as:

      bool aux::chars::operator == ( const slice& r ) const
      {
        if( length != r.length )
          return false;
        for( unsigned int i = 0; i < length; ++i )
          if( start[i] != r.start[i] )
            return false;
        return true;
      }

As you see it is dead simple – nothing in it to write home about.

Binary search tree lookup

std::map<std::string, FormatterType> map_index = {
  { "text", simple_formatter },
  { "integer", integer_formatter },
  { "decimal", decimal_formatter },
  { "currency", currency_formatter },
  { "date", date_formatter },
  { "date-local", date_local_formatter },
  { "time", time_formatter },
  { "time-local", time_local_formatter },
  { "enum", enum_formatter },
  { "duration", duration_formatter }
};

FormatterType map_t2e(std::string name)
{
  auto it = map_index.find(name);
  return (it != map_index.end()) ? it->second : simple_formatter;
}

Simple and obvious. And so is…

Hash table lookup

std::unordered_map<std::string, FormatterType> unordered_map_index = {
  { "text", simple_formatter },
  { "integer", integer_formatter },
  { "decimal", decimal_formatter },
  { "currency", currency_formatter },
  { "date", date_formatter },
  { "date-local", date_local_formatter },
  { "time", time_formatter },
  { "time-local", time_local_formatter },
  { "enum", enum_formatter },
  { "duration", duration_formatter }
};

FormatterType unordered_map_t2e(std::string name)
{
  auto it = unordered_map_index.find(name);
  return (it != unordered_map_index.end()) ? it->second : simple_formatter;
}

Testing

We will test these functions on random set of input strings that contains our tokens:

std::vector input;

void generate_input() {
  for (int n = 0; n < TOTAL_RUNS; ++n)
    input.push_back(std::string(input_tokens[std::rand() % ITEMS_IN(input_tokens)]));
}

To show the tendency I will test these functions on different sizes of enum sets : 4, 5, 6, 7, 8, 9 and 10 items per enum:

The result

As smaller the number below as faster the function performs. Yellow cells mark best cases.

And here is the surprise – linear lookup, that is supposed to be the slowest, is actually the fastest on small enums:

enum items std::map (sec) std::u_map (sec) linear (sec)
4 3.44 2.88 2.13
5 3.6 2.93 2.36
6 3.52 3.13 2.57
7 3.58 3.18 2.64
8 3.74 3.23 2.78
9 3.76 2.76 2.81
10 3.73 2.76 2.83

And the chart to visualize the results:

As lowest the line – the better. That gray line above is our linear lookup results. As you see on small enums it is the champion.

Resume

  1. On small enum sets linear lookup is better – that is O(N) outperforms O(1) !
  2. On larger enum sets hash table will work better.
  3. std::map (binary search tree) is the worst possible tool for implementation of such things.
  4. The best one is so called perfect hash function like the one produced by the  gperf . It generates FSA (finite state automata) unique to the given set of items in enum. gperf based function takes 1.86 seconds to perform lookup on enum having 9 items.

 

So why is that?

The reason is that each lookup method has its own hidden cost of lookup preparation.

  1. My linear lookup implementation has simply no preparation code – comparisons starts immediately.
  2. For the hash map to start doing lookup it needs to compute hash value of given string – O(n) task, where n is the length of the string.
  3. In binary search tree each left/right comparison is a O(n) task in worst case.
  4. And perfect hash (FSA) does not require preparation code, yet input string is scanned strictly once.

Afterwords

You can take test application terrainformatica.com/experiments/t2e.zip

aux::chars primitive is a reduced version of aux-slice.h file aux-slice.h file from Sciter SDK. Sciter uses slices instead of strings pretty much everywhere. Strings are used too but only as a storage of characters.

In fact that slice construct is what makes D language of Walter Bright the Great so fast – arrays and strings there are slices – structures containing only {start,length} fields.

Sciter UI, basic principles. Calling code behind UI from worker threads.

I’ve updated the SDK with new sample: /sdk/demos/ui-framework .
This sample demonstrates principles outlined in “Sciter UI, application architecture” article.

Sciter demo screenshot

In particular it demonstrates definition of native function (view.execTask() -> window::exec_task()) and call of UI methods (callbacks in this sample) from worker threads.

I’ve explained idea of calling UI code from worker threads in this article but in this demo I am using slightly different mechanism.

The gui_exec function looks like this:

// this function is called from worker threads to 
// execute the gui_block in GUI thread
inline void gui_exec( std::function<void()> gui_block )
{
  sciter::sync::event evt;
  PostThreadMessage(gGUIThreadId, WM_EXEC, WPARAM(&evt),LPARAM(&gui_block));
  evt.wait(); // suspend worker thread until GUI will execute the block.
}

It posts the message into GUI thread input queue. Receiver of WM_EXEC message is Windows WH_GETMESSAGE hook function:

// message hook to handle WM_EXEC in GUI thread
static LRESULT CALLBACK gui_exec_hook_GetMsgProc(int code, WPARAM wParam, LPARAM lParam )
{
  MSG* pmsg = reinterpret_cast<MSG*>(lParam);
  if(pmsg->message == WM_EXEC)
  {
    sciter::sync::event* pe = reinterpret_cast<sciter::sync::event*>(pmsg->wParam);
    gui_block* pf = reinterpret_cast<gui_block*>(pmsg->lParam);
    (*pf)();      // execute the block in this GUI thread
    pe->signal(); // signal that we've done with it
                  // this will resume execution of worker thread.
  }
  return CallNextHookEx(0,code, wParam,lParam);
}

Using hooks allows this mechanism to work reliably even when application is running modal dialog loops.

Comments in source code considered harmful

In the file ResourceFontFileLoader.h (offical Windows SDK samples) you can see variable declared as:

static IDWriteFontFileLoader* instance_;

As you see it is a raw pointer here.

And in the file ResourceFontFileLoader.cpp it is initialized with this:

// Smart pointer to singleton instance of the font file loader.
IDWriteFontFileLoader* ResourceFontFileLoader::instance_(
    new(std::nothrow) ResourceFontFileLoader()
    );

Note the comment. “Smart pointer” there is just a good wish. I believe someone reviewed that code and according to the comment all this passed code review. But the code leaks memory. So that comment does quite opposite to what it supposed to do actually – it is not helpful but rather harmful.

Sciter UI, application architecture

Architecture of applications that use Sciter based UI can be visualized as this:

Typically such applications contain two distinct layers:

  • UI layer that uses Sciter window with loaded HTML/CSS and scripts (code behind UI);
  • Application logic layer, most of the time that is native code implementing logic of the application.

Ideally these two layers shall be split appart – isolated from each other as they use conceptually different code models and probably code styles.

UI layer uses event driven model: "on click here expand section there and send request to logic layer for some data".

Application logic layer (ALL) is more linear usually. It is is a collection of functions that accepts some parameters and return some data. Even if ALL uses threads code inside such threads is still linear.

UI and app-logic interaction principles:

Most of the time code execution in UI applications is initiated by the UI itself but sometimes application code may generate its own events. For the UI such events are not anyhow different from pure UI events like mouse/keyboard clicks and the like. Informational flow between UI and ALL conceptually fall into these three groups:

  1. "get-request" – synchronous calls from UI to logic layer to get some data:
  2. "post-request" – asynchronous calls with callback "when-ready" functions:
  3. "application events" – application detects some change and needs to notify UI to reflect it somehow:

To support all these scenarios application can use only two "entry points" :

  • UI-to-logic calls: event_handler::on_script_call(name,args,retval)
  • logic-to-UI calls:  sciter::host:call_function(name, args ) – calls scripting function from C/C++ code. The name here could be a path: "namespace.func".  

get-requests

To handle UI-to-logic calls the application defines sciter::event_handler and attaches its instance to the Sciter window (view). Its on_script_call method will be invoked each time when script executes code like this in scipt:

view.getSomeData(param1, param2);

that will end up in this C/C++ call:

event_handler::on_script_call(NULL,
         "getSomeData", 
         2 /*argc*/ , 
         argv[2], 
         SCITER_VALUE& retval /* return value */ );

Sciter SDK contains convenient macro wrapper/dispatcher for such on_script_call function:

  class window
    : public sciter::host<window>
    , public sciter::event_handler
  {
    HWND   _hwnd;
    ...
    
    json::value  debug(unsigned argc, const json::value* arg);      
    json::value  getSomeData(json::value param1, json::value param2);      

BEGIN_FUNCTION_MAP
  FUNCTION_V("debug", debug);  
  FUNCTION_2("getSomeData", getSomeData); 
END_FUNCTION_MAP
  }

Declaration FUNCTION_2("getSomeData", getSomeData); binds view.getSomeData() in script with native window::getSomeData call.  

Therefore functionality exposed to the UI layer by logic layer can be defined as a content of single BEGIN_FUNCTION_MAP/END_FUNCTION_MAP block.

If your application contains many modules that are connected dynamically then you can define single view.exec("path", params...) function that will do name/call dispatch using some other principles:

var newAccount = view.exec("accounts/new", initialBalance);
view.exec("accounts/delete", accountId);
view.exec("accounts/update", {customerName:"new name"} );

application events

Application can generate some events by itself. When some condition or state inside application changes it may want to notify the UI about it. To do that application code can simply call function in script namespace with needed parameters.

Let’s assume that script has following declaration:

namespace Accounts 
{
  function created( accountId, accountProps ) {
     $(#accountList).append(...);
  }
  function deleted( accountId, accountProps ) {
     $(#accountList li[accid={accountId}]).remove();
  }
}

Then the application code can fire such events by simply calling:

window* pw = ...
pw->call_function("Accounts.created", accId, accFields );
pw->call_function("Accounts.deleted", accId );

post-request

Need of post request arises when some of work need to be done inside worker threads. Some task either take too long to complete or data for them needs to be loaded from the Net or other remote sources. UI cannot be blocked for long time – it still shall be responsive. The same situation happens in Web applications when JavaScript needs to send AJAX request. In this case callback functions are used. Call to native code includes reference to script function that will be executed when the requested data is available.

Consider this UI script function that asks app-logic to create some account on a remote server:

function createAccount( accountProps ) 
  {
    function whenCreated( accountId ) // inner callback function
    { 
      $(#accountList).append(...);
    }
    view.exec("accounts/create", accountProps, whenCreated );
  }

It passes accountProps data and callback function reference to the "accounts/create" thread. This thread creates the account (presumably takes some time) and invokes whenCreated at the end.

class createAccount: worker_thread 
  {
    handle<window> ui;
    SCITER_VALUE props;
    SCITER_VALUE callback;

    void run()
    {  // the thread body
       // ... do some time consuming stuff ...

       SCITER_VALUE accountId = createAccount(props);

       // done, execute the callback in UI thread:
       ui->ui_exec([=]() {
         callback.call(accountId); 
       }); 
    }
}

Note about that ui_exec function above: the UI is single threaded by its nature – singly display device, single keyboard and mouse, etc. Worker threads shall not access the UI directly – the UI shall be updated from UI thread only. The ui_exec function does just that – executes block of code in UI thread. See C++0x: Running code in GUI thread from worker threads article about it.

Epilogue

Having just two "ports"  (out-bound UI-to-logic and in-bound logic-to-UI) is a good thing in principle. This allows to isolate effectively two different worlds – asynchronous UI and deterministic application logic world. Easily "debuggable" and manageable.

HTML, CSS and script (code behind UI) runs in most natural mode and application core is comfortable too – not tied with the UI and its event and threading model.


boost.coroutine vs my $generator.

Today I saw discussion about boost.coroutine on gmane:

Generators and coroutines are conceptually the same feature and I have implementation of $generator thing that can be used to implement coroutines.

My implementation is of 15 lines of code and does not require any threads/fibers and so context switches – pure C++ with macro magic, pretty straightforward though.

Anyway, here is an example how coroutine can be implemented using that $generator thing.
First, let’s define coroutine (or generator) that will establish connection with some server and will supply (yield) data chunks read from socket:

$generator(net_reader)
{
   socket_t     sock;
   byte         buffer[2048];

   net_reader(const char* addr) { sock.connect(addr); }
   
   // from $emit to $stop is a body of our coroutine:
    
   $emit(slice<byte>) // Coroutine will emit slice<byte> values. Start of coroutine body.

      while( sock.valid() )
      {
          slice<byte> r(buffer,0);
          r.length = sock.read(buffer,sizeof(buffer));
          if(r.length)
            $yield( r ); 
          else 
            break; 
      }

   $stop; // stop. End of coroutine body.
};

Having this we can write code that will read data chunk by chunk and store it in some array.


array<byte> data;
net_reader  get("data.example.com:4002");

for(slice<byte> chunk; get(chunk);)
  data.push(chunk);

That’s easy, no?

Yes, $generator thing is not free from problems (e.g. you cannot use switch() inside its body) but other than that and in 99% of coroutine cases you will get what is needed from generators/coroutines. And without those tons of code.