SLANG


What is it?

SLANG is short for Simple data description Language. It was introduced by Schürmann for the PLAYOUT project and is the main file format of the MOOSE project (and of NyktOp, which was originally conceived as a replacement to get around some bugs in MOOSE).


Structure

SLANG is, as the name tells us, a very simple file format. It is a very explicit textual format, and thus it tends to produce large files.


Example

The following is an example of a SLANG file:

( SLANGVersion 2.00 )
( MCL-StoreOn
 <
         { the first toplevel instance }
  ( DotClass
   <
         { some simple attributes }
    ( KEY 123 )
    ( _x 100 )
    ( _y 100 )
         { a single relation }
    ( _partOf
     <
      ( KEY 234 PolylineClass )
     >
    )
   >
  )
  ( PolylineClass
   <
    ( KEY 234 )
         { some more complex attributes (themselves again instances) }
    ( _start
     <
      ( _x 0 )
      ( _y 100 )
     >
    )
    ( _end
     <
      ( _x 100 )
      ( _y 0 )
     >
    )
        { a multiple relation } 
    ( _steps 1
     <
      ( KEY 123 DotClass ) 
     >
    )
   >
  )
 >
)

The example defines a line made up of three dots: two dots are attributes of the line, and one is connected via a relation.
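To make the nesting explicit, here is a minimal C sketch that writes the DotClass instance from the example. The `Dot` struct and the function `slang_write_dot` are hypothetical names introduced only for illustration; they are not taken from MOOSE or NyktOp.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical in-memory form of the DotClass instance from the example. */
typedef struct {
    int key;       /* value of the ( KEY n ) list           */
    int x, y;      /* simple attributes _x and _y           */
    int part_of;   /* KEY of the polyline the dot refers to */
} Dot;

/* Write one DotClass instance in SLANG syntax into buf.
 * Returns the number of characters written (without the NUL),
 * or -1 if the buffer is too small or formatting failed. */
int slang_write_dot(char *buf, size_t size, const Dot *d)
{
    assert(buf != NULL && d != NULL);
    int n = snprintf(buf, size,
        "( DotClass\n"
        " <\n"
        "  ( KEY %d )\n"
        "  ( _x %d )\n"
        "  ( _y %d )\n"
        "  ( _partOf\n"
        "   <\n"
        "    ( KEY %d PolylineClass )\n"
        "   >\n"
        "  )\n"
        " >\n"
        ")\n",
        d->key, d->x, d->y, d->part_of);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Every attribute becomes one list "( name value )", and every nested instance or relation becomes a record "< ... >" inside such a list.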

Minimalist Lex Scanner

Below is an example of a simple lex scanner that tokenizes a SLANG file.

%%
"<"					{        return  RECBRA; }
">"					{        return  RECKET; }
"("					{        return LISTBRA; }
")"					{        return LISTKET; }
[A-Za-z_][A-Za-z_0-9.=-]*		{ STORE; return  SYMBOL; }
[+-]?[0-9]+([.][0-9]+([eE][+-][0-9]+)?)? {STORE; return  NUMBER; }
[\\][0-9]+				{ STORE; return    ENUM; }
[']([^']+|[']['])*[']			{ STORE; return  STRING; }
[{][^}]*[}]				{ /* NOP: comments   */ }
[\t ]+					{ /* NOP: whitespace */ }
[\n]					{ /* NOP: newlines   */ }
%%

The token symbols have to be defined elsewhere, e.g. inside the parser; the STORE macro would need to strip the irrelevant characters from the data (e.g. the "\" before an ENUM or the '' around a STRING).
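What such a STORE macro might call can be sketched in C as follows. The function name `slang_store` is hypothetical, and the sketch assumes that the kind of token can be recognized from its first character (a backslash for ENUM, a single quote for STRING), which holds for the scanner rules above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated copy of the matched text with the
 * decoration removed (hypothetical helper, not part of lex):
 *   ENUM    \12       ->  12
 *   STRING  'it''s'   ->  it's   (a doubled quote becomes one quote)
 * Anything else (SYMBOL, NUMBER) is copied unchanged.
 * The caller frees the result; NULL is returned if malloc fails. */
char *slang_store(const char *text, size_t len)
{
    assert(text != NULL);
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    size_t n = 0;
    if (len >= 1 && text[0] == '\\') {
        /* ENUM: skip the leading backslash */
        memcpy(out, text + 1, len - 1);
        n = len - 1;
    } else if (len >= 2 && text[0] == '\'') {
        /* STRING: drop the outer quotes, collapse '' to ' */
        for (size_t i = 1; i + 1 < len; i++) {
            out[n++] = text[i];
            if (text[i] == '\'' && i + 2 < len && text[i + 1] == '\'')
                i++;   /* skip the second quote of the pair */
        }
    } else {
        memcpy(out, text, len);
        n = len;
    }
    out[n] = '\0';
    return out;
}
```

In the scanner, STORE could then be defined along the lines of "#define STORE (yylval.text = slang_store(yytext, (size_t)yyleng))", assuming the parser declares a %union with a char *text member.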

Minimalist Yacc Parser

The following minimalist parser only shows the structure of a SLANG file; it does no semantic checking of particular types.

%token  RECBRA
%token  RECKET
%token LISTBRA
%token LISTKET
%token  SYMBOL
%token  NUMBER
%token    ENUM
%token  STRING
%%
file  : lists ;
lists : lists list | /**/ ; 
list  : LISTBRA SYMBOL data LISTKET ;
data  : data value | /**/ ; 
value : SYMBOL | NUMBER | ENUM | STRING | record ;
record: RECBRA lists RECKET ;
%%

We also omit the code that handles relations (it has been omitted from the scanner as well, which would have to scan for 'KEY' separately).
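One possible way to handle the omitted relations is sketched below in C. In the example file the DotClass instance refers to ( KEY 234 PolylineClass ) before that polyline has been read, so references can only be resolved once all instances are registered. The names `KeyTable`, `key_table_add` and `key_table_resolve` are hypothetical, and the sketch assumes that an instance is identified by the pair of KEY and class name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_INSTANCES 64   /* arbitrary limit for this sketch */

typedef struct {
    int         key;       /* value from ( KEY n )              */
    const char *cls;       /* class name, e.g. "PolylineClass"  */
} Instance;

typedef struct {
    Instance items[MAX_INSTANCES];
    size_t   count;
} KeyTable;

/* Resolve a reference such as ( KEY 234 PolylineClass ).
 * Returns NULL if no instance with that key and class is registered. */
const Instance *key_table_resolve(const KeyTable *t, int key, const char *cls)
{
    assert(t != NULL && cls != NULL);
    for (size_t i = 0; i < t->count; i++)
        if (t->items[i].key == key && strcmp(t->items[i].cls, cls) == 0)
            return &t->items[i];
    return NULL;
}

/* Register an instance while reading the file. The class name is not
 * copied, so it must outlive the table. Returns 0 on success, -1 if
 * the table is full or the (key, class) pair is already present. */
int key_table_add(KeyTable *t, int key, const char *cls)
{
    assert(t != NULL && cls != NULL);
    if (t->count >= MAX_INSTANCES || key_table_resolve(t, key, cls) != NULL)
        return -1;
    t->items[t->count].key = key;
    t->items[t->count].cls = cls;
    t->count++;
    return 0;
}
```

A reader would call key_table_add for each toplevel instance during parsing and key_table_resolve for each stored reference after the whole file has been read.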

SuSE Linux Note

For an unknown reason (OK, bison does explicitly instruct you to write such functions yourself), no library seems to contain yywrap() or yyerror(); more precisely, there is no liby.a. Thus you might have to add something like:

#include <stdio.h>
int yyerror (const char *s) { fprintf (stderr, "%s\n", s); return 1; }
int yywrap(void) { return 1; }

Notes and Problems

This file format is surely not the "best one", but it is very simple, and as long as I can't find another format with similar features (text-based, structured, nestable and representing the data hierarchy), I can't add support for an alternative.

Major disadvantages are for example:

The greatest disadvantage, however, is at the same time its biggest advantage: the file format exactly matches a class hierarchy. Changes in the class hierarchy automatically cause changes in the file format. This is good, since with generic readers/writers there is no need to think about file I/O any more. But it is bad too, since model changes may cause our applications to stop reading older SLANG files.

Alternatives

We could, for example, re-use the syntax of FrameMaker MIF files (which effectively uses only lists, though bracketed with " < ... > "). MIF allows references both via unique ids and via certain key attributes. Or, better, XML; but can anyone give me a hint about describing data structures in XML (DTD)? (The discussion notes on w3.org do not fit my needs: they do not seem to offer any possibility to re-use an attribute name with a different purpose or type. Furthermore, IMHO implicit double-linked relations are a bit too sloppy for a file format, and I do not like references via find-by-attribute, rather than index keys, being the default; IMHO this slows down file I/O unnecessarily and might make multi-pass reading necessary. Finally, there is no obvious way to write tabulators and newlines; all whitespace seems to be treated equally in XML.)


Summary of Links