json basic scripting language [update 7/18]

User projects written in or related to FreeBASIC.
rolliebollocks
Posts: 2655
Joined: Aug 28, 2008 10:54
Location: new york

Re: json basic scripting language [update 7/8]

Post by rolliebollocks »

Added escape codes:

Code: Select all

var s = "\"Hello World\"";
print( s );

s = "Hello \n World";

print( s );
 
AGS
Posts: 1284
Joined: Sep 25, 2007 0:26
Location: the Netherlands

Re: json basic scripting language [update 7/8]

Post by AGS »

I ran a simple test and got some interesting results.

Code: Select all

var a = 88
var b = 99
var c = a + b
There is a ; missing at the end of every line. The parser, however, will report that a variable
was not declared (a). If I turn a + b into b + a then I get a report that variable b was not
declared. Both could be true but ultimately the error is a missing semicolon at the end
of line 1.

The tokenizer cannot catch that error, but the parser should. Whenever you forget a ;
at the end of a line, the parser may or may not give an error message.
But it will always fail to give the right result.

Leaving out the = gave some interesting results as well. The parser accepted the statement, and using the variable
afterwards gave the intended value anyway. An example:

Code: Select all

var  b  44;
print(b);
var b 44 is not correct. But jsb_parser prints 44 nevertheless. No error message. Nothing.

Another interesting issue has to do with what happens when there is an error in a line but no error
in the lines after that line. The error gets detected but the code beyond the line containing
the error gets executed nevertheless. You'd expect (or at least I would) that the interpreter would stop at the first error it finds.
Example code.

Code: Select all

var  44;
print(b); 
print(c);
print(h);
var c = 99;
print(c);
Output:
Illegal Function Call Number
b
c
h
99
0.01092517290982586
First the error message (related to line 1) and then execution continues regardless of the error.
That's actually kind of dangerous: what if code after the line with the error in it relies upon the
correct execution of the erroneous line?
The line that contains an error does not get executed but the line that relies upon correct execution of the
erroneous line does. That's not good.
rolliebollocks
Posts: 2655
Joined: Aug 28, 2008 10:54
Location: new york

Re: json basic scripting language [update 7/8]

Post by rolliebollocks »

The semi-colon issue can easily be fixed. The parser ends up interpreting the entire sequence of commands as a single line. It would be akin to typing:

var a = 88 var b = 99 var c = a + b

The interpreter then tries to evaluate what it now considers to be the RHS.

88 var b = 99 var c = a + b

And tries to evaluate that without ever having added a to the context. So it ends up being caught in the call to a+b.

I can add a simple check to the parser that will fix this and throw the proper error, and halt execution.
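One form such a check could take is refusing a second `var` inside a single statement. Below is a minimal sketch in Python rather than FreeBASIC. It assumes the lexer's output is a flat list of token strings. `split_statements` is a hypothetical helper, not part of jsb_parser, and it ignores `{ }` blocks for brevity.

```python
def split_statements(tokens):
    """Group a flat token list into statements at each ';' (hypothetical helper).

    Raises SyntaxError when 'var' appears in the middle of a statement,
    which almost always means the previous ';' was forgotten.
    """
    statements, current = [], []
    for tok in tokens:
        if tok == ";":
            statements.append(current)
            current = []
        elif tok == "var" and current:
            raise SyntaxError("missing ';' before 'var' (after: " + " ".join(current) + ")")
        else:
            current.append(tok)
    if current:
        raise SyntaxError("missing ';' at end of: " + " ".join(current))
    return statements
```

Given `var a = 88 var b = 99 var c = a + b`, this stops at the second `var` instead of complaining that `a` was never declared.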

Code: Select all

    var  b  44;
    print(b);
     
This works but if you tried:

Code: Select all

var b 44+44;
print(b);
The result would be.. 44.

This happens because there is no RHS in an expression that demands one. The RHS is evaluated first, which renders the entire expression into one token and then sends it along to VAR.

When it gets into the evaluator, EVAL recognizes VAR as a function which takes two arguments. It builds the arguments starting at token 2 (b) until it has two of them (b and 44). It then sends both to the _VAR function in jsb_functions.bas and there you go. There is some special code for VAR because it has a unique signature, so I could definitely catch that problem there.
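Such a shape check could look like the following sketch, written in Python for illustration. It assumes one statement's tokens arrive as a list of strings and that names follow the usual letter/underscore rule. `check_var_statement` is a hypothetical name, not the `_VAR` function in jsb_functions.bas.

```python
import re

# A variable name: letter or underscore, then letters, digits or underscores.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")

def check_var_statement(tokens):
    """Validate 'var NAME = <expr>' before anything is evaluated.

    Returns (name, rhs_tokens) or raises SyntaxError describing the problem.
    """
    if not tokens or tokens[0] != "var":
        raise SyntaxError("not a var statement")
    if len(tokens) < 2 or not IDENT.match(tokens[1]):
        raise SyntaxError("var expects a variable name as its first argument")
    if len(tokens) < 3 or tokens[2] != "=":
        raise SyntaxError("expected '=' after 'var " + tokens[1] + "'")
    if len(tokens) < 4:
        raise SyntaxError("missing right-hand side after 'var " + tokens[1] + " ='")
    return tokens[1], tokens[3:]
```

Rejecting `var b 44+44` up front means EVAL never gets the chance to pick up `b` and `44` as the two arguments.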

Code: Select all

var  44;
VAR expects a string to be passed as the first argument.
AGS wrote: "The line that contains an error does not get executed but the line that relies upon correct execution of the
erroneous line does. That's not good."
I agree. That can be fixed. I am somewhat concerned with speed too. Most examples execute at about .001/second - 00.5. I've been writing and testing my code on a crappy AMD A4 quad core. It's slow; in fact, it's slower and crappier than 90% of the computers on the market now. But to see your code perform under the worst possible conditions is to see it truly perform. Still, I'd like to get that execution time down a bit.
rolliebollocks
Posts: 2655
Joined: Aug 28, 2008 10:54
Location: new york

Re: json basic scripting language [update 7/8]

Post by rolliebollocks »

@AGS

A handy little debugging tool I created:

Code: Select all

? curLine.tostring : sleep
You can basically print any token_array (which is what the lexer translates a script into) with .tostring ..
rolliebollocks
Posts: 2655
Joined: Aug 28, 2008 10:54
Location: new york

Re: json basic scripting language [update 7/8]

Post by rolliebollocks »

!! UPDATE !!

As soon as the parser runs across either a malformed or null variable, it halts execution and prints out the offending line. There is now a check for var which makes sure the = is there.

Since false is a legitimate value for a variable, I used _NULL_ for errors. Malformed variables sometimes happen for legitimate reasons, so I gave all variables a default value of true, as opposed to MALFORMED (which is just 0, the default). There is basically no reason why the parser should ever encounter a malformed variable.
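The halt-on-first-error behaviour, together with an error marker that cannot be mistaken for `false`, can be sketched like this (Python, illustration only). `NULL` stands in for _NULL_ and `run` is a toy loop. Neither is taken from the jsb_parser source.

```python
NULL = object()  # error marker: a unique object, never equal to False, 0, "" or None

def run(statements, execute):
    """Execute statements in order and stop at the first one that yields NULL.

    `execute` is any callable returning a value or NULL. Returns the results
    produced before the failure and the offending statement (None if all ran).
    """
    results = []
    for stmt in statements:
        value = execute(stmt)
        if value is NULL:
            print("error in:", stmt)   # report the offending line, then halt
            return results, stmt
        results.append(value)          # False is an ordinary value here
    return results, None
```

Because the check is `value is NULL`, a statement that legitimately evaluates to `False` keeps the script running.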
rolliebollocks
Posts: 2655
Joined: Aug 28, 2008 10:54
Location: new york

Re: json basic scripting language [update 7/8]

Post by rolliebollocks »

There was an issue parsing nested parentheticals. I fixed it. Pastes to boards are updated and so is sourceforge.
rolliebollocks
Posts: 2655
Joined: Aug 28, 2008 10:54
Location: new york

Re: json basic scripting language [update 7/8]

Post by rolliebollocks »

There are two remaining known bugs:

Parser is bungling this. I fixed it but I haven't updated yet:

Code: Select all

var arr = [1,2,3,4,5];
var x = fix( rnd*arr.length );
print( x );
 
Parser bungles this:

Code: Select all

var arr = [1,2,3,4,5];
var idx = rnd*arr.length;
print( arr[ fix(idx) ] );
Basically, it can't handle expressions in arrays, which I thought I had taken care of but hadn't.
AGS
Posts: 1284
Joined: Sep 25, 2007 0:26
Location: the Netherlands

Re: json basic scripting language [update 7/18]

Post by AGS »

I've 'found' a way to get a nice execution trace. I use it so I can get a clear view of what is going on
when jsb_parser.exe executes.

To get a simple trace of the functions executed during execution of jsb_parser:
--> compile jsb_parse.bas using fbc -g -pp jsb_parse.bas (preprocessing only)
--> open the resulting file (jsb_parse.pp.bas) in a regex capable editor and execute two (global) find-replace commands
first find-replace command
------> find: ^([ \t]*(sub|function|destructor|constructor)[ \t]+[a-zA-Z_][a-zA-Z0-9_]*.+)$
------> replace all: \1\nprint __FUNCTION__

second find-replace
------> find: ^([ \t]*operator[ \t]+.+)$
------> replace all: \1\nprint __FUNCTION__

I am doing the replacement in two steps as I was not confident I could come up with a
regular expression that would cover all possible cases (I never use operator overloading
myself and wasn't quite sure what could come after the keyword operator).

You may have to use some alternative regex notation (as regex notation tends to be
fairly editor dependent).

Note: the above only works if case-insensitive matching (when using regular expressions)
is either the default or can be selected. If it isn't available, you'd have to write out both cases of every character:
sub would have to be rewritten as [Ss][Uu][Bb], function as [Ff][Uu][Nn][Cc][Tt][Ii][Oo][Nn] etc...

After the above steps (preprocessing and find-replace) compile the resulting jsb_parser.pp.bas as usual
and execute the program. All functions, subs, constructors, destructors, operators will, upon entry, print their name to stdout.
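The two find-replace passes can also be scripted instead of done in an editor. Below is a rough Python sketch that uses the same two patterns, assuming the preprocessed file fits in memory. `re.IGNORECASE` takes the place of the editor's case-insensitive option, and `re.MULTILINE` makes `^` and `$` match at every line. The `instrument` function name is illustrative.

```python
import re

# Pass 1: sub/function/destructor/constructor header lines (pattern from the post).
PROC_HEADER = re.compile(
    r"^([ \t]*(sub|function|destructor|constructor)[ \t]+[a-zA-Z_][a-zA-Z0-9_]*.+)$",
    re.IGNORECASE | re.MULTILINE,
)
# Pass 2: operator header lines.
OPERATOR_HEADER = re.compile(r"^([ \t]*operator[ \t]+.+)$", re.IGNORECASE | re.MULTILINE)

def instrument(source):
    """Insert 'print __FUNCTION__' on the line after each procedure/operator header."""
    source = PROC_HEADER.sub(r"\1\nprint __FUNCTION__", source)
    return OPERATOR_HEADER.sub(r"\1\nprint __FUNCTION__", source)
```

For example, `instrument(open("jsb_parse.pp.bas").read())` returns the instrumented source, ready to be written back out and compiled. Procedure headers continued over several lines with `_` would need extra handling, because the print would land in the middle of the header.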

To get reasonable performance I redirect the output like so:

Code: Select all

jsb_parser.pp.exe >jsb_parser.output

I use redirection because writing to the console is slow compared to redirecting console output to a file.
Even faster would be writing to a file opened in jsb_Parser.pp.bas. But that's a bit harder to get right (you have to introduce an extra variable to store the file handle, add statements to open/close the file etc...)

I looked at opportunities to speed up the interpreter.
Perhaps it pays to replace lookup functions like is_jsb_func with something else:

Code: Select all

function is_jsb_func( byref s as string ) as integer

What is_jsb_func does is look up a string in a list of strings using instr. Before the search is performed, it does a string concatenation, and string concatenation is expensive. And the algorithm behind instr (Boyer-Moore) does well with long patterns but isn't so hot when used with short strings.
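A delimited-string lookup of this kind works roughly as in the Python sketch below. It is a paraphrase under assumptions, not the actual is_jsb_func. The table is assumed to be a comma-delimited string like the `all_func_names` string used for benchmarking later in this thread (shortened here). The name is wrapped in commas first, which is the concatenation, so that a fragment such as `ta` cannot match inside `atan`.

```python
# Shortened comma-delimited table in the style of all_func_names.
ALL_FUNC_NAMES = ",var,function,print,sqr,+,/,*,^,%,sin,cos,tan,asin,acos,atan,atan2,abs,len,cls,"

def is_func_undelimited(name):
    # Plain substring search (like instr without delimiters): gives false positives.
    return name in ALL_FUNC_NAMES

def is_func_delimited(name):
    # Builds a new ",name," string on every call, then scans the table from the start.
    return ("," + name + ",") in ALL_FUNC_NAMES
```

`is_func_undelimited("ta")` is `True` while `is_func_delimited("ta")` is `False`. The cost of getting the right answer is one string allocation per lookup plus a scan of the whole table.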

Most if not all compiler/interpreter packages resort to hash tables to look up a string in a table of strings. In the case of is_jsb_func you can also use a static solution that could look something like this:

Code: Select all

function check_func(byref s as string) as integer
    ' guard: an empty string has no first character to dispatch on
    if len(s) = 0 then return 0
    ' branch on the first character, then compare against the few names starting with it
    select case as const s[0]
    case asc("+"), asc("/"), asc("*"), asc("^"), asc("%")
        select case s
        case "+", "/", "*", "^", "%": return 1
        end select
    case asc("a")
        select case s
        case "abs", "acos", "asc", "asin", "atan", "atan2": return 1
        end select
    case asc("b")
        if s = "beep" then return 1
    case asc("c")
        select case s
        case "chr", "clear", "cls", "cos": return 1
        end select
    case asc("f")
        select case s
        case "first", "fix", "frac", "function": return 1
        end select
    case asc("i")
        select case s
        case "instr", "int": return 1
        end select
    case asc("l")
        select case s
        case "last", "lcase", "left", "len", "length", "load_file_as_string", "log": return 1
        end select
    case asc("m")
        select case s
        case "mid", "mod": return 1
        end select
    case asc("p")
        select case s
        case "pluck", "pop", "print", "push", "push_back": return 1
        end select
    case asc("r")
        select case s
        case "right", "rnd": return 1
        end select
    case asc("s")
        select case s
        case "sgn", "sin", "size", "sleep", "slice", "split", "sqr": return 1
        end select
    case asc("t")
        select case s
        case "tan", "typeof": return 1
        end select
    case asc("u")
        if s = "ucase" then return 1
    case asc("v")
        if s = "var" then return 1
    case else
        return 0
    end select
    return 0
end function

I did a benchmark using both is_jsb_func and check_func. check_func did a lot better than is_jsb_func (on my PC, that is; it has lots of DDR3 memory and an Intel Core i7 CPU). The executable will be bigger when using check_func.

check_func only works for a static lookup table. The interpreter also uses at least one dynamic lookup table (is_user_func) which is stored as one big string. This table is also searched using instr. Replacing that lookup table with a hash table should help performance.
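FreeBASIC has no built-in hash table, so one would have to be written or borrowed. The Python sketch below shows the moving parts of a simple separate-chaining hash set that could back a dynamic table like is_user_func. The class name, the 32-bit FNV-1a hash, and the resize threshold are illustrative choices, not anything taken from the interpreter.

```python
class StringSet:
    """Separate-chaining hash set of strings (illustrative sketch)."""

    def __init__(self, capacity=16):
        self._buckets = [[] for _ in range(capacity)]
        self._count = 0

    @staticmethod
    def _fnv1a(s):
        """32-bit FNV-1a hash of the UTF-8 bytes of s."""
        h = 0x811C9DC5                          # FNV offset basis
        for byte in s.encode("utf-8"):
            h ^= byte
            h = (h * 0x01000193) & 0xFFFFFFFF   # FNV prime, kept to 32 bits
        return h

    def _bucket(self, s):
        return self._buckets[self._fnv1a(s) % len(self._buckets)]

    def add(self, s):
        bucket = self._bucket(s)
        if s not in bucket:
            bucket.append(s)
            self._count += 1
            if self._count > 2 * len(self._buckets):  # average chain longer than 2
                self._grow()

    def __contains__(self, s):
        return s in self._bucket(s)

    def __len__(self):
        return self._count

    def _grow(self):
        # Double the bucket array and redistribute every stored string.
        items = [s for bucket in self._buckets for s in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for s in items:
            self._bucket(s).append(s)
```

A lookup hashes the name once and then compares it against the few strings in one bucket, instead of scanning one big string with instr. A FreeBASIC port would follow the same structure, using an array of bucket lists.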

Apart from the above there are several places where the interpreter performs a linear search on a list (a vector) of items. It's a bit hard to say how that influences performance but a linear search on a vector to find something cannot be good for performance.
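For vectors that are searched linearly but rarely change, one option is to keep them sorted and binary-search them. Below is a Python sketch of the difference, using the standard `bisect` module. The function names are illustrative.

```python
from bisect import bisect_left

def find_linear(items, key):
    """O(n): compare key against every element until a match is found."""
    for i, item in enumerate(items):
        if item == key:
            return i
    return -1

def find_sorted(sorted_items, key):
    """O(log n): binary search; only valid while the vector is kept sorted."""
    i = bisect_left(sorted_items, key)
    if i < len(sorted_items) and sorted_items[i] == key:
        return i
    return -1
```

Keeping the vector sorted makes each insertion O(n). This suits tables that are built once and then queried many times; for tables that change constantly, a hash table is the better fit.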

Replacing lookup tables with hashtables (or perhaps, in case of a static table, select case) and linear scanning of lists (vectors) with something else will help improve performance a bit. I would, however, be surprised if those changes alone will bring about the kind of increase in performance you are looking for.
rolliebollocks
Posts: 2655
Joined: Aug 28, 2008 10:54
Location: new york

Re: json basic scripting language [update 7/18]

Post by rolliebollocks »

Those are good suggestions.

According to my tests, if I send a blank string to the parser (it is caught by an if statement, which returns "false"), it takes about 1.0 x 10^-5. This is without -exx or -g, which I would imagine would be slower. I might be able to perform fewer checks in vector.bi and reduce calls to ubound/lbound.

I'm somewhat familiar with Boyer-Moore: it builds a jump table, which costs about as much as a 300-character naive linear search before the actual search even starts.

However, instr still outperforms a naive string search, perhaps because FB checks whether the string is shorter than 300 characters and picks the algorithm accordingly. I don't know.
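To make the jump-table cost concrete, here is Boyer-Moore-Horspool (the simplified Boyer-Moore variant that keeps only the bad-character table) in Python. It is a sketch of the general technique, not a description of how FreeBASIC's runtime implements instr. The 256-entry skip table is rebuilt on every call, a fixed setup cost that a short haystack such as all_func_names may never earn back.

```python
def horspool_find(haystack: bytes, needle: bytes) -> int:
    """Return the 0-based index of needle in haystack, or -1 (Boyer-Moore-Horspool)."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Skip table: 256 entries filled on every call, the fixed setup cost.
    skip = [m] * 256
    for i in range(m - 1):
        skip[needle[i]] = m - 1 - i
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and haystack[pos + j] == needle[j]:  # compare right to left
            j -= 1
        if j < 0:
            return pos
        # Shift by the table entry for the haystack byte under the needle's last position.
        pos += skip[haystack[pos + m - 1]]
    return -1
```

The shift can be large when the haystack byte under the needle's last position does not occur in the needle. That is where Boyer-Moore gains on long patterns.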

Here's an example:

Code: Select all

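Function InStr_Naive ( byref strbig As String, byref strlittle As String ) As Integer
    ' Returns the 1-based position of strlittle in strbig, or 0 if not found (like instr)
    Dim As Integer ll = Len(strlittle)-1, lb = Len(strbig)-1
    If ll < 0 OrElse ll > lb Then Return 0      ' empty pattern, or pattern longer than text
    For i As Integer = 0 To lb - ll              ' stop where the pattern no longer fits
        ' cheap pre-check on the first and last characters before the full compare
        ' (no skip-ahead on a mismatch: jumping ll characters can step over a real match)
        If strbig[i] = strlittle[0] AndAlso strbig[i+ll] = strlittle[ll] Then
            Dim As Integer OK = i+1
            For ii As Integer = i To i+ll
                If strbig[ii] <> strlittle[ ii - i ] Then OK = 0 : Exit For
            Next
            If OK Then Return OK
        Endif
    Next
    
    Return 0
    
End Function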

dim as string all_func_names = ",var,function,print,sqr,+,/,*,^,%," _
                                 + "sin,cos,tan,asin,acos,atan,atan2,sqr,abs," _
                                 + "log,fix,int,frac,sgn,rnd,mod," _
                                 + "len,asc,chr,left,right,mid,lcase,ucase,instr," _
                                 + "pluck,load_file_as_string,beep," _
                                 + "length,pop,push,sleep,push_back,last,first," _
                                 + "size,split,slice,typeof,clear,cls," _
                                 + ""

dim as double tnow 
dim as integer idx 

tnow = timer
for i as integer = 0 to 100000
    idx = instr_naive( all_func_names, ",cls," )
next i
? timer - tnow
? idx
sleep

tnow = timer
for i as integer = 0 to 100000
    idx = instr( all_func_names, ",cls," )
next i
? timer - tnow
? idx
sleep
I'm going to take a look at the JSON parser and see if I can optimize that a bit.

[EDIT]

I will have to look into using a hash table. For my purposes with the text generator, JSB actually performs really well. If I want to use it in other projects, like if I was going to do a codewars game or something, which I do want to do, I may need to make some changes.

I will look into a hash table. That might be useful for multiple projects.
AGS
Posts: 1284
Joined: Sep 25, 2007 0:26
Location: the Netherlands

Re: json basic scripting language [update 7/18]

Post by AGS »

I found a couple of things in jsb_lexer.bas that are not bugs but seem out of place.

Code: Select all

case asc("+"), asc("/"), asc("*"), asc("%"), asc("^")
                
                check_quotes()
                
                if check1 <> "" then
                    var_or_func(tokens)
                endif
                
                addToken(_MATHOP_,ascii(s[i]),tokens)
                                
            case 9, 10, 13, 32
                check_quotes()
Code is taken from token_array.init. What I found odd was the use of symbolic names in one part of the code

Code: Select all

case asc("+"), asc("/"), asc("*"), asc("%"), asc("^")
and then in another part of the code 'magic numbers' are used

Code: Select all

            case 9, 10, 13, 32
                check_quotes()
Why not use asc for the values at the second case label as well?

Code: Select all

            case asc(!"\t"), asc(!"\n"), asc(!"\r"), asc(" ")
                check_quotes()
Further down the code there is one more instance of a case label that, again, uses a number
instead of asc()

Code: Select all

            case 34
                if inQuotes = 0 then inQuotes = 1 else inQuotes = 0
instead of

Code: Select all

            case asc(!"\"")
                if inQuotes = 0 then inQuotes = 1 else inQuotes = 0
The grammar as I posted it on this thread is not finished. I am going to finish it. To check whether the grammar is correct, I'm going to put together a recognizer for jsonbasic based on the grammar.
I will reuse your tokenizer to generate tokens for the recognizer.

I am going to use lemon, the parser generator used by sqlite, to generate a recognizer.
I had already written the grammar in a format close to what lemon needs (lhs ::= rhs.).
Lemon needs some additional info in order to generate a parser (mostly info on operator precedence/associativity).

Which brings me to some questions with regard to the grammar:
--> Is a ; after a } mandatory?
--> Can a compound statement be empty?
--> Is a ; allowed instead of a (compound) statement?
Would, for example, the following code be legal

Code: Select all

while (1) ;
or maybe

Code: Select all

while (1) {;};
or both? Or neither?
--> Are floating point numbers allowed and if so what is their syntax?
Can I perhaps do a copy-paste of the number syntax as found in the JSON grammar?

Code: Select all

number ::=  int
number ::=  int frac
number ::=  int exp
number ::=  int frac exp 

int ::= digit
int ::= digit1-9 digits
int ::= - digit
int ::= - digit1-9 digits 

frac ::= . digits

exp ::=  e digits

digits ::= digit
digits ::= digit digits

e    ::=  e
e    ::=  e+
e    ::=  e-
e    ::=  E
e    ::=  E+
e    ::=  E-
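The number part of the JSON grammar (as on json.org, where the exponent is `e digits`) corresponds to a single regular expression. That may be a handy cross-check for a hand-written tokenizer. Below is a Python sketch. The pattern is a transcription of the JSON number rules, not code from jsonbasic.

```python
import re

# int  : optional '-', then '0' or a digit 1-9 followed by more digits
# frac : '.' followed by one or more digits
# exp  : 'e' or 'E', an optional sign, then one or more digits
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?\Z")

def is_json_number(text):
    """True if the whole of text is a number under the JSON grammar."""
    return JSON_NUMBER.match(text) is not None
```

Note what the grammar rules out: leading zeros (`05`), a leading `+`, and a bare `.5` or `1.`.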
--> Is it possible to use function calls or variables in an initializer?
An example

Code: Select all

var s = "hello world";
var a = [sin(55),len(s),s];
Or can only literals be used (e.g. 5, "hello world")?
If variables are allowed it would be possible to write

Code: Select all

var s = "hello";
var t = "world";
var a = {s:t};
rolliebollocks
Posts: 2655
Joined: Aug 28, 2008 10:54
Location: new york

Re: json basic scripting language [update 7/18]

Post by rolliebollocks »

Hey,

Thanks for your interest in the project. I will need to run a few tests to answer all your questions. I currently am just on a break of sorts because I've been going at this project and the text generator since February and there is a bit of a burn out happening so I've taken a step back. But there appears to be some good interest in the text generator, and JSB is a major upgrade from the old scripting language so I'm definitely encouraged to continue development.
AGS wrote: "Why not use asc for the values at the second case label as well?"
Yeah, that was just lazy coding. I will fix that.

--> Is a ; after a } mandatory?

Yes. I didn't write any internal check to catch a missing one; minimal error trapping was done. That means that in the case of a user function it will work, but in the case of a compound statement the line coming immediately after it will not be parsed at all. I haven't thought of a good way to deal with that.

--> Can a compound statement be empty?

Yes, definitely. Made sure to test that.

--> Is a ; allowed instead of a (compound) statement?

No, I didn't think to add that. The preferred syntax for something like that would be:

Code: Select all

while( true ) {
<stuff>
};

But the lexer doesn't recognize true there, so the condition immediately evaluates to false. I can look into that. My guess is that the lexer is tokenizing "true" as the name of a variable instead of a bool.
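If that guess is right, the fix belongs in the lexer's word classification: test for the boolean literals before falling back to "variable name". Below is a hypothetical Python sketch. The token kinds and the function name are made up for illustration and are not jsb_lexer.bas constants.

```python
# Bare words that must become literals, never variable names.
BOOL_LITERALS = {"true": True, "false": False}

def classify_word(word, known_funcs):
    """Return a (kind, value) token for a bare word found by the lexer."""
    if word in BOOL_LITERALS:          # checked first, so 'true' is never a variable
        return ("BOOL", BOOL_LITERALS[word])
    if word in known_funcs:
        return ("FUNC", word)
    return ("IDENT", word)             # anything else is treated as a variable name
```

With that ordering, `while( true )` sees a boolean token instead of an undeclared variable.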

--> Are floating point numbers allowed and if so what is their syntax?

All numbers are double-precision floating point numbers. This is a JSON thing: all variables conform to the JSON standard.

--> Is it possible to use function calls or variables in an initializer?

Yes, except for arrays and objects. The lexer processes arrays and objects immediately, so their contents would all be sent as literals and mangled. You are pretty much forced to do this:

Code: Select all

var a = [];
var s = "hello world";

a.push_back( sin(55) );
a.push_back( len(s) );
a.push_back( s );
It would be such a pain in the ass to make that work. You can initialize strings or numbers with functions or basically anything except arrays and objects. I would almost have to rewrite the language from scratch to handle those kinds of expressions. I suppose that is a drawback of using JSON as a universal variable type, or, failing that, of my foresight.

Code: Select all

    var s = "hello";
    var t = "world";
    var a = {s:t};
I need to think about how I would go about that. The problem is that the lexer basically handles and stores the object itself, and JSON doesn't allow variables inside an object literal, so neither does JSB. It's definitely possible though.