A final BASIC dialect...

Berkeley · Post by **Berkeley** » Jun 15, 2024 16:35

For a rather long time I've had the idea of a platform-independent, modern, reliable, final BASIC dialect. FreeBASIC comes extremely close to that, except for a few flaws. Maybe this forum is the best platform for getting suggestions... Perhaps it will lead to further improvements of FreeBASIC...

There might be different uses, e.g.: applications (fullscreen & GUI), compiled and interpreted; "applets" (plugins in web browsers); SSI like Perl/PHP/Python; an alternative to ECMAScript; and macro scripting, e.g. in office suites and for desktop automation, or generally any application scripting. But all should in principle use the same syntax, although you can't expect to copy your code into a web server script and have it draw a website...

Therefore there should be only a rather limited set of instructions, comparable to C, which consists only of "keywords" instead of "instructions" and "loads" most of its functionality from libraries.

The BASIC standard should be rather similar to natural spoken English, as if you are telling the computer in words what it should do. And it should still be based upon the original or traditional BASIC where that makes sense.

LOAD
Because of this, I'd use LOAD instead of #include. You may still use preprocessor directives for a BASIC compiler, but this should be deprecated. The code shouldn't be platform dependent and should only use BASIC instructions. You may of course still link DirectX libraries. But in principle there should be no need for an #ifdef __WIN32__ or similar; that belongs in external C/ASM-coded libraries/object files. The same applies to inline assembler.

The LOAD instruction merges other BASIC source files or links libraries/object files, depending on the file extension. Maybe ".BAS" is for BASIC listings; ".BI" might be supported too. Real libraries have no extension. To give a C-analogous example: "LOAD "stdio"" will add the prototypes, e.g. of "printf()", to the namespace, and the linker (compiler collection) will know to link "stdio.lib", "stdio.dll", "stdio.a" and so on... The compiler settings determine which version of the library is used (among other things debug or release) and whether to link statically or dynamically. So the programmer only has to write "LOAD" and the library will work on the first test run.
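
To make the extension rule concrete, here is a minimal sketch in Python (not in the proposed BASIC). The function name `resolve_load`, the returned dictionary keys and the `build`/`linkage` settings are all assumptions made for illustration; the post only says that ".BAS"/".BI" are source and that libraries have no extension.

```python
from pathlib import PurePath

# Assumption: these are the only extensions treated as BASIC source.
SOURCE_EXTENSIONS = {".bas", ".bi"}


def resolve_load(target: str, build: str = "release", linkage: str = "dynamic") -> dict:
    """Classify a LOAD target as BASIC source to merge or a library to link."""
    suffix = PurePath(target).suffix.lower()
    if suffix in SOURCE_EXTENSIONS:
        return {"action": "merge_source", "file": target}
    if suffix == "":
        # No extension: a "real" library. The compiler settings, not the
        # source code, decide which variant gets linked.
        return {"action": "link_library", "name": target,
                "build": build, "linkage": linkage}
    raise ValueError(f"unsupported LOAD target: {target!r}")
```

With this sketch, `resolve_load("stdio")` yields a link request and `resolve_load("UTILS.BAS")` a source merge, without the program text ever naming a platform-specific file.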

Datatypes and declarations
"DIM AS <type>" is very tradition-conscious BASIC. But I'd still keep the suffixes. To be precise: $ for strings, % for integers, ! for booleans, § for decimal variables and ? for a maybe-variant type. This spares typedefs and is more obvious. It will also mean that e.g. MID$() is the function and MID the instruction... If you use a new variable without a suffix, it is expected to be of the type "real", which normally means double, but might also be "float" on feeble hardware, or "long double". "Integer" is signed 32 bit but may vary too; <variable>.size or SIZEOF(<variable>) will reveal the current range or accuracy. "mystring$ AS INTEGER" will cause a syntax error - even "i% AS UINT16".

Variables and subroutines shouldn't have to be declared before you use them. Therefore a compiler/interpreter has to collect prototypes and typedefs first and then parse the source file in a second pass. All subroutines (functions) are distinguished by their parameter lists and return values, although this only means booleans, strings, buffers/pointers and normal numeric values. Furthermore, "BYVAL" is the default; only BYREF has to be stated.
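
A rough Python sketch of the two-pass idea follows. It is an illustration under simplifying assumptions, not a parser design: subroutines are introduced by a line starting with `SUB <name>`, calls are written with an explicit `CALL <name>`, and the helper name `check_calls` is made up.

```python
import re

# Assumed, simplified line formats: "SUB name ..." and "CALL name ...".
SUB_DEF = re.compile(r"^\s*SUB\s+(\w+)", re.IGNORECASE)
CALL = re.compile(r"^\s*CALL\s+(\w+)", re.IGNORECASE)


def check_calls(source: str) -> list[str]:
    """Return the names of called subroutines that are defined nowhere."""
    lines = source.splitlines()

    # Pass 1: collect every SUB name, wherever it appears in the file.
    declared = {m.group(1).upper() for line in lines if (m := SUB_DEF.match(line))}

    # Pass 2: check each call against the collected names.
    unknown = []
    for line in lines:
        m = CALL.match(line)
        if m and m.group(1).upper() not in declared:
            unknown.append(m.group(1))
    return unknown
```

Because the names are gathered before any call is checked, a call may appear above the subroutine it refers to, which is exactly what makes a separate DECLARE unnecessary.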

Loops
I think "REPEAT" is better than "DO". REPEAT, WHILE, LOOP and UNTIL may be combined as you want. REPEAT can also take the argument "<n> TIMES", which repeats the loop body n times. In machine code this means a faster loop counting down to 0, e.g. "8 TIMES" counts from 7 to 0. Maybe you could also access the counter: "<variable> AS INDEX"?
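
The intended semantics of "<n> TIMES" with an accessible counter could look like this in Python. This is only a sketch of the behaviour described above; `repeat_times` and the callback style are assumptions, and the remark about cheap loop tests describes typical machine code, not Python.

```python
from typing import Callable


def repeat_times(n: int, body: Callable[[int], None]) -> None:
    """Run body n times, handing it a counter that runs from n-1 down to 0."""
    counter = n
    while counter > 0:  # counting down lets the loop test compare against zero
        counter -= 1
        body(counter)


seen: list[int] = []
repeat_times(8, seen.append)  # the counter plays the role of "<variable> AS INDEX"
print(seen)  # [7, 6, 5, 4, 3, 2, 1, 0]
```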

Instead of "EXIT" I'd use "LEAVE". "GOTO" should still exist too, although I believe EXIT/LEAVE makes it superfluous.

The NEXT of FOR-NEXT may show the counting variable, added by the editor, but it is not required. I'd further add FOR EACH from PHP:

Code: Select all

FOR EACH a$ = name$() i% AS INDEX
  b$=MID$(a$,...)
  b$=MID$(name$(i%),...)
NEXT a$

which allows you to process all elements of an array a bit more comfortably.
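
For comparison, the same pattern in Python, where `enumerate` supplies element and index together. This is a sketch only: the MID$ arguments are elided in the listing above, so taking the first character here is an assumption, as is the function name `initials`.

```python
def initials(names: list[str]) -> list[str]:
    """Collect the first character of every name."""
    out = []
    for i, a in enumerate(names):   # a plays the role of a$, i the role of i%
        b = a[0:1]                  # assumed stand-in for MID$(a$, ...)
        assert b == names[i][0:1]   # the indexed form gives the same value
        out.append(b)
    return out
```

The assertion inside the loop mirrors the two equivalent lines of the FOR EACH listing: the element variable and the indexed array access refer to the same value.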

Berkeley · Post by **Berkeley** » Jun 28, 2024 19:26

A core idea is a custom bytecode for source files, a bit like Java binaries (.class), although the BASIC files also contain "symbol" names and comments - the whole source code. Script variations won't use this bytecode of course. The "IDE" would be interpreter and editor in one, so there's a syntax check already when you enter a line, not just when compiling. It allows auto-completion and much faster interpretation compared to text. Each instruction has a specific code, and each symbol an ID. You may distribute these platform-independent files with all comments and "symbols" removed, but they could also be compiled - they are still source files. The concept is old school - home computers had only a few kibibytes of memory, or even just one, therefore it was necessary to use one byte for an instruction like "PRINT", instead of 5 or more... and of course it's 5 times faster to check only 1 byte to get the instruction...

Logically, any "symbol" is known to the interpreter/compiler with no need for a "DECLARE". But it's nothing that can't be performed with a better compiler or editor.

You may vary the BASIC syntax further in the editor, e.g. use "ECHO" instead of "PRINT". The editor may even accept several instruction names, but will convert them to the predefined one. So you will enter "echo "hello world"" and the editor turns it into "PRINT "hello world"". Also the code for "END_IF" might be printed as "END IF" or "ENDIF", just how the user wants.
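
A small Python sketch of what the editor would do on entering a line. The alias table and the name `normalize_line` are assumptions for illustration; only the ECHO/PRINT pair comes from the text above.

```python
# Assumed alias table: every accepted spelling maps to one predefined keyword.
CANONICAL = {"PRINT": "PRINT", "ECHO": "PRINT"}


def normalize_line(line: str) -> str:
    """Replace the leading instruction by its predefined name, or reject the line."""
    keyword, sep, rest = line.strip().partition(" ")
    canonical = CANONICAL.get(keyword.upper())
    if canonical is None:
        # This is the "syntax check when you enter a line".
        raise SyntaxError(f"unknown instruction: {keyword!r}")
    return canonical + sep + rest


print(normalize_line('echo "hello world"'))  # PRINT "hello world"
```

Note that the stored line no longer records which alias was typed, so reloading the program shows the predefined name unless the editor is told to display something else.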

Subroutines and functions should be treated almost the same, like in C. Although you use a SUB like an instruction, the real instruction would be "GOSUB", but this word is hidden or rather superfluous. SUBs may return optional values - error codes => perhaps only int32 values? RETURN without a value is also the instruction to leave a SUB before reaching END SUB and may work synonymously with END SUB, although it might be inserted automatically in one way or another. There can't be subroutines inside other subroutines. Functions must return a value, and can't be called like a SUB, i.e. like a BASIC instruction.

Subroutines and functions may perhaps use different types of symbols, so they could have the same name anyway, as long as the SUB version doesn't have a return type compatible with the FUNCTION's, since that would confuse the parser. Different subroutines may also have the same name if they use incompatible parameter types. Those may of course be strings and numbers, perhaps also boolean values, and structs/classes. You can assign default values, which are used if you leave a parameter slot empty ("CALL ,,"hello"").

Special parameters:
ellipsis: SUB MYSUB ( myvalue%()... )
allows a possibly unlimited list of parameters, like with the CHR() function. "myvalue%()" becomes an Integer array with as many elements as parameters are given - of course you can determine the number of elements of this array, no "argc" needed...
"expression chain": - like PRINT uses. In principle you only have one string, but it is created out of a series of expressions. You concatenate expressions using ";" and ",", where "," might insert a tab character and a "missing" ";" at the end causes a line break. For this I don't have an idea yet, but of course you should be able to specify what happens with the ";" and the ",". Expression chains exclude the use of any other parameters after them, like an ellipsis, but you might use parameters before them - PRINT #X would be a different instruction anyway, BTW.
keywords: "flag=0 AS ENUM(ANY=1)" - you get a variable flag that is 1, if you use as parameter "ANY", otherwise 0. See INSTR() function of FreeBASIC.
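
The ellipsis parameter corresponds closely to variadic arguments in other languages. As a hedged Python illustration (the function `my_sub` and its return value are made up to show the point):

```python
def my_sub(*myvalue: int) -> tuple[int, int]:
    """Return (number of arguments, their sum)."""
    # myvalue arrives as a tuple holding every argument that was passed,
    # so its length is always known and no separate "argc" is needed.
    return len(myvalue), sum(myvalue)


print(my_sub(72, 101, 108))  # (3, 281)
```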

Commas may also be optional for parameters in many cases where you can use space characters. Even omitted keyword parameters don't need commas, because a keyword parameter requires an exact match with a valid keyword from the list.

marcov · Post by **marcov** » Jun 28, 2024 21:15

Berkeley wrote: ↑Jun 28, 2024 19:26 A core idea is a custom bytecode for source files, a bit like Java binaries (.class), although the BASIC files also contain "symbol" names and comments - the whole source code.

A bytecode is an intermediate format, not a source. I assume you mean tokenized Basic sources here (like C=64's Basic V2), but that is something different from UCSD/Java/C# bytecode.

Tokenized file formats are closer to source; bytecodes are closer to the backend/executable (basically assembler for a virtual machine, type inferred in later iterations).

Anyway, tokenizing sources has proved an unsuccessful approach since it limits you to one editor (and adds a decent specialised editor as an absolute requirement for your development system to become usable).

Berkeley wrote: Script variations won't use this bytecode of course.

Why? It probably interprets faster than untokenised plain text, though it won't be that much of a factor.

Berkeley wrote: one byte for an instruction like "PRINT", instead of 5 or more... and of course it's 5 times faster to check only 1 byte to get the instruction...

1 byte is a special case and limits you to a maximum of 256 symbols. Anyway, overall it won't be five times faster, as you will still have to parse parameters and stuff.

This is because it is still basically only slightly mangled source, and additional parsing and transformation into an executable form (and especially efficient execution) still have to be redone each time.

This also goes for Java/C# bytecode, which still has to be translated from machine-independent virtual machine code to what the VM interpreter actually executes (or to native executable code), but the compilation from source to bytecode has at least already done some more transformations.

Berkeley wrote: You may vary the BASIC syntax further in the editor, e.g. use "ECHO" instead of "PRINT". The editor may even accept several instruction names, but will convert them to the predefined one. So you will enter "echo "hello world"" and the editor turns it into "PRINT "hello world"". Also the code for "END_IF" might be printed as "END IF" or "ENDIF", just how the user wants.

If multiple words resolve to one token, if the programmer loads the source again, will it still look like they typed it?

Berkeley wrote: Subroutines and functions should be treated almost the same, like in C. Although you use a SUB like an instruction, the real instruction would be "GOSUB", but this word is hidden or rather superfluous.

A Basic V2 gosub only handles jumping to the code, and returning from it, but not parameter handling, that is what SUB added. How do you see that? The whole point of C functions is that the calling code and the called code can be compiled separately with only the function signature to link them.

When do you expect to have an initially working system?

marcov · Post by **marcov** » Jun 28, 2024 21:43

Berkeley wrote: ↑Jun 15, 2024 16:35 "GOTO" should still exist too, although I believe EXIT/LEAVE makes it superfluous.

I prefer "COMEFROM" rather than GOTO

Anyway, your language changes sound a bit anecdotal. I don't see an overall principle or something else overarching coming through.

Also your core application domain is a bit murky as you basically say "we'll do everything, if you want us to". That is not a choice, that is shying away from it.

p.s. sorry if I sound like a teacher, I have been teaching a new junior programmer recently.

caseih · Post by **caseih** » Jun 29, 2024 3:32

Developing a new language and accompanying compiler or interpreter is an interesting and worthwhile project, regardless of the outcome. I've often thought that every serious programmer should embark on this sort of thing at least once. It's highly educational and occasionally rewarding. We spent a semester at uni building a compiler for a language the professor created. I learned a lot. I have a lot of respect and admiration for the FB devs now! It's very hard work. I hope you are open to learning more about computer language theory and compiler design, including formal grammars and parsing. Don't make the same mistakes the original PHP developer made!

I haven't spent much time perusing the FB source code, but I imagine the parser for any dialect of BASIC is quite complicated as there are lots of irregularities and special cases compared to languages with much simpler grammars such as C, so lots of backtracking required, and surely a two-pass parser. And way too many keywords in my opinion. haha.

Anyway the new language field is definitely quite full these days with plenty of interesting choices to pick from, such as Rust, Zig, Nim, Go, etc. Kind of a golden age of language experimentation and design.

Lost Zergling · Post by **Lost Zergling** » Jun 29, 2024 6:13

caseih wrote: Kind of a golden age of language experimentation and design.

Agreed. Requirements, especially for "AI" (#undefined error), sound like a new frontier for language design.
@Berkeley. FB (low-level capabilities) is suitable for coding your own keywords. For this purpose, you can define objects and properties in a .bi file.
You may then use this object behaviour as an extension.
It is complicated, as you have to manage (and design) the interface (kinematics), but doable.
The Lzle project ( https://www.freebasic.net/forum/viewtopic.php?t=26533 ) is an attempt at doing the FOR EACH you may expect.

Berkeley · Post by **Berkeley** » Jun 29, 2024 8:07

marcov wrote: ↑Jun 28, 2024 21:15
Berkeley wrote: ↑Jun 28, 2024 19:26 A core idea is a custom bytecode for source files, a bit like Java binaries (.class), although the BASIC files also contain "symbol" names and comments - the whole source code.
A bytecode is an intermediate format, not a source. I assume you mean tokenized Basic sources here (like C=64's Basic V2), but that is something different from UCSD/Java/C# bytecode.

In this case it is both - the bytecode is almost 1:1 a compressed form of the source code; each instruction is a bytecode, but it's still the source code, translated to another form that could be interpreted faster. What Java/.NET uses is more like its own machine code, where you can't get back the source code; decompilation is guessing.

marcov wrote: Anyway, tokenizing sources has proved an unsuccessful approach since it limits you to one editor

This bytecode should be final, open and well documented.

Berkeley wrote: Script variations won't use this bytecode of course.
marcov wrote: Why?

It would no longer be a script language otherwise.

In principle you might make a "PHP" script interpreter, that can handle those BASIC files... But for best performance it will create machine code anyway.

marcov wrote:1 byte is a special case and limits you to a maximum of 256 symbols.

In those days this was rather too many than too few...

marcov wrote:If multiple words resolve to one token, if the programmer loads the source again, will it still look like they typed it?

Not necessarily, but it will already change when you enter the line. A text import function will have the effect of mutating loaded source code.

marcov wrote: A Basic V2 gosub only handles jumping to the code, and returning from it, but not parameter handling, that is what SUB added.

- "PROCEDURE"... And in principle, it's almost the same. The return address of GOSUB, or rather "jump to subroutine", is stored on the stack and taken back from it again; parameters are additionally put there the same way - in C as machine code. In the earlier days there was no need for BASIC having local variables.

marcov wrote: When do you expect to have an initially working system?

First of all the whole syntax and bytecode should be planned.

Lost Zergling · Post by **Lost Zergling** » Jun 29, 2024 11:23

The spirit and description remind me a bit of the old Psion OPL/w, which was good technology indeed.
The difficulty lies in going from pseudocode to 'bytecode', which means:
- less readable
- the risk of a Perl like evolution (oriented macros commands)

marcov · Post by **marcov** » Jun 29, 2024 12:20

Berkeley wrote: ↑Jun 29, 2024 8:07 In this case it is both - the bytecode is almost 1:1 a compressed form of the source code; each instruction is a bytecode, but it's still the source code, translated to another form that could be interpreted faster.

But that still means it is only a tokenized source form, not a bytecode. A bytecode emphasises maximum interpretation speed or (in the case of .NET and Java 1.5+) JIT.

Berkeley wrote: What Java/.NET uses is more like its own machine code, where you can't get back the source code; decompilation is guessing.

So did the original bytecode, UCSD Pascal, that is what a bytecode is. Geared at fastest interpretation, but still machine type independent.

Berkeley wrote: Script variations won't use this bytecode of course.
marcov wrote: Why?
Berkeley wrote: It would no longer be a script language otherwise.

But what is then the advantage of the tokenized format at all? If for scripting it isn't tokenized, and for best performance it will be compiled. How many execution forms will there be? Script, bytecode, JIT and static compilation ?

marcov wrote: 1 byte is a special case and limits you to a maximum of 256 symbols.
Berkeley wrote: In those days this was rather too many than too few...

Yes. And we lugged suitcases around and called them portable computers, and mobile phones were mostly something for doctors. Time has moved on.

Berkeley wrote: Not necessarily, but it will already change when you enter the line. A text import function will have the effect of mutating loaded source code.

I personally don't see any advantage then in having multiple keywords go to the same token code. It won't make anything easier (since you have to know them all anyway, because one might change into the other).

marcov wrote: A Basic V2 gosub only handles jumping to the code, and returning from it, but not parameter handling, that is what SUB added.

Berkeley wrote: - "PROCEDURE"... And in principle, it's almost the same. The return address of GOSUB, or rather "jump to subroutine", is stored on the stack and taken back from it again; parameters are additionally put there the same way - in C as machine code.

By machine code, not as machine code. But for that to work the sub has to interpret it in exactly the same way. Best performance of course, but sometimes a bit limiting in an interpreter.

Berkeley wrote: In the earlier days there was no need for BASIC having local variables.

Because all we had then was 38911 bytes free. But even then, when I started with QBasic, subs were a relief to me: it was easier to compartmentalise code.

Berkeley · Post by **Berkeley** » Jun 30, 2024 0:04

A "bytecode" is simply a byte-code - non-human-readable bit sequences. You may call speed-optimized code "platform-independent machine code"...

Nowadays it gives no noticeable performance boost, of course. You can compile a mebibyte of code in the blink of an eye... The only advantage is that it forces you to avoid syntax errors while you are coding, rather than checking for errors by hitting "build". And you may type "ECHO" instead of "PRINT" and get the same result. It also makes compiling much easier. But in principle, the idea is a relic. In RAD you may not compile and can catch any error without abstract "application errors". That's all.

marcov wrote: I personally don't see any advantage then in having multiple keywords go to the same token code.

If you "come" from another programming language, it will be useful if "switch" / "select" turns into the "correct" "CHECK" for instance... And in principle, it's forced code-completion because you can't run your program if your currently typed line has a syntax error. You can not only use "END IF" and "ENDIF" as you want, you can also tell your editor to show it how you want it.

aurelVZAB · Post by **aurelVZAB** » Jun 30, 2024 18:44

Too much planning and talking leads to nothing... I know that.
Changing keywords is nothing... people hate to learn new keywords.
heh ...good luck

marcov · Post by **marcov** » Jun 30, 2024 21:01

Berkeley wrote: ↑Jun 30, 2024 0:04 A "bytecode" is simply a byte-code - non-human-readable bit sequences. You may call speed-optimized code "platform-independent machine code"...

wikipedia wrote: Bytecode (also called portable code or p-code[citation needed]) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable[1] source code, bytecodes are compact numeric codes, constants, and references (normally numeric addresses) that encode the result of compiler parsing and performing semantic analysis of things like type, scope, and nesting depths of program objects.

Berkeley wrote: Nowadays it gives no noticeable performance boost, of course. You can compile a mebibyte of code in the blink of an eye... The only advantage is that it forces you to avoid syntax errors while you are coding, rather than checking for errors by hitting "build".

Well, that is a bit optimistic, and depends on your compiler and settings. But the bottleneck is often not in the parsing, so indeed I don't see a use for a tokenised source.

Berkeley wrote: If you "come" from another programming language, it will be useful if "switch" / "select" turns into the "correct" "CHECK" for instance...

I would rather communicate with people that use the same language than mix in random bits of old ones and have to make allowances every time I download or use a piece of code written by somebody else, or by myself in an earlier period.

I don't do

#define begin {
#define end }

in C either, even if I'm sometimes tempted if I read over another tiny }

Anyway, I still don't get it.

Lost Zergling · Post by **Lost Zergling** » Jul 01, 2024 8:49

@Berkeley
What is a computer language? Basically (if I dare say), an interface between the formalization of the human understanding of a problem and the machine. The machine is a von Neumann type architecture: a data bus, an address bus and a command bus, all operated by the microprocessor. This is a little less true on AI or parallel flow machines, but the principle remains: the data manipulated must always be identified, at a critical moment, according to this classification. The role of the language is therefore to bring the level of formalization of the machine closer to that of the human, in an operational context (scientific, business, etc.)
I think you have in mind the nature of the Basic language (as close as possible to the so-called natural language) (but what is 'natural language?'). The so-called natural language includes several languages and a wide variety of thinking and linguistic structures. In addition, it evolves according to social norms and the comprehension schemes associated with it (ie #hashtag, etc) (I like 'Check' also, but I'd prefer to use it to qualify data status than to test instruction).
I remember ideas of renaming keywords, but they seem quite anecdotal. The idea of using bytecode could be interesting: under Notes/Domino, there was a macro-command language (@Command..), and a second one (a basic vb script, vb family, fb,..), the two complemented each other operationally, but did not communicate directly. A pseudo-code to bytecode architecture with automatic executable linking would be another approach to scalability: the language would remain the same, it would only be the syntactic approach (implicit and possibly explicit) that would guide things. It remains technical. The strength of this approach could be the RAD side, the weakness, the conceptual delay on tools like llvm.
Of course, such an approach would be fundamentally different from that of FB, drawing a line on low-level capabilities.
It would therefore be a high-level language, and therefore a business ontology is needed. I am not commenting on the redundancy of keywords, but there could perhaps be an interest, for FB, in a variation that would offer a restricted, scalable and compatible subset of very simple keywords, accessible to the youngest, and associated documentation, and FB would be, in a way, the "pro version".
Make no mistake, even a limited but nevertheless operational version would be a titanic task. Rewriting or renaming fundamental keywords (tests, loops, variables) is not the way (that is just like a Post-it); it would be necessary to over-implement (one way or another) original and intuitive thought automatisms.

aurelVZAB · Post by **aurelVZAB** » Jul 01, 2024 18:32

well there is one ;

Development started in early 2020 by Frank Hoogerbeets

https://sharpbasic.com/

Berkeley · Post by **Berkeley** » Jul 01, 2024 22:51

I'd keep the "tokenized" source, although it has no great benefits. Secondarily, it makes programming the parser a bit easier; primarily, it's a good base for planning the syntax - what is the core BASIC, and what's peripheral. The replaceable nature of instructions is required more for the destination environment than for the source of the code.

To be precise: you might output messages with PRINT to the console, i.e. to STDOUT in the Unix shell or the Windows prompt. But you may also output to a much more advanced console like the one I made with RETROGRA, where your code doesn't change, only the library. THIS is the idea of how BASIC programming should be: a reliable, simple environment for beginners. You start your "editor" (interpreter), type a few lines and hit "run". Inspired by GFA-BASIC. You don't need to think about resolutions, double buffering and so on, and it works on every supported platform. On the other hand you should have no limits - although I'd suggest using a BASIC-controlled graphics/game engine instead of OpenGL instructions in BASIC...

Maybe you misunderstood the structure of the bytecode: it's divided into several parts like an executable. You've got a section that holds the symbol names, a section for comments, a section for data and const strings, and the section with the source code, where every instruction/keyword has its own code and variables are references into the variables table, i.e. also a code or rather a number. If you just strip off the comments and symbol names, it would still work, but your editor can no longer show you comments and names. So the bytecode is not just an obfuscation of readable code, but nowadays it is superfluous.
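
The section layout described above can be sketched as a small Python data structure. All names here (`TokenizedSource`, `strip_for_distribution`, `symbol_name`, the `V<n>` placeholder) and the concrete opcode numbers are assumptions for illustration; no actual file format has been specified in this thread.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class TokenizedSource:
    code: tuple[tuple[int, ...], ...]   # per line: instruction code, then operand IDs
    strings: tuple[str, ...]            # section for data and const strings
    symbols: dict[int, str] = field(default_factory=dict)   # variable ID -> name
    comments: dict[int, str] = field(default_factory=dict)  # line index -> comment


def strip_for_distribution(src: TokenizedSource) -> TokenizedSource:
    """Drop symbol names and comments; the code and data sections stay untouched."""
    return replace(src, symbols={}, comments={})


def symbol_name(src: TokenizedSource, var_id: int) -> str:
    """Name an editor would display: the stored name, or a placeholder if stripped."""
    return src.symbols.get(var_id, f"V{var_id}")


# Hypothetical program: instruction code 1 with string 0, instruction code 2 with variable 0.
program = TokenizedSource(
    code=((1, 0), (2, 0)),
    strings=("hello world",),
    symbols={0: "count"},
    comments={0: "greet the user"},
)
stripped = strip_for_distribution(program)
```

Since the code section refers to variables only by ID, `stripped.code` is identical to `program.code`; only what the editor can display changes, from `count` to a placeholder such as `V0`.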
