'Forth, interpreted or compiled?

Supposedly Forth programs can be "compiled" but I don't see how that is true if they have words that are only evaluated at runtime. For example, there is the word DOES> which stores words for evaluation at runtime. If those words include an EVALUATE or INTERPRET word then there will be a runtime need for the dictionary.

To support such statements it would mean the entire word list (dictionary) would have to be embedded inside the program, essentially what interpreted programs (not compiled programs) do.

This would seem to prevent you from compiling small programs using Forth because the entire dictionary would have to be embedded in the program, even if you used only a fraction of the words in the dictionary.

Is this correct, or is there some way to compile Forth programs without embedding the dictionary? (maybe by not using runtime words at all ??)



Solution 1:[1]

Forth programs can be compiled with or without word headers. The headers include the word names (called "name space").

In the scenario you describe, where the program may include run-time evalutation calls such as EVALUATE, the headers will be needed.

  • The dictionary can be divided into three logically distinct parts: name space, code space, and data space. Code and data are needed for program execution, names are usually not.

  • A normal Forth program will usually not do any runtime evaluation. So in most cases, the names aren't needed in a compiled program.

  • The code after DOES> is compiled, so it's not evaluated at run time. It's executed at run time.

  • Even though names are included, they usually don't add much to program size.

  • Many Forths do have a way to leave out the names from a program. Some have a switch to remove word headers (the names). Other have cross compilers which keep the names in the host system during compile time, but generate target code without names.

Solution 2:[2]

No, the entire dictionary need not be embedded, nor compiled. All that need remain is just the list of words used, and their parent words, (& grandparents, etc.). And the even names of the words aren't necessary, the word locations are enough. Forth code compiled by such methods can be about as compact as it gets, rivaling or even surpassing assembly language in executable size.

Proof by example: Tom Almy's ForthCMP, an '80s-'90s MSDOS compiler that shrunk executable code way down. Its README says:

 .    Compiles Forth into machine code -- not interpreted.

 .    ForthCMP is written in Forth so that Forth code can be executed
      during compilation, as is customary in Forth applications.

 .    Very fast -- ForthCMP compiles Forth code into an executable file
      in a single pass.

 .    Generated code is extremely compact. Over 110 Forth "primitives"
      are compiled in-line. ForthCMP performs constant expression
      folding, strength reduction, register optimization, DO...LOOP
      optimization, tail recursion, and various "peephole"
      optimizations.

 .    Built-in assembler.

4C.COM runs under emulators like dosemu or dosbox.

A "Hello World" compiles into a 117 byte .COM file, a wc program compiles to a 3K .COM file (from 5K of source code). No dictionary or external libraries, (aside from standard MSDOS calls, i.e. the OS it runs on).

Solution 3:[3]

Forth can be a bear to get your head around from the outside because there is NO standard implementation of the language. Much of what people see are from the early days of Forth when the author (Charles Moore) was still massaging his own thoughts. Or worse, homemade systems that people call Forth because it has a stack but are really not Forth.

So is Forth Interpreted or Compiled? Short answer: both

Early years: Forth had a text interpreter facing the programmer. So Interpreted: Check

But... The ':' character enabled the compiler which "compiled" the addresses of the words in the language so it was "compiled" but not as native machine code. It was lists of addresses where the code was in memory. The clever part was that those addresses could be run with a list "interpreter" that was only 2 or 3 instructions on most machines and a few more on an old 8 bit CPU. That meant it was still pretty fast and quite space efficient. These systems are more of an image system so yes the system goes along with your program but some of those system kernels were 8K bytes for the entire run-time including the compiler and interpreter. Not heavy lifting.

This is what most people think of as Forth. See JonesForth for a literate example. (This was called "threaded code" at the time, not to be confused with multi-threading)

1990ish Forth gurus and Chuck Moore began to realize that a Forth language primitive could be as little as one machine instruction on modern machines so why not just compile the instruction rather than the address. This became very useful with 32bit machines since the address was sometimes bigger than the instruction. They could then replace the little 3 instruction interpreter with the native CALL/Return instructions of the processor. This was called sub-routine threading. The front end interpreter did not disappear. It simply kicked off native code sub-routines

Today Commercial Forth systems generate native code, inline many/most primitives and do many of the other optimization tricks you see in modern compilers. They still have an interpreter facing the programmer. :-)

You can also buy (or build) Forth cross-compilers that create standalone executables for different CPUs that include multi-tasking, TCP/IP stacks and guess what, that text interpreter can be compiled into the executable as an option for remote debugging and configuration if you want it.

So is Forth Interpreted or Compiled? Still both.

Solution 4:[4]

You are right that a program that executes INTERPRET (EVALUATE, LOAD, INCLUDE etc.) is obliged to have a dictionary. That is hardly a disadvantage because even a 64 bit executable is merely a 50 K for Linux or MS-Windows. Modern single board computer like the MSP430 can have the whole dictionary in flash memory. See ciforth and noforth respectively. Then there is scripting. If you use Forth as a scripting language, it is similar to perl or python The script is small, and doesn't contain the whole language. It requires though that the language is installed on your computer.

In case of really small computers you can resort to cross compiling or using an umbellical Forth where the dictionary is positioned on a host computer and communicates and programs via a serial line. These are special techniques that are normally not needed. You can't use INTERPRETing code in those cases on the sbc, because obviously there is no dictionary there.

Note: mentioning the DOES> instruction doesn't serve to make the question clearer. I recommend that you edit this out.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 Brian Fox
Solution 4