Native Oberon: Symbol and Object File Format

P. Reali

Introduction

This document describes the object and symbol file format used in PC Native Oberon 2.3.4 and greater and in binary compatibles systems.

Notation

The EBNF (Extended Backus Naur Format) is used to describe the syntax of the symbol and object file format. The semantics of the format are specified whenever needed.

We use the following writing conventions:

name write 0X terminated string
name0 write string with 0X compression
num 1 write num as 1 byte value
num 2 write num as 2 byte value
num 4 write num as 4 byte value
num write num as compressed value

0X compression integrates the 0X terminating the string in the last character by adding 80X to it. Exceptions to this rule are the empty string ( written as 0X ) and strings containing characters bigger than 7EX. Appendix B contains the zero compression code.

Symbol File

The symbol file implements a variation of the fine-grained object fingerprinting presentend in R. Crelier dissertation, also known as Object Model (OM). The OM allows to extend a symbol file, e.g. add symbols to a module interface, without invalidating the clients of the module.

SymFile = {modname0}
    [SFConst {Structure name0 val}]
    [SFvar {[SFreadonly] Structure name0}]
    [SFlproc {Structure name0 ParList}]
    [SFxproc {Structure name0 ParList}]
    [SFoperator {Structure name0 ParList}]
    [SFalias {Structure name0}]
    [SFtyp {Structure}]
    SFEnd
     
ParList = {[SFvar] Structure name0} SFEnd
Structure = Basic | UserStr | oldstr | modno (name0 | 0X oldimpstrn)
Basic = SFtypBool .. SFtypNilTyp
UserStr = [SFinvisible][SFsysflag flag] UserStr2
UserStr2 = (SFtypOpenArr | SFtypDynArr) Structure name0
  | SFtypArray Structure name0 size
  | SFtypPointer Structure name0
  | SFtypProcTyp Structure name0 ParList
  | SFtypRecord Structure name0 prio flags RecStr
RecDef = { FieldDef }[SFtproc {MethodDef}] SFend
FieldDef = [SFreadonly] Structure name0
MethodDef = Structure name0 ParList
     

Basic Types Encoding    
SFtypBool = 01X
SFtypChar = 02X
SFtypSInt = 03X
SFtypInt = 04X
SFtypLInt = 05X
SFtypHInt = 06X
SFtypReal = 07X
SFtypLReal = 08X
SFtypSet = 09X
SFtypString = 0AX
SFtypNoTyp = 0BX
SFtypNilTyp = 0CX
SFtypByte = 0DX
SFtypSptr = 0EX
     
Composed Types    
SFtypOpenArr = 2EX
SFtypDynArr = 2FX
SFtypArray = 30X
SFtypPointer = 31X
SFtypRecord = 32X
SFtypProcTyp = 33X
     
Flags    
SFsysflag = 34X
SFinvisible = 35X
SFreadonly = 36X
     
Section Delimiters    
SFconst = 37X
SFvar = 38X
SFlproc = 39X
SFxproc = 3AX
SFoperator = 3BX
SFtproc = 3CX
SFalias = 3DX
SFtyp = 3EX
SFend = 3FX
     

Object File

ObjectFile = OFTag OFVersion symfilesize SymbolFile
    Header Entries Commands Pointers Imports VarConsLinks Links
    Consts Exports Code Use Types References
OFTag = 0BBX
OFVersion = 0AFX
     

Heading

Header = refSize 4 nofEntries 2 nofCommands 2 nofPointers 2
    nofTypes 2 nofImports 2 nofVarConsLinks 2 nofLinks 2
    dataSize 4 constSize 2 codeSize 2 moduleName
     

This sections gives the number of entries in the following sections.

Entry Section

The entry table contains the address relative to the code base of the exported procedures. Also in this table are the procedures that are assigned to a procedure variable, because the assignment requires the absolute address of the procedure.

Entries = 82X {entryOffset 2}
     

Command Section

Exported procedures without parameters are commands and can be invoked by the system. cmdOffset is relative to the code base.

Commands = 83X {cmdName cmdOffset 2 }
     

Pointer Section

This section lists the pointers in the global variables. This information is used as root set for the current module by the garbage collector. The pointerOffset is relative to the static base of the module and is always a negative number (variables are stored below the static base).

Pointers = 84X {pointerOffset 4 }
     

Import Section

This section lists the modules needed by the current modules. These modules must be loaded before the current module.

Imports = 85X {moduleName}
     

  
VarConstLink Section

This section contains the fixup lists for global variables and constants (including type descriptors). The list contains the count of fixes to be done and their offset relative to the code base. The address of the entry has to be added to the value found at the offset!

All the entries of the current module are grouped into the list with mod = 00 and entry = 0FFFFH.

For every imported entry, mod is the module where the entry is defined (as implicitly numbered in the import section), entry is always 0. The Use Section (4.10) contains the table that maps an imported symbol to a fixup list.

VarConstLinks = 8DX { VarConstLinkEntry }
VarConstLinkEntry = mod 1 entry 2 count 2 {offset 2}
     

Link Section

This section contains the fixup list for procedure calls, system calls, and some other special fixups in the code. offset is relative to the code base. That location contains the address of the next fixup (this fixup chain is embedded in the code).

The following mod / entry have special meanings:

00 / 255 case table fixup
00 / 254 local procedure assignment
00 / 253 system call to NewRec
00 / 252 system call to NewSys
00 / 251 system call to NewArr
00 / 250 system call to Start
00 / 249 system call to Passivate
00 / 247 system call to Lock
00 / 246 system call to Unlock

Links = 86X {LinkEntry}
LinkEntry = mod 1 entry 1 offset 2
     

Const and Code Sections

The Consts are loaded in memory beginning at the static base.

Consts = 87X {char 1}
Code = 89X {char 1}
     

  
Export Section

This section lists the exported symbols of the module

Exports = 88X nofExports 2 {ExportEntry} 0X
ExportEntry = FP fixup [1X ExportRecord]
ExportRecord = oldref
  | tdentry [1X ExportRecord] nofFPs 2 {FP [1X ExportRecord]} 0X
     

The Export Section contains the information and linking point of all the exported symbols in the module. The ExportEntry describes different entry kinds, depending on the value of the fixup:

A record may be described in two ways (recognized by the first value read):

  
Use Section

Use = 08AX {UsedModules} 0X
UsedModules = moduleName0 {UsedVar |UsedProc |UsedType} 0X
UsedVar = FP varName0 fixlist [1X UsedRecord]
UsedProc = FP procName0 offset
UsedType = FP typeName0 0X [1X UsedRecord]
UsedRecord = tdentry [FP "@"] 0X
     

The Use Section contains the information for the linker about all the imported entries (Variables, Procedures and Type Descriptors). Every entry has the format fingerprint name value. Variables have a positive value, procedures a negative one and types have value = 0.

Variables

fixlist is an index to a fixup list in the VarConstLink Section (4.6. The address to be patched is found in the Export Section (4.9) of the module exporting the variable using the fingerprint.

Procedures

The fixup chain for this call starts at -offset relative to the code base. The address to be patched is found in the Export Section (4.9) of the module exporting the procedure.

Type Descriptors

For every imported type descriptor, an hidden copy of the pointer to the descriptor is allocated in the local constant section at offset tdentry. The address to be patched is found using the fingerprint in the Export Section (4.9) of the module exporting the Type.

Types Section

This section contains the description about the type descriptors.

Types = TypeTag {TypeEntry}
TypeEntry = size 4 tdaddr 2 Base Count name Methods Pointers
Base = module 2 entry 4
Count = nofMethods 2 nofInheritedMethods 2 nofNewMethods 2 nofPointers 2
Methods = {methodNumber 2 entryNumber 2}
Pointers = {pointerOffset 4}
     

There are 3 different Base allowed:

no base
module and entry are equal to -1
local record
module equal to -1; entry is the offset in the constant section of the pointer to the base type
imported record
module is an index in the table of imported modules; entry is the fingerprint of the record (to be found and checked in the export section of the module)1

Reference Section

The reference section is used by the Oberon trap handler to display the values on the stack when the trap occured. With some extensions (i.e. object types) it may be also used for some meta-programming.

Reference = 8BX {ProcRef}
ProcRef = 0F8X offset name {VarMode VarType [dim] offset name}
VarMode = Direct | Indirect
VarType = Byte | Bool | Char | SInt | Int | LInt | Real | LReal | Set | Pointer | Proc | String
Direct = 1X
Indirect = 3X
Byte = 1X | 81X
Bool = 2X | 82X
Char = 3X | 83X
SInt = 4X | 84X
Int = 5X | 85X
LInt = 6X | 86X
Real = 7X | 87X
LReal = 8X | 88X
Set = 9X | 89X
Pointer = 0DX | 8DX
Proc = 0EX | 8EX
String = 0FX
     

A $VarType \geq 80X$ means an array of the given type; in this case the number of dimensions must follow the type. Open Arrays have dimension 0.

Oberon Kernel System Calls

This appendix has been contribued by Pieter Muller. Many thanks.

TYPE ProtectedObject = POINTER TO RECORD END;  (* protected object (10000) *)
TYPE Body = PROCEDURE (typetag: LONGINT; self: ProtectedObject);
PROCEDURE CreateActivity(body: Body; priority: LONGINT; flags: SET; obj: ProtectedObject);

Create a thread associated with the active object SPMquotobj". SPMquotbody" contains the body method of the active object. SPMquotpriority" and SPMquotflags" are the values specified in the annotation of the body. This call is generated by the compiler when an active object (a SPMquotPOINTER TO RECORD" variable with a SPMquotBEGIN-END" body) is allocated with SPMquotNEW". First the object is allocated as usual, then its initializer is called (if defined), and then SPMquotCreate" is called to activate it.

The call creates a new thread that has the body entry point as initial instruction pointer. A stack is set up for the local variables of the thread, in such a way so that return from the body will terminate the thread. A stack overflow will cause an extension of the stack, or a trap if no more memory is available to the thread. Any trap will cause the thread to be either restarted at the body, or terminated (this depends on SPMquotflags"). It is not (yet) defined what happens to the locks that are held by a trapping thread. Priority levels are not yet defined. The thread of an active object will anchor the object until it terminates. After that the object may remain anchored by other references to it, but it can not become active again. It can remain in the system as a protected object.

PROCEDURE Lock(obj: ProtectedObject; exclusive: BOOLEAN);

Lock protected object SPMquotobj". The compiler generates this call at the entry to a method with the SPMquotEXCLUSIVE" or SPMquotSHARED" annotation (called a protected method). SPMquotexclusive" indicates which is the relevant case. Only one thread can lock an object exclusively, and many threads may obtain a shared lock when no exclusive lock is held. A thread is not allowed to re-enter its own exclusive region. Any object with exclusive or shared methods (also if only in an extension) can be locked. The compiler must indicate this in the type descriptor of the object, so that the object can be allocated with the relevant header. (As an approximation, any object with methods may be treated as a protected object). Shared locks may be implemented identical to exclusive locks in the simplest case.

PROCEDURE Unlock(obj: ProtectedObject; dummy: LONGINT);

Unlock protected object SPMquotobj". The compiler generates this call at the exit of a protected method. The relevant lock is released. (The SPMquotdummy" parameter is a placeholder to be used or removed later).

TYPE Condition = PROCEDURE (slink: LONGINT): BOOLEAN;
PROCEDURE Passivate(cond: Condition; slink: LONGINT; obj: ProtectedObject; flags: SET);

Passivate the current thread until some condition becomes true. The compiler generates this call for the SPMquotPASSIVATE" statement. The boolean condition is compiled in a separate procedure which is logically nested in the scope where the passivate resides, and which returns the boolean result of the expression. The static link value to that scope is passed in the SPMquotslink" parameter. This value is used when calling the condition procedure, so that it can access the variables of the enclosing scope. SPMquotobj" points to the object instance containing the passivate statement. Bit 0 of the SPMquotflags" parameter is set if the compiler detects a `global' condition, i.e. a boolean expression with function calls or reference to non-local variables.

PROCEDURE NewRec*(VAR p: SYSTEM.PTR; typetag: LONGINT);

This call is generated for the SPMquotNEW" procedure with a SPMquotPOINTER TO RECORD" parameter. SPMquottypetag" is a the address of a type descriptor for the specified record type. From the type descriptor can be learned if the relevant object is a protected object, in which case a heap block with the required protected object header is allocated. (It would be advantageous to have the compiler generate a separate kernel call for this case). SPMquotp" returns the allocated pointer value.

PROCEDURE NewArr*(VAR p: SYSTEM.PTR; elemTag, numElems, numDims: LONGINT);

This call is generated for the SPMquotNEW" procedure with a SPMquotPOINTER TO ARRAY OF" parameter, where the array elements are pointers or records containing pointers. SPMquotelemTag" is the address of a type descriptor for the element record type, or 0 in the case of an array of pointers. SPMquotnumElems" and SPMquotnumDims" indicate the total size and number of dimensions of the array. The array is allocated with a special header where the sizes of the different dimensions are stored. These fields are initialized by code generated by the compiler after the kernel call.

PROCEDURE NewSys*(VAR p: SYSTEM.PTR; size: LONGINT);

This call is generated for the SPMquotSYSTEM.NEW" procedure to allocate a block of memory that does not contain any pointers that have to be traced by the garbage collector. It is also used for the SPMquotNEW" procedure with a SPMquotPOINTER TO ARRAY OF" parameter, where the array elements do not contain pointers.

Kernel call numbers:
244 -
245 -
246 Unlock
247 Lock
248 -
249 Passivate
250 CreateActivity
251 NewArr
252 NewSys
253 NewRec

   
Zero Compression of strings

PROCEDURE WriteString(VAR R: Files.Rider; VAR s: ARRAY OF CHAR);
  VAR i: INTEGER; ch: CHAR;
BEGIN
  i:=0; ch:=s[i];
  IF  ch=0X  THEN  Files.Write(R, 0X); RETURN  END;
  WHILE  (ch#0X) & (ch<7FX)  DO  INC(i); ch:=s[i]  END;
  IF  i>1  THEN  Files.WriteBytes(R, s, i-1)  END;
  IF  ch=0X  THEN  Files.Write(R, CHR(ORD(s[i-1])+80H))
  ELSE  
    IF  i>0  THEN  Files.Write(R, s[i-1])  END;
    Files.Write(R, 7FX);
    REPEAT  Files.Write(R, ch); INC(i); ch:=s[i]  UNTIL ch=0X;
    Files.Write(R, 0X)
  END
END WString;

PROCEDURE ReadString(VAR R: Files.Rider; VAR s: ARRAY OF CHAR);
  VAR i: INTEGER; ch: CHAR;
BEGIN i := 0;
  LOOP Files.Read(R, ch);
    IF ch = 0X THEN s[i] := 0X; RETURN
    ELSIF ch < 7FX THEN s[i]:=ch; INC(i)
    ELSIF ch > 7FX THEN
      s[i] := CHR(ORD(ch)-80H); s[i+1] := 0X; RETURN
    ELSE (* ch = 7FX *) EXIT END
  END;
  LOOP Files.Read(R, ch);
    IF ch = 0X THEN s[i]:=0X; RETURN
    ELSE s[i]:=ch; INC(i) END
  END
END ReadString;



Footnotes

... module)1
In fact this is not really needed, because for every imported record, a pointer to it is created in the local constants section. The module number could be safely ignored and entry be the offset in the constants; the fingerprint would be checked in the use section when fixing the reference to the td.


reali@inf.ethz.ch