ETH Oberon system stability

Stability is one of Oberon's strengths. There are, however, a number of ways to produce an "unrecoverable" system freeze; they are discussed below.

  1. Using the unsafe SYSTEM features or unsafe interfaces of low-level modules (e.g. Kernel) incorrectly.

  2. When a stack overflow occurs. Native Oberon and Bluebottle solve this problem by leaving the bottom page of the stack area in virtual memory unmapped. This causes a page fault when the stack overflows. This is reliable, even for large local variables, since stack initialization is done from top to bottom.

    A stack overflow may happen when passing very large arrays as value parameters in procedure headings. Increasing the stack size (discussed at the end) is a way out, but the best solution is to use VAR formal parameters or pointers.
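    The difference between a value parameter and a VAR parameter can be sketched as follows. This is only an illustration: the module name BigParam, the size N and the procedure names are hypothetical, and whether the by-value call actually overflows depends on the configured stack size.

    ```oberon
    MODULE BigParam;  (* hypothetical module name *)
      CONST N = 1000000;  (* illustrative size, chosen to be large *)
      TYPE Buffer = ARRAY N OF CHAR;
      VAR buf: POINTER TO Buffer;  (* allocated on the heap, not on the stack *)

      (* Value parameter: the whole array is copied into the callee's stack frame. *)
      PROCEDURE CountByValue(b: Buffer): LONGINT;
        VAR i, n: LONGINT;
      BEGIN
        i := 0; n := 0;
        WHILE i < N DO
          IF b[i] = " " THEN INC(n) END;
          INC(i)
        END;
        RETURN n
      END CountByValue;

      (* VAR parameter: only a reference is passed, so the stack frame stays small. *)
      PROCEDURE CountByRef(VAR b: Buffer): LONGINT;
        VAR i, n: LONGINT;
      BEGIN
        i := 0; n := 0;
        WHILE i < N DO
          IF b[i] = " " THEN INC(n) END;
          INC(i)
        END;
        RETURN n
      END CountByRef;

      PROCEDURE Run*;
        VAR n: LONGINT;
      BEGIN
        IF buf = NIL THEN NEW(buf) END;
        n := CountByRef(buf^)
        (* n := CountByValue(buf^) would copy N bytes onto the stack and may trap *)
      END Run;
    END BigParam.
    ```

    Both procedures compute the same result; only the parameter mode, and with it the stack usage, differs.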

    As a concrete example, try the following command on your Oberon system to see how it handles stack overflow induced by a recursive Trap call (warning: it may crash):

    MODULE Temp; 
      PROCEDURE Trap*; 
      BEGIN 
        Trap 
      END Trap; 
    END Temp. 
    Temp.Trap
    
    Here are the results of tests conducted on four ETH Oberon systems (latest versions):

    1. Native Oberon:
      TRAP -14  Stack overflow ( 00103FFCH ) (PC Native 20.08.2002) 
      Temp.Trap  PC = 10 
      Temp.Trap  PC = 15 
      ...
      
    2. Oberon for Bluebottle:
      [1] TRAP -14 PL 3 stack overflow Aos 20.07.2002 
      CS:=0000001B DS:=00000020 ES:=00000020 SS:=0000002B CR0=80040031 FPU=00000000 
      EIP=00574012 ESI=00573F70 EDI=8427F694 ESP=84261000 CR2=84260FFC PID=00000001 
      EAX=00898DC0 EBX=0057400F ECX=00000000 EDX=005A8FF8 CR3=02FFE000 LCK=00000000 
      EBP=84261000 FS:=00000000 GS:=00000000 ERR=00000006 CR4=00000000 TMR=00027BCA 
      EFLAGS: cPaZstIdo iopl3 {1..2, 6, 9, 12..13, 16} 
      Process: 18 run 1 2 0057A1B0:AosCommands.Runner AosLocks.ReleasePreemption pc=859 {0, 28} 
      Temp.Trap pc=10 
      Temp.Trap pc=15 
      Temp.Trap pc=15 
      ...
      
    3. ETH PlugIn Oberon for Windows (14.5.2001) running on Windows 2000 Version 5.0.2195 Service Pack 2:
      TRAP stack overflow in thread Oberon.Loop 
      Temp.Trap  PC = 7 
      Temp.Trap  PC = 15 
      Temp.Trap  PC = 15
      
    4. Oberon for Linux x86 crashes (terminates the Oberon process). This bug was analyzed by Günter Feldmann. He found no way to handle a segmentation violation caused by stack overflow in Linux. In other Unix versions known to him, stack overflow handling is not a problem as they support an alternate stack for signal (trap) handling. In the Solaris ports of Oberon, Temp.Trap yields two segmentation violation traps. The second (recursive) trap view contains a notice which informs the user that the first trap was probably caused by stack overflow.

      When he last looked for the alternate signal stack in Linux (kernel 2.2.x, glibc 2.2.1), he found the procedure "sigaltstack", but calling it crashed the program. He hopes that the signal stack will work in Linux too in the near future.

    Stack size - Default size and how to adjust it

  3. Unloading a module while a procedure variable still refers to it. The visual gadget case is a common manifestation of this, as can be demonstrated with this sequence of operations:

    Set the caret in a viewer
    Execute Gadgets.Insert BasicGadgets.NewButton ~
    Execute System.Free BasicGadgets ~

    Explanation: The Button's Handler was "stolen". When freeing a module, Oberon checks whether other modules still depend on it, but it cannot currently check whether a procedure variable still refers to it (e.g. to the Handler of the visual gadget).

    Workaround: Do not System.Free a Module when its Objects are still displayed, since the Handler is called any time a Display.Broadcast (or update message) is sent (which is the case when new Viewers like a TRAP viewer are opened).

    Solution: The aim is to make the system trap gracefully rather than crash. One way to solve the problem is to have a termination handler for each module. This may work in most cases, e.g. if you have installed a task or some other upcall that can be uninstalled again. In the case of visual gadgets it is difficult, because the gadgets are not linked anywhere except in the display space (by design). Even broadcasting a "remove yourself" message to the gadgets of the module will not always work, since a gadget may be in a covered track.

    This situation is handled correctly in Native Oberon. The pragmatic solution is:

    1. Where possible, the module termination handler (installed with Modules.InstallTermHandler) cleans up so that this problem does not occur.

    2. When a frame handler traps, the frame is closed. This avoids recursive traps and stack overflows, which thus avoids system crashes. The code for this is a bit of a hack, but it seems very effective. It is implemented in Viewers.Broadcast, Viewers.Close and System.Trap. In some cases it produces false positives, e.g., sometimes when a command linked to a gadget traps, the (innocent) frame containing the gadget is closed. System.Recall can be used to recall it.
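    A termination handler of the kind described in point 1 might look like the following sketch. It assumes the task interface of ETH Oberon System 3 (Oberon.Task with the fields safe, time and handle, plus Oberon.Install and Oberon.Remove) and that Modules.InstallTermHandler accepts a parameterless procedure; the module name Blinker and the procedures Step and Cleanup are hypothetical.

    ```oberon
    MODULE Blinker;  (* hypothetical module name *)
      IMPORT Modules, Oberon;

      VAR task: Oberon.Task;

      (* Upcall invoked by the Oberon loop; it refers to code in this module. *)
      PROCEDURE Step(me: Oberon.Task);
      BEGIN
        (* periodic work would go here *)
      END Step;

      (* Termination handler: remove the upcall before the module is unloaded. *)
      PROCEDURE Cleanup;
      BEGIN
        IF task # NIL THEN Oberon.Remove(task); task := NIL END
      END Cleanup;

    BEGIN
      NEW(task); task.safe := FALSE; task.time := 0; task.handle := Step;
      Oberon.Install(task);
      Modules.InstallTermHandler(Cleanup)  (* assumed signature, see above *)
    END Blinker.
    ```

    With this in place, System.Free Blinker ~ would first remove the task, so no procedure variable is left pointing into the unloaded module.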

  4. When a "Volume full" situation is encountered (cf. Partition size considerations).

  5. When Oberon.Text is corrupted, Oberon start-up can end abruptly. This situation can be caused by a typing error made while editing Oberon.Text. To recover, the corrupted Oberon.Text must be repaired by mounting the file system of the partition hosting it from another Oberon system. The other system could reside on the same machine, on an Oberon-0 boot diskette or on CD-ROM. Pay particular attention to unmatched "{" and "}" braces, which are used for nesting sections in the text.

    Bluebottle also has an AosConfig.XML file which directs the start-up process, and it is essential that this file is syntactically well formed. A slight mistake can derail the start-up. To enhance stability, the system is prepared to switch automatically to an alternate file, Save.AosConfig.XML, which is presumably intact, giving the user a chance to repair the error in AosConfig.XML. If the alternate file is also corrupted, the same recovery procedure as described for Native Oberon must be applied. Both files are identical in the distribution.
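    As a precaution against the unmatched braces mentioned for Oberon.Text, a simple balance check could be run after editing and before restarting. The following is a minimal sketch using the standard Texts interface; the module name BraceCheck is hypothetical, and the check is deliberately naive: it counts every "{" and "}", including any that appear inside strings.

    ```oberon
    MODULE BraceCheck;  (* hypothetical module name *)
      IMPORT Texts, Oberon;

      VAR W: Texts.Writer;

      (* Report unmatched "{" and "}" in Oberon.Text to the system log. *)
      PROCEDURE Check*;
        VAR T: Texts.Text; R: Texts.Reader; ch: CHAR; depth, stray: LONGINT;
      BEGIN
        NEW(T); Texts.Open(T, "Oberon.Text");
        Texts.OpenReader(R, T, 0); Texts.Read(R, ch);
        depth := 0; stray := -1;
        WHILE ~R.eot DO
          IF ch = "{" THEN INC(depth)
          ELSIF ch = "}" THEN
            IF depth > 0 THEN DEC(depth)
            ELSIF stray < 0 THEN stray := Texts.Pos(R) - 1  (* first "}" without a "{" *)
            END
          END;
          Texts.Read(R, ch)
        END;
        IF stray >= 0 THEN
          Texts.WriteString(W, "unmatched } at position "); Texts.WriteInt(W, stray, 0)
        ELSIF depth > 0 THEN
          Texts.WriteString(W, "unclosed { count: "); Texts.WriteInt(W, depth, 0)
        ELSE
          Texts.WriteString(W, "braces balanced")
        END;
        Texts.WriteLn(W); Texts.Append(Oberon.Log, W.buf)
      END Check;

    BEGIN
      Texts.OpenWriter(W)
    END BraceCheck.
    ```

    Executing BraceCheck.Check after an edit writes one line to the log; a balanced result does not guarantee that the file is otherwise valid.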


11 Jun 2003 - Copyright © 2003 ETH Zürich. All rights reserved.
E-Mail: oberon-web at inf.ethz.ch
Homepage: http://www.ethoberon.ethz.ch/