Megatest: debugging.txt at [f6037c5e9e]

File docs/manual/debugging.txt artifact 731079995f part of check-in f6037c5e9e

//  Copyright 2021, Matthew Welland.
// 
// This file is part of Megatest.
// 
//     Megatest is free software: you can redistribute it and/or modify
//     it under the terms of the GNU General Public License as published by
//     the Free Software Foundation, either version 3 of the License, or
//     (at your option) any later version.
// 
//     Megatest is distributed in the hope that it will be useful,
//     but WITHOUT ANY WARRANTY; without even the implied warranty of
//     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
//     GNU General Public License for more details.
// 
//     You should have received a copy of the GNU General Public License
//     along with Megatest.  If not, see <http://www.gnu.org/licenses/>.

Debugging
---------

Well Written Tests
~~~~~~~~~~~~~~~~~~

Test Design and Surfacing Errors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Design your tests to surface errors. Ensure that all logs are
processed by logpro (or a custom log processing tool) and can be
reached by a mouse click or two from the test control panel.

To illustrate, here is a set of scripts with nested calls where script1.sh calls script2.sh which calls script3.sh which finally calls the Cadence EDA tool virtuoso:

.script1.sh
..............................
#!/bin/bash
code ...
script2.sh some parameters > script2.log
more code ...
..............................

.script2.sh
..............................
#!/bin/bash
code ...
script3.sh some more parameters > script3.log
more code ...
..............................

.script3.sh
..............................
#!/bin/bash
code ...
virtuoso params and switches ...
more code ...
..............................

The log files script2.log, script3.log and the log output from
virtuoso are not accessible from the test control panel. It would be
much better for future users of your automation to use steps more
fully. One easy option would be to post process the logs in downstream
additional steps:

.testconfig
..............................
[ezsteps]
step1 script1.sh
step2 cat script2.log
step3 cat script3.log

[logpro]
step1 ;; some logpro rules
  (expect:required in "LogFileBody" > 0 "Expect this output" #/something expected/)
step2 ;; some logpro rules for script2.sh
step3 ;; some logpro rules for script3.sh

[scripts]
script1.sh #!/bin/bash
 code ...

...
..............................

With the above testconfig the logs for every critical part of the
automation are fully surfaced and rules can be created to flag errors,
warnings, aborts and to ignore false errors. A user of your automation
will be able to see the important error with two mouse clicks from the
runs view.

An even better would be to eliminate the nesting if possible. As a
general statement with layers - less is usually more. By flattening
the automation into a sequence of steps you can use the test control
panel to re-run a step with a single click or from the test xterm run
only the errant step from the command line.

The message here is make debugging and maintenace easy for future
users (and yourself) by keeping clicks-to-error in mind.

Examining The Test Logs and Environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test Control Panel - xterm
^^^^^^^^^^^^^^^^^^^^^^^^^^

From the dashboard click on a test PASS/FAIL button. This brings up a
test control panel. Aproximately near the center left of the window
there is a button "Start Xterm". Push this to get an xterm with the
full context and environment loaded for that test. You can run scripts
or ezsteps by copying from the testconfig (hint, load up the
testconfig in a separate text editor window).

With more recent versions of Megatest you can step through your test
from the test control panel. Click on the cell labeled "rerun this
step" to only rerun the step or click on "restart from here" to rerun
that step and downstream steps.

NOTE 1: visual feedback can take some time, give it a few seconds and
you will see the step change color to blue as it starts running.

NOTE 2: steping through only works if you are using ezsteps.

A word on Bisecting
~~~~~~~~~~~~~~~~~~~

Bisecting is a debug strategy intended to speed up finding the root
cause of some bug.

.A complex process with a problem found in stage "E"
["graphviz", "bisecting.png"]
----------------------------------------------------------------------
include::bisecting.dot[]
----------------------------------------------------------------------

It is common to start debugging where the problem was observed and
then work back. However by inspecting the output at stage "C" in the
example above you would potentially save a lot of debug effort, this
is similar to the feature in source control tools like git and fossil
called biseceting.

Tough Bugs
~~~~~~~~~~

Most bugs in Megatest based automation will be in the scripts called
in your test steps and if you utilize the good design practice
described above should be fairly easy for you to reproduce, isolate
and find.

Some bugs however will come from subtle and hard to detect
interactions between Megatest and your OS and Unix environment. This
includes things like constructed variables that are legal in one
context (e.g. tcsh) but illegal in another context (e.g. bash),
variables that come from your login scripts and access and permissions
issues (e.g. a script that silently fails due to no access to needed
data). Other bugs might be due to Megatest itself.

To isolate bugs like this you may need to look at the log files at
various stages in the execution process of your run and tests.

.A simplified diagram of the stages Megatest goes through to run a test.
["graphviz", "megatest-test-stages.png"]
----------------------------------------------------------------------
include::megatest-test-stages.dot[]
----------------------------------------------------------------------

.How to check variable values and inspect logs at each stage
[width="80%",cols="<,2m,2m",frame="topbot",options="header"]
|======================
|Stage	                    | How to inspect                               | Watch for or try ...
|A: post config processing  | megatest -show-config -target your/target	   | #f (failed var processing)
|B: post runconfig          | megatest -show-runconfig -target your/target | Add -debug 0,9 to see which file your settings come from
|C: processing testconfigs  | inspect output from "megatest -run ..."      | Messages indicating issues process configs, dependency problems
|D: process testconfig for test launch | inspect output from megatest runner | Zero items (items expansion yielded no items)
|E,F: launching test        | start test xterm, look at mt_launch.log      | Did your batch system accept the job? Has the job landed on a machine? 
|G: starting test           | look at your batch systems logs for the process | Did the megatest -execute process start and run? Extract the "megatest -execute ..." command and run it from your xterm.
|H,H1,H2: step exectution   | look at <stepname>.log, <stepname>.html and your own internal logs | Do you have sufficiently tight logpro rules? You must always have a "required" rule! 
|======================

Bisecting megatest.csh/sh
^^^^^^^^^^^^^^^^^^^^^^^^^

Sometimes finding the environment variable that is causing the problem
can be very difficult. Bisection can be applied.

Edit the megatest.csh or megatest.sh file and comment out 50% per
round, source in fresh xterm and run the test.

This idea can also be applied to your .cshrc, .bashrc, .aliases and
other similar files.

csh and -f
^^^^^^^^^^

A common issue when tcsh or csh shells are used for scripting is to
forget or choose to not use -f in your #! line.

.Not good
..............................
#!/bin/tcsh
...
..............................

.Good
..............................
#!/bin/tcsh -f
...
..............................


Config File Processing
^^^^^^^^^^^^^^^^^^^^^^

As described above it is often helpful to know the content of
variables in various contexts as Megatest works through the actions
needed to run your tests. A handy technique is to force the startup of
an xterm in the context being examined.

For example, if an item list is not being generated as expected you
can inject the startup of an xterm as if it were an item:

.Original items table
-----------------
[items]
CELLNAME [system getcellname.sh]
-----------------

.Items table modified for debug
-----------------
[items]
DEBUG [system xterm]
CELLNAME [system getcellnames.sh]
-----------------

When this test is run an xterm will pop up. In that xterm the
environment is exactly that in which the script "getcellnames.sh"
would run. You can now debug the script to find out why it isn't
working as expected.

Similarly in a script just call the xterm.
NOTE: This technique can be very helpful in debugging running of EDA tools in Perl, Ruby, Python or tcl scripts:

.Perl example
..............................
some_code();
$cmdline="virtuoso -some-switches and params ...";
print "$cmdline"; # print the command line so you can paste it into the xterm that pops up
system("xterm");  # this line is added for the debug and removed when done
system($cmdline);
more_code();
..............................

Misc Other Debugging Hints
~~~~~~~~~~~~~~~~~~~~~~~~~~

Annotating scripts and config files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Run the "env" command to record the environment:

env | sort > stagename.log

In a config file:

.megatest.config, runconfigs.config and testconfig
...................
#{shell env | sort > stagename.log}

# or

[system env | sort > stagename.log]
...................

In scripts just insert the commands, this example helps you identify
if "some commands ..." changed any environment variables.:

.myscript.sh
..............................
env | sort > somefile-before.log
some commands ...
env | sort > somefile-after.log
..............................

.Use meld to examine the differences
..............................
meld somefile-before.log somefile-after.log
..............................

Oneshot Modifying a Variable
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To try various values for a variable without mutating the current value

.within a bash shell
..............................
SOMEVAR=123 runcmd.sh
..............................

.within csh
..............................
(setenv SOMEVAR 123;runcmd.sh)

# OR

env SOMEVAR=123 runcmd.sh
..............................

Capturing output from a command
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.Use the "script" utility
..............................
script -c "virtuoso -params and switches ..."
..............................