AOB + KIR
last update 06/11/98
comments to hainaut@cfht.hawaii.edu.
The different parts of the system are controlled by different programs.
What follows is a rough description of the basic layout of the
aobir session,
to help interpret the error messages and pinpoint the source
of trouble faster.
Neptune hosts the aobir session:
1/ AOB
-
Non real time commands (move the WFS, change the neutral density,...) are
initiated by running aopch, then passed to aopcserver, then to the Prolog,
then to the optical bench.
So, if there is a problem with the AOB slide / neutral density filter
/ shutter / WFS / ...,
restart aopcserver (OCS Configuration window) and then aoserver (AOB
setup window).
-
Real time commands (correction) are initiated by running aoh, passed to
aoserver, and then to the RTC.
So, if there is a problem with aoserver or the RTC,
kill aoserver (its processes may already be gone)
and restart aoserver (AOB setup window).
If this does not solve the problem, reset
the RTC.
2/ DetI
-
DetI runs on a Sparc1E (cassini or dione) and receives commands from Neptune
through Director. It uses the Leach controller (VME, Timing and Utility
boards) and accesses the VME memory directly.
So, if there is a problem with taking exposures and saving them,
first try >agent restart in the DetI window; the system may
recover by itself.
But trouble may also be due to communication problems with the 3U
crate, a cassini crash, etc.; see DetI
Problems below.
3/ KIR
If there is a problem with the KIR filter wheel, restart kirfserver (OCS
configuration window).
Note that the filter wheel controller initialization takes a while,
and the window showing the progress of the restart often hangs in the meantime.
Just quit that window and ping to check that the initialization was successful.
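The first recovery actions described in sections 1/ to 3/ above can be summarized as a small lookup table. This is only a sketch for quick reference: the dictionary keys and the function name are hypothetical labels chosen for illustration, and the action strings simply restate the text above.

```python
# First recovery actions per subsystem, restating sections 1/ to 3/ above.
# The keys ("aob_non_real_time", ...) are hypothetical labels, not real
# program or process names.
FIRST_ACTIONS = {
    "aob_non_real_time": [
        "restart aopcserver (OCS Configuration window)",
        "restart aoserver (AOB setup window)",
    ],
    "aob_real_time": [
        "kill aoserver (AOB setup window)",
        "restart aoserver (AOB setup window)",
        "reset the RTC if the problem persists",
    ],
    "deti": ["agent restart (DetI window)"],
    "kir_filter_wheel": ["restart kirfserver (OCS configuration window)"],
}


def first_actions(subsystem):
    """Return the ordered list of first things to try for a subsystem."""
    try:
        return FIRST_ACTIONS[subsystem]
    except KeyError:
        raise ValueError(f"unknown subsystem: {subsystem!r}")


if __name__ == "__main__":
    for step in first_actions("aob_real_time"):
        print(step)
```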
4/ TCS
AOB Problems
First make sure there really is a problem and that the error messages are
not simply the consequence of acting too fast.
For example, consider a sequence of error messages like:
aoh: completing unsuccessfully.
aobclosetheloop: aoh command 'aoh -observeloop' failed.
aoserver: (main 789) sequence: observe_loop, task: stop_closed_loop,
optim failed ; Task not known by the optim process of the RTC .
aoserver: (main 662) sequence: , task: , central failed ; Task not
known by the central process of the RTC .
aoserver: (main 849) message 908 from traffic control ignored.
aoserver: (main 789) sequence: , task: , optim failed ; Task not known
by the optim process of the RTC .
aoserver: (main 662) sequence: , task: , central failed ; Task not
known by the central process of the RTC .
Followed 5 minutes later by:
aoh: completing unsuccessfully.
aobclosetheloop: aoh command 'aoh -passthru stop_closed_loop' failed.
xform: Execution of 'aobacquireguidestar' failed (1).
The "5 minutes later" messages can be ignored: they are a consequence
of the first messages. The first set of messages is typical of what
happens when the observer does things too quickly. In this case the
correction was not on when a stop correction command was received. (Note
"sequence: , task: ,". The blank sequence and task names indicate
an interaction between tasks.)
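When scanning a long feedback log, it can help to pick out the aoserver failure lines and flag those with blank sequence and task names. The sketch below assumes the messages have exactly the layout quoted above; the function name, the dictionary keys, and the reading of the number after "main" as an opaque identifier are assumptions made for illustration.

```python
import re

# Layout of the aoserver failure lines quoted above, after wrapped lines
# have been re-joined. The group names are hypothetical.
AOSERVER_FAILURE = re.compile(
    r"aoserver: \(main (?P<main_id>\d+)\) "
    r"sequence: (?P<sequence>\w*)\s*, task: (?P<task>\w*)\s*, "
    r"(?P<process>\w+) failed"
)


def parse_failure(message):
    """Parse one aoserver failure message; return None if it does not match."""
    text = " ".join(message.split())  # undo line wrapping
    match = AOSERVER_FAILURE.search(text)
    if match is None:
        return None
    info = match.groupdict()
    # Blank sequence and task names indicate an interaction between tasks.
    info["blank_names"] = info["sequence"] == "" and info["task"] == ""
    return info


if __name__ == "__main__":
    sample = ("aoserver: (main 662) sequence: , task: , central failed ; "
              "Task not known by the central process of the RTC .")
    print(parse_failure(sample))
```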
Other examples: stopping correction right after the end of execution of
a script, when the prompt has just reappeared in the xterm window; trying
to start correction while not yet properly guiding in Cass (it is never
necessary to guide in Cass first, although it is a good precaution to have
an off-axis star ready as a reference to recenter the field before
trying to start correction in questionable conditions), ...
Reset the Real Time Computer (RTC)
-
Kill AO Servers, in AOB Setup window.
-
Reset the RTC (Puelolo and Pueomaka) and WAIT three minutes.
-
Start AO Servers, in AOB Setup window.
-
WAIT until the AOB window disappears.
The RTC has to be reset at the beginning of every night, before opening
an "aobir" session on Neptune (this saves steps and time).
-
Reset the RTC (Puelolo and Pueomaka) and WAIT three minutes.
-
Log on Neptune as "aobir".
-
WAIT until the AOB window disappears.
-
"go to observe configuration" in "Adaptive Optics Bonnette"
window, WAIT.
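The two reset procedures above differ only in what surrounds the RTC reset itself. A minimal sketch of them as ordered checklists follows; the function name and its parameter are hypothetical, and the step strings restate the lists above.

```python
def rtc_reset_steps(start_of_night):
    """Return the ordered RTC reset steps for the two cases described above."""
    reset = "Reset the RTC (Puelolo and Pueomaka) and WAIT three minutes"
    wait_window = "WAIT until the AOB window disappears"
    if start_of_night:
        # Reset done before the aobir session is opened.
        return [
            reset,
            'Log on Neptune as "aobir"',
            wait_window,
            '"go to observe configuration" in "Adaptive Optics Bonnette" window, WAIT',
        ]
    # Reset done during the night, with the session already open.
    return [
        "Kill AO Servers, in AOB Setup window",
        reset,
        "Start AO Servers, in AOB Setup window",
        wait_window,
    ]


if __name__ == "__main__":
    for number, step in enumerate(rtc_reset_steps(start_of_night=False), 1):
        print(number, step)
```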
Difficulties stopping correction
-
The correction is primarily started with the object placed on the hot spot.
If, for any reason, the correction was interrupted with the object at
an offset from the hot spot,
the WFS has to be brought back to its central position in order to start
correction on the hot spot again:
"Recenter on Hot Spot" in "Adaptive Optics Bonnette".
-
There is no need to recenter on the hot spot if you are in the exact same
conditions of observation!
If you had to stop the correction and / or reset the RTC in the middle
of a mosaic, for example, correction can be started again as long as the Telescope
and WFS offsets still match. If the Telescope has to be recentered on the
object because of drift, then the WFS must also be recentered on the hot
spot.
-
A reset of the RTC does not affect the WFS position.
-
If there is very little flux on the APD, check the neutral density filter
(you have to stop correction) and the gain.
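The recentering rule in the list above reduces to a two-input decision. The sketch below only encodes that rule; the function and parameter names are hypothetical, and how "offsets still match" is actually judged at the telescope is left to the operator.

```python
def wfs_must_be_recentered(offsets_match, telescope_recentered):
    """Decide whether the WFS has to go back to the hot spot before
    correction is restarted.

    offsets_match: True if the Telescope and WFS offsets still match
        (same conditions of observation as when correction stopped).
    telescope_recentered: True if the Telescope was recentered on the
        object, for instance because of drift.
    """
    if telescope_recentered:
        # The WFS must follow the Telescope back to the hot spot.
        return True
    # Matching offsets mean correction can simply be started again.
    return not offsets_match


if __name__ == "__main__":
    # Middle of a mosaic, RTC reset, nothing moved: no recentering needed.
    print(wfs_must_be_recentered(offsets_match=True, telescope_recentered=False))
```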
Problems with AOB Prolog commands
DetI problems
-
"warning: SAO_IMAGE EXIT ON ERROR: Could not connect
to display: "
The program is trying to send the image to Pegasus
too,
so just open an SAOImage window in Pegasus/Image.
-
"detio exiting: lost parent process deticli" or a similar message: use >agent
restart to recover.
-
If an error message indicates that the deti.par file cannot
be used by the program, type ".rm ~aobir/.,deti.par" in an xterm window
and then >agent restart in the DetI window to restore the latest version
of the parfile.
-
If DetI hangs during an acquisition sequence, >break
is usually enough to restart it,
which means that, in this particular case, the
command break won't actually interrupt the sequence after the current exposure.
If it does not work, >agent restart usually helps. If
a data cube was being taken, the data can be recovered with the following
command in an xterm window (make sure you are in the aobir directory): "rcp
cassini:/usr/local/img/######o.fits ." This will copy the file into
your directory, cassini being the detector host in this example (check
the detector host at the top of the DetI window) and ######o.fits the filename
of the image being taken when it hung.
-
If there are problems with the "DetI Exposure Form",
clicking on detector in the Pegasus menu
will reset it. Use link to reconnect if necessary; you may have to kill
the window first.
-
Performing two snaps too quickly one after the other will
provoke a core dump; use >agent restart to recover.
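The data cube recovery command mentioned above depends on the detector host and on the file being written when DetI hung. The sketch below assembles that command line from those two inputs. The function name is hypothetical, and "123456" is only a made-up stand-in for the ###### part of the filename; read the real value from your session.

```python
import shlex

# Image directory on the detector host, as quoted in the text above.
IMAGE_DIR = "/usr/local/img"


def rcp_recovery_command(detector_host, file_number):
    """Build the rcp command that copies the hung exposure into the
    current directory (which should be the aobir directory)."""
    if not detector_host or not file_number:
        raise ValueError("detector host and file number are both required")
    source = f"{detector_host}:{IMAGE_DIR}/{file_number}o.fits"
    # shlex.quote guards against stray shell characters in the inputs.
    return " ".join(shlex.quote(part) for part in ("rcp", source, "."))


if __name__ == "__main__":
    # "123456" is a hypothetical file number used for illustration only.
    print(rcp_recovery_command("cassini", "123456"))
```

Check the detector host at the top of the DetI window before building the command; it may be dione rather than cassini.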
Unless perfectly explicit error messages point at the solution:
-
>agent restart is usually the first thing to try to fix DetI related
error messages.
-
Check the VME board, the Timing Board and the Utility board status:
-
>mode engineering,
-
>tdl, will check that DetI can communicate with the 3U crate,
-
>ids, will give the status of the three boards, which should indicate KIR; otherwise:
-
dewar not connected
-
error status on the VME board, see below.
-
error status on the Timing or Utility board,
-
>mode observing; do not forget this step, as observers are not supposed to
have access to those commands.
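The >ids check above amounts to verifying that all three boards report KIR. As a sketch, assuming the operator has read the three statuses off the >ids output and typed them into a dictionary (the exact output format of >ids is not described here, so the dictionary layout and the function name are hypothetical):

```python
# The three boards whose status >ids reports.
BOARDS = ("VME", "Timing", "Utility")


def boards_not_ready(status):
    """Return the boards whose status is anything other than "KIR".

    status: dict mapping board name to the status string read from >ids.
    A board missing from the dict counts as not ready.
    """
    return [board for board in BOARDS if status.get(board) != "KIR"]


if __name__ == "__main__":
    # Hypothetical reading in which only the VME board is healthy.
    reading = {"VME": "KIR", "Timing": "ERR", "Utility": "ERR"}
    print(boards_not_ready(reading))
```

An empty list means the three boards all indicate KIR; anything else points to the cases listed above.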
Communication problems with DetI:
When KIR seems hung, and in particular when the message "director: Can't
communicate with VME Board (CDMA not synced)" appears in the feedback window,
use the following sequence, stopping as soon as a step succeeds:
-
>agent restart
-
If the restart is followed by the same error message "director: Can't communicate
with VME Board (CDMA not synced)", follow the procedure below to reboot
the Sparc1E hosting DetI (described here with cassini as the example).
Reboot cassini
cassini and dione are two identical acquisition systems for KIR at the
summit,
so if something goes wrong with cassini, switching to dione is very fast
and minimizes time lost on the sky while troubleshooting cassini.
-
Switch the fiber from cassini to dione on the fiber front panel (there
is only one way to plug the fiber connector).
-
In the director window, type >agent start dione:deticli -s. This will
restart DetI on dione; the top of the window should show "aobir - DetI
on dione".
-
>tdl to check that you can communicate with the controller, in which
case you should get the message "Communication established with VME, Timing
and Utility boards."
-
In case of failure, you will receive the message "Can't communicate with
Timing board." Double check the fiber connection and try >tdl again;
it is time to call if still unsuccessful...
Note that the communication is checked any time you restart the agent,
so you will receive one of the two messages above.
-
In an xterm window (can be on mars),
-
"telnet termsrv2 5005" (5005 for Tethys or cassini, 5004 for dione). This
will give you the console, with lots of additional information on screen
during the reboot that may be useful if it isn't successful.
-
login as root on cassini
-
"fasthalt"
-
go reset cassini on the CPU board in the control room
-
wait for the end of the reboot ("cassini login:" appears again)
-
control-[ to quit
-
restart DetI with OCS (feedback)
-
>tdl, if still not up, call detectors or Jim
-
>snap, if it does not work, call detectors or Jim
Same procedure from dione to cassini if dione fails.
If none of the above is the cause of trouble, call detectors.
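Since the procedure is symmetric between cassini and dione, the host-dependent commands can be derived from the name of the failed host. The sketch below does this; the console ports come from the text above, while the function name, the dictionary keys, and the cassini form of the agent start command (inferred by symmetry from the dione one) are assumptions.

```python
# Console ports on termsrv2, as given in the reboot procedure above.
CONSOLE_PORT = {"cassini": 5005, "dione": 5004}


def switch_commands(failed_host):
    """Return the host-dependent commands for switching away from a failed
    acquisition host and getting its console for the reboot."""
    if failed_host not in CONSOLE_PORT:
        raise ValueError(f"unknown acquisition host: {failed_host!r}")
    spare = "dione" if failed_host == "cassini" else "cassini"
    return {
        # Typed in the director window, after moving the fiber to the spare.
        "start_deti_on_spare": f"agent start {spare}:deticli -s",
        # Typed in an xterm window to watch the failed host reboot.
        "console_of_failed_host": f"telnet termsrv2 {CONSOLE_PORT[failed_host]}",
    }


if __name__ == "__main__":
    for name, command in switch_commands("cassini").items():
        print(f"{name}: {command}")
```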
Noisy Images
-
A software reset is unlikely to fix it,
-
>mode engineering,
-
>ids, will give the status of the three boards, which should indicate KIR; otherwise:
-
dewar not connected
-
error status on the VME board, see below.
-
error status on the Timing or Utility board,
-
>rsthdr, reset the 3U crate, hence the Timing and Utility boards,
-
>ids, should always indicate a status error (ERR) for those two
boards after a reset,
-
>preexp, will reload the software for the boards,
-
>ids, should now indicate KIR for the three boards,
-
>snap with a short exposure time to check everything is back to
life,
-
>mode observing; do not forget this step, as observers are not supposed to
have access to those commands.
-
if still noisy, reset the 3U crate from the reset button itself, >tdl,
>snap
-
if still noisy, try another reset of the 3U crate from DetI(>rsthdr),
>ids, >tdl, >snap
-
if still noisy, power cycle the 3U crate, >tdl, >snap and...
you should be on the phone with somebody from detectors by now.
-
Also, to check noisy images,
-
>etype r (reference type)
-
>go (to obtain the reference, one simple reading of the detector)
-
>etype o (object type)
-
>go (to check the image)
The cosmetic defects on the detector (lower left quadrant) are a good
diagnostic tool. If they are visible, even though the image is very noisy, it
means you actually have an image of the detector; if not, there is a severe
problem such as a disconnected cable.
-
In an extreme situation, meaning complete inability to identify the source
of a very high level of noise, and with somebody from detectors on the
phone, take an image in DIM mode,
-
>mode engineering
-
>opt lvb 2 1 (set DIM mode)
-
>go (so you save the image in a file)
-
>opt ltb 1 1 (set to test pattern, will tell you if the data link
is ok)
-
>go
-
>reset (return to normal mode)
-
>mode observing
-
Pixel counts > 64000 do not indicate a saturated pixel but ADU around 100;
a "new" count begins at 64000.
-
Taking an exposure while the filter wheel is moving will result in a very
noisy picture.
-
Not a noisy image, but if all the pixels have the same value of 1, the
problem is probably due to the connector to the controller not being perfectly
plugged in. If so, the connector is unable to check all voltage values
and does not turn them on.
-
power off the controller,
-
unplug and replug the connector,
-
power on the controller
-
snap
If it is still not working, it is time to call detectors. The next step will probably
be to power off again, take off the back panel of the controller, unplug,
replug, check that the LEDs are properly lit...
Note that SAOimage sometimes fails too and gives you an image with
all pixel values equal and over 32000. This happens when two images are
taken very close in time (it could happen in cube mode with a short exposure
time): SAOimage does not have time to load the image before the file is
erased for the next exposure. There is no problem with the file
itself.
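The two "flat image" symptoms above (all pixels equal to 1, or all pixels equal and over 32000 in the display) can be told apart mechanically. The sketch below assumes the displayed pixel values are available as a flat sequence of integers; the function name and the returned strings are hypothetical.

```python
def classify_flat_image(pixels):
    """Classify an image in which every pixel may have the same value.

    pixels: non-empty flat sequence of displayed pixel values.
    """
    values = set(pixels)
    if not values:
        raise ValueError("no pixel values given")
    if len(values) > 1:
        return "not flat"
    (value,) = values
    if value == 1:
        # Connector to the controller probably not perfectly plugged in.
        return "check the connector to the controller"
    if value > 32000:
        # SAOimage could not load the file in time; the file itself is fine.
        return "display failure, file is fine"
    return "flat image, cause not covered here"


if __name__ == "__main__":
    print(classify_flat_image([1, 1, 1, 1]))
```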
Prevent trouble
-
Reset of the RTC at the beginning of the night, before opening the aobir
session.
-
In the TOS session, open an xterm window and type director -k aobir
(may be better on Neptune) to get a clone of the observer's DetI window.
-
Always make sure the correction has been stopped before moving to a different
target.
-
Always wait for the end of an AOB action before asking for another one.
-
Be patient!