I'm doing a code review for our partner in the current project. In a recent
integration test, we found a problem.
An overview of the equipment. We're talking about software running on a
PC104 device (basically a small PC) running Linux. This PC104 has a bunch of
serial ports.
They have a custom I/O card, let's call it Raba1. This card has two serial
ports. The software reviews the incoming data from port TTY2, thinks about it and
then sends the appropriate commands on the TTY1 port. Problem is, sometimes
the software just halts for a minute. It does nothing while actually data was
coming in.
One thing that can go wrong here, has its root in the structure of the
software. It basically runs in a big loop, at the start of which is a select()
statement which monitors the two serial ports and the internal command channel
for input.
A theory is that it's missing data because the loop isn't fast enough. It
could happen that we send commands to the software through the internal
command channel and while these are parsed and handled, other data comes in
through port TTY2.
What makes this unlikely, is that such buffer overflows can be detected by the
Linux serial device driver. These were watched and were not found to correlate
with the pauses.