I recently had a run-in with the driver for a DVB-S2 card; you know, one of the things that you can connect to a satellite dish and use to watch TV. (I wanted to use it to demodulate and decrypt an entire transponder and blast each channel out over multicast.) The problem in question was that I couldn't get it to decrypt channels in a stable fashion, especially with SMP, and some searching showed that this was a common problem. So, I thought I'd get my hands dirty and dig into the kernel source, and I thought it would be interesting to write down some of my findings, primarily because it gives some insight into how Linux deals with typical hardware these days. (As usual, I might be completely wrong; if I were an expert in these matters, I probably wouldn't find this interesting enough to blog about!)
The card's product name is Terratec Cinergy DVB-S2, but internally it's called Mantis, as are many other cards that are essentially clones with different PCI IDs. There is already a Mantis driver in Linux mainline (since 2.6.33), but there are also as far as I know two different out-of-tree drivers (one of them is the same as the one that is merged into mainline, but I don't know to what degree they have diverged).
When you get down to the driver level, it's actually less useful to think of the device as “a card”; it's more of a collection of devices bound together by some glue:
- First, there's the STB0899 frontend, which is the actual tuner taking in the DVB-S2 signal and decoding it. This is used in several different DVB-S2 cards; the reference guide is, unfortunately, only available under NDA, and it is reportedly a highly complex beast.
- Second, there is a device just called the “RISC”, which indeed seems to be a sort of RISC CPU—you send it extremely primitive “programs” to make it initiate DMA transfers to the host CPU whenever there is data to send, raise IRQs, and so on.
- Then, there is an I²C controller for low-bandwidth control communication, e.g. to twiddle registers on the frontend.
- Also, there's a DVB-CI (Common Interface) slot, which is a standardized slot for the CAM (Conditional Access Module) if you have one. The CAM is typically specific to the cryptosystem in use, and handles the decryption. You typically insert a smartcard in the CAM (which identifies your subscription), which supplies it with cryptographic keys as needed. You can then ask the card to route the DVB bitstream coming from the frontend through the CAM for decryption (and you typically want to, as there is little interesting unencrypted content).
- As if this were not enough, there's a serial port controller (UART) hooked up to an IR receiver, since people who want TV frequently like to use remote controls.
- And finally, there's the Mantis PCI bridge itself, which connects to the PCI bus and allows you to communicate (indirectly) with all of these components.
Now, if you were writing a Windows driver, you'd need to write code for all of these. However, in Linux, typically component drivers are shared; the STB0899 frontend already has a stable driver in Linux, there's an entire I²C subsystem that the STB0899 driver talks to, there's a CAM subsystem that knows how to talk to and poll CAMs (if you can give it some hooks to talk to), there's an IR subsystem, and so on. And of course, there's a standardized DVB subsystem so that userspace largely doesn't need to care what kind of card it's talking to. (If you're a card manufacturer, perhaps this is not really a positive, though, as you can no longer distinguish your product through included software in the same way. I'm sure Windows has some sort of standardized DVB subsystem too, but bundled software is probably a much more visible part of the product for most people.)
So that leaves largely the code for the glue, actually, which means that there's a whole lot less code to worry about. The Mantis driver is about 4000 lines of code, which covers at least seven different designs (of which many are for cable and not satellite); in comparison, the driver for the STB0899 frontend alone is about 3200. But let's take a look at how the PC communicates with the Mantis card.
Like much other modern hardware, the Mantis card is generally operated by memory-mapped I/O (MMIO). Basically, you map a portion of the address space to the card, and when you read from or write to that region, the access goes over the PCI bus and is essentially treated as a command by the card. (In other words, the given memory doesn't operate as regular RAM in any reasonable way.) Short of the DVB stream itself, which is DMAed, there's really nothing that needs a lot of bandwidth in here, so one can deal with a pretty simplistic scheme.
The way you communicate with anything, say, the CAM, is thus to do it one byte at a time. There's no such thing as synchronous I/O when you're talking about the kernel (that's really just an illusion anyway), so you'll need some communication back and forth. So, you poke an MMIO address with the first byte, and then after a while (say, a few microseconds) the hardware will have processed that byte and an “operation done” bit will show up in a status register (MANTIS_GPIF_STATUS).
However, reading that status register in a busy loop until the operation is done is wasteful; for one, the host CPU could perhaps be doing something more useful, and besides, every poll creates traffic across the PCI bus which can take up space you'd want for the DMA operations (or for other cards).
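To make the cost of busy-waiting concrete, here is a minimal sketch in Python of the poke-then-poll scheme described above. It is a toy model, not driver code: the FakeGpif class, the value of the "done" bit, and the number of polls until completion are all assumptions made up for the illustration. Every call to read_status stands in for one round trip over the PCI bus.

```python
GPIF_STATUS_DONE = 0x01  # hypothetical bit value, not taken from the driver


class FakeGpif:
    """Toy stand-in for the card: a write becomes 'done' a few polls later."""

    def __init__(self, polls_until_done=3):
        self.polls_until_done = polls_until_done
        self._remaining = 0
        self.received = []
        self.status_reads = 0

    def write_data(self, byte):
        # Poking the data register starts an operation on the device.
        self.received.append(byte)
        self._remaining = self.polls_until_done

    def read_status(self):
        # Every read models one transaction over the bus.
        self.status_reads += 1
        if self._remaining > 0:
            self._remaining -= 1
            return 0
        return GPIF_STATUS_DONE


def write_bytes_busy_wait(dev, data, max_polls=1000):
    """Send data one byte at a time, spinning on the status register."""
    for byte in data:
        dev.write_data(byte)
        for _ in range(max_polls):
            if dev.read_status() & GPIF_STATUS_DONE:
                break
        else:
            # The loop ran out without ever seeing the 'done' bit.
            raise TimeoutError("device never reported completion")


dev = FakeGpif()
write_bytes_busy_wait(dev, b"\x01\x02\x03\x04")
print(dev.received, dev.status_reads)
```

In this toy, four bytes cost sixteen status reads; on real hardware the ratio depends on how fast the device is compared to the bus, but the point stands that the polls themselves are the overhead.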
So instead, you can ask (by poking a different MMIO port, namely MANTIS_GPIF_IRQCFG) that the card raise an interrupt line whenever there's a change in the status, and then go to sleep (i.e., let some userspace process or other kernel thread use the CPU). When the card signals the IRQ line, the CPU jumps directly into the interrupt handler, where you can poll away to your heart's content. It's a bit roundabout, though; first, you poll MANTIS_INT_STAT, which contains a lot of other status bits (including “new data ready for DMA”, which is a bit set by the RISC mini-programs described above), and if MANTIS_INT_IRQ0 is set, you should go poll MANTIS_GPIF_STATUS, since it's probably changed.
(I guess this is related to how the devices are wired up internally on the card, in that IRQ0 really is the DVB-S2 frontend asserting an IRQ on the Mantis bridge. IRQ1 is for the serial port. Note that 0 and 1 refer to the IRQ lines on the Mantis bridge, not on your PC. For instance, the Mantis card has only one IRQ on my PC, namely number 20.)
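The two-level lookup described above (first the top-level interrupt status, then the GPIF status only if IRQ0 is set) can be sketched roughly as follows. This is a toy in Python with a dictionary standing in for the register file; the bit positions, the register keys, and the name INT_DMA_READY are assumptions for the sake of the example, not values from the driver headers.

```python
# Hypothetical bit layout; the real values live in the driver's headers.
INT_IRQ0 = 1 << 0       # bridge-internal IRQ0: go and look at the GPIF status
INT_IRQ1 = 1 << 1       # bridge-internal IRQ1: serial port (IR receiver)
INT_DMA_READY = 1 << 2  # set by the RISC program: new data ready for DMA


def handle_interrupt(regs):
    """Decode the top-level status and return the follow-up work to do."""
    stat = regs["INT_STAT"]
    events = []
    if stat & INT_IRQ0:
        # Only now is it worth reading the second-level status register.
        events.append(("gpif", regs["GPIF_STATUS"]))
    if stat & INT_IRQ1:
        events.append(("uart", None))
    if stat & INT_DMA_READY:
        events.append(("dma", None))
    return events


regs = {"INT_STAT": INT_IRQ0 | INT_DMA_READY, "GPIF_STATUS": 0x01}
print(handle_interrupt(regs))
```

One host interrupt can thus carry several unrelated events at once, which is why the handler has to test every bit rather than stop at the first match.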
Now, you don't really want to do a lot of work in your interrupt handler, mainly due to latency issues (interrupt handlers generally can't be preempted), so after polling and determining that the IRQ0 bit is set, the interrupt handler defers to a work queue, a kind of worker thread, which can run at leisure, on another CPU if needed. (There are also tasklets, which are similar, but have different guarantees. The Mantis driver uses tasklets to process data coming in from DMA.) The worker thread processing the work queue, when it eventually is scheduled, will read the GPIF status, figure out that the operation is done, and wake up the thread that wrote the byte in the first place. That thread can then poke the next byte, and so on, until it's done. Fortunately the PCI bus is pretty fast, and we only need to poke something like four or five bytes every now and then (plus some length bytes), so it doesn't take long. Reading data works in a similar fashion; you poke an address into an MMIO register, and then after a while the result is placed in another MMIO register, the IRQ0 status bit is set, and the Mantis IRQ is raised.
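The hand-off between the writer, the interrupt handler and the worker can be mimicked in userspace with ordinary threads. The sketch below is only an analogy in Python: a queue.Queue plays the role of the work queue, a threading.Event is what the writer sleeps on, and a timer fakes the hardware finishing a millisecond later. The class and method names are invented for the illustration and have nothing to do with the kernel API.

```python
import queue
import threading


class ToyGpif:
    def __init__(self):
        self.work = queue.Queue()      # stands in for the kernel work queue
        self.done = threading.Event()  # what the writing thread sleeps on
        self.status_done = False       # models the "operation done" bit
        self.received = []
        self.worker = threading.Thread(target=self._worker, daemon=True)
        self.worker.start()

    def _hardware_finished(self):
        # Fake hardware: set the status bit, then raise the interrupt.
        self.status_done = True
        self._interrupt()

    def _interrupt(self):
        # "Interrupt handler": do as little as possible, just queue the work.
        self.work.put("irq0")

    def _worker(self):
        while True:
            item = self.work.get()
            if item is None:
                return
            # Deferred part: read the status and wake the sleeping writer.
            if self.status_done:
                self.status_done = False
                self.done.set()

    def write_byte(self, byte, timeout=1.0):
        self.done.clear()
        self.received.append(byte)
        threading.Timer(0.001, self._hardware_finished).start()
        if not self.done.wait(timeout):
            raise TimeoutError("no completion interrupt seen")

    def close(self):
        self.work.put(None)
        self.worker.join()


dev = ToyGpif()
for b in b"\x10\x20\x30":
    dev.write_byte(b)
dev.close()
print(dev.received)
```

Each byte costs a full round trip through three contexts, which is tolerable only because so few bytes are sent.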
This is the theory. Unfortunately, in my case, when trying to poll for new data, often the IRQ0 bit would not be asserted for a long time (several hundred milliseconds), if ever. Seemingly this would happen for some reason when the Mantis bridge was busy spewing interrupts at the CPU for other reasons, like ongoing DMA transfers; it might also be a bug somewhere in the driver causing it to clear out the IRQ0 bit before it's ever read. Anyway, no matter the cause, this would cause components that tried to read from the CAM to think the CAM was dead, and either give up or try to reset it, neither of which is very good. I still don't understand why it seems to happen a lot more often on SMP than on non-SMP; maybe there's a timing issue of some sort. (It would seem there were also some SMP-related bugs in the driver itself, but I think I've fixed most of those.)
So, what's the brilliant fix? Well, it's really a hack taken from the mailing list; reduce the timeout from 500 ms to 2 ms, and if it times out, just ignore the error, go on and hope the data has been put in the right place. It's an unfortunate hack, but in lieu of a proper fix, at least it gives me rock-steady CAM operation. (Due to a bug, timeouts were always handled as a success anyway.)
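In spirit, the workaround looks something like the following sketch, again a toy in Python rather than the actual patch. LossyDevice is a made-up device whose data always lands in its data register promptly but whose completion interrupt can be lost; read_byte waits at most 2 ms for the interrupt and then reads the register regardless.

```python
import threading


class LossyDevice:
    """Toy device: data is ready promptly, but the IRQ may never arrive."""

    def __init__(self, memory, lose_irq):
        self.memory = memory
        self.lose_irq = lose_irq
        self.irq = threading.Event()
        self.data_reg = None

    def poke_address(self, addr):
        self.irq.clear()
        self.data_reg = self.memory[addr]  # result lands in the data register
        if not self.lose_irq:
            self.irq.set()                 # completion interrupt arrives


def read_byte(dev, addr, timeout_s=0.002):
    dev.poke_address(addr)
    if not dev.irq.wait(timeout_s):
        # Proper handling would report an error here. The workaround carries
        # on and hopes the hardware has finished in the meantime.
        pass
    return dev.data_reg


print(hex(read_byte(LossyDevice({0: 0xAB}, lose_irq=True), 0)))
```

The obvious weakness is that nothing guarantees the data really is in place after the timeout; the toy device happens to be fast enough, and so, apparently, is the real one.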
I've submitted the fixes (or rather, collections of hacks) in a message to the linux-media list; we'll see if anybody finds them interesting.