Steinar H. Gunderson

Sun, 25 Dec 2016 - Cracking a DataEase password

I recently needed to get access to a DataEase database; the person I helped was the legitimate owner of the data, but had forgotten the password, as the database was largely from 1996. There are various companies around the world that seem to do this, or something similar (such as giving you an API), for a usually unspecified fee; they all have very 90s homepages and in general seem to have gone out of business a long time ago. And I wasn't prepared to wait.

For those of you who don't know DataEase, it's a sort-of relational database for DOS that had its heyday in the late 80s and early 90s (being sort of the cheap cousin of dBase); this is before SQL gained traction as the standard query language, before real multiuser database access, and before variable-width field storage.

It is also before reasonable encryption. Let's see what we can do.

DataEase has a system where tables are mapped through the data dictionary, which is a table on its own. (Sidenote: MySQL pre-8.0 still does not have this.) This is the file RDRRTAAA.DBM; I don't really know what RDRR stands for, but T is the “database letter” in case you wanted more than one database in the same directory, and AAA, AAB, AAC etc. is a counter in case a table grows to be too big for one file. (There are also .DBA files for the structure of non-system tables, and then some extra stuff for indexes.)
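To make the naming scheme concrete, here is a small sketch that splits such a file name into its parts. The layout (four-character prefix, one database letter, three-letter counter) is only inferred from the two names RDRRTAAA.DBM and REPOTAAA.DBM, and the assumption that the counter is plain base 26 is mine; parse_dbm_name and counter_index are hypothetical helper names, not anything DataEase provides.

```python
import re
from typing import NamedTuple


class DbmName(NamedTuple):
    prefix: str     # four-character table prefix, e.g. "RDRR" or "REPO"
    db_letter: str  # the "database letter", e.g. "T"
    counter: str    # three-letter part counter, e.g. "AAA"
    extension: str  # "DBM" or "DBA"


# Assumed layout, inferred only from RDRRTAAA.DBM and REPOTAAA.DBM:
# 4-character prefix + 1 database letter + 3-letter counter + extension.
NAME_RE = re.compile(r"([A-Z0-9]{4})([A-Z])([A-Z]{3})\.(DBM|DBA)")


def parse_dbm_name(filename: str) -> DbmName:
    """Split a name like 'RDRRTAAA.DBM' into its assumed components."""
    match = NAME_RE.fullmatch(filename.upper())
    if match is None:
        raise ValueError(f"not a recognised DataEase file name: {filename!r}")
    return DbmName(*match.groups())


def counter_index(counter: str) -> int:
    """Map AAA -> 0, AAB -> 1, AAC -> 2, ... (assumes plain base 26)."""
    index = 0
    for letter in counter:
        index = index * 26 + (ord(letter) - ord("A"))
    return index


name = parse_dbm_name("RDRRTAAA.DBM")
print(name, counter_index(name.counter))
```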

DBM files are pretty much the classical, fixed-length 80s-style database files; each row starts with some flags (I believe these are for e.g. “row is deleted”), and the rows simply follow right after each other in fixed format. For instance, here's one I created as part of testing (just the first few lines of the hexdump are shown):

00000000: 0e 00 01 74 65 73 74 62 61 73 65 00 00 00 00 00  ...testbase.....
00000010: 00 00 00 00 00 00 00 73 46 cc 29 37 00 09 00 00  .......sF.)7....
00000020: 00 00 00 00 00 43 3a 52 44 52 52 54 41 41 41 2e  .....C:RDRRTAAA.
00000030: 44 42 4d 00 00 01 00 0e 00 52 45 50 4f 52 54 20  DBM......REPORT 
00000040: 44 49 52 45 43 54 4f 52 59 00 00 00 00 00 1c bd  DIRECTORY.......
00000050: d4 1a 27 00 00 00 00 00 00 00 00 00 43 3a 52 45  ..'.........C:RE
00000060: 50 4f 54 41 41 41 2e 44 42 4d 00 00 01 00 0e 00  POTAAA.DBM......
00000070: 52 65 6c 61 74 69 6f 6e 73 68 69 70 73 00 00 00  Relationships...

Even without going in-depth, we can see the structure here; there's “testbase” which maps to C:RDRRTAAA.DBM (the RDRR itself), there's a table called “REPORT DIRECTORY” that maps to C:REPOTAAA.DBM, and then more stuff after that, and so on.

However, other tables are not so easily read, because you can ask DataEase to encrypt a table. Let's look at such an encrypted table, like the “Users” table (containing usernames, passwords—not password hashes—and some extra information like access level), which is always encrypted:

00000000: 0c 01 9f ed 94 f7 ed 34 ba 88 9f 78 21 92 7b 34  .......4...x!.{4
00000010: ba 88 0f d9 94 05 1e 34 ba 88 a0 78 21 92 7b 34  .......4...x!.{4
00000020: e2 88 9f 78 21 92 7b 34 ba 88 9f 78 21 92 7b 34  ...x!.{4...x!.{4
00000030: ba 88 9f 78 21 92 7b 34 ba 88 9f 78 21 92 7b     ...x!.{4...x!.{

Clearly, this isn't very good encryption; it uses a very short, repetitive key of eight bytes (64 bits). (The data is mostly zero padding, which makes it much easier to spot this.) In fact, in actual data tables, only five of these bytes are set to a non-zero value, which means we have a 40-bit key; export controls?
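If you don't trust your eyes, the repetition can also be found mechanically. The following is a minimal sketch (the function names are mine) that shifts the ciphertext from the hexdump above against itself and counts how many bytes line up; the shift with the most matches is taken as the key length. It only works this well because most of the plaintext is zero padding.

```python
# The encrypted Users record, copied from the hexdump above (63 bytes).
CIPHERTEXT = bytes.fromhex(
    "0c019fed94f7ed34ba889f7821927b34"
    "ba880fd994051e34ba88a07821927b34"
    "e2889f7821927b34ba889f7821927b34"
    "ba889f7821927b34ba889f7821927b"
)


def match_count(data: bytes, shift: int) -> int:
    """Count positions where data[i] equals data[i + shift]."""
    return sum(a == b for a, b in zip(data, data[shift:]))


def guess_period(data: bytes, max_period: int = 32) -> int:
    """Return the shift with the most matching bytes (smallest wins ties)."""
    return max(range(1, max_period + 1), key=lambda s: match_count(data, s))


print("guessed key length:", guess_period(CIPHERTEXT))
```

Using the raw match count rather than a fraction deliberately favours short shifts, so that multiples of the real period do not win.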

My first assumption here was of course XOR, but through some experimentation, it turned out what you need is actually 8-bit subtraction (with wraparound). The key used is derived from both a database key and a per-table key, both stored in the RDRR; if you disassemble, I'm sure you can find the key derivation function, but that's annoying. Note, by the way, that this precludes making an attack by just copying tables between databases, since the database key is different.
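The kind of experiment that separates the two hypotheses can be sketched like this. It assumes the eight key bytes are read off the zero-padded tail of the dump, and that the test user is known to be called “user” and to start at byte 3 of the record (both of which hold for my test database); undo_xor and undo_add are just illustrative names.

```python
# Key bytes as they appear in the zero-padded part of the dump.
KEY = bytes.fromhex("ba889f7821927b34")
# First 16 bytes of the encrypted Users record.
FIRST_ROW = bytes.fromhex("0c019fed94f7ed34ba889f7821927b34")


def undo_xor(ciphertext: bytes, key: bytes) -> bytes:
    """Hypothesis 1: ciphertext = plaintext XOR key."""
    return bytes(c ^ key[i % len(key)] for i, c in enumerate(ciphertext))


def undo_add(ciphertext: bytes, key: bytes) -> bytes:
    """Hypothesis 2: ciphertext = (plaintext + key) mod 256."""
    return bytes((c - key[i % len(key)]) % 256 for i, c in enumerate(ciphertext))


print("xor:     ", undo_xor(FIRST_ROW, KEY)[3:7])
print("subtract:", undo_add(FIRST_ROW, KEY)[3:7])
```

Note that zero padding alone cannot tell the two apart, since both 0 XOR k and 0 + k equal k; you need at least a few bytes of known non-zero plaintext.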

So let's do a known-plaintext attack. If you assume the plaintext of the bottom row is all zero padding, the ciphertext of that row is the key itself (since 0 + key = key), and here's what you end up with:

00000000: 52 79 00 75 73 65 72 00 00 00 00 00 00 00 00 00  Ry.user.........
00000010: 00 00 70 61 73 73 a3 00 00 00 01 00 00 00 00 00  ..pass..........
00000020: 28 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  (...............
00000030: 00 00 00 00 00 00 00 00                          ........ 

Not bad, eh? Actually the first byte of the key here is wrong as far as I know, but it didn't interfere with the fields, so we have what we need to log in. (At that point, we've won, because DataEase will helpfully decrypt everything transparently for us.)
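The whole attack fits in a few lines. This is a sketch of the procedure described above, not of anything DataEase does internally: take the first eight bytes of the bottom row as the key (assuming that row is all zero padding), subtract it bytewise with wraparound, and print the result as a hexdump. The helper names are my own.

```python
# The four hexdump lines of the encrypted Users record.
ROWS = [bytes.fromhex(row) for row in (
    "0c019fed94f7ed34ba889f7821927b34",
    "ba880fd994051e34ba88a07821927b34",
    "e2889f7821927b34ba889f7821927b34",
    "ba889f7821927b34ba889f7821927b",
)]


def key_from_padding(padding_row: bytes, key_len: int = 8) -> bytes:
    """If the plaintext is all zeros, the ciphertext is the key itself."""
    return padding_row[:key_len]


def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """Undo the bytewise addition of the repeating key (mod 256)."""
    return bytes((c - key[i % len(key)]) % 256 for i, c in enumerate(ciphertext))


def hexdump(data: bytes, width: int = 16) -> str:
    """Format data as offset, hex bytes and printable ASCII."""
    pad = width * 3 - 1
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}: {hexpart:<{pad}}  {text}")
    return "\n".join(lines)


key = key_from_padding(ROWS[-1])
plaintext = decrypt(b"".join(ROWS), key)
print(hexdump(plaintext))
```

This works without any realignment because each 16-byte line starts at a multiple of the key length, so the bottom row begins at key position zero.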

However, there's a twist; if the password is longer than four characters, the entire encryption of the Users table changes. Of course, we could run our known-plaintext attack against every data table and pick out the information by decoding the structure, but again, annoying. So let's see what it looks like if we choose “passs” instead:

00000000: 0e 01 9f 7a ae 9e 21 f5 08 63 07 6d a3 a1 17 5d  ...z..!..c.m...]
00000010: 70 cb df 36 7e 7c 91 c5 d8 33 d8 3d 73 71 e7 2d  p..6~|...3.=sq.-
00000020: 7b 9b 3f a5 db d9 4f 95 a8 03 a7 0d 43 41 b7 fd  {.?...O.....CA..
00000030: 10 6b 0f 75 ab a9 1f 65 78 d3 77 dd 13 11 87     .k.u...ex.w....

Distinctly more confusing. At this point, of course, we know at which byte positions the username and password start, so if we wanted to, we could just try setting the start byte of the password to every possible byte in turn until we hit 0x00 (DataEase truncates fields at the first zero byte), which would allow us to get in with an empty password. However, I didn't know the username either, and trying two bytes would mean 65536 tries, and I wasn't up for automating macros through DOSBox. So an active attack wasn't too tempting.

However, we can look at the last byte of each line (where we know the plaintext is 0); it goes 0x5d, 0x2d, 0xfd... and some other bytes go 0x08, 0xd8, 0xa8, 0x78, and so on. So clearly there's an obfuscation here where we have a per-line offset that decreases by 0x30 per line. (Actually, the increase/decrease per line seems to be derived from the key somehow, too.) If we remove that, we end up with:

00000000: 0e 01 9f 7a ae 9e 21 f5 08 63 07 6d a3 a1 17 5d  ...z..!..c.m...]
00000010: a0 fb 0f 66 ae ac c1 f5 08 63 08 6d a3 a1 17 5d  ...f.....c.m...]
00000020: db fb 9f 05 3b 39 af f5 08 63 07 6d a3 a1 17 5d  ....;9...c.m...]
00000030: a0 fb 9f 05 3b 39 af f5 08 63 07 6d a3 a1 17     ....;9...c.m...
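The offset removal that produced the dump above can be sketched as follows. The step is estimated from the last column of the first two lines, where the plaintext is known to be zero, and then added back line by line; the value 0x30 is specific to this test database, and the function names are mine.

```python
# The four hexdump lines of the "passs" ciphertext.
ROWS = [bytes.fromhex(row) for row in (
    "0e019f7aae9e21f50863076da3a1175d",
    "70cbdf367e7c91c5d833d83d7371e72d",
    "7b9b3fa5dbd94f95a803a70d4341b7fd",
    "106b0f75aba91f6578d377dd131187",
)]


def estimate_step(rows: list[bytes], column: int = 15) -> int:
    """Per-line decrease, measured in a column whose plaintext is zero."""
    return (rows[0][column] - rows[1][column]) % 256


def remove_row_offset(rows: list[bytes], step: int) -> list[bytes]:
    """Add step * line_number back to every byte of each 16-byte line."""
    return [
        bytes((b + step * n) % 256 for b in row)
        for n, row in enumerate(rows)
    ]


step = estimate_step(ROWS)
for row in remove_row_offset(ROWS, step):
    print(row.hex(" "))
```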

Well, OK, this wasn't much more complicated; our fixed key is now 16 bytes long instead of 8 bytes long, but apart from that, we can do exactly the same known-plaintext attack. (Also, it seems to change per-record now, but we don't see it here, since we've only added one user.) Again, assume the last line is supposed to be all 0x00 and thus use that as a key (plus the last byte from the previous line), and we get:

00000000: 6e 06 00 75 73 65 72 00 00 00 00 00 00 00 00 00  n..user.........
00000010: 00 00 70 61 73 73 12 00 00 00 01 00 00 00 00 00  ..pass..........
00000020: 3b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ;...............
00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00     ...............

Well, OK, it wasn't perfect; we got “pass\x12” instead of “passs”, so we messed up somehow. I don't know exactly why the fifth character gets messed up like this; actually, it cost me half an hour of trying because the password looked very real but the database wouldn't let me in, but eventually, we just guessed at what the missing letter was supposed to be.
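For reference, this second round can be reproduced with the sketch below. It starts from the four lines with the per-line offset already removed, builds the 16-byte key from the 15 bytes of the last line plus the last byte of the line before it (assuming all of those are zero padding), and subtracts as before. It reproduces the mangled fifth character as well; it does not explain it.

```python
# The four lines after the per-line offset has been removed.
ROWS = [bytes.fromhex(row) for row in (
    "0e019f7aae9e21f50863076da3a1175d",
    "a0fb0f66aeacc1f50863086da3a1175d",
    "dbfb9f053b39aff50863076da3a1175d",
    "a0fb9f053b39aff50863076da3a117",
)]


def decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """Undo the bytewise addition of the repeating key (mod 256)."""
    return bytes((c - key[i % len(key)]) % 256 for i, c in enumerate(ciphertext))


# The last line is one byte short of a full key, so borrow the final
# byte from the line above (also assumed to be padding).
key = ROWS[3] + ROWS[2][-1:]
plaintext = decrypt(b"".join(ROWS), key)

print("username bytes:", plaintext[3:7])
print("password bytes:", plaintext[0x12:0x17])
```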

So there you have it; practical small-scale cryptanalysis of DOS-era homegrown encryption. Nothing advanced, but the user was happy about getting the data back after a few hours of work. :-)

[22:21]

Steinar H. Gunderson <sgunderson@bigfoot.com>