codeburst

Bursts of code to power through your day. Web Development articles, tutorials, and news.

Risking Other People’s Data By Using Relational Databases

Lance Gutteridge
Published in codeburst
6 min read · Jun 25, 2018

Web site hacked — millions of passwords exposed — millions of credit card numbers taken — emails made public are causing major embarrassment.

It’s depressing, isn’t it? And scary. Why is this happening? It seems that once one of these intruders gains access to a server, they can read and copy data at will.

It is like a burglar breaking into a bank and scooping up all the money that was left lying on the counter.

Wait a second—money left lying on the counter?

What kind of bank does that? Isn’t that what they have those big monster vaults for? Sure you can break into a bank building but that’s the easy part. The really valuable stuff is inside a box made of steel a meter thick. Good luck on breaking into that.

The software equivalent of a thick steel bank vault with a fancy combination lock is encryption.

Modern encryption is basically bulletproof (see my article on how much encryption strength is enough). So in software we have the equivalent of massively thick steel vaults.

But the fact is that almost all of the databases on servers are in the clear, i.e., not encrypted — they are not in a vault at all. Anyone who manages to get access to the files on the server can easily read all the data — it’s like a bank leaving your money piled up on the floor overnight instead of putting it into a vault.
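To make "in the clear" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The file name `customers.db`, the table layout, and the helper `raw_file_contains` are all made up for illustration. The script stores a value in an ordinary, unencrypted database file and then searches the file's raw bytes for it, without going through the database engine at all.

```python
import os
import sqlite3
import tempfile


def raw_file_contains(secret: str) -> bool:
    """Store `secret` in an unencrypted SQLite file, then look for it in the raw bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "customers.db")  # hypothetical database file
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE customers (name TEXT, card_number TEXT)")
        conn.execute(
            "INSERT INTO customers VALUES (?, ?)", ("Alice Example", secret)
        )
        conn.commit()
        conn.close()

        # Someone with file access needs no password and no SQL:
        # reading the bytes of the file is enough.
        with open(path, "rb") as f:
            return secret.encode("utf-8") in f.read()


if __name__ == "__main__":
    print(raw_file_contains("4111-1111-1111-1111"))
```

SQLite is only a stand-in here. The point is the general one from the paragraph above: when the database file is not encrypted, file access is data access.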

This is why security on the internet is so fragile. A server is only as secure as its most insecure program.

Someone breaks into the server and all the data on it, including all of your private information, is lying there in an easy-to-read database format.

So why don’t we use encryption? Do companies feel that your private data isn’t worth the effort?

The reason why almost no one encrypts their databases is one of the dirty secrets of IT.

It’s all because most of the data is stored in relational databases and relational databases handle encryption so awkwardly that almost no one does it.

That awkwardness is fundamental to the way relational databases work. It is built in.

The root of the problem is that the database is really a 1980s concept that, because of the limitations of 1980s technology, mixed two fundamentally different services.

The two major services provided by databases are persistence and data access.

Persistence is the ability to keep data even when the power goes off. As I mentioned in my article on why databases will eventually be a forgotten technology (see Databases The Future Buggy Whips of Software), not having a high-speed persistent memory has bedeviled computing since its very beginnings. So one thing that people use databases for is to guarantee that the data will survive a power interruption (persistence).

The other service that databases supply is data access. This is searching for and finding the data that is needed.

The whole database concept, including relational databases, was born in the 1970s and 1980s. In those days high-speed main memory was incredibly expensive — an IBM 360 would ship with 4MB and cost over a million dollars. That’s 1/1,000 of the memory that a really cheap back-to-school special laptop would ship with today.

This meant that data had to be stored on disk. When, for example, you wanted to gather a list of employees that satisfied particular criteria, you had to read a series of disk records to extract the employee data and evaluate the criteria.

But think about encryption. If you are looking for things on disk and the disk is encrypted how do you find it? Do you go through and decrypt everything? Disks are slow, and to get acceptable performance there have to be indices — like a phone book to find an actual number. But if the phone book is in the clear so are all those names and they can be read by an intruder. But if it is encrypted then how do you look anything up?

This means that encryption of databases that do their data access directly from disk is incredibly difficult.
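The phone book dilemma can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the helper names are invented, and `toy_encrypt` is a deliberately simple stand-in cipher (a SHA-256 keystream XORed with the data) that must not be used to protect real data. What it shares with real encryption is the property that matters here: the same plaintext encrypts to different bytes each time, so the ciphertexts cannot be sorted or compared the way an index needs.

```python
import bisect
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy randomized cipher for illustration only. NOT secure for real use."""
    nonce = secrets.token_bytes(16)  # fresh randomness on every call
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    """Reverse toy_encrypt: split off the nonce and XOR with the same keystream."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def find_in_clear_index(sorted_names, target):
    """Plaintext index: a binary search finds the entry in a handful of steps."""
    pos = bisect.bisect_left(sorted_names, target)
    if pos < len(sorted_names) and sorted_names[pos] == target:
        return pos
    return None


def find_in_encrypted_records(key, blobs, target):
    """No usable index: decrypt record after record until the target turns up."""
    decryptions = 0
    for position, blob in enumerate(blobs):
        decryptions += 1
        if toy_decrypt(key, blob).decode("utf-8") == target:
            return position, decryptions
    return None, decryptions


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    names = [f"person{i:04d}" for i in range(1000)]  # already in sorted order
    blobs = [toy_encrypt(key, n.encode("utf-8")) for n in names]

    print(find_in_clear_index(names, "person0999"))
    print(find_in_encrypted_records(key, blobs, "person0999"))
```

The plaintext index answers the lookup with a short binary search, but anyone who can read the index can read every name in it. The encrypted records hide the names, but the search has to decrypt its way through them one at a time. That trade-off is the difficulty described above.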

Like any encryption scheme you have to manage the keys. That is an administrative chore and, as in all security regimes, it is inconvenient.

There is also a performance hit of somewhere between 30% and 50% when encrypting relational databases. That’s because the code has to keep decrypting data so it can figure out if it’s the stuff that’s wanted.

And to top it off, if you are using a relational database from a major vendor then encryption is a very expensive add-on.

So that is why hardly anyone encrypts their relational database. It’s an expensive pain in the butt that slows everything down.

82% of databases on the public cloud are not encrypted (article here). There are no public numbers for databases on internal (non-public) servers, but I would guess it’s close to 100%. Just anecdotally, in my decades in IT I have never encountered an enterprise system that used relational database encryption.

This is because most IT people rely on perimeter security. That means they try to stop people from getting access to the server’s file system in the first place. But that is just one line of defense.

Everyone knows that security comes from having as many lines of defense as possible.

Banks secure their buildings, but they know that is not enough. That is what bank vaults are for. If someone penetrates the perimeter of the building it is the next line of defense, the vault, that then secures the valuables.

The truth is the perimeter security on servers is totally porous. Servers are penetrated continually, with relational databases leading that charge by making SQL injection possible (see my article on why relational databases are so bad). So leaving the data lying around in the clear is a predictable disaster. But most companies don’t go to the trouble of putting your data into an encryption vault. They can’t be bothered to change the way they build systems.
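For readers who have not seen SQL injection up close, here is a minimal sketch using Python's built-in `sqlite3` module. The table, the sample rows, and the function names are invented for illustration. It shows how pasting user input into the text of a query lets that input rewrite the query, and how passing the same input as a bound parameter does not.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a small in-memory database with two made-up users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def lookup_unsafe(conn: sqlite3.Connection, name: str) -> list:
    """Vulnerable: the input becomes part of the SQL text itself."""
    query = f"SELECT email FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn: sqlite3.Connection, name: str) -> list:
    """Safer: the input is bound as a value and is never parsed as SQL."""
    query = "SELECT email FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"  # attacker-supplied "name"
    print(lookup_unsafe(conn, payload))  # the WHERE clause is now always true
    print(lookup_safe(conn, payload))    # no user has this literal name
```

With the unsafe version, the payload turns the condition into `name = 'x' OR '1'='1'`, which matches every row. Parameterized queries close this particular hole, but they are only one line of defense, and they do nothing for data that is sitting unencrypted on the disk.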

Hey, it’s a complicated technical issue that few understand, so why should companies care about it when no one is going to complain? It’s easier just to ask for forgiveness when the inevitable exposure happens.

The technology choices belong to the companies entrusted with your data. If their technology doesn’t accommodate encryption easily and efficiently, surely they are responsible for finding one that does.

It is not as if we lack other options, given modern memory sizes. With 21st century hardware we just don’t have to expose our data as we’ve been doing (see my technical article on database alternatives).

Imagine if Sony had had their email store encrypted properly. The intruders might still have penetrated the server, but it would have been much harder, if not impossible, for them to read the mail.

And what about liability? Imagine if your bank left your safety deposit box on the front counter and some thieves broke in and stole all your documents and valuables. You would hold the bank liable. Leaving your valuables outside the vault overnight just doesn’t meet the standard of care you would expect of a bank.

But most companies do exactly that. They leave your valuable data just lying there in the open.

They are just too lazy and cheap to do the secure thing and figure out better methods of encryption.

So the next time you read about a major security incident with private data exposed to the world you might wonder why that data wasn’t encrypted. You might also wonder, despite all their seemingly earnest words of contrition, how much the company really cared.

If you enjoyed this you may like my book Avoiding IT Disasters that shows a hidden side of enterprise systems that would be valuable for you to be aware of.

Written by Lance Gutteridge

Dr. Lance Gutteridge has a PhD in computability theory. Presently CTO of Formever Inc. (www.formever.com) where he architects ERP authoring software.
