Checked C: Can we make C safe?

Safety of programming languages has long been a topic of research, but it matters more and more today as software is used in every part of our lives.

While there are tons of programming languages to choose from, depending on what the software is intended to do, there are actually very few options (IMHO) when it comes to industry. For a for-profit company that needs software developers to run its business, the topmost concern is being able to find people who can code.

So, what are those options? Having worked in a big enough company, I think the options are C, C++, Python, Java, Golang and then some parts of the company doing secluded work writing safe programs in Rust and Haskell.

If you ask me what the best language is when you need the best performance out of the hardware and can't possibly imagine writing assembly, the answer is C/C++. I personally would never be able to write a production-grade program in either of those languages, but tons and tons of production-grade software is indeed written in C/C++.

But, as a security enthusiast, I am even more scared to recommend C/C++ to anyone else. The speed of C/C++ comes from low-level access to memory and compute resources. Threads and processes are managed with the lowest-level primitives. Jumping to addresses is a manual process of passing pointers around. The writer is trusted to know what kind of object a pointer refers to when they pass it along; there are no doubts and no checks. If they are wrong, the program crashes or, worse, keeps going.

We software developers like to think of ourselves as smart people. We would rarely admit that all the power we get in C/C++ can actually be bad, because we are not smart enough to write perfect code all the time (assuming we can write perfect code for at least 2 minutes in a 10-hour session, which is already too much to ask). So a large amount of software continues to be developed in C/C++.

Memory safety is an issue that has plagued low-level systems programming languages like C/C++. I recently wrote a blog post on all the different types of memory safety errors that exist in the GNU/Linux kernel. I bring that up here because GNU/Linux is one of the biggest programs around: we not only interact with it every day, all the time, it is also critical to our lives today. I mean our actual physical lives. There are tons and tons of IoT devices running Linux, machines in factories running Linux, cars running Linux, smartphones running Linux and what not.

So, naturally, tons of research has been done to see if we can make C more secure. How do we do that? The basic premise for preventing memory safety bugs is to add some sort of checks that recognize objects and the pointers to those objects. In technical terms, add some sort of type safety to C.

This post was inspired by a recent paper from Microsoft Research, Checked C: Making C Safe by Extension. Even before you open the paper, I would like to tell you that this problem has been a topic of research for far too long and, to the best of my knowledge, we still don't have an implementation that is widely used. There have been many papers published (as you can find out from the Related Work section of the very first Checked C paper [^1]).

Here is the simplest example from the paper:

void
read_next(int *b, int idx, _Ptr<int> out) {
    int tmp = *(b + idx);
    *out = tmp;
}

The program is quite simple: given a pointer to an integer b and an index idx, it reads the integer at that index after b and stores it at the address pointed to by out.

But notice the new syntax _Ptr<int> used to declare the type of out. In normal C, it would be a plain int *. By declaring that out is a checked pointer to a single integer, you tell the compiler that no pointer arithmetic is allowed on it, and the compiler can add a null check wherever it is dereferenced.

You can also use _Array_ptr<T> for pointers that you can do arithmetic on, as compared to the first notation, which you can only use for dereferencing.

There are several other type annotations that you can add to your source code to make the compiler smarter. To me, these look like type annotations in Python. If you don't already know, Python is a dynamically typed high-level language. You don't have to declare object types in Python; it figures them out eventually. However, since it is an interpreted language, it figures them out at runtime, when an int was passed where a str should have been, and depending on the error handling your program will crash or misbehave.
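To make the runtime part concrete, here is a minimal sketch. The function add_one is a made-up name for illustration; the point is only that nothing complains until the offending line actually executes.

```python
def add_one(x):
    # No declared types: Python only discovers a mismatch when this line runs.
    return x + 1


print(add_one(41))  # an int works fine

try:
    add_one("41")  # a str reaches the addition and fails at runtime
except TypeError as exc:
    print("runtime failure:", exc)
```

Without the try/except around the second call, the program would simply stop with a traceback at that point.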

The recent support for type annotations allows you to write code like this:

def my_func(a: int, b: int) -> int:
    return a + b

This code is interpreted exactly as it would be without the types, but you can run a type checker on your program to find out whether any code path could pass a string as a to the function my_func.
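A small sketch of what "interpreted in the same way" means in practice: the interpreter stores the hints but does not enforce them. The call with strings below is my own illustration; a static checker such as mypy would be expected to flag it, while Python itself happily runs it.

```python
def my_func(a: int, b: int) -> int:
    return a + b


# The interpreter does not enforce the hints, so the strings are concatenated.
result = my_func("a", "b")
print(result)

# The hints are only stored as metadata for external tools to read.
print(my_func.__annotations__)
```

This is exactly the gap a type checker fills: it reads the stored hints and reports the mismatched call before the program ever runs.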

Why do I compare type annotations in Python with Checked C, you ask? Well, the biggest reason for me is that both are optional. No one who doesn't want to use them has to. That is good and bad. Good, because you can gradually start adding type annotations to your source code without launching a massive move-to-typed-code initiative at your company. Bad, because unless the whole source code (including all the libraries) is type annotated, you don't get full confidence that your program is safe.

Checked C is designed so that pointer types can be added gradually to a huge code base, and only the annotated portions are checked by the compiler for safety.

So, why don't people jump on this obvious opportunity to make their programs safe? Well, performance. Bad security may bite you if you are unlucky, but there is always a chance that no one looks at your bad security and people only use your software the way it was intended (this assumption is wrong, and more wrong every single day). But if the performance of your application is bad, then no one is ever going to use it. What is worse is if you promise and show people a performant application and then decrease its performance because you wanted to make it "secure".

No matter how much you try to educate people that software security is important, if your software is unusable, performs badly, or even just looks bad, few people are going to use it. Those few will be people like us, security-focused software developers, whose day job is to deal with this kind of stuff. I digress.

Checked C promises improved performance compared to other solutions (i.e., less slowdown when using it; there is always a slowdown). On average, the runtime overhead was 8.6%, which is not bad at all (security folks are okay with anything under 10% performance impact, for research papers at least, and that's fine because it shows promise that can be improved further).

What do I think about this paper? Well, it is definitely interesting, but there have been similar attempts before. What is different here (according to the authors) is that they allow compatibility with legacy code that is not type annotated, and that they are working on ways to automatically add types to most existing C source code. I believe that is possible because such a tool exists for Python; check out MonkeyType. Whether or not it gains widespread use (or even just use inside Microsoft) remains to be seen.
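To give a feel for how types could be inferred automatically in Python, here is a toy sketch of the general idea: watch real calls and record the types that show up. This is not MonkeyType's actual API or implementation; the record_types decorator and the observed registry are hypothetical names I made up for illustration.

```python
import functools
from collections import defaultdict

# Hypothetical registry: function name -> set of (argument types, return type).
observed = defaultdict(set)


def record_types(func):
    """Record the runtime types of positional arguments and the return value."""
    @functools.wraps(func)
    def wrapper(*args):
        result = func(*args)
        signature = (
            tuple(type(arg).__name__ for arg in args),
            type(result).__name__,
        )
        observed[func.__name__].add(signature)
        return result
    return wrapper


@record_types
def my_func(a, b):
    return a + b


my_func(1, 2)
my_func(3, 4)

# Every observed call used ints, which suggests the annotation
# def my_func(a: int, b: int) -> int
print(observed["my_func"])
```

The obvious limitation is that the suggested types are only as good as the calls that were observed, so a human still has to review them.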

[^1]: Pro tip: if you want to find out what is special about a paper and don't want to read the whole thing, read the Related Work section, usually found near the end, before the Future Work section. That is where researchers explain how their paper differs from all the other research that has been done in the same field on the same topic.