Cogent – verification made easy

At Trustworthy Systems, we are one of the few places that understand software verification. We’ve built the worlds only verified, general purpose micro-kernel. And many people in the group are core contributors to Isabelle/HOL and CakeML. And, as a group, we not only believe in software verification, we understand what it involves, and also realise that the returns we get from verified software don’t come cheap. With that in mind, we started working on Cogent, with the aim of reducing the cost of verified systems. For those who are keen to read about how we achieve that, I suggest reading the publications section here.

What is Cogent?

Cogent is a restricted, polymorphic, higher-order and purely functional language with linear types and without the need for a trusted runtime or garbage collector.

In the context of Cogent, the adjectives that I used above, mean the following:

  • restricted: Cogent omits certain features that one might come to expect from a programming language. For instance, Cogent doesn’t support general recursion.
  • polymorphic: This is similar to templates in C++ or generics in Rust.
  • higher-order: Cogent uses software components like functions, modules or objects as values.
  • purely functional: Cogent forbids changing state, mutable data. Evaluation is done similar to a mathematical function.
  • linear types: ‘Objects'(although there is no such concept in Cogent, I’m using it to very broadly define instances/variable of a particular type) can be used exactly once. Rust does something similar, I think.
  • trusted runtime: Cogent compiler generates C code, so the trusted component is limited to the C runtime system.
  • garbage collector: Since Cogent compiler generates C code, we would manage memory manually, like we do in any C program.

Being very new to functional programming, language theory and type systems, my level of understanding of the theory behind the earlier mentioned concepts, is still pretty shallow. For an authoritative and in-depth explanation of these concepts, please refer to the Cogent publication list and Liam O’Connor‘s(one of the compiler writers, the other being Zilin Chen) blog, where he occasionally writes about the compiler and type system.

As mentioned earlier, one of the key ideas behind developing Cogent is to reduce the cost of verifying software. Since we are a systems engineering research group, our focus is particularly on reducing the cost of verifing system software. And that brings us to the next key question.

What can Cogent be used for?

I want to respond to the question with the answer, “For anything”, but I would be lying. If you intend to write a program in Cogent that prints a  “Hello, World”, I can tell you that it isn’t possible without using C code, since we don’t have IO support in Cogent. Infact things that are simple to write in other programming languages, loops for example, have very complicated usage patterns in Cogent.

I think, for any compiler to be good enough, it takes a long time and a lot of man-years(Look at gcc, llvm/clang, rustc and various other mature compilers). In comparison, Cogent is still in its infancy and at our peak, we had about 4 compiler developers(We have 2 people working on the compiler now). Having said that, the Cogent compiler does let us write verifiable Linux file system drivers, and it does some parts of it well. Infact we have implemented 3 filesystems(at various levels of completion) in Cogent – bilbyfs, ext2 and vfat.  All of the Cogent source is available on Github.

Our eventual goal is to have complete implementations of these(and more) filesystems along with other parts of the system software stack, viz., networking code, drivers and the like. Currently the implementations we have in Cogent have varying degrees of performance costs(based on the usage, as highlighted in the Cogent paper), but we are working towards making sure that we don’t lose out on performance,  although we aren’t there yet.

Why?

I’ve already explained why we are working on Cogent, and what our goals are. And if I have piqued your interest enough, I suggest that you got read some of the papers in the publication list.

What I want to mention is why I’m writing about this now. Two reasons. One, I think writing about it will give me a lot more clarity about what I’ve personally done so far, and what I want to do with the project going forward. And two, I want to be able to talk about Cogent from an engineers perspective. When I started working on this project a year ago, I was overwhelmed. Having mostly used imperative languages for as long as I remember and having primarily done systems programming, everything looked daunting. It was a steep learning curve trying to understand functional programming, Haskell, software verification. And after having spent a year trying to implement ext2, f2fs and contributing a little to the compiler, which is written in Haskell(I forgot to mention that earlier), I’ve started to appreciate these ideas a lot more. I have barely scratched the surface, but I’m very enthusiastically digging deeper.

My idea is to write several posts about using Cogent and this post is meant to be a curtain-raiser for the posts(time permitting) that follow.

One last thing. Liam gave a talk titled, Refinement through Restraint: Bringing Down the Cost of Verification  at ICFP 2016, the video of which is available here. The 26 minute talk is both entertaining and informative and it is worth a watch.

seL4 now runs on Pi3

Very recently, we(the Trustworthy Systems group) merged Raspberry Pi 3 support for seL4. This excellent blog post by Kofi already highlights why this is great news. Personally, it is exciting because, a Raspberry Pi 3 is a lot more accessible than all the other boards we’ve previously ran our kernel on. I own one too, and it is absolutely amazing that I can run, muck around and put my Raspberry Pi 3 to some good use. I’ve been playing around with seL4 on my Pi3. The things that I’m currently trying to do are:

  • Get Haskell runtime on seL4. Some might say this is an exercise in futility, but this will help me get Cogent build on seL4, and then I can start building verified systems software(currently file systems) on top of seL4. Ofcourse, this involves a lot more than just the filesystem implementation(need a VFS layer, block cache, drivers etc.). It might be a long shot, but hey, I’m trying!
  • The OS that I’ve been trying to build, forever(Who hasn’t tried to build one, really?). With seL4, the kernel is taken care of, letting us build all the fun stuff on top(Students of the AOS class at UNSW do exactly this).

Good times!

Printing bits of a byte in C

Printing the bit representation of a byte is a very simple and interesting problem. There are numerous ways this can be done in many different programming languages out there. In C, one would do something like this for signed integers:


void print_bits(int num)
{
size_t bits = sizeof(num) * 8;
int max = 1 << (bits – 1);
int i;
for (i = 0; i < bits; i++) {
printf("%u", (num & max) ? 1 : 0);
num = num << 1;
}
printf("\n");
}

What exactly is this function doing?

Most of the code is pretty obvious. The key portions of the code are in lines 3 and 9.

Line 3, calculates the maximum possible positive value for the given type. In the example above, the given type is an int, and on my machine running GNU/Linux(x86_64),  the number of bits in an int is 32 and its maximum possible positive value for 2147483648.

The binary representation(shown in 4 chunks of 8 bits) of 1 on a 32 bit machine is:

00000000 00000000 00000000 00000001

which when left-shifted 31 times(line 3 in the above code), becomes(notice how the digit 1 has moved from being the LSB to being the MSB):

10000000 00000000 00000000 00000000

whose decimal value is 2147483648.

Having got the max value, we then bitwise-and(&) it with the give number(represented by num) and check the result(line 8). The bitwise-and will yield a number greater than 0, if there is a 1 in the MSB and will yield a 0 otherwise. Based on the result of this operation, we either print a 1 or a 0.

In each iteration, we left-shift  the given number by 1 bit(line 9). We continue to iterate until we’ve traversed all bits.

We can modify the function print_bits() to print the output in a much more readable fashion:


#include <stdint.h>
#include <limits.h>
#define numbits(x) (sizeof(x) * 8)
#define max_signed_value_of_type(t) \
(INTMAX_MAX >> (numbits(intmax_t) – numbits(t)))
void print_bits(int num)
{
size_t bits = numbits(num);
int max = max_signed_value_of_type(num) + 1;
int i;
for (i = 0; i < bits; i++) {
if (i && (i % 8) == 0)
printf(" ");
printf("%u", !!(num & max));
num = num << 1;
}
printf("\n");
}

We can also implement a function to count the number of 1’s and 0’s in the binary representation of a given number:


#include <stdint.h>
#include <limits.h>
#define numbits(x) (sizeof(x) * 8)
#define max_signed_value_of_type(t) \
(INTMAX_MAX >> (numbits(intmax_t) – numbits(t)))
void count_bits(unsigned int num)
{
size_t bits = numbits(num);
unsigned int max = max_signed_value_of_type(num) + 1;
int i;
unsigned int t = num;
int zeroes = 0, ones = 0;
for (i = 0; i < bits; i++) {
if (!!(num & max))
ones++;
else
zeroes++;
num = num << 1;
}
printf("%u has %d 0's and %d 1's in its binary representation.\n", t, zeroes, ones);
}

view raw

count_bits.c

hosted with ❤ by GitHub

The count_bits() function above for input number 2 which in binary is 00000000 00000000 00000000 00000010 we get an output:

2 has 31 0’s and 1 1’s in its binary representation.

 

This post is a result of a discussion with my nephew who recently took to programming and I was trying to explain bits and bytes to him. Although C is probably not the right language when introducing a high-schooler to programming, I thought it is the language that expresses bits and bytes well, and also, it is the language that I know the best.

K&RIP!

Dennis Ritchie passed away.

It is indeed a sad day.

I started using computers when in university (I was 18 then). And in the last 14 years, I have been using both C and Unix (and its variants, Linux mostly). And if I look around, most of the devices that I have, including but not limited to, my laptop, my phone, my desktop at work, use C and Unix (Linux in this case). I earn my 'daily-bread' as a C programmer, I have enjoyed being a C programmer. I just love the flexibility that it provides and its sheer simplicity. I've _always_ had a copy of The C programming language with me. And that is the book that I always keep going back to, between jobs (for interviews) and occasional reference.

I owe it to Dennis Ritchie big time.

My heartfelt condolences to his family and friends. May his soul rest in peace.