Introduction to Coq

Anton Trunov (Zilliqa Research)

21.09.2019

What is formal verification?

  • A technique for increasing assurance in the correctness of systems by proving them correct with respect to a formal specification, using formal methods of mathematics
  • In other words: it is the application of deductive proof to programs or to hardware designs
  • Formal ~ having a syntax that may be given a semantics

Components of formal verification

  • Specification
  • Implementation
  • Formal proof
  • Checker

Formal specification

  • A faithful mathematical representation of what a program or hardware design was intended to do
  • Specifying systems is hard and is a form of art!

Formal proof

  • A formal proof is a proof in which every logical inference has been checked all the way back to the fundamental axioms (A definition by T.C. Hales)
  • All the intermediate logical steps are supplied, without exception
  • No appeal is made to intuition, even if the translation from intuition to logic is routine
  • A formal proof is less intuitive, and yet less susceptible to logical errors

Why is verification important?

  • Ensure systems are bug-free

    • Therac-25
    • Ariane 5 Disaster, Mars Climate Orbiter, Mariner 1, Patriot missile
    • The Pentium bug
    • The DAO Attack

Why is verification important?

  • Gain insight into the system at hand

The proof is not absolute

  • A verified system is not "correct" or "dependable" in some absolute sense
  • The specification might not capture what was required for safety, security, etc.
  • An actual computer system might not behave in accordance with even the most detailed mathematical model of it

Using computers to do proofs

  • Formal verification is proof about computers
  • Closely related, but distinct, is the use of computers in proof.
  • Proofs about computer systems are usually highly intricate but not conceptually deep

There are lots of formal systems

  • Not all formalisms are created equal
  • E.g. expanding the definition of the number 1 fully in terms of Bourbaki primitives requires over 4 trillion symbols
  • With formal proofs one wants as much help as one can get

Formal methods techniques

The land of formal methods includes

  • Interactive theorem provers (e.g. Coq)
  • Automated theorem provers (SAT/SMT solvers, …)
  • Specification languages & Model checking
  • Program Logics

What is Coq?

Coq is a formal proof management system. It provides

  • a language to write
    • mathematical definitions,
    • executable algorithms,
    • theorems (specifications),
  • an environment for interactive development of machine-checked proofs.
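
A minimal sketch of these ingredients in one place. The names double and double_spec are hypothetical and not taken from the talk; the example assumes only Coq's default prelude.

```coq
(* Hypothetical illustration; `double` and `double_spec` are made-up names. *)

(* An executable algorithm: doubling a natural number by structural recursion. *)
Fixpoint double (n : nat) : nat :=
  match n with
  | O => O
  | S m => S (S (double m))
  end.

(* The definition can be run inside Coq. *)
Compute double 3.   (* = 6 : nat *)

(* A theorem acting as a specification: `double` agrees with n + n. *)
Theorem double_spec : forall n : nat, double n = n + n.
Proof.
  induction n as [| m IH].
  - (* base case: double 0 = 0 + 0 holds by computation *)
    reflexivity.
  - (* inductive step: unfold, use the hypothesis, then move S out of the sum *)
    simpl. rewrite IH. rewrite <- plus_n_Sm. reflexivity.
Qed.
```

The proof script is developed interactively, one tactic at a time, and Qed asks the kernel to check the resulting proof term.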

Related systems

  • Lean prover (similar to Coq)
  • F* (used to verify crypto code in Firefox)
  • Isabelle/HOL (simple type theory, seL4)
  • Agda (predicative)
  • Idris (similar to Agda)

Why Coq?

  • Expressive
  • Industrial adoption
  • Mature and battle-tested
  • Lots of books and tutorials
  • Lots of libraries
  • Excellent community

What do people use Coq for?

  • Formalization of mathematics:
    • Four color theorem
    • Feit-Thompson theorem
    • Homotopy type theory
  • Education: it's a proof assistant.
  • Industry: CompCert (at Airbus)

More examples

  • FSCQ: a file system written and verified in Coq
  • Perennial: verifying concurrent storage systems
  • Cryptocurrencies (e.g. Tezos, Zilliqa)

Large-Scale Software Systems

Project      Domain                Assistant     LoC
seL4         OS kernel             Isabelle/HOL  200k
CompCert     Compiler              Coq           120k
FSCQ         File system           Coq           80k
Fiat-crypto  Cryptocode generator  Coq           65k
Verdi Raft   Key value store       Coq           50k

FSCQ stats (LoC)

Language  files  code
Coq       98     81049
C         36     4132
Haskell   8      1091
OCaml     10     687
Python    9      643

CompCert C Compiler stats (LoC)

Language      files  code
Coq           223    146226
C             223    65053
OCaml         147    28381
C/C++ Header  86     7834
Assembly      59     1542

Successes of Verified Software

  • "[T]he under-development version of CompCert is the only compiler we have tested for which Csmith cannot find wrong-code errors. This is not for lack of trying: we have devoted about six CPU-years to the task."

Yang et al., PLDI '11

Successes of Verified Software

  • "… none of these bugs were found in the distributed protocols of verified systems, despite that we specifically searched for protocol bugs and spent more than eight months in this process."

Fonseca et al., EuroSys '17

Bugs in verified systems

  • "Using Csmith, we found previously unknown bugs in unproved parts of CompCert—bugs that cause this compiler to silently produce incorrect code."

Yang et al., PLDI '11

Bugs in verified systems

  • "Surprisingly, we have found 16 bugs in the verified systems that have a negative impact on the server correctness or on the verification guarantees."

Fonseca et al., EuroSys '17

Proofs and Tests

  • @vj_chidambaram: Even verified file systems have unverified parts :)
  • FSCQ had a buggy optimization in the Haskell-C bindings
  • CompCert is known to also have bugs in the non-verified parts, invalid axioms and "out of verification scope" bugs

Proofs and Tests

  • QuickChick shows how well randomized testing applies in the context of theorem proving
  • Real-world verification projects have assumptions that might not be true

Coq, its ecosystem and community

Coq repo stats (LoC)

Language      files  code
OCaml         949    203230
Coq           1970   196057
TeX           26     5270
Markdown      22     3362
Bourne Shell  107    2839

What is Coq based on?

Calculus of Inductive Constructions

Just some keywords:

  • Higher-order constructivist logic
  • Dependent types (expressivity!)
  • Curry-Howard Correspondence

Curry-Howard Correspondence

  • Main idea:
    • propositions are a special case of types
    • a proof is a program of the required type
  • One language to rule 'em all
  • Proof checking = Type checking!
  • Proving = Programming
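
The correspondence can be sketched in a few lines of Coq. All names below (modus_ponens, apply_to, and_swap, and_swap') are hypothetical and chosen for illustration only.

```coq
(* Hypothetical illustration of "a proof is a program of the required type". *)

(* The proposition A -> (A -> B) -> B is a type; this function is its proof. *)
Definition modus_ponens (A B : Prop) : A -> (A -> B) -> B :=
  fun (a : A) (f : A -> B) => f a.

(* The very same program, read as ordinary function application on data. *)
Definition apply_to (A B : Type) : A -> (A -> B) -> B :=
  fun (a : A) (f : A -> B) => f a.

(* A proof of a conjunction is a pair of proofs; swapping the components
   proves that conjunction is commutative. *)
Definition and_swap (A B : Prop) : A /\ B -> B /\ A :=
  fun p => match p with
           | conj a b => conj b a
           end.

(* Tactics build a term of the same type behind the scenes. *)
Lemma and_swap' (A B : Prop) : A /\ B -> B /\ A.
Proof. intros [a b]. split; assumption. Qed.

(* Shows the proof term that the tactic script produced. *)
Print and_swap'.
```

Accepting any of these definitions is plain type checking: Coq verifies that the term has the stated type, which is exactly what checking the proof means.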

Proving is programming

  • High confidence in your code
  • It is only as strong as your specs are (trust!)
  • It can be extremely hard to come up with a spec (think of browsers)
  • IMHO: the best kind of programming

Coq as Programming Language

  • Functional
  • Dependently-typed
  • Total language
  • Extraction
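
A small sketch touching each of the points above. It assumes a Coq 8.x release in which Require Extraction and Extraction Language OCaml are available; the names nat_or_bool, pick and sum_to are hypothetical.

```coq
(* Hypothetical illustration; assumes a Coq 8.x release with the
   extraction plugin available. *)
Require Extraction.

(* Dependently-typed: a function that computes a type from a value. *)
Definition nat_or_bool (b : bool) : Type :=
  if b then nat else bool.

(* The return type of `pick` depends on the value of its argument. *)
Definition pick (b : bool) : nat_or_bool b :=
  match b return nat_or_bool b with
  | true => 0
  | false => true
  end.

(* Functional and total: recursion is on the structurally smaller `m`,
   so Coq accepts the definition as terminating. *)
Fixpoint sum_to (n : nat) : nat :=
  match n with
  | O => O
  | S m => n + sum_to m
  end.

(* Extraction: print OCaml code for `sum_to` and everything it depends on. *)
Extraction Language OCaml.
Recursive Extraction sum_to.
```

A Fixpoint whose recursive call is not on a structurally smaller argument is rejected, which is how totality is enforced in practice.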

Extraction

Extraction: xmonad

Extraction: toychain

certichain / toychain - A Coq implementation of a minimalistic blockchain-based consensus protocol

Embedding

  • hs-to-coq - Haskell to Coq converter
  • coq-of-ocaml - OCaml to Coq converter
  • goose - Go to Coq conversion
  • clightgen (VST)
  • fiat-crypto - Synthesizing Correct-by-Construction Code for Cryptographic Primitives

hs-to-coq - Haskell to Coq converter

  • part of the CoreSpec component of the DeepSpec project
  • has been applied to verify Haskell’s containers library against specs derived from
    • type class laws;
    • library’s test suite;
    • interfaces from Coq’s stdlib.
  • challenge: partiality

Suggested reading (papers)

  • "Formal Proof" - T.C. Hales (2008)
  • "Position paper: the science of deep specification" - A.W. Appel (2017)
  • "QED at Large: A Survey of Engineering of Formally Verified Software" - T. Ringer, K. Palmskog, I. Sergey, M. Gligoric, Z. Tatlock (2019)

Suggested reading (books)