FMOℤ 2025


Program

Monday, 2 June 2025
10.00 – 10.45

Gernot Heiser

LionsOS: Designed for speed *and* verification


Miki Tanaka

Verifying device drivers with Pancake

11.15 – 11.45

Kostyantyn Vorobyov

Analysing the Security Posture of Cloud Applications

11.45 – 12.15

Alicia Michael

Boogie-Backed Translation Validation for Binary Decompilation

12.15 – 12.45

Aaron Bembenek

Inferring Loop Invariants in Lifted Binaries using Constrained Horn Clause Solving

14.45 – 15.15

Sadra Bayat Tork

BASIL-DSA: Context- and Field-Sensitive Memory Analysis for lifted binaries

15.15 – 15.45

Jack Stodart

Interacting with Automatic Verifiers without Looking Under the Hood

16.15 – 16.45

Horacio Mijail Anton Quiles

DARUM: diagnosing and pinpointing slow verification in Dafny

16.45 – 17.15

Ralf Huuck

About Delivering Formal Methods to Millions of Developers. Weekly.

Tuesday, 3 June 2025
9.00 – 9.30

Ian Hayes & Larissa Meinicke

Restructuring a concurrent refinement algebra

9.30 – 10.00

Scott Heiner

Cylindric biquantales:
taking steps towards strong realisability in hybrid models

10.50 – 11.20

Annabelle McIver

Probabilistic Datatypes:
Automating verification for abstract probabilistic reasoning

11.20 – 11.50

Luigi Soares

A new measure for dynamic leakage based on quantitative information flow

14.00 – 14.30

Robert Colvin

Structural operational semantics for functional and security verification of pipelined processors

14.30 – 15.00

Sasha Pak

Towards Extending Rust’s Type System: a Study of Formal Models

15.30 – 16.00

Zhe Hou

An LLM agent for model checking using PAT

16.00 – 16.30

Daniel Cumming

Verification of Rust with K!

Abstracts

Gernot Heiser (UNSW)

LionsOS: Designed for speed and verification

This talk will give an overview of the state and roadmap of the LionsOS project.

LionsOS is an seL4-based operating system (OS) designed and built from scratch, aimed at embedded/cyberphysical systems. Its design aims to enable formal verification without compromising performance; in fact, it outperforms Linux on networking and storage benchmarks. This is achieved by fine-grained modularity and adherence to the principle of radical simplicity, which results in individual security-critical modules comprising only a few hundred lines of code. We have already demonstrated verification of a high-performance Ethernet driver (details covered in Miki Tanaka’s talk on the Pancake language). LionsOS currently hosts the sel4.systems web site, and we have built a number of other sample systems, including a simple firewall.


Miki Tanaka (UNSW)

Verifying device drivers with Pancake

Pancake is an imperative language for systems programming that features a verified compiler and a well-defined, verification-friendly semantics. We have now integrated Pancake into the Viper SMT verification infrastructure via an automatic transpiler, to establish end-to-end functional correctness for Pancake programs. Using this framework, we have formally verified a performant real-world Ethernet driver. In this talk, we present the driver verification workflow, the Viper front-end encodings for Pancake, and future challenges.


Kostyantyn Vorobyov (Oracle Labs)

Analysing the Security Posture of Cloud Applications

A cloud environment, such as Oracle Cloud, allows a deployed application and its assets to be protected using a multitude of security measures, including encryption, authentication, identity and access control, and network segmentation. However, even though these measures can in general provide strong security guarantees, configuring them is a complex task. As a result, misconfigured or overly permissive policies can lead to security breaches.

Due to the fragmented nature of security configurations, security analysis of a deployed application is not straightforward. Tools targeting individual components exist; however, they cannot analyse interactions between different policies. To address this issue, we developed an intermediate representation that models the different security policies as a single system, together with an associated policy language capable of analysing the security posture of a cloud application in its entirety using assertions based on first-order logic.
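As a toy illustration of the idea (all names hypothetical; this is not Oracle's representation or policy language): once network rules and access grants live in one model, a posture check becomes a first-order assertion evaluated over finite relations.

```python
# Toy unified model: two policy fragments (network + access control)
# expressed as finite relations. All names are hypothetical.
reachable_from_internet = {"web-vm", "bastion"}
can_read = {("web-vm", "orders-db"), ("bastion", "audit-log")}
requires_auth = {"orders-db"}

def posture_violations():
    """forall r, s: reachable(r) and can_read(r, s) => requires_auth(s)."""
    return sorted((r, s) for (r, s) in can_read
                  if r in reachable_from_internet and s not in requires_auth)

# Two individually sensible policies combine into one overly permissive path:
print(posture_violations())  # [('bastion', 'audit-log')]
```

Each policy fragment in isolation would pass its component-specific checker; only the combined model exposes the unauthenticated path.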


Alicia Michael (UQ - DSTG)

Boogie-Backed Translation Validation for Binary Decompilation

The problem of verifying source-code programs subject to compilation is well understood. For verification outcomes to transfer to the programs that actually execute, we need some assurance that the compilation process preserves our proof. This has been addressed widely in the areas of compiler verification and translation validation. Binary analysis approaches, on the other hand, seek to perform program verification directly at the binary level. This, however, incurs the problem of lifting, or decompilation, in order to increase the program context available to analysis techniques like abstract interpretation and deductive verification. We again find ourselves needing assurance that the decompilation preserves the behaviour of the original binary program. We present work that reuses the Boogie verification infrastructure, traditionally used for source-level functional correctness proofs, to additionally verify the correctness of decompilation code-transformation passes using translation validation or “witness generation” techniques.


Aaron Bembenek (University of Melbourne)

Inferring Loop Invariants in Lifted Binaries using Constrained Horn Clause Solving

Inferring loop invariants is a key step for automating verification and, although loop invariant inference has been widely explored for source code and compiler intermediate languages, little work has been done on inferring loop invariants in lifted binaries, which lack high-level abstractions like program variables and types. Building on an existing framework that translates AArch64 binaries to Boogie code, we explore using Constrained Horn Clause (CHC) solving to infer loop invariants in assembly-like Boogie (“Basm”) programs. On the Basm encodings of binaries for 469 programs previously used to evaluate C loop invariant inference, CHC solving infers loop invariants for (or proves unsafe) 251 unoptimized binaries and 315 binaries compiled with optimizations, compared to 302 source programs solved by a leading C verification tool. Moreover, we show that CHC solving can infer loop invariants for Basm programs that encode sophisticated binary reasoning beyond functional correctness (e.g., rely/guarantee reasoning for information flow security in concurrent code). Finding that simplifications related to memory accesses are key to effective Basm loop invariant inference, we demonstrate how our CHC-based approach can be used to infer and validate memory regions in Basm programs. Our results demonstrate the feasibility of applying advanced reasoning techniques to lifted binaries and point to the potential to automate verification in scenarios where source code is unavailable or binary-level certification is necessary.


Sadra Bayat Tork (UQ)

BASIL-DSA: Context- and Field-Sensitive Memory Analysis for lifted binaries

Sound memory analysis is a requirement for reasoning about the correctness of many programs and program transforms. Pointer analysis, alias analysis, value analysis, indirect-call resolution, and related techniques aim to soundly over-approximate the memory locations and values used in program operations. Performing memory analyses on binaries presents an additional set of challenges compared to source code: the lack of distinct memory variables and types prevents the direct application of source-level analysis techniques without incurring a significant precision cost. We present BASIL-DSA, a context- and field-sensitive unification-based pointer analysis adapted from Data Structure Analysis (DSA). We introduce a new unification scheme based on incrementally discovering memory access intervals instead of using type information. Additionally, we parameterise the analysis to allow for different sets of assumptions on memory access behaviour.
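To illustrate interval-based unification in miniature (a toy sketch of the general idea, not BASIL-DSA's implementation): memory cells can be merged with union-find whenever their discovered access intervals overlap, with no type information involved.

```python
# Toy interval-based unification: cells merge when byte intervals overlap.
class Cell:
    def __init__(self, lo, hi):
        self.lo, self.hi, self.parent = lo, hi, self

def find(c):
    while c.parent is not c:
        c.parent = c.parent.parent          # path halving
        c = c.parent
    return c

def unify(a, b):
    ra, rb = find(a), find(b)
    if ra is rb:
        return ra
    # The merged cell covers the union of both access intervals.
    ra.lo, ra.hi = min(ra.lo, rb.lo), max(ra.hi, rb.hi)
    rb.parent = ra
    return ra

def record_access(cells, lo, hi):
    """Incrementally record an access [lo, hi); unify overlapping cells."""
    new = Cell(lo, hi)
    for c in list(cells):
        root = find(c)
        if root.lo < hi and lo < root.hi:   # intervals overlap
            new = unify(new, root)
    cells.append(new)
    return find(new)

cells = []
record_access(cells, 0, 8)            # e.g. a 64-bit store at offset 0
record_access(cells, 16, 24)          # a disjoint field stays separate
merged = record_access(cells, 4, 20)  # an overlapping access unifies both
print(merged.lo, merged.hi)           # 0 24
```

Because unification is driven purely by observed intervals, precision degrades gracefully when accesses straddle what source code would treat as field boundaries.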


Jack Stodart (ANU)

Interacting with Automatic Verifiers without Looking Under the Hood

Automatic software verifiers have an ambitious goal: quick formal verification of software at the level of source-code annotations. While successful, modern verifiers are still unable to automatically verify complex real-world programs, and when verification fails it can be incredibly difficult to interact with the verifier. Existing verifiers allow proof obligations to be completed manually, but these tools often use complex logics with several layers of translation, leaving the proof terms unreadable to humans. Instead, we would like ways to extend automatic verifiers with interactive proofs at the level of the source code. This talk compares new methods for doing this with a case study: verifying the Bellman-Ford algorithm in Viper, a state-of-the-art automatic verifier.


Horacio Mijail Anton Quiles (UQ)

DARUM: diagnosing and pinpointing slow verification in Dafny

Dafny is a verification-friendly language that translates its high-level constructs to assertions discharged by automated theorem provers (ATP) like Z3. This automation significantly flattens the learning curve compared to that of interactive theorem provers (ITP), making Dafny a welcoming entry ramp for mainstream developers. However, a common issue encountered by Dafny users is that verification turns *brittle* as projects grow: trivial code changes cause big verification slowdowns of unrelated code. This problem arises from the expanding verification context generated by Dafny, combined with the inherent randomness in the Z3 black box. The official advice is to find a balance between adding context to the code and reducing code size, but the toolchain offers virtually no insight into the process. Here, we show that, by exploiting the randomness and analyzing the resulting verification costs reported by Z3, we can
(1) distinguish brittleness from inherently costly verification;
(2) detect brittleness before it turns into an acute problem;
(3) pinpoint brittleness down to specific Dafny expressions;
(4) rank brittle code by potential for improvement.
To automate these analyses, we introduce our user-friendly tool DARUM and provide a couple of examples of its application. DARUM sheds some light on the otherwise blind approaches to brittleness management used to date, and its techniques might apply to other toolchains based on Z3 or other ATPs.
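As a rough illustration of exploiting randomness (our own sketch with made-up numbers, not DARUM's actual algorithm or output): re-verify under several random seeds, then rank assertions by the spread of their reported resource costs, since high variance across seeds is a symptom of brittleness while a high but stable cost is merely expensive.

```python
from statistics import mean, stdev

# Hypothetical per-assertion Z3 resource counts, measured under four
# different random seeds (made-up numbers for illustration only).
costs = {
    "Sort.lemma_sorted":   [1.1e6, 1.0e6, 1.2e6, 1.1e6],   # stable but costly
    "Sort.loop_invariant": [2.0e5, 9.4e6, 3.1e5, 7.8e6],   # brittle
}

def brittleness(samples):
    """Coefficient of variation: large spread across seeds => brittle."""
    return stdev(samples) / mean(samples)

# Rank assertions by potential for improvement.
for name in sorted(costs, key=lambda a: brittleness(costs[a]), reverse=True):
    print(f"{name}: cv = {brittleness(costs[name]):.2f}")
```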


Ralf Huuck (UNSW)

About Delivering Formal Methods to Millions of Developers. Weekly.

Formal methods have the stigma of being hard to learn, sometimes a bit obscure, and of limited industrial applicability. This presentation shines a light on some of the positive impact that formal reasoning and logic have had in making a real difference to software engineers. In particular, we show that many modern application security tools are based on formal-methods-inspired technology, are able to scale within narrowly defined domains, and are deployed frequently with high impact. We will give some background on our work with commercial static analysis vendors, on how we implement security rules for vulnerability detection in domain-specific languages, and on how those security rules get packaged and shipped. Moreover, we highlight some of the opportunities to work in this space and the shifting landscape of required skillsets.


Ian Hayes & Larissa Meinicke (UQ)

Restructuring a concurrent refinement algebra

The concurrent refinement algebra has been developed to support rely/guarantee reasoning about concurrent programs. The algebra supports atomic commands and defines parallel composition as a synchronous operation, as in Milner’s SCCS. In order to allow specifications to be combined, the algebra also provides a weak conjunction operation, which is also a synchronous operation that shares many properties with parallel composition. The three main operations, sequential composition, parallel composition and weak conjunction, all respect a (weak) quantale structure over a lattice of commands. Further structure involves combinations of pairs of these operations: sequential/parallel, sequential/weak conjunction and parallel/weak conjunction, each pair satisfying a weak interchange law similar to Concurrent Kleene Algebra. Each of these pairs satisfies a common biquantale structure. Additional structure is incorporated in the form of compatible sets of commands, including tests and atomic commands. These facilitate stronger (equality) interchange and distributive laws. This talk describes the result of restructuring the algebra to better exploit these commonalities. The algebra is implemented in Isabelle/HOL.
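For orientation, the interchange laws mentioned above have the same shape as the exchange law of Concurrent Kleene Algebra, which (in CKA's own inclusion order, with ∗ parallel and ; sequential) reads as follows; the refinement-algebra laws for parallel composition and weak conjunction are analogous in shape, and this statement is background context rather than a law reproduced from the talk:

```latex
(a \ast b) \,;\, (c \ast d) \;\le\; (a \,;\, c) \ast (b \,;\, d)
```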


Scott Heiner (UQ)

Cylindric biquantales: taking steps towards strong realisability in hybrid models

A classical example of a cylindric operator is existential quantification, which liberates the value of a variable in a single state. While cylindric operators originate in Boolean algebras, similar properties have been shown for lattice monoidal structures such as Kleene algebra. We take this algebraic structure further by specifying cylindric biquantales, and further still by specifying a time domain over which a cylindric biquantale liberates the values of variables. Previously, we used these cylindric operators over time domains to prove results about weak realisability, or feasibility, for hybrid systems, where a program can always produce an output for any given input. In this talk, we take some initial steps towards proving strong realisability, which enforces a delay between receiving an input and producing the corresponding output. This includes definitions of deterministic commands and δ-guarded commands.
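For reference, the classical axioms of a cylindrification operator $\mathsf{c}_x$ over a Boolean algebra (in the Henkin–Monk–Tarski tradition) are the properties the lattice-based generalisations aim to preserve:

```latex
\mathsf{c}_x\, 0 = 0
    % normality
p \;\le\; \mathsf{c}_x\, p
    % extensivity: quantification only liberates
\mathsf{c}_x (p \wedge \mathsf{c}_x q) \;=\; \mathsf{c}_x p \wedge \mathsf{c}_x q
    % modularity
\mathsf{c}_x\, \mathsf{c}_y\, p \;=\; \mathsf{c}_y\, \mathsf{c}_x\, p
    % commutativity of cylindrifications
```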


Annabelle McIver (Macquarie University)

Probabilistic Datatypes:
Automating verification for abstract probabilistic reasoning


Luigi D. C. Soares (Macquarie University)

A new measure for dynamic leakage based on quantitative information flow

Quantitative information flow (QIF) is concerned with assessing the leakage of information in computational systems. In QIF, there are two main perspectives for the quantification of leakage. On one hand, the static perspective considers all possible runs of the system in the computation of information flow, and is usually employed when preemptively deciding whether or not to run the system. On the other hand, the dynamic perspective considers only a specific, concrete run of the system that has been realised, while ignoring all other runs. The dynamic perspective is relevant for, e.g., system monitors and trackers, especially when deciding whether to continue or abort a particular run based on how much leakage has occurred up to a certain point. Although the static perspective of leakage is well developed in the literature, the dynamic perspective still lacks the same level of theoretical development. In this paper, we take relevant steps in bridging this gap by providing the following key contributions: (i) we provide a novel definition of dynamic leakage that decouples the adversary’s belief about the secret value from the secret’s real distribution; (ii) we demonstrate that our formalisation satisfies relevant information-theoretic axioms, including non-interference and relaxed versions of monotonicity and the data-processing inequality (DPI); (iii) we identify under what type of analysis strong versions of the axioms of monotonicity and the DPI might not hold, and explain the implications of this (counter-intuitive) outcome; (iv) we show that our definition of dynamic leakage is compatible with the well-established static perspective; and (v) we exemplify the use of our definition on the formalisation of privacy attacks against data releases.
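To make the dynamic perspective concrete, here is a standard QIF toy computation: Bayes vulnerability before and after one concrete observed run. This illustrates only the classical setting; the paper's new measure additionally decouples the adversary's belief from the secret's real distribution.

```python
from math import log2

# A 2-bit secret with a uniform prior; the channel reveals its first bit.
prior = {s: 1 / 4 for s in ("00", "01", "10", "11")}

def channel(secret):
    return secret[0]          # the observable output of a run

def posterior(prior, observed):
    """Bayesian update of the adversary's knowledge after one observation."""
    kept = {s: p for s, p in prior.items() if channel(s) == observed}
    total = sum(kept.values())
    return {s: p / total for s, p in kept.items()}

v_prior = max(prior.values())                 # 0.25: best blind guess
v_post = max(posterior(prior, "0").values())  # 0.50: after observing "0"
print(log2(v_post / v_prior))                 # 1.0 bit leaked by this run
```

A static analysis would average (or take the worst case) over both possible observations; the dynamic view above commits to the single output that actually occurred.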


Robert Colvin (UQ - DSTG)

Structural operational semantics for functional and security verification of pipelined processors (practice talk for CAV ’25)

The design of computer microarchitectures is a challenging task, involving a trade-off between efficiency and security. In the literature, two concerns which are affected by the details of the microarchitecture – formal assembly semantics, and vulnerability discovery and mitigation – are often addressed separately. In this paper we provide a structural operational semantics for pipelined microprocessors (Arm, x86, RISC-V) that is based on a regular understanding of microarchitectural features (pipelines, branch prediction, caches, privilege checks), conforms to established memory consistency models, and exposes the Spectre and Meltdown vulnerabilities. A key point is that the operational rules correspond to stages of the pipeline and are based almost entirely on syntactic aspects of fetched instructions, as is generally understood to be the case in real pipelines. We develop a model checker in Haskell based closely on the semantics, which we use to experimentally validate the model and to provide the basis for security analyses.


Sasha Pak (ANU)

Towards Extending Rust’s Type System: a Study of Formal Models

Rust aims to be a performant and secure systems programming language through a clever region-based type system. While effective in many cases, it can sometimes reject intuitive and safe code. To reduce reliance on common workarounds, future extensions are already being developed. However, making the type system more expressive without compromising soundness requires formal reasoning. Rust’s formal specifications are still under active development. In this talk, I review the current landscape of Rust formalizations to identify a suitable foundation for exploring and validating such extensions.


Zhe Hou (Griffith University)

An LLM agent for model checking using PAT

While LLMs exhibit exceptional language-understanding capability and adaptability, their unreliable outputs pose a major challenge. Conversely, formal methods (FMs) provide rigorous verification and reasoning techniques but have steep learning curves and scalability limitations. We have been exploring the integration of LLMs and FMs in many ways, and this work is a case study on model checking using PAT. We hope to see the day when industry can apply formal methods like other software engineering methods, without much of an entry hurdle or the need for expensive labour, and LLM agents have brought that day closer. With our prototype LLM PAT agent, the user can specify the problem in natural language and see it modelled and verified automatically with minimal human intervention. Our method uses a multi-LLM-agent architecture with the latest models to effectively solve the auto-formalisation problem in model checking.


Daniel Cumming (Runtime Verification)

Verification of Rust with K!

I will present an explanation and demo of our approach to defining the operational semantics of Stable MIR (an intermediate representation of Rust) in K. An advantage of defining operational semantics in K is that a partial semantics is still executable: if execution reaches a production that no rule pattern-matches, the interpreter / symbolic executor gets stuck. Since Stable MIR has no semantic specification, we take an example- and test-driven approach to implementing the semantics. We also apply a constructor to any term the semantics does not currently support; the wrapped term can never be evaluated and is not considered a valid term of any evaluation sort. This delays halting to the point where an operation actually requires the unimplemented term's value, rather than the point where the term is introduced, providing more information for defining the semantics of the term's use.