DE LA RECHERCHE À L'INDUSTRIE







# FORMAL VERIFICATION OF A SOFTWARE COUNTERMEASURE AGAINST INSTRUCTION SKIP ATTACKS

Karine Heydemann<sup>1</sup>, **Nicolas Moro**<sup>1,2</sup>, Emmanuelle Encrenaz<sup>1</sup>, Bruno Robisson<sup>2</sup>

#### <sup>1</sup> LIP6 - UPMC

Laboratoire d'Informatique de Paris 6 Université Pierre et Marie Curie

<sup>2</sup>**CEA** Commissariat à l'Energie Atomique et aux Energies Alternatives

PROOFS 2013 – AUGUST 24, SANTA BARBARA, USA





- Target: Fault attacks on embedded programs
- Fault model: assembly instruction skip



• A large set of harmful attacks may be possible with such a fault model



How can we **ensure a correct execution** of the embedded program with a **possible instruction skip fault**?

1 – Provide a fault-tolerant replacement sequence for each instruction

2 – Provide a formal proof for this fault tolerance





## I. Considered fault model

## **II.** Countermeasure scheme

## **III.** Formal proof of fault tolerance

## **IV. Application to an AES implementation**

## V. Conclusion







- Modify the circuit's environment to change its computation
- Many physical ways to inject such faults into a circuit
- Every fault injection means has a specific fault model





- We assume an attacker can skip an assembly instruction
- Observed for different architectures and injection means

| Fault injection means  | Reference                                    |
|------------------------|----------------------------------------------|
| Clock glitches         | Balasch et al. (FDTC 2011)                   |
| Voltage underfeeding   | Barenghi <i>et al.</i> (J. Syst. Soft. 2013) |
| Electromagnetic pulses | Dehbaoui et al. (FDTC 2012)                  |
| Laser                  | Trichina et al. (FDTC 2010)                  |

 In our own experiments, instruction skips are specific cases of instruction replacements (FDTC 2013)

# **Ceal How to deal with double faults**?

## Double faults are practical under some specific constraints

(a few tens of clock cycles, at a few KHz frequency)



 We assume performing a double fault on two consecutive clock cycles is significantly harder to do in practice

Usual redundancy countermeasures need to be adapted





- I. Considered fault model
- ➡ II. Countermeasure scheme
  - **III. Formal proof of fault tolerance**
  - **IV. Application to an AES implementation**
  - V. Conclusion



- Propose a fault-tolerant replacement scheme for each assembly instruction of the whole instruction set
- Each encoding has its own fault-tolerant replacement sequence

#### ARM Thumb2 instruction set

- -16/32 bit **RISC instruction set**
- 151 instructions
- Each instruction has up to 4 encodings

(depends on the operands, the registers used, conditional execution, ...)

## **Cea** Three Classes of Instructions



| Class                   | Examples                                      |
|-------------------------|-----------------------------------------------|
| Idempotent instructions | mov r1,r8<br>add r3,r1,r2                     |
| Separable instructions  | add r1,r1,#1<br>push {r4,r5,r6}               |
| Specific instructions   | bl <function><br/><b>IT blocks</b></function> |

#### **Standard code**

| ADD | R1,                                          | R1,  | #1 |
|-----|----------------------------------------------|------|----|
| CMP | R1,                                          | #9   |    |
| в   | <la!< th=""><th>bel&gt;</th><th></th></la!<> | bel> |    |

#### Code with replacement sequences

| ADD | R12, R1,        | #1 |
|-----|-----------------|----|
| ADD | R12, R1,        | #1 |
| MOV | R1, R12         |    |
| MOV | R1, R12         |    |
| CMP | R1, #9          |    |
| CMP | R1, #9          |    |
| В   | <label></label> |    |
| В   | <label></label> |    |
|     |                 |    |





Instructions that have the same effect if executed once or twice
 Simple instruction duplication

| Instruction                                    | Replacement sequence |
|------------------------------------------------|----------------------|
| mov r1, r8                                     | mov r1,r8            |
| (copies r8 into r1)                            | mov r1,r8            |
| ldr r1, [r8, r2]                               | ldr r1, [r8, r2]     |
| (loads the value at the address r8+r2 into r1) | ldr r1, [r8, r2]     |
| str r3, [r2, #10]                              | str r3, [r2, #10]    |
| (stores r3 at the address r2+10)               | str r3, [r2, #10]    |
| add r3,r1,r2                                   | add r3,r1,r2         |
| (puts r1+r2 into r3)                           | add r3,r1,r2         |

Doubles the code size, doubles the execution time





- Not idempotent, with a source register also destination
- ... but can be replaced by a sequence of idempotent instructions

| ADD R1, R1, #1                                                                 | <pre>PUSH {r1, r2, r3, lr} (equivalent to STMDB sp!, {r1, r2, r3, lr})</pre>                                               |  |
|--------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------|--|
| <pre>ADD r12, r1, #1 ADD r12, r1, #1 MOV r1, r12 MOV r1, r12 MOV r1, r12</pre> | <pre>STMDB sp, {r1, r2, r3, lr} STMDB sp, {r1, r2, r3, lr} SUB r12, sp, #16 SUB r12, sp, #16 MOV sp, r12 MOV sp, r12</pre> |  |

- Variable overhead cost in code size and performance
- Need for an available free register





- Some instructions cannot be easily replaced by such a sequence
- Branch instructions can be duplicated, but not subroutine calls (otherwise those subroutines would be executed twice)





#### Instructions that read and write the flags

- A replacement sequence can be found
  - if the flags are not alive
- Otherwise:
  - forbid the use of those instructions
  - use a fault detection sequence

#### IT blocks for conditional execution

- Convert IT blocks into branch-based if/then/else structures
- Then apply the individual countermeasure scheme
- A more tricky replacement is possible by keeping the IT structure

but it can quickly become very costly

| mrs  | r12 , APSR   |
|------|--------------|
| mrs  | r12 , APSR   |
| adcs | r1 , r2 , r3 |
| msr  | APSR, r12    |
| msr  | APSR, r12    |
| adcs | r1 , r2 , r3 |





- A fault-tolerant replacement sequence for all the instructions (except for the ones that read and write the flags)
- Can be directly applied as a transformation to an assembly code

→ Can we prove the replacement sequences are fault-tolerant?

→ Can we prove they are **equivalent to the initial instructions**?

We use a **model-checking approach** for such a proof





- I. Considered fault model
- II. Countermeasure scheme
- III. Formal proof of fault tolerance
  - **IV. Application to an AES implementation**
  - V. Conclusion





- Model-checking aims at ensuring an implementation satisfies a specification
- Specifications can be expressed with temporal logic formulas







- Model-checking approach at an instruction scale
- Specific construction of a state machine with an instruction and its replacement sequence
- We need to prove that the output state of a replacement sequence is equivalent to the output state of the initial instruction







- Each instruction is a function that maps a registers and memory configuration to a new registers and memory configuration
- A sequence of instructions is modeled as a transition system
- States are registers and memory configurations
- Transitions mimic the state transformations applied by the instructions
- Each instructions has specific properties that need to be proved

 $\mathcal{D}$  PROOF SYSTEM FOR THE BL INSTRUCTION  $\mathbb{P}$ 









- Flags are not equal if a fault targets the last ADCS instruction
- LIGHT RESULT releases this constraint

```
# MC: formula passed --- AG(AF(adcs.pc=PC1))
# MC: formula passed --- AG(AF(cm(adcs).pc=PC1))
# MC: formula passed --- AG(((adcs.pc=PC1 * cm(adcs).pc=PC1) -> LIGHT_RESULT=1))
# MC: formula failed --- AG(((adcs.pc=PC1 * cm(adcs).pc=PC1) -> RESULT=1))
```





- I. Considered fault model
- II. Countermeasure scheme
- **III. Formal proof of fault tolerance**
- IV. Application to an AES implementation
  - V. Conclusion



- Round keys calculated before each AddRoundKey operation
- Possible optimization: last two rounds with countermeasure

| Implementation                | Clock cycles           | Code size            |
|-------------------------------|------------------------|----------------------|
| AES - without countermeasure  | 9595                   | 490 bytes            |
| AES - whole code with CM      | 20503 (+113.7%)        | 1480 bytes (+202%)   |
| AES – last two rounds with CM | 11374 <b>(+18.6 %)</b> | 1874 bytes (+282.5%) |

#### → High overhead cost,

but **comparable** to the cost brought by usual redundancy approaches

→ Enables a fault tolerance at a cheaper cost compared to triplication





- I. Considered fault model
- II. Countermeasure scheme
- **III. Formal proof of fault tolerance**
- **IV. Application to an AES implementation**

# ➡ V. Conclusion





- Fault-tolerant countermeasure scheme
- Tolerant to multiple fault that do not target two consecutive instructions
- **Proof** of fault tolerance and equivalence to the initial instructions
- Adds a reasonably good security level, without any hardware countermeasure
- Cost comparable to usual algorithm-level redundancy schemes



PROOFS 2013 – Santa Barbara, USA | PAGE 24



Ce2 THANK YOU FOR YOUR ATTENTION



# Any questions ?