Symbolic Execution of x86 assembly in Isabelle/HOL
Freek Verbeek, Abhijith Bharadwaj, Joshua Bockenek, Ian Roessle, Binoy Ravindran
• Symbolic execution = instruction semantics + rewrite rules
• As a proof technique
• Requires reasoning over memory regions
• Per-block symbolic execution can be used in Hoare-style reasoning over assembly
Symbolic Execution of x86-64
$s' = s\llparenthesis\, [\mathit{rsp} - 16] := 1_{64},\ [\mathit{rsp} - 24] := 2_{32},\ \mathit{RAX} := 1_{64} \,\rrparenthesis$
mov QWORD PTR [rsp-16], 1
mov DWORD PTR [rsp-24], 2
mov rax, QWORD PTR [rsp-16]
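A minimal Python sketch of the idea behind the three-instruction example, assuming a toy state model rather than the Isabelle/HOL formalization: memory regions are keyed by (offset from rsp, size in bytes), and `overlaps`, `read`, `mov`, and `run` are hypothetical helpers introduced only for illustration.

```python
def overlaps(region_a, region_b):
    """True if two (offset, size) regions relative to rsp share a byte."""
    (off_a, size_a), (off_b, size_b) = region_a, region_b
    return off_a < off_b + size_b and off_b < off_a + size_a


def read(state, operand):
    """Evaluate an operand; unknown registers and memory stay symbolic names."""
    kind = operand[0]
    if kind == "imm":
        return operand[1]
    if kind == "reg":
        return state["regs"].get(operand[1], operand[1] + "_0")
    offset, size = operand[1], operand[2]
    return state["mem"].get((offset, size), f"mem[rsp{offset:+d}, {size}]")


def mov(state, dst, src):
    """Symbolically execute `mov dst, src` and return the updated state."""
    value = read(state, src)
    regs, mem = dict(state["regs"]), dict(state["mem"])
    if dst[0] == "reg":
        regs[dst[1]] = value
    else:
        region = (dst[1], dst[2])
        # A write invalidates every stored region it overlaps with.
        mem = {r: v for r, v in mem.items() if not overlaps(r, region)}
        mem[region] = value
    return {"regs": regs, "mem": mem}


# The three instructions from the slide, as (destination, source) pairs.
PROGRAM = [
    (("mem", -16, 8), ("imm", 1)),      # mov QWORD PTR [rsp-16], 1
    (("mem", -24, 4), ("imm", 2)),      # mov DWORD PTR [rsp-24], 2
    (("reg", "rax"), ("mem", -16, 8)),  # mov rax, QWORD PTR [rsp-16]
]


def run(program):
    state = {"regs": {}, "mem": {}}
    for dst, src in program:
        state = mov(state, dst, src)
    return state


final = run(PROGRAM)
print(final)
```

In this sketch the final read of [rsp-16] yields 1 only because the write to [rsp-24] does not overlap it; that non-overlap check is the place where a formal proof of separation would be needed.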
Formal x86 semantics
x86-64 machine code:
1. Comprehensive formal model of x86-64 semantics
2. Testing setup for formal semantics
3. Symbolic execution engine
Machine-learned Semantics
[1] Heule et al.: Stratified Synthesis: Automatically Learning the x86-64 Instruction Set (PLDI’16)
[2] Roessle et al.: Formally Verified Big Step Semantics out of x86-64 Binaries (CPP’19)
Test Lemmas
Example: sub r32, m32
$r64 := \mathrm{zextend}(op_1 - op_2)$
$ZF := op_1 = op_2$
$CF := op_1 < op_2$
$SF := \mathrm{sint}(op_1 - op_2) < 0$
$OF := (op_1 \ge 2^{31}) \leftrightarrow (op_1 < 2^{31}) \leftrightarrow (\mathrm{sint}(op_1 - op_2) \ge 0)$
where $op_1 = \langle 31, 0 \rangle(r(r64, \sigma))$, $op_2 = rmem(a, 4, \sigma)$
$r64 := 0_{32} \mathbin{\hat{}} \langle 31, 0 \rangle(0_1 \mathbin{\hat{}} \neg rmem(a, 4, \sigma) + 1_{33} + 0_1 \mathbin{\hat{}} \langle 31, 0 \rangle(r(r64, \sigma)))$
$ZF := \langle 31, 0 \rangle(0_1 \mathbin{\hat{}} \neg rmem(a, 4, \sigma) + 1_{33} + 0_1 \mathbin{\hat{}} \langle 31, 0 \rangle(r(r64, \sigma))) == 0_{32}$
$CF := \langle 32, 32 \rangle(0_1 \mathbin{\hat{}} \neg rmem(a, 4, \sigma) + 1_{33} + 0_1 \mathbin{\hat{}} \langle 31, 0 \rangle(r(r64, \sigma))) == 1_{32}$
$SF := \langle 31, 31 \rangle(0_1 \mathbin{\hat{}} \neg rmem(a, 4, \sigma) + 1_{33} + 0_1 \mathbin{\hat{}} \langle 31, 0 \rangle(r(r64, \sigma))) == 1_{32}$
$OF := \neg \langle 31, 31 \rangle(rmem(a, 4, \sigma)) == 1_1 \leftrightarrow \langle 31, 31 \rangle(r(r64, \sigma) == 1_1) \wedge \neg(\neg(\langle 31, 31 \rangle(rmem(a, 4, \sigma))) == 1_1 \leftrightarrow \langle 31, 31 \rangle(0_1 \mathbin{\hat{}} \neg rmem(a, 4, \sigma) + 1_{33} + 0_1 \mathbin{\hat{}} \langle 31, 0 \rangle(r(r64, \sigma))) == 1_1)$
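The readable semantics of `sub r32, m32` can be sketched in Python and cross-checked against an adder-style formulation, in the spirit of a test lemma. This is an illustrative sketch, not the Isabelle/HOL development: the function names are hypothetical, OF is computed with the standard signed-overflow definition (an assumption made here), and only the 32-bit result of `op1 + ~op2 + 1` is compared.

```python
MASK32 = (1 << 32) - 1


def sint32(x):
    """Interpret a 32-bit word as a signed two's-complement integer."""
    return x - (1 << 32) if x & (1 << 31) else x


def sub_r32_m32(op1, op2):
    """Result and flags of a 32-bit subtraction op1 - op2."""
    result = (op1 - op2) & MASK32
    signed_diff = sint32(op1) - sint32(op2)
    return {
        "r64": result,  # upper 32 bits of the 64-bit register are zero
        "ZF": op1 == op2,
        "CF": op1 < op2,
        "SF": sint32(result) < 0,
        # Assumed definition: the signed difference does not fit in 32 bits.
        "OF": not (-(1 << 31) <= signed_diff < (1 << 31)),
    }


def sub_via_adder(op1, op2):
    """32-bit result computed as op1 + (bitwise not op2) + 1."""
    return (op1 + (~op2 & MASK32) + 1) & MASK32


# A tiny "test lemma": both formulations agree on a handful of inputs.
SAMPLES = [0, 1, 5, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
agree = all(
    sub_r32_m32(a, b)["r64"] == sub_via_adder(a, b)
    for a in SAMPLES
    for b in SAMPLES
)
print(agree)
```

Testing on samples only builds confidence; it does not replace a proof that the two formulations are equivalent for all inputs.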
• Machine-learned semantics for 1625 instruction variants
• Floating-point operations supported (non-executable)
• Embedded into Isabelle/HOL
Formal x86 semantics
0000000000003e90 <dump_snapshots>:
  3e90: push r12
  3e92: push rbp
  3e93: push rbx
  3e94: sub rsp,0x120
  3e9b: mov rax,QWORD PTR fs:0x28
  3ea4: mov QWORD PTR [rsp+0x118],rax
  3eac: xor eax,eax
  3eae: lea rsi,[rsp+0x8]
  3eb3: call 19cc0 <bdrv_snapshot_list>
  3eb8: test eax,eax
  3eba: jle 3f33 <dump_snapshots+0xa3>
  3ebc: mov ebx,eax
  3ebe: lea rdi,[rip+0x1914e]
  3ec5: lea r12,[rsp+0x10]
  3eca: call 3130 <puts@plt>
  3ecf: lea ebp,[rbx-0x1]
  3ed2: xor edx,edx
  3ed4: mov esi,0x100
  3ed9: mov rdi,r12
  3edc: add rbp,0x1
  3ee0: xor ebx,ebx
  3ee2: call 19e60 <bdrv_snapshot_dump>
  3ee7: imul rbp,rbp,0x198
  3eee: mov rdi,rax
  3ef1: call 3130 <puts@plt>
  3ef6: nop WORD PTR cs:[rax+rax*1+0x0]
  3f00: mov rdx,QWORD PTR [rsp+0x8]
  3f05: mov esi,0x100
  3f0a: mov rdi,r12
  3f0d: add rdx,rbx
  3f10: add rbx,0x198
  3f17: call 19e60 <bdrv_snapshot_dump>
  3f1c: mov rdi,rax
  3f1f: call 3130 <puts@plt>
  3f24: cmp rbx,rbp
  3f27: jne 3f00 <dump_snapshots+0x70>
  3f29: mov rdi,QWORD PTR [rsp+0x8]
  3f2e: call 5960 <qemu_free>
  3f33: mov rax,QWORD PTR [rsp+0x118]
  3f3b: xor rax,QWORD PTR fs:0x28
  3f44: jne 3f52 <dump_snapshots+0xc2>
  3f46: add rsp,0x120
  3f4d: pop rbx
  3f4e: pop rbp
  3f4f: pop r12
  3f51: ret
  3f52: call 31c0 <__stack_chk_fail@plt>
  3f57: nop WORD PTR [rax+rax*1+0x0]
Apply Hoare logic to assembly
1. Extract control flow
Control Flow Extraction
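One common way to extract control flow is to split the instruction stream into basic blocks at jump targets and after jumps. The sketch below assumes that approach (the slides do not say which algorithm is used), works on an abridged excerpt of the `dump_snapshots` listing, and treats calls as ordinary instructions; `basic_blocks` and `TERMINATORS` are hypothetical names.

```python
# Abridged excerpt: (address, mnemonic, jump target or None).
INSTRUCTIONS = [
    (0x3EB8, "test", None),
    (0x3EBA, "jle", 0x3F33),
    (0x3EBC, "mov", None),
    (0x3F00, "mov", None),
    (0x3F24, "cmp", None),
    (0x3F27, "jne", 0x3F00),
    (0x3F29, "mov", None),
    (0x3F33, "mov", None),
    (0x3F51, "ret", None),
]

# Assumption: only these mnemonics end a basic block in this sketch.
TERMINATORS = {"jle", "jne", "jmp", "ret"}


def basic_blocks(instrs):
    """Split instructions into basic blocks using the leader rule."""
    leaders = {instrs[0][0]}
    for index, (_, mnemonic, target) in enumerate(instrs):
        if mnemonic in TERMINATORS:
            if target is not None:
                leaders.add(target)  # a jump target starts a block
            if index + 1 < len(instrs):
                leaders.add(instrs[index + 1][0])  # so does the fall-through
    blocks, current = [], []
    for instr in instrs:
        if instr[0] in leaders and current:
            blocks.append(current)
            current = []
        current.append(instr)
    blocks.append(current)
    return blocks


blocks = basic_blocks(INSTRUCTIONS)
for block in blocks:
    print([hex(addr) for addr, _, _ in block])
```

Each resulting block is straight-line code, which is the unit over which a Hoare triple is formulated in the next step.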
Apply Hoare logic to assembly
1. Extract control flow
2. Formulate a Hoare triple over each basic block
Basic block:
Introduction rule:
Formal Symbolic Execution requires memory reasoning
$s' = s\llparenthesis\, [\mathit{rsp} - 16] := 1_{64},\ [\mathit{rsp} - 24] := 2_{32},\ \mathit{RAX} := 1_{64} \,\rrparenthesis$
mov QWORD PTR [rsp-16], 1
mov DWORD PTR [rsp-24], 2
mov rax, QWORD PTR [rsp-16]
Requires a proof that the two memory regions are separate:
$\mathit{rsp} - 16 + 8 \le \mathit{rsp} - 24 \;\lor\; \mathit{rsp} - 24 + 4 \le \mathit{rsp} - 16$
• Linear equations modulo $2^{64}$
• Address computations can become very complicated
  • Multi-dimensional arrays
  • Structs
  • Pointer arithmetic
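The separation condition can be tried out on concrete addresses. The sketch below is a hedged illustration, not the prover-backed reasoning of the talk: `separate_mod` and `bytes_of` are hypothetical helpers, and the second example shows one reason arithmetic modulo 2^64 needs care, since the inequality alone can be satisfied by a region that wraps around the end of the address space and still overlaps.

```python
M = 1 << 64  # size of the 64-bit address space


def separate_mod(a, size_a, b, size_b):
    """The slide's condition, evaluated with addresses taken modulo 2^64."""
    a, b = a % M, b % M
    return (a + size_a) % M <= b or (b + size_b) % M <= a


def bytes_of(a, size):
    """Ground truth: the set of byte addresses a region occupies."""
    return {(a + i) % M for i in range(size)}


# The regions from the example, for one arbitrary concrete stack pointer.
rsp = 0x7FFC0000
stack_ok = separate_mod(rsp - 16, 8, rsp - 24, 4)
stack_disjoint = not (bytes_of(rsp - 16, 8) & bytes_of(rsp - 24, 4))

# A wrapping region: the inequality holds although two bytes are shared.
wrap_ok = separate_mod(M - 2, 4, 0, 8)
wrap_shared = bytes_of(M - 2, 4) & bytes_of(0, 8)

print(stack_ok, stack_disjoint, wrap_ok, sorted(wrap_shared))
```

For the stack example both checks agree. For the wrapping example they disagree, so a sound use of the inequality needs an additional no-wraparound assumption on the regions involved.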
Basic block:
Hoare Triple for basic block:
Introduction rule:
Demo
Apply Hoare logic to assembly
1. Extract control flow
2. Formulate a Hoare triple over each basic block
3. Compose proof over whole function using Hoare logic
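The composition step can be illustrated with the sequence rule: from {P} b1 {Q} and {Q} b2 {R} one obtains {P} b1; b2 {R}. The Python sketch below only tests triples on sample states, which is an assumption-laden stand-in for a proof; `holds`, `seq`, and the two toy blocks (modelled on `xor ebx,ebx` and `add rbx,0x198` from the listing) are hypothetical.

```python
def holds(pre, block, post, samples):
    """Test {pre} block {post} on sample states (testing, not a proof)."""
    return all(post(block(dict(s))) for s in samples if pre(s))


def seq(*blocks):
    """Sequential composition of blocks, each a function from state to state."""
    def run(state):
        for block in blocks:
            state = block(state)
        return state
    return run


# Toy blocks over a state that only tracks rbx.
def block1(state):  # xor ebx,ebx  (a 32-bit write also clears the upper half)
    return {**state, "rbx": 0}


def block2(state):  # add rbx,0x198
    return {**state, "rbx": (state["rbx"] + 0x198) % 2**64}


P = lambda s: True
Q = lambda s: s["rbx"] == 0
R = lambda s: s["rbx"] == 0x198

SAMPLES = [{"rbx": v} for v in (0, 1, 0x198, 2**64 - 1)]

first = holds(P, block1, Q, SAMPLES)
second = holds(Q, block2, R, SAMPLES)
composed = holds(P, seq(block1, block2), R, SAMPLES)
print(first, second, composed)
```

The intermediate assertion Q plays the role of the postcondition of one block and the precondition of the next, which is what makes per-block results composable.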
Hoare Logic with Memory Usage
Demo
Conclusion
• Symbolic execution of x86-64 assembly in Isabelle/HOL
• Based on machine-learned instruction semantics
• Generate the proof code:
  • Generate assumptions on memory layout (using the Z3 theorem prover)
  • Generate assumptions on called functions
  • Generate pre- and postconditions
  • Generate invariants
  • Generate information on memory usage (for compositionality)
• Final proof is manual in case of loops
Future Work
• Deal with indirect branching
• Combine with pointer analysis
• Formally prove correctness of diversified binaries
• …