From vulnerability discovery to code execution
Exploiting Alpine Linux
By Ariel Zelivansky, Security Researcher
What is Alpine Linux?
● Lightweight Linux distribution
● Alpine’s motto: Small, simple and secure
● Alpine docker image only 5 MB in size
● Security in mind
○ The kernel is patched with a port of grsecurity/PaX
○ Userspace binaries compiled as PIE, NX enabled, full RELRO,
with stack smashing protection
Who uses Alpine?
● Alpine has become widely popular for use with containers (10M+ pulls)
● Many Docker images are now based on Alpine
● Docker has officially stated their support of Alpine
Researching Alpine
● What does an Alpine container consist of?
○ musl libc
○ busybox userspace binaries
○ apk-tools
● What do people do with Alpine containers?
○ Download more programs!
○ apk - Alpine’s package manager
Apk
● A tool to install, upgrade and delete packages (aka a package manager)
● Historically a collection of shell scripts, now written in C
● To add a package - apk update and apk add [name]
○ Or just apk add [name] -U/--update
● Can I somehow alter packages or convince apk to downgrade packages?
Apk
● Documentation first (Alpine’s wiki)
○ /etc/apk/repositories - list of local/remote repositories
○ By default with docker image - plain http
● Prone to MITM attack
● Fortunately, an attack is not so simple
○ Packages are signed
○ See /etc/apk/keys
● What about update?
○ “A repository is simply a directory with a collection of *.apk files. The directory must include a
special index file, named APKINDEX.tar.gz to be considered a repository.”
○ Update essentially downloads and parses the APKINDEX.tar.gz file
Apk
● Signature inside archive?
● Sounds like fuzzing time
○ What’s fuzzing?
○ american fuzzy lop (afl-fuzz)
■ Finds lots of bugs (and vulnerabilities) in open source software projects
■ Compile with afl-gcc to instrument the binary
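As a rough illustration of what a fuzzer does, here is a minimal sketch of a mutation fuzzer in Python. It is not how afl-fuzz works internally (afl-fuzz is coverage-guided and far more capable); `toy_parser`, `mutate`, and `fuzz` are hypothetical names, and the planted bug is invented for the example.

```python
import random


def toy_parser(data: bytes) -> int:
    """Hypothetical target: a length-prefixed record parser with a planted bug."""
    if len(data) < 2:
        return 0
    declared = data[0]
    payload = data[1:]
    if declared > len(payload):
        # Planted bug: the parser trusts the declared length.
        # Raising here stands in for a crash in a native program.
        raise IndexError("read past end of payload")
    return sum(payload[:declared])


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte of the seed input."""
    buf = bytearray(seed)
    pos = rng.randrange(len(buf))
    buf[pos] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed: bytes, rounds: int = 2000, rng_seed: int = 0) -> list:
    """Feed mutated inputs to the target and collect the ones that crash it."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(rounds):
        candidate = mutate(seed, rng)
        try:
            toy_parser(candidate)
        except Exception:
            crashes.append(candidate)
    return crashes


crashes = fuzz(b"\x03abc")
print(f"{len(crashes)} crashing inputs found")
```

Every crashing input here has a corrupted length byte, which is the kind of pattern a triage step would then group together.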
Apk
● Clone apk-tools from Alpine’s git repository
● Empty README
● Relevant code seems likely to be in update.c
● main is in apk.c
● After inspecting the code for a while, it appears each action is defined as an applet
Apk
● update.c doesn't seem to do anything
○ Actual code in database.c looks for
APK_UPDATE_CACHE flag
○ After briefly learning the code, I was ready to fuzz it
● Writing my own applet
○ Read data from file (fuzzer will provide)
○ Call apk_bstream_from_file to read the file
○ Call apk_db_index_read with the data
○ Define applet, add to Makefile
● Running afl inside docker container
○ Easy to set up and reproduce
Fuzzing Apk
● Fuzzer does nothing
● Tried fuzzing various other functions, tweaking the code to allow fuzzing
● Finally, decided on fuzzing apk_tar_parse
○ Looks promising
Fuzzing Apk
● Fuzzing was very slow in my experience
● Diving into the code again
○ Removed anything I don’t need that might slow down the fuzzer
○ init_openssl
○ apk_db_init / apk_db_open
● Fuzz time
Fuzzing Apk
● Multiple crashes
● Triaging crashes with crashwalk
○ Runs through all crashes and identifies the crash type
○ Suggests whether each is exploitable
○ My final summary showed 6 distinct crashes
Reproducing the crash
● So far I was only able to reach the crashes in my modified code
● To reproduce with the real apk, I used a crash as a bad tar.gz file
○ cat crash | gzip -9 > ~/docker/files/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
○ Served the file from my local server
○ docker run -ti --add-host dl-cdn.alpinelinux.org:172.17.0.2 alpine:3.6
○ Upon running apk update, a segfault occurred!
● After a debugging session with gdb, I determined the origin of the crash
Explaining the bugs
● The result is two (similar) heap overflow vulnerabilities
● Let’s examine the relevant code (inside archive.c)
● Tar consists of blocks of 512 bytes, starting with a tar header block for each file
○ Reads tar stream in chunks, runs callback function on each chunk
● One of the fields of the header is a typeflag
○ One of its uses is to indicate special blocks, such as the “GNU long name extension”
○ This extension indicates that the following block holds the name of the file (the header’s
name field holds only 100 bytes otherwise)
● How is this implemented?
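To make the tar layout above concrete, here is a small sketch that uses Python's standard `tarfile` module to write a GNU-format archive with a name longer than 100 bytes, then reads the raw 512-byte header blocks back. The helper names `make_gnu_tar` and `parse_header` are made up for this example, and the sketch says nothing about how apk parses tar.

```python
import io
import tarfile

BLOCK = 512  # tar block size in bytes


def make_gnu_tar(name: str, payload: bytes) -> bytes:
    """Write a one-file GNU-format tar archive into memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def parse_header(block: bytes) -> dict:
    """Pull three fields out of a raw tar header block."""
    return {
        "name": block[0:100].rstrip(b"\0").decode("ascii"),
        # The size field is an octal number stored as ASCII text.
        "size": int(block[124:136].rstrip(b"\0 ") or b"0", 8),
        "typeflag": block[156:157],
    }


long_name = "dir/" + "a" * 150           # longer than the 100-byte name field
archive = make_gnu_tar(long_name, b"hello")

first = parse_header(archive[:BLOCK])             # special long-name header
name_block = archive[BLOCK:2 * BLOCK]             # block carrying the full name
second = parse_header(archive[2 * BLOCK:3 * BLOCK])  # the real file header

print(first["typeflag"], first["size"])
print(second["typeflag"], second["size"])
```

The first header carries typeflag `L` and a size equal to the name length plus a terminating NUL; the header for the file itself follows, with the name truncated to 100 bytes.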
Explaining the bugs
● Uses blob_realloc to allocate the buffer for the name
Explaining the bugs
● int is naturally signed
○ b->len is long, also signed
○ The comparison is signed
● Any size of 0x80000000 or more (above the signed 32-bit maximum, 0x7fffffff) becomes
negative as an int, so the buffer is left unmodified
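The effect of the signed comparison can be sketched in Python by emulating a 32-bit `int` with `ctypes.c_int32`. `naive_realloc` below is a hypothetical stand-in written for this illustration, not the actual blob_realloc code from apk-tools.

```python
import ctypes


def to_int32(value: int) -> int:
    """Emulate the implicit conversion of a value to a signed 32-bit int."""
    return ctypes.c_int32(value).value


def naive_realloc(buf: bytearray, requested: int) -> bytearray:
    """Hypothetical helper: grow the buffer only if it is too small."""
    size = to_int32(requested)   # size parameter is a signed int
    if len(buf) >= size:         # signed comparison: always true for negative sizes
        return buf               # buffer returned unmodified
    return bytearray(size)


small = bytearray(16)
same = naive_realloc(small, 0x80000000)   # wraps to a negative int
grown = naive_realloc(small, 64)          # ordinary request

print(to_int32(0x80000000), len(same), len(grown))
```

The huge request leaves the 16-byte buffer untouched, while the caller goes on believing it has room for the requested size.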
Explaining the bugs
● In the following call to is->read, a huge number of bytes is copied to the buffer
○ AKA Heap overflow
○ As long as is->read accepts the size as unsigned
○ In the case of a tar.gz, is->read is gzi_read which accepts size_t (unsigned)
Explaining the bugs
● So to fix, make blob_realloc accept size_t!
○ Yes, but also make sure entry.size is not max int (because a +1 would overflow it)
● A similar bug occurred with a pax header block (another special block)
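A minimal sketch of the kind of check described above, written in Python for illustration only. It is not the actual apk-tools patch; `checked_name_size` and the choice of a 32-bit `INT_MAX` limit are assumptions.

```python
INT_MAX = 2**31 - 1  # assumed maximum of a signed 32-bit int


def checked_name_size(entry_size: int) -> int:
    """Return the allocation size for a name plus its NUL terminator, or raise."""
    # Reject negative sizes and sizes where adding 1 would overflow an int.
    if entry_size < 0 or entry_size >= INT_MAX:
        raise ValueError("refusing suspicious name size")
    return entry_size + 1


print(checked_name_size(100))
```

Rejecting the size before any allocation or read happens means the later copy never sees a length the buffer cannot hold.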
Developing an exploit
● I built a minimalistic tar file
○ To trigger the bug, I put a longname block with a
negative size
○ In tar, the size is an octal number in ASCII; I went with
0o77777777777 (which truncates to -1 as a signed 32-bit integer)
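The arithmetic behind that size value can be checked with a few lines of Python, using `ctypes` to emulate fixed-width integers. That the parsed value ends up in a signed 32-bit integer and is later widened to a 64-bit type is taken from the slides; the variable names are made up for this sketch.

```python
import ctypes

size_field = b"77777777777"            # 11 octal digits, as stored in the header

raw = int(size_field, 8)               # value of the ASCII octal text
as_int32 = ctypes.c_int32(raw).value   # truncated to a signed 32-bit integer
as_uint64 = ctypes.c_uint64(as_int32).value  # sign-extended to 64 bits, viewed unsigned

print(hex(raw), as_int32, hex(as_uint64))
```

Eleven octal sevens are 33 one-bits, so the low 32 bits are all ones, which reads as -1 when signed and as 0xffffffffffffffff once sign-extended to 64 bits.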
Developing an exploit
● The execution crashed as expected
○ The crash was on the copy of a null-terminating zero meant for the entry.name buffer
○ entry.name was not allocated, so it pointed to null
○ entry.size was 0xffffffffffffffff (it was implicitly converted to 64-bit, it’s of type off_t)
Developing an exploit
● I created another file, with two blocks
○ First block to allocate the buffer with a size I want
○ Second block exploits the vulnerability against the allocated buffer
● Debugging the execution, it seems everything goes as expected
○ The buffer is allocated then overwritten
○ The code works to my advantage - is->read is gzi_read
■ gzi_read copies chunks from the source stream to the target and stops once
the source runs out!
■ No need to worry about the source’s size
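The behaviour described above, a read that copies in chunks and stops when the source is exhausted, can be modelled with a toy sketch. The "heap" here is just a Python `bytearray` holding an 8-byte buffer next to some neighbouring data; `chunked_read` is a hypothetical function, not the reader used by apk.

```python
import io


def chunked_read(source, heap: bytearray, offset: int, size: int, chunk: int = 4) -> int:
    """Copy up to `size` bytes into heap[offset:], stopping early at end of source."""
    copied = 0
    while copied < size:
        data = source.read(min(chunk, size - copied))
        if not data:  # source exhausted: stop, whatever `size` claimed
            break
        heap[offset + copied:offset + copied + len(data)] = data
        copied += len(data)
    return copied


# Toy heap: an 8-byte name buffer followed by 8 bytes of neighbouring data.
heap = bytearray(b"\x00" * 8 + b"NEIGHBOR")
source = io.BytesIO(b"A" * 12)

copied = chunked_read(source, heap, offset=0, size=0xFFFFFFFFFFFFFFFF)
print(copied, bytes(heap))
```

Only the 12 bytes actually present in the source are written, so the amount of overwritten neighbouring data is set by the length of the input, not by the bogus size.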
Developing an exploit
● There are various known ways to exploit a heap overflow
○ Remember musl libc? Memory allocation (malloc, realloc) is done by it
○ I preferred not to research it
○ I can work around that by building the exploit on apk’s own code
■ Is there anything useful on the heap? A flag to change? Structs with callbacks?
■ I could simply change a callback address to execv or system
● Mitigations?
○ ASLR
○ For the sake of a proof-of-concept, ignoring ASLR
Developing an exploit
● Lots of trial and error, trying to find structs after entry.name I should overwrite
● I realized I can just use the is struct, which is used on is->read
● It is of type apk_istream
● I put a breakpoint on the call to is->read
● I calculated the delta between my buffer (entry.name) and the is struct
Developing an exploit
● I filled my tar file with 0x153a0 bytes, followed by 16 zero bytes
● It worked!
○ The execution crashed on 0x0000000000000000
● Next step - call system with a string I control
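The padding arithmetic on this slide can be written out as a short sketch. Both addresses below are hypothetical placeholders chosen so that their difference matches the 0x153a0 figure from the slide; real values would come from a debugger session and change between runs when ASLR is enabled.

```python
# Hypothetical addresses, as they might be read off in a debugger.
NAME_BUF_ADDR = 0x55555576B000    # assumed address of the entry.name buffer
IS_STRUCT_ADDR = 0x5555557803A0   # assumed address of the is struct

# Distance from the start of the buffer to the start of the struct.
delta = IS_STRUCT_ADDR - NAME_BUF_ADDR

# Filler up to the struct, then 16 zero bytes landing on its first fields.
payload = b"A" * delta + b"\x00" * 16

print(hex(delta), len(payload))
```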
Developing an exploit
● is->read parameters?
○ is->read(is, entry.name, entry.size);
● Since the first parameter is the is struct itself, I could overwrite its first 8 bytes
with my shell string
○ The first 8 bytes hold the get_meta pointer, which is not called in our context
○ I used “echo 1” as the string
○ It worked!
● New problems
○ Shell string limit is 8 bytes, too short
○ The next day I failed to reproduce the exploit
■ is->read seems to write the data in chunks, so it writes only 4 bytes and then calls
is->read again (whose pointer is only partly modified at that point)
Developing an exploit
● How would I find what’s after the is struct?
● I restored the is struct in the file (by copying its actual addresses)
● I added random bytes after it
● gis->bs pointer seems like a good choice
● It is of type apk_bstream
Developing an exploit
● gis->bs->read is used in the same manner as is->read
● It has 8 more bytes to use for the shell string (used for flags)
● Here I overwrote a pointer to the struct, unlike is, where I had overwritten the struct itself
● I put my data just 32 bytes before the is struct
○ I could put it anywhere I have control of
[Figure: layout of the data placed in the file: gis->bs->flags, gis->bs->get_meta, gis->bs->read (overwritten to system), gis->bs->close, then is->get_meta, ….]
Developing an exploit
● It works!
Demonstration
Real attack vector
● Man-in-the-middle in an organization
○ Attacker gets code execution on any Alpine
package install or update
○ Attacker gets code execution when Alpine
images are built
○ Signature did not help since it’s taken from inside
the tar
Final steps
● I’ve found a vulnerability, what next?
● Responsible/Coordinated disclosure
○ Estimate the impact, write a proof-of-concept if it makes sense
○ Contact the developers
■ Nearly always privately; you don’t want premature public disclosure
■ Work on a fix
○ Assign CVE IDs
■ Check for the correct CNA (CVE Numbering Authority)
■ Otherwise contact MITRE through their web form
○ Disclose the vulnerability online
■ For open source the oss-security mailing list is a good choice
Final steps
● The bugs I found affect all apk versions since 2.5.0_rc1
● I reached Alpine’s developers on IRC
○ Discussed the issues with Timo Teräs in private emails
○ A patch was released very quickly and was pushed to apk-tools 2.7.2 and 2.6.9
■ All Alpine versions from current to 3.2-stable include the fix
○ Besides fixing the bugs, Timo also implemented additional hardenings to restrict
attackers from creating a similar exploit
■ This is done by removing the use of function pointers that are saved on structs on the
heap
● I sent an advisory to oss-sec and wrote about the issue on the Twistlock blog
Future ideas
● Fuzzing other parts of apk
● Fuzzing other Alpine tools
● Fuzzing libfetch