Understanding the ELF file structure

I recently set out to understand, at least at a surface level, how ELF (the "Executable and Linkable Format") works. To do that, I wrote a tiny Python 3 module for parsing (meta-)data out of ELF files.

The result is p3elf. While developing it, I familiarized myself with ELF, binary I/O in Python, setuptools, and publishing Python packages on PyPI.

This post mainly serves as a short reference on the structure of ELF files for my future self, and I will most likely update it over time.

Structure

The Executable and Linkable Format Wikipedia article is an excellent reference for the structure and fields of ELF files, though I found it to be a little ambiguous in certain places. Some concepts are not completely obvious unless you have prior experience with ELF.

File header

Every ELF file begins with a file header. It contains general metadata about the binary, and its size is known in advance: 64 bytes for 64-bit binaries and 52 bytes for 32-bit ones.

Notable fields here include:

  • EI_CLASS: denotes the class of the binary (0x1: 32-bit, 0x2: 64-bit)
    • this field is particularly important because the lengths and offsets of many other fields in the file depend on it
  • EI_DATA: denotes the endianness of the binary (0x1: little-endian, 0x2: big-endian)
    • important for the same reasons as EI_CLASS
  • EI_OSABI: denotes the ABI, but is often set to 0x0 (System V) regardless of the actual platform
  • E_TYPE: type of object file (executable, relocatable, etc.)
  • E_MACHINE: denotes the target ISA, e.g. 0x3E for 'amd64'
  • E_PHOFF, E_SHOFF, E_PHNUM, E_SHNUM, E_PHENTSIZE, E_SHENTSIZE
    • offsets, entry counts, and entry sizes of the program header and section header tables - more on these later
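
As a rough illustration of how the fields listed above can be read, here is a minimal sketch built on Python's struct module. The function name parse_file_header and the dictionary keys are my own choices for this example (not necessarily how p3elf does it), and it assumes the caller passes in at least the first 64 bytes of the file.

```python
import struct

ELF_MAGIC = b"\x7fELF"

# Names of the fields that follow the 16-byte e_ident block, in on-disk order.
HEADER_FIELDS = (
    "e_type", "e_machine", "e_version", "e_entry", "e_phoff", "e_shoff",
    "e_flags", "e_ehsize", "e_phentsize", "e_phnum", "e_shentsize",
    "e_shnum", "e_shstrndx",
)


def parse_file_header(data: bytes) -> dict:
    """Parse an ELF file header from the first 52 or 64 bytes of a file."""
    if data[:4] != ELF_MAGIC:
        raise ValueError("missing ELF magic number")

    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unsupported EI_CLASS or EI_DATA value")

    # EI_DATA picks the byte order; EI_CLASS picks the width of the
    # address and offset fields (4 bytes on 32-bit, 8 bytes on 64-bit).
    endian = "<" if ei_data == 1 else ">"
    word = "I" if ei_class == 1 else "Q"
    fmt = endian + "HHI" + word * 3 + "I" + "H" * 6

    header = dict(zip(HEADER_FIELDS, struct.unpack_from(fmt, data, 16)))
    header["ei_class"] = ei_class
    header["ei_data"] = ei_data
    header["ei_osabi"] = data[7]
    return header
```

To try it on a real binary, open the file in "rb" mode, read the first 64 bytes, and pass them to parse_file_header. This is also why EI_CLASS and EI_DATA matter so much: both have to be known before the struct format string can even be built.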

Segments & Program headers

The loadable part of an ELF file is divided into segments, each of which usually covers one or more sections. (Relocatable object files typically have sections but no segments.)

Every program header corresponds to one segment and provides info and metadata about it. Because of that, we know the number of segments in advance - it’s the E_PHNUM field in the file header.

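In many executables, the first segment is the “program header table” itself - i.e. the segment that contains all of the program headers (which in turn describe the other segments). This entry is optional, but when present it comes before any loadable segment.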

The first program header is found at offset E_PHOFF and is E_PHENTSIZE bytes long. When it is the one that describes the program header table segment itself, its P_TYPE is 0x00000006 (i.e. PT_PHDR). Because every program header is the same size and they are stored back to back, the i-th header (counting from 0) sits at offset E_PHOFF + i * E_PHENTSIZE, so we can traverse them pretty easily.
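
The traversal described above boils down to simple offset arithmetic. A minimal sketch, where program_header_offsets is a hypothetical helper name and the three arguments are the corresponding values from the file header:

```python
def program_header_offsets(e_phoff: int, e_phentsize: int, e_phnum: int) -> list:
    """Return the file offset of every program header, in table order."""
    # Headers are stored back to back, so header i starts exactly
    # i * e_phentsize bytes after the start of the table.
    return [e_phoff + i * e_phentsize for i in range(e_phnum)]
```

For a typical 64-bit binary, where the table starts right after the 64-byte file header and each entry is 56 bytes long, program_header_offsets(64, 56, 3) gives [64, 120, 176].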

Notable fields in every program header include:

  • P_TYPE
    • the type of segment that this header describes
  • P_OFFSET
    • the offset from the beginning of the file to the segment that this header describes
  • P_FILESZ
    • the size in bytes that the segment occupies in the file (its size in memory is given by a separate field, P_MEMSZ)
  • P_FLAGS
    • permission flags for the segment that this header describes (0x1: executable, 0x2: writable, 0x4: readable)
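
To make the fields above concrete, here is a sketch of unpacking a single program header. One detail that is easy to miss is that the field order differs between the two classes: on 64-bit files P_FLAGS comes right after P_TYPE, while on 32-bit files it sits near the end. The function name, the argument names, and the extra "permissions" key are assumptions made for this example.

```python
import struct

# On-disk field order for each class; note where p_flags sits.
PH_FIELDS_32 = ("p_type", "p_offset", "p_vaddr", "p_paddr",
                "p_filesz", "p_memsz", "p_flags", "p_align")
PH_FIELDS_64 = ("p_type", "p_flags", "p_offset", "p_vaddr",
                "p_paddr", "p_filesz", "p_memsz", "p_align")


def parse_program_header(data: bytes, offset: int, is_64bit: bool,
                         little_endian: bool = True) -> dict:
    """Unpack the program header that starts at `offset` in `data`."""
    endian = "<" if little_endian else ">"
    if is_64bit:
        fmt, names = endian + "II6Q", PH_FIELDS_64   # 56 bytes
    else:
        fmt, names = endian + "8I", PH_FIELDS_32     # 32 bytes
    header = dict(zip(names, struct.unpack_from(fmt, data, offset)))

    # Render p_flags as an "rwx"-style string for readability.
    bits = (("r", 0x4), ("w", 0x2), ("x", 0x1))
    header["permissions"] = "".join(
        ch if header["p_flags"] & bit else "-" for ch, bit in bits
    )
    return header
```

The bytes of the segment itself can then be sliced out of the file as data[p_offset : p_offset + p_filesz].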

Sections & Section headers

The contents of the file are also organized into sections, most of which fall inside one of the aforementioned segments. Each section is associated with a section header that describes it.

The first section header is found at offset E_SHOFF, and is E_SHENTSIZE bytes long. It is always “empty” - its SH_TYPE is 0x0 (i.e. SHT_NULL) and it doesn’t point to any section.

There are E_SHNUM section headers, from which we can deduce that there are also E_SHNUM sections (counting the empty one at index 0).

Notable fields in every section header include:

  • SH_NAME
    • an offset to a null-terminated string in the .shstrtab section denoting the section’s name
    • .shstrtab is a special section (that has its own section header, like any other) that contains null-terminated strings holding the names of all sections
    • programs can use the E_SHSTRNDX field in the file header (the integer index of the .shstrtab section header, where 0 < E_SHSTRNDX < E_SHNUM) to easily seek to this section
  • SH_TYPE
    • the type of the section that this header describes
  • SH_FLAGS
    • additional attributes of the section
  • SH_OFFSET
    • the offset from the beginning of the file to the section that this header describes
  • SH_SIZE
    • the size of this section
  • SH_ENTSIZE
    • size of each entry in this section, for sections that hold a table of fixed-size entries, otherwise 0
    • for example, this field is set to 0 in the header that describes the .shstrtab section as it contains null-terminated strings of varying lengths
  • SH_LINK
    • section index of an associated section (its use depends on the type of section that this header describes)
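
Putting the pieces above together, the sketch below reads every section header and then resolves each SH_NAME through .shstrtab. The function names are hypothetical, the arguments are the matching file header fields, and data is assumed to hold the entire file as a bytes object. The field order is the same for both classes; only the widths differ.

```python
import struct

SH_FIELDS = ("sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
             "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize")


def read_section_headers(data: bytes, e_shoff: int, e_shentsize: int,
                         e_shnum: int, is_64bit: bool = True,
                         little_endian: bool = True) -> list:
    """Unpack all section headers, starting at file offset e_shoff."""
    fmt = ("<" if little_endian else ">") + ("IIQQQQIIQQ" if is_64bit else "10I")
    return [
        dict(zip(SH_FIELDS, struct.unpack_from(fmt, data, e_shoff + i * e_shentsize)))
        for i in range(e_shnum)
    ]


def section_names(data: bytes, headers: list, e_shstrndx: int) -> list:
    """Look up every section's name in the .shstrtab string table."""
    strtab = headers[e_shstrndx]
    table = data[strtab["sh_offset"]:strtab["sh_offset"] + strtab["sh_size"]]
    names = []
    for header in headers:
        start = header["sh_name"]
        # Each name runs from its offset up to the next null byte.
        end = table.index(b"\x00", start)
        names.append(table[start:end].decode("ascii", errors="replace"))
    return names
```

The empty header at index 0 has an SH_NAME of 0, which points at the leading null byte of the string table, so its name comes back as an empty string.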

Well-known examples of sections include .text (actual executable instructions), .data, .rodata (read-only data), .symtab (the symbol table) and others. There may also exist platform-specific sections, for example a really cool one produced by binutils’ ld linker, .note.gnu.build-id, which holds a unique build identifier (by default a SHA-1 hash, optionally MD5, computed over parts of the output file). Or .comment, which typically records the version of the compiler (e.g. GCC) that was used to build the ELF.
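
As a small example of reading one of these sections, the sketch below extracts the hash from the raw bytes of .note.gnu.build-id. It assumes the usual ELF note layout (three 4-byte words for name size, descriptor size, and type, followed by the name padded to a 4-byte boundary, followed by the descriptor) and that the section holds a single note. The function name parse_build_id is my own.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used for GNU build IDs


def parse_build_id(note: bytes, little_endian: bool = True) -> str:
    """Return the build ID stored in a .note.gnu.build-id section as hex."""
    endian = "<" if little_endian else ">"
    namesz, descsz, note_type = struct.unpack_from(endian + "III", note, 0)

    name_start = 12
    name = note[name_start:name_start + namesz]
    if name != b"GNU\x00" or note_type != NT_GNU_BUILD_ID:
        raise ValueError("not a GNU build-id note")

    # The descriptor starts after the name, rounded up to a multiple of 4.
    desc_start = name_start + ((namesz + 3) & ~3)
    return note[desc_start:desc_start + descsz].hex()
```

The section's bytes are just data[sh_offset : sh_offset + sh_size] for the header whose name resolves to .note.gnu.build-id. A SHA-1 build ID comes out as 40 hex characters.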

Tools and references