Understanding the ELF file structure
I recently set out to understand, at least at a surface level, how ELF (the "Executable and Linkable Format") works. To do that, I decided to write a tiny Python 3 module for parsing (meta)data out of ELF files.
The result is p3elf. While developing it I familiarized myself with ELF, binary I/O in Python, setuptools, and publishing Python packages on PyPI.
This post will mainly serve as a short reference on the structure of ELF files, and I will most likely keep updating it.
Structure
The Executable and Linkable Format Wikipedia article is an excellent reference for the structure and fields of ELF files, though I found it to be a little ambiguous in certain places. Some concepts are not completely obvious unless you have prior experience with ELF.
File header
Every ELF file begins with a file header. It contains general metadata about the binary, and its size is known in advance: it is 64 bytes long in 64-bit binaries and 52 bytes long in 32-bit ones.
Notable fields here include:
- EI_CLASS: denotes the class of the binary (0x1: 32-bit, 0x2: 64-bit). This field is particularly important because the sizes and offsets of many other fields in the file depend on it.
- EI_DATA: denotes the endianness of the binary (0x1: little-endian, 0x2: big-endian). It is important for the same reason as EI_CLASS.
- EI_OSABI: denotes the target OS ABI, but is often set to 0x0 (System V) regardless of the actual platform.
- E_TYPE: the type of object file (executable, relocatable, shared object, etc.)
- E_MACHINE: denotes the target ISA, e.g. 0x3E for amd64 (x86-64)
- E_PHOFF, E_PHENTSIZE, E_PHNUM: the offset of the program header table, the size of one program header, and the number of program headers
- E_SHOFF, E_SHENTSIZE, E_SHNUM: the same three values for the section header table
- E_SHSTRNDX: the index of the section header that describes the .shstrtab section
More on program headers and section headers later.
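To make the role of EI_CLASS and EI_DATA concrete, here is a minimal sketch of a file header parser using Python's struct module. It is not p3elf's actual code; the function name and the dictionary keys are hypothetical choices for illustration. The key observation is that after the 16-byte e_ident block, only e_entry, E_PHOFF and E_SHOFF change width between classes, which is exactly what makes the header 52 bytes on 32-bit files and 64 bytes on 64-bit files.

```python
import struct

# Values of e_ident[EI_CLASS] and e_ident[EI_DATA] from the ELF specification
ELFCLASS32, ELFCLASS64 = 1, 2
ELFDATA2LSB, ELFDATA2MSB = 1, 2


def parse_elf_header(data: bytes) -> dict:
    """Parse the ELF file header at the start of `data` (minimal validation only)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file (bad magic)")
    ei_class, ei_data, ei_osabi = data[4], data[5], data[7]

    # EI_DATA selects the byte order of every multi-byte field that follows.
    endian = {ELFDATA2LSB: "<", ELFDATA2MSB: ">"}.get(ei_data)
    if endian is None:
        raise ValueError(f"unknown EI_DATA value {ei_data}")

    # After the 16-byte e_ident, only e_entry, e_phoff and e_shoff change
    # width: 4 bytes ("I") on 32-bit files, 8 bytes ("Q") on 64-bit files.
    widths = {ELFCLASS32: "I", ELFCLASS64: "Q"}
    if ei_class not in widths:
        raise ValueError(f"unknown EI_CLASS value {ei_class}")
    addr = widths[ei_class]
    fmt = f"{endian}HHI{addr}{addr}{addr}IHHHHHH"

    (e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
     e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
     e_shstrndx) = struct.unpack_from(fmt, data, 16)

    return {
        "is_64bit": ei_class == ELFCLASS64,
        "endian": endian,
        "osabi": ei_osabi,
        "type": e_type,
        "machine": e_machine,
        "entry": e_entry,
        "phoff": e_phoff, "phentsize": e_phentsize, "phnum": e_phnum,
        "shoff": e_shoff, "shentsize": e_shentsize, "shnum": e_shnum,
        "shstrndx": e_shstrndx,
        # 52 for ELFCLASS32, 64 for ELFCLASS64
        "header_size": 16 + struct.calcsize(fmt),
    }
```

Since the header is never longer than 64 bytes, reading the first 64 bytes of a file (for example with `open(path, "rb").read(64)`) is enough to call this on either class.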
Segments & Program headers
Executables and shared objects are divided into segments, each of which contains zero or more sections (relocatable object files, such as .o files, usually have no segments at all).
Every program header corresponds to one segment and provides information and metadata about it. Because of that, we know the number of segments in advance: it is the E_PHNUM field in the file header.
The program header table, i.e. the array that contains all of the program headers (which in turn describe the segments), starts at offset E_PHOFF. When it is present, the first program header is usually the one with P_TYPE 0x00000006 (PT_PHDR), which describes the program header table itself as a segment.
Every program header is E_PHENTSIZE bytes long, and they are stored one right after another, so once we know where the first one is, we can traverse all of them pretty easily.
Notable fields in every program header include:
- P_TYPE: the type of segment that this header describes
- P_OFFSET: the offset from the beginning of the file to the segment that this header describes
- P_FILESZ: the size in bytes of the segment in the file (its size in memory, P_MEMSZ, can be larger)
- P_FLAGS: the segment's permission flags, a combination of PF_R (0x4), PF_W (0x2) and PF_X (0x1)
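The traversal described above can be sketched in Python as follows. This is an illustrative sketch, not p3elf's API: the function name, the small PT_NAMES table and the returned dictionary layout are hypothetical, and the caller is assumed to pass in E_PHOFF, E_PHENTSIZE, E_PHNUM, the class and the byte order taken from the file header. One detail worth noticing is that P_FLAGS sits at a different position in 32-bit and 64-bit program headers.

```python
import struct

# A few common p_type values from the ELF specification
PT_NAMES = {0: "PT_NULL", 1: "PT_LOAD", 2: "PT_DYNAMIC", 3: "PT_INTERP",
            4: "PT_NOTE", 5: "PT_SHLIB", 6: "PT_PHDR", 7: "PT_TLS"}
# p_flags permission bits
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def parse_program_headers(data: bytes, e_phoff: int, e_phentsize: int,
                          e_phnum: int, is_64bit: bool, endian: str = "<") -> list[dict]:
    """Decode every entry of the program header table."""
    headers = []
    for i in range(e_phnum):
        # Entries are stored back to back, E_PHENTSIZE bytes apart.
        off = e_phoff + i * e_phentsize
        if is_64bit:
            # Elf64_Phdr (56 bytes): p_flags directly follows p_type
            (p_type, p_flags, p_offset, p_vaddr, p_paddr,
             p_filesz, p_memsz, p_align) = struct.unpack_from(endian + "IIQQQQQQ", data, off)
        else:
            # Elf32_Phdr (32 bytes): p_flags comes after p_memsz
            (p_type, p_offset, p_vaddr, p_paddr,
             p_filesz, p_memsz, p_flags, p_align) = struct.unpack_from(endian + "IIIIIIII", data, off)
        # Render the permission bits the way readelf does, e.g. "R-X"
        perms = "".join(c if p_flags & bit else "-"
                        for c, bit in (("R", PF_R), ("W", PF_W), ("X", PF_X)))
        headers.append({
            "type": PT_NAMES.get(p_type, hex(p_type)),  # unknown types shown as hex
            "offset": p_offset,
            "filesz": p_filesz,
            "memsz": p_memsz,
            "flags": perms,
        })
    return headers
```

A typical read-only code segment would come back as a "PT_LOAD" entry with flags "R-X", while types outside the small table (such as GNU-specific ones) are simply shown as hex numbers.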
Sections & Section headers
Segments are made up of sections, each of which is associated with a section header that describes it. Some sections, such as .symtab or .shstrtab, are not part of any loadable segment at all.
The first section header is found at offset E_SHOFF and is E_SHENTSIZE bytes long. It is always “empty”: its SH_TYPE is 0x0 (i.e. SHT_NULL) and it doesn’t point to any section.
There are E_SHNUM section headers, from which we can deduce that there are E_SHNUM - 1 actual sections, since the null header at index 0 doesn’t describe one.
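As a small illustration of the reserved null entry, the sketch below checks that section header 0 really is SHT_NULL and then returns the offsets of the remaining E_SHNUM - 1 headers. The function name is hypothetical. It relies on the fact that SH_TYPE is the second 4-byte word in both the 32-bit and the 64-bit section header layouts, so the same read works for either class.

```python
import struct

SHT_NULL = 0


def real_section_header_offsets(data: bytes, e_shoff: int, e_shentsize: int,
                                e_shnum: int, endian: str = "<") -> list[int]:
    """Return the file offsets of all section headers except the null entry."""
    # sh_name is the first 4-byte word and sh_type the second one in both
    # Elf32_Shdr and Elf64_Shdr, so sh_type of entry 0 is at e_shoff + 4.
    (sh_type,) = struct.unpack_from(endian + "I", data, e_shoff + 4)
    if sh_type != SHT_NULL:
        raise ValueError("section header 0 is not SHT_NULL")
    # Skip index 0: the remaining E_SHNUM - 1 headers describe real sections.
    return [e_shoff + i * e_shentsize for i in range(1, e_shnum)]
```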
Notable fields in every section header include:
- SH_NAME: an offset into the .shstrtab section, pointing to a null-terminated string with the section’s name. .shstrtab is a special section (with its own section header, like any other) that contains the null-terminated names of all sections. Programs can use the E_SHSTRNDX field in the file header (the index of the .shstrtab section header, normally with 0 < E_SHSTRNDX < E_SHNUM) to easily seek to this section.
- SH_TYPE: the type of the section that this header describes
- SH_FLAGS: additional attributes of the section
- SH_OFFSET: the offset from the beginning of the file to the section that this header describes
- SH_SIZE: the size of this section in bytes
- SH_ENTSIZE: the size of each entry in this section, for sections that hold a table of fixed-size entries (such as .symtab), otherwise 0. For example, this field is set to 0 in the header that describes the .shstrtab section, as it contains null-terminated strings of varying lengths.
- SH_LINK: the section header index of an associated section (its meaning depends on the type of section that this header describes)
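Putting the section header fields and .shstrtab together, here is a sketch that decodes every section header and resolves its name. It is a hypothetical helper written for this post, not p3elf’s API, and it takes the relevant file header values (E_SHOFF, E_SHENTSIZE, E_SHNUM, E_SHSTRNDX, class and byte order) as plain arguments. It assumes E_SHSTRNDX is a regular index, i.e. it ignores the extended numbering used by files with a very large number of sections.

```python
import struct


def parse_section_header(data: bytes, offset: int, is_64bit: bool, endian: str = "<") -> dict:
    """Decode one section header (Elf64_Shdr is 64 bytes, Elf32_Shdr is 40)."""
    # Field order is the same in both classes; only some widths differ.
    fmt = endian + ("IIQQQQIIQQ" if is_64bit else "IIIIIIIIII")
    (sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
     sh_link, sh_info, sh_addralign, sh_entsize) = struct.unpack_from(fmt, data, offset)
    return {"name_offset": sh_name, "type": sh_type, "flags": sh_flags,
            "offset": sh_offset, "size": sh_size, "link": sh_link,
            "entsize": sh_entsize}


def read_cstring(buf: bytes, start: int) -> str:
    """Read a null-terminated string starting at `start`."""
    end = buf.index(b"\x00", start)
    return buf[start:end].decode("ascii", errors="replace")


def section_names(data: bytes, e_shoff: int, e_shentsize: int, e_shnum: int,
                  e_shstrndx: int, is_64bit: bool, endian: str = "<") -> list[str]:
    """Return the name of every section, in section header table order."""
    headers = [parse_section_header(data, e_shoff + i * e_shentsize, is_64bit, endian)
               for i in range(e_shnum)]
    # E_SHSTRNDX tells us which header describes .shstrtab.
    strtab_hdr = headers[e_shstrndx]
    strtab = data[strtab_hdr["offset"]:strtab_hdr["offset"] + strtab_hdr["size"]]
    # SH_NAME is an offset into .shstrtab; the null section gets offset 0,
    # which points at the leading null byte, i.e. the empty string.
    return [read_cstring(strtab, h["name_offset"]) for h in headers]
```

The first name returned is always the empty string, matching the SHT_NULL entry at index 0.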
Well-known examples of sections include .text (executable instructions), .data (initialized data), .rodata (read-only data), .symtab (the symbol table) and others. There may also be toolchain-specific sections, for example a really cool one produced by binutils’ ld linker, .note.gnu.build-id, which holds a unique build ID (by default a SHA-1 hash, optionally MD5, computed over the output file’s contents). Another is .comment, which usually records the version of the compiler (e.g. GCC) that was used to build the ELF.
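The build ID is stored as an ELF note: three 4-byte words (name size, descriptor size, note type), followed by the name and then the descriptor, each padded to a 4-byte boundary. For GNU build IDs the name is "GNU" and the type is NT_GNU_BUILD_ID (3). The sketch below walks the raw bytes of such a note section (obtained via its SH_OFFSET and SH_SIZE) and returns the build ID as a hex string. The function name is hypothetical, and it assumes 4-byte note words, which is what GNU tools emit in practice; `readelf -n` prints the same value.

```python
import struct
from typing import Optional

NT_GNU_BUILD_ID = 3


def _align4(n: int) -> int:
    # Note names and descriptors are padded to a 4-byte boundary.
    return (n + 3) & ~3


def gnu_build_id(note: bytes, endian: str = "<") -> Optional[str]:
    """Return the GNU build ID found in raw note-section bytes, or None."""
    pos = 0
    while pos + 12 <= len(note):
        namesz, descsz, n_type = struct.unpack_from(endian + "III", note, pos)
        pos += 12
        name = note[pos:pos + namesz].rstrip(b"\x00")  # drop the terminator
        pos += _align4(namesz)
        desc = note[pos:pos + descsz]
        pos += _align4(descsz)
        if name == b"GNU" and n_type == NT_GNU_BUILD_ID:
            return desc.hex()
    return None
```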
Tools and references
- The best tools for interacting with ELF files are readelf, objdump, and size. They are all part of GNU binutils, so you most likely already have them installed.
- SysV ABI Specification DRAFT
- ELF tag on StackOverflow
- Executable and Linkable Format - Wikipedia
- pyelftools on GitHub