ECFS Specification document, April 2015 - elfmaster [at] zoho.com

[See also, man.txt for documentation on using libecfs API]

-= Intro

ECFS (Extended core file snapshots) is an extensible ELF format
that is somewhat like a hybrid between an ELF executable and
and ELF core file. The purpose of an ecfs file (referred to as
ecfs-core, or ecfs-snapshot files) is to aggregate and compile
everything about a running program (A process) into a single
snapshot file that follows and builds on the ELF file format.
This file format allows reverse engineers and forensics analysts
to quickly vet an entire process, while quickly identifying 
anomalous characteristics that pertain to malware including
rootkits, viruses, backdoors, and hot-patches. The ecfs technology
has been covered in some detail in several other papers including
POC||GTFO 0x7.

spec.txt covers the ecfs specific extensions that augment
the existing ELF format. Although some new section types
and features exist, they still essentially stay within the
bounds of the original ELF specification. There is an API named
libecfs for parsing ecfs-snapshots that is documented in
libecfs_manual.txt. 

Here we will cover the specs of the ecfs file format itself
as they differ from the original ELF format.

-= ECFS philosophy

ECFS is all about making runtime analysis of a program easier than
ever before. The entire process is encased within a single file,
and organized in such a way that locating and accessing data and
code that is critical for detecting anomalies and infections is
almost trivial to do. Part of this philosophy is that all process
code and data is stored and accessible through ELF section headers.
Everything from the shared library memory maps to the stack, the
heap, and ecfs customly prepared data is available as a section header.
Even the entire /proc/$pid directory tree is compressed and stored
in an ELF section. Perhaps the most useful and clever aspect of ecfs
is its ability to fully reconstruct the symbol tables for every single
function in the local ('.symtab') and dynamic ('.dynsym') symbol tables.
As any reverse engineer knows, symbols making reverse engineering so
much easier. Symbols help connect all of the dots between reference
points and code transfers. ECFS uses specialized techniques to fully
reconstruct symbol tables.

A good example of using ecfs and its corresponding libecfs parsing
API is in https://www.alchemistowl.org/pocorgtfo/pocorgtfo07.pdf
which demonstrates how easy it is to write a malware analyzer for
detecting DLL injection and PLT hooks in a process that has been
compromised by an attacker.

-= ECFS Execution

ecfs-snapshot files are fecund with information about the program
and process, and are therefore equipped with everything that is
needed to re-execute. In other words, ecfs technology can be used
to snapshot/stop a process, and then restart it later in time. This
is a side project of ecfs and is still in early prototype stages.
See https://github.com/elfmaster/ecfs_exec


-= KCORE-ECFS

The /proc/kcore file that is available in Linux is a core dump of
the running kernel. There is a side-project that I have been working
on for some time that includes special software to create an ecfs-snapshot
out of /proc/kcore. The output file is fully instrumented with a symbol
table and section headers. It includes code and data from the text, data,
.bss, slab, vmalloc, and all of the kernel modules that are loaded. 
The kcore-ecfs snapshot utility will not be made available to the public
right away, but at some point down the line should make an appearance.


-= ECFS custom sections

Here we will list any ELF section types that are newly introduced
with the ecfs format. Any sections other than what we are discussing
below, were in existence prior to ecfs technology, and generally 
retain the same definition, other than the fact that they will reflect
the state of code or data as they are at runtime.


	******************
	***   ._TEXT   ***
	******************

This section is not to be confused with .text which is the text
section of the executable. The ._TEXT section actually refers to
the entire text segment of the executable. This way a user can quickly
locate the text segment without having to deduce where it exists by
looking at the program headers.




	******************
	***   ._DATA   ***
	******************

This section is not to be confused with .data which is the data
section of the executable. The ._DATA section actually refers to
the entire data segment of the executable. This way a user can quickly
locate the data segment without having to deduce where it exists by
looking at the program headers.




	***********************
	***   .procfs.tgz   ***
	***********************

This section was originally created so that certain
technology (Such as google breakpad) that require a complete
/proc/$pid in order to run, can work with offline ecfs-snapshots.
The '.procfs.tgz' section contains the results of 'tar -czf /proc/$pid'.
This section can be extracted from the ecfs file using either
objcopy (Such as with the script/extract_proc.sh) or with the
'readecfs -X <ecfs_file> <output.tgz>' command. Many of the 
sections contained with an ecfs-core file are abstractions of
data that were taken from /proc/$pid, but it certainly can't hurt
to have a copy of the entire directory structure for analysis that
requires it.




	*********************
	***   .prstatus   ***
	*********************

This section contains the 'struct elf_prstatus' structures. 
There will be one struct for every thread. Traditional ELF core
files also have these structs, but they are usually contained
within the ELF notes segment, which is more difficult to parse
than section headers. 

/* These are contained within .prstatus section */

struct elf_prstatus
  {
    struct elf_siginfo pr_info;         /* Info associated with signal.  */
    short int pr_cursig;                /* Current signal.  */
    unsigned long int pr_sigpend;       /* Set of pending signals.  */
    unsigned long int pr_sighold;       /* Set of held signals.  */
    __pid_t pr_pid;
    __pid_t pr_ppid;
    __pid_t pr_pgrp;
    __pid_t pr_sid;
    struct timeval pr_utime;            /* User time.  */
    struct timeval pr_stime;            /* System time.  */
    struct timeval pr_cutime;           /* Cumulative user time.  */
    struct timeval pr_cstime;           /* Cumulative system time.  */
    elf_gregset_t pr_reg;               /* GP registers.  */
    int pr_fpvalid;                     /* True if math copro being used.  */
  };


	
	*******************
	***   .fdinfo   ***
	*******************


This section contains an ecfs custom struct that describes
the files, pipes, and sockets that are open. An example of
using libecfs to parse this information is available in the
libecfs_manual.txt file. 

typedef struct fdinfo {
        int fd;			// fd number
        char path[MAX_PATH];	// path to file, or may contain pipe or socket info
        loff_t pos;		// position/offset within a file stream
        unsigned int perms;	// perms of open() file descriptor (i.e. O_RDONLY)
        struct {
                struct in_addr src_addr; // src address of socket
                struct in_addr dst_addr; // dst address of socket
                uint16_t src_port;	 // src port of socket
                uint16_t dst_port;	 // dst port of socket
        } socket;	
        char net;		// if net is 1 then TCP, if 2, then UDP
} fd_info_t;



	
	********************
	***   .siginfo   ***
	********************

This section contains the siginfo_t struct which obviously
holds signal data about the process, including what signal
killed it, and what address it faulted at (If it faulted).


           siginfo_t {
               int      si_signo;    /* Signal number */
               int      si_errno;    /* An errno value */
               int      si_code;     /* Signal code */
               int      si_trapno;   /* Trap number that caused
                                        hardware-generated signal
                                        (unused on most architectures) */
               pid_t    si_pid;      /* Sending process ID */
               uid_t    si_uid;      /* Real user ID of sending process */
               int      si_status;   /* Exit value or signal */
               clock_t  si_utime;    /* User time consumed */
               clock_t  si_stime;    /* System time consumed */
               sigval_t si_value;    /* Signal value */
               int      si_int;      /* POSIX.1b signal */
               void    *si_ptr;      /* POSIX.1b signal */
               int      si_overrun;  /* Timer overrun count; POSIX.1b timers */
               int      si_timerid;  /* Timer ID; POSIX.1b timers */
               void    *si_addr;     /* Memory location which caused fault */
               long     si_band;     /* Band event (was int in
                                        glibc 2.3.2 and earlier) */
               int      si_fd;       /* File descriptor */
               short    si_addr_lsb; /* Least significant bit of address
                                        (since Linux 2.6.32) */
           }




	**********************
	***   .auxvector   ***
	**********************

This section contains the processes auxiliary vector which
is always setup at the bottom of the stack during process
startup. This is taken directly from /proc/$pid/auxv and can
be parsed using the functionality in libecfs API. An example
of this can be found in the libecfs_manual.txt or the readecfs.c
source code.




	********************
	***   .exepath   ***
	******************** 

This section simply contains the executable path of the program
who's process it is.




	************************
	***   .personality   ***
	************************

This section contains certain personality traits of the ECFS
file. The ecfs_elf_t struct contains a member of type elf_stat_t
and you may see the readecfs.c source code for an example of how
to check the presonality traits.

	typedef struct elf_stats {

#define ELF_STATIC (1 << 1) // if its statically linked (instead of dynamically)
#define ELF_PIE (1 << 2)    // if its position indepdendent executable
#define ELF_LOCSYM (1 << 3) // local symtab exists?
#define ELF_HEURISTICS (1 << 4) // were detection heuristics used by ecfs?
#define ELF_STRIPPED_SHDRS (1 << 8) // was the original executable stripped of section headers?
 	 
	      unsigned int personality; // if (personality & ELF_STATIC)
	} elf_stat_t;


NOTE: Currently the ELF_LOCSYM personality trait is not in effect
so please ignore the results associated with testing the ELF_LOCSYM bit.

	




	********************
	***   .arglist   ***
	********************

This section contains the argv array of the programs main() function
so that you can see the command line options that were used. An
example of using this function can be found in readecfs.c source
code.

	argv[0] '/bin/ls'
	argv[1] '-l'
	argv[2] '/etc'

	


	*********************
	***   .fpregset   ***
	*********************

This section contains the floating pointer register set for each
thread of the process. The general registers are available in elf_prstatus
structs, but the floating pointer registers are not, which is why
they have their own section.


	- x86_64 -

	struct user_fpregs_struct
	{
 	 unsigned short int    cwd;
  	 unsigned short int    swd;
  	 unsigned short int    ftw;
  	 unsigned short int    fop;
  	 __extension__ unsigned long long int rip;
  	 __extension__ unsigned long long int rdp;
  	 unsigned int          mxcsr;
  	 unsigned int          mxcr_mask;
  	 unsigned int          st_space[32];   /* 8*16 bytes for each FP-reg = 128 bytes */
  	 unsigned int          xmm_space[64];  /* 16*16 bytes for each XMM-reg = 256 bytes */
  	 unsigned int          padding[24];
	};


	- x86_32 - 
	
	struct user_fpregs_struct
	{
  	 long int cwd;
  	 long int swd;
  	 long int twd;
  	 long int fip;
  	 long int fcs;
  	 long int foo;
  	 long int fos;
  	 long int st_space [20];
	};


	

	******************
	***   .stack   ***
	******************

This section contains the stack segment for the process.
In multi-threaded processes there will be a separate stack
for each thread. Currently these extra stacks are only accessible
via the program headers of an ecfs-file, but in the near future
additional sections that will be named '.stack.<tid>' will be
added, as it is on the design road map.




	*****************
	***   .heap   ***
	*****************

This section contains the heap segment for the process. In Linux
the glibc heap is allocated by the sbrk() syscall at runtime.




	****************
	***   .bss   ***
	****************

This section is not new, but we are including it here because the
.bss section in an ecfs file is not SHT_NOBITS, in other words it
actually is allocated and initialized and contains the .bss area
from the actual memory image.




	*****************
	***   .vdso   ***
	*****************

This section contains the [vdso] page from the process




	*********************
	***   .vsyscall   ***
	*********************

This section contains the [vsyscall] page from the process

	

	*******************************
	*** SHT_SHLIB type sections ***
	*******************************

Sections that are marked as type SHLIB are sections that contain
a segment from a shared library. The segments are marked accordingly

Here is an example of the dynamic linker which is a shared library
itself and mapped into every dynamically linked program.

  [334] ld-2.19.so.text   SHLIB           f54d9000 14316000 020000 00   A  0   0  8
  [335] ld-2.19.so.relro  SHLIB           f54f9000 14317000 001000 00   A  0   0  8
  [336] ld-2.19.so.data   SHLIB           f54fa000 14318000 001000 00   A  0   0  8

We can see that the text segment is mapped in and is 0x20000 bytes.
We can see that the relro page is mapped in, which is the first 4K
of the data segment, marked as read-only as a security mechanism.
And we can see that the data segment itself is mapped in.

In some cases there are more than one data segment, in which case
you would see a number appended to 'data'. i.e. 'libc.so.6.data.1'.
	
	
	**********************************
	*** SHT_INJECTED type sections ***
	**********************************

If the ecfs heuristics are enabled, then certain heuristics are ran
when analyzing the process and creating the ecfs-core file. One of these
heuristics is the ability to be able to tell if a shared library has 
been injected; that is to say not loaded by the dynamic linker, and not
loaded by a legitimate dlopen() call, and not loaded by LD_PRELOAD.
This indicates that an attacker has injected the shared library, probably
through __libc_dlopen_mode() or through mmap(). In this case the shared
library gets marked as SHT_INJECTED instead of SHT_SHLIB.
file.  One of these




	Example of parsing an ecfs file to detect DLL injection and
	PLT/GOT hooks. This example was taken from POC||GTFO 0x7.
	
	NOTE: The ecfs heuristics '-h' must be enabled for the auto-DLL detection
	to have worked.

#include "../include/libecfs.h"

int main(int argc, char **argv)
{
	ecfs_elf_t *desc;
	ecfs_sym_t *dsyms;
	char *progname;
	int i;
	char *libname;
	long evil_addr;
	
	if (argc < 2) {
		printf("Usage: %s <ecfs_file>\n", argv[0]);
		exit(0);
	}
	
	/*
	 * Load the ecfs file
	 */
	desc = load_ecfs_file(argv[1]);
	
	/*
	 * Get the executable path
	 */
	progname = get_exe_path(desc);
	
	printf("Performing analysis on '%s' which corresponds to executable: %s\n", argv[1], progname);

	/*
	 * Look to see if we see any INJECTED shared libraries
 	 */
	for (i = 0; i < desc->ehdr->e_shnum; i++) {
		if (desc->shdr[i].sh_type == SHT_INJECTED) {
			libname = strdup(&desc->shstrtab[desc->shdr[i].sh_name]);
			printf("[!] Found malicously injected shared library: %s\n", libname);
		}
	}

	/*
 	 * Get the PLT/GOT info which will show us the
	 * GOT values, and will show us what values we
	 * should expect to see there.
	 */
	pltgot_info_t *pltgot;
        int ret = get_pltgot_info(desc, &pltgot);
	for (i = 0; i < ret; i++) {
		if (pltgot[i].got_entry_va != pltgot[i].shl_entry_va && pltgot[i].got_entry_va != pltgot[i].plt_entry_va)
			printf("[!] Found PLT/GOT hook, function 'name' is pointing at %lx instead of %lx\n", 
				pltgot[i].got_entry_va, evil_addr = pltgot[i].shl_entry_va);
	}
	/*
 	 * Get the dynamic symbols so we can lookup
	 * which function was hooked in the GOT.
	 */
	ret = get_dynamic_symbols(desc, &dsyms);
        for (i = 0; i < ret; i++) {
                if (dsyms[i].symval == evil_addr) {
                        printf("[!] %lx corresponds to hijacked function: %s\n", dsyms[i].symval, &dsyms[i].strtab[dsyms[i].nameoffset]);
                        break;
                }
        }

}


-= More documentation coming soon!


