PyEBPF: eBPF proxy routine generation and Python callbacks (iovisor/bcc wrapper)

Danny Shemesh
Jan 19

Hey folks!

A couple of weeks ago, I stumbled upon Brendan Gregg’s BPF tracing tools diagram (shown below) while doing a small read-up on performance engineering.
[Figure: BPF tracing tools, taken from the iovisor/bcc repository]

Looking at the diagram above, and shamefully realizing I was not familiar with the vast majority of the tools in it, I decided to explore the available utilities alongside the infrastructure that enabled their creation.
And so, today I’m going to talk a bit about eBPF (extended Berkeley Packet Filter), bcc (the BPF Compiler Collection) and how both can be used from Python more easily.
A quick intro to eBPF

You might be familiar with BPF filters as a tool for filtering packets; a common example is passing a BPF filter to tcpdump in order to filter incoming or outgoing network traffic.
[Figure: Filtering incoming SSH traffic via tcpdump]

eBPF (extended BPF) is an enhancement of BPF that allows one to trace and filter much more than just packets. For example, eBPF can be used to trace every entry (SYSENTER) into a specific syscall, say, open(2).
Code needs to be compiled to the BPF instruction set, and is then loaded and run by the kernel’s BPF virtual machine.
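To make "the BPF instruction set" a bit more concrete, here is a minimal sketch that hand-encodes what is probably the smallest useful program (set r0 to 0, then exit), using the 8-byte instruction layout of struct bpf_insn from the kernel headers. The helper names (encode_insn, mov64_imm, exit_insn) are made up for this illustration and the packing assumes a little-endian host; in practice a compiler such as LLVM emits these bytes for you.

```python
import struct

# Opcode building blocks, as defined in the kernel's UAPI BPF headers.
BPF_ALU64 = 0x07  # instruction class: 64-bit arithmetic
BPF_JMP = 0x05    # instruction class: jumps
BPF_MOV = 0xB0    # ALU operation: move
BPF_EXIT = 0x90   # JMP operation: return from the program
BPF_K = 0x00      # source operand is the 32-bit immediate

BPF_MAXINSNS = 4096  # per-program instruction limit (2**12) as of v4.20


def encode_insn(code, dst_reg=0, src_reg=0, off=0, imm=0):
    """Pack one instruction: u8 opcode, two 4-bit registers, s16 offset, s32 immediate."""
    regs = (src_reg << 4) | dst_reg  # dst_reg occupies the low nibble (little-endian)
    return struct.pack('<BBhi', code, regs, off, imm)


def mov64_imm(dst_reg, imm):
    """dst_reg = imm (64-bit move of an immediate value)."""
    return encode_insn(BPF_ALU64 | BPF_MOV | BPF_K, dst_reg=dst_reg, imm=imm)


def exit_insn():
    """Return from the program; r0 carries the return value."""
    return encode_insn(BPF_JMP | BPF_EXIT)


program = mov64_imm(0, 0) + exit_insn()
insn_count = len(program) // 8

print(program.hex())
print(insn_count <= BPF_MAXINSNS)
```

A buffer like this, together with a program type and a license string, is roughly what ends up being handed to the kernel; the sketch stops short of the actual syscall.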
Using eBPF usually involves the following steps:

1. You compile source code to BPF byte-code (BPF runs in a dedicated virtual machine and has its own instruction set) using a suitable compiler (e.g. LLVM).
2. Your user-mode program asks the kernel to store the compiled instructions via the bpf(2) syscall, with the BPF_PROG_LOAD command and a program type that determines the type of our BPF module (e.g. BPF_PROG_TYPE_SOCKET_FILTER for packet filtering, BPF_PROG_TYPE_KPROBE for kernel probes).
3. The kernel now runs a static analyzer (bpf_check, source available here) and verifies that the control-flow graph of the code is safe to use (the BPF VM is quite strict: most I/O operations are forbidden, infinite loops are detected, a program may contain up to BPF_MAXINSNS instructions [2¹² under v4.20], etc.).
4. If the static checks above pass, the bpf(2) syscall returns a file descriptor, which the user-mode program may operate against.
5. In addition to the above, one may create and use a predefined set of BPF data structures in order to communicate between BPF routines, and between your routine and the user-mode program.
6. Lastly, we may operate against our BPF module. For example, we may attach a kernel probe to it, so that our routine is called before or after a specific syscall.

A simplified version of the process above could be depicted as follows:

[Figure: Loading and operating with an eBPF kernel probe]

A few words about BCC

When searching for eBPF-related resources, a toolchain that kept popping up was BCC (the BPF Compiler Collection).
BCC is a very neat wrapper (with Python & Lua native bindings) for BPF code generation (using LLVM) and kernel metric collection.
When you work with BCC, you pass your source (as a file or buffer) to the library, which in turn enhances it, compiles it and wraps the relevant file-descriptors for use.
Looking at Python’s bcc library, there are a few things to note:

- The bcc library exposes a BPF Python object that operates against a native library, libbcc.
- In turn, libbcc uses libbpf and bpf_module, where bpf_module deals with code generation via LLVM and libbpf handles program loading, program attachment and BPF data-structure management.
- Code is generated via LLVM on the fly; your eBPF routines are usually written as a string in Python and passed to BCC (and internally to LLVM) for compilation.
- BCC has quite an extensive code base, with various convenient wrappers and lots of useful C macros for generating and using BPF data structures.
Let’s look at an example that traces all bind(2) syscalls and prints the pid, process name and socket fd of the process that called bind():

    from threading import Thread
    from socket import socket
    import ctypes as ct
    import time

    from bcc import BPF

    prog = '''
    #include <linux/sched.h>

    /* This is a BCC macro that creates a BPF data structure
       that we can communicate with from user mode */
    BPF_PERF_OUTPUT(events);

    // The following is the data structure we'll pass to our user-land program
    struct data_t {
        char process_name[TASK_COMM_LEN]; // Process name
        u32 pid;                          // Process ID
        int socket_fd;                    // Bound socket FD
    };

    /* This is our BPF routine, it takes two arguments:
       - A pt_regs* struct, which holds the CPU registers captured when the probe fired
       - A socket fd; this will actually be transformed by bcc into a local
         variable that is set from the registers, see note below
    */
    int on_bind(struct pt_regs* ctx, int sockfd) {
        struct data_t data = {};

        // A bpf helper that gets the name of the process that invoked bind
        bpf_get_current_comm(&data.process_name, sizeof(data.process_name));

        // Gets the pid via the bpf helper (the upper 32 bits hold the user-space pid)
        data.pid = (u32) (bpf_get_current_pid_tgid() >> 32);
        data.socket_fd = sockfd;

        // Copies the data to the BPF structure, it is now available to user mode
        events.perf_submit(ctx, &data, sizeof(data));
        return 0;
    }
    '''

    # Compiles the BPF program via LLVM
    b = BPF(text=prog)

    # Represents the native data structure above
    class Data(ct.Structure):
        _fields_ = [
            ('process_name', ct.c_char * 16),  # TASK_COMM_LEN is 16
            ('pid', ct.c_uint32),
            ('socket_fd', ct.c_int32),
        ]

    # Prints header
    print('COMM PID SOCKETFD')

    # A callback to be called for every record in the 'events' BPF data structure
    def print_event(cpu, data, size):
        data = ct.cast(data, ct.POINTER(Data)).contents
        print('{process_name} {pid} {socket_fd}'.format(
            process_name=data.process_name.decode(),  # the name arrives as bytes
            pid=data.pid,
            socket_fd=data.socket_fd))

    # This calls libbpf, which in turn calls the bpf(2) syscall,
    # and does a few more tricks to attach the kernel probe
    b.attach_kprobe(event=b.get_syscall_fnname('bind'), fn_name='on_bind')

    # An async function that binds to localhost:31337 (to get an output for the above)
    def call_bind_async():
        time.sleep(2)
        print('Calling bind.')
        s = socket()
        s.bind(('localhost', 31337))

    t = Thread(target=call_bind_async)
    t.start()

    # This will open the BPF data structure for polling
    b['events'].open_perf_buffer(print_event)

    while True:
        try:
            # Poll the data structure till Ctrl+C
            b.perf_buffer_poll()
        except KeyboardInterrupt:
            print('Bye !')
            break

Running the example above yields the following output:

[Figure: Output from trace_bind.py]

To reveal some of the pre-processing BCC brings to the table, we can pass the debug=DEBUG_PREPROCESSOR flag to BPF’s constructor.
[Figure: trace_bind.py with the debug=DEBUG_PREPROCESSOR flag]

As we can see, there are a few interesting things to note here:

- BCC loads an in-memory filesystem mapped to the /virtual directory.
- We’ve got additional #ifdefs at the top, preventing us from setting the BPF_LICENSE macro and conditionally defining the CONFIG_CC_STACKPROTECTOR macro.
- The BPF_PERF_OUTPUT macro was not expanded, but can be observed here.
- Our routine has been annotated to be put in its own section (this is not mandatory for loading BPF routines, but is used later by bcc itself).
- Our extra parameter was stripped from our function; it now resides on the stack and is copied from the di register.
- Our use of events was replaced with a call to bpf_perf_event_output that uses bpf_pseudo_fd (note that fd 3 is the first fd to be created after stdin, stdout and stderr).
- There is an additional footer included, which defines the BPF_LICENSE macro as “GPL” (relevant code). [Note that BCC’s own license is actually Apache v2.0; this is only a parameter that is later passed on to the bpf(2) syscall.]

Now that we’ve dipped our toe in BCC’s waters, let’s continue by explaining what PyEBPF is and why it was created.
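As a footnote to the preprocessor output discussed above, the parameter rewriting (our sockfd argument being stripped and copied from the di register) can be modeled in a few lines of Python. This is only an illustrative model and not how BCC is implemented (BCC rewrites the C source through Clang). It assumes x86-64, where BCC's PT_REGS_PARM1 through PT_REGS_PARM6 macros read the di, si, dx, cx, r8 and r9 fields of struct pt_regs, and the function name rewrite_params is hypothetical.

```python
# pt_regs fields that carry the first six arguments on x86-64,
# in the order used by BCC's PT_REGS_PARM1..PT_REGS_PARM6 macros.
X86_64_ARG_REGS = ['di', 'si', 'dx', 'cx', 'r8', 'r9']


def rewrite_params(params):
    """Turn extra routine parameters into local-variable assignments.

    `params` is a list of (c_type, name) tuples that follow the leading
    `struct pt_regs *ctx` parameter. Returns C statements (as strings)
    that load each parameter from the matching register field.
    """
    if len(params) > len(X86_64_ARG_REGS):
        raise ValueError('only the first six arguments are passed in registers')
    return [
        '{c_type} {name} = ({c_type}) ctx->{reg};'.format(
            c_type=c_type, name=name, reg=reg)
        for (c_type, name), reg in zip(params, X86_64_ARG_REGS)
    ]


# Model the rewrite of: int on_bind(struct pt_regs* ctx, int sockfd)
for line in rewrite_params([('int', 'sockfd')]):
    print(line)
```

The exact statement BCC emits may look different; the point is only that each extra parameter becomes a local variable initialized from a register field of ctx.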
PyEBPF: Yet Another Wrapper

One aspect that felt a bit daunting when working with BCC’s Python library was that even supposedly trivial examples (like trace_bind above) required writing some boilerplate to share data between our routine and our user-mode application.
I thought it would be fun, for educational purposes, to try and automate the process a bit, so one could have the ability to write simple tracing routines, without writing extra native code.
PyEBPF provides a simple wrapper that helps you attach kernel probes to any syscall without writing a single line of native code.
It does so in a few key steps:

1. For a given attach_kprobe request, it inspects the syscall’s arguments by trying to parse the /sys/kernel/debug/tracing/events/syscalls/sys_enter_<syscall>/format file.
2. Then, it generates an appropriate BPF routine and injects the syscall arguments into it.
3. A native data structure is generated; it holds the syscall arguments, along with the pid, tid, current time in ns, uid, gid and invoking process name.
4. A ctypes data structure that represents the native one is generated as well.
5. A BPF map is injected as well, to copy the information from our routine back to user mode.
6. Finally, the wrapper spawns a daemon thread that polls our BPF map, casts the data object to our ctypes structure, and invokes a user-passed Python function with it.

Thus, our trace_bind example above transforms to the following:

    from threading import Thread
    from socket import socket
    import time

    from pyebpf.ebpf_wrapper import EBPFWrapper

    # Note that no native code is passed here
    b = EBPFWrapper()

    # Prints header
    print('COMM PID SOCKETFD')

    # A callback to be called by pyebpf
    def on_bind(cpu, data, size):
        print('{process_name} {pid} {socket_fd}'.format(
            process_name=data.process_name,
            pid=data.process_id,
            socket_fd=data.fd))

    # Note that we pass a function object
    b.attach_kprobe(event=b.get_syscall_fnname('bind'), fn=on_bind)

    # An async function that binds to localhost:31337 (to get an output for the above)
    def call_bind_async():
        time.sleep(2)
        print('Calling bind.')
        s = socket()
        s.bind(('localhost', 31337))

    t = Thread(target=call_bind_async)
    t.start()

    while True:
        try:
            time.sleep(1)
        except KeyboardInterrupt:
            print('Bye !')
            break

Note how in the example above, we’ve basically halved the amount of code we needed to write.
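The first and fourth steps listed earlier (parsing the tracepoint format file and generating a matching ctypes structure) can be sketched roughly as follows. This is a simplified illustration and not PyEBPF's actual code: the sample text is a trimmed, hand-written excerpt that resembles a sys_enter_bind format file (real contents vary by kernel and use tabs), the function names are hypothetical, only two of the metadata fields are included, and every syscall argument is stored as a 64-bit value for brevity.

```python
import ctypes as ct
import re

# Trimmed, illustrative excerpt resembling
# /sys/kernel/debug/tracing/events/syscalls/sys_enter_bind/format
SAMPLE_FORMAT = """\
name: sys_enter_bind
format:
    field:unsigned short common_type; offset:0; size:2; signed:0;
    field:int common_pid; offset:4; size:4; signed:1;

    field:int __syscall_nr; offset:8; size:4; signed:1;
    field:int fd; offset:16; size:8; signed:0;
    field:struct sockaddr * umyaddr; offset:24; size:8; signed:0;
    field:int addrlen; offset:32; size:8; signed:0;
"""

FIELD_RE = re.compile(r'field:(?P<decl>[^;]+);')


def parse_syscall_args(format_text):
    """Return [(c_type, name), ...] for the syscall's own arguments."""
    args = []
    for match in FIELD_RE.finditer(format_text):
        c_type, name = match.group('decl').rsplit(None, 1)
        # Skip the generic tracepoint header fields.
        if name.startswith('common_') or name == '__syscall_nr':
            continue
        args.append((c_type.strip(), name))
    return args


def build_event_struct(args):
    """Create a ctypes.Structure subclass: metadata plus one slot per argument."""
    fields = [
        ('process_name', ct.c_char * 16),  # TASK_COMM_LEN
        ('process_id', ct.c_uint32),
    ]
    fields += [(name, ct.c_uint64) for _, name in args]
    return type('SyscallEvent', (ct.Structure,), {'_fields_': fields})


args = parse_syscall_args(SAMPLE_FORMAT)
SyscallEvent = build_event_struct(args)

# Simulate a raw record arriving from the kernel and decode it.
raw = bytes(SyscallEvent(process_name=b'python', process_id=1234, fd=3))
event = SyscallEvent.from_buffer_copy(raw)

print(args)
print(event.process_name, event.process_id, event.fd)
```

Generating the structure from the parsed field list is what lets a callback such as on_bind access data.fd without any hand-written ctypes class; the polling thread and the BPF routine generation are left out of this sketch.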
Conclusion

PyEBPF is a tool written for educational purposes, to lessen the burden of writing simple BPF routines.
Complex use cases will, more often than not, need helpers and routines that are only available in kernel mode, and PyEBPF would most definitely not fit those purposes.
However, if you’re just starting out with tracing syscalls, and you don’t mind having all of the control flow in user mode, feel free to use and improve the library further (for example, it does not cover USDTs, kretprobes, tracepoints, and most of the other very cool functionality eBPF offers).
PyEBPF is available under an MIT license; you can find its source code here, and you may install it via pip:

    $> pip install pyebpf

Note: You need to install BCC separately; please refer to this guide if you haven’t done so already.
Hope you enjoyed the read!