William P. Findlay, in his research "A Practical, Lightweight, and Flexible Confinement Framework in eBPF" (August 2021), summarises what we are facing:

"BPFBox supports a limited globbing syntax when defining pathnames, allowing multiple rules matching similar files to be combined into one. Although filesystem rules are specified using pathnames, BPFBox internally uses inode and device numbers rather than the pathnames themselves. When loading policies, BPFBox automatically resolves the provided pathnames into their respective inode-device number pairs. This information is then used to look up the correct policy whenever a sandboxed application attempts to access an inode. Since BPFBox does not check the pathnames themselves when referring to files, it is able to defeat TOCTTOU (Time of Check to Time of Use) attacks, where an attacker quickly swaps out one file with a link to another in an attempt to circumvent access control restrictions in a privileged (most often setuid) binary [15]. In such a situation, BPFBox would simply see a different inode and deny access.

[...]

8.2.1. Semantic Issues in the Policy Language

It is challenging to refer to files from eBPF. In the kernel, files are generally uniquely described by an inode structure, which in turn maps to one or more pathnames via a file structure. Each inode belongs to a distinct filesystem and is uniquely enumerated within that filesystem by an inode number. In BPFBox and BPFContain, we uniquely identify inodes using a combination of their inode number and the unique device identifier of the filesystem on which the inode resides. While this is an effective technique for runtime monitoring, things begin to fall apart when dealing with a user-facing data store, such as a policy map. While the kernel refers to files by their inodes within a filesystem, users do not.

For the most part, userspace does not deal in inode-level semantics - instead, we deal in pathnames, a string that describes the path required to move from the filesystem root to a given file. Indeed, the BPFBox and BPFContain policy languages use pathnames rather than inodes to refer to files. Unfortunately, this creates an undesired dichotomy between the user-facing components of BPFBox and BPFContain and the kernelspace implementation. To resolve this dichotomy, we translate the pathnames into inode and device pairs at policy load-time. This is a workaround and is subject to several fundamental limitations. In particular, referring to a pathname that doesn’t yet exist becomes difficult, as inode numbers do not yet exist; inodes that are deleted or freshly created at runtime must be treated as special cases, dynamically updating the policy as required; finally, globbing pathnames can result in an explosion in the size of maps storing file rules, as each globbed file is translated into a unique inode-device pair.

To resolve these issues, it would be ideal if we could refer to pathnames directly from BPF programs. In particular, a design using this capability might resolve pathnames within eBPF programs and define a finite state machine to match globbing rules over the pathname. Unfortunately, current support for pathname resolution in eBPF is primitive. Difficulties arise due to a few fundamental limitations imposed by the verifier and the eBPF runtime:

1. The verifier imposes a hard limit of 512 bytes of stack space for each BPF program. This makes it unrealistic to store strings on the stack, instead requiring that a buffer be allocated in the heap. In the context of BPF, this can only be done using a dummy map as a scratch buffer.
2. The verifier also imposes restrictions on how eBPF programs can loop and how these loops can access map data. Specifically, loops must provably terminate and any array access within a loop must be appropriately bounded by a fixed constant (to ensure no buffer overflows or similar issues). In practice, enforcing these restrictions is difficult, and the verifier errs on the side of caution when reasoning about a loop is unclear. This can result in safe programs that manipulate long strings being erroneously rejected.
3. While helper functions can get around such restrictions, the current ecosystem for string manipulation helpers in eBPF is immature. For instance, Linux 5.10 added a bpf_d_path() helper to extract pathnames from a kernel directory entry. However, this helper is only available for sleepable BPF programs, since allocating a buffer for the string can result in a page fault. Support for sleepable BPF is very new and has not had a chance to mature; currently, sleepable programs are restricted to a small subset of LSM programs. Aside from pathname resolution, no other string helpers currently exist, although they have been on the radar of eBPF developers for some time.

Although the current state of the eBPF ecosystem makes it impossible for BPFBox and BPFContain to directly use pathname-based enforcement in the kernel, this will not necessarily be the case in the future. eBPF is in active development, and each subsequent kernel version adds new features and capabilities. For instance, the kernel community is currently working on a generic solution for sleepable BPF that will greatly expand the number of programs that can handle page faults. When this support arrives, it is likely that working with strings will become much easier."
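To make the load-time translation described in the quote more concrete, here is a minimal userspace sketch of how a globbed pathname could be expanded into inode-device pairs. It is not BPFBox's actual code: `struct file_key` and `resolve_pattern()` are hypothetical names, and the sketch relies only on POSIX `glob()` and `stat()`. It also ignores the fact that the kernel may encode the device identifier differently from the userspace `st_dev` value.

```c
#include <glob.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical policy key: one entry per file matched by a pattern. */
struct file_key {
    dev_t dev; /* device identifier of the filesystem holding the file */
    ino_t ino; /* inode number within that filesystem */
};

/*
 * Expand `pattern` with glob(3) and stat(2) every match, storing at most
 * `max` (dev, ino) pairs in `out`. Returns the number of pairs stored.
 * Every matched file costs one entry, which is the map "explosion" the
 * quote refers to. Files that do not exist yet produce no entry at all.
 */
static size_t resolve_pattern(const char *pattern, struct file_key *out,
                              size_t max)
{
    glob_t g;
    size_t n = 0;

    memset(&g, 0, sizeof(g));
    if (glob(pattern, 0, NULL, &g) != 0) {
        globfree(&g);
        return 0; /* no match or glob error */
    }

    for (size_t k = 0; k < g.gl_pathc && n < max; k++) {
        struct stat st;

        if (stat(g.gl_pathv[k], &st) == 0) {
            out[n].dev = st.st_dev;
            out[n].ino = st.st_ino;
            n++;
        }
    }

    globfree(&g);
    return n;
}
```

Calling `resolve_pattern("/bin/*sh", keys, 64)` would store one pair per shell binary that exists at that moment; a shell installed afterwards would not be covered until the policy is reloaded.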
-
This works in userspace: https://godbolt.org/z/MYGvq8Kjc But doesn't work in bpf land (starting with 5 R0=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=inv3 R2_w=inv1 R3_w=inv12884901888 R4=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R5_w=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R6=inv2896 R7=fp-448 R8=fp-16 R9=fp-288 R10=fp0 fp-16=mmmmmmmm fp-32=????mmmm fp-40=mmmmmmmm fp-48=mmmmmmmm fp-56=mmmmmmmm fp-64=mmmmmmmm fp-72=mmmmmmmm fp-80=mmmmmmmm fp-88=mmmmmmmm fp-96=mmmmmmmm fp-104=mmmmmmmm fp-112=mmmmmmmm fp-120=mmmmmmmm fp-128=mmmmmmmm fp-136=mmmmmmmm fp-144=mmmmmmmm fp-152=mmmmmmmm fp-160=mmmmmmmm fp-168=mmmmmmmm fp-176=mmmmmmmm fp-184=mmmmmmmm fp-192=mmmmmmmm fp-200=mmmmmmmm fp-208=mmmmmmmm fp-216=mmmmmmmm fp-224=mmmmmmmm fp-232=mmmmmmmm fp-240=mmmmmmmm fp-248=mmmmmmmm fp-256=mmmmmmmm fp-264=mmmmmmmm fp-272=mmmmmmmm fp-280=mmmmmmmm fp-288=mmmmmmmm fp-296=mmmmmmmm fp-304=mmmmmmmm fp-312=mmmmmmmm fp-320=mmmmmmmm fp-328=00000000 fp-336=00000000 fp-344=00000000 fp-352=00000000 fp-360=00000000 fp-368=00000000 fp-376=00000000 fp-384=00000000 fp-392=00000000 fp-400=00000000 fp-408=00000000 fp-416=00000000 fp-424=00000000 fp-432=00000000 fp-440=00000000 fp-448=inv7526405785470460463 fp-456=fp
407: (b7) r1 = 4
408: (18) r3 = 0xffffffff00000000
410: (b7) r2 = 0
411: (bf) r4 = r10
;
412: (07) r4 += -320
; do_logic();
413: (4f) r4 |= r1
last_idx 413 first_idx 407
regs=2 stack=0 before 412: (07) r4 += -320
regs=2 stack=0 before 411: (bf) r4 = r10
regs=2 stack=0 before 410: (b7) r2 = 0
regs=2 stack=0 before 408: (18) r3 = 0xffffffff00000000
regs=2 stack=0 before 407: (b7) r1 = 4
R4 bitwise operator |= on pointer prohibited
processed 3752 insns (limit 1000000) max_states_per_insn 4 total_states 51 peak_states 51 mark_read 44
libbpf: -- END LOG --
libbpf: failed to load program 'ka_ea_sched_process_exec'
libbpf: failed to load object '/home/vagrant/KubeArmor/KubeArmor/BPF/objs/ka_ea_process.bpf.o'
BPF C:
static bool
match(const char *pat, const char *str)
{
    if (!pat || !str)
        return false;
    int str_track = -1;
    int pat_track = -1;
    int i = 0;
    int j = 0;
#define do_logic() \
    if (i >= MAX_FILENAME_LEN || str[i] == '\0') \
        goto loop; \
    if (j >= MAX_PATTERN_LEN) \
        return false; \
    if (pat[j] == '\0' && str[i] == '\0') \
        return true; \
    if (pat[j] == '*') { \
        str_track = i; \
        pat_track = j; \
        j++; \
    } else if (pat[j] != '?' && pat[j] != str[i]) { \
        if (pat_track == -1) \
            return false; \
        str_track++; \
        i = str_track; \
        j = pat_track; \
    } else { \
        i++; \
        j++; \
    }
    do_logic();
    do_logic();
    do_logic();
    do_logic();
    do_logic();
loop:
    while (j < MAX_PATTERN_LEN - 1 && pat[j] == '*')
        j++;
    return pat[j] == '\0';
}
BPF assembly:
0000000000000a80 <LBB0_44>:
336: 0f 67 00 00 00 00 00 00 r7 += r6
337: bf a8 00 00 00 00 00 00 r8 = r10
338: 07 08 00 00 f0 ff ff ff r8 += -16
; return BPF_CORE_READ(task, nsproxy, mnt_ns, ns).inum;
339: bf 81 00 00 00 00 00 00 r1 = r8
340: b7 02 00 00 08 00 00 00 r2 = 8
341: bf 73 00 00 00 00 00 00 r3 = r7
342: 85 00 00 00 71 00 00 00 call 113
343: b7 01 00 00 18 00 00 00 r1 = 24
; return BPF_CORE_READ(task, nsproxy, mnt_ns, ns).inum;
344: 79 a3 f0 ff 00 00 00 00 r3 = *(u64 *)(r10 - 16)
345: 0f 13 00 00 00 00 00 00 r3 += r1
; return BPF_CORE_READ(task, nsproxy, mnt_ns, ns).inum;
346: bf 81 00 00 00 00 00 00 r1 = r8
347: b7 02 00 00 08 00 00 00 r2 = 8
348: 85 00 00 00 71 00 00 00 call 113
349: b7 01 00 00 00 00 00 00 r1 = 0
; return BPF_CORE_READ(task, nsproxy, mnt_ns, ns).inum;
350: 79 a3 f0 ff 00 00 00 00 r3 = *(u64 *)(r10 - 16)
351: 0f 13 00 00 00 00 00 00 r3 += r1
352: bf a7 00 00 00 00 00 00 r7 = r10
353: 07 07 00 00 40 fe ff ff r7 += -448
; return BPF_CORE_READ(task, nsproxy, mnt_ns, ns).inum;
354: bf 71 00 00 00 00 00 00 r1 = r7
355: b7 02 00 00 18 00 00 00 r2 = 24
356: 85 00 00 00 71 00 00 00 call 113
357: 61 71 10 00 00 00 00 00 r1 = *(u32 *)(r7 + 16)
; ctask->mnt_ns = task_get_mnt_ns(task);
358: 63 1a c4 fe 00 00 00 00 *(u32 *)(r10 - 316) = r1
; u64 id = bpf_get_current_pid_tgid();
359: 85 00 00 00 0e 00 00 00 call 14
; ctask->tid = (u32) id;
360: 63 0a cc fe 00 00 00 00 *(u32 *)(r10 - 308) = r0
; ctask->pid = id >> 32;
361: 77 00 00 00 20 00 00 00 r0 >>= 32
362: 63 0a c8 fe 00 00 00 00 *(u32 *)(r10 - 312) = r0
363: 18 01 00 00 2f 62 69 6e 00 00 00 00 2f 2a 73 68 r1 = 7526405785470460463 ll
; char pattern[128] = "/bin/*sh";
365: 7b 1a 40 fe 00 00 00 00 *(u64 *)(r10 - 448) = r1
366: b7 01 00 00 00 00 00 00 r1 = 0
367: 7b 1a b8 fe 00 00 00 00 *(u64 *)(r10 - 328) = r1
368: 7b 1a b0 fe 00 00 00 00 *(u64 *)(r10 - 336) = r1
369: 7b 1a a8 fe 00 00 00 00 *(u64 *)(r10 - 344) = r1
370: 7b 1a a0 fe 00 00 00 00 *(u64 *)(r10 - 352) = r1
371: 7b 1a 98 fe 00 00 00 00 *(u64 *)(r10 - 360) = r1
372: 7b 1a 90 fe 00 00 00 00 *(u64 *)(r10 - 368) = r1
373: 7b 1a 88 fe 00 00 00 00 *(u64 *)(r10 - 376) = r1
374: 7b 1a 80 fe 00 00 00 00 *(u64 *)(r10 - 384) = r1
375: 7b 1a 78 fe 00 00 00 00 *(u64 *)(r10 - 392) = r1
376: 7b 1a 70 fe 00 00 00 00 *(u64 *)(r10 - 400) = r1
377: 7b 1a 68 fe 00 00 00 00 *(u64 *)(r10 - 408) = r1
378: 7b 1a 60 fe 00 00 00 00 *(u64 *)(r10 - 416) = r1
379: 7b 1a 58 fe 00 00 00 00 *(u64 *)(r10 - 424) = r1
380: 7b 1a 50 fe 00 00 00 00 *(u64 *)(r10 - 432) = r1
381: 7b 1a 48 fe 00 00 00 00 *(u64 *)(r10 - 440) = r1
; do_logic();
382: 71 a2 e0 fe 00 00 00 00 r2 = *(u8 *)(r10 - 288)
383: 15 02 30 00 00 00 00 00 if r2 == 0 goto +48 <LBB0_63>
384: 55 02 52 00 2f 00 00 00 if r2 != 47 goto +82 <LBB0_69>
385: 18 01 00 00 00 00 00 00 00 00 00 00 01 00 00 00 r1 = 4294967296 ll
; do_logic();
387: 71 a2 e1 fe 00 00 00 00 r2 = *(u8 *)(r10 - 287)
388: 15 02 2b 00 00 00 00 00 if r2 == 0 goto +43 <LBB0_63>
389: 55 02 4d 00 62 00 00 00 if r2 != 98 goto +77 <LBB0_69>
390: 18 01 00 00 00 00 00 00 00 00 00 00 02 00 00 00 r1 = 8589934592 ll
; do_logic();
392: 71 a2 e2 fe 00 00 00 00 r2 = *(u8 *)(r10 - 286)
393: 15 02 26 00 00 00 00 00 if r2 == 0 goto +38 <LBB0_63>
394: 55 02 48 00 69 00 00 00 if r2 != 105 goto +72 <LBB0_69>
395: 18 01 00 00 00 00 00 00 00 00 00 00 03 00 00 00 r1 = 12884901888 ll
; do_logic();
397: 71 a4 e3 fe 00 00 00 00 r4 = *(u8 *)(r10 - 285)
398: 15 04 21 00 00 00 00 00 if r4 == 0 goto +33 <LBB0_63>
399: b7 01 00 00 03 00 00 00 r1 = 3
400: 18 03 00 00 00 00 00 00 00 00 00 00 03 00 00 00 r3 = 12884901888 ll
402: b7 02 00 00 01 00 00 00 r2 = 1
; do_logic();
403: 71 a5 43 fe 00 00 00 00 r5 = *(u8 *)(r10 - 445)
404: 15 05 06 00 2a 00 00 00 if r5 == 42 goto +6 <LBB0_55>
405: 15 05 01 00 3f 00 00 00 if r5 == 63 goto +1 <LBB0_54>
406: 5d 45 3c 00 00 00 00 00 if r5 != r4 goto +60 <LBB0_69>
0000000000000cb8 <LBB0_54>:
407: b7 01 00 00 04 00 00 00 r1 = 4
408: 18 03 00 00 00 00 00 00 00 00 00 00 ff ff ff ff r3 = -4294967296 ll
410: b7 02 00 00 00 00 00 00 r2 = 0
0000000000000cd8 <LBB0_55>:
411: bf a4 00 00 00 00 00 00 r4 = r10
412: 07 04 00 00 c0 fe ff ff r4 += -320
; do_logic();
413: 4f 14 00 00 00 00 00 00 r4 |= r1
414: 18 01 00 00 00 00 00 00 00 00 00 00 04 00 00 00 r1 = 17179869184 ll
416: 71 44 20 00 00 00 00 00 r4 = *(u8 *)(r4 + 32)
417: 15 04 0e 00 00 00 00 00 if r4 == 0 goto +14 <LBB0_63>
418: 18 01 00 00 00 00 00 00 00 00 00 00 05 00 00 00 r1 = 21474836480 ll
; do_logic();
420: 71 a5 44 fe 00 00 00 00 r5 = *(u8 *)(r10 - 444)
421: 15 05 0a 00 2a 00 00 00 if r5 == 42 goto +10 <LBB0_63>
422: 15 05 09 00 3f 00 00 00 if r5 == 63 goto +9 <LBB0_63>
423: 18 01 00 00 00 00 00 00 00 00 00 00 05 00 00 00 r1 = 21474836480 ll
; do_logic();
425: 1d 45 01 00 00 00 00 00 if r5 == r4 goto +1 <LBB0_60>
426: bf 31 00 00 00 00 00 00 r1 = r3
0000000000000d58 <LBB0_60>:
427: b7 03 00 00 01 00 00 00 r3 = 1
; do_logic();
428: 1d 45 01 00 00 00 00 00 if r5 == r4 goto +1 <LBB0_62>
429: b7 03 00 00 00 00 00 00 r3 = 0
0000000000000d70 <LBB0_62>:
; do_logic();
430: 4f 23 00 00 00 00 00 00 r3 |= r2
431: 55 03 23 00 01 00 00 00 if r3 != 1 goto +35 <LBB0_69>
0000000000000d80 <LBB0_63>:
432: 18 03 00 00 00 00 00 00 00 00 00 00 ff ff ff ff r3 = -4294967296 ll
; while (j < MAX_PATTERN_LEN - 1 && pat[j] == '*')
434: bf 12 00 00 00 00 00 00 r2 = r1
435: 5f 32 00 00 00 00 00 00 r2 &= r3
436: c7 01 00 00 20 00 00 00 r1 s>>= 32
437: 18 03 00 00 00 00 00 00 00 00 00 00 01 00 00 00 r3 = 4294967296 ll
439: b7 04 00 00 fe 00 00 00 r4 = 254
440: b7 00 00 00 ff 00 00 00 r0 = 255
0000000000000dc8 <LBB0_64>:
441: bf 15 00 00 00 00 00 00 r5 = r1
442: bf a1 00 00 00 00 00 00 r1 = r10
443: 07 01 00 00 40 fe ff ff r1 += -448
; while (j < MAX_PATTERN_LEN - 1 && pat[j] == '*')
444: 0f 51 00 00 00 00 00 00 r1 += r5
445: 71 11 00 00 00 00 00 00 r1 = *(u8 *)(r1 + 0)
446: 55 01 05 00 2a 00 00 00 if r1 != 42 goto +5 <LBB0_66>
447: 0f 32 00 00 00 00 00 00 r2 += r3
; j++;
448: bf 51 00 00 00 00 00 00 r1 = r5
449: 07 01 00 00 01 00 00 00 r1 += 1
; while (j < MAX_PATTERN_LEN - 1 && pat[j] == '*')
450: 6d 54 f6 ff 00 00 00 00 if r4 s> r5 goto -10 <LBB0_64>
451: 05 00 02 00 00 00 00 00 goto +2 <LBB0_67>
0000000000000e20 <LBB0_66>:
; return pat[j] == '\0';
452: c7 02 00 00 20 00 00 00 r2 s>>= 32
453: bf 20 00 00 00 00 00 00 r0 = r2
0000000000000e30 <LBB0_67>:
454: bf a1 00 00 00 00 00 00 r1 = r10
; return pat[j] == '\0';
455: 07 01 00 00 40 fe ff ff r1 += -448
456: 0f 01 00 00 00 00 00 00 r1 += r0
457: 71 11 00 00 00 00 00 00 r1 = *(u8 *)(r1 + 0)
; if (r)
458: 55 01 08 00 00 00 00 00 if r1 != 0 goto +8 <LBB0_69>
459: bf a3 00 00 00 00 00 00 r3 = r10
; bpf_printk("pattern: %s - filename: %s - result: %d", pattern,
460: 07 03 00 00 40 fe ff ff r3 += -448
461: 18 01 00 00 80 00 00 00 00 00 00 00 00 00 00 00 r1 = 128 ll
463: b7 02 00 00 28 00 00 00 r2 = 40
464: bf 94 00 00 00 00 00 00 r4 = r9
465: b7 05 00 00 01 00 00 00 r5 = 1
466: 85 00 00 00 06 00 00 00 call 6
-
@geyslan, there are a lot of curious things in your journey through eBPF pattern matching. I took some time trying to understand the problem and would like to share a few notes:
1- Curiously, the problem with the bad code generation (which uses bitwise operations on pointers) seems to disappear when we define the function
2- I tried to "simplify" the logic of the function as much as I could to allow more iterations, but I only achieved 12 iterations for a modified C version and 14 iterations for an inline assembly version (both shown below). Unfortunately, with 14 iterations, we are going to need around 16 tail calls for anything relevant.
#define __noinline __attribute__((noinline))
static inline bool
__match_pattern(const char **ppat, const char **pstr, const char **pat_track)
{
    if (**ppat == '*') {
        (*ppat)++;
        *pat_track = *ppat;
    } else if ((**ppat != '?') && (**ppat != **pstr)) {
        if (*pat_track == NULL)
            return false;
        if (*ppat == *pat_track)
            (*pstr)++;
        else
            *ppat = *pat_track;
    } else {
        (*pstr)++;
        (*ppat)++;
    }
    return true;
}
static inline bool
__match_trailing_star(const char *pat_begin, const char *pat_cur)
{
    while (pat_cur < (pat_begin + MAX_PATTERN_LEN - 1)) {
        if (*pat_cur != '*')
            break;
        pat_cur++;
    }
    return (*pat_cur == '\0');
}
#define __do_match_round() \
    do { \
        if (!__match_pattern(&pp, &sp, &pat_track)) \
            return false; \
        if (pp >= (pat + MAX_PATTERN_LEN)) \
            return false; \
        if (sp >= (str + MAX_FILENAME_LEN)) \
            return false; \
        if (*sp == '\0') { \
            if (*pp == '\0') { \
                return true; \
            } \
            \
            goto Ltrailing_star; \
        } \
    } while(0)
static bool __noinline
match(const char *pat, const char *str)
{
    const char *pp = pat;
    const char *sp = str;
    const char *pat_track = NULL;
    if (!pat || !str)
        return false;
    __do_match_round();
    __do_match_round();
    __do_match_round();
    __do_match_round();
    __do_match_round();
    __do_match_round();
    __do_match_round();
    __do_match_round();
    __do_match_round();
    __do_match_round();
    __do_match_round();
    __do_match_round();
    /* inconclusive (need more rounds) */
    return false;
Ltrailing_star:
    return __match_trailing_star(pat, pp);
}
#define __noinline __attribute__((noinline))
static bool __noinline
match(const char *pat, const char *str)
{
    if (!pat || !str)
        return false;
    /* r3 = pat_track */
    asm volatile("r3 = 0");
    asm volatile("r0 = 0");
    /* define bounds */
    asm volatile("r8 = r1");
    asm volatile("r8 += %[max_pat]": : [max_pat]"i"(MAX_PATTERN_LEN));
    asm volatile("r9 = r2");
    asm volatile("r9 += %[max_str]": : [max_str]"i"(MAX_FILENAME_LEN));
    asm volatile("r4 = *(u8 *)(r1 +0)");
    asm volatile("r5 = *(u8 *)(r2 +0)");
#define __match_round() \
    asm volatile("if r4 != 42 goto +3"); \
    asm volatile("r1 += 1"); \
    asm volatile("r3 = r1"); \
    asm volatile("goto +8"); \
    \
    asm volatile("if r4 == 63 goto +5"); \
    asm volatile("if r4 == r5 goto +4"); \
    asm goto("if r3 == 0 goto %[Lret_false]" : : : : Lret_false); \
    \
    asm volatile("if r1 == r3 goto +3"); \
    asm volatile("r1 = r3"); \
    asm volatile("goto +2"); \
    \
    asm volatile("r1 += 1"); \
    asm volatile("r2 += 1"); \
    \
    asm goto("if r1 >= r8 goto %[Lret_false]" : : : : Lret_false); \
    asm goto("if r2 >= r9 goto %[Lret_false]" : : : : Lret_false); \
    \
    asm volatile("r4 = *(u8 *)(r1 +0)"); \
    asm volatile("r5 = *(u8 *)(r2 +0)"); \
    \
    asm volatile("if r5 != 0 goto +2"); \
    asm goto("if r4 == 0 goto %[Lret_true]" : : : : Lret_true); \
    asm goto("goto %[Lmatch_trailing_start]" : : : : Lmatch_trailing_start)
    __match_round(); __match_round(); __match_round(); __match_round();
    __match_round(); __match_round(); __match_round(); __match_round();
    __match_round(); __match_round(); __match_round(); __match_round();
    __match_round(); __match_round();
Lret_false:
    asm volatile("r0 = 0");
    asm volatile("exit");
Lret_true:
    asm volatile("r0 = 1");
    asm volatile("exit");
Lmatch_trailing_start:
    asm volatile("r0 = 1");
    asm volatile("r8 += -1");
    asm volatile("if r4 != 42 goto +4");
    asm volatile("r1 += 1");
    asm volatile("if r1 >= r8 goto +2");
    asm volatile("r4 = *(u8 *)(r1 +0)");
    asm volatile("goto -5");
    asm volatile("if r4 == 0 goto +1");
    asm volatile("r0 = 0");
    asm volatile("exit");
    /* clang trick */
    return true;
}
-
@nyrahul @nam-jaehyun Take a look at https://lwn.net/Articles/877062/ 👍
-
In user space, the current implementation uses hashing (Jenkins) to convert the full filename into a u32 hash key when an audit policy is applied, and saves this key in ka_ea_filename_map. In eBPF land, when an execution event is triggered, the key is computed again and looked up in ka_ea_filename_map to check whether the execution should be audited. This approach requires full filenames in the policies: to audit /bin/sh, /bin/dash and /bin/bash, all three full filenames must be listed in the policy.
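As an illustration of the hashing step, here is a minimal sketch of deriving a u32 key from a full filename. The text only says "jenkins", so the choice of the one-at-a-time variant is an assumption, and `filename_key()` is a hypothetical name rather than KubeArmor's actual function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Jenkins one-at-a-time hash (assumed variant; the thread only says "jenkins"). */
static uint32_t jenkins_one_at_a_time(const char *key, size_t len)
{
    uint32_t hash = 0;

    for (size_t i = 0; i < len; i++) {
        hash += (unsigned char)key[i];
        hash += hash << 10;
        hash ^= hash >> 6;
    }
    hash += hash << 3;
    hash ^= hash >> 11;
    hash += hash << 15;
    return hash;
}

/*
 * Hypothetical helper: the u32 key under which a full filename would be
 * stored in (and later looked up from) a map such as ka_ea_filename_map.
 */
static uint32_t filename_key(const char *path)
{
    return jenkins_one_at_a_time(path, strlen(path));
}
```

Because the key is derived from the exact byte string, /bin/sh, /bin/dash and /bin/bash each need their own map entry, and a pattern such as /bin/*sh cannot be expressed as a single key.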
Using wildcards within the eBPF code to match the processes to be audited moves some of the business logic into the kernel. Wildcards make policies practical to write: to audit the binaries /bin/sh, /bin/dash and /bin/bash, it is enough to define the single pattern /bin/*sh.
However, even a simple wildcard matching logic that uses only * (star) and ? (question mark) as wildcards, once converted to bounded loops so that the verifier can accept it, exceeds the verifier's 1 million instruction limit. This is the biggest issue at the moment. We tried splitting the program using tail calls and modifying the algorithm, without much success.
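For reference, the kind of matcher being discussed can be sketched in plain userspace C as follows. This is not the code from the commit linked in this thread and it makes no claim to pass the verifier; `wildcard_match()`, the two length bounds, and the `MAX_STEPS` budget are assumptions chosen for illustration. The explicit step budget mirrors the bounded-loop requirement: the loop stops after a fixed number of iterations whether or not a result was reached.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PATTERN_LEN  128 /* assumed bound, not taken from the thread */
#define MAX_FILENAME_LEN 256 /* assumed bound, not taken from the thread */
#define MAX_STEPS (MAX_PATTERN_LEN * MAX_FILENAME_LEN)

/* Iterative '*' / '?' matcher with single-star backtracking. */
static bool wildcard_match(const char *pat, const char *str)
{
    int i = 0, j = 0;    /* cursors into str and pat */
    int str_track = -1;  /* position in str where the last '*' was seen */
    int pat_track = -1;  /* position of the last '*' in pat */

    if (!pat || !str)
        return false;

    for (int step = 0; step < MAX_STEPS; step++) {
        if (i >= MAX_FILENAME_LEN || j >= MAX_PATTERN_LEN)
            return false;
        if (str[i] == '\0')
            break;
        if (pat[j] == '*') {
            /* remember the star; first try matching zero characters */
            pat_track = j++;
            str_track = i;
        } else if (pat[j] == '?' || pat[j] == str[i]) {
            i++;
            j++;
        } else if (pat_track != -1) {
            /* mismatch: let the last star absorb one more character */
            j = pat_track + 1;
            i = ++str_track;
        } else {
            return false;
        }
    }

    if (str[i] != '\0')
        return false; /* step budget exhausted: inconclusive */

    /* only trailing stars may remain in the pattern */
    while (j < MAX_PATTERN_LEN - 1 && pat[j] == '*')
        j++;
    return pat[j] == '\0';
}
```

With this sketch, the pattern /bin/*sh matches /bin/sh, /bin/dash and /bin/bash but not /bin/ls. The worst-case iteration count grows with the product of the two lengths, which gives a sense of why an unrolled or verifier-explored version of the same logic becomes expensive so quickly.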
More information about this attempt:
8c9e547
Version of match() working with a pattern of maximum 7 bytes: https://github.com/kubearmor/KubeArmor/blob/8c9e54746dd58afef18fbf24f59004f91c0b2ba0/KubeArmor/BPF/ka_ea_process.bpf.c
Version of match() modified to work with a pattern with a maximum length of 9 bytes:
9 bytes match() pattern
So the question that started this discussion is:
How can we get around the eBPF verifier rejecting this algorithm as too complex, so that wildcard patterns become usable?