WEBVTT

00:00.000 --> 00:10.000
Please be quiet, we need to start now, thank you.

00:10.000 --> 00:15.000
Quiet, please.

00:15.000 --> 00:17.000
All right, next we've got Axel,

00:17.000 --> 00:19.000
who's going to be talking to us about reducing

00:19.000 --> 00:24.000
container image sizes with eBPF and Podman.

00:24.000 --> 00:29.000
Hello everyone, so my name is Axel Fennany.

00:29.000 --> 00:31.000
I'm also French, and I'm sorry for that.

00:31.000 --> 00:34.000
And today I'm going to talk about how you can tackle

00:34.000 --> 00:37.000
the problem of bloated images.

00:37.000 --> 00:42.000
And we will see how Podman and eBPF can be used

00:42.000 --> 00:45.000
to target this problem.

00:45.000 --> 00:51.000
So, we will see today why you should

00:51.000 --> 00:56.000
reduce your container images, how Podman and the OCI,

00:56.000 --> 01:00.000
or Open Container Initiative, spec can be used,

01:00.000 --> 01:04.000
and how eBPF and containers can work together.

01:04.000 --> 01:08.000
And we'll do a demo because I think that

01:08.000 --> 01:11.000
showing my code running is probably way clearer

01:11.000 --> 01:15.000
than the slides that I'm going to show.

01:15.000 --> 01:20.000
So depending on the context and what you are doing

01:20.000 --> 01:24.000
with your images, in an enterprise context,

01:24.000 --> 01:27.000
usually you want to reduce your container size

01:27.000 --> 01:29.000
just to avoid CVEs.

01:29.000 --> 01:33.000
You don't want to be called at 2 a.m.

01:33.000 --> 01:37.000
because there is one image tagged with 10 CVEs,

01:37.000 --> 01:39.000
whatever it is.

01:39.000 --> 01:43.000
And sometimes your image contains

01:43.000 --> 01:46.000
a shared library that is not even used.

01:46.000 --> 01:50.000
But how can you determine what's being used and what's

01:50.000 --> 01:53.000
not inside your image?

01:54.000 --> 01:57.000
There is obviously the network bandwidth.

01:57.000 --> 02:00.000
The bigger your image is, the more network

02:00.000 --> 02:02.000
bandwidth it will use.

02:02.000 --> 02:04.000
And also the starting time.

02:04.000 --> 02:06.000
A five-megabyte image will obviously start

02:06.000 --> 02:09.000
faster than a five-gigabyte one.

02:09.000 --> 02:13.000
So how do you determine what is going to be

02:13.000 --> 02:15.000
used in your container?

02:15.000 --> 02:18.000
It's a pretty tricky problem, and static analysis

02:18.000 --> 02:20.000
may work in some languages.

02:20.000 --> 02:24.000
But when you have a very big image with different

02:24.000 --> 02:27.000
components, a wide range of binaries,

02:27.000 --> 02:29.000
utility stuff, different programming languages,

02:29.000 --> 02:33.000
you may want to go into runtime analysis.

02:33.000 --> 02:37.000
But yeah, there are different approaches

02:37.000 --> 02:39.000
that I tried for this problem, because it's something

02:39.000 --> 02:42.000
that I had been thinking about for a pretty long time.

02:42.000 --> 02:46.000
I first tried to create my own filesystem,

02:46.000 --> 02:49.000
a FUSE filesystem in user space, trying to intercept

02:49.000 --> 02:52.000
every file which is opened, but this has

02:52.000 --> 02:54.000
way too much overhead on performance,

02:54.000 --> 02:59.000
and it did not make my application work well.

02:59.000 --> 03:03.000
But I recently came across, like a few months ago,

03:03.000 --> 03:06.000
an article from Valentin Rothberg and Dan Walsh.

03:06.000 --> 03:10.000
Their main idea was: how can you limit

03:10.000 --> 03:13.000
a container's system call access?

03:13.000 --> 03:17.000
So when they first created containers,

03:17.000 --> 03:21.000
they limited the number of system calls that the

03:21.000 --> 03:24.000
processes inside your container can call.

03:24.000 --> 03:28.000
They created a subset of something like 240 system

03:28.000 --> 03:30.000
calls, but that's still a lot.

03:30.000 --> 03:33.000
And most applications don't need that much.

03:33.000 --> 03:36.000
So they made a tool working in

03:36.000 --> 03:39.000
the same way.

03:39.000 --> 03:43.000
So they created a hook, combining eBPF and

03:43.000 --> 03:45.000
Podman, to be able to track

03:45.000 --> 03:49.000
every system call that the container is doing.

03:49.000 --> 03:53.000
And then, later on in production, you can use this list

03:53.000 --> 03:56.000
to restrict the container, and if there is

03:56.000 --> 03:59.000
remote code execution, and an attacker tries to use

03:59.000 --> 04:04.000
a system call that is not in the list, it is just blocked.

04:04.000 --> 04:08.000
I wanted to do the same for file access.

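The recorded-syscall idea just described can be sketched in a few lines. This is a hypothetical illustration, not the actual tool from that article: it takes a list of syscall names observed at runtime and emits a seccomp-style profile JSON where everything not on the list is denied.

```python
import json

def build_seccomp_profile(recorded_syscalls):
    """Build a minimal seccomp profile from syscall names recorded at
    runtime: anything not explicitly allowed is denied with an errno."""
    profile = {
        "defaultAction": "SCMP_ACT_ERRNO",  # deny by default
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [
            {
                # de-duplicate and sort the recorded names
                "names": sorted(set(recorded_syscalls)),
                "action": "SCMP_ACT_ALLOW",
            }
        ],
    }
    return json.dumps(profile, indent=2)

# Example: a container that was only ever seen making these calls.
print(build_seccomp_profile(["read", "write", "openat", "close", "read"]))
```

The talk applies the same pattern to file access: record what is actually used, then act on everything that is not.
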
04:08.000 --> 04:12.000
So Podman, so Podman is just a container

04:12.000 --> 04:15.000
engine, very similar to Docker, almost a

04:15.000 --> 04:16.000
drop-in replacement.

04:16.000 --> 04:19.000
You can do podman run, podman pull, everything.

04:19.000 --> 04:23.000
And they implement the Open Container Initiative

04:23.000 --> 04:27.000
spec, which is a spec that defines how containers

04:27.000 --> 04:29.000
should work.

04:29.000 --> 04:30.000
And why Podman?

04:30.000 --> 04:33.000
Because I'm working at Red Hat, and I'm working on

04:33.000 --> 04:36.000
Podman Desktop, so I would have

04:36.000 --> 04:39.000
issues if I had not used it.

04:39.000 --> 04:42.000
So, to be able to do your

04:42.000 --> 04:43.000
runtime analysis,

04:43.000 --> 04:46.000
you want your program,

04:46.000 --> 04:49.000
the one which is going to run the

04:49.000 --> 04:52.000
profiling, to be called before the container is running.

04:52.000 --> 04:54.000
So you can use the prestart hook,

04:54.000 --> 04:59.000
though hooks in containers are a bit tricky.

05:00.000 --> 05:03.000
So, when this hook is used, when you want to start a

05:03.000 --> 05:06.000
container, it says: call my binary in a

05:06.000 --> 05:09.000
synchronous manner, and until it returns,

05:09.000 --> 05:12.000
you do not start the

05:12.000 --> 05:13.000
container.

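A prestart hook like this is declared to the runtime with a small JSON file, and the runtime then writes the container state to the hook's stdin. A minimal sketch, where the binary path and annotation name are invented for illustration:

```python
import json

# Hypothetical hook configuration, as you might drop into a Podman
# hooks directory: run /usr/local/bin/profiler synchronously at the
# prestart stage, but only when a chosen annotation is present.
hook_config = {
    "version": "1.0.0",
    "hook": {"path": "/usr/local/bin/profiler"},
    "when": {"annotations": {"io.example.profile/output": ".+"}},
    "stages": ["prestart"],
}

# The runtime feeds the container state to the hook on stdin. It
# includes the PID (from which the hook can find the mount namespace)
# and the annotations, including the output path the user set.
state = json.loads(
    '{"pid": 4242,'
    ' "annotations": {"io.example.profile/output": "/tmp/profile.json"}}'
)
print(state["pid"], state["annotations"]["io.example.profile/output"])
```
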
05:13.000 --> 05:15.000
Because you don't want to have your container running

05:15.000 --> 05:16.000
before you start profiling it,

05:16.000 --> 05:21.000
because you would probably miss some loading,

05:21.000 --> 05:26.000
some library loading,

05:26.000 --> 05:30.000
or other things. And for example, you can define a very simple

05:30.000 --> 05:36.000
hook, and your hook, your binary, will have some information,

05:36.000 --> 05:42.000
which is the PID of your container, and the annotation value

05:42.000 --> 05:48.000
that you set, which is required. But containers,

05:48.000 --> 05:52.000
you can create some processes, you can have tons of

05:52.000 --> 05:56.000
things happening in them. So how do you

05:56.000 --> 06:00.000
profile everything that's happening inside one container?

06:00.000 --> 06:04.000
You check the mount namespace. In the Linux world, for

06:04.000 --> 06:08.000
containers, for isolation, the runtime usually

06:08.000 --> 06:12.000
creates a mount namespace, this mount namespace has an ID,

06:12.000 --> 06:16.000
and every process inside your container, unless

06:16.000 --> 06:18.000
privileged or unless it goes outside the container,

06:18.000 --> 06:24.000
will have the same mount namespace. So with this ID,

06:24.000 --> 06:28.000
you now have a way to identify processes, but you want to be

06:28.000 --> 06:32.000
able to capture everything that is happening in

06:32.000 --> 06:38.000
a performance-wise manner. So eBPF is the solution. So

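The mount namespace ID of a process can be read from /proc, which is roughly what the hook binary would do with the PID it receives. A sketch assuming a Linux host; the hypothetical helper just parses the `mnt:[...]` symlink target:

```python
import os
import re

def mount_ns_id(pid):
    """Return the mount-namespace ID of a process, read from /proc.
    /proc/<pid>/ns/mnt is a symlink that looks like 'mnt:[4026531840]'."""
    link = os.readlink(f"/proc/{pid}/ns/mnt")
    match = re.fullmatch(r"mnt:\[(\d+)\]", link)
    return int(match.group(1))

# The hook would call this with the container PID it received on stdin;
# here we just inspect our own process.
print(mount_ns_id(os.getpid()))
```

Two processes in the same container report the same ID, which is exactly the property the talk relies on.
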
06:38.000 --> 06:40.000
to go quickly, there is a room at FOSDEM

06:40.000 --> 06:44.000
on eBPF, it's a very big subject, very nice.

06:44.000 --> 06:48.000
It allows you, broadly speaking, to run code

06:48.000 --> 06:52.000
in a privileged manner inside the Linux kernel. You don't need to

06:52.000 --> 06:56.000
recompile your kernel to run some custom logic. Why is it

06:56.000 --> 07:00.000
important? Because you can hook into

07:00.000 --> 07:04.000
any place, or most places, inside your Linux kernel, and you can

07:04.000 --> 07:08.000
access the internal data structures in

07:08.000 --> 07:12.000
a safe way. And you should do that because it's very efficient,

07:12.000 --> 07:16.000
there is almost no overhead if you write your eBPF program

07:16.000 --> 07:20.000
nicely. And it allows you a lot of

07:20.000 --> 07:24.000
flexibility. So there are tons of eBPF

07:24.000 --> 07:28.000
program types, there is one for everything: you can hook into

07:28.000 --> 07:30.000
system calls, you can hook into filesystems, you can

07:30.000 --> 07:34.000
hook into drivers, but there is one which is very interesting,

07:34.000 --> 07:38.000
is that you can hook into a Linux security module, and the specific

07:38.000 --> 07:42.000
one is the file_open hook. I first tried to hook into the

07:42.000 --> 07:46.000
open system call, but some new applications now are using

07:46.000 --> 07:48.000
openat, and when you use exec, it's not

07:48.000 --> 07:52.000
necessarily opening some files. So it was very hard to

07:52.000 --> 07:56.000
be sure that everything that is read, opened, accessed,

07:56.000 --> 08:02.000
or executed, I would catch this information. So this Linux

08:02.000 --> 08:06.000
security module file_open hook, which is one of a list of

08:06.000 --> 08:10.000
hooks that you can attach your eBPF program to, is what

08:10.000 --> 08:16.000
has been used. So now you have, you have

08:16.000 --> 08:20.000
Podman: before starting the container, it calls your binary.

08:20.000 --> 08:24.000
Your binary will be able to load your eBPF program.

08:24.000 --> 08:28.000
And just in your binary, you will be able to

08:28.000 --> 08:32.000
identify the mount namespace, so you now have a way to determine

08:32.000 --> 08:36.000
your processes from within your container, and inside the

08:36.000 --> 08:40.000
eBPF program, every time a file is opened, there is an

08:40.000 --> 08:44.000
event, there is a task, and from this task, you can access

08:44.000 --> 08:50.000
the mount namespace. So now, every time a file is opened anywhere

08:50.000 --> 08:54.000
on your system, you will get a callback, and you can filter,

08:54.000 --> 08:56.000
saying, okay, this is not relevant to my

08:56.000 --> 09:00.000
problem, and then you can just send back your

09:00.000 --> 09:04.000
information. So you receive an event, you see your

09:04.000 --> 09:08.000
file, this file is from within the container, and

09:08.000 --> 09:12.000
eBPF allows you to define some maps, which are

09:12.000 --> 09:16.000
structures to communicate between the kernel space and user space, and you

09:16.000 --> 09:20.000
just stream the data, and in your user program, because you

09:20.000 --> 09:24.000
used the annotation value, you just put an

09:24.000 --> 09:28.000
absolute path where you want this data to be, to be

09:28.000 --> 09:36.000
written. So, next time,

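The user-space side of this pipeline — receive file-open events, keep only those from the container's mount namespace, and write the paths to the annotated output path — can be simulated without any eBPF. The event tuples and namespace IDs below are fabricated for illustration:

```python
import json
import tempfile

def write_profile(events, container_mnt_ns, output_path):
    """Keep only file-open events from the container's mount namespace
    and dump the unique paths to the path taken from the annotation."""
    paths = sorted({path for path, mnt_ns in events
                    if mnt_ns == container_mnt_ns})
    with open(output_path, "w") as f:
        json.dump(paths, f, indent=2)
    return paths

# Fabricated events: (file path, mount namespace ID of the opening task).
events = [
    ("/usr/bin/bash", 4026532201),   # inside the container
    ("/etc/hosts", 4026531840),      # host process, filtered out
    ("/usr/bin/cat", 4026532201),
    ("/usr/bin/bash", 4026532201),   # duplicates collapse
]
out = tempfile.NamedTemporaryFile(suffix=".json", delete=False).name
print(write_profile(events, 4026532201, out))
```

In the real setup the events come from the eBPF map instead of a hardcoded list, but the filtering and the write-out work the same way.
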
09:36.000 --> 09:42.000
okay, can I, okay, I may need to

09:46.000 --> 09:52.000
just, let me reload.

09:52.000 --> 09:56.000
Yeah, we're a bit late, so

09:56.000 --> 09:58.000
I don't have, yeah, thank

09:58.000 --> 10:00.000
you, I don't have much time. So in an

10:00.000 --> 10:04.000
ideal world, you want to have your production container,

10:04.000 --> 10:06.000
and at least you want to reproduce a

10:06.000 --> 10:08.000
production-like use case, because let's say you have

10:08.000 --> 10:10.000
two endpoints, one endpoint is reading a

10:10.000 --> 10:12.000
config file, the other is not: if you only test

10:12.000 --> 10:16.000
the one that is not opening it, or you are not

10:16.000 --> 10:18.000
fully covering your use case, you will just get data which

10:18.000 --> 10:22.000
is not relevant, or not representative, of what's

10:22.000 --> 10:24.000
happening inside the container. So you could say, okay,

10:24.000 --> 10:28.000
you could use it in the CI, you could use it with your

10:28.000 --> 10:32.000
end-to-end tests to at least have an idea.

10:32.000 --> 10:38.000
So, for example, we could run, just for

10:38.000 --> 10:40.000
the demonstration, we have the Fedora

10:40.000 --> 10:44.000
image, which is a utility-based image, just like

10:44.000 --> 10:48.000
Ubuntu, but it has, like, tons of binaries, there are a lot of things

10:48.000 --> 10:52.000
in it, but that's expected, because it's a utility-based image.

10:52.000 --> 10:56.000
What I want to show is: here, no overhead, I'm not using

10:56.000 --> 11:00.000
my annotation, so nothing is happening. I'm just going to

11:00.000 --> 11:08.000
copy-paste the annotation.

11:08.000 --> 11:14.000
So, now I'm just doing the same, but I'm adding the annotation,

11:14.000 --> 11:20.000
and the path where I want the content to be

11:20.000 --> 11:24.000
written. Doing the same, let's use some of the binaries that we

11:24.000 --> 11:28.000
have in it, so we can use date, I can use

11:28.000 --> 11:36.000
grep, I can use cat, we can see the profile,

11:36.000 --> 11:40.000
what else could we do, we can use some binaries

11:40.000 --> 11:44.000
here, but I think that's all. We can see now in our

11:44.000 --> 11:48.000
disk folder that we have a profiling file that has been created,

11:48.000 --> 11:52.000
and it's not really nice to look at, but I

11:52.000 --> 11:56.000
made a quick UI tool, which allows you

11:56.000 --> 12:02.000
to take this file, and what it does, it just

12:02.000 --> 12:06.000
goes to the Podman storage, dumps the image,

12:06.000 --> 12:10.000
checks all the layers, and creates a tree structure, just a

12:10.000 --> 12:14.000
file tree of your system, and it combines it with the

12:14.000 --> 12:18.000
profiling file that you got, and it just tells you if

12:18.000 --> 12:22.000
a file has been opened or not, and tells you what

12:22.000 --> 12:24.000
percentage of the content is used or not.

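The usage percentage the tool reports can be computed like this, assuming it is weighted by file size rather than file count; the paths and sizes below are invented for the example:

```python
def used_percentage(files):
    """files: dict mapping path -> (size_in_bytes, was_opened).
    The percentage is weighted by file size, not by file count."""
    total = sum(size for size, _ in files.values())
    used = sum(size for size, opened in files.values() if opened)
    return 100.0 * used / total if total else 0.0

# Invented example: two binaries were opened, one library never was.
files = {
    "/usr/bin/bash": (1_200_000, True),
    "/usr/bin/grep": (200_000, True),
    "/usr/lib/libunused.so": (2_600_000, False),
}
print(round(used_percentage(files), 1))  # 1.4 MB used out of 4 MB
```
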
12:24.000 --> 12:28.000
The percentages are using the file size, not the number of

12:28.000 --> 12:32.000
files, so very small files you will not see, but

12:32.000 --> 12:34.000
all of this could be configured. So, let's go into

12:34.000 --> 12:38.000
our bin folder, and we can see here

12:38.000 --> 12:42.000
almost all of the binaries; the bash one is obviously used,

12:42.000 --> 12:46.000
because it's our entry point. So we used cat to

12:46.000 --> 12:50.000
see the profile, we see the date one, we see

12:50.000 --> 12:54.000
some others, okay, those are used, probably

12:54.000 --> 12:56.000
for something, and yeah, other things like

12:56.000 --> 13:00.000
grep. So with this method you can just see everything that has

13:00.000 --> 13:02.000
been used in your container, and everything which has

13:02.000 --> 13:06.000
not been used. Here also we can show

13:06.000 --> 13:14.000
it, we can see it. One slide, why is it empty?

13:14.000 --> 13:20.000
Next, this operation is not supported, is it?

13:20.000 --> 13:24.000
But the next slide, the next slide is a

13:24.000 --> 13:30.000
thing, so probably this thing, just, okay, it's not working,

13:30.000 --> 13:32.000
so never mind.

13:54.000 --> 13:58.000
Hello, thank you for that presentation.

13:58.000 --> 14:02.000
Sure, is this tool already integrated in

14:02.000 --> 14:06.000
Podman? Sorry, I didn't hear.

14:06.000 --> 14:10.000
Hello, it's just, it's just very,

14:10.000 --> 14:16.000
is this tool already integrated in the

14:16.000 --> 14:20.000
Fedora image or in Podman?

14:20.000 --> 14:24.000
I don't know, I'm sorry, I can't hear you.

14:24.000 --> 14:26.000
Is it already integrated in the Fedora image or in Podman?

14:26.000 --> 14:32.000
Oh no, no, this is just a thing that I work on in my

14:32.000 --> 14:36.000
free time, I linked the repository, and you need to

14:36.000 --> 14:38.000
install it yourself.

14:50.000 --> 14:52.000
Any other questions?

14:56.000 --> 15:00.000
Thank you.

