WEBVTT

00:00.000 --> 00:06.000
Aleksa, are you ready?

00:06.000 --> 00:07.000
Yeah, I'm ready.

00:07.000 --> 00:08.000
OK.

00:08.000 --> 00:11.000
Our next talk is by Aleksa

00:11.000 --> 00:13.000
Sarai.

00:13.000 --> 00:16.000
And it's about path safety in the trenches.

00:16.000 --> 00:17.000
Yes.

00:17.000 --> 00:19.000
So hello, everyone.

00:19.000 --> 00:21.000
It's lovely to be back at FOSDEM.

00:21.000 --> 00:24.000
I haven't been for quite a few years, because I'm stuck on this little

00:24.000 --> 00:27.000
penal colony, you may know it, called Australia.

00:27.000 --> 00:31.400
But yeah, I'll be talking, relatively quickly, about some work we did in

00:31.400 --> 00:32.400
runc recently.

00:32.400 --> 00:36.600
But it affects a lot of projects; it's just particularly acute in container

00:36.600 --> 00:37.600
runtimes.

00:37.600 --> 00:40.600
And I'm also at this new company that we've launched, called Amutable.

00:40.600 --> 00:42.600
So, runc: I'm sure all of you know what runc is.

00:42.600 --> 00:45.000
This is going to be a very, very quick run through these slides.

00:45.000 --> 00:47.600
So runc, as I'm sure you all know, is sort of the entry point

00:47.600 --> 00:51.600
to doing most container-specific operations on Linux.

00:51.600 --> 00:56.200
And basically the entire OCI ecosystem runs through runc or runc-like programs.

00:56.200 --> 00:58.200
And runc has this, you know, CLI.

00:58.200 --> 01:03.000
We have this specification, which defines this particular JSON structure, in which you specify

01:03.000 --> 01:04.600
what kind of container options you like.

01:04.600 --> 01:06.400
I'm sure all of you have seen it before.

01:06.400 --> 01:09.400
And with the JSON file, you do runc run and you can run containers.

01:09.400 --> 01:10.000
All very reasonable.

01:10.000 --> 01:12.600
So the thing I'm going to talk about today is path safety.

01:12.600 --> 01:15.720
And so I suspect most of you have actually already run into what I call regular

01:15.720 --> 01:19.200
path safety, though this talk has some interesting bits on it.

01:19.200 --> 01:22.000
But the bit about strict path safety is something which I've been working on

01:22.000 --> 01:25.200
for quite a few years, and it's something that is not obvious

01:25.200 --> 01:26.200
when you first run into it.

01:26.200 --> 01:28.800
And it's very, very frustrating when you have to fix bugs related to it.

01:28.800 --> 01:31.800
So regular path safety is the very classic time-of-check-to-time-of-use

01:31.800 --> 01:34.800
attack that you have with any kind of file system operation.

01:34.800 --> 01:35.800
Paths are just strings.

01:35.800 --> 01:38.800
So when you operate on a path, the path can get swapped out from underneath you.

01:38.800 --> 01:39.800
This is the classic

01:39.800 --> 01:43.000
"I want to sanitize the path" problem, and you can't sanitize paths because the attacker

01:43.000 --> 01:44.000
owns the path,

01:44.000 --> 01:46.200
so they can change it after you sanitize it.

01:46.200 --> 01:49.600
And there are lots of classic examples you can have.

01:49.600 --> 01:54.200
But, you know, say you have some /rootfs, which is owned by an untrusted process.

01:54.200 --> 01:56.000
They can mess around with any of the files in there.

01:56.000 --> 01:58.800
So if you have an administrative process on the host, like a container

01:58.800 --> 02:00.600
runtime, that is trying to operate on these things,

02:00.600 --> 02:03.200
it can get tricked into opening random files that it doesn't expect.

02:03.200 --> 02:06.200
And I want to point out that a lot of people think this O_NOFOLLOW thing protects you.

02:06.200 --> 02:09.400
It doesn't. O_NOFOLLOW is extremely limited in what it protects against.

02:09.400 --> 02:12.400
And in particular, in this case, if etc is a symlink to somewhere on the host,

02:12.400 --> 02:16.000
it doesn't matter that host is not a symlink, because O_NOFOLLOW only stops host

02:16.000 --> 02:18.200
from being a symlink, not anything else.
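
NOTE
A minimal Python sketch of the O_NOFOLLOW limitation just described (Linux, illustrative only; every path is a hypothetical stand-in created under a temporary directory): only a symlink in the final component is rejected, so a symlinked etc is followed silently.
```python
import os
import tempfile
def open_nofollow(path):
    # O_NOFOLLOW only refuses a symlink as the *last* component of path.
    return os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
def demo():
    base = tempfile.mkdtemp()
    # Stand-in for a host file that the container must not be able to reach.
    host_etc = os.path.join(base, "host", "etc")
    os.makedirs(host_etc)
    with open(os.path.join(host_etc, "hosts"), "w") as f:
        f.write("host secret\n")
    # The untrusted rootfs replaces its own etc with a symlink to the host.
    rootfs = os.path.join(base, "rootfs")
    os.makedirs(rootfs)
    os.symlink(host_etc, os.path.join(rootfs, "etc"))
    # "hosts" itself is not a symlink, so O_NOFOLLOW happily opens the host file.
    fd = open_nofollow(os.path.join(rootfs, "etc", "hosts"))
    try:
        return os.read(fd, 100)
    finally:
        os.close(fd)
```
Here demo() returns the host file's contents despite O_NOFOLLOW; a check applied to every component, such as RESOLVE_NO_SYMLINKS, would refuse this lookup.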

02:18.200 --> 02:22.600
And basically, almost all of the classic VFS APIs have this problem.

02:22.600 --> 02:27.400
So this problem has existed for as long as Unix has existed.

02:27.400 --> 02:30.000
You'd think this is a problem that doesn't happen very often.

02:30.000 --> 02:34.600
So this is a list of CVEs, which are just the ones that I could think of when I was making this slide.

02:34.600 --> 02:38.200
You could easily make five or six slides that are just lists of CVEs for this kind of problem.

02:38.200 --> 02:41.600
It is an incredibly, incredibly common problem even today.

02:41.600 --> 02:45.600
And so this is really not a new problem, but we still keep running into it.

02:45.600 --> 02:49.600
So in 2019, I worked on this thing called openat2, which is a system call

02:49.600 --> 02:50.600
on Linux.

02:50.600 --> 02:51.600
It's no longer new.

02:51.600 --> 02:53.600
It's quite old now.

02:53.600 --> 02:56.600
But what this allows you to do is, in addition to the regular openat features,

02:56.600 --> 02:59.600
which let you operate relative to a directory file descriptor,

02:59.600 --> 03:05.600
you also have this resolve field, the RESOLVE_* flags, which let you control

03:05.600 --> 03:07.600
path resolution.

03:07.600 --> 03:12.600
And so what this does is allow you to control aspects of how the entire path is

03:12.600 --> 03:13.600
resolved.

03:13.600 --> 03:18.600
And so I would say the vast majority of programs on Linux should be using at least one of these options.

03:18.600 --> 03:23.600
So either you want to use RESOLVE_IN_ROOT, which is kind of like chroot, but it only applies

03:23.600 --> 03:27.600
to that particular open call, and it adds some additional protections that chroot doesn't have.

03:27.600 --> 03:32.600
RESOLVE_BENEATH is kind of like chroot, except if you try to jump out of the root you get

03:32.600 --> 03:35.600
an error rather than it being scoped, the way a chroot is scoped.

03:35.600 --> 03:38.600
This is a FreeBSD kind of API.

03:38.600 --> 03:42.600
I took inspiration from the FreeBSD API, and then they took inspiration from the Linux API

03:42.600 --> 03:43.600
for this.

03:43.600 --> 03:44.600
But yeah.

03:44.600 --> 03:48.600
And then RESOLVE_NO_SYMLINKS is basically a better O_NOFOLLOW, so it actually blocks

03:48.600 --> 03:50.600
symlinks for any path component.

03:50.600 --> 03:53.600
You can still escape with absolute paths, which is why you want to do more than that.

03:53.600 --> 03:57.600
But these are incredibly useful things, and I find myself using them all the time now.
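
NOTE
A rough Python sketch of calling openat2 through ctypes to compare RESOLVE_IN_ROOT and RESOLVE_BENEATH. Assumptions: Linux 5.6 or newer with glibc, and syscall number 437 (correct on x86-64 and arm64); the openat2 and OpenHow names belong to this sketch and simply mirror struct open_how from linux/openat2.h. If seccomp blocks the call you get ENOSYS or EPERM instead.
```python
import ctypes
import os
SYS_openat2 = 437  # x86-64 and arm64; openat2 was added in Linux 5.6
RESOLVE_NO_XDEV = 0x01
RESOLVE_NO_MAGICLINKS = 0x02
RESOLVE_NO_SYMLINKS = 0x04
RESOLVE_BENEATH = 0x08
RESOLVE_IN_ROOT = 0x10
class OpenHow(ctypes.Structure):
    # Mirrors struct open_how from <linux/openat2.h>.
    _fields_ = [("flags", ctypes.c_uint64),
                ("mode", ctypes.c_uint64),
                ("resolve", ctypes.c_uint64)]
_libc = ctypes.CDLL(None, use_errno=True)
_libc.syscall.restype = ctypes.c_long
def openat2(dirfd, path, flags, resolve, mode=0):
    """Thin wrapper around the raw openat2 syscall; raises OSError on failure."""
    how = OpenHow(flags=flags | os.O_CLOEXEC, mode=mode, resolve=resolve)
    fd = _libc.syscall(ctypes.c_long(SYS_openat2), ctypes.c_long(dirfd),
                       ctypes.c_char_p(os.fsencode(path)),
                       ctypes.byref(how), ctypes.c_size_t(ctypes.sizeof(how)))
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return fd
```
For a rootfs containing a symlink named link that points at the absolute path /etc/hosts, openat2(rootfd, "link", os.O_RDONLY, RESOLVE_IN_ROOT) opens rootfs/etc/hosts, while the same call with RESOLVE_BENEATH fails with EXDEV.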

03:57.600 --> 04:01.600
The one thing, though, is that openat2 only solves one problem, which is opening files.

04:01.600 --> 04:05.600
Now, in practice, that's actually sufficient for most programs, because Linux,

04:05.600 --> 04:11.600
and not just Linux, POSIX in general, has these *at system calls like

04:11.600 --> 04:15.600
mkdirat, mknodat, and, you know, linkat and so on and so on,

04:15.600 --> 04:17.600
which let you operate relative to a file descriptor.

04:17.600 --> 04:21.600
And so, as long as programs are written to primarily operate on file descriptors,

04:21.600 --> 04:25.600
not paths, you can actually protect against these issues, basically entirely.

04:25.600 --> 04:30.600
So, here's one of the examples from before, but using openat2 and the fd-based

04:30.600 --> 04:31.600
operations.

04:31.600 --> 04:36.600
So, for open operations, you first open the rootfs.

04:36.600 --> 04:39.600
This is a trusted directory; it's trusted because it's

04:39.600 --> 04:43.600
owned by the host. And then you use RESOLVE_IN_ROOT,

04:43.600 --> 04:45.600
and then you just use regular open flags, like you would with anything else,

04:45.600 --> 04:48.600
and this protects you against these kinds of race attacks.

04:48.600 --> 04:51.600
That's the simple case; this is just one example.

04:51.600 --> 04:53.600
The more complicated case is if you want something like mkdir:

04:53.600 --> 04:57.600
because mkdirat doesn't take resolve flags, what you need to do is open the parent directory,

04:57.600 --> 04:59.600
and then you just create the directory underneath it.

04:59.600 --> 05:02.600
This is safe, because mkdirat does not follow trailing symlinks,

05:02.600 --> 05:05.600
and because there are no slashes in the name, this is safe too.
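
NOTE
A small Python sketch of the mkdir pattern just described. The helper name mkdir_in is hypothetical, and parent_fd is assumed to have been opened safely already (for example with openat2 and RESOLVE_IN_ROOT); the sketch only shows the single-component mkdirat step and that it refuses to create the target of a symlink, even a dangling one.
```python
import os
def mkdir_in(parent_fd, name, mode=0o755):
    """mkdirat() exactly one path component below parent_fd."""
    # Only a single component is allowed, so no further path walking happens.
    if not name or "/" in name or name in (".", ".."):
        raise ValueError(f"not a single path component: {name!r}")
    # mkdirat does not follow a trailing symlink: if name is a symlink
    # (dangling or not) this fails with EEXIST instead of creating its target.
    os.mkdir(name, mode, dir_fd=parent_fd)
```
Rejecting anything containing a slash keeps all of the path resolution in the earlier, resolve-flag-protected open of the parent.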

05:05.600 --> 05:10.600
And you then do this for every single, you know, *at system call

05:10.600 --> 05:12.600
and every VFS operation you're interested in.

05:12.600 --> 05:16.600
For old kernels, which, when I first started working on this,

05:16.600 --> 05:17.600
this was a more pressing issue.

05:17.600 --> 05:21.600
These days, old kernels means, you know, seven-year-old kernels,

05:21.600 --> 05:23.600
which some of you are running, but fewer nowadays.

05:23.600 --> 05:26.600
It should also be noted that, because openat2 uses this struct argument,

05:26.600 --> 05:30.600
some systems block it with seccomp; systemd blocks it with seccomp for its own usage.

05:30.600 --> 05:34.600
And some people block it with seccomp because seccomp currently can't filter

05:35.600 --> 05:39.600
the arguments inside the struct, so effectively, they worry that we might add a backdoor to it.

05:39.600 --> 05:42.600
Like, not a backdoor, but some problem with it in the future that they couldn't block.

05:42.600 --> 05:45.600
Anyway, the point being that there is this old-school way of doing it,

05:45.600 --> 05:49.600
which is, rather than using the kernel's facility for doing these safe lookups,

05:49.600 --> 05:53.600
you can use O_PATH to emulate lookups in userspace.

05:53.600 --> 05:55.600
This is quite finicky and a little bit annoying to get right.

05:55.600 --> 05:57.600
But it is something you can do.
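
NOTE
A deliberately simplified Python sketch of the O_PATH approach: walk one component at a time relative to a directory fd and never follow symlinks. Real implementations (systemd's chase(), Go's os.Root, libpathrs) emulate symlink resolution and handle .. safely; this sketch simply rejects both, so it behaves roughly like RESOLVE_NO_SYMLINKS plus RESOLVE_BENEATH, and it does not detect mount crossings. walk_no_symlinks is a hypothetical name.
```python
import os
import stat
def walk_no_symlinks(root_fd, path, flags=os.O_RDONLY):
    """Resolve path below root_fd one component at a time using O_PATH."""
    parts = [p for p in path.split("/") if p not in ("", ".")]
    if not parts:
        raise ValueError("empty path")
    if ".." in parts:
        raise PermissionError(f"'..' not allowed: {path!r}")
    cur = os.dup(root_fd)
    try:
        for part in parts[:-1]:
            # O_PATH | O_NOFOLLOW yields a handle to a symlink instead of following it.
            nxt = os.open(part, os.O_PATH | os.O_NOFOLLOW, dir_fd=cur)
            if stat.S_ISLNK(os.fstat(nxt).st_mode):
                os.close(nxt)
                raise PermissionError(f"symlink component: {part!r}")
            os.close(cur)
            cur = nxt
        # Final component gets the real flags; O_NOFOLLOW turns a trailing symlink into ELOOP.
        return os.open(parts[-1], flags | os.O_NOFOLLOW, dir_fd=cur)
    finally:
        os.close(cur)
```
For example, walk_no_symlinks(rootfd, "a/b/f") returns a readable fd, while a path whose first component is a symlink to / raises PermissionError before anything outside the root is touched.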

05:57.600 --> 06:00.600
Some projects have done this in the past.

06:00.600 --> 06:04.600
And the key thing is that this is basically swapping out the openat2 bits.

06:04.600 --> 06:07.600
The rest of your program still needs to use file descriptors for things.

06:07.600 --> 06:10.600
And usually you end up having to use this /proc/self/fd thing,

06:10.600 --> 06:13.600
which is this thing called a magic link,

06:13.600 --> 06:14.600
which I'll talk about a little bit later,

06:14.600 --> 06:17.600
but you depend on it for validating that you ended up where you wanted.

06:17.600 --> 06:21.600
Yeah, systemd does this in chase() for its own operations;

06:21.600 --> 06:26.600
Go last year added os.Root, which does this in a more complicated way

06:26.600 --> 06:29.600
and has a couple of limitations that I'm not a huge fan of.

06:29.600 --> 06:31.600
And libpathrs, which is a library I wrote.

06:31.600 --> 06:34.600
So libpathrs is a Rust library I wrote,

06:34.600 --> 06:38.600
which basically provides all of the main filesystem APIs,

06:38.600 --> 06:42.600
but with the operate-within-a-root stuff.

06:42.600 --> 06:47.600
And it's written in Rust, but it has a C FFI interface that you can use from Go and Python.

06:47.600 --> 06:49.600
It has Python bindings.

06:49.600 --> 06:52.600
And now, the work I'm talking about here: last year,

06:52.600 --> 06:54.600
you may have heard we had some security issues in runc.

06:54.600 --> 06:58.600
And so this is work I've been doing for a while,

06:58.600 --> 07:01.600
but due to the security update, we couldn't just require

07:01.600 --> 07:03.600
using Rust to fix the problem.

07:03.600 --> 07:07.600
So I ended up porting the code you needed to Go.

07:07.600 --> 07:09.600
It's kind of a little bit funny, because libpathrs

07:09.600 --> 07:13.600
started as a Go library that I was going to use from runc.

07:13.600 --> 07:15.600
And then I thought, oh, other programs need it, so I rewrote it

07:15.600 --> 07:18.600
in Rust, and then I rewrote it in Go for this pathrs-lite thing.

07:18.600 --> 07:20.600
So it's been a bit of a round trip.

07:20.600 --> 07:24.600
And the nice thing is that this library supports openat2 and O_PATH,

07:24.600 --> 07:26.600
and it uses a bunch of other new kernel features,

07:26.600 --> 07:28.600
but it transparently falls back

07:28.600 --> 07:30.600
if they're not supported.

07:30.600 --> 07:36.600
So this is the C API example, where rather than using

07:36.600 --> 07:39.600
openat2 or whatever, you just use pathrs_open_root to get a root handle.

07:39.600 --> 07:40.600
You can also use...

07:40.600 --> 07:42.600
it's all file descriptors, so basically you can just use a regular file

07:42.600 --> 07:43.600
descriptor if you like.

07:43.600 --> 07:45.600
And then, yeah, pathrs_inroot_whatever.

07:45.600 --> 07:47.600
Yeah, it's pretty, pretty easy to use.

07:47.600 --> 07:49.600
And then in Rust, I'm not going to go through it

07:49.600 --> 07:50.600
in detail because I don't have the time,

07:50.600 --> 07:53.600
but the Rust API uses actual Rust structs,

07:53.600 --> 07:55.600
but the basic API is the exact same idea.

07:55.600 --> 07:58.600
You have a handle, which is wrapped in a structure,

07:58.600 --> 08:01.600
and then you can operate on it individually.

08:01.600 --> 08:03.600
Yeah, if you want more information,

08:03.600 --> 08:05.600
you can look at the docs for libpathrs.

08:05.600 --> 08:06.600
Yeah.

08:06.600 --> 08:08.600
So that was regular path safety.

08:08.600 --> 08:11.600
Strict path safety is more annoying.

08:11.600 --> 08:14.600
So previously, when we were looking at operating on random files,

08:14.600 --> 08:15.600
you don't actually care.

08:15.600 --> 08:18.600
As long as the file you end up operating on is within the rootfs,

08:18.600 --> 08:20.600
you don't really care which file it is.

08:20.600 --> 08:22.600
An attacker can also swap the files around,

08:22.600 --> 08:25.600
since we're assuming they have privileges over the entire directory.

08:25.600 --> 08:28.600
So they could replace the file with whatever contents they wanted anyway.

08:28.600 --> 08:29.600
So it doesn't really matter.

08:29.600 --> 08:31.600
As long as the file is inside the container,

08:31.600 --> 08:32.600
it doesn't really matter to you which file you get,

08:32.600 --> 08:33.600
because you can't trust its contents

08:33.600 --> 08:34.600
anyway,

08:34.600 --> 08:36.600
because the attacker can write to it.

08:36.600 --> 08:40.600
Unfortunately, this is not the case for strict path safety.

08:40.600 --> 08:43.600
So on Linux, a lot of operations

08:43.600 --> 08:46.600
go through procfs, which is a special filesystem.

08:46.600 --> 08:49.600
And so the problem with this is

08:49.600 --> 08:52.600
that if you want to operate on, for instance,

08:52.600 --> 08:55.600
/proc/self/attr/exec, which is how you specify what LSM labels apply to you,

08:55.600 --> 08:57.600
so what AppArmor or SELinux labels apply to you.

08:57.600 --> 09:01.600
If you're writing to that file, you need to be really, really sure you're writing to that exact file.

09:01.600 --> 09:04.600
You don't want to write to some other procfs file, or to some dummy file

09:04.600 --> 09:06.600
that doesn't do anything,

09:06.600 --> 09:09.600
because if you don't write to the right file, then you won't actually get the LSM label applied to you.

09:09.600 --> 09:12.600
And the same thing applies to most procfs files.

09:12.600 --> 09:13.600
So yeah.

09:13.600 --> 09:15.600
procfs is the most critical one.

09:15.600 --> 09:19.600
And the key thing here is that if someone makes a procfs that is fake,

09:19.600 --> 09:21.600
so it's a tmpfs that has fake files in it,

09:21.600 --> 09:23.600
or they mount something on top of procfs from somewhere else,

09:23.600 --> 09:25.600
or they bind-mount a different bit of procfs on top of

09:25.600 --> 09:26.600
the procfs file you care about,

09:26.600 --> 09:28.600
you end up operating on the wrong files.

09:28.600 --> 09:30.600
And it's a big nightmare.

09:30.600 --> 09:33.600
So an example of something which would be strict path unsafety would be:

09:33.600 --> 09:35.600
oh, I just open /proc/self/attr/exec

09:35.600 --> 09:37.600
and I just write to it.

09:37.600 --> 09:38.600
And so on and so on.

09:38.600 --> 09:40.600
So actually, every single one of these examples

09:40.600 --> 09:42.600
could be a security issue if someone has an overmount.

09:43.600 --> 09:46.600
And I'll give some examples a little bit later of how that happens.

09:46.600 --> 09:49.600
Now you might think, surely this is not a real issue,

09:49.600 --> 09:50.600
whatever.

09:50.600 --> 09:52.600
But here are CVEs that we had in runc for this.

09:52.600 --> 09:54.600
There was a 21.

09:54.600 --> 09:57.600
And actually this was what initially inspired me to work on libpathrs

09:57.600 --> 09:58.600
and all this other work.

09:58.600 --> 09:59.600
Because we had this issue and then I realized,

09:59.600 --> 10:02.600
oh, this is actually a much bigger issue than it seemed at first glance.

10:02.600 --> 10:06.600
And then we had the issue from November last year that we also fixed.

10:06.600 --> 10:11.600
So, getting to the core issue here, you would think,

10:11.600 --> 10:13.600
okay, what can I do to fix this problem?

10:13.600 --> 10:15.600
What I'll do is I'll open /proc.

10:15.600 --> 10:20.600
And procfs provides you the guarantee that the root inode of /proc

10:20.600 --> 10:21.600
has a fixed inode number.

10:21.600 --> 10:22.600
It's one.

10:22.600 --> 10:24.600
And you can check with statfs that it is actually procfs.

10:24.600 --> 10:25.600
Okay, fine, that's fine.

10:25.600 --> 10:27.600
I will open /proc.

10:27.600 --> 10:28.600
And I will check this.

10:28.600 --> 10:31.600
And then I'll use openat2 to make sure that I

10:31.600 --> 10:33.600
open self/attr/exec.

10:33.600 --> 10:36.600
And this RESOLVE_NO_XDEV here is protecting you from overmounts.

10:36.600 --> 10:40.600
So RESOLVE_NO_XDEV rejects crossing mount points.

10:40.600 --> 10:44.600
So if you do this for regular files, this is all you need.

10:44.600 --> 10:46.600
So this problem is basically already solved.

10:46.600 --> 10:49.600
Very few programs do this properly, but this is the problem that's already solved.

10:49.600 --> 10:52.600
So if self was mounted over, or attr was mounted over,

10:52.600 --> 10:55.600
or exec was mounted over, you would reject the operation.
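
NOTE
A Python sketch of the procedure just described: open /proc, confirm it is the root of a procfs instance (inode 1, and PROC_SUPER_MAGIC from fstatfs), then call openat2 with RESOLVE_NO_XDEV below it. Assumptions: 64-bit Linux with glibc (so struct statfs begins with a long f_type), syscall number 437 for openat2, and hypothetical helper names. As the talk goes on to explain, this is still not enough once magic links are involved.
```python
import ctypes
import os
SYS_openat2 = 437
RESOLVE_NO_XDEV = 0x01
PROC_SUPER_MAGIC = 0x9FA0
PROC_ROOT_INO = 1
class OpenHow(ctypes.Structure):
    _fields_ = [("flags", ctypes.c_uint64), ("mode", ctypes.c_uint64),
                ("resolve", ctypes.c_uint64)]
_libc = ctypes.CDLL(None, use_errno=True)
_libc.syscall.restype = ctypes.c_long
def _raise_errno(what):
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err), what)
def _fs_type(fd):
    # Oversized buffer; only the first field (f_type, a long on 64-bit glibc) is read.
    buf = (ctypes.c_long * 32)()
    if _libc.fstatfs(ctypes.c_int(fd), buf) != 0:
        _raise_errno("fstatfs")
    return buf[0]
def open_proc_file(subpath, flags=os.O_RDONLY):
    procfd = os.open("/proc", os.O_PATH | os.O_DIRECTORY | os.O_CLOEXEC)
    try:
        # Reject a fake /proc (e.g. a tmpfs) or a procfs subdirectory mounted onto /proc.
        if os.fstat(procfd).st_ino != PROC_ROOT_INO or _fs_type(procfd) != PROC_SUPER_MAGIC:
            raise OSError("/proc is not the root of a procfs instance")
        # RESOLVE_NO_XDEV: fail if any component of subpath crosses a mount point.
        how = OpenHow(flags=flags | os.O_CLOEXEC, mode=0, resolve=RESOLVE_NO_XDEV)
        fd = _libc.syscall(ctypes.c_long(SYS_openat2), ctypes.c_long(procfd),
                           ctypes.c_char_p(os.fsencode(subpath)),
                           ctypes.byref(how), ctypes.c_size_t(ctypes.sizeof(how)))
        if fd < 0:
            _raise_errno(subpath)
        return fd
    finally:
        os.close(procfd)
```
For example, open_proc_file("self/attr/exec", os.O_WRONLY) would be the more careful way to reach the LSM label file discussed above; actually writing to it is intentionally left out.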

10:55.600 --> 10:57.600
Unfortunately, this is not all you need.

10:57.600 --> 10:59.600
You have these things called magic links.

10:59.600 --> 11:01.600
So magic links look like symlinks.

11:01.600 --> 11:03.600
And if you have a laptop in front of you, you can

11:03.600 --> 11:04.600
look at /proc/self/exe.

11:04.600 --> 11:06.600
And it shows you a link to the binary.

11:06.600 --> 11:08.600
But these are actually not symlinks.

11:08.600 --> 11:10.600
Symlinks are dealt with lexically.

11:10.600 --> 11:13.600
So in other words, you can imagine, it's as if you take the link,

11:13.600 --> 11:16.600
and you replace it with its contents, and then you continue the lookup.

11:16.600 --> 11:18.600
That's not how magic links work.

11:18.600 --> 11:21.600
Magic links are a kernel concept, and basically all of them are in procfs.

11:21.600 --> 11:24.600
When you do the open, rather than the link being resolved lexically,

11:24.600 --> 11:27.600
the kernel knows what kernel object this link refers to.

11:27.600 --> 11:32.600
And it will take you straight to the underlying object without going through regular path lookup.
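
NOTE
A quick Python illustration (Linux only; the function names are hypothetical) of why magic links are not lexical: after the file's name is unlinked, readlink on /proc/self/fd/N shows a path that no longer exists, yet opening the magic link still reaches the same open file.
```python
import os
import tempfile
def reopen(fd, flags=os.O_RDONLY):
    # Opening /proc/self/fd/N jumps to the file object behind fd,
    # rather than re-resolving the text that readlink() shows.
    return os.open(f"/proc/self/fd/{fd}", flags)
def demo():
    d = tempfile.mkdtemp()
    path = os.path.join(d, "file")
    with open(path, "wb") as f:
        f.write(b"still here\n")
    fd = os.open(path, os.O_RDONLY)
    os.unlink(path)  # a lexical lookup of the old name would now fail with ENOENT
    shown = os.readlink(f"/proc/self/fd/{fd}")
    newfd = reopen(fd)
    try:
        return shown, os.read(newfd, 100)
    finally:
        os.close(newfd)
        os.close(fd)
```
demo() returns the readlink text (ending in " (deleted)") together with the original contents; the same jump also ignores mount and namespace boundaries.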

11:32.600 --> 11:37.600
And this is very useful, because it allows you to do quite useful

11:37.600 --> 11:40.600
security hardening for various things.

11:40.600 --> 11:43.600
But on the other hand, it also lets you do container breakouts and stuff like that,

11:43.600 --> 11:46.600
because this stuff does not care about namespaces.

11:46.600 --> 11:49.600
So you can open anything with a magic link.

11:49.600 --> 11:53.600
So to give an example, I want to reopen a file

11:53.600 --> 11:54.600
which my process already has open.

11:54.600 --> 11:58.600
So I want to open /proc/thread-self/fd/123.

11:58.600 --> 12:00.600
And you would think, oh, this is totally fine.

12:00.600 --> 12:03.600
I can do it because this is the exact same example as the regular file example before.

12:03.600 --> 12:05.600
But unfortunately, RESOLVE_NO_XDEV will block this,

12:05.600 --> 12:08.600
because when you try to open the magic link,

12:08.600 --> 12:11.600
the target is most likely on a different mount,

12:11.600 --> 12:15.600
so, because you go from procfs to some other mount,

12:15.600 --> 12:18.600
RESOLVE_NO_XDEV blocks it, which is doing exactly what you told it to do.

12:18.600 --> 12:21.600
So what you might then say is, oh, okay, right,

12:21.600 --> 12:24.600
I will open the fd directory with RESOLVE_NO_XDEV,

12:24.600 --> 12:27.600
and then I will validate there's nothing mounted on top of the symlink,

12:27.600 --> 12:30.600
because, if you didn't know this, you can actually mount on top of symlinks on Linux,

12:30.600 --> 12:33.600
which just makes my life very fun.

12:33.600 --> 12:36.600
And so you're like, oh, okay, that's fine.

12:36.600 --> 12:38.600
I'll just validate there are no overmounts,

12:38.600 --> 12:39.600
I'll just open the file.

12:39.600 --> 12:42.600
Unfortunately, validating that there are no overmounts is actually unsafe,

12:42.600 --> 12:44.600
because if you are dealing with an attacker,

12:44.600 --> 12:46.600
they can change the mount table from underneath you,

12:46.600 --> 12:48.600
after you've checked that there's nothing there,

12:48.600 --> 12:51.600
and, well, you're back to classic time-of-check-to-time-of-use attacks.

12:51.600 --> 12:53.600
Now, I will say, by the way, for this example,

12:53.600 --> 12:56.600
since Linux 6.12, Christian and I managed to get this blocked,

12:56.600 --> 12:59.600
so you actually can't do this overmount anymore,

12:59.600 --> 13:02.600
for this particular case, but there are other cases where you can do it.

13:03.600 --> 13:05.600
And besides, you still need to solve this for old kernels anyway.

13:05.600 --> 13:10.600
So, the solution comes in the form of the new mount API,

13:10.600 --> 13:13.600
which was merged seven, eight years ago now.

13:13.600 --> 13:16.600
And, I will say, it has man pages now.

13:16.600 --> 13:18.600
For those of you who have dealt with the new mount API,

13:18.600 --> 13:20.600
you had to read the kernel sources

13:20.600 --> 13:22.600
to figure out what the API looks like.

13:22.600 --> 13:24.600
Last year, I wrote man pages for it, so finally,

13:24.600 --> 13:26.600
you can actually read the documentation for how to use this thing.

13:26.600 --> 13:28.600
So hopefully, more people will get to use it.

13:28.600 --> 13:30.600
But the key thing about this is that the new mount API

13:30.600 --> 13:34.600
allows you to create filesystems without having to mount them onto a directory.

13:34.600 --> 13:37.600
And because it's a file descriptor, nothing else can mess with it.

13:37.600 --> 13:40.600
And actually, there are some nice protections that protect other people from

13:40.600 --> 13:42.600
propagating mounts into it, and so on and so on.

13:42.600 --> 13:45.600
So once you have one of these, once you create a procfs handle this way,

13:45.600 --> 13:48.600
nothing else can interfere with it.

13:48.600 --> 13:52.600
So, for the example before, we swap out the first bit, where we open /proc,

13:52.600 --> 13:55.600
so that you create a new procfs instance.

13:55.600 --> 13:58.600
This is the simple example; in the more complicated example

13:58.600 --> 14:01.600
you need to have a fallback for certain cases,

14:01.600 --> 14:05.600
but this is the basic example, where you create a procfs instance,

14:05.600 --> 14:08.600
which is a file descriptor to a directory like anything else.

14:08.600 --> 14:10.600
But the nice thing is that it can't be overmounted.

14:10.600 --> 14:13.600
And now this validate-no-overmounts stuff is actually totally safe,

14:13.600 --> 14:16.600
because nothing can overmount it other than your own process.
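
NOTE
A Python sketch, via ctypes, of creating a private procfs instance with the new mount API (fsopen, fsconfig with FSCONFIG_CMD_CREATE, then fsmount). Assumptions: Linux 5.2 or newer, syscall numbers 430 to 432 (the same on x86-64 and arm64), and a caller that is allowed to mount procfs, which normally means CAP_SYS_ADMIN over its pid and mount namespaces; otherwise you get EPERM. private_procfs is a hypothetical name, and the fallback for older kernels is omitted.
```python
import ctypes
import os
SYS_fsopen, SYS_fsconfig, SYS_fsmount = 430, 431, 432
FSOPEN_CLOEXEC = 0x1
FSCONFIG_CMD_CREATE = 6
FSMOUNT_CLOEXEC = 0x1
_libc = ctypes.CDLL(None, use_errno=True)
_libc.syscall.restype = ctypes.c_long
def _check(ret, what):
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), what)
    return ret
def private_procfs():
    """Return a detached mount fd for a fresh procfs instance that other processes cannot overmount."""
    fsfd = _check(_libc.syscall(ctypes.c_long(SYS_fsopen), ctypes.c_char_p(b"proc"),
                                ctypes.c_long(FSOPEN_CLOEXEC)), "fsopen")
    try:
        # Create the superblock; no mount options are set in this minimal sketch.
        _check(_libc.syscall(ctypes.c_long(SYS_fsconfig), ctypes.c_long(fsfd),
                             ctypes.c_long(FSCONFIG_CMD_CREATE), None, None,
                             ctypes.c_long(0)), "fsconfig")
        # fsmount returns a file descriptor for the new mount, not attached anywhere.
        return _check(_libc.syscall(ctypes.c_long(SYS_fsmount), ctypes.c_long(fsfd),
                                    ctypes.c_long(FSMOUNT_CLOEXEC), ctypes.c_long(0)),
                      "fsmount")
    finally:
        os.close(fsfd)
```
The returned fd works as a dirfd, for instance os.open("self/status", os.O_RDONLY, dir_fd=mfd); since the instance is not attached to any directory, the earlier no-overmount check becomes reliable.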

14:16.600 --> 14:19.600
And so, this is the way you would do it.

14:19.600 --> 14:22.600
And there are other things you have to do that are also kind of annoying,

14:22.600 --> 14:25.600
but I made an API for this in libpathrs, so in libpathrs,

14:25.600 --> 14:27.600
if you just say I want to open this procfs file,

14:27.600 --> 14:29.600
it will do all this awful stuff for you,

14:29.600 --> 14:32.600
and you just get to open procfs files the way you like.

14:32.600 --> 14:34.600
And it works on new and old kernels,

14:34.600 --> 14:36.600
and it works with additional protections for a bunch of other stuff

14:36.600 --> 14:38.600
that I don't have time to get into.

14:38.600 --> 14:41.600
And yeah, it also has a Rust API, which again,

14:41.600 --> 14:44.600
I'm not going to go into, but yeah, it's a really similar idea.

14:44.600 --> 14:46.600
The one thing to notice is that,

14:46.600 --> 14:49.600
yeah, you can make a procfs handle that you can cache

14:49.600 --> 14:52.600
for reuse, and so on and so on.

14:52.600 --> 14:54.600
Well, yes, so now some of you are probably thinking,

14:54.600 --> 14:56.600
and I've been asked this every time I give this talk:

14:56.600 --> 14:57.600
like, hang on.

14:57.600 --> 14:58.600
Why do I care about any of this, right?

14:58.600 --> 15:00.600
Like only root can mount, like why on earth,

15:00.600 --> 15:02.600
would you care about someone mounting on top of /proc?

15:02.600 --> 15:04.600
Well, unfortunately we're dealing with runc here.

15:04.600 --> 15:07.600
And one of the core problems we have with runc is that runc

15:07.600 --> 15:09.600
actually does not currently have a threat model.

15:09.600 --> 15:12.600
So, one of the problems we run into is that Kubernetes,

15:12.600 --> 15:15.600
and Docker, and containerd, and Podman, etc., etc.,

15:15.600 --> 15:18.600
they all like to provide users with lots of flexibility

15:18.600 --> 15:19.600
in what they can configure.

15:19.600 --> 15:21.600
And because we don't have a fixed threat model,

15:21.600 --> 15:23.600
sometimes they allow users to configure things

15:23.600 --> 15:24.600
you wish they wouldn't configure.

15:24.600 --> 15:25.600
And that includes mounts,

15:25.600 --> 15:27.600
because if you do a volume configuration for a container,

15:27.600 --> 15:28.600
you've just configured a mount.

15:28.600 --> 15:31.600
And we have to provide volumes to people.

15:31.600 --> 15:34.600
So, this means that mounts are very much in the picture

15:34.600 --> 15:36.600
of things that attackers control.

15:36.600 --> 15:40.600
And also, some people allow, you know,

15:40.600 --> 15:42.600
untrusted users to be able to run containers

15:42.600 --> 15:44.600
on their systems, because they assume that the container

15:44.600 --> 15:45.600
runtime will block malicious things,

15:45.600 --> 15:47.600
so we have to protect against these things.

15:47.600 --> 15:48.600
This is something I want to work on,

15:48.600 --> 15:50.600
and I've been speaking with some runc maintainers about doing

15:50.600 --> 15:51.600
a threat model.

15:51.600 --> 15:53.600
But at the moment there's no proper threat model,

15:53.600 --> 15:55.600
and every vulnerability we have to deal with

15:55.600 --> 15:56.600
on a case by case basis.

15:56.600 --> 15:58.600
And the key thing to notice is that,

15:58.600 --> 16:00.600
because of this, of the vulnerabilities in the past

16:00.600 --> 16:02.600
decade in runc, most of them have been

16:02.600 --> 16:04.600
basically misconfiguration bugs.

16:04.600 --> 16:06.600
But the problem is that a regular user

16:06.600 --> 16:10.600
on a system using Docker, or containerd, or Podman,

16:10.600 --> 16:12.600
or Kubernetes could misconfigure it that way.

16:12.600 --> 16:14.600
So, even though it was, strictly speaking, a

16:14.600 --> 16:17.600
misconfiguration, we have to treat it as a security bug.

16:17.600 --> 16:19.600
And one thing that really bugs me

16:19.600 --> 16:21.600
is that people still don't use user namespaces.

16:21.600 --> 16:23.600
However, Kubernetes, I want to thank Rodrigo.

16:23.600 --> 16:25.600
Kubernetes, in the next release of Kubernetes,

16:25.600 --> 16:27.600
will have user namespaces as a fully supported

16:27.600 --> 16:28.600
thing for containers.

16:28.600 --> 16:30.600
So, hopefully soon, we will live in a world

16:30.600 --> 16:32.600
where everyone uses user namespaces,

16:32.600 --> 16:34.600
because it's crazy not to use user namespaces

16:34.600 --> 16:36.600
for containers; it's completely insecure.

16:36.600 --> 16:38.600
Anyway, I'm going to do a very quick pop quiz.

16:38.600 --> 16:39.600
This will have to be a lightning round

16:39.600 --> 16:40.600
because we're running out of time.

16:40.600 --> 16:42.600
So, I want you to put your hands up.

16:42.600 --> 16:43.600
Okay, how about this?

16:43.600 --> 16:45.600
Put your hands up if you think the one

16:45.600 --> 16:48.600
I'm about to show you is or is not a vulnerability.

16:48.600 --> 16:49.600
No: if it is a vulnerability.

16:49.600 --> 16:51.600
So, is this a vulnerability?

16:51.600 --> 16:54.600
Where I disable all the namespaces for a container?

16:54.600 --> 16:55.600
This is the easy one.

16:55.600 --> 16:56.600
The answer is no.

16:56.600 --> 16:57.600
Because if you disable all security options,

16:57.600 --> 16:58.600
obviously it's not a vulnerability.

16:58.600 --> 17:02.600
Okay, I bind-mount / from my host filesystem

17:02.600 --> 17:03.600
into the container.

17:03.600 --> 17:04.600
Is this a vulnerability?

17:04.600 --> 17:06.600
Yeah, some people say yes?

17:06.600 --> 17:08.600
Okay, so the thing is that if Docker allows you,

17:08.600 --> 17:11.600
so the problem is that for some people,

17:11.600 --> 17:12.600
I mean, this is a bad idea.

17:12.600 --> 17:14.600
I'm not saying this is a good thing to do, right?

17:14.600 --> 17:17.600
But the problem is that this is a management decision

17:17.600 --> 17:19.600
that Docker or Kubernetes makes about what you can do.

17:19.600 --> 17:21.600
From runc's perspective, this is just a misconfiguration.

17:21.600 --> 17:23.600
Like, you can specify any mount you like.

17:23.600 --> 17:25.600
Nothing tells runc what permissions you have,

17:25.600 --> 17:27.600
what permissions you have in Docker,

17:27.600 --> 17:28.600
or what permissions you have in Kubernetes.

17:28.600 --> 17:31.600
So, from runc's perspective, this is not a vulnerability.

17:31.600 --> 17:32.600
Okay, how about this?

17:32.600 --> 17:34.600
I have a bind mount to some sub-director.

17:34.600 --> 17:36.600
This is a vulnerability.

17:36.600 --> 17:38.600
It is a vulnerability.

17:38.600 --> 17:41.600
And in fact, this is the two CDs we had in November.

17:41.600 --> 17:43.600
What I'm not telling you is that,

17:43.600 --> 17:45.600
so, just to make sure we're clear on what's happening here,

17:45.600 --> 17:47.600
I'm bind mounting some directory

17:47.600 --> 17:49.600
into some path, /volume.

17:49.600 --> 17:50.600
How can that be a vulnerability, you ask?

17:50.600 --> 17:51.600
Well, the problem is that the volume is actually

17:51.600 --> 17:53.600
a symlink to /dev.

17:53.600 --> 17:55.600
And there, you actually have two processes.

17:55.600 --> 17:57.600
You have one process that is starting this container.

17:57.600 --> 18:00.600
And then another process is now creating fun files

18:00.600 --> 18:02.600
in some volume path.

18:02.600 --> 18:04.600
For instance, you make /dev/null,

18:04.600 --> 18:06.600
or /dev/pts/0, a symlink

18:06.600 --> 18:09.600
to /proc/sys/kernel/core_pattern.

18:09.600 --> 18:11.600
And what happens is that in runc,

18:11.600 --> 18:13.600
we have to mask certain things in /proc

18:13.600 --> 18:14.600
that aren't namespaced.

18:14.600 --> 18:16.600
So, you need to mask them, and we mask them.

18:16.600 --> 18:19.600
So, in the first example, we mask them with /dev/null.

18:19.600 --> 18:21.600
So, we bind mount /dev/null on top of the thing.

18:21.600 --> 18:23.600
But we don't use /dev/null from the host,

18:23.600 --> 18:24.600
because there are security issues

18:24.600 --> 18:26.600
with using files taken from the host.

18:26.600 --> 18:27.600
So, we use /dev/null in the container.

18:27.600 --> 18:28.600
We've done that in the container.

18:28.600 --> 18:30.600
But it's a symlink to /proc/sys/kernel/core_pattern.

18:30.600 --> 18:33.600
So I've now bind mounted /proc/sys/kernel/core_pattern on top of

18:33.600 --> 18:34.600
whatever files you like.

18:34.600 --> 18:36.600
And /proc/sys/kernel/core_pattern lets you break out of containers

18:36.600 --> 18:38.600
because what you can do is you can specify

18:38.600 --> 18:41.600
a core dump helper, which is a binary

18:41.600 --> 18:43.600
that the kernel will then spawn

18:43.600 --> 18:46.600
with full privileges on the host, completely un-namespaced.
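
NOTE A minimal Python sketch of the rule described in the core(5) man page: if core_pattern starts with "|", the rest of the line is the command line of a helper program that the kernel runs for each core dump. The name core_pattern_helper is hypothetical, and the function only parses a string; it never reads or writes /proc.
```python
from typing import Optional
def core_pattern_helper(pattern: str) -> Optional[str]:
    """Return the helper program named by a piped core_pattern, or None."""
    pattern = pattern.strip()
    if not pattern.startswith("|"):
        return None  # a plain file name template such as "core" or "core.%p"
    argv = pattern[1:].split()  # the rest of the line is the helper's command line
    return argv[0] if argv else None
print(core_pattern_helper("|/usr/lib/systemd/systemd-coredump %P %u %g"))
# prints /usr/lib/systemd/systemd-coredump
```
Whoever can write core_pattern picks that helper, which is why getting it bind mounted somewhere writable inside a container is enough to escape.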

18:46.600 --> 18:49.600
And so, a similar problem applies to /dev/console

18:49.600 --> 18:50.600
and /dev/pts/0.

18:50.600 --> 18:52.600
It's basically the exact same problem.

18:52.600 --> 18:53.600
So, that's fun.

18:53.600 --> 18:55.600
We did fix this, obviously.

18:55.600 --> 18:57.600
So, one of the things we did was we added

18:57.600 --> 18:59.600
much stronger validation of the inodes we use.

18:59.600 --> 19:01.600
So, we now validate that /dev/null is actually /dev/null

19:01.600 --> 19:03.600
before we use it to mask things.
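
NOTE A minimal Python sketch of such a check, assuming it comes down to "open without following symlinks, then confirm the inode is character device 1:3", which is what /dev/null is on Linux. runc's real code is Go and may check more; open_verified_dev_null is a hypothetical name.
```python
import os
import stat
def open_verified_dev_null(path: str = "/dev/null") -> int:
    """Open `path` without following symlinks and check it is the real /dev/null.
    Returns an O_PATH fd that later steps can use instead of the path."""
    fd = os.open(path, os.O_PATH | os.O_NOFOLLOW | os.O_CLOEXEC)
    st = os.fstat(fd)
    # On Linux the real /dev/null is a character device, major 1, minor 3.
    if not (stat.S_ISCHR(st.st_mode)
            and os.major(st.st_rdev) == 1
            and os.minor(st.st_rdev) == 3):
        os.close(fd)
        raise PermissionError(f"{path!r} is not /dev/null")
    return fd
```
A /dev/null that was swapped for a symlink to /proc/sys/kernel/core_pattern is opened as the symlink itself (O_PATH with O_NOFOLLOW) and rejected.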

19:03.600 --> 19:05.600
And it uses file descriptors to do the mounts.

19:05.600 --> 19:08.600
So, it's safe against time-of-check to time-of-use attacks.
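
NOTE A Python sketch of the file-descriptor approach as we understand it: resolve the path once into an O_PATH fd, then aim the privileged operation at /proc/self/fd/N (runc has used such paths as mount targets) instead of resolving the path again. The mount itself is omitted because it needs privileges; pin_path is a hypothetical helper.
```python
import os
from typing import Tuple
def pin_path(path: str) -> Tuple[int, str]:
    """Resolve `path` once; return (fd, "/proc/self/fd/N" alias for that fd).
    Operations through the alias reach the pinned inode, even if the original
    path is renamed or replaced with a symlink in the meantime."""
    fd = os.open(path, os.O_PATH | os.O_NOFOLLOW | os.O_CLOEXEC)
    return fd, f"/proc/self/fd/{fd}"
```
Renaming the original path, or putting a symlink in its place afterwards, does not change what the alias refers to.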

19:08.600 --> 19:12.600
And basically all the mount point

19:12.600 --> 19:13.600
creation code in runc

19:13.600 --> 19:15.600
was moved over to libpathrs, or rather

19:15.600 --> 19:17.600
pathrs-lite, which is the Go version.

19:17.600 --> 19:21.600
And now, most of the, almost all of the path stuff

19:21.600 --> 19:23.600
in runc is now file-descriptor-based,

19:23.600 --> 19:26.600
which fixes this and a bunch of other possible issues.

19:26.600 --> 19:28.600
And yeah, it turns out, well,

19:28.600 --> 19:30.600
maybe you should care about regular path safety.

19:30.600 --> 19:32.600
And for consoles, there's this thing that I worked

19:32.600 --> 19:34.600
on many years ago called TIOCGPTPEER.

19:34.600 --> 19:37.600
If you use it, it actually makes things a lot safer

19:37.600 --> 19:39.600
if you're making consoles.
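
NOTE A short Python sketch of TIOCGPTPEER (Linux 4.13+, documented in ioctl_tty(2)): the peer end of a pseudo-terminal is obtained from the master fd itself, rather than by opening a /dev/pts/N path that could be swapped. The value 0x5441 is the generic Linux number (x86-64, arm64) and is an assumption on other architectures; open_pty_peer is a hypothetical name.
```python
import fcntl
import os
TIOCGPTPEER = 0x5441  # _IO('T', 0x41); assumption: generic Linux ioctl numbering
def open_pty_peer(master_fd: int) -> int:
    """Return a new fd for the peer (slave) end of the pty behind master_fd."""
    # With an integer argument, fcntl.ioctl returns the ioctl's return value,
    # which for TIOCGPTPEER is the newly opened file descriptor.
    return fcntl.ioctl(master_fd, TIOCGPTPEER, os.O_RDWR | os.O_NOCTTY)
```
For example, with master, slave = os.openpty(), the fd returned by open_pty_peer(master) refers to the same pts device as slave, and no path under /dev/pts is looked up along the way.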

19:39.600 --> 19:41.600
So, next one: is it a vulnerability?

19:41.600 --> 19:44.600
So, what I'm doing is I'm mounting some fake file

19:44.600 --> 19:45.600
system over /proc.

19:45.600 --> 19:47.600
Is this a vulnerability?

19:47.600 --> 19:49.600
It is not a vulnerability.

19:49.600 --> 19:53.600
And, well, it's not a vulnerability

19:53.600 --> 19:55.600
because, well, in this case now that we've done this,

19:55.600 --> 19:56.600
we actually blocked this.

19:56.600 --> 20:00.600
But, well, it used to be, you know.

20:00.600 --> 20:03.600
I would say if you had asked me two years ago

20:03.600 --> 20:04.600
whether I would consider this a vulnerability,

20:04.600 --> 20:05.600
I would say no.

20:05.600 --> 20:08.600
I would say that Docker should block you from doing this.

20:08.600 --> 20:10.600
In runc, we have code to block

20:10.600 --> 20:12.600
mounting on /proc, like here.

20:12.600 --> 20:14.600
It's, okay, maybe half yes, half no.

20:14.600 --> 20:16.600
If you want to be technical about it.

20:16.600 --> 20:19.600
And you would think, okay, is this a vulnerability?

20:19.600 --> 20:21.600
So, this is the exec one.

20:21.600 --> 20:24.600
So, I bind mount /proc/1/sched over /proc/self/attr/exec.

20:24.600 --> 20:25.600
I'm going to exec.

20:25.600 --> 20:27.600
Okay, now, for a bit of background,

20:27.600 --> 20:31.600
/proc/$pid/sched, for any pid, is a file

20:31.600 --> 20:34.600
which just shows you random information about a process.

20:34.600 --> 20:36.600
But you can write to it, and it does nothing.

20:36.600 --> 20:39.600
So, if you bind mount this file on top of /proc/self/attr/exec,

20:39.600 --> 20:41.600
it's a procfs file, and actually in 2019,

20:41.600 --> 20:43.600
the fix we had for this, I should say.

20:43.600 --> 20:45.600
In 2019, when we had this security issue

20:45.600 --> 20:46.600
the last time,

20:46.600 --> 20:48.600
the preliminary fix we had, because I knew

20:48.600 --> 20:49.600
that we didn't have this infrastructure yet.

20:49.600 --> 20:51.600
The preliminary fix was: oh, I open the file.

20:51.600 --> 20:53.600
Is it a procfs file?

20:53.600 --> 20:56.600
If it is a procfs file, then I will write to it.

20:56.600 --> 20:59.600
The problem with this is that /proc/1/sched is a procfs file.

20:59.600 --> 21:00.600
It's just not the one you wanted, right?

21:00.600 --> 21:01.600
And so, you end up writing the label to it.

21:01.600 --> 21:04.600
It does nothing, and now the container starts with no AppArmor profile.
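
NOTE A Python sketch of an "is this a procfs file?" test like the preliminary 2019 fix described above; it is not runc's actual code. It calls glibc's fstatfs through ctypes and assumes a 64-bit Linux where f_type is the leading C long of struct statfs. It shows the gap: /proc/self/stat passes just as well as the file you meant to open.
```python
import ctypes
import os
PROC_SUPER_MAGIC = 0x9FA0  # value from linux/magic.h
_libc = ctypes.CDLL(None, use_errno=True)
def is_procfs(fd: int) -> bool:
    """True if fd lives on a procfs mount; says nothing about WHICH procfs file it is."""
    buf = ctypes.create_string_buffer(256)  # larger than struct statfs on 64-bit Linux
    if _libc.fstatfs(fd, buf) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # Assumption: f_type is the first field and a C long (x86-64 and arm64 glibc).
    return ctypes.c_long.from_buffer(buf).value == PROC_SUPER_MAGIC
```
Any procfs file bind mounted over the intended target passes this check, which is why the fix moved procfs writes to an approach that verifies the exact file being written.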

21:04.600 --> 21:07.600
However, this is not a vulnerability,

21:07.600 --> 21:09.600
because runc blocks this,

21:09.600 --> 21:12.600
because we have a restriction on what you can mount on top of in /proc.

21:12.600 --> 21:15.600
For LXC reasons, there are some /proc things you can mount on top of.

21:15.600 --> 21:18.600
But you can't mount on top of anything you like.

21:18.600 --> 21:22.600
However, let's look at a slightly more complicated example.

21:22.600 --> 21:24.600
The key thing here is that you have two bind mounts,

21:24.600 --> 21:25.600
where you have a bind mount of /foo,

21:25.600 --> 21:27.600
and you have a bind mount on top of /foo,

21:27.600 --> 21:29.600
with a very suspicious looking path name.

21:29.600 --> 21:31.600
[inaudible], right?

21:31.600 --> 21:33.600
Does anyone think this is a vulnerability?

21:33.600 --> 21:35.600
Yes, this is another vulnerability.

21:35.600 --> 21:39.600
Basically, what's happening here is that you have some cache directory,

21:39.600 --> 21:41.600
which is shared between two containers.

21:41.600 --> 21:42.600
It's the exact same attack as before.

21:42.600 --> 21:43.600
Oh, yeah, he thinks this is a vulnerability.

21:43.600 --> 21:44.600
Oh, no.

21:44.600 --> 21:46.600
Five minutes, okay.

21:46.600 --> 21:48.600
Okay, okay.

21:48.600 --> 21:50.600
Yeah, so anyway, this is a vulnerability,

21:50.600 --> 21:54.600
because you can have this shared cache,

21:54.600 --> 21:56.600
where one of the things is a fun symlink to another target,

21:56.600 --> 21:58.600
and you can make it point to /proc/$pid/sched.

21:58.600 --> 22:01.600
The best part is you can make this symlink point to /proc/sysrq-trigger,

22:01.600 --> 22:03.600
and it turns out that the exec label, docker-default,

22:03.600 --> 22:06.600
has a lot of fun characters to put into sysrq-trigger,

22:06.600 --> 22:09.600
including ones that basically crash the machine if you do this.

22:09.600 --> 22:12.600
So yeah, basically the fact that you can swap between

22:12.600 --> 22:14.600
the symlink and /proc is going to allow you

22:14.600 --> 22:17.600
to bypass this check, and as before,

22:17.600 --> 22:19.600
core_pattern allows you to escape from the container.

22:19.600 --> 22:22.600
So the solution we came up with was doing a [inaudible] of the mounts.

22:22.600 --> 22:24.600
Again, we should have switched to the new mount API earlier.

22:24.600 --> 22:26.600
It's something I'm working on, but it's not done yet.

22:26.600 --> 22:30.600
And yeah, libpathrs: basically, for all procfs writes in pathrs,

22:30.600 --> 22:33.600
we now use, sorry, all procfs writes in runc.

22:33.600 --> 22:37.600
We now use pathrs for them, because it protects against these attacks completely.

22:37.600 --> 22:41.600
And also, we went through every write path in runc to make sure

22:41.600 --> 22:44.600
that there was nowhere you could misdirect a write, and there wasn't.

22:44.600 --> 22:47.600
Okay, and in runc, the things we need to do: we need to,

22:47.600 --> 22:51.600
we really need to move our mount infrastructure to using the new mount API.

22:51.600 --> 22:53.600
It's really quite insane that we don't use it yet.

22:53.600 --> 22:56.600
And yeah, that's something I want to do.

22:56.600 --> 23:01.600
And also, for libpathrs in runc, I want to use the Rust thing.

23:01.600 --> 23:04.600
Today, you can actually build runc using libpathrs,

23:04.600 --> 23:06.600
because pathrs-lite has a libpathrs back end.

23:06.600 --> 23:09.600
But I want to get rid of this, like, intermediate Go thing.

23:09.600 --> 23:10.600
I just want to use the Rust thing.

23:10.600 --> 23:13.600
That way runc can use it, and other programs can use it.

23:13.600 --> 23:16.600
And it's much more robust than the Go version, because the Go version is kind of

23:16.600 --> 23:18.600
meant to be a minimal thing.

23:18.600 --> 23:20.600
The kernel stuff I can skip over, because it doesn't matter.

23:20.600 --> 23:23.600
There's some kernel stuff I want to do, but it's not too important.

23:23.600 --> 23:27.600
And the one thing I want to say to everyone in the room is, if you are writing any kind of program that

23:27.600 --> 23:31.600
has to deal with paths, which is probably any kind of program, you should, at least,

23:31.600 --> 23:34.600
use openat2 or libpathrs for regular path operations.
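
NOTE A minimal Python sketch of openat2(2) with RESOLVE_IN_ROOT (Linux 5.6+), called through ctypes because Python's os module has no openat2 wrapper. It assumes syscall number 437 (shared by x86-64 and arm64) and struct open_how as laid out in linux/openat2.h; open_in_root is a hypothetical name and this is not libpathrs.
```python
import ctypes
import os
SYS_openat2 = 437       # assumption: x86-64/arm64 syscall number
RESOLVE_IN_ROOT = 0x10  # from linux/openat2.h
class OpenHow(ctypes.Structure):
    # struct open_how { __u64 flags; __u64 mode; __u64 resolve; };
    _fields_ = [("flags", ctypes.c_uint64),
                ("mode", ctypes.c_uint64),
                ("resolve", ctypes.c_uint64)]
_libc = ctypes.CDLL(None, use_errno=True)
_libc.syscall.restype = ctypes.c_long
def open_in_root(root_fd: int, path: str, flags: int = os.O_RDONLY | os.O_CLOEXEC) -> int:
    """Open `path` as if root_fd were "/": absolute symlinks and ".." stay inside it."""
    how = OpenHow(flags=flags, mode=0, resolve=RESOLVE_IN_ROOT)
    fd = _libc.syscall(ctypes.c_long(SYS_openat2), ctypes.c_long(root_fd),
                       ctypes.c_char_p(os.fsencode(path)), ctypes.byref(how),
                       ctypes.c_size_t(ctypes.sizeof(how)))
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return fd
```
Without RESOLVE_IN_ROOT, looking up link/hello where link points to / would leave the root directory and land on the host's real /hello.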

23:34.600 --> 23:38.600
And if you're writing new programs, you should definitely write them in

23:38.600 --> 23:42.600
a, you know, file-descriptor-based way, because playing around with paths is just

23:42.600 --> 23:44.600
asking for trouble.

23:44.600 --> 23:48.600
And if you need to touch anything in /proc, you really, really, really, really need to use

23:48.600 --> 23:52.600
libpathrs for the procfs stuff, or implement it yourself if you like, but you really have to do that.

23:52.600 --> 23:56.600
And yeah, the key thing to keep in mind is that any syscall that takes a path name is potentially dangerous,

23:56.600 --> 23:59.600
especially if you're dealing with untrusted directories.
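
NOTE Where openat2 is not available, one fallback is to walk a path one component at a time relative to a directory fd. A minimal Python sketch, assuming a deliberately strict policy of rejecting every symlink and every ".." (unlike RESOLVE_IN_ROOT, which resolves them inside the root); it does not stop the walk from crossing into other mounts, and open_no_symlinks is a hypothetical name.
```python
import os
import stat
def open_no_symlinks(root_fd: int, path: str) -> int:
    """Resolve `path` beneath root_fd one component at a time; return an O_PATH fd.
    Every symlink and every ".." component is rejected outright."""
    parts = [p for p in path.split("/") if p not in ("", ".")]
    fd = os.dup(root_fd)
    try:
        for part in parts:
            if part == "..":
                raise PermissionError(f"refusing '..' in {path!r}")
            # O_PATH with O_NOFOLLOW opens a symlink itself instead of its target.
            next_fd = os.open(part, os.O_PATH | os.O_NOFOLLOW | os.O_CLOEXEC, dir_fd=fd)
            os.close(fd)
            fd = next_fd
            if stat.S_ISLNK(os.fstat(fd).st_mode):
                raise PermissionError(f"refusing symlink {part!r} in {path!r}")
        return fd
    except BaseException:
        os.close(fd)
        raise
```
The result is an O_PATH fd, so actually reading the file means reopening it, for example through /proc/self/fd/N.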

23:59.600 --> 24:00.600
Okay.

24:00.600 --> 24:01.600
Questions?

24:01.600 --> 24:02.600
Three minutes for questions.

24:02.600 --> 24:04.600
Oh, thank you.

24:04.600 --> 24:17.600
My apologies if I talked too fast.

24:17.600 --> 24:22.600
I think the smoke needs to clear from the speed.

24:22.600 --> 24:32.600
I think there are questions.

24:32.600 --> 24:36.600
How do you feel about other projects? Are most projects, or a lot of projects,

24:36.600 --> 24:38.600
that deal with paths aware of this?

24:38.600 --> 24:40.600
Like, not just runc?

24:40.600 --> 24:41.600
No.

24:41.600 --> 24:42.600
No.

24:42.600 --> 24:45.600
So the problem is that most projects, so what I've found is that most projects,

24:45.600 --> 24:49.600
and to be fair, runc was kind of in the same camp for this, is that they

24:49.600 --> 24:53.600
write a version that works fine, and they've thought of a couple of plausible attacks.

24:53.600 --> 24:57.600
And then each time they run into one of these problems, they end up

24:57.600 --> 25:00.600
hacking around that one particular instance rather than looking back and saying,

25:00.600 --> 25:04.600
actually, we really need to rethink how we do this sort of stuff.

25:04.600 --> 25:05.600
Yeah.

25:05.600 --> 25:08.600
I will say one nice thing that's happened in recent years is that the Linux

25:08.600 --> 25:11.600
kernel has actually moved a lot more to file-descriptor-based APIs.

25:11.600 --> 25:15.600
And this is a huge, huge improvement on the way things worked before.

25:15.600 --> 25:18.600
I mean, everything is a file, everything is a file descriptor.

25:18.600 --> 25:19.600
Yes.

25:19.600 --> 25:21.600
All right.

25:21.600 --> 25:22.600
Okay.

25:22.600 --> 25:23.600
All right.

25:23.600 --> 25:24.600
I think that's it. Thank you so much.

25:24.600 --> 25:25.600
Thanks so much.

25:25.600 --> 25:27.600
Thank you.

