WEBVTT

00:00.000 --> 00:10.600
All righty, we're right on time, give or take a few seconds, with Phil going to talk to us about

00:10.600 --> 00:11.600
containerd.

00:11.600 --> 00:19.200
All right, hi everybody. I'm trying to change it up, because for many years I've given, like,

00:19.200 --> 00:25.320
the containerd update: hey, it's our new version, here are the new features. And rather

00:25.320 --> 00:30.840
than bore you with that another year, this talk is going to be a little more about the

00:30.840 --> 00:37.280
project itself and some design decisions I think we've made that have been beneficial

00:37.280 --> 00:42.480
to the success of the project, and I'm actually going to start with a demo because I

00:42.480 --> 00:48.520
may get long-winded. I'd rather show it to you now, and it may help set the stage for what

00:48.520 --> 00:49.520
we talk about.

00:49.520 --> 00:58.960
There are two new-ish features that this demo relies on. One is the EROFS snapshotter, which,

00:58.960 --> 01:06.280
this is not a talk about EROFS, but it's an interesting new-ish file system in the Linux

01:06.280 --> 01:12.760
kernel that's valuable in the container space, and the nerdbox shim, which we'll talk

01:12.760 --> 01:20.160
about shims. It's actually using the libkrun project, which folks may have heard of. But

01:20.160 --> 01:26.960
I'm going to switch to a terminal that I think is big enough. Here is just the messy output

01:26.960 --> 01:34.840
of containerd, and if you can see all the way up here, this is a MacBook running macOS.

01:34.840 --> 01:41.960
[unintelligible] but this is containerd running on this

01:41.960 --> 01:48.160
Mac. We've had Mac support in containerd, and you might question why, because there's

01:48.160 --> 01:54.640
not been a runtime that could isolate processes in the same way that we do on Linux, but

01:54.640 --> 01:59.960
needless to say there have been people working on that.

01:59.960 --> 02:06.760
So, rather, I'll show you the command I'm going to run. ctr is a simple client, similar

02:06.800 --> 02:13.880
to Docker, Podman, et cetera. I'm going to run the Alpine image and run a shell. Obviously

02:13.880 --> 02:22.960
not an exciting demo, but again, try and remember that this is a macOS laptop. I'm not

02:22.960 --> 02:30.040
running this inside of a VM. If I, you know, run uname inside the container, it

02:30.040 --> 02:36.320
says I'm on arm64 Linux, and again, all the usual commands work.

02:36.320 --> 02:44.480
If we grep for a shim, here's where you see that there's a containerd shim called

02:44.480 --> 02:53.040
nerdbox v1 that's actually running, and this is instead of runc. So nerdbox is using libkrun

02:53.040 --> 03:00.880
to run a lightweight hypervisor, which then inside that has an init process that is listening

03:00.880 --> 03:07.720
for communication, and that actually, obviously, uses runc inside the container.

03:07.720 --> 03:16.840
So the reason I show you all that, and again, simple demo, not all that flashy, is that

03:16.840 --> 03:22.840
I think, because of decisions we've made, containerd's code base didn't have to change

03:22.840 --> 03:25.600
at all for this to work.

03:25.600 --> 03:32.800
I think that reflects a set of design decisions that have made us successful. We're running

03:32.800 --> 03:38.080
everywhere. I won't belabor this slide, but, you know, containerd is pretty much in all

03:38.080 --> 03:44.520
the managed services of hyperscalers, many of the build-your-own, roll-your-own Kubernetes

03:44.520 --> 03:51.240
solutions, and all the Chinese clouds. Many of them are very involved in the project as contributors;

03:51.240 --> 03:58.840
they're also using it in their solutions, and then my company, Amazon, AWS, is using

03:58.840 --> 04:06.120
it, and I guess this is maybe the first moment that this has been mentioned, but

04:06.120 --> 04:13.120
Lambda Managed Instances, which was announced at re:Invent in December, actually uses

04:13.120 --> 04:19.760
containerd in the data plane, so your Lambda functions on Managed Instances are actually running

04:19.760 --> 04:27.720
as containers on containerd. So again, containerd is running in a lot of places, and again,

04:27.720 --> 04:33.680
my posit, my argument for today, is that some of the extensibility we baked in early

04:33.680 --> 04:39.000
on, and some of the things we've refined, have made it possible for it to be useful in

04:39.000 --> 04:45.360
a lot of different ways. And so we're just going to talk briefly about clients, snapshotters,

04:45.440 --> 04:51.680
and shims. There are some other interfaces in containerd that are also kind of interesting.

04:51.680 --> 04:59.760
We built a transfer service about two years ago that allows you to do pull operations or

04:59.760 --> 05:06.960
copy operations, any source to any sink, and the actual source and sink objects are

05:06.960 --> 05:13.000
pluggable, so you could do a registry-to-registry copy plug-in; you could do all kinds

05:13.000 --> 05:19.520
of interesting things. We also, prior to that, added stream processors, which means

05:19.520 --> 05:26.960
I can plug my own binary into the stream of bytes from or to a registry. So think about

05:26.960 --> 05:36.240
decrypting encrypted layers, which is one of the features that uses that, or having a custom algorithm.

05:36.240 --> 05:42.160
On the client side, again, there's a small, kind of very, very simple diagram. So for clients,

05:42.160 --> 05:51.600
think of Docker using containerd, or the ctr tool we just ran. BuildKit is a client of containerd.

05:51.600 --> 05:59.480
There are higher-layer ones like Kubernetes, which uses the CRI API, so containerd presents

05:59.480 --> 06:07.040
the CRI API so that Kubernetes, and the kubelet specifically, can drive containerd.

06:07.040 --> 06:12.080
And again, we have a Go SDK that's kind of this rich client. Obviously you can talk

06:12.080 --> 06:16.880
directly over gRPC, so that works for anything you can generate bindings for. We've done that for

06:16.880 --> 06:24.680
Rust and Go, obviously, so that you can use the gRPC services properly. And then, as I mentioned,

06:24.680 --> 06:32.120
there's this more kind of thick client in Go that BuildKit uses and ctr uses. Again,

06:32.120 --> 06:39.000
this has given us, like, a broader array of potential ways to drive containerd, either through

06:39.000 --> 06:48.120
Kubernetes or without the CRI API. And that Lambda Managed Instances product I talked about

06:48.120 --> 06:54.960
uses containerd via a Rust client, and so there are a few groups that are starting to

06:54.960 --> 07:01.240
use Rust in and around the containerd ecosystem.

07:01.240 --> 07:07.920
So, snapshotters. Again, I'll try not to take up too much time, you know, belaboring kind of

07:07.920 --> 07:16.680
the architecture overall, but snapshotters are the way that we represent the container

07:16.680 --> 07:21.280
image that you've pulled from a registry, how it's represented in the file system.

07:21.280 --> 07:29.000
And so the snapshotter interface abstracts away the file system details. We have a bunch

07:29.000 --> 07:34.840
of long-time built-in snapshotters that you'd recognize: overlay, Btrfs, ZFS, device

07:34.840 --> 07:42.960
mapper. We have a native snapshotter, and the Microsoft team has a couple for Windows and

07:42.960 --> 07:46.680
Linux on Windows containers.

07:46.680 --> 07:53.400
More recently, we added a blockfile snapshotter and, as I mentioned, EROFS.

07:53.400 --> 07:58.880
The cool thing, again, here is that we built the snapshotter API, I'll show it on the next

07:58.880 --> 08:04.960
slide, to be very simple and straightforward. You care about the details of your file

08:04.960 --> 08:10.840
system, and containerd just needs basic information like the list of mount commands

08:10.840 --> 08:14.360
to run that you've assembled for it.

08:14.360 --> 08:19.600
But the other thing we did is we built a proxy interface, so you can build a snapshotter that

08:19.600 --> 08:24.800
isn't even part of the containerd code base. And so at AWS, we built something called

08:24.800 --> 08:32.800
SOCI that does lazy loading. GKE also has something called image streaming that's built

08:32.800 --> 08:34.840
as a snapshotter.

08:34.840 --> 08:42.960
There's Nydus, stargz, OverlayBD; these are all kind of new lazy-loading initiatives

08:42.960 --> 08:45.720
from various teams around the world.

08:45.720 --> 08:50.920
And again, because we built this proxy system, they don't have to wait for our release,

08:50.920 --> 08:53.720
they don't have to wait for us to merge their changes.

08:53.720 --> 08:59.040
You can just build a snapshotter and start using it with your custom file system, with your

08:59.040 --> 09:04.080
ideas about how to do lazy loading or fast pulling.

09:04.080 --> 09:10.360
And again, that's just a feature that allows containerd to be extended without the

09:10.360 --> 09:13.160
core team being involved at all.

09:13.160 --> 09:18.320
We did, as you can see, merge EROFS directly into the core, but it could have been, and

09:18.320 --> 09:24.960
it did start as an external add-on snapshotter.

09:24.960 --> 09:28.920
As I said, I'll just show the interface really quickly.

09:28.920 --> 09:32.480
These are the RPCs.

09:32.480 --> 09:36.960
This is the interface that you would have to implement if you wrote your own snapshotter.

09:36.960 --> 09:42.240
Again, I don't have all the details here about what these requests and responses look like,

09:42.240 --> 09:49.120
but again, these are simple interfaces that allow you to handle the file system interface

09:49.120 --> 09:54.680
and let containerd only worry about what it needs to know to set up the container

09:54.680 --> 09:59.320
file system and mount namespace.

09:59.320 --> 10:04.560
Shims are a little more interesting. The demo we saw at the beginning was built

10:04.560 --> 10:13.560
around this idea that containerd has a set of objects that it manages in its gRPC services

10:13.560 --> 10:22.680
that you would expect: things like a container process, the task, the images, the content

10:22.680 --> 10:30.680
store. But once all that's assembled and there's an OCI spec to run, we pass that down

10:30.680 --> 10:36.760
to a shim. The shim interface can be implemented by something like runc, obviously, or

10:36.760 --> 10:45.800
crun, or on Windows, hcsshim. And so that process, that binary, gets to decide how

10:45.800 --> 10:52.440
to turn that OCI spec and the bundle into an isolated process. Obviously that works

10:52.440 --> 10:56.840
differently on Windows than it does on Linux with runc.

10:56.840 --> 11:01.960
But this has generated a very long list of interesting side projects.

11:01.960 --> 11:07.800
Again, we don't have to control who's creating shims, who's offering shims. Some of them

11:07.800 --> 11:15.720
have been contributed into our project, and we call them non-core, like runwasi. Nerdbox

11:15.720 --> 11:20.360
is a new non-core project that I showed you.

11:20.360 --> 11:25.920
I'm trying to think if any of the others are actually in the containerd organization.

11:25.920 --> 11:30.840
But again, many of these are third-party, they're managed by a group that cares about

11:30.840 --> 11:32.480
this technology.

11:32.480 --> 11:37.960
Some of them are fun, you know, single-person projects like the BSD Jails.

11:37.960 --> 11:45.440
Youki is a Rust shim implementation that doesn't even need runc, because it has implemented

11:45.480 --> 11:54.640
the OCI spec process isolation steps within itself as a Rust program.

11:54.640 --> 11:59.720
runsc: Google put that together for gVisor.

11:59.720 --> 12:06.640
And again, the list goes on: Kata, Inclavare, Kuasar, which actually is interesting, because

12:06.640 --> 12:11.520
it has a runc variant and it knows how to do lightweight virtualization.

12:11.520 --> 12:18.720
And so using other annotations and knowledge about the container, it can decide on the isolation

12:18.720 --> 12:19.720
mechanism.

12:19.720 --> 12:26.720
There's an eBPF shim, and then firecracker-containerd, again, something we had put

12:26.720 --> 12:33.680
together when AWS was using Firecracker for lightweight virtualization.

12:33.680 --> 12:39.720
So again, the point here is that the shim interface has allowed people to do their own interesting

12:39.720 --> 12:45.680
things about what it means to isolate a process on different operating systems. What I

12:45.680 --> 12:48.600
showed you was, you know, macOS, using

12:48.600 --> 12:50.600
libkrun.

12:50.600 --> 12:55.920
But again, this has allowed people to do many interesting things that we as containerd

12:55.920 --> 13:02.120
maintainers would have never dreamed up ourselves, or built ourselves, or maybe would

13:02.120 --> 13:08.160
never have merged into the core project, because maybe we wouldn't know how to maintain

13:08.160 --> 13:13.160
them or keep them over time.

13:13.160 --> 13:18.360
The shims implement the task interface, so this is a little bit larger, but it's all the

13:18.360 --> 13:24.440
things you would expect to have to do with an isolated process on an operating system,

13:24.440 --> 13:29.120
create it, start it, delete it, et cetera.

13:29.120 --> 13:36.480
Many of them will look similar to what runc does when it starts and stops isolated container

13:36.560 --> 13:38.920
processes.

13:38.920 --> 13:46.080
So, you know, part of my idea for this talk was just if you're creating something you

13:46.080 --> 13:51.640
believe is shareable, this may not apply if I'm building a command line tool, it's not

13:51.640 --> 13:57.200
something I'm expecting to be extensible, but if you think about Kubernetes as another

13:57.200 --> 14:03.360
great example: if there wasn't a way to create your own custom resources, or to plug in

14:03.360 --> 14:09.920
your own interfaces for egress or ingress, you know, how valuable would that project

14:09.920 --> 14:10.920
have become?

14:10.920 --> 14:16.000
So I think containerd and Kubernetes are both good examples. If you think about your

14:16.000 --> 14:22.600
design, you know, what are the use cases that I can consider today that I could enable

14:22.600 --> 14:29.480
by making an interface pluggable, and so reduce friction for people using your project

14:29.480 --> 14:34.000
when it's mature and stable, and maybe you don't want to be adding, you know, many

14:34.000 --> 14:38.560
many things into the core project.

14:38.560 --> 14:41.560
So, wow, I'm way ahead of time.

14:41.560 --> 14:44.680
So, we have lots of time for questions.

14:44.680 --> 14:49.440
We can even show more nerdbox, but not that exciting.

14:49.440 --> 14:52.680
We do have some channels on Slack.

14:52.680 --> 14:58.120
If you ever have a question, if you're interested in adding a feature to containerd, come

14:58.120 --> 15:05.400
chat with us there, and again, we have plenty of time for some questions right now.

15:05.400 --> 15:09.840
And just before everyone gets up, please stay very quiet during the questions. The microphone

15:09.840 --> 15:14.400
for the questions is not particularly strong, so if we're to have any chance of Phil actually

15:14.400 --> 15:19.760
understanding anything, please pack up quietly, and the rest of you, please be quiet.

15:19.760 --> 15:35.320
Okay, so I haven't really had a chance to follow the transfer service work, but is the idea

15:35.320 --> 15:40.960
that the snapshotters are going to use the transfer service? Are going to use which

15:40.960 --> 15:46.960
service? The transfer service. So basically right now most snapshotters implement their

15:46.960 --> 15:51.280
own, like, pulling logic, which kind of messes with, like, CRI mirror configuration

15:51.280 --> 15:52.880
and stuff like that.

15:52.880 --> 15:58.880
So is there a plan to, like, get snapshotters to use this one unified solution?

15:58.880 --> 16:06.520
Yeah, that's a good question. So when you configure the transfer service, if you're calling

16:06.520 --> 16:13.240
the API, you can specify: here's the snapshotter that will handle, you know, unpack,

16:13.240 --> 16:23.160
et cetera. It does get tricky with, like, a lazy-loading technique, so yeah, I'm not sure we have

16:23.160 --> 16:31.440
a great answer for that yet. Definitely in our work on SOCI at AWS, there are friction

16:31.440 --> 16:40.840
points, that's something we should probably talk about as a community, especially as more

16:40.840 --> 16:47.840
and more people are playing with lazy loading.

16:47.840 --> 16:51.400
So my impression generally is that the extensibility design

16:51.400 --> 16:55.520
of containerd tends to revolve around, like, calling out to a different binary that

16:55.520 --> 16:57.920
does this different implementation.

16:57.920 --> 17:01.440
In OCI, the way we did the extensibility for the runtime was through hooks, which doesn't

17:01.440 --> 17:06.240
give you everything, but I guess the thing is that it seems to me like it might be best

17:06.240 --> 17:09.280
to actually have both rather than just one or the other, because, like, for instance, if

17:09.280 --> 17:13.320
you wanted to, if some service wanted to hook into getting metrics and they did a very

17:13.320 --> 17:18.520
specific [unintelligible] directly. I think that, as far as I'm aware,

17:18.520 --> 17:21.880
containerd doesn't really have that many of, like, the hook-style extensions; it's mainly

17:21.880 --> 17:24.720
the calling out to an extension.

17:24.720 --> 17:31.720
Yeah, so, you know, some of that is configurable in containerd's config, like a set of stream

17:31.720 --> 17:41.160
processors that I want you to call. Datadog contributed the plug-in for container, or

17:41.160 --> 17:45.160
image signing, and image verification actually.

17:45.160 --> 17:52.120
Again, it's similar, you say, here's my binary, go call it when you've pulled an image.

17:52.120 --> 18:00.520
It's a good question. You know, hooks fit in the lifecycle aspect. I guess it'd

18:00.520 --> 18:05.400
be interesting to think where they, where they could fit in the rest of those, I guess

18:05.400 --> 18:12.480
images have a lifecycle; you could say, you know, trigger on pre-pull or pull. Yeah, it's an

18:12.480 --> 18:15.480
interesting idea.

18:15.480 --> 18:19.680
And we've seen a question from the chat as well, and I've got to read it.

18:19.680 --> 18:25.360
How well does this performance compare with the native Apple container runtime, and what the

18:25.360 --> 18:28.560
line is, does this happen or what's the other solution?

18:29.280 --> 18:35.000
Yeah, I don't think I'm ready to make any performance comparisons.

18:35.000 --> 18:41.280
I think Eric will be giving a talk in here pretty soon on containerization.

18:41.280 --> 18:46.240
Obviously, there's lightweight virtualization involved in both.

18:46.240 --> 18:51.440
But yeah, we're still experimental with nerdbox; I don't think anyone has real

18:51.440 --> 18:53.560
performance data at this point.

18:54.560 --> 18:57.560
Any other questions?

19:10.560 --> 19:13.560
Anyone missing a key call to something?

19:13.560 --> 19:15.560
They might have already left.

19:15.560 --> 19:18.560
No? I think there's a lost and found in K.

19:19.560 --> 19:22.560
I think building K might have lost their phones otherwise.

19:24.560 --> 19:26.560
I guess we're going to be wrapping up then already.

19:26.560 --> 19:28.560
All right, thanks a lot.

19:28.560 --> 19:29.560
Thank you.

