WEBVTT

00:00.000 --> 00:11.480
Hey everybody, my name is Eric Ernst, I'm excited to be here to talk about our open source

00:11.480 --> 00:16.640
projects and some of the community work that we're doing around containers on macOS.

00:16.640 --> 00:22.400
So last June, our team at Apple open sourced the Apple Containerization framework and

00:22.400 --> 00:27.800
container CLI projects, which enable developers to create, run, build, and push

00:27.800 --> 00:33.120
container images directly on their Mac in a way that focuses on security and privacy.

00:33.120 --> 00:38.480
Really, what we did is we introduced two key projects, one of them, a command line tool called

00:38.480 --> 00:43.760
container, naming is hard, and it allows users, again, to quickly get

00:43.760 --> 00:47.640
up and running with containers on top of the macOS operating system.

00:47.640 --> 00:51.560
The second project that we open sourced is Containerization, and this is a framework

00:51.560 --> 00:57.080
that allows us to build solutions like that command line tool container.

00:57.080 --> 01:01.560
Users will typically interact with just container, that command line tool, but the core

01:01.560 --> 01:07.920
functionality for the project is really behind the Swift APIs in Containerization.

01:07.920 --> 01:13.240
So I want to talk about what I'm going to focus on in this quick session, so I'm going

01:13.240 --> 01:17.520
to talk about some of the design principles that guided the work that we're doing.

01:17.520 --> 01:24.240
And then I want to briefly talk about how we designed it to be extensible, with some examples

01:24.240 --> 01:29.400
of how developers can go ahead and do that, as well as showing some of the key APIs

01:29.400 --> 01:34.320
in Containerization, again, to show that it is pretty easy to build on top of this.

01:34.320 --> 01:39.160
We'll talk about how we wrote it in Swift, and what that experience has been.

01:39.160 --> 01:43.600
It was a new process for the team, and then we're going to talk a bit more about just

01:43.600 --> 01:47.120
the overall open source community and what we're doing going forward.

01:47.120 --> 01:50.680
So first off, let's talk about the actual design side of things.

01:50.680 --> 01:57.040
So in order to run a Linux container on macOS, you need to virtualize a Linux environment.

01:57.040 --> 02:03.840
A historical solution is to spawn a large virtual machine for all the different containers.

02:03.840 --> 02:06.840
Resources are allocated to that one virtual machine.

02:06.840 --> 02:08.280
The containers are added to it.

02:08.280 --> 02:11.640
These resources are divvied up per container.

02:11.640 --> 02:15.760
When you need to share additional directories and files from your Mac, they're passed

02:15.760 --> 02:20.280
to that virtual machine before they are then passed to the corresponding container

02:20.280 --> 02:22.800
that needs it.

02:22.800 --> 02:27.480
So one of the areas we focused on is security, and for that, we want to have the same

02:27.480 --> 02:32.040
level of isolation that typical platforms have for their large virtual machines, and apply that

02:32.040 --> 02:34.520
to every container that is launched.

02:34.520 --> 02:38.640
We also want to reduce the need for core utilities or any dynamic libraries or anything

02:38.640 --> 02:44.520
else inside of the guest, and this helps both reduce the attack surface and also the

02:44.520 --> 02:49.480
burden of maintenance costs for keeping these up to date.

02:49.480 --> 02:51.920
We also focus on privacy.

02:51.920 --> 02:57.040
Because of that, we want to limit access to only the files that are needed for that guest virtual

02:57.040 --> 02:59.920
machine and do it on a per container basis instead.

02:59.920 --> 03:04.080
So only the container requesting a specific directory at that point should have access

03:04.080 --> 03:05.960
to those contents.

03:05.960 --> 03:09.880
And we want to hit these goals while still providing an efficient and performant experience

03:09.880 --> 03:11.640
for end users.

03:11.640 --> 03:15.960
So for security, our goal is to provide the same level of isolation used by the large virtual

03:15.960 --> 03:20.000
machine, and apply that to each and every single container that is started.

03:20.000 --> 03:24.440
Containerization does this by running each container in its own lightweight virtual machine,

03:24.440 --> 03:27.080
while still providing sub-second start times.

03:27.080 --> 03:31.480
This also provides the benefit that each container has its own dedicated IP address.

03:31.480 --> 03:37.120
So if you've used different solutions, usually what you end up having to do is different port forwarding

03:37.120 --> 03:40.280
magic in order to be able to access the different services.

03:40.280 --> 03:44.720
So with this, now you can just access the direct IP of that container workload instead.

03:45.720 --> 03:50.600
When sharing directories and files, only the container requesting that directory has access

03:50.600 --> 03:56.160
to those contents. As for resources like CPU and memory, if there are no containers running, no

03:56.160 --> 03:58.600
resources will be allocated.

03:58.600 --> 04:03.360
Now I've said lightweight virtual machine several times, so let me dig into what I mean by

04:03.360 --> 04:04.360
that.

04:04.360 --> 04:09.400
To be lightweight, we first focus on the actual machine itself, specifically what devices

04:09.400 --> 04:12.640
associated with it we need and what we are virtualizing.

04:12.960 --> 04:17.200
Using Apple's virtualization framework, we are using the minimal set of devices that we

04:17.200 --> 04:22.320
need for the user experience and nothing else, so these are paravirtualized devices,

04:22.320 --> 04:28.360
not chipset or machines or anything else, just a virtual socket, virtual block device, network

04:28.360 --> 04:30.760
device, and shared file system.

04:30.760 --> 04:35.680
This is more aligned with a micro-VM type of model.

04:35.680 --> 04:39.960
Also what we're doing is making sure that that machine itself is sized appropriately.

04:39.960 --> 04:44.840
So we have a reasonable default, but then again, we're sizing the virtual machine

04:44.840 --> 04:51.000
based on the requests of the actual workload itself, same exact thing for CPU.

04:51.000 --> 04:55.080
So that's the lightweight machine; the next is what we do when we turn that machine on.

04:55.080 --> 04:58.320
In our case, we direct boot a Linux kernel.

04:58.320 --> 05:01.320
There's no need for any kind of firmware, bootloader, or anything else.

05:01.320 --> 05:05.920
This saves on boot time as well as it reduces our bill of materials of what we need.

05:05.920 --> 05:09.920
Now for this direct booted kernel, we also want to make sure it's a minimal configuration.

05:09.920 --> 05:14.880
We chose an ARM64 configuration that, again, addresses our needs, but minimizes otherwise

05:14.880 --> 05:16.880
what we would have inside of there.

05:16.880 --> 05:21.320
This reduces the attack surface, reduces the footprint of that actual kernel, and helps provide

05:21.320 --> 05:23.120
a performant boot time.

05:23.120 --> 05:26.280
Getting from reset to user space is very fast.

05:26.280 --> 05:31.240
So speaking of user space, I want to talk about what the user space looks like inside of

05:31.240 --> 05:32.240
our guest.

05:32.240 --> 05:37.440
So historically, when using a large virtual machine, you have different dynamic libraries, utilities,

05:37.440 --> 05:39.040
and everything else inside of it.

05:39.080 --> 05:41.720
You're booting a full system.

05:41.720 --> 05:42.720
The file system?

05:42.720 --> 05:43.720
Yeah, I said that.

05:43.720 --> 05:44.720
Okay, sorry.

05:44.720 --> 05:47.160
For security, we want to reduce the attack surface.

05:47.160 --> 05:49.560
So we don't have any core utilities.

05:49.560 --> 05:51.200
There's no dynamic libraries.

05:51.200 --> 05:54.080
There's no libc implementation or anything else.

05:54.080 --> 05:58.680
We created our own init process that is purpose built just for running containers in this

05:58.680 --> 06:00.400
constrained environment.

06:00.400 --> 06:05.120
And we call it vminitd, which is written in Swift. It's statically built because there are

06:05.120 --> 06:06.120
no libraries.

06:06.120 --> 06:11.120
And it's designed to manage the full lifecycle of processes associated with running the container

06:11.120 --> 06:13.440
inside that guest environment.

06:13.440 --> 06:19.200
Now, running as the init process comes with a bunch of responsibilities before, as well as

06:19.200 --> 06:21.400
during execution of containers.

06:21.400 --> 06:27.340
So it's responsible for doing things like the actual interface IP setup, mounting the

06:27.340 --> 06:33.800
file system, such as any volumes or the actual rootfs itself.

06:33.800 --> 06:38.160
It is responsible for launch and supervision of all the processes inside of it, namely

06:38.160 --> 06:40.320
the container entry point.

06:40.320 --> 06:45.240
And it has a gRPC API that we have for it in order for us to be able to drive that from

06:45.240 --> 06:47.880
the host side.

06:47.880 --> 06:54.200
Looking at our three focus areas of security, privacy, and performance: with a minimal kernel,

06:54.200 --> 06:59.240
we kind of hit both security as well as performance, I would say. Direct booting gives a

06:59.280 --> 07:01.120
quicker boot as well.

07:01.120 --> 07:07.600
Using that static purpose built init process helps, again, have a much faster boot time

07:07.600 --> 07:11.520
for us as well as a reduced attack surface inside of the guest.

07:11.520 --> 07:15.280
And each workload being inside its own virtual machine kind of helps across the board

07:15.280 --> 07:16.840
I would say for all three of these.

07:16.840 --> 07:21.520
So let's talk about extending then.

07:21.520 --> 07:25.520
So I don't have time to really go into many demos and everything else, but it's a command line

07:25.520 --> 07:29.840
tool, so it's container run with

07:29.840 --> 07:32.840
any image you want, as you would expect.

07:32.840 --> 07:37.560
One of the things we wanted to look at, make sure there, yeah.

07:37.560 --> 07:41.920
So we wanted to make it so that people can extend it for their own use cases and actually

07:41.920 --> 07:45.480
for us to easily extend it for our own internal use cases as well.

07:45.480 --> 07:49.680
So what we did was we created a plug-in framework for it and this kind of behaves a little

07:49.680 --> 07:53.280
bit like what you would have for git subcommands.

07:53.280 --> 07:58.560
So with this, it allows us to have other developers who want to contribute, but don't

07:58.560 --> 08:00.600
want to make changes to the core code.

08:00.600 --> 08:03.640
Maybe they're specific to their use case or custom workflow or anything else.

08:03.640 --> 08:08.880
They can go ahead and easily just extend what we have functionality wise.

08:08.880 --> 08:14.200
We can just add simple CLI subcommands or these could be different services.

08:14.200 --> 08:18.960
XPC services that maybe are tied to the lifecycle of the whole thing running or just a container

08:18.960 --> 08:22.160
running or anything else.

08:22.160 --> 08:25.960
So we actually use plugins today for a bunch of the components that we need in order

08:25.960 --> 08:28.640
to be able to run the baseline tool itself.

08:28.640 --> 08:31.240
And then later in the talk, I'll talk about a couple of plugins that we're kind of thinking

08:31.240 --> 08:34.120
about for open.

08:34.120 --> 08:37.680
Creating these is easy. There's auto-discovery: you just drop it into a known location with a little

08:37.680 --> 08:39.280
bit of config metadata.

08:39.280 --> 08:43.200
And after that, it's picked up automatically. The next time you run the CLI command, you'll

08:43.200 --> 08:46.160
see it.

08:46.160 --> 08:48.320
So that's extending container.

08:48.320 --> 08:53.560
But containerization itself, if I were to talk to all the developers in the team, I would

08:53.560 --> 08:57.960
say that the command line tool is fine, it has high utility for people, but really they're most

08:57.960 --> 09:01.640
proud of Containerization itself and all the different APIs that are provided inside of

09:01.640 --> 09:02.640
it.

09:02.640 --> 09:06.480
So with that in mind, I want to talk about how you can easily build a client like ours on

09:06.480 --> 09:08.840
top of containerization.

09:08.840 --> 09:13.320
So as I mentioned before, containerization is where the core logic is for working with

09:13.320 --> 09:15.480
Linux containers on macOS.

09:15.480 --> 09:19.960
To this end, we created many useful modules, which could be useful in their own right.

09:19.960 --> 09:24.160
You don't have to be making containers to want to use some of these or take them for anything

09:24.160 --> 09:25.160
else.

09:25.160 --> 09:29.640
We have ContainerizationOS, for interacting with low-level OS components like terminals and

09:29.640 --> 09:31.240
different process management.

09:31.240 --> 09:36.520
We have a minimal netlink library for being able to do interface management inside of

09:36.520 --> 09:43.480
a Linux guest. With OCI, we have different runtime primitives as well as different container

09:43.480 --> 09:46.280
image types and everything.

09:46.280 --> 09:50.880
Then we also created an EXT4 one, which is I think pretty interesting. It makes it so that

09:50.880 --> 09:56.960
we can create and manipulate EXT4 file systems from macOS, so you don't have to boot

09:56.960 --> 10:00.560
a Linux kernel and do all this; you can actually do it directly, then have that raw block

10:00.560 --> 10:04.040
device and boot with it.

10:04.040 --> 10:08.680
Throughout all of this, we made sure to make heavy use of protocols, which in Swift really

10:08.680 --> 10:13.560
means that you can go ahead and replace this with a different implementation and make

10:13.560 --> 10:16.280
it pluggable and everything else.

10:16.280 --> 10:21.680
In addition to protocols, we really did focus on the APIs themselves, and we're not at one

10:21.680 --> 10:27.120
dot zero, so they still move around a little bit, but we were really looking to make

10:27.120 --> 10:29.840
sure that we have different layers for this.

10:29.840 --> 10:35.120
So at the highest level, it makes it really easy and I'll show a quick example to just

10:35.120 --> 10:38.920
be able to run a Linux environment on top of macOS.

10:38.920 --> 10:44.460
Kind of in the mid-level, you have different virtual machine management type of APIs, with

10:44.460 --> 10:50.000
process management APIs in there and then you can go lower level where we get into more

10:50.000 --> 10:54.240
file system and Linux systems management type of layers, and they kind of build on top

10:54.240 --> 10:58.360
of each other so you can go as deep as you want depending on your application.

10:58.360 --> 11:02.120
At this point, I'm going to risk trying to change to the terminal and show that we can make

11:02.120 --> 11:09.960
a macOS application use Linux in very little code.

11:09.960 --> 11:11.960
We did it.

11:11.960 --> 11:12.960
Okay.

11:12.960 --> 11:17.280
I hope that the people in the back can see it okay.

11:17.280 --> 11:22.520
This is 38 lines in the file of which maybe we have about six function calls.

11:22.520 --> 11:29.040
If I look at it, one of the top level types we have is ContainerManager, where this

11:29.040 --> 11:32.560
is taking a kernel and an initfs reference.

11:32.560 --> 11:38.760
That's that minimal kernel that every container is going to use for each individual copy

11:38.760 --> 11:40.120
of it then.

11:40.120 --> 11:45.880
Then the initfs is what we use for that vminitd process that I talked about, which is statically built.

11:45.880 --> 11:52.080
With that, we can now go ahead and create containers using that factory.

11:52.080 --> 11:56.760
With the container create, what we need is the reference to the actual OCI image.

11:56.760 --> 12:01.800
Again, the standard OCI images that we're building and consuming, and then just give whatever

12:01.800 --> 12:07.600
unique name to it. You configure it for whatever resources you want, set the terminal, because

12:07.600 --> 12:15.640
I'll just do a little echo; because we're here, I can type that, and then this is just

12:15.640 --> 12:18.280
setting up all the metadata after we execute that.

12:18.280 --> 12:22.280
It's pulling down and unpacking and making a block device that we can use inside the guest

12:22.280 --> 12:26.240
for that rootfs, which is the container image.

12:26.280 --> 12:30.920
You create, then we do start. At this point, this boots the VM and starts the process, and

12:30.920 --> 12:31.920
then we wait.

12:31.920 --> 12:34.720
If it was a shell or interactive or anything else, it would keep going. In our case, that

12:34.720 --> 12:38.800
wait should come back pretty quick once the echo completes, and then we'll stop, and

12:38.800 --> 12:40.800
on the defer we'll clean up.

12:40.800 --> 12:45.840
So that was about six function calls and at that point, you can have a custom macOS application,

12:45.840 --> 12:50.960
yes it's using a container image but you could use, that's just a payload.

12:50.960 --> 12:55.440
You can run a Linux environment in about 10 lines and have that embedded in Swift on top

12:55.600 --> 12:56.600
of your Mac.

12:56.600 --> 13:02.440
So it built and then if we just run it, it'll just say hello world eventually, yes,

13:02.440 --> 13:05.760
okay, that took a while, I didn't like that, I'll taste it on.

13:05.760 --> 13:06.760
Okay.

13:06.760 --> 13:25.360
So let me tell you about Swift a little bit. The background of the team is a lot of folks

13:25.360 --> 13:30.560
who contribute to containerd, runc authors, different tools like this, so like very container

13:30.560 --> 13:36.000
runtime focused, very C focused, and very, very Go focused.

13:36.000 --> 13:39.600
This is our first project using Swift, so I'm just going to give a few different data points

13:39.600 --> 13:42.960
of what the experience was like.

13:42.960 --> 13:48.400
First, C interoperability. You know, with terminal and all of ContainerizationOS,

13:48.400 --> 13:52.080
all these different modules use a lot of syscalls and things like this, so we often

13:52.080 --> 13:58.080
do need to call into C. Compared to using Go, it was very easy; the interoperability,

13:58.080 --> 14:00.440
really, it couldn't be any easier.

14:00.440 --> 14:06.400
The second area was around enums and tagged unions. This helped; it really changed how

14:06.400 --> 14:12.040
we're designing our APIs in Containerization; we found them really quite nice and useful.

14:12.040 --> 14:17.280
Memory safety, again, is pretty important given our focus on security, you know, using

14:17.280 --> 14:23.560
optionals, designed to prevent segfaults, they're pretty nice to use.

14:23.560 --> 14:28.560
In addition, you know, it's kind of table stakes, we need a static SDK available for Linux,

14:28.640 --> 14:33.600
and it, you know, meets our needs so that we can have that minimal guest image, and we're

14:33.600 --> 14:37.520
often calling into different macOS frameworks and everything else, so being able to call

14:37.520 --> 14:42.720
from Swift into these frameworks was a pretty natural and easy thing to do.

14:42.720 --> 14:46.880
Now, let's talk about the open source community aspects, tell you some of the goals we have

14:46.880 --> 14:50.320
and some of the work that we're going to be doing going forward.

14:50.320 --> 14:54.720
We open sourced this last year and we've been really excited; it's been a pretty positive response

14:54.720 --> 14:59.680
so far. Now that it's in the open, our first priority is encouraging contributions and trying

14:59.680 --> 15:07.760
to develop in the open with the community. So we welcome all feedback, whether it's issues, PRs, questions,

15:08.320 --> 15:15.520
ideas, anything, it really, this brings us a ton of joy. So we'd love that. We're also looking

15:15.520 --> 15:19.520
forward to seeing how different projects can build on top of Containerization, since it's a

15:19.520 --> 15:24.880
standalone framework; again, developers can build directly on top of it and use the APIs

15:24.880 --> 15:30.160
to build their own custom solutions. And finally, we want to expand the container ecosystem

15:30.160 --> 15:36.240
through that plug-in architecture. So transitioning to kind of more of a roadmap type of focus,

15:36.240 --> 15:40.640
we have several features and integrations which we're currently working through and thinking about.

15:40.640 --> 15:47.520
One of them is trying to create a container plug-in that will allow for more long

15:47.520 --> 15:53.440
running Linux environments on top of macOS. So like, less ephemeral compared to containers.

15:54.240 --> 16:00.080
I would say that a lot of this is motivated like, hey, we like how WSL does this and kind of having

16:00.080 --> 16:06.400
that type of UX where we start to blur the Unix environments a little bit back and forth.

16:07.280 --> 16:11.040
So that's an interesting area. Another one that we're looking at is to be able to have an

16:11.040 --> 16:16.880
ephemeral Kubernetes environment locally. So this is effectively, how do we work with things like

16:17.840 --> 16:23.120
kind or minikube or any of these? We're doing some work in order to be able to see if we can

16:23.120 --> 16:30.160
be like a provider with kind upstream, and they were like, we probably won't be able to do that.

16:30.880 --> 16:35.600
Container is pretty different from a lot of the other macOS based solutions that kind is

16:35.600 --> 16:39.360
supporting, because we do have pretty nice resource management with a one-to-one mapping of

16:39.360 --> 16:45.200
the container to a VM. So since their focus is on the other ones, essentially the feedback from

16:45.200 --> 16:50.320
Ben was like, hey, you guys should just write your own, it's easy. So we're going to look at that

16:50.320 --> 16:55.440
for an ephemeral environment pretty shortly here. That'll probably be a container plug-in.

16:56.400 --> 17:02.080
In addition to this, we're always looking to improve our performance. So we currently use BuildKit,

17:02.080 --> 17:07.200
and BuildKit's great, BuildKit's the de facto. Everyone is using BuildKit. Using it inside of a

17:07.600 --> 17:14.720
guest, across from Swift, across the machine boundary, over gRPC, the performance is

17:14.720 --> 17:21.280
suboptimal on our side and we want it to be better. The problem is that fixing it is complex,

17:21.280 --> 17:27.040
as far as being able to have the equivalent functionality. So this is kind of a much larger thing

17:27.040 --> 17:30.400
that we've been talking about in the community and trying to figure out how we'd push that forward.

17:32.000 --> 17:36.640
On top of that, we want more ecosystem integration. So this is more like, if we do Kubernetes,

17:37.200 --> 17:42.240
how can we build on top of Containerization to have something that would fit underneath that

17:42.240 --> 17:47.520
Container Runtime Interface type of API, so that you could run a more typical

17:47.520 --> 17:54.160
Kubernetes stack on this. One of the most common questions is, hey, this is cool, can I use Compose?

17:54.160 --> 17:59.760
And we say not yet. And dev containers are kind of similar. For these two, the API that they kind of

17:59.760 --> 18:07.040
have is more like the Docker CLI. So it's challenging for us to decide how much should we make

18:07.040 --> 18:12.320
it a drop in replacement versus being opinionated for a very easy entry point. So this is still

18:12.320 --> 18:18.160
something we're looking at. There are different community PRs for adding Compose, which is super fun,

18:18.160 --> 18:23.920
but not all the way there yet. So we're still kind of working through that. Next step, final

18:23.920 --> 18:29.120
wrapping up is, if you're on a Mac, you can try it out. Again, we would really love feedback.

18:29.120 --> 18:34.160
For curious developers who have their own use case, check out the APIs in Containerization,

18:34.160 --> 18:39.280
build your own solution or extend it with plugins. If the intersection of virtualization and containers

18:39.280 --> 18:48.720
is interesting to you, we are growing the team as well. But thank you. I appreciate it. And we have stickers

18:48.720 --> 18:52.800
and some things here as well.

