WEBVTT

00:00.000 --> 00:13.440
So this is our last AVG Devroom talk of the day. It's another podcasting one, about using AI to do

00:13.440 --> 00:18.600
that, I'm guessing. Well, we'll find out how you can use AI. Thank you very much.

00:18.600 --> 00:23.640
Hello everyone, today I'm going to talk to you about a new project which is called Pod

00:23.640 --> 00:36.640
Libre and it's a podcast audio editing software for the AI age.

00:36.640 --> 00:43.280
This is the mascot for Pod Libre and we have some stickers so feel free to come and get

00:43.280 --> 00:45.280
some after the presentation.

00:45.280 --> 00:52.440
So let me introduce myself. I'm Benjamin, I love podcasts, I love free and open source software,

00:52.440 --> 01:01.560
I work at Linagora, developing truly open source AI solutions, and I put the emphasis on truly open

01:01.560 --> 01:02.560
source.

01:02.560 --> 01:07.440
I'm also the founder of Ad Aures, which is a company that develops open source solutions

01:07.440 --> 01:15.040
for podcasters. One of them is Castopod, which is an open source podcast hosting platform,

01:15.040 --> 01:23.200
and I'm also a photographer for Wikipedia, taking nice pictures for Wikipedia.

01:23.200 --> 01:33.040
So first, let's talk about you: who is a developer in the room? Okay, that's over 100%. And

01:33.040 --> 01:44.000
who is a podcaster? Okay, like 25. Who wants to be a podcaster? Okay. And who is listening

01:44.000 --> 01:49.760
to podcasts? Oh, like 100%, okay.

01:49.760 --> 01:55.760
So now I'm talking to the people who are already podcasting: what software do you use, what

01:55.760 --> 02:01.280
is your workflow, like the main editing software.

02:01.980 --> 02:05.580
Oh, Reaper, Audacity.

02:05.580 --> 02:11.160
Logic okay, that's it.

02:11.160 --> 02:24.000
No one is using Ardour? Okay, like the rhythm. Okay, so let's start with a definition:

02:24.000 --> 02:32.200
what is PodLibre, or pod libre? It comes from podcast and from libre, which

02:32.200 --> 02:40.920
is free as in free speech. So it's either French or Spanish. I'm taking just two

02:40.920 --> 02:47.560
examples that I wish we'll see in the dictionary in a few years: "I switched to

02:47.560 --> 02:52.320
pod libre because I wanted full control over my podcast workflow without sending my

02:52.320 --> 02:57.480
audio to the cloud", or "the pod libre approach means owning your entire

02:57.480 --> 03:05.120
production chain from recording to RSS feed". Because of course we all agree

03:05.120 --> 03:12.080
here that podcasting is linked to RSS. If there is no RSS feed then it's not a

03:12.080 --> 03:20.280
podcast. If it's on YouTube it's probably not a podcast. So I'm going to tell

03:20.280 --> 03:28.920
you my story. Bonnie just before me said that everything about recording a

03:28.920 --> 03:33.440
podcast is like a religion. And I totally agree with her. In fact I

03:33.440 --> 03:42.600
couldn't agree more. So welcome to my church. Back six years ago we wanted a

03:42.600 --> 03:47.920
solution to host our podcast and to host the podcast for our customers, our

03:47.920 --> 03:53.080
clients. And we couldn't find any, and moreover we wanted one which was

03:53.080 --> 03:59.200
connected to the Fediverse, and of that we could find even less. So we started

03:59.200 --> 04:05.520
Castopod as a side project. We're deeply grateful for the community that

04:05.520 --> 04:11.960
we've built like over 1.7 million hours of content. So people are trusting us

04:11.960 --> 04:17.360
with their content. Most of them are self hosting which is what we wanted in

04:17.360 --> 04:23.080
the first place. So that's really great. And it's translated in over 30 languages.

04:23.080 --> 04:30.480
For some of them, we are like the only podcast platform having these languages, like

04:30.480 --> 04:39.680
Breton. In France there's no other platform using that language, and this is why

04:39.680 --> 04:45.240
we love open source because if you want something you can have it. So we're

04:45.240 --> 04:57.120
podcasters too. So this is my church. Over the years I started using Logic,

04:57.120 --> 05:03.200
then Ardour because it's open source and because it's non-destructive, then I

05:03.200 --> 05:09.080
switched to Audacity because it's easier, faster, and it's almost non-destructive

05:09.080 --> 05:13.800
now, not 100% but almost, and with the new version you can use it with

05:13.840 --> 05:22.200
PipeWire. So yeah, it makes sense. But we are big advocates for transcriptions,

05:22.200 --> 05:31.920
for obvious accessibility and discoverability issues. So we need to make transcriptions

05:31.920 --> 05:37.720
and I developed over the years some Python scripts to do that, many scripts.

05:37.720 --> 05:46.560
So publishing an episode has become very cumbersome and when talking to other

05:46.560 --> 05:51.800
podcasters we realized that we are not the only ones with these issues. So this

05:51.800 --> 05:59.320
is why we decided to start PodLibre. One other thing is that the software we are

05:59.400 --> 06:08.120
using, so Reaper, Ardour, Audacity, Logic: they are made for musicians, and

06:08.120 --> 06:12.640
there are many, many things that we don't need and many things that we need that

06:12.640 --> 06:22.200
aren't there. So the workflow is not very appropriate for podcasting. So this is an

06:22.280 --> 06:32.760
example. So this is my podcast the latest episode yes. So as I said and as Bonnie said

06:32.760 --> 06:38.680
this is my religion maybe yours is totally different and this is why I'm talking to you

06:38.680 --> 06:44.520
now because at some point we need some feedback because there is no way that we have

06:44.520 --> 06:50.440
all the good answers just with a bunch of us. So we really need your feedback if there is

06:50.520 --> 06:57.640
one thing that should come out of this talk, it is feedback from you. So here you can see there are

06:57.640 --> 07:09.160
some speakers so one microphone per speaker you can see that when someone is not talking it's totally

07:09.160 --> 07:19.400
blank, it's muted. Here there are chapters, you can see: yeah, next weekend it's FOSDEM; here,

07:20.440 --> 07:30.200
Trump is the best advocate for open source software; here is the music. And doing that,

07:30.200 --> 07:36.920
that's pretty simple, it's very easy: there are like four tracks plus the chapters.

07:39.400 --> 07:46.200
That should take 10 minutes, and I really do think that doing this editing should take

07:47.080 --> 07:58.360
no more than 10 minutes but the truth is it takes hours because you need to normalize the level

07:58.360 --> 08:06.760
and to compress the audio and maybe here this was our intern he's 14 so he spoke too far away

08:06.760 --> 08:16.040
from the microphone, so the signal-to-noise ratio was not good enough, so we had to bring up the level,

08:16.200 --> 08:23.080
so there was too much noise so we had to remove noise but all these should be almost automatic

08:23.080 --> 08:31.080
and the most complicated part you have to do should be checking a checkbox, because

08:31.080 --> 08:40.840
these are the things that we do every week. This is the transcription editing. We do

08:40.920 --> 08:47.240
automatic transcription, but you have to correct it, and there is no software that allows you to

08:47.240 --> 08:57.720
correct that in a way that makes you feel you're in 2026. And moreover, the transcription is done afterwards,

08:58.840 --> 09:03.400
but I need it before editing, because the transcription could be very useful for editing.

09:04.280 --> 09:11.960
so this is a benchmark that we did over a year ago with many many tools open source and non-open source

09:13.320 --> 09:23.720
I'll publish the full presentation afterwards. And none of them answers our needs.

09:24.600 --> 09:32.600
Our needs: so we need transcript-driven editing, because we want to be able to use the

09:32.600 --> 09:39.080
transcription to edit and to know which parts should be taken out; noise reduction; one metadata

09:39.080 --> 09:46.200
place for everything because there are many metadata the title the show notes the transcription

09:46.200 --> 09:52.040
the chapters, and right now you have to put them, like, maybe there are some ID3 tags within the

09:52.120 --> 10:01.160
MP3 file, and in the end it takes much time for no added value; chapter management, etc.
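
NOTE
The transcript-driven editing need described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not PodLibre's actual code: the Word class and the keep_ranges function are hypothetical names, and the sketch assumes the transcription engine returns a start and end time in seconds for each word. Deleting words in the transcript then maps to the audio ranges that should be kept.
```python
from dataclasses import dataclass
#
@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the track
    end: float    # seconds from the beginning of the track
#
def keep_ranges(words, removed, total):
    """Return the (start, end) audio ranges left after deleting words by index."""
    ranges = []
    cursor = 0.0  # end of the last deleted word seen so far
    for index, word in enumerate(words):
        if index in removed:
            if word.start > cursor:
                ranges.append((cursor, word.start))
            cursor = max(cursor, word.end)
    if cursor < total:
        ranges.append((cursor, total))
    return ranges
#
words = [Word("so", 0.0, 0.4), Word("um", 0.5, 0.9), Word("hello", 1.0, 1.5)]
print(keep_ranges(words, {1}, 2.0))  # prints [(0.0, 0.5), (0.9, 2.0)]
```
An editor could feed these ranges to whatever audio backend it uses to cut the track; that step is left out here on purpose.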

10:03.160 --> 10:12.280
so we asked you is there anyone here who answered this poll in the room no you don't remember so we asked

10:12.280 --> 10:18.920
that on the Fediverse, on Mastodon, and we asked, would it make sense (it was over a year ago),

10:19.000 --> 10:29.480
would it make sense to develop a tool for podcast editing, and yeah, we got like 45 plus 36

10:29.480 --> 10:40.440
percent who said yeah. So PodLibre is a podcast editor built by podcasters for podcasters, and open source.

10:40.600 --> 10:49.400
so the key principles we want something that's cross platform because even though I'm using

10:49.400 --> 10:58.120
Linux, and I think everyone should, one step at a time, it should also be able to run on macOS

10:58.120 --> 11:05.320
and Windows; privacy focused, because we don't want to send our data to any big tech; we need a

11:05.320 --> 11:13.160
plugin based architecture because I know that my workflow is not yours and even my workflow

11:13.160 --> 11:17.880
can evolve and change depending on which podcast if I'm hosting two podcasts they probably

11:17.880 --> 11:25.560
don't want to have the same workflow of course it's open source and to be able to run offline

11:25.560 --> 11:33.560
I don't want an application that uses APIs for everything. Maybe there's one plugin that will

11:33.560 --> 11:42.680
have an API and run on the cloud but I want it to be able to run offline so this is the like the

11:42.680 --> 11:51.080
basic workflow that we designed. So you create a podcast, then an episode, you import your files,

11:51.080 --> 11:59.240
maybe later, but that's not our priority, you record from the application, and then you would,

11:59.400 --> 12:06.760
like, normalize audio, reduce noise, compress, then transcribe, and you'd be able to correct the

12:06.760 --> 12:15.080
transcription and go back, and then insert the music. Something that's really missing in all

12:17.400 --> 12:25.720
audio editing software is the ability to have music whose length can adapt to the length of my

12:25.720 --> 12:33.000
introduction, so like with a beginning, a loop and an ending, and I would just want to be able to

12:33.000 --> 12:39.000
drag and drop and say, use the loop as many times as possible so that the intro music lasts

12:39.000 --> 12:46.600
as long as my introduction, my spoken introduction. Then you edit the chapters, the metadata, and

12:47.400 --> 12:58.200
you export you publish to whichever hosting system you want to so this is something that
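
NOTE
The adaptive intro music described above comes down to a small piece of arithmetic, sketched here. This is a hypothetical helper, not PodLibre code: loop_count and its parameters are assumptions, with the jingle split into a beginning, a loopable middle and an ending, and all durations given in seconds. It returns the fewest loop repeats for which beginning + n * loop + ending is at least as long as the spoken introduction.
```python
import math
#
def loop_count(speech_s, intro_s, loop_s, outro_s):
    """Fewest loop repeats so that intro + n * loop + outro covers the speech."""
    if loop_s <= 0:
        raise ValueError("loop_s must be positive")
    missing = speech_s - intro_s - outro_s  # time the loop alone has to fill
    return max(0, math.ceil(missing / loop_s))
#
# 42 s of spoken introduction, 5 s beginning, 8 s loop, 4 s ending
print(loop_count(42.0, 5.0, 8.0, 4.0))  # prints 5, since 5 + 5 * 8 + 4 = 49 s
```
A real editor would probably also fade or trim the last repeat so the music ends exactly with the speech; that refinement is not shown.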

12:59.480 --> 13:06.600
it would look like. This is a mockup, it doesn't exist yet. We have some stuff running, but I won't

13:06.600 --> 13:15.240
be showing it to you today because we're a little bit behind schedule. And this is what we

13:15.320 --> 13:22.760
started working on so we are using Python because I said before I developed many scripts with Python

13:22.760 --> 13:29.320
because we need transcription, we need many things that are working there; we're using PyQt

13:30.280 --> 13:39.080
so that it can run on any platform; Whisper, [unclear], noise reduction, pydub and a plugin architecture.

13:40.040 --> 13:47.160
So with a little more detail, this is what it looks like. I published that so that you can have a

13:47.160 --> 13:55.400
look at it and give us feedback of course because as I said and I'm repeating myself we need your feedback

13:57.480 --> 14:06.040
So we built the very first brick. We have a work in progress, we have plugins, we have transcription,

14:06.920 --> 14:15.960
we started editing the audio so we have the core architecture the plugin system is functional

14:15.960 --> 14:21.880
we also got a grant from NLnet to make sure that this is sustainable and that we will

14:21.880 --> 14:33.480
go to the end. My company, Ad Aures, of course is funding it. But what this means is things will break,

14:34.440 --> 14:43.960
the UI will change, your feedback shapes everything (repeating again), and now it's probably the

14:43.960 --> 14:50.440
best time to get involved, because the more the merrier, but also the better ideas we can get.

14:53.560 --> 14:58.520
we know that it won't be that easy there are many many things to do

14:59.400 --> 15:08.200
the platform compatibility GPU acceleration we don't even know if we want to do something like that on

15:08.200 --> 15:18.680
macOS with the new silicon systems managing compatibility with Python because plugins is the key

15:18.680 --> 15:28.120
and you know that it's very easy to make a mess with plugins and with stability because I'm using

15:29.160 --> 15:39.560
Audacity, I'm using like the latest version, 3.7, and it's not stable, and I say a lot of bad

15:39.560 --> 15:47.800
words when I'm editing right now I haven't tried the version 4 which is not finished yet you can

15:47.880 --> 15:55.800
do chapter is it's missing a lot of features but we need something that's stable and we want to

15:55.800 --> 16:00.840
spend like 10 minutes maybe half an hour on it not hours because it keeps on crashing

16:04.440 --> 16:12.840
so if you want to join us if you want to give us your feedback it can be technical it can be

16:12.840 --> 16:19.640
about features and environments, things that we haven't thought of. There's a website, which

16:19.640 --> 16:28.200
is podlibre.org; there's a community website, which is community.podlibre.org; and the source code is on

16:28.200 --> 16:34.760
Codeberg, because we don't like GitHub, because we don't like Microsoft.

16:38.600 --> 16:41.480
Thank you.

16:51.800 --> 16:57.800
Thank you very much. Questions, if we can? Right, there's one right up at the back. We have to use

16:57.800 --> 17:14.360
the microphone so that people online can hear. So thank you for everything. I'm an enthusiastic

17:14.360 --> 17:23.240
Castopod user, self-hosting, and my question is exactly whether in your roadmap there is any

17:23.240 --> 17:31.480
seamless integration to start from the recording to the publishing on one very best platform which

17:31.480 --> 17:43.800
may be Castopod. Yeah, of course, in the very first version we plan to integrate

17:43.800 --> 17:50.600
with Castopod so that you can publish directly from PodLibre to Castopod without even opening

17:50.600 --> 18:01.480
Firefox we also plan to have a local publishing because I think that a podcast in the end it's just

18:01.480 --> 18:10.680
MP3 files and an RSS text file, so we should be able to publish to a static website. So this is one

18:10.680 --> 18:16.760
other option, if you have a static website. Also, we want the app to be able to

18:16.760 --> 18:22.280
upload it to an FTP or SFTP server, so that in one click it's online and you don't even need

18:22.280 --> 18:31.000
PHP on your server one thing that we also integrate and there's a talk later this afternoon

18:31.000 --> 18:41.480
it's called Faircamp. It's also a project funded by NLnet, and it's software that

18:41.480 --> 18:51.960
publishes static websites for audio. And the thing is, we're doing a plugin architecture so
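
NOTE
The point made above, that a podcast is in the end just MP3 files and an RSS text file, can be illustrated with a minimal feed generator. This is a sketch only, not PodLibre's export code: build_feed is a hypothetical function and the example.org URLs are placeholders. The element names (channel, title, link, description, item, enclosure) come from RSS 2.0, and the result could be uploaded next to the MP3 files on any static host.
```python
import xml.etree.ElementTree as ET
#
def build_feed(title, link, description, episodes):
    """Return a minimal RSS 2.0 document as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for episode in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = episode["title"]
        # The enclosure element is what points podcast apps at the audio file
        ET.SubElement(item, "enclosure", url=episode["url"],
                      length=str(episode["bytes"]), type="audio/mpeg")
    return ET.tostring(rss, encoding="unicode")
#
episodes = [{"title": "Episode 1", "url": "https://example.org/ep1.mp3", "bytes": 12345}]
feed = build_feed("Example show", "https://example.org/", "A placeholder show", episodes)
print(feed)
```
A complete feed would normally carry more metadata per episode (publication date, GUID, duration); those fields are omitted to keep the sketch short.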

18:51.960 --> 18:58.280
that when you create a workflow you select the plugins that you want, and in the end, if you want

18:58.280 --> 19:03.640
to publish to Castopod, you select the Castopod plugin and you say, this is where I want to publish,

19:04.200 --> 19:10.280
and if you do another podcast on another platform you duplicate this workflow and you just

19:10.840 --> 19:17.880
change the publishing plugin at the end to publish it to, I don't know, Spotify for podcasts

19:19.080 --> 19:37.320
or YouTube podcast that's your decision and I have questions about the workflow you showed and about
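
NOTE
The workflow idea described above (select plugins, then duplicate the workflow and swap only the publishing plugin) could look roughly like this. Everything here is hypothetical and is not PodLibre's plugin API: the REGISTRY dict, the plugin decorator and the step names are invented for illustration, and each plugin is a stand-in that only records that it ran instead of processing audio.
```python
REGISTRY = {}
#
def plugin(name):
    """Register a processing step under a name (hypothetical API)."""
    def register(func):
        REGISTRY[name] = func
        return func
    return register
#
@plugin("normalize")
def normalize(episode):
    return {**episode, "steps": episode["steps"] + ["normalize"]}
#
@plugin("publish_castopod")
def publish_castopod(episode):
    return {**episode, "steps": episode["steps"] + ["publish_castopod"], "target": "castopod"}
#
@plugin("publish_static")
def publish_static(episode):
    return {**episode, "steps": episode["steps"] + ["publish_static"], "target": "static"}
#
def run(workflow, episode):
    """Apply each named plugin in order and return the final episode dict."""
    for name in workflow:
        episode = REGISTRY[name](episode)
    return episode
#
show_a = ["normalize", "publish_castopod"]
show_b = show_a[:-1] + ["publish_static"]  # duplicate the workflow, swap the last step
print(run(show_b, {"steps": []}))
```
Keeping a workflow as a plain list of plugin names is one simple way to make it easy to duplicate and edit per podcast; a real implementation would also need per-plugin settings.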

19:37.320 --> 19:48.200
the software. So I'm working in a nonprofit radio in Brittany, in France, and we are

19:48.200 --> 19:56.760
looking for software that is easier than Reaper, so I'm very interested in

19:56.760 --> 20:04.280
little although I'm also a sound engineer and I've seen a I have questions about some of the

20:04.280 --> 20:12.600
technical aspects, and especially your workflow: you mentioned compressing audio before editing

20:12.600 --> 20:21.400
and something like that. I just want to be sure whether the software will be as flexible

20:21.400 --> 20:32.120
as Reaper, because Reaper can do not only music but a whole lot of stuff; I've done mixing

20:33.000 --> 20:41.640
for short films and other stuff so maybe we can also meet later

20:43.640 --> 20:48.680
after. I'm not a big fan of Reaper because it's not open source, and everyone is using it because

20:48.680 --> 20:58.440
it's free if you choose not to pay for it so I don't like this business model either you pay for it

20:58.440 --> 21:03.720
or you don't and many podcasters in France are using it because you can cheat and get it for free

21:03.720 --> 21:14.520
without but and it's not open source and it runs on Linux but with some tricks the thing is

21:14.520 --> 21:22.840
to answer your question the workflow that I showed is an example it's my church and if you don't

21:23.720 --> 21:33.560
a few months ago I didn't even compress sound, because I like the sound right out of the microphone.

21:33.560 --> 21:39.240
I take very good care of the environment and the microphone so that I don't have to change anything,

21:39.240 --> 21:45.640
and some listeners said, you have to compress because in the car we can't hear, and so I changed my mind

21:45.640 --> 21:53.240
and I changed my workflow. There is not one workflow, even for a single person, so of course you have to

21:53.240 --> 22:02.120
be able to adapt so very nice idea very nice initiative I think so thank you for that I have

22:02.120 --> 22:08.120
two questions but I'm not sure if I'm allowed the first one is about the PyTorch dependency that you mentioned

22:08.120 --> 22:14.120
that typically that's used for training not for inference and one because I know from experience

22:14.120 --> 22:19.960
dealing with packaging Python stuff cross-platform that it becomes very hard to manage. Is that something

22:19.960 --> 22:28.920
that you are going to maybe require at runtime or only for like building or so yeah this is a

22:28.920 --> 22:36.600
copy-paste from everything that we've been working on the past few years with my colleagues and I.

22:37.400 --> 22:46.840
There's one thing that we don't know for sure: maybe at some point we will build several

22:46.840 --> 22:53.400
versions so that you can choose maybe you don't care about transcription and you don't want this

22:53.400 --> 23:00.200
at all and you want something like a minimal install very small so that you can run on the very

23:00.280 --> 23:06.680
old laptop, and that's probably what we're going to do if we want to keep it simple: provide

23:06.680 --> 23:16.040
several versions and we'll build many of them the thing is it has to we don't want too many of them

23:16.040 --> 23:22.040
because otherwise you won't be able to choose which one you want if I'm allowed the second question

23:22.040 --> 23:28.680
That's good. The other question would be regarding, because I saw it there, DeepFilterNet, which I had

23:28.680 --> 23:34.120
some experiments done with and there is a way to package everything it's something that I'm working

23:34.120 --> 23:38.120
together with a friend of mine to package everything into an audio plugin so you don't have to

23:38.840 --> 23:43.800
set up any training, don't have to set up dependencies; it just ends up as a single plugin binary,

23:43.800 --> 23:49.960
basically. Would you consider maybe taking out all the PyTorch, all the other dependencies, if you

23:49.960 --> 23:57.000
can just install one single plugin that does for example the like vocal isolation if it does it for you

23:57.720 --> 24:03.240
do you still need all these other AI tools if you have plugins that can do the real time processing

24:03.240 --> 24:11.960
of the audio? We want to put as much as possible into the plugins. I'm currently using Whisper,

24:12.840 --> 24:23.400
but Whisper is not the only transcription engine in town; there is also Kyutai, and even

24:23.400 --> 24:31.800
Meta released, like a month or two ago, models for 1,000 languages with translation, so

24:35.720 --> 24:48.600
we want to put as little as possible in the core app, of course. Unfortunately we've run out of time

24:48.680 --> 24:53.880
and we keep very strict time at FOSDEM, so thank you very much. Great talk, great questions,

24:53.880 --> 24:59.560
and great AVT creation everyone as our first

