Date posted: 14-Apr-2017
Category: Engineering
Uploaded by: shoichi-kaji
Me
• Shoichi Kaji
• Tokyo, Japan
• PAUSE/GitHub: skaji
• Perl5: cpm, App::FatPacker::Simple, Mojo::SlackRTM
• Perl6: mi6, Frinfon, evalbot in Slack:)
Agenda
• What is cpm, and why?
• cpanm vs. cpm
• The internals of cpm
• dividing the installation process into pieces
• learning from the Go language
• Roadmap
Why a new CPAN client?
• I always use cpanm to install CPAN modules. It’s awesome!
• But cpanm installs modules in series, so installing a module that has many dependencies takes quite a lot of time
• So I created cpm
• Actually, cpm is not a new CPAN client: it runs cpanm in parallel, so that it can install CPAN modules much faster
First, let’s think simply
$ cat modules | xargs cpanm
Can we just use xargs to parallelize cpanm?
NO, WE CAN’T.
The problem with $ cat modules | xargs cpanm
• The modules to be installed are not determined in advance: dependencies are discovered only while installing.
• Even if you have a list of modules to be installed, the cpanm workers will break one another unless they are synchronized
• So we have to
• (1) divide the installation process of a CPAN module into pieces that can be executed individually
• (2) synchronize the cpanm workers in some way
(1) Divide the installation process of CPAN modules
sub install_module {
    my $module = shift;

    # 1. resolve
    # query cpanmetadb
    my $dist_url = resolve($module);

    # 2. fetch (and extract)
    # wget && tar xzf && read META.json
    my ($dir, @configure_deps) = fetch($dist_url);
    # configure-time dependencies must be installed first (recursive call)
    install_module($_) for @configure_deps;

    # 3. configure
    # perl Makefile.PL/Build.PL && read MYMETA.json
    my @deps = configure($dir);
    # remaining dependencies are installed before this module (recursive call)
    install_module($_) for @deps;

    # 4. install
    # make install (or ./Build install)
    install($dir);
}
I divided the process into 4 jobs:
• resolve
• fetch
• configure
• install
Each job can be executed independently of the others.
Take a look at go language…
Go introduces two concurrency primitives:
• goroutines
• channels
They are very simple but powerful.
package main

import "fmt"

// work receives jobs from in and sends one result per job to out.
func work(in <-chan string, out chan<- string) {
    for {
        job := <-in
        // do work with job
        out <- "result of " + job
    }
}

func main() {
    in := make(chan string)
    out := make(chan string)
    go work(in, out) // run the worker as a goroutine
    in <- "job"
    result := <-out
    fmt.Println(result)
}
// work is the worker function from the previous slide
func main() {
    in1 := make(chan string)
    out1 := make(chan string)
    go work(in1, out1)

    in2 := make(chan string)
    out2 := make(chan string)
    go work(in2, out2)

    in1 <- "job1"
    in2 <- "job2"

    // wait for whichever worker finishes first
    select {
    case result1 := <-out1:
        // do something with result1
        fmt.Println(result1)
    case result2 := <-out2:
        // do something with result2
        fmt.Println(result2)
    }
}
It is very easy to add more workers
You can use a select statement to wait on multiple channels simultaneously
The internals of cpm
[Diagram: a Master process connected to three cpanm workers, each through 2 pipes; the Master waits on the workers with select]
cpanm worker
1. get a job via pipe
2. work, work, work!
3. send the result via pipe
Master
1. prepare pipes for workers by pipe(2)
2. launch workers by fork(2) and connect them with pipes
3. loop { calculate jobs and send jobs to idle workers. if all workers are busy, then wait for them and receive results by select(2) }
Roadmap
• Last year I talked with Tatsuhiko Miyagawa about cpanm 2.0 (Menlo)
• Then he said, “Why don’t you merge cpm into cpanm itself?”
• I was very happy to hear that!
• So if you all find cpm useful and stable, then cpm should be merged into cpanm 2.0
• Before merging, there are some problems that need to be resolved:
• The log file is very messy
• I would greatly appreciate your feedback!