Puppet Camp London 2015 - Helping Data Teams with Puppet

Date post: 26-Jul-2015
Upload: puppet-labs
Transcript

STYLIGHT.COM

Helping Data Teams with Puppet

SERGII KHOMENKO, DATA SCIENTIST, sergii.khomenko@stylight.com, @lc0d3r

AGENDA

Who? What? Why?
Setting up your BI with Puppet.
Small tips and tricks
Puppet your ranking

Sergii Khomenko

Data scientist at STYLIGHT, one of the biggest fashion communities. Data analysis and visualization hobbyist. Speaker at Berlin Buzzwords 2014 and ApacheCon Europe 2014. Founder and speaker at Munich Golang UG and Munich Tableau UG. Speaker at Munich UseR Group, Munich Search UG, and Munich Quantified Self UG.

Milos Radovanovic

Passionate about DevOps stuff:
1. microservices
2. Docker
3. 12-factor apps
4. continuous integration/deployment

STYLIGHT – international community
Live in 12 countries

Setting up your BI with Puppet.

Tableau - reporting and ad-hocs
Python/Talend ETL tools

Minimum Viable BI

RUNNING PUPPET IN A STANDALONE MODE

We use Puppet for *nix servers and can't merge the Windows machines into that setup.

Standalone mode for Puppet
– easier to start and develop
– Windows machines are separated from *nix ones

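REM Update the configuration checkout, then apply it locally
cd c:\folder\with\our-bi
git pull origin master
IF %ERRORLEVEL% NEQ 0 set "context=GIT_FAILURE" && goto error_handler
REM puppet is a batch file on Windows, so "call" is needed to return to this script
call puppet apply --modulepath=puppet\modules puppet\win-node-name.net.pp
IF %ERRORLEVEL% NEQ 0 set "context=PUPPET_FAILURE" && goto error_handler
goto end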

:error_handler
echo entering error_handler
EVENTCREATE /T ERROR /L APPLICATION /SO Puppet_Scheduler /ID 100 /D "EXECUTION FAILED REASON %context%"
goto end
:end
echo DONE

Standalone mode for Puppet
– configuration is totally separated
– custom modules: --modulepath=puppet\modules
– GitHub-hosted configuration
– error handling via Windows event log

SCHEDULING IS IMPORTANT

node 'win-node-name.net' {
  scheduled_task { 'refresh-1':
    ensure    => present,
    enabled   => true,
    command   => 'C:\path\to\your\script.bat',
    arguments => 'some args',
    user      => 'your-user',
    password  => 'your-password',
    trigger   => {
      schedule   => daily,
      start_time => '06:00',
    }
  }
}

# Can't use Puppet's scheduled_task because it does not support running a scheduled task every 5 minutes.
https://github.com/sdliangzhihua/windows-puppet-example/blob/master/manifest.pp#L68

SYNC MY CONFIGURATION EVERY 15 MIN

$cmd = 'C:\Windows\system32\cmd.exe'
$job_name = 'sync_code'
exec { 'CreateCodeSyncScheduledTask':
  command => "${cmd} /C schtasks /create /sc MINUTE /mo 15 /tn ${job_name} /tr C:\\your\\puppet.bat /ru administrator /f",
  onlyif  => ["${cmd} /C schtasks /query /tn ${job_name} & if errorlevel 1 (exit /b 0) else exit /b 1"],
}
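The exec above shells out to schtasks because of the scheduled_task limitation quoted earlier. As a hedged alternative, newer releases of the scheduled_task type document a minutes_interval key in the trigger hash for repeating a trigger every N minutes. The slides do not state which Puppet version was in use, so whether this key is available is an assumption; the task name and script path simply reuse the ones from the exec, and the user and password are left at the type's defaults.

```puppet
# Sketch only: assumes a scheduled_task version whose trigger supports minutes_interval
scheduled_task { 'sync_code':
  ensure  => present,
  enabled => true,
  command => 'C:\your\puppet.bat',
  trigger => {
    schedule         => daily,
    start_time       => '00:00',
    # Repeat the daily trigger every 15 minutes
    minutes_interval => 15,
  },
}
```

Compared with the exec, the resource is managed declaratively, so no onlyif guard is needed to avoid recreating the task on every run.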

Small tips and tricks
Do not repeat yourself and other tricks

Small tips and tricks

class job_scheduler (
  $ensure      = $job_scheduler::params::ensure,
  $enabled     = $job_scheduler::params::enabled,
  $user        = $job_scheduler::params::user,
  $password    = $job_scheduler::params::password,
  $working_dir = $job_scheduler::params::working_dir,
) inherits job_scheduler::params {
}
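The class above takes all of its defaults from job_scheduler::params, which the slides do not show. A minimal sketch of what such a params class might look like follows; every value is a placeholder chosen for illustration, not taken from the talk.

```puppet
# Hypothetical job_scheduler::params; all values are placeholders
class job_scheduler::params {
  $ensure      = 'present'
  $enabled     = true
  $user        = 'your-user'
  $password    = 'your-password'
  $working_dir = 'C:\folder\with\our-bi'
}
```

Keeping the shared account and working directory in one place means individual jobs only need to state what differs, such as the start time.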

define job_scheduler::job (
  $arguments     = 'tableau_adobe.py',
  $command       = 'c:\Py27-32\python.exe',
  $schedule_type = 'daily',
  $start_time    = '08:15',
  $day_of_week   = 'every',
) {

define job_scheduler::tableau_job (
  $arguments     = 'default-tableau',
  $command       = 'c:\folder\tableau.bat',
  $schedule_type = 'daily',
  $start_time    = '21:00',
  $day_of_week   = 'every',
) {

# Params with default values for the tableau job
# that might be changed in a job definition
#
# 1. $arguments     = 'default-argument',
# 2. $command       = 'c:\folder\script.bat',
# 3. $schedule_type = 'daily',
# 4. $start_time    = '21:00',
# 5. $day_of_week   = 'every',
####################

job_scheduler::tableau_job {
  'some job':
    start_time => '01:00',
    arguments  => 'args';
  'default refresh-1':
    start_time => '06:00';
  'default refresh-2':
    start_time => '10:00';
  'weekly update':
    start_time    => '03:35',
    arguments     => 'weekly-update',
    schedule_type => weekly,
    day_of_week   => ['mon'];
}
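The slides show the parameter list of job_scheduler::tableau_job and how it is declared, but not its body. One way such a defined type could wrap scheduled_task is sketched below. The body is an assumption throughout: in particular, treating 'every' as "no day_of_week restriction" and reading the shared settings from the job_scheduler class are guesses at the design, not the authors' code.

```puppet
# Sketch of a possible body; the real one is not shown in the slides
define job_scheduler::tableau_job (
  $arguments     = 'default-tableau',
  $command       = 'c:\folder\tableau.bat',
  $schedule_type = 'daily',
  $start_time    = '21:00',
  $day_of_week   = 'every',
) {
  # Shared settings (user, password, working_dir) come from the main class
  include job_scheduler

  # Assumption: 'every' is a sentinel meaning "do not restrict the weekday"
  if $day_of_week == 'every' {
    $trigger = {
      schedule   => $schedule_type,
      start_time => $start_time,
    }
  } else {
    $trigger = {
      schedule    => $schedule_type,
      start_time  => $start_time,
      day_of_week => $day_of_week,
    }
  }

  # The resource title ('some job', 'weekly update', ...) becomes the task name
  scheduled_task { $title:
    ensure      => $job_scheduler::ensure,
    enabled     => $job_scheduler::enabled,
    command     => $command,
    arguments   => $arguments,
    working_dir => $job_scheduler::working_dir,
    user        => $job_scheduler::user,
    password    => $job_scheduler::password,
    trigger     => $trigger,
  }
}
```

With a wrapper like this, each job declaration shrinks to a title plus the one or two parameters that differ from the defaults.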

job_scheduler::redshift_job {
  'RS tagged products':
    start_time => '00:40',
    params     => '..\datasources\something.tds';
  'RS another job':
    start_time => '00:50',
    params     => '..\datasources\else.tds'
}

Puppet your ranking
Lean, flexible, powerful

A ranking is a relationship between a set of items such that, for any two items, the first is either 'ranked higher than', 'ranked lower than' or 'ranked equal to' the second.

Ranking specifics:
•  Seasonal influence
•  Trends
•  Cold start of new countries, shops
•  Multiple dimensions of ranking model

Requirements:
•  Decreasing time to implement new ranking model
•  Keeping working infrastructure alive
•  A/B testing without changing entire infrastructure
•  Performance level - “still fast” and “transparent”

Lean approach to Ranking
Multiple points of evaluation

Common search infrastructure
[Diagram: JBoss, Solr load balancer, three nginx + Solr nodes]

Updated infrastructure
[Diagram: front-end load balancer; JBoss, Solr load balancer, three nginx + Solr nodes; a second JBoss and Solr load balancer with one nginx + Solr node]

q = +brand:adidas shop:monshowroom^3

q = +adidas monshowroom
defType = dismax
qf = brand shop^3
sort = user_ratings desc, score desc

qq = adidas
q = {!boost b=$b defType=dismax v=$qq}
b = prod(popularity, clicks)

Lean approach to Ranking
solr0x.node.company.pp

include nginx
nginx::config { "solr_dev": }
nginx::solr-ranking { "delta2":
  urls => [
    "/some.thing?gender=women&brand=2271&tag=1161&tag=877&tag=468",
    "/some.thing?gender=men&brand=11235&tag=10203&tag=10299&tag=10326"
  ],

<% urls.each do |url| -%>
if ($args ~* <% if url['gender'] > 0 -%>gender_id%3A<%= url['gender'] %>.*<% end -%><% url['tags'].each do |tag| -%>tag_id%3A<%= tag %>.*<% end -%><% if url['brand'] > 0 -%>brand_id%3A%28<%= url['brand'] %>%29<% end -%>) {
  set $orig $args;
  set $args "q={!boost+b=%24b+defType=dismax+v=%24qq}&qq=id:*";
  rewrite ^(.*)$ "$1?$orig" break;
}
<% end -%>

nginx/templates/conf/solr-rewrites.conf.erb

Stages to evaluate a model:
•  R ranking model
•  Independent Solr node
   1.  For internal use cases
   2.  Testing on some of the pages
   3.  A/B rollout for a percentage of users
•  Production rollout

Thanks for your attention!

Sergii Khomenko Data Scientist

STYLIGHT GmbH

@lc0d3r

Nymphenburger Straße 86 80636 Munich, Germany

