Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015

Date post: 24-Jan-2018
Upload: sergii-khomenko
Transcript
Page 1: Helping Data Teams with Puppet / Puppet Camp London - Apr 13, 2015

STYLIGHT.COM

Helping Data Teams with Puppet

SERGII KHOMENKO, DATA SCIENTIST, SERGII.KHOMENKO@STYLIGHT.COM, @lc0d3r

Page 2

AGENDA

Who? What? Why?
Setting up your BI with puppet.
Small tips and tricks
Puppet your ranking

Page 3

Sergii Khomenko

Data scientist at one of the biggest fashion communities, STYLIGHT. Data analysis and visualization hobbyist. Speaker at Berlin Buzzwords 2014 and ApacheCon Europe 2014. Founder and speaker at Munich Golang UG and Munich Tableau UG. Speaker at Munich UseR Group, Munich Search UG, and Munich Quantified Self UG.

Milos Radovanovic

Passionate about DevOps stuff:
1. microservices
2. docker
3. 12 factor apps
4. continuous integration/deployment

Page 6

STYLIGHT – international community

Live in 12 countries

Page 7

STYLIGHT.COM

Setting up your BI with puppet.

Page 8

Minimum Viable BI

Tableau - reporting and ad-hocs
Python/Talend ETL tools

Page 9

Minimum Viable BI

RUNNING PUPPET IN A STANDALONE MODE

We use Puppet for *nix servers and can’t merge that setup with the Windows machine.

Standalone mode for Puppet:
– easier to start and develop
– Windows machines are separated from *nix ones

Page 10

Minimum Viable BI

RUNNING PUPPET IN A STANDALONE MODE

cd c:\folder\with\our-bi
git pull origin master
IF %ERRORLEVEL% NEQ 0 set context=GIT_FAILURE && goto error_handler
puppet apply --modulepath=puppet\modules puppet\win-node-name.net.pp
IF %ERRORLEVEL% NEQ 0 set context=PUPPET_FAILURE && goto error_handler
goto end

Page 11

Minimum Viable BI

RUNNING PUPPET IN A STANDALONE MODE

:error_handler
echo entering error_handler
EVENTCREATE /T ERROR /L APPLICATION /SO Puppet_Scheduler /ID 100 /D "EXECUTION FAILED REASON %context%"
goto end

:end
echo DONE
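The control flow of the batch wrapper (run each step in order, remember which step failed, report once) can be sketched in Ruby. This is only an illustrative sketch, not the authors' script: `STEPS`, `run_sync`, and the injectable `runner` are hypothetical names, and the real wrapper reports through EVENTCREATE to the Windows event log instead of returning a value.

```ruby
# Steps from the batch wrapper, paired with the context label it sets on failure.
STEPS = [
  ['GIT_FAILURE',    'git pull origin master'],
  ['PUPPET_FAILURE', 'puppet apply --modulepath=puppet\modules puppet\win-node-name.net.pp'],
].freeze

# Runs the steps in order and returns the context label of the first step
# that fails, or nil when every step succeeds. `runner` is injectable so the
# flow can be exercised without git or Puppet installed (an assumption of
# this sketch; the batch file calls the tools directly).
def run_sync(runner = ->(cmd) { system(cmd) })
  STEPS.each do |context, cmd|
    return context unless runner.call(cmd)
  end
  nil
end
```

A failing second step yields 'PUPPET_FAILURE', the same label the batch file places in %context% before it writes the event description.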

Page 12

Minimum Viable BI

RUNNING PUPPET IN A STANDALONE MODE

Standalone mode for Puppet:
– configuration is totally separated
– custom modules --modulepath=puppet\modules
– GitHub-hosted configuration
– error handling via Windows event log

Page 13

Minimum Viable BI

SCHEDULING IS IMPORTANT

node 'win-node-name.net' {
  scheduled_task { 'refresh-1':
    ensure    => present,
    enabled   => true,
    command   => 'C:\path\to\your\script.bat',
    arguments => 'some args',

Page 14

Minimum Viable BI

SCHEDULING IS IMPORTANT

    user      => 'your-user',
    password  => 'your-password',
    trigger   => {
      schedule   => daily,
      start_time => '06:00',
    }
  }

Page 15

Minimum Viable BI

SYNC MY CONFIGURATION EVERY 15 MIN

# Can't use Puppet's scheduled_task, as it does not support running a scheduled task every 5 minutes.
https://github.com/sdliangzhihua/windows-puppet-example/blob/master/manifest.pp#L68

Page 16

Minimum Viable BI

SYNC MY CONFIGURATION EVERY 15 MIN

$cmd = 'C:\Windows\system32\cmd.exe'
$job_name = 'sync_code'

# onlyif inverts the exit code of "schtasks /query", so the task is
# created only when it does not exist yet.
exec { 'CreateCodeSyncScheduledTask':
  command => "${cmd} /C schtasks /create /sc MINUTE /mo 15 /tn ${job_name} /tr C:\\your\\puppet.bat /ru administrator /f",
  onlyif  => ["${cmd} /C schtasks /query /tn ${job_name} & if errorlevel 1 (exit /b 0) else exit /b 1"],
}

Page 17

STYLIGHT.COM

Small tips and tricks
Do not repeat yourself and other tricks

Page 18

Minimum Viable BI

SCHEDULING IS IMPORTANT

node 'win-node-name.net' {
  scheduled_task { 'refresh-1':
    ensure    => present,
    enabled   => true,
    command   => 'C:\path\to\your\script.bat',
    arguments => 'some args',

Page 19

Small tips and tricks

class job_scheduler (
  $ensure      = $job_scheduler::params::ensure,
  $enabled     = $job_scheduler::params::enabled,
  $user        = $job_scheduler::params::user,
  $password    = $job_scheduler::params::password,
  $working_dir = $job_scheduler::params::working_dir,
) inherits job_scheduler::params {
}

Page 20

Small tips and tricks

define job_scheduler::job (
  $arguments     = 'tableau_adobe.py',
  $command       = 'c:\Py27-32\python.exe',
  $schedule_type = 'daily',
  $start_time    = '08:15',
  $day_of_week   = 'every',
) {

Page 21

Small tips and tricks

define job_scheduler::tableau_job (
  $arguments     = 'default-tableau',
  $command       = 'c:\folder\tableau.bat',
  $schedule_type = 'daily',
  $start_time    = '21:00',
  $day_of_week   = 'every',
) {

Page 22

Small tips and tricks

# Params with default values for the tableau job
# that might be changed in a job definition
#
# 1. $arguments     = 'default-argument',
# 2. $command       = 'c:\folder\script.bat',
# 3. $schedule_type = 'daily',
# 4. $start_time    = '21:00',
# 5. $day_of_week   = 'every',
####################

Page 23

Small tips and tricks

job_scheduler::tableau_job {
  'some job':
    start_time => '01:00',
    arguments  => 'args';
  'default refresh-1':
    start_time => '06:00';
  'default refresh-2':
    start_time => '10:00';
  'weekly update':
    start_time    => '03:35',
    arguments     => 'weekly-update',
    schedule_type => weekly,
    day_of_week   => ['mon'];
}
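The effect of the defined type's defaults can be mimicked outside Puppet. The Ruby sketch below is an illustration only: Puppet resolves parameter defaults itself, and `TABLEAU_DEFAULTS`, `JOBS`, and `resolve_jobs` are hypothetical names. The default values are those declared on job_scheduler::tableau_job, and the overrides are those of the job declarations.

```ruby
# Defaults as declared on job_scheduler::tableau_job in the slides.
TABLEAU_DEFAULTS = {
  arguments:     'default-tableau',
  command:       'c:\folder\tableau.bat',
  schedule_type: 'daily',
  start_time:    '21:00',
  day_of_week:   'every',
}.freeze

# Per-job overrides, mirroring the resource declarations.
JOBS = {
  'some job'          => { start_time: '01:00', arguments: 'args' },
  'default refresh-1' => { start_time: '06:00' },
  'default refresh-2' => { start_time: '10:00' },
  'weekly update'     => { start_time: '03:35', arguments: 'weekly-update',
                           schedule_type: 'weekly', day_of_week: ['mon'] },
}.freeze

# Each job states only what differs; everything else falls back to the defaults.
def resolve_jobs(defaults, jobs)
  jobs.transform_values { |overrides| defaults.merge(overrides) }
end
```

Calling `resolve_jobs(TABLEAU_DEFAULTS, JOBS)` gives 'default refresh-1' the default command and arguments with only the start time changed, which is the repetition the defined type removes from the manifest.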

Page 24

Small tips and tricks

job_scheduler::redshift_job {
  'RS tagged products':
    start_time => '00:40',
    params     => '..\datasources\something.tds';
  'RS another job':
    start_time => '00:50',
    params     => '..\datasources\else.tds'

Page 25

STYLIGHT.COM

Puppet your ranking
Lean, flexible, powerful

Page 26

A ranking is a relationship between a set of items such that, for any two items, the first is either 'ranked higher than', 'ranked lower than' or 'ranked equal to' the second.
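As a small illustration of this definition, the Ruby sketch below compares two items by a single numeric score. `Item`, `score`, and `compare_rank` are hypothetical names, and a real ranking model would combine several signals rather than one number.

```ruby
# A ranked item; `score` stands in for whatever the ranking model produces.
Item = Struct.new(:name, :score)

# For any two items with numeric scores, exactly one of the three
# relations from the definition holds.
def compare_rank(first, second)
  case first.score <=> second.score
  when 1  then :ranked_higher
  when -1 then :ranked_lower
  else         :ranked_equal
  end
end
```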

Page 27

Ranking specifics:

• Seasonal influence
• Trends
• Cold start of new countries, shops
• Multiple dimensions of ranking model

Page 28

Lean approach to Ranking

Multiple points of evaluation

Requirements:
• Decreasing time to implement new ranking model
• Keeping working infrastructure alive
• A/B testing without changing entire infrastructure
• Performance level - “still fast” and “transparent”

Page 29

Common search infrastructure

[Diagram labels: Jboss, Solr-loadbalancer, nginx + Solr (three nodes)]

Page 30

Updated infrastructure

[Diagram labels: Front-end loadbalancer; Jboss, Solr-loadbalancer, nginx + Solr (three nodes); Jboss, Solr-loadbalancer, nginx + Solr (one node)]

Page 31

Lean approach to Ranking

q = +brand:adidas shop:monshowroom^3

q = +adidas monshowroom
defType = dismax
qf = brand shop^3
sort = user_ratings desc, score desc

qq = adidas
q = {!boost b=$b defType=dismax v=$qq}
b = prod(popularity, clicks)
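The boosted form of the query can be assembled as ordinary request parameters. The following Ruby sketch is an illustration, not part of the talk: `boosted_params` and `boosted_query_string` are hypothetical helpers, and the field names popularity and clicks are the ones shown on the slide. It relies only on `URI.encode_www_form` from the standard library.

```ruby
require 'uri'

# Builds the parameters for a boosted dismax query: the user's terms go into
# `qq`, while `q` wraps them in the {!boost} parser, which multiplies the
# relevance score by the function given in `b`.
def boosted_params(terms, boost_function = 'prod(popularity, clicks)')
  {
    'qq' => terms,
    'q'  => '{!boost b=$b defType=dismax v=$qq}',
    'b'  => boost_function,
  }
end

# URL-encoded query string, ready to append to a Solr select URL.
def boosted_query_string(terms)
  URI.encode_www_form(boosted_params(terms))
end
```

Keeping the user's terms in `qq` means the ranking function in `b` can be swapped without touching the text the user searched for.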

Page 32

Lean approach to Ranking

solr0x.node.company.pp

include nginx
nginx::config { "solr_dev": }
nginx::solr-ranking { "delta2":
  urls => [
    "/some.thing?gender=women&brand=2271&tag=1161&tag=877&tag=468",
    "/some.thing?gender=men&brand=11235&tag=10203&tag=10299&tag=10326"
  ],

Page 33

Lean approach to Ranking

nginx/templates/conf/solr-rewrites.conf.erb

<% urls.each do |url| -%>
if ($args ~* <% if url['gender'] > 0 -%>gender_id%3A<%= url['gender'] %>.*<% end -%><% url['tags'].each do |tag| -%>tag_id%3A<%= tag %>.*<% end -%><% if url['brand'] > 0 -%>brand_id%3A%28<%= url['brand'] %>%29<% end -%>) {
  set $orig $args;
  set $args "q={!boost+b=%24b+defType=dismax+v=%24qq}&qq=id:*";
  rewrite ^(.*)$ "$1?$orig" break;
}
<% end -%>
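To see what the template produces, it can be rendered outside Puppet with Ruby's standard ERB library. This is a sketch under assumptions: the `urls` entries are taken to be hashes with numeric 'gender', 'tags', and 'brand' values (the template's comparisons suggest that shape), the gender id 1 is made up, and `render_rewrites` is a hypothetical helper. The brand and tag ids are those of the first URL in the node manifest.

```ruby
require 'erb'

# Copy of solr-rewrites.conf.erb as shown on the slide.
REWRITE_TEMPLATE = <<~'TEMPLATE'
  <% urls.each do |url| -%>
  if ($args ~* <% if url['gender'] > 0 -%>gender_id%3A<%= url['gender'] %>.*<% end -%><% url['tags'].each do |tag| -%>tag_id%3A<%= tag %>.*<% end -%><% if url['brand'] > 0 -%>brand_id%3A%28<%= url['brand'] %>%29<% end -%>) {
    set $orig $args;
    set $args "q={!boost+b=%24b+defType=dismax+v=%24qq}&qq=id:*";
    rewrite ^(.*)$ "$1?$orig" break;
  }
  <% end -%>
TEMPLATE

# Renders the nginx rewrite rules for the given URL descriptions.
# trim_mode '-' makes a `-%>` tag swallow the newline that follows it.
def render_rewrites(urls)
  ERB.new(REWRITE_TEMPLATE, trim_mode: '-').result_with_hash(urls: urls)
end

# Hypothetical parsed form of the first URL in the manifest (gender id assumed).
puts render_rewrites([{ 'gender' => 1, 'tags' => [1161, 877, 468], 'brand' => 2271 }])
```

Each entry becomes one nginx `if` block whose regular expression matches the URL-encoded gender, tag, and brand filters in order; matching requests have their arguments replaced by the boosted query, and the original arguments are appended by the rewrite.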

Page 34

Lean approach to Ranking

Multiple points of evaluation

Stages to evaluate a model:
• R ranking model
• Independent Solr-node
  1. For internal use-cases
  2. Testing on some of the pages
  3. A/B roll out for % of users
• Production roll out

Page 35

Thanks for your attention!

Page 36

STYLIGHT.COM

Sergii Khomenko
Data Scientist

STYLIGHT GmbH
sergii.khomenko@stylight.com
@lc0d3r

Nymphenburger Straße 86
80636 Munich, Germany

