
Hg for bioinformatics, second part

Date post: 01-Dec-2014
Upload: giovanni-dallolio
Description:
The second part of a talk about hg and version control I gave to my colleagues in a group of bioinformaticians. First part here: http://www.slideshare.net/giovanni/hg-version-control-bioinformaticians
Page 1: Hg for bioinformatics, second part

Hg and version control for bioinformatics 2

Page 2: Hg for bioinformatics, second part

What you will learn from this talk

● Graphical interfaces to hg repos

● Working with a remote copy of the repo on bitbucket

● Working together with other people

Page 3: Hg for bioinformatics, second part

Graphical interfaces to hg

Page 4: Hg for bioinformatics, second part

Graphical interfaces to hg

● In the last talk we saw hg as a command line tool

● However, there are many graphical interfaces to it
    ● Learning all the hg commands may be silly
    ● Complex repositories may be difficult to navigate without the help of a graphical interface

Page 5: Hg for bioinformatics, second part

TortoiseHG

● TortoiseHG is a multi-platform graphical interface that integrates with your file manager

● Once installed, it:
    ● adds a few entries to the right-click menu of a file or folder
    ● installs a tool called the repository explorer

Page 6: Hg for bioinformatics, second part

TortoiseHG on your desktop

● This directory contains a hg repository

● Green and red symbols mark files tracked by hg

Page 7: Hg for bioinformatics, second part

TortoiseHG right-click menu

● Right-click on the folder and look at the new entries in the menu

Page 8: Hg for bioinformatics, second part

Right-click on a file

● Right-clicking on a file gives you more options:
    ● Commit changes if the file differs from the last saved version
    ● Check the history of the file
    ● Revert it to a previous version
    ● Etc.

Page 9: Hg for bioinformatics, second part

The TortoiseHG repository explorer

● The TortoiseHG repository explorer is a graphical tool to manage a hg repository:
    ● Browse the history
    ● Commit changes
    ● Manage branches
    ● Upload to a remote server

Page 10: Hg for bioinformatics, second part

The repository explorer

1. History of changes

2. Files changed in the selected commit

3. Changes made to selected files in selected commit

Page 11: Hg for bioinformatics, second part

Making a commit from the Repository explorer

Tools menu → Commit

Page 12: Hg for bioinformatics, second part

Setting up a repository on bitbucket

Page 13: Hg for bioinformatics, second part

Having a copy of your repository on a remote location

● In the real world, people usually keep a copy of their repository on a remote server

● Advantages:
    ● Backups
    ● You can access the code from anywhere

● The smartest thing is to use a free code hosting service (github, bitbucket, etc.)

Page 14: Hg for bioinformatics, second part

Code hosting services

● There are many ~free code hosting services:
    ● Bitbucket (hg)
    ● Github, Gitorious (git)
    ● Launchpad (bzr)
    ● Sourceforge (svn, various)

● Bitbucket has fairly good conditions for our case:
    ● Unlimited private and public repositories
    ● Unlimited disk space
    ● The only limit is 5 collaborators max per account

Page 15: Hg for bioinformatics, second part

Register a free account on bitbucket

http://bitbucket.org

Page 16: Hg for bioinformatics, second part

Recommended: set up a ssh key

● After registering on bitbucket, the first thing you should do is set up a ssh key

● Go to 'Account' → Add SSH Keys

● Safer transfers over the Internet

● You don't have to type your password every time
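A minimal sketch of generating a key pair from the command line, assuming OpenSSH's `ssh-keygen` is installed. The key is written to a temporary directory so that nothing in `~/.ssh` is overwritten; the file name `id_rsa_bitbucket`, the comment `you@example.org`, and the empty passphrase are placeholders chosen for illustration.

```shell
# Demo location; in real use the key would normally live under ~/.ssh
KEY_DIR="$(mktemp -d)"
KEY_FILE="$KEY_DIR/id_rsa_bitbucket"   # hypothetical file name

# Generate a 4096-bit RSA key pair.
#   -N ""  empty passphrase (demo only; a real passphrase is safer)
#   -C     a comment that helps you recognize the key later
#   -q     do not print the progress output
ssh-keygen -q -t rsa -b 4096 -N "" -C "you@example.org" -f "$KEY_FILE"

# The public half (.pub) is the text you paste into 'Account' → Add SSH Keys.
# The private half ($KEY_FILE) never leaves your computer.
cat "$KEY_FILE.pub"
```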

Page 17: Hg for bioinformatics, second part

Creating a Repo on bitbucket

● Just click on 'Repositories' → create new repo

Page 18: Hg for bioinformatics, second part

Creating a Repo on bitbucket

● Just keep following the instructions

● ssh key is recommended

Page 19: Hg for bioinformatics, second part

Cloning a repo

● After you create a repository on bitbucket, it gives you a URL that you can use to download the repo to your computer.

● Example:
    https://bitbucket.org/dalloliogm/secret-repo
    ssh://bitbucket.org/dalloliogm/secret-repo

● Just use the hg clone command:
    hg clone ssh://bitbucket.org/dalloliogm/secret-repo

● You can also clone a repository created by someone else
    ● (or clone your own repository on another computer/directory)

Page 20: Hg for bioinformatics, second part

Synchronizing an existing repo with bitbucket

● What happens if you created your repository locally before creating it on bitbucket?

● No problem: follow the instructions and you can synchronize them

Page 21: Hg for bioinformatics, second part

Setting up remote repo (tortoise)

● Go to Tools → Settings → Synchronize

Page 22: Hg for bioinformatics, second part

Setting up remote repo (manually)

● Open the .hg/hgrc file inside the repo main directory

● Add the following:
    [paths]
    default = ssh://bitbucket.org/dalloliogm/secret-repo

Page 23: Hg for bioinformatics, second part

Working with remote repos

Page 24: Hg for bioinformatics, second part

Now, let's get serious!

● You have successfully set up a remote copy of your code on bitbucket

● Let's see how it works

Page 25: Hg for bioinformatics, second part

Hg – working with remote repos

● hg clone → get a copy of an existing repo (only once)

● hg pull → download the new changesets from the remote repository into your local history

● hg update → apply the latest pulled version to the current working directory

● hg merge → merge conflicting versions

● hg push → push the local changes to the remote repository

Page 26: Hg for bioinformatics, second part

Hg clone

● This command creates a copy of a repository on your computer
    ● For example, a copy of a repository on bitbucket

● Run it only once per repository

Page 27: Hg for bioinformatics, second part

Hg pull & update

● hg pull fetches the changes made to the remote repository since the last time you cloned or pulled it
    ● This is how you find out whether one of your colleagues has uploaded a new version to the remote repo

● These changes are not applied automatically to the current working directory
    ● You have to run hg update after hg pull to update your local files
    ● hg pull -u → pulls & updates
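The difference between pull and update can be seen in a small offline experiment. This is only a sketch: a local directory stands in for bitbucket, and the user names and the file `align.py` are invented for the demo.

```shell
WORK="$(mktemp -d)"
hg init "$WORK/remote"                       # local stand-in for bitbucket
hg clone -q "$WORK/remote" "$WORK/alice"     # a colleague's clone
hg clone -q "$WORK/remote" "$WORK/bob"       # your clone

# The colleague commits a new file and pushes it to the shared repository
cd "$WORK/alice"
echo 'print("hello")' > align.py             # hypothetical script
hg add align.py
hg commit -u "Alice <alice@example.org>" -m "Add align.py"
hg push -q

# You pull: the changeset is now part of your local history...
cd "$WORK/bob"
hg pull -q
hg log --template '{rev}: {desc}\n'
ls                # ...but align.py is not in the working directory yet

# hg update brings the working directory to the pulled version
hg update -q
ls                # align.py is here now
```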

Page 28: Hg for bioinformatics, second part

Hg push

● The hg push command sends the changes you have committed locally to the remote server

● The command fails if other people have pushed other changes before you
    ● You always have to do a pull & update (and merge) before doing a push
    ● More on this later

Page 29: Hg for bioinformatics, second part

Exercise

● Try to use bitbucket as a repository for your own script

● Commit your versions locally, and push them to bitbucket as a backup copy

● You can clone (and later pull & update) the repo on your computer at home

Page 30: Hg for bioinformatics, second part

Hg for our pipeline

Page 31: Hg for bioinformatics, second part

Applying hg to our pipeline

● Someone should initialize a repo in the root directory (only once)

● Add, commit, document

● Push a copy of the repo to bitbucket

● Everybody will clone the repo from there, and pull/push changes from there

Page 32: Hg for bioinformatics, second part

What to include in the repo

● Code, documentation

● We may create another repository for results and parameters

● For each set of results, we should be able to know which version of the scripts and which parameters have been used
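One possible way to record the version of the scripts is to save the output of `hg id -i` next to each set of results. The sketch below assumes a layout with one directory per run; the names `run_001`, `code_version.txt`, and `parameters.txt`, and the parameter value, are made up.

```shell
WORK="$(mktemp -d)"

# Toy pipeline repository with a single committed script
hg init "$WORK/pipeline"
cd "$WORK/pipeline"
echo 'echo "running pipeline"' > run_pipeline.sh
hg add run_pipeline.sh
hg commit -u "Your Name <you@example.org>" -m "Pipeline script"

# Hypothetical layout: one directory per set of results
RESULTS="$WORK/results/run_001"
mkdir -p "$RESULTS"

# hg id -i prints the short hash of the revision the working directory
# is based on, followed by '+' if there are uncommitted changes
hg id -i > "$RESULTS/code_version.txt"

# Save the parameters used for this run next to it (placeholder value)
echo "threshold=0.05" > "$RESULTS/parameters.txt"

cat "$RESULTS/code_version.txt"
```

A trailing `+` in the saved hash is a warning sign: the results were produced with code that was never committed.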

Page 33: Hg for bioinformatics, second part

Executing the pipeline on the cluster

● Connect to the cluster

● Hg pull & update from bitbucket (to get the latest code)

● Test to verify whether it works correctly on the cluster?

● Execute the pipeline

Page 34: Hg for bioinformatics, second part

Proposal: code reviews

● One person may be in charge of writing the core pipeline

● Other people can clone the repository and improve it (code review)

● So we will work on the same code, and hopefully make it better

Page 35: Hg for bioinformatics, second part

Collective code ownership

● In the perfect group, nobody is 'the only author' of a script

● Code is just a medium :-)

● A single script written by two people is much better than two redundant scripts

Page 36: Hg for bioinformatics, second part

The daily pull

● Every day, the first thing you should do is a hg pull & update to get the latest version of the code

● Make your changes locally and commit them.

● When you are ready, pull & update again to align your code with the remote copy, then push to bitbucket

● Beware of conflicting changes...

Page 37: Hg for bioinformatics, second part

Merging and conflicts

● What happens when two people work on the same code on different computers?
    ● Two different versions of the code will exist

● How to merge them?
    ● Ask me :-)
    ● Never force the push (hg push -f): it creates extra heads on the shared repository and leaves other people to sort out the divergent versions
    ● Always do a hg pull & update before a push; if needed, use hg merge to integrate other people's changes

Page 38: Hg for bioinformatics, second part

Making changes to the pipeline

● Get the latest copy of the pipeline from bitbucket (pull&update)

● Make changes, commit

● pull&update, push to bitbucket

● Connect to cluster, pull&update, execute pipeline

