Post on 24-May-2015
Git – Introduction
Ovidiu Dimulescu
• Version Control Systems history
• What is Git?
• Design Goals and Implementation
• Why Git?
• Git Internals
• Installation
• Commands Overview
• SVN to Git migration
• Q&A
Version Control Systems History
1st Generation - RCS, SCCS
• Single-file-based operations
• Local only
2nd Generation - SVN, CVS, TFS
• Multi-file-based operations
• Centralized client/server architecture
• Merge before commit
3rd Generation - Git, Hg, Bzr, Arch
• Multi-file-based operations
• Decentralized / distributed architecture
• Commit before merge
Version Control Systems Trends
• Tendency towards more concurrency
• DVCSs get most of the buzz and innovation
  Open source - Git, Mercurial, Bazaar, Monotone, Fossil
  Commercial - Kiln - FogBugz (based on Mercurial), Veracity - SourceGear, PlasticSCM
• 2nd-generation VCS advances are slowing down. The last major SVN release took 2 years
What is Git?
Git is an open source, distributed version control system created by Linus Torvalds*
*Git is also British English slang for a stupid or unpleasant person. According to Linus Torvalds: "I'm an egotistical bastard, and I name all my projects after myself. First Linux, now git."
Design Goals & Implementation
• Strong support for non-linear development
  Git supports rapid branching and merging, and includes specific tools for visualizing and navigating a non-linear development history. A core assumption in Git is that a change will be merged more often than it is written, as it is passed around various reviewers.
• Distributed development
  - work offline with full repository history
  - every local copy is a backup
  - everything is fast
  - sync directly with any collaborator
• Compatibility with existing systems/protocols
  Repositories can be published via HTTP/S, FTP/S, rsync, or a Git protocol over either a plain socket or ssh. Git also has a CVS server emulation, which enables the use of existing CVS clients and IDE plugins to access Git repositories. Subversion and svk repositories can be used directly with git-svn.
• Efficient handling of large projects
• Very strong safeguards against corruption, either accidental or malicious
  Git repository history is cryptographically authenticated by being stored in such a way that the name of a particular revision (a "commit" in Git terms) depends upon the complete development history leading up to that commit. Once it is published, it is not possible to change the old versions without it being noticed.
• Toolkit-based design
  - Low-level commands aka plumbing
  - High-level wrappers aka porcelain
Why Git? For Everyone
• Rolls off the tongue well :)
• Tons of good documentation freely available
• Cool kids are doing it: Android, Ruby on Rails, Drupal, jQuery, PostgreSQL, Linux Kernel, Perl, Eclipse, VLC, Samba, YUI, Wine, Gnome, KDE, X.org, Debian, etc.
• Available on major code hosting services: GitHub.com, BitBucket.org, Google Code, SourceForge, RubyForge, etc.
• Marketable skill
• Actively developed ecosystem
Why Git? For Ops
• Intrinsic replication for DR, remote teams, etc.
• Lower system resource usage on the reference repo
• Strong context-aware CLI support
• Repository format is stable. Upgrades are usually quick as they change metadata only
• No single point of failure*
• No more | grep -v .svn :)
Why Git? For Developers
• Fast (logs, diffs, etc.)
• Facilitates experimentation
  – Fast branch creation and switching
  – Easy merge and re-merge
  – Shelving (stashes in Git speak)
• Multitasking
• Control over commits (order, messages, content) before making them public
• Cherry-picking to back/forward port bug fixes
Why Git? For Managers
• Custom code promotion policies based on who has commit rights to the reference repository. The majority of projects fall into one of the following, or a combination of them:
  – All developers are direct committers
  – A project maintainer is the sole committer
• Code reviews. Contributors do not need to publish code to a repo. Portable patches or tarballs can be generated by Git and emailed.
Workflows - All developers are direct committers
• Everyone has commit access
• Uses a familiar paradigm
• Works well for most teams, with minimal overhead and bottlenecks
• The first developer to commit a change to the same file wins. The second developer has to merge. Git notifies the second developer and refuses the commit until conflicts (if any) are marked as resolved
Workflows – Project maintainer is the sole committer
1. Only the project maintainer can commit to the reference repository
2. Contributors clone (fork) that repository and make changes
3. Contributor commits those changes to their own public copy
4. Contributor asks the project maintainer to review and pull the changes
5. The project maintainer adds the contributor's repo as a remote and merges locally
6. Then the maintainer commits the merged changes to the reference repository
Common in public projects. GitHub uses this model.
Advantages:
• No committer setup needed for the reference repository
• Each party can work independently
So, how fast is Git? Let’s see some pictures *
* Comparison as of 2009
Why Not Git?
• The project uses large binaries that change often (game development, CAD engineering, etc.), requiring large amounts of space
  – Solutions: git-annex, git-media
• The project is so large that a full local copy is not feasible (space and/or time)
  – Workarounds: shallow clones, sparse checkouts, submodules, subtrees
Git Internals
Git Internals – Storage
• Git is a content-addressable file system that has a notion of versions
• Versions are implemented as snapshots of an entire tree
• Git has two data structures
  – A mutable index that caches information about the working directory and the next revision to be committed
  – An immutable object database
• The index is the middleman between the object database and the working directory. Also known as the staging area or cache.
• The object database has four types of objects: BLOB, TREE, COMMIT, TAG
• Each file revision is stored as a unique BLOB object. The object identifier is a SHA-1 hash of its content.
Git Internals – Object Database
• A BLOB object is the content of a file. Blob objects have no filename, timestamps, or other metadata.
• A TREE object is the equivalent of a directory. It contains a list of filenames, each with some type bits and the name of a blob or tree object that is that file, symbolic link, or directory's contents. This object describes a snapshot of the source tree.
• A COMMIT object links tree objects together into a history. It contains the name of a tree object (of the top-level source directory), a timestamp, a log message, and the names of zero or more parent commit objects.
• A TAG object is a container that holds a reference to another object and can hold additional metadata related to that object. Most commonly, it is used to store a digital signature of a commit object corresponding to a particular release of the data being tracked by Git.
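The four object types can be inspected with Git's own plumbing commands. The following is a minimal sketch, assuming a throwaway repository in a temporary directory; the file names, the tag name v1.0, and the demo identity are illustrative and do not come from the slides.

```shell
# Work in a throwaway repository (temp path and demo identity are assumptions)
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name 'Demo User'
git config user.email 'demo@example.com'

printf 'hello world\n' > greeting.txt

# BLOB: hash-object prints the id Git derives from the file content
blob=$(git hash-object -w greeting.txt)

# A second file with the same content maps to the same blob id,
# because blobs carry no filename or timestamp
cp greeting.txt copy.txt
copy_blob=$(git hash-object copy.txt)

# COMMIT and TREE: create a commit, then ask for the type of each object
git add greeting.txt copy.txt
git commit -q -m 'Add greeting'
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_type=$(git cat-file -t "$blob")

# TAG: an annotated tag is a separate object that references the commit
git tag -a v1.0 -m 'First release'
tag_type=$(git cat-file -t v1.0)

# The tree lists mode, type, object id, and filename for each entry
git cat-file -p 'HEAD^{tree}'

echo "$blob_type $tree_type $commit_type $tag_type"
```

Both tree entries printed by the last cat-file call should show the same object id, which is how identical content ends up stored once.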
Git Internals – Object Database
• A branch points to a commit
• A tag points to a commit
• A commit points to a tree
• A commit can have multiple parents
• A tree can contain multiple trees
• A tree can contain multiple blobs
Git Internals – Object Database
[Figure: object graphs of commit, tree, and blob nodes with branch and tag pointers. Panel captions: "After a commit and tag" and "After one more commit"]
Git Internals – Object Database
[Figure: object graph with three commits, a tag, a branch, and their trees and blobs]
Git Internals - Content Versioning
[Figure: content versioning compared in two panels, "Git" and "Other"]
Installation
Git Installation
1. Go to git-scm.com, select your platform, and follow the instructions. Choose the msysGit installer if you're going with the TortoiseGit Windows client.
2. On Linux, rather than compiling from source, you can use your distro's package manager to install Git
3. On Mac, if you have trouble with the pre-packaged installer you can use a ports system (Homebrew, MacPorts, etc.)
4. If the CLI is not your cup of tea there are various GUI clients:
   Cross-platform: git gui, gitk, tig
   Windows: TortoiseGit, SmartGit, Git Extensions
   Mac: Tower, GitX, GitHub for Mac, Gitti
   Linux: git-cola, giggle, gitg
5. Install IDE clients if desired
   Eclipse – EGit
   IntelliJ – Support out of the box since 9.0
   Xcode – Basic support included natively since 4.0
Setup and Configuration - Basics
Git uses cascading locations to determine the effective config, similar to other *nix tools
/etc/gitconfig      System-wide
~/.gitconfig        Per user
proj/.git/config    Per project
You can manipulate entries at each level by issuing git config and passing --system, --global, or no argument, respectively. To see the current settings, issue
$ git config --list
You need to set your identity locally, as Git has no central server
$ git config --global user.name "John Smith"
$ git config --global user.email jsmith@anemailprovider.com
You'd want to ignore certain files (*.class, *.swp, *~, target, .DS_Store, etc.)
$ git config --global core.excludesfile ~/.gitignore
where .gitignore accepts various patterns
.DS_Store
*~
*.swp
tmp/**/*
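The cascade can be observed directly: a per-project value shadows the per-user one, and keys not set in the project fall through to the user level. This is a minimal sketch, assuming a scratch HOME so the real ~/.gitconfig is not modified; the address john@project.example is a made-up value for illustration.

```shell
# Scratch HOME so the real per-user config is left alone (assumption of this sketch)
export HOME=$(mktemp -d)
export XDG_CONFIG_HOME="$HOME/.config"
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Per-user level (~/.gitconfig)
git config --global user.name 'John Smith'
git config --global user.email jsmith@anemailprovider.com

# Per-project level (.git/config): applies inside this repository only
git config user.email john@project.example

effective_name=$(git config user.name)        # falls through to the per-user value
effective_email=$(git config user.email)      # per-project value wins
global_email=$(git config --global user.email)

echo "$effective_name <$effective_email> (per-user email: $global_email)"
```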
Setup and Configuration - Convenience
Colored Output
$ git config --global color.ui true
Custom Editor
$ git config --global core.editor emacs
Custom Diff / Merge Tools
$ git config --global diff.tool beyondcompare
$ git config --global difftool.prompt true
Aliases
$ git config --global alias.compactlog 'log --pretty="%h %s"'
$ git config --global alias.co checkout
$ git config --global alias.ci commit
Getting help
$ git help config
$ git config --help
$ man git-config
Setup and Configuration – SysAdmin Version
Git has no built-in access mechanism?
• Be up and running quickly
• Implement access control of your choice
• Git project can focus on its core functions
Options?
Basic control - File share permissions and ACLs
• Use your normal Unix or Windows file access controls. Suitable for internal access only
Advanced control - Gitolite
• Allows permissions not just by repository, but also by branch or tag names within each repository. That is, you can specify that certain people (or groups of people) can only push certain "refs" (branches or tags) but not others.
• Works over SSH. Requires only one git user on the host
Hosted solution - GitHub, BitBucket, etc.
Setup and Configuration – Cool Tricks
Shell Tab Completion
Save https://raw.github.com/git/git/master/contrib/completion/git-completion.bash
Source it from your shell profile
$ git log TAB
HEAD master origin/HEAD origin/master
$ git TAB
Context-Sensitive Shell Prompt
$ export PS1='\u \W$(__git_ps1 " (%s)")\$ '
odimulescu jaxjug-1011 (master)$
Commands Overview
• Creating repositories
• Staging changes
• Stashing away changes
• Committing changes
• Branch out
• Merging, Rebasing, Cherry-picking
• Calling home
Checkout (re)initializes your working directory (as a whole or individual files) and staging area from the local repository. All three areas are in sync.
Make changes in the working directory. At this point it is out of sync with the index and the repository.
Stage adds changes (modified files, newly added files, removed files). At this point the working directory and the index are in sync for the changed files that were staged.
Commit brings the repository, the index, and the working directory in sync.
Working Directory - Where you make edits
Staging Area or Index - Where you stage the changes you plan to commit next. Has no actual content, just references. Temporary objects are inserted into the repository. They will be dangling until a commit links them, or are discarded on the next repository clean-up
Repository - Where commits or permanent copies are stored
Creating repositories
Create a repository from scratch
$ git init [DIRECTORY]
Handy options in some cases
--shared=(false|true|umask|group|all|0xxx) – useful to share the repo over NFS or Samba
--separate-git-dir <git dir> – used to store the repository outside of your project area. Git leaves a small .git file in the project that points to <git dir>, so later commands, which by default look up from the current dir until a .git entry is found, still locate the repository
Clone an existing repository
git clone REPO_URL LOCAL_DIR
$ git clone https://github.com/jquery/jquery.git jquery-local
You can use --separate-git-dir as above
Create a bare repository
git init --bare [LOCAL_DIR]
git clone --bare REPO_URL [LOCAL_DIR]
$ git clone --bare my-local-project /Volumes/passport/my-project-backup.git
Creates a repository without a checked-out working directory. This is typically used for a public or shared repository, or to back up to an external drive. LOCAL_DIR is typically named with a .git extension
Staging changes
Add changes to the staging area
git add <file>
git add <directory>
git add --all
git add --patch
git add <file> will add the file to the staging area.
git add <directory> will add the entire <directory> to the staging area.
git add --all will add all changes in the entire working tree to the staging area. Equivalent to git add . when run from the top-level directory (Git 2.0 and later).
git add --patch allows selecting individual sections (aka hunks) of a file rather than the file as a whole.
None of the changes are visible in the repository until commit is invoked.
Empty directories are not supported. As a workaround you can add a .gitignore file (any other name will work) to force the folder structure to be added.
Viewing status of the working directory
$ git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#   modified: README.1st
Staging changes
Removing changes from the staging area and working directory
git rm <file>
git rm -r <directory>
git rm -f <file> or git rm -r -f <directory>
git rm <file> will remove the file from the staging area and working directory
git rm -r <directory> will remove the directory recursively from the staging area and working directory
git rm -f <file> forces removal of the file even if it has modifications in the working directory
Removing changes from the staging area only
git rm --cached <file> will remove the file from the staging area only
git rm --cached -r <dir> will remove the directory recursively from the staging area only
This is useful when you realize you added some unwanted files (*.class, build folder, etc.) that you haven't ignored.
Moving files around
Git does not explicitly track file movement. This is primarily due to being content-addressed (i.e. it does not care about location), but it has a built-in heuristic to detect movement.
git mv file_from file_to is really a convenience for mv file_from file_to; git rm file_from; git add file_to
This poses an issue when viewing the all-time history of a file. Use git log --follow
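The unstage-only removal and the rename heuristic can be tried together. This is a minimal sketch, assuming a throwaway repository; Main.class, README.txt, and the demo identity are illustrative names, not part of the slides.

```shell
# Throwaway repository (temp path and demo identity are assumptions)
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name 'Demo User'
git config user.email 'demo@example.com'

echo 'notes' > README.1st
echo 'compiled output' > Main.class

# Stage everything, including the build artifact by mistake
git add --all
git status --short

# Unstage the artifact only; the file stays in the working directory
git rm -q --cached Main.class
echo '*.class' > .gitignore
git add .gitignore
git commit -q -m 'Add README'

# Rename the file, then compare its history with and without --follow
git mv README.1st README.txt
git commit -q -m 'Rename README'
plain=$(git log --oneline -- README.txt | wc -l | tr -d ' ')
followed=$(git log --oneline --follow -- README.txt | wc -l | tr -d ' ')
echo "commits without --follow: $plain, with --follow: $followed"
```

Without --follow the log stops at the rename commit, since the path README.txt did not exist before it.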
Committing changes
Basic
git commit -a -m <msg>
-a - Tells the command to automatically stage files that have been modified and deleted, but new files you have not told git about are not affected
-m - Use the given <msg> as the commit message.
When you are not the author
git commit --author=<author>
When you are not a good speller or have a short attention span
git commit --amend
This allows you to re-edit the last commit message and replace it with the updated content, or to add other files to the commit that you forgot initially …
I'm not feeling lucky. Dry-run to the rescue
git commit --dry-run
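A short sketch of --amend and -a in action, assuming a throwaway repository; the file names and the demo identity are illustrative only.

```shell
# Throwaway repository (temp path and demo identity are assumptions)
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name 'Demo User'
git config user.email 'demo@example.com'

echo 'one' > a.txt
git add a.txt
git commit -q -m 'Add a'

# Forgot b.txt: stage it and fold it into the previous commit
echo 'two' > b.txt
git add b.txt
git commit -q --amend -m 'Add a and b'
commit_count=$(git rev-list --count HEAD)
amended_files=$(git ls-tree --name-only HEAD | tr '\n' ' ')

# -a stages modified tracked files only; the new file c.txt stays untracked
echo 'one, revised' > a.txt
echo 'three' > c.txt
git commit -q -a -m 'Revise a'
leftover=$(git status --porcelain)

echo "commits after amend: $commit_count, files: $amended_files"
echo "still uncommitted: $leftover"
```

The amend replaces the last commit rather than adding a new one, which is why it is best kept to commits that have not been made public yet.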
Stashing away changes
Dealing with interruptions
Scenario: You are happily working on your task when your manager asks you to jump on a hot issue
Stash away precious work
$ git stash
Edit / test / commit the hot fix
Restore previous work
$ git stash pop
Continue hacking
Granularity is near
Scenario: You like to predict the future by using smaller commits you can test and later isolate.
Hack, hack, hack ...
Stage the files or file changes you want as the first commit
$ git add file1 file2 etc
$ git add --patch file1
Save all other changes to the stash
$ git stash save --keep-index
Edit/build/test/commit the current changeset
Prepare to work on all other changes
$ git stash pop
Repeat the above five steps until one commit remains ...
Edit/build/test/commit the last changeset
Stashing away changes
list - List the stashes that you currently have.
drop [<stash>] - Remove a single stashed state from the stash list.
clear - Remove all the stashed states.
show [<stash>] - Show the diff between the stashed state and its original parent.
pop [<stash>] - Remove a single stashed state from the stash list and apply it on top of the current working tree state. The working directory must match the index.
apply [<stash>] - Like pop, but do not remove the state from the stash list.
branch <name> [<stash>] - Creates and checks out a new branch named <name> starting from the commit at which the <stash> was originally created, and applies the changes recorded in <stash> to the new working tree and index. This is useful if the branch on which you ran git stash save has changed enough that git stash apply fails due to conflicts. Since the stash is applied on top of the commit that was HEAD at the time git stash was run, it restores the originally stashed state with no conflicts.
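The interruption scenario can be replayed end to end. This is a minimal sketch, assuming a throwaway repository and a hot fix that touches a different file than the stashed work; all file names and the demo identity are made up for illustration.

```shell
# Throwaway repository (temp path and demo identity are assumptions)
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name 'Demo User'
git config user.email 'demo@example.com'

echo 'v1' > app.txt
git add app.txt
git commit -q -m 'Initial version'

# Mid-task: an uncommitted edit
echo 'half-finished feature' >> app.txt

# Interruption: shelve the edit; the working directory is clean again
git stash
dirty_after_stash=$(git status --porcelain)

# Edit / test / commit the hot fix
echo 'urgent fix' > fix.txt
git add fix.txt
git commit -q -m 'Hot fix'

# Restore previous work and continue hacking
git stash pop
stashes_left=$(git stash list | wc -l | tr -d ' ')
echo "stashes left: $stashes_left"
```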
Branch out - Context switching is cheap and easy
Create a new branch
git branch <branchname> [<start-point>]
This will create a new branch named <branchname> using <start-point> as the reference. If <start-point> is not passed in, the current HEAD is used.
Switching to a branch
git checkout <branchname> - this transparently switches your working directory content
Creating and switching to a branch in one go
git checkout -b <branchname> [<start-point>]
Deleting an existing branch
git branch -d <branchname>
Deletes the branch as long as it's fully merged with its upstream. Use -D if you really want to drop it
Renaming an existing branch
git branch -m <old_name> <new_name>
If new_name already exists you can use -M to force the operation. That in turn will drop the existing new_name reference
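The branch commands above can be exercised in one pass. This is a minimal sketch, assuming a throwaway repository; the branch names feature, experiment, and spike are illustrative, and the starting branch name is read from the repository because it may be master or main depending on local configuration.

```shell
# Throwaway repository (temp path and demo identity are assumptions)
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name 'Demo User'
git config user.email 'demo@example.com'

echo 'v1' > app.txt
git add app.txt
git commit -q -m 'Initial version'
base=$(git symbolic-ref --short HEAD)   # master or main, depending on configuration

# Create a branch, switch to it, and commit there
git branch feature
git checkout -q feature
echo 'feature work' >> app.txt
git commit -q -a -m 'Feature work'

# Create and switch in one go, starting from the base branch
git checkout -q -b experiment "$base"
echo 'risky idea' > idea.txt
git add idea.txt
git commit -q -m 'Risky idea'
git branch -m experiment spike          # rename the current branch

# Back on the base branch: merge feature, then clean up
git checkout -q "$base"
git merge -q feature                    # fast-forward, no merge commit needed
git branch -d feature                   # allowed: fully merged
git branch -d spike || echo 'refused: spike is not fully merged'
git branch -D spike                     # drop it anyway

remaining=$(git for-each-ref --format='%(refname:short)' refs/heads)
echo "remaining branches: $remaining"
```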
Merging
$ git merge origin
Rebasing
$ git rebase origin
Cherry-picking
$ git cherry-pick C3
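The difference between the three operations shows up in the shape of the resulting history. This is a minimal sketch, assuming a throwaway repository in which every commit adds one new file so that no conflicts arise; the commit_file helper, the branch names, and the demo identity are hypothetical and only serve the illustration.

```shell
# Throwaway repository (temp path and demo identity are assumptions)
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name 'Demo User'
git config user.email 'demo@example.com'

# Hypothetical helper: create one file and commit it
commit_file() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "Add $1"
}

commit_file c1
base=$(git symbolic-ref --short HEAD)

# topic: C2 and C3 on top of C1
git checkout -q -b topic
commit_file c2
commit_file c3
c3=$(git rev-parse HEAD)

# Cherry-pick: copy only C3 onto a release branch cut from C1
git checkout -q -b release "$base"
git cherry-pick "$c3"
release_files=$(git ls-tree --name-only HEAD | tr '\n' ' ')

# Move the base branch forward so the histories diverge
git checkout -q "$base"
commit_file c4

# Rebase: replay a copy of topic on top of the updated base (linear history)
git branch topic-rebased topic
git rebase -q "$base" topic-rebased
rebased_parents=$(git cat-file -p HEAD | grep -c '^parent ')
linear=no
git merge-base --is-ancestor "$base" topic-rebased && linear=yes

# Merge: join topic into the base branch with a merge commit
git checkout -q "$base"
git merge -q -m 'Merge topic' topic
merge_parents=$(git cat-file -p HEAD | grep -c '^parent ')

echo "release: $release_files"
echo "parents after rebase: $rebased_parents, after merge: $merge_parents"
```

The rebased tip has a single parent and sits directly on top of the base branch, while the merge commit records both lines of development as parents.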
Calling home
Where is home?
$ git remote show
origin
Tell me more about home
$ git remote show origin
* remote origin
  Fetch URL: /Volumes/USB_Stick/Git Repos/presentation.git
  Push URL: /Volumes/USB_Stick/Git Repos/presentation.git
  HEAD branch: master
  Remote branches:
    master tracked
  Local branch configured for 'git pull':
    master merges with remote master
  Local refs configured for 'git push':
    master pushes to master (up to date)
Getting stuff from home
git fetch - Fetches the content but does not update the working directory
git pull - Fetches the content and merges it into the current branch and working directory
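The difference between fetch and pull is easiest to see with two clones of one shared repository. This is a minimal sketch, assuming a local bare repository stands in for "home"; the directory names home.git, alice, and bob and the demo identity are illustrative.

```shell
work=$(mktemp -d)
cd "$work"

# "Home": a bare repository standing in for the shared server (assumption)
git init -q --bare home.git

# Alice creates a repository, points origin at home, and pushes
git init -q alice
cd alice
git config user.name 'Alice Demo'
git config user.email 'alice@example.com'
git remote add origin "$work/home.git"
echo 'first line' > notes.txt
git add notes.txt
git commit -q -m 'First line'
branch=$(git symbolic-ref --short HEAD)
git push -q origin HEAD

# Bob clones home
cd "$work"
git clone -q home.git bob

# Alice pushes one more commit
cd "$work/alice"
echo 'second line' >> notes.txt
git commit -q -a -m 'Second line'
git push -q origin HEAD

# Bob fetches: origin/<branch> moves, his working directory does not
cd "$work/bob"
git fetch -q origin
behind_after_fetch=$(git rev-list --count "HEAD..origin/$branch")
lines_after_fetch=$(wc -l < notes.txt | tr -d ' ')

# Bob pulls: the fetched commit is merged into his branch and working directory
git pull -q --ff-only
behind_after_pull=$(git rev-list --count "HEAD..origin/$branch")
lines_after_pull=$(wc -l < notes.txt | tr -d ' ')

echo "after fetch: behind by $behind_after_fetch, $lines_after_fetch line(s) in notes.txt"
echo "after pull:  behind by $behind_after_pull, $lines_after_pull line(s) in notes.txt"
```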
Calling home
Give back the easy way
git push [<repository> [<refspec>...]]
By default git sends the local changes on the current branch to the reference (upstream) repository. You can control to which <repository> and what local references (i.e. other than the current branch) you want to push.
Not feeling lucky?
git push --dry-run
But I am selective
git push [--delete] [--tags] [--all] [--mirror]
--all Instead of naming each ref to push, specifies that all refs under refs/heads/ be pushed
--delete All listed refs are deleted from the remote repository. This is the same as prefixing all refs with a colon.
--tags All refs under refs/tags are pushed, in addition to refspecs explicitly listed on the command line
--mirror Instead of naming each ref to push, specifies that all refs under refs/ (which includes but is not limited to refs/heads/, refs/remotes/, and refs/tags/) be mirrored to the remote repository. Newly created local refs will be pushed to the remote end, locally updated refs will be force updated on the remote end, and deleted refs will be removed from the remote end. This is the default if the configuration option remote.<remote>.mirror is set.
SVN to Git migration
Basic flow
1. Clone the SVN repository locally into a Git repository
2. Configure ignored files from SVN so Git honors them
3. Work normally locally as you would with Git
4. Synchronize with the SVN server as needed
Caveats
• Stay within guidelines. Prefer safety over fancy
• Use rebase over merge to keep history linear
SVN to Git migration - Git as SVN client
Create an authors file
username1 = username1 <email address>
username2 = username2 <email address>
Create a Git clone of the SVN repository
$ git svn clone -A file -s SVN_REPO_URL LOCAL_DIR
$ git svn clone -A file -T trunk -b branches -t tags -r START_REVISION:HEAD SVN_REPO_URL LOCAL_DIR
-A authors file mapping
-s presume the svn recommended layout for tags, trunk, and branches
-T how trunk is called
-b how branches are called
-t how tags are called
-r the revision to start taking history from
SVN to Git migration - Git as SVN client
Post-import cleanup
- Convert tag-branches to tags
- There are different scripts to do that
Ignore SVN-ignored files
$ git svn show-ignore > .git/info/exclude
$ git svn show-ignore > .gitignore
Pull changes from the SVN repo
$ git svn rebase
Push changes to the SVN repo
$ git svn dcommit --dry-run (ensures it ends up on the desired branch)
$ git svn dcommit
Repo information a la SVN
$ git svn info
$ git svn log
SVN to Git migration - Git as SVN client
SVN to Git migration - Alternatives
Subgit - subgit.com
PROS:
• SVN and Git clients can coexist, so teams can migrate at their own pace
CONS:
• Availability and reliability. Currently EAP. GA targeted for end of Q1
• Pricing (TBD)
SVN to Git migration - Alternatives
GitHub Enterprise – enterprise.github.com
PROS:
• Supports the SVN protocol
• Supports live conversation around code reviews
• LDAP authentication built in
CONS:
• Price: $21/user, packs of 20. Subject to change
• SVN support not 100%
Summary
• Uber-fast
• Full history locally
• Local versioning capability
• Commit before merge paradigm is liberating
• Enables different workflows
• Cryptographic authentication
• Extremely well documented
• Well integrated and available multi-platform
• Marketable skill
References
• Scott Chacon – https://github.com/schacon/git-presentations
• Scott Chacon – Pro Git Book
• Travis Swicegood - Pragmatic Guide to Git
Resources
• ProGit.org
• GitMagic
• GitReady.com
• GitRef.org
• Version Control by Example
• git-scm.com/documentation
• Visual Git Reference
Questions & Answers