#atlassian
STEFAN SAASEN • DEVELOPMENT MANAGER • ATLASSIAN • @STEFANSAASEN
Scaling Git
Source View - git cat-file
“Git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it.”
Pro Git Book, Section: Git Internals
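A content-addressable store saves each piece of content under a key computed from the content itself. The minimal Python sketch below illustrates the idea with a hypothetical in-memory class `ContentAddressableStore`. It is not Git's implementation: Git also prefixes a type/size header and zlib-compresses each object before writing it to `.git/objects`.

```python
import hashlib


class ContentAddressableStore:
    """Hypothetical in-memory store: the key is the SHA-1 of the value."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        # Identical content always yields the same key, so it is stored once.
        key = hashlib.sha1(content).hexdigest()
        self._objects.setdefault(key, content)
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self) -> int:
        return len(self._objects)


store = ContentAddressableStore()
first = store.put(b"hello\n")
second = store.put(b"hello\n")  # same content, same address
print(first == second, len(store))  # True 1
```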
Git under the hood
$> tree .git/objects
.git/objects
├── info
└── pack

2 directories
git add some-file.txt
$> tree .git/objects
.git/objects
├── e4
│   └── 3a6ac59164adadac854d591001bbb10086f37d
├── info
└── pack

3 directories, 1 file

The new file is a loose object: its content is zlib compressed, and it is named by its SHA1.
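The loose-object encoding can be reproduced in a few lines of Python. This sketch follows the format described in Pro Git's Git Internals chapter: a header of the object type and size, a NUL byte, then the raw content; the SHA-1 of that buffer names the file, and the buffer is zlib compressed on disk. The slides do not show the content of some-file.txt, so the example uses Pro Git's `test content` string instead; `loose_object` is a hypothetical helper name.

```python
import hashlib
import zlib


def loose_object(obj_type: str, content: bytes) -> tuple[str, str, bytes]:
    """Return (object id, path under .git/objects, zlib-compressed bytes)."""
    # Header "<type> <size>" plus a NUL byte, then the raw content.
    store = f"{obj_type} {len(content)}".encode() + b"\0" + content
    oid = hashlib.sha1(store).hexdigest()
    # The first two hex digits become the directory, the other 38 the file name.
    path = f".git/objects/{oid[:2]}/{oid[2:]}"
    return oid, path, zlib.compress(store)


oid, path, data = loose_object("blob", b"test content\n")
print(oid)   # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(path)  # .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4
```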
git commit -m "First commit"
$> tree .git/objects
.git/objects
├── 13
│   └── 1e360ae1a0c08acd18182c6160af6a83e0d22f
├── 31
│   └── 995f2d03aa31ee97ee2e814c9f0b0ffd814316
├── e4
│   └── 3a6ac59164adadac854d591001bbb10086f37d
├── info
└── pack

5 directories, 3 files
Blob, Tree, Commit: e43a6ac is the blob created by git add; the two new objects are the tree and the commit.
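Trees and commits are hashed exactly like blobs, only with a different type in the header. The sketch below follows the object formats documented in Pro Git to build a one-file tree and a root commit. The file content, author identity and timestamp are hypothetical because the slides do not show them, so the ids it produces differ from those on the slide; `object_id`, `tree_content` and `commit_content` are illustrative helper names.

```python
import hashlib


def object_id(obj_type: str, content: bytes) -> str:
    # Same rule for every object type: SHA-1 over "<type> <size>", NUL, content.
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def tree_content(entries: list[tuple[str, str, str]]) -> bytes:
    """entries: (mode, name, hex object id). Handles plain files only;
    Git sorts sub-trees as if their name ended in '/'."""
    out = b""
    for mode, name, hex_oid in sorted(entries, key=lambda e: e[1]):
        # "<mode> <name>", NUL, then the 20 raw bytes of the object id.
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_oid)
    return out


def commit_content(tree_oid: str, ident: str, message: str) -> bytes:
    # A root commit has no "parent" line; author and committer are equal here.
    return (
        f"tree {tree_oid}\n"
        f"author {ident}\n"
        f"committer {ident}\n"
        f"\n{message}\n"
    ).encode()


# Hypothetical content and identity.
blob_oid = object_id("blob", b"test content\n")
tree = tree_content([("100644", "some-file.txt", blob_oid)])
tree_oid = object_id("tree", tree)
commit = commit_content(tree_oid, "A U Thor <author@example.com> 1400000000 +0000", "First commit")
commit_oid = object_id("commit", commit)
print(blob_oid, tree_oid, commit_oid, sep="\n")
```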
echo "// Comment" >> some-file.txt
git add some-file.txt
$> tree .git/objects
.git/objects
├── 13
│   └── 1e360ae1a0c08acd18182c6160af6a83e0d22f
├── 31
│   └── 995f2d03aa31ee97ee2e814c9f0b0ffd814316
├── c1
│   └── 9e6823e34980033917b6427f3e245ce2102e6e
├── e4
│   └── 3a6ac59164adadac854d591001bbb10086f37d

6 directories, 4 files
c19e682 is an entirely new blob: Git stores the whole modified file again, not just the added line.
wat?
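Appending one line is enough to produce a completely separate loose object. The Python sketch below uses a hypothetical 1,000-line file in place of some-file.txt (whose content the slides do not show) and the same loose-object encoding as Git: both versions get different ids, and each compressed object decompresses to the complete file, not to a diff. `loose_blob` is an illustrative helper name.

```python
import hashlib
import zlib


def loose_blob(content: bytes) -> tuple[str, bytes]:
    # SHA-1 id plus the zlib-compressed "blob <size>", NUL, content buffer.
    store = f"blob {len(content)}".encode() + b"\0" + content
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


# Hypothetical file content.
v1 = b"".join(f"line {i}\n".encode() for i in range(1000))
v2 = v1 + b"// Comment\n"  # the effect of: echo "// Comment" >> some-file.txt

oid1, obj1 = loose_blob(v1)
oid2, obj2 = loose_blob(v2)

# Two different ids, and each loose object holds the complete file.
print(oid1 != oid2)
print(len(obj1), len(obj2))
```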
git gc
$> tree .git/objects
.git/objects
├── info
│   └── packs
└── pack
    ├── pack-7475314b451a882d77b1535d215def8bad0f4306.idx
    └── pack-7475314b451a882d77b1535d215def8bad0f4306.pack

2 directories, 3 files
Loose objects vs. packfile
Loose objects: zlib compressed
Packfile: 1. zlib compressed 2. delta encoded
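Delta encoding stores an object as instructions for rebuilding it from a similar base object. The Python sketch below is a deliberately simplified illustration with hypothetical helpers `make_delta` and `apply_delta`: it copies the longest shared prefix and suffix from the base and inserts the remaining bytes literally. Git's packfiles use a compact binary copy/insert instruction format and a more general matcher, so this shows the idea rather than Git's actual encoding.

```python
def make_delta(base: bytes, target: bytes) -> list[tuple]:
    """Toy delta: copy shared prefix/suffix from base, insert the middle."""
    limit = min(len(base), len(target))
    prefix = 0
    while prefix < limit and base[prefix] == target[prefix]:
        prefix += 1
    suffix = 0
    # Stop before the suffix overlaps the prefix in either buffer.
    while (suffix < limit - prefix
           and base[len(base) - 1 - suffix] == target[len(target) - 1 - suffix]):
        suffix += 1
    ops: list[tuple] = []
    if prefix:
        ops.append(("copy", 0, prefix))
    middle = target[prefix:len(target) - suffix]
    if middle:
        ops.append(("insert", middle))
    if suffix:
        ops.append(("copy", len(base) - suffix, suffix))
    return ops


def apply_delta(base: bytes, ops: list[tuple]) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]  # reuse bytes from the base
        else:
            out += op[1]  # literal bytes carried in the delta
    return bytes(out)


base = b"".join(f"line {i}\n".encode() for i in range(1000))
target = base + b"// Comment\n"
ops = make_delta(base, target)
print(ops[-1])  # ('insert', b'// Comment\n')
```

The second version is now a single copy instruction plus eleven literal bytes instead of a second full copy of the file.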
but...
CPU
[Graph: git pack-objects CPU time (user, system) over time]
Memory
[Graph: git pack-objects resident memory (RSS, MiB) over time]
Disk I/O
[Graph: git pack-objects disk reads and writes over time]
What's your point?
git clone/fetch generates a packfile, every time.
stash.atlassian.com: Clone 800, Fetch 1,200
This is what we learned so far.
1. SCM Cache
git clone 1: the pack is generated and stored in the CACHE
git clone 2: served from the CACHE
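The caching idea can be sketched in Python: the expensive pack is generated once and stored under a key derived from the request, so an identical second clone skips pack generation. Everything here is hypothetical (`PackCache`, `fake_pack_objects`, the repository name and ids); it is not the Stash SCM Cache implementation.

```python
import hashlib
from typing import Callable, Iterable

PackGenerator = Callable[[str, frozenset, frozenset], bytes]


class PackCache:
    """Hypothetical SCM cache: stores generated packs keyed by the request."""

    def __init__(self, generate_pack: PackGenerator) -> None:
        self._generate = generate_pack
        self._packs: dict[str, bytes] = {}
        self.misses = 0  # how often a pack actually had to be generated

    @staticmethod
    def _key(repo: str, wants: frozenset, haves: frozenset) -> str:
        # Commit ids are immutable, so the same repo/wants/haves always
        # describes the same set of objects.
        raw = "\n".join([repo, *sorted(wants), "--", *sorted(haves)])
        return hashlib.sha256(raw.encode()).hexdigest()

    def get_pack(self, repo: str, wants: Iterable[str], haves: Iterable[str] = ()) -> bytes:
        w, h = frozenset(wants), frozenset(haves)
        key = self._key(repo, w, h)
        if key not in self._packs:
            self.misses += 1
            self._packs[key] = self._generate(repo, w, h)
        return self._packs[key]


def fake_pack_objects(repo: str, wants: frozenset, haves: frozenset) -> bytes:
    # Stand-in for the expensive `git pack-objects` run on the server.
    return f"PACK {repo} {sorted(wants)} {sorted(haves)}".encode()


cache = PackCache(fake_pack_objects)
clone1 = cache.get_pack("proj/repo", {"a" * 40})  # generated, then cached
clone2 = cache.get_pack("proj/repo", {"a" * 40})  # served from the cache
print(cache.misses)  # 1
```

A real cache would also have to key on protocol capabilities and options such as shallow depth, and bound its disk usage.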
2. Sizing is important
You need sufficient hardware
Memory budget: 768MiB vs. 5 GiB
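Sizing comes down to simple arithmetic: the memory left after everything else is reserved, divided by the peak memory of one pack operation, bounds how many clones or fetches can run at the same time. The function below is a hypothetical back-of-the-envelope helper, not a Stash formula, and the example numbers are invented; the per-process figure should come from measuring git pack-objects on your own repositories.

```python
def max_concurrent_pack_ops(total_mib: int, reserved_mib: int, per_op_mib: int) -> int:
    """Number of git pack-objects processes that fit into memory at once.

    total_mib:    memory available on the host
    reserved_mib: memory kept for everything else (application heap, OS, caches)
    per_op_mib:   peak resident memory of one pack-objects process
    """
    if per_op_mib <= 0:
        raise ValueError("per_op_mib must be positive")
    return max(0, (total_mib - reserved_mib) // per_op_mib)


# Hypothetical host: 16 GiB RAM, 4 GiB reserved, 512 MiB per pack operation.
print(max_concurrent_pack_ops(16 * 1024, 4 * 1024, 512))  # 24
```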
3. Limits
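A common way to implement such a limit is a counting semaphore: each hosting operation needs a ticket, and when no ticket frees up within a timeout the request is rejected instead of overloading the server. The sketch below uses Python's `threading.BoundedSemaphore`; the class `ScmThrottle` and its parameters are hypothetical and do not describe how Stash implements its limits.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class ScmThrottle:
    """Hypothetical limit on concurrent git hosting operations."""

    def __init__(self, max_ops: int, timeout: float) -> None:
        self._slots = threading.BoundedSemaphore(max_ops)
        self._timeout = timeout

    @contextmanager
    def ticket(self) -> Iterator[None]:
        # Wait up to `timeout` seconds for a free slot, then give up.
        if not self._slots.acquire(timeout=self._timeout):
            raise RuntimeError("too many concurrent git operations, try again later")
        try:
            yield
        finally:
            self._slots.release()


throttle = ScmThrottle(max_ops=2, timeout=0.1)
rejected = False
with throttle.ticket():              # first operation
    with throttle.ticket():          # second operation
        try:
            with throttle.ticket():  # third one exceeds the limit
                pass
        except RuntimeError:
            rejected = True
print(rejected)  # True
```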
4. Continuous Integration
Client: "What do you have?"
Server: "This is what I've got."
Client: "Ok, here is what I've got. Give me everything that's new."
Server: "Here you go!"
Client, when nothing has changed: "Don't worry. I'm up to date."
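This exchange corresponds to Git's ref advertisement followed by want/have negotiation. The hypothetical helper below models only the client's first decision, comparing the advertised refs with its own. It also shows why polling is costly: even a CI server that is already up to date receives a full ref advertisement on every poll. Ref names and ids are invented.

```python
def refs_to_fetch(advertised: dict[str, str], local: dict[str, str]) -> dict[str, str]:
    """Refs whose tip on the server differs from the client's copy.

    advertised: the server's ref advertisement ("This is what I've got.")
    local:      the refs the client already has
    An empty result means "Don't worry. I'm up to date."
    Simplified: real negotiation also exchanges "have" lines so the server
    can work out which objects the client is missing.
    """
    return {ref: oid for ref, oid in advertised.items() if local.get(ref) != oid}


# Hypothetical ref names and object ids.
server_refs = {"refs/heads/master": "a" * 40, "refs/heads/feature": "b" * 40}
ci_refs = {"refs/heads/master": "a" * 40, "refs/heads/feature": "c" * 40}
print(sorted(refs_to_fetch(server_refs, ci_refs)))  # ['refs/heads/feature']
print(refs_to_fetch(server_refs, server_refs))      # {}
```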
Avoid polling.
The SCM Cache can also cache ref advertisements.
Consider shallow clones.
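Caching ref advertisements pays off because most polls see unchanged refs. A minimal sketch, assuming the cached entry is invalidated whenever a push updates the repository; `RefAdvertisementCache`, `load_refs` and `on_push` are hypothetical names, not the Stash SCM Cache API.

```python
from typing import Callable


class RefAdvertisementCache:
    """Hypothetical per-repository cache of ref advertisements."""

    def __init__(self, load_refs: Callable[[str], dict[str, str]]) -> None:
        self._load = load_refs
        self._cache: dict[str, dict[str, str]] = {}
        self.loads = 0  # how often the refs were actually read

    def advertise(self, repo: str) -> dict[str, str]:
        if repo not in self._cache:
            self.loads += 1
            self._cache[repo] = dict(self._load(repo))
        return dict(self._cache[repo])  # copy so callers cannot mutate the cache

    def on_push(self, repo: str) -> None:
        # Refs changed: the next poll must see the new advertisement.
        self._cache.pop(repo, None)


repo_refs = {"proj/repo": {"refs/heads/master": "a" * 40}}
cache = RefAdvertisementCache(lambda repo: repo_refs[repo])
for _ in range(100):  # 100 CI polls without a push in between
    cache.advertise("proj/repo")
repo_refs["proj/repo"]["refs/heads/master"] = "b" * 40  # someone pushes
cache.on_push("proj/repo")
print(cache.advertise("proj/repo")["refs/heads/master"] == "b" * 40)  # True
print(cache.loads)  # 2
```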
5. Update
Use recent versions of
Stash Data Center
[Diagram: RDBMS + FS → RDBMS + NFS, "Performance at scale"]
Scaling Git
• Git hosting operations are expensive.
• Properly size the hardware for the workload that you expect.
• Avoid polling when you do a lot of builds; enable caching of ref advertisements when you can't.
• Prefer shallow clones.
• Limits are in place to keep your Stash server running.
• Stash Data Center allows you to scale out and have high availability.
Sign up today!
Talk to me after if you’re interested in learning more
Stash Data Center Beta Program
Thank you!