Analyst Insight - Repositories: Managing Source Code, Artifacts, and Containers
Author: Tom Petrocelli
June 2020
Vendor SmartList
Repositories: Managing
Source Code, Artifacts, and
Containers
EXECUTIVE SUMMARY
Key Stakeholders: Chief Information Officers, Chief Technical Officers, Vice Presidents of IT, VPs of Platform
Engineering, DevOps Evangelists, Platform Engineers, Release Managers, Automation Architects, Software
Developers, Software Testers, Security Engineers, Software Engineering Directors and Managers, IT directors,
DevOps professionals
Why It Matters: As code deployments continue to accelerate, automation of the management and storage of
source code, build artifacts, and container images becomes more necessary than ever. Security requirements also
demand new solutions for managing code in all stages of its lifecycle.
Top Takeaway: Repositories now exist that provide specialized features for managing code at different points in
its lifecycle, from creation to deployment. While this expanded repository landscape provides much needed
features, it adds to the already increasing complexity of DevOps[i] toolchains.
Amalgam Insights’ Landscape for Managing Code Repositories
Open Source
Git
Mercurial
Harbor
Project Quay
Commercial
Atlassian Bitbucket (Git)
AWS Elastic Container Registry
AWS CodeCommit
Azure Repos (Git, TFVC)
Docker Hub
GitHub (Git)
GitLab (Git)
Google Container Registry
JFrog Artifactory
Red Hat Quay & Quay.io
Sonatype Nexus
MARKET CONTEXT
Over the past few years, the demands of development have changed both practice and technology. In order to
develop features more quickly, developers have adopted Agile[ii] methodologies. Agile’s short sprints and
team enablement are meant to allow for higher development velocity. This faster pace has, in turn, created the need to increase
deployment velocity.
The problem with increased velocity, in both development and deployment, is that it can tax organizations’ resources,
especially human resources. One solution has been increased automation, primarily CI/CD (Continuous
Integration/Continuous Deployment) and Infrastructure as Code (IaC)[iii] software. CI/CD moves code from creation to
deployment in automated or semi-automated pipelines, which helps ensure fewer errors even while the software takes
on a range of tasks such as building artifacts, security checks, packaging, and finally deploying to target systems. IaC
automates the provisioning and configuration of both physical and virtual hardware resources. Together, these
technologies automate the deployment of code from creation through provisioning of target environments to final
deployment.
Another solution has been to reuse and share code among programmers so that no one reinvents what someone else
has already written. Code sharing within companies is widespread and, with the rise of open source, increasingly
within communities of developers.
Both solutions have one thing in common: they need to have places to safely park code, artifacts, configurations, and
other components until they are needed. Beyond simply having a safe place to keep these system parts, there is also
the need to be able to share them, either within an organization or a community.
This has given rise to specialized software repositories that house and manage source code, shared libraries,
containers, build artifacts, system configurations, and other critical components of modern software systems. In the
past, a Software Version Control (SVC)[iv] system was enough. Now, a variety of specialized repositories have emerged
in order to meet the differing needs of code at various stages of development and deployment.
DIFFERENT TYPES OF REPOSITORY
There are a number of different repositories that are commonly found in development environments. The three most
common, used to manage source code, artifacts, configurations, and container images, are the aforementioned SVC,
artifact repositories, and container registries. Here is an overview of each.
SOFTWARE VERSION CONTROL
The earliest form of code repository, aside from a directory on a hard drive, is the Software Version Control
system, or SVC. SVCs are nothing new, stretching back 40+ years. They are repositories for code that support
versioning and check-in/check-out. Check-in/check-out is a process by which developers first load a version of
source code into the SVC system (check-in), making it available for other developers. When a developer needs to
work on the source code, they check out the code, making a local copy while removing access for other
developers. This process assures that two developers don't work on the same code simultaneously, creating
conflicts between the versions of the code. This process has been supplanted by multi-branch distributed SVCs
that allow developers to upload changes (“commit the code”) in different branches of the code that are resolved,
or “merged”, later. Many SVC systems also support the ability to compare versions of code as well as workflows
for group code reviews.
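The exclusive check-in/check-out model described above can be illustrated with a small sketch. It is a toy, in-memory model written for this report, not the API of any real SVC; the class name `LockingRepository` and its methods are hypothetical.

```python
class LockingRepository:
    """Toy model of exclusive check-in/check-out (hypothetical, not a real SVC API)."""

    def __init__(self):
        self._versions = {}  # path -> list of contents, oldest first
        self._locks = {}     # path -> developer currently holding the check-out

    def check_in(self, path, content, developer):
        """Store a new version; refuse if another developer holds the check-out."""
        holder = self._locks.get(path)
        if holder is not None and holder != developer:
            raise PermissionError(f"{path} is checked out by {holder}")
        self._versions.setdefault(path, []).append(content)
        self._locks.pop(path, None)        # checking in releases the lock
        return len(self._versions[path])   # version number, starting at 1

    def check_out(self, path, developer):
        """Return the latest version and lock the file for this developer."""
        if path not in self._versions:
            raise KeyError(path)
        holder = self._locks.get(path)
        if holder is not None and holder != developer:
            raise PermissionError(f"{path} is checked out by {holder}")
        self._locks[path] = developer
        return self._versions[path][-1]
```

In this model a second developer is simply refused until the first checks the file back in, which is the serialization that branch-and-merge workflows were designed to avoid.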
Early open source SVCs include Subversion and Mercurial. Over the past ten years, Git has emerged as the
dominant SVC and is the basis of both GitLab’s CI/CD system and GitHub’s CI software. The open source version
of Git supports CI pull requests, multiple branches, and merging into the main branch. CI systems typically handle
the merging of source code into a single code base from which the object or byte code is built. Most CI/CD
systems support Git either as one of several SVCs or exclusively.
SVCs typically support a series of additional workflows that affect code management. Modern SVCs support
cloning, or making a local copy of the repository, code commit, pull requests[v], fork[vi], and merge. The Git
documentation[vii] provides a list of the workflows available in Git and is typical of modern SVCs.
ARTIFACTS
Build artifacts, or just “artifacts”, are anything produced when code is built. They can be object or byte code,
configuration files, SQL database creation and modification files, graphics, or anything necessary to have the final code
run correctly in its environment. Artifacts are the end result of a build. Either through manual or automated means,
such as CI/CD software, source code is converted into the actual code that will run on the deployment system. In
addition, configuration files are created that will tell the runtime environment how to run the code, such as
Kubernetes[viii] configurations. Finally, supporting files, such as graphics for a UI, are added and a directory structure is
created that supports the deployment system. These are eventually packaged for deployment to the target system.
Managing these artifacts can be difficult. Not all artifacts are updated with every development iteration, so artifact
repositories need to detect what has changed and what hasn’t. Artifacts also need to be scanned for security flaws,
especially configuration files, so that downstream processes do not deploy dangerous code. Repositories for artifacts
also need to be able to manage older versions that may be maintained for support reasons as well as separate
packages for different target environments such as cloud or mobile targets. For companies that make use of open
source, the ability to mirror open source repositories may be important as well.
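One common way to detect which artifacts have changed between builds is to compare content digests. The sketch below assumes the repository keeps a SHA-256 digest for each stored artifact; the function names and the in-memory dictionaries are illustrative assumptions, not the behavior of any particular product.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content digest used to decide whether an artifact has changed."""
    return hashlib.sha256(data).hexdigest()


def changed_artifacts(previous: dict, current: dict) -> list:
    """Return names of artifacts that are new or whose content differs.

    previous maps artifact name -> digest recorded at the last build;
    current maps artifact name -> raw bytes produced by this build.
    """
    return sorted(
        name for name, data in current.items()
        if previous.get(name) != digest(data)
    )
```

Only the artifacts returned by `changed_artifacts` would need to be stored as new versions; unchanged ones can keep their existing entries.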
CONTAINER REGISTRIES
As container[ix] architectures have become more prevalent, commercial and open source container registries have
started to proliferate. The basic purpose of a container registry is to provide a place where container images can be
stored and retrieved. A container image includes the application code and configuration for the container that allows the
container runtime[x] to deploy the container. Container registries provide catalogs of available container images that can
be reused by different applications. Most important, container registries maintain the lifecycle of container images,
managing updates, versions, and deprecation of container images. Maintaining the security of container images is also
a function of registries, especially security scanning and access control. Registries can be deployed online or on-
premises.
There are two major forms of the container registry: public and private. A public registry allows for sharing of
container images outside of an immediate organization. They are popular with open source organizations as a way of
distributing the open source in a ready to run manner. Private registries are primarily used within organizations to
store and share development and production images amongst developers and operations.
INTERACTIONS WITH CI/CD
CI/CD relies heavily on repositories as waystations for code at various points in the pipelines. A typical CI/CD pipeline
begins with the SVC repository. When code is committed and pull requests are generated, these events trigger actions in
the CI/CD pipeline. CI/CD almost always starts with the SVC repository.
When artifacts are created during the build phase of a CI/CD pipeline, they need to be stored somewhere until they
are ready for deployment. While that can be back in the SVC or in a simple file directory, increasingly, build artifacts
are being stored in repositories specific to artifacts or containers, i.e., artifact repositories and container registries.
These purpose-built repositories can not only better manage conflicts between versions, security, and dependencies but also
facilitate the sharing and reuse of artifacts and containers. Artifact repositories and container registries are typically
encountered after the build phase of the CI/CD pipeline.
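The flow described in this section, from a commit in the SVC through a build to the artifact repository and container registry, can be sketched roughly as follows. Everything here is a stand-in: `Stores`, `on_commit`, and the naming scheme are hypothetical, and the "build" is a placeholder string rather than a real compiler or image builder.

```python
from dataclasses import dataclass, field


@dataclass
class Stores:
    """In-memory stand-ins for the two post-build repositories."""
    artifacts: dict = field(default_factory=dict)  # artifact repository
    images: dict = field(default_factory=dict)     # container registry


def on_commit(commit_id: str, source: str, stores: Stores) -> str:
    """Hypothetical handler run when the SVC reports a new commit."""
    artifact_name = f"app-{commit_id}"
    # Placeholder for the real build step (compile, test, package).
    stores.artifacts[artifact_name] = f"built({source})"
    # Placeholder for building an image that wraps the stored artifact.
    image_tag = f"app:{commit_id}"
    stores.images[image_tag] = {"artifact": artifact_name}
    return image_tag
```

Tagging both the artifact and the image with the commit identifier is one simple way to keep the three repositories traceable to each other.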
Container registries may also be the final target of a CI/CD pipeline. If the purpose of the code is to be a reusable
common component in a cluster, for example a logging service, it will likely be held in the container registry for use
in all container clusters.
SECURITY AND REPOSITORIES
Security in the DevOps toolchain has increasingly become a problem. Too many DIY toolchains don’t include security
scanning to ensure that code with known vulnerabilities is not injected into production systems. Thankfully, more
repositories of all types have added security scanning so that code and artifacts with security vulnerabilities do not
make it through to production. Almost all commercial artifact repositories and container registries will scan for
vulnerabilities when something is loaded into them. Some will scan when something is pulled from the repository as
well. Open source container registries, such as Harbor, will also scan and sign containers.
The best form of security for repositories of all sorts is continuous scanning. With ingress and egress scanning, the
assumption is that the repository itself is secure and invulnerable to compromise. This is not a valid assumption. It is
always possible that a bad actor might compromise the repository itself and inject unsafe code or other artifacts. In
response, continuous scanning is becoming more common. With continuous scanning, code, build artifacts, and
containers are scanned while in the repository so that any vulnerabilities introduced after objects were uploaded are
detected and dealt with.
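The difference between scanning only at ingress and scanning continuously can be shown with a small sketch. The scanner below is deliberately naive (it looks for a marker string), and the names `KNOWN_BAD_MARKERS`, `scan`, and `ScannedRepository` are assumptions made for illustration, not a real scanning API.

```python
KNOWN_BAD_MARKERS = {"evil-lib-1.0"}  # hypothetical vulnerability signatures


def scan(content: str) -> bool:
    """Return True when the content contains no known-bad marker."""
    return not any(marker in content for marker in KNOWN_BAD_MARKERS)


class ScannedRepository:
    """Toy repository that scans on upload and can re-scan stored objects."""

    def __init__(self):
        self.items = {}  # name -> stored content

    def upload(self, name: str, content: str) -> None:
        # Ingress scanning: reject anything that fails at upload time.
        if not scan(content):
            raise ValueError(f"{name} failed the ingress scan")
        self.items[name] = content

    def rescan_all(self) -> list:
        # Continuous scanning: re-check everything already stored.
        return sorted(n for n, c in self.items.items() if not scan(c))
```

If stored content is altered after upload, the ingress check never sees it, but a periodic call to `rescan_all` would report it.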
Another security feature of container registries and artifact repositories is signing. By cryptographically signing an
image or artifact, the developer can be confident that the image or artifact has not been compromised or changed
before or during its time in the repository.
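The idea behind signing can be sketched with Python's standard library. Note the assumption: this example uses an HMAC with a shared secret key as a stand-in, whereas production registries typically rely on public-key signatures. The function names are illustrative only.

```python
import hashlib
import hmac


def sign(content: bytes, key: bytes) -> str:
    """Produce a keyed signature over the artifact or image bytes."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify(content: bytes, key: bytes, signature: str) -> bool:
    """Check that the content still matches the signature made at upload."""
    return hmac.compare_digest(sign(content, key), signature)
```

A consumer that pulls an image would call `verify` with the signature recorded at upload time; any change to the bytes while in the repository makes verification fail.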
In addition to the native security scanning, there are companies that can create a security overlay to the CI/CD
process and the repositories that are part of the pipelines. Companies such as Snyk and Aqua integrate with
repositories of different sorts in order to find vulnerabilities held within. These companies are especially
important when CI/CD pipelines are built on open source or older technology which may not have integrated scanning
as a feature.
A ONE SIZE FITS ALL SOLUTION
Is there a one size fits all solution? The answer is yes, but with a hefty disclaimer. Almost any type of storage (even
cloud or on-premises folders and directories) can be used to store source code, artifacts, and containers. This is not
an accepted best practice because features such as version control and access control are important qualities of
repositories. By the same token, it has been common practice to store all types of code and artifacts in an SVC
repository, especially Git. This is as close to a common denominator as exists today.
Common practice is not always best practice. Given the diversity of objects under management, a one size fits all
approach is not recommended. In addition, different types of objects are shared in different ways, especially
containers. Integrations with platforms such as Kubernetes are important for container repositories while build
artifacts have to be accessible by continuous deployment systems. Purpose built repositories are better tuned to the
different functions of their objects.
VENDOR AND OPEN SOURCE LANDSCAPE
As the toolchain needs of organizations have continued to grow, the vendor and open source landscape that supports
them has expanded as well. Below find a sample of important open source projects and representative vendors that
populate this segment of the IT market.
Table 1: Vendor and Open Source Landscape
Name | Type | License | Website
Git | SVC | Open Source | https://git-scm.com/
Mercurial | SVC | Open Source | https://www.mercurial-scm.org/
Harbor | Container Registry | Open Source | https://goharbor.io/
GitHub (Git) | SVC | Commercial[xi] | https://github.com/git
GitLab (Git) | SVC | Commercial | https://GitLab.com/
Atlassian Bitbucket (Git) | SVC | Commercial | https://bitbucket.org/
JFrog Artifactory | Artifact Repository, Container Registry | Commercial | https://jfrog.com/artifactory/
Docker Hub | Container Registry | Commercial | https://hub.docker.com/
Red Hat Quay (Project Quay) | Container Registry | Commercial[xii] | https://www.projectquay.io/, https://www.openshift.com/products/quay
Google Container Registry | Container Registry | Commercial | https://cloud.google.com/container-registry/
Azure Repos (Git, TFVC) | SVC | Commercial | https://azure.microsoft.com/en-us/services/devops/repos/
AWS Elastic Container Registry | Container Registry | Commercial | https://aws.amazon.com/ecr/
AWS CodeCommit | SVC | Commercial | https://aws.amazon.com/codecommit/
Sonatype Nexus | Artifact Repository | Commercial | https://www.sonatype.com/product-nexus-repository
OPEN SOURCE
Open source has played a significant part in the repository market. The majority of SVCs today are either Git itself or
based on Git. Meanwhile, Harbor, a CNCF sponsored open source container registry, is gaining acceptance and
showing up in commercial products such as VMware Tanzu.
Git
Undoubtedly the most common distributed SVC repository, Git is the repository of choice for storing and managing
source code, in no small part due to the commercial, cloud version, GitHub. One of the reasons for Git’s popularity is its
support of workflows beyond check-in/check-out and versioning, especially pull requests. Git employs a graph style
(branching) versioning system. Git also has project management features that enable collaboration on software
projects.
Mercurial
Mercurial, also a distributed SVC, was first released at about the same time as Git (2005). It initially had more traction
than Git because it was designed to make it easier to transition from earlier systems such as Subversion. It was also
quite popular in the Python and Java communities. Mercurial is still in wide use but has been eclipsed by Git, especially
in the open source community.
Harbor
Harbor is an open source container registry project sponsored by the CNCF. It is integrated with, or is a component of, a
number of commercial toolchains and Kubernetes distributions such as VMware Tanzu Kubernetes Grid Integrated
Edition.
COMMERCIAL VENDORS
GitHub
GitHub is a commercial version of the open source Git project. Originally, GitHub was an independent company and
quickly became popular within the open source community. As a cloud service, GitHub made sharing code much easier
geographically and across company boundaries than on-premises systems.
Since GitHub’s acquisition by Microsoft, questions have been repeatedly raised as to its independence and future. So
far, Microsoft has taken a hands-off approach to GitHub and its popularity has not declined. More importantly, GitHub
has continued to advance the technology beyond the typical SVC and project management aspects of Git. For
example, GitHub Actions promises to provide CI/CD capabilities to GitHub, creating a product more akin to GitLab.
GitLab
GitLab is also based on Git. In addition to the traditional Git functionality, GitLab offers an end-to-end solution for the
entire DevOps lifecycle that is delivered as a single application and comes with CI/CD built in, as well as Agile project
management and process monitoring and analytics.
GitLab also has a robust API for IT integration and a merge train feature that chains merges until all dependent code
is merged into the master branch. There is a templated project setup through a GUI, but all pipelines can be customized
by writing YAML.
Atlassian Bitbucket
Atlassian Bitbucket is yet another Git-based SVC. Like most of Atlassian’s products, Bitbucket emphasizes collaboration. Its
strongest feature, however, is its integration with Atlassian’s other DevOps products, Jira and Confluence. In addition,
Atlassian Bitbucket Pipelines, a CI/CD product, is tightly integrated with Bitbucket.
JFrog Artifactory
Artifactory is a popular repository that combines the features of an artifact repository and container registry in one
platform. It stresses enterprise management of build artifacts and container images including governance, security,
and sharing across large enterprises. Artifactory automates key processes related to artifacts including updates and
versioning, similar to what an SVC system does for source code. Recently, JFrog has added a CI/CD tool called
Pipelines that integrates with Artifactory and automates DevOps processes from merge through the Build phase to
deployment.
Docker Hub
Docker, Inc., was one of the early container advocates. Its container image, while in the process of being
supplanted by open source container images such as the CNCF project, containerd, is still the most common in use.
Docker has recently rebooted itself as a developer tools company. One of its core products is the container
registry, Docker Hub. It offers both public and private repositories that provide secure management and sharing of
container images.
Red Hat Quay and Quay.io
Project Quay is an open source container registry project that is primarily supported by the Red Hat division of
IBM. Originally a project (and product) supported by CoreOS, Quay is also a build system for containers. Red Hat Quay
is the commercial version of the open source project. Quay.io is the hosted service offering using the same code base
as the on-prem products.
Google Cloud
The Google Cloud Platform (GCP) has an entire CI/CD toolchain available to deploy code to GCP services. Part of that
toolchain is the Container Registry. The GCP Container Registry accepts containers and other build artifacts, scans
them for security vulnerabilities, and stores them for use by CD pipelines.
Microsoft Azure
Azure, Microsoft’s cloud platform, has been building out its toolchain products. Interestingly, Azure offers tools based
on both newer open source technology and older Windows-centric on-premises products. A good example of this is
Azure Repos, the SVC for Azure. Part of the Azure DevOps toolchain, Azure Repos comes in two flavors: a version based
on the open source Git (like their GitHub product) and another based on Team Foundation Version Control (TFVC).
The latter originated as part of the Visual Studio Online suite.
As Microsoft continues to develop more products based on open source (which is where the future growth in demand
lies), it is expected that the TFVC version will be deprecated.
AWS
Amazon Web Services has a managed container registry product called, not unexpectedly, Amazon Elastic Container
Registry (Amazon ECR). It supports both Docker and OCI container images. Like many AWS products, there is an
emphasis on “elastic”. Amazon ECR can scale both storage and bandwidth easily and uses a pay as you go model. It is
integrated with existing AWS container products such as Amazon Elastic Kubernetes Service and Amazon Elastic Container
Service.
In addition, Amazon has an SVC called AWS CodeCommit based on Git.
Sonatype Nexus Repository
Nexus Repository from Sonatype is a binary object and build artifact repository. It supports integrations with major
build tools such as CircleCI and Jenkins as well as package managers.
CONCLUSION
No CI/CD toolchain can operate without repositories to hold code in various stages of processing. An SVC is
undoubtedly the most important since proper code management is too difficult without it. As toolchains become more
sophisticated, repositories for build artifacts and containers become necessary. The workflows and features needed to
secure and manage code at these stages in the deployment pipeline, especially sharing workflows, are not well
supported in an SVC.
Needing to deploy multiple types of repositories, however, is expensive and inconvenient when building out end-to-
end deployment pipelines. This creates a conundrum for DevOps professionals. On the one hand, there is no one-
size-fits-all solution. On the other hand, complexity in the toolchain is expensive and unreliable.
At the moment, the market is fragmented with SVCs, container registries, and build artifact repository products
offered by different vendors and open source projects. At some point, there may be consolidation in the market but
there is no indication that that is becoming a trend, as is the case with end-to-end CI/CD solutions. The specialization
of repositories, and the added complexity that comes with it, seems to be a fact of life, at least for the time being.
The good news is that there are projects and products that are combining some but not all repository functions.
This is especially true for post-Build artifacts and container images. Also, the mixing of public and private
container registries is starting to emerge to support hybrid cloud implementations. This merging is expected to
continue but it is unclear if there will ever be a commercially viable, universal repository. The range of
functionality that would need to be supported is large and would only be of advantage to a portion of the market.
Tom Petrocelli
Research Fellow
June 2, 2020
ABOUT AMALGAM INSIGHTS
Amalgam Insights is a leading research and advisory firm focused on the financial, programmatic, and cognitive tools
that multiply the value of enterprise technology, including the following research practices:
Technology Expense and IT Subscription Management, Accounting and Business Planning
Technologies, Data Science and Machine Learning, DevOps and Open Source Development,
Talent Management, Learning & Development, and Extended Reality.
TOM PETROCELLI, RESEARCH FELLOW
Tom Petrocelli is a Research Fellow with Amalgam Insights. His areas of interest are developer
tools, IT project efficiency, governance, methodologies, and DevOps. He also looks at how
large regulated companies, especially financial services companies, manage IT projects. Tom
has over 35 years of experience in the IT industry.
Prior to Amalgam Insights, Tom:
Worked for a large, global banking corporation.
Was the research director for Enterprise Social, Mobile and Cloud Applications at Neuralytix.
Was the senior analyst, Social Enterprise at Enterprise Strategy Group.
Held various senior and executive management positions before becoming an analyst.
CONTACT AMALGAM INSIGHTS
Phone: +1 415 754-9686
Website: www.amalgaminsights.com
Twitter: @AmalgamInsights
Disclaimer: Amalgam Insights provides consulting, research and advisory services to a variety of technology
consumers and vendors and may have revenue-based client relationships with companies mentioned in our research.
ENDNOTES
[i] DevOps is a portmanteau of Developer and Operations. In this context, it refers to the tools and processes used to deploy software to highly distributed, clustered environments. This definition is one derived from common usage and not the original meaning. Originally, DevOps meant a team-oriented approach to development that melds the development team and operations team into one unit for purposes of writing and deploying software.
[ii] Agile is a project management methodology based on the Agile philosophy. The Agile development philosophy advocates short development cycles and integrated business-technical teams. The most common implementation of Agile is called Scrum.
[iii] Infrastructure as Code, or IaC, is the practice of writing code, usually YAML or JSON, to represent the software and hardware infrastructure configuration in a system. An automation server then executes the code and creates the system configuration. IaC may use a declarative or imperative design.
[iv] Software version control, or SVC, is a repository for code that provides versioning, a history of versions, and check-in/check-out protections to keep one developer from overwriting the changes of another.
[v] A pull request is a request a developer sends to someone authorized to approve the code, or “pull” it, for merging into the main branch. It is often a trigger for a CI process.
[vi] To fork code means to create a new version of the code that will be developed separately. Forked code may, in the SVC sense, only mean the creation of a new branch that may be merged later. In the case of open source, it means creating an entirely separate code base and project such that the development is expected to diverge over time.
[vii] Git reference, https://Git-scm.com/docs
[viii] Kubernetes is an orchestrator for containers. It is emerging as the base for new application architectures deploying container clusters. A container orchestrator provides the basic services for running a container cluster. The most basic of these are instantiating and shutting down containers. Other typical services include autoscaling containers, network management, and APIs for interacting with the cluster. The most common container orchestrator is Kubernetes.
[ix] Containers are a form of lightweight virtualization that is part of the Linux operating system. They employ two Linux capabilities to create containers, cgroups and namespaces. A cgroup limits the amount of resources available to processes while a namespace provides process isolation.
[x] A container runtime downloads a container image, unpacks the image file, and instantiates and runs the container.
[xi] There are a number of commercial products based on the open source Git. From the point of view of a customer, they are commercial products since a license or subscription needs to be purchased.
[xii] Project Quay is the upstream open source container registry project that underlies the Red Hat commercial Quay.