Anything to Topology – A Method and System Architecture to Topologize Technology-Specific Application Deployment Artifacts

Christian Endres1, Uwe Breitenbücher1, Frank Leymann1, and Johannes Wettinger1

1 Institute of Architecture of Application Systems, University of Stuttgart, Germany

The full version of this publication has been presented at CLOSER 2017.
http://closer.scitevents.org/?y=2017

© 2017 SCITEPRESS – Science and Technology Publications, Lda.

@InProceedings{Endres2017_AnythingToTopology,
  author    = {Endres, Christian AND Breitenb{\"u}cher, Uwe AND Leymann, Frank AND Wettinger, Johannes},
  title     = {{Anything to Topology - A Method and System Architecture to Topologize Technology-Specific Application Deployment Artifacts}},
  booktitle = {Proceedings of the 7th International Conference on Cloud Computing and Services Science (CLOSER 2017), Porto, Portugal},
  pages     = {180--190},
  year      = {2017},
  month     = apr,
  publisher = {SCITEPRESS},
  isbn      = {978-989-758-243-1},
}

Anything to Topology - A Method and System Architecture to Topologize Technology-Specific Application Deployment Artifacts

Christian Endres, Uwe Breitenbücher, Frank Leymann, and Johannes Wettinger

Institute of Architecture of Application Systems, University of Stuttgart, Stuttgart, Germany

{christian.endres, uwe.breitenbuecher, frank.leymann, johannes.wettinger}@iaas.uni-stuttgart.de

Keywords: Application Deployment, Topology Crawling, TOSCA, Configuration Management, Chef

Abstract: In recent years, many application deployment technologies have emerged such as configuration management tools, e.g., Chef and Juju, infrastructure and platform technologies, e.g., Cloud Foundry and OpenStack, as well as container-based approaches, e.g., Docker. As a result, many repositories exist which contain executable and heavily used artifacts that can be used with these technologies, e.g., to deploy a WordPress application. However, to automate the deployment of more complex applications, typically, multiple of these technologies have to be used in combination. Thus, often, diverse artifacts stored in different repositories need to be integrated. This requires expertise about each technology and leads to a manual, complex, and error-prone integration step. In this paper, we tackle these issues: We present a method and system architecture that enables crawling repositories in order to transform the contained artifacts into technology-agnostic topology models, each describing the components that get installed as well as their dependencies. We show how these topologies can be combined to model the deployment of complex applications and how the resulting topology can be deployed automatically by one runtime. To prove the feasibility, we developed and evaluated a prototype based on the TOSCA standard and conducted a case study for Chef artifacts.

1 INTRODUCTION

In recent years, Cloud Computing gained a lot of attention as it helps to achieve flexible IT operation (Leymann, 2009). To automate the deployment of Cloud applications, besides the proprietary APIs offered by providers, many additional technologies have been developed that focus on different kinds of functionality. Among these technologies are several configuration management tools, e.g., Ansible, Chef, Juju, and Puppet; infrastructure and platform technologies, e.g., OpenStack and Cloud Foundry; as well as container-based approaches, e.g., Docker. Due to the heavy usage of these technologies, many open-source repositories have emerged that contain executable and heavily used artifacts that can be used by these technologies to deploy the desired application. For example, the Chef Supermarket¹ contains a plethora of cookbooks that can be used by the Chef runtime chef-client (Taylor and Vargo, 2014) to automatically deploy a certain application. Thus, installing, for instance, a WordPress application can be automated efficiently by reusing the cookbook.

¹ https://supermarket.chef.io/

However, reusing such artifacts comes with two challenges to be tackled: (i) Selecting appropriate artifacts often requires deep technology-specific expertise to understand the effect of executing an artifact and to configure the runtime correctly. For example, if a Chef cookbook shall be used to deploy WordPress, the cookbook needs to be analyzed to ensure that exactly the desired configuration gets deployed. In addition, the Chef runtime needs to be configured to deploy the application to virtual machine(s). Unfortunately, efficiently getting a quick overview of the components that get installed by an artifact and their dependencies is often not possible without highly specific domain expertise, especially as intuitive graphical tooling is missing in many technologies.

(ii) While understanding artifacts is a serious challenge, combining them to deploy non-trivial applications is another challenge that needs to be tackled in real-world scenarios. For complex applications, typically, multiple management technologies have to be integrated (Breitenbücher et al., 2013): the APIs of Cloud providers must be invoked to deploy virtual machines, whereas configuration management technologies, for example, may be used to deploy the desired components on the provisioned virtual machines.

However, such combinations often require enormous expertise when multiple heterogeneous services need to be orchestrated, low-level technologies have to be wrapped, and diverse data formats must be integrated, to name a few challenges (Eilam et al., 2011). Thus, manually executing these steps is error-prone, time-consuming, and, therefore, not efficient (Breitenbücher et al., 2014). In this paper, we tackle these issues by introducing the Topologize method.

We present the Topologize Method and System Architecture that enables automated crawling of different kinds of repositories, e.g., Chef Supermarket, in order to transform the contained technology-specific artifacts into technology-agnostic topology models. Each generated topology model is a directed, labeled graph describing the components that get installed by a certain artifact as well as the relations between the components. Thus, the generated topology models ease understanding the functionality of artifacts since graphs can be interpreted without requiring any expertise about the employed technology, the artifact, and its serialization format. Moreover, we show that these generated topology models can be combined in a technology-agnostic manner to model the deployment and provisioning of complex applications using a single runtime. Thus, no manual integration of different technologies is required if diverse artifacts need to be combined. To achieve this, we combine our method and system architecture with the TOSCA standard (OASIS, 2013b) that provides a sophisticated means to integrate arbitrary kinds of management technologies. To validate the practical feasibility of our approach, we developed a prototype that is integrated with the OpenTOSCA Ecosystem (Binz et al., 2013a), a standards-based implementation of the TOSCA standard. Moreover, we conducted a case study based on the configuration management technology Chef to show how the presented architecture and concepts can be applied to a technology.

The remainder of this paper is structured as follows. In Section 2, we motivate our work. In Section 3, we introduce our Topologize method, which enables crawling repositories and transforming the contained artifacts into technology-agnostic topology models. In Section 4, we present the Topologize System Architecture that describes a system capable of automatically executing this method. Section 5 introduces TOSCA. To validate the feasibility of our approach, in Section 6, we describe a prototypical implementation of this system architecture. In Section 7, we describe a case study in which we apply our method to the configuration management technology Chef and evaluate the prototype. Section 8 describes related work, and Section 9 concludes the paper and outlines future work.

2 MOTIVATION

Many deployment automation technologies, e.g., configuration management technologies such as Ansible (Mohaan and Raithatha, 2014), Chef (Taylor and Vargo, 2014), or Puppet (Uphill, 2014), come with huge open-source repositories that contain a plethora of artifacts usable for deployment. Typically, these artifacts, e.g., scripts, have to be adapted, deployed, and executed in the correct order to install the desired application. The Chef Supermarket is one example that provides cookbooks for installing different kinds of applications, e.g., middleware components or database systems. Other examples are GitHub repositories containing source code of applications and scripts for building and deploying the application. Furthermore, the documentation about artifacts and how to execute them is typically available in natural language. However, to ensure that the desired deployment and installation are achieved, artifacts and their implications must be analyzed and understood in detail to avoid undesired configurations or, in general, undesired results. Unfortunately, correctly interpreting all effects of executing such artifacts typically requires deep technical expertise about the used technology because the mentioned technologies employ different approaches, meta models, and serialization formats.

In particular, the heterogeneity and diversity of deployment automation technologies lead to serious integration challenges if multiple technologies have to be combined (Breitenbücher et al., 2013). To achieve this, workflow technology is often used for orchestration purposes (Arshad et al., 2007; Bellavista et al., 2013; Breitenbücher et al., 2014; Keller and Badonnel, 2004; Mietzner et al., 2009). However, even if an orchestration approach is used for integrating different technologies, (i) the individual artifacts and their effects must still be understood to achieve the desired goals, (ii) the orchestration flow must be specified, and (iii) wrappers need to be implemented and configured. In addition, the used runtimes must be installed, maintained, and updated, which typically takes a serious amount of time (Brown and Hellerstein, 2005). Thus, a normalized model is desirable that only describes the desired application and its deployment without the technical details of the technologies used to deploy distinct parts of the model.

For many technologies, these artifacts are files that reference other files; thus, the files are linked. However, inspecting all the possible dependencies manually to determine the components that get installed is significantly more error-prone, knowledge-intensive, and time-consuming than having a quick look at a structured graphical diagram such as a topology.

Figure 1: Overview of the Topologize method and its steps. [Diagram: the steps Specify Resources, Crawl Artifacts, Extract Components, Derive Topology Models, and Make Topology Models Deployable (optional).]

3 THE TOPOLOGIZE METHOD

In this section, we introduce the Topologize method and its five steps. Figure 1 depicts the Topologize method, its steps, and the transformation of crawled, technology-specific artifacts to topologies.

3.1 Goal of the Topologize Method

The goal of the method is to crawl different kinds of repositories, e.g., the Chef Supermarket, in order to transform the contained technology-specific artifacts into technology-agnostic topology models. The generated topology models (i) support the understanding of artifacts, which is typically not trivial as correctly interpreting the flat and linked files of a certain technology and its serialization format often requires deep technical expertise, cf. Section 2. Moreover, (ii) the generated topology models can be easily combined in a technology-agnostic manner to model the deployment of more complex applications that consist of different building blocks. If, e.g., an artifact describes the automated installation of a database setup while another artifact of a different technology describes the deployment of a Web-frontend, combining the technologies directly is error-prone and knowledge-intensive, in contrast to combining only technology-agnostic topology models. We will introduce a system architecture in Section 4 that enables automating the deployment of the resulting topology models by one runtime.

3.2 Step 1: Specify Resources

There are various kinds of repositories available, e.g., many configuration management technology communities, such as those of Ansible, Chef, and Puppet, provide open-source repositories that contain various artifacts for the technologies. One example is the Chef Supermarket providing cookbooks for different kinds of applications, middleware components, database systems, and more. In the first step of the method, the repositories to be crawled are specified.

3.3 Step 2: Crawl Artifacts

In step two, the specified repositories are crawled for artifacts. Queries may be specified that describe characteristics of the artifacts to be captured, e.g., that only database installation artifacts shall be crawled.

3.4 Step 3: Extract Components

In step three, all artifacts are analyzed in a technology-specific manner to determine the components, their characteristics, and meta information. These components are stored separately in the form of definitions documents, and meta information, e.g., the license information, is attached to these documents.

3.5 Step 4: Derive Topology Models

In step four, the analyzed artifacts and their meta information are interpreted in a technology-specific manner in order to derive the structure of the application that gets installed by an artifact, i.e., the components that are installed and their relationships. This structure is captured in the form of a topology model, which is a technology-agnostic, directed, labeled graph wherein nodes describe the components and edges their dependencies. The generated topologies serve as normalized models that describe the functionality of artifacts independently of the actual technologies. Thus, they can be understood by non-experts as no expertise about technologies is required. For serializing topology models, the OASIS TOSCA standard (OASIS, 2013b) can be used, for which a visual notation is available (Breitenbücher et al., 2012).
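To make the notion of a topology model concrete, the following Python sketch represents one as a directed graph with labeled edges. It is a minimal illustration and not part of the described prototype: the class names `Edge` and `TopologyModel`, the edge label `dependsOn`, and the WordPress example components are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    label: str  # illustrative labels, e.g. "dependsOn" or "hostedOn"


@dataclass
class TopologyModel:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_edge(self, source, target, label):
        # Adding an edge implicitly registers both components as nodes.
        self.nodes.update((source, target))
        self.edges.append(Edge(source, target, label))

    def dependencies_of(self, node):
        # Outgoing edges of a node, as (label, target) pairs.
        return [(e.label, e.target) for e in self.edges if e.source == node]


# Hypothetical example: a WordPress artifact that needs PHP and MySQL.
model = TopologyModel()
model.add_edge("WordPress", "PHP", "dependsOn")
model.add_edge("WordPress", "MySQL", "dependsOn")
print(model.dependencies_of("WordPress"))
```

A reader can answer "what gets installed, and what depends on what" from such a structure without knowing the technology of the original artifact, which is the point of the normalization.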

3.6 Step 5: Make Topologies Deployable

In this (optional) step, the topology models are refined to enable their automated deployment if the original artifacts did not include all required information. For example, if a Chef cookbook only describes a database installation, operating system and virtual machine nodes are added to the topology model.
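The refinement in this step can be pictured with a small Python sketch. It assumes one possible completion rule that is not prescribed by the method: every component without a host is placed on a generic operating system node, which in turn is hosted on a virtual machine node. The function name `make_deployable` and the node names are hypothetical.

```python
def make_deployable(nodes, hosted_on,
                    os_node="OperatingSystem", vm_node="VirtualMachine"):
    """Return completed copies of the node set and the hostedOn mapping."""
    nodes = set(nodes)
    hosted_on = dict(hosted_on)
    # Components that have no host yet (the VM itself is the bottom layer).
    unhosted = sorted(n for n in nodes
                      if n not in hosted_on and n != vm_node)
    if unhosted:
        nodes.update((os_node, vm_node))
        hosted_on[os_node] = vm_node
        for component in unhosted:
            if component != os_node:
                hosted_on[component] = os_node
    return nodes, hosted_on


# Hypothetical cookbook that only describes a database installation.
completed_nodes, completed_hosting = make_deployable({"MySQL"}, {})
print(sorted(completed_nodes))
print(completed_hosting)
```

Because the virtual machine node is excluded from the unhosted set, applying the function a second time leaves the model unchanged.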

Figure 2: The general system architecture implementing the Topologize method. [Diagram: a Crawler and a Topologizer (containing the Component Recognizer and Builder and the Topology Recognizer and Builder), each with a Plug-in Registry and plug-ins; an Artifact Repository, a Components Repository, and a Topologies Repository; a Topology Modeling Tool; and a Topology Provisioning Engine. The numbers 1 to 8 mark the steps referenced in the text.]

4 SYSTEM ARCHITECTURE

In this section, we describe a system architecture for implementing the Topologize method proposed in Section 3. The system architecture is sketched in Figure 2. At the top left, databases are depicted that represent repositories containing technology-specific artifacts. These repositories are (1) specified by stating the resource location and type and registered at the Crawler. This corresponds to the first step Specify Resources of the Topologize method.

Then, (2) the Crawler discovers and identifies artifacts in the defined repository by executing an appropriate plug-in. Each plug-in comprises the specific functionality to process a location specification, identify contained artifact representations, and obtain the artifacts. Also, meta information about artifacts is gathered, e.g., name and version. Then, (3) each crawled artifact and its information are stored within the Artifact Repository. This corresponds to the second step Crawl Artifacts of the Topologize method.
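The plug-in mechanism of the Crawler could look roughly like the following Python sketch. All names (`CrawlerPlugin`, `InMemoryPlugin`, `Crawler`, the `demo://repo` location, and the cookbook versions) are hypothetical, and the in-memory catalog merely stands in for a real repository; the prototype described later is written in Java and may be structured differently.

```python
from abc import ABC, abstractmethod


class CrawlerPlugin(ABC):
    """Technology-specific logic for one artifact type."""
    artifact_type = None

    @abstractmethod
    def crawl(self, location):
        """Return (name, version) tuples of the artifacts found at location."""


class InMemoryPlugin(CrawlerPlugin):
    # Stand-in for a real repository client; the catalog is fake data.
    artifact_type = "cookbook"

    def __init__(self, catalog):
        self.catalog = catalog

    def crawl(self, location):
        return list(self.catalog.get(location, []))


class Crawler:
    def __init__(self):
        self._plugins = {}             # plug-in registry keyed by artifact type
        self.artifact_repository = []  # crawled artifacts and their meta data

    def register(self, plugin):
        self._plugins[plugin.artifact_type] = plugin

    def crawl(self, artifact_type, location):
        found = self._plugins[artifact_type].crawl(location)
        self.artifact_repository.extend(found)
        return found


crawler = Crawler()
crawler.register(InMemoryPlugin({"demo://repo": [("mysql", "1.0.0"), ("php", "2.0.0")]}))
found = crawler.crawl("cookbook", "demo://repo")
print(found)
```

Keeping the registry keyed by artifact type means that supporting a further technology only requires registering another plug-in.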

The Topologizer processes the artifacts, interprets the contained component and structure information, and derives topology models. These models may contain components whose requirements are only partly satisfied. Thus, additional information or additional components may be required for the model to be deployable. To be able to analyze technology-specific artifacts, (4) a plug-in mechanism is implemented within the Topologizer. The Component Recognizer and Builder (5) analyzes technology-specific artifacts and extracts technology-agnostic component definitions and meta information about them, for example, requirement statements. All components and their information are stored in the Components Repository. This corresponds to the third step Extract Components of the Topologize method.

The Topology Recognizer and Builder (6) interprets component information and infers topology models. The components within the Components Repository expose their requirements; using this information, the Topology Recognizer and Builder searches for components within the Components Repository that expose appropriate capabilities. To build topology models, the Topology Recognizer and Builder creates nodes for each component and connects them with edges according to the requirement statements, cf. Section 6. The resulting topology models are stored within the Topology Model Repository. This corresponds to the fourth step Derive Topology Models of the Topologize method.
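The lookup of components that expose a matching capability can be sketched as follows. This is an assumption about how such a search might be organized, not the paper's implementation: the dictionary layout with `capabilities` and `requirements` keys, the function names, and the example components are hypothetical.

```python
def index_capabilities(components):
    """Map each capability name to the components that expose it."""
    index = {}
    for name, info in components.items():
        for capability in info["capabilities"]:
            index.setdefault(capability, []).append(name)
    return index


def match_requirements(component, components):
    """For each requirement of a component, list candidate providers."""
    index = index_capabilities(components)
    return {req: index.get(req, [])
            for req in components[component]["requirements"]}


# Hypothetical content of a Components Repository.
components = {
    "wordpress": {"capabilities": ["wordpress"], "requirements": ["php", "mysql", "linux"]},
    "php": {"capabilities": ["php"], "requirements": []},
    "mysql-server": {"capabilities": ["mysql"], "requirements": []},
}
matches = match_requirements("wordpress", components)
print(matches)
```

An empty candidate list, as for `linux` here, marks a requirement that stays unsatisfied and has to be resolved later when the model is made deployable.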

The derived models may not be provisionable because the original artifacts may not state all required information, e.g., credentials, or do not define the target infrastructure, e.g., a specific application server. Thus, (7) a modeling environment enables the user to manually customize the topology model, or it completes the topology model in an automated manner. The topology model is packaged and (8) provided to a suitable Topology Provisioning Engine for provisioning instances of the modeled application. This corresponds to the fifth step Make Topology Models Deployable of the Topologize method. As a result, technology-agnostic topology models of technology-specific artifacts can be maintained automatically and combined into arbitrarily complex topologies using the technology-agnostic modeling tool.

5 THE TOSCA STANDARD

In this section, we provide the fundamentals of the Topology and Orchestration Specification for Cloud Applications (TOSCA) (OASIS, 2013a; OASIS, 2013b) that is used in the prototype to serialize topology models, cf. Section 6. To keep this background brief and compact, details are omitted. They can be found in the specification (OASIS, 2013b), whilst hints for the interpretation of the standard are documented in the TOSCA Primer (OASIS, 2013a).

A topology model, called Topology Template in TOSCA, is a directed graph in which components (vertices) are connected regarding their relations (edges). The components, e.g., a virtual machine or a web application, are called Node Templates and define various characteristics of an instance. The structure of a topology model can be specified by Relationship Templates that connect Node Templates pairwise regarding their interplay. For instance, on the left of Figure 3², the Node Template WebShop is connected with the Node Template Apache-v2.4 via a Relationship Template hostedOn. Furthermore, semantics are specified by typing the Node Templates and Relationship Templates: Node and Relationship Types, respectively, define operations, properties, capabilities, and requirements of that type of component or relationship. For instance, the Ubuntu-v16.04 Node Template references an UbuntuVM-v16.04 Node Type that defines an IP-Address property and that this component is a virtual machine. Thus, for each instance of the Ubuntu-v16.04 Node Template, its IP address is set in the modeled property.

For instantiating Node Templates, various actions have to be executed. For installing a Web Server Node Type, for example, a script can be executed that invokes apt-get install. Such a script is executed on an operating system; thus, the respective Node Template is connected to the operating system Node Template that exposes an operation execute Script. Operations make it possible to create and interact with instances of Node Templates or Relationship Templates, for example, to realize the provisioning of instances. For example, to invoke a script on a distinct operating system Node Template instance, that operating system instance has to be accessed. By modeling the IP address property as input for the execute Script operation, the script invocation targets the correct operating system instance. To foster reusability, such operations and properties are modeled in Node Types and Relationship Types. Additionally, Node Types and Relationship Types support inheritance.
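How properties and operations modeled on types are reused through inheritance can be illustrated with a short Python sketch. It is a simplification and does not reflect the TOSCA XML serialization: the class `NodeType`, the parent type `VirtualMachine`, the property `OSVersion`, and the operation spelling `executeScript` are assumptions made for illustration only.

```python
class NodeType:
    def __init__(self, name, parent=None, properties=(), operations=()):
        self.name = name
        self.parent = parent
        self.properties = list(properties)
        self.operations = list(operations)

    def all_properties(self):
        # Inherited properties first, then the ones defined on this type.
        inherited = self.parent.all_properties() if self.parent else []
        return inherited + [p for p in self.properties if p not in inherited]

    def all_operations(self):
        inherited = self.parent.all_operations() if self.parent else []
        return inherited + [o for o in self.operations if o not in inherited]


# Hypothetical type hierarchy: a generic VM type and an Ubuntu specialization.
virtual_machine = NodeType("VirtualMachine",
                           properties=["IP-Address"],
                           operations=["executeScript"])
ubuntu_vm = NodeType("UbuntuVM-v16.04", parent=virtual_machine,
                     properties=["OSVersion"])

print(ubuntu_vm.all_properties())
print(ubuntu_vm.all_operations())
```

In this sketch the specialized type gains the IP address property and the script operation without redefining them, which is the reuse effect that typing and inheritance are meant to provide.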

² We use the visual notation VINO4TOSCA (Breitenbücher et al., 2012) to render topology models.

Figure 3: An exemplary TOSCA Topology Template. [Diagram: on the left, the Node Template WebShop (PHPWebApplication, DA: PHP App) is hostedOn Apache-v2.4 (ApacheServer-v2.4, IA: deploy App), which is hostedOn Ubuntu-v16.04 (UbuntuVM-v16.04, Property: IP-Address, IA: execute Script). A second stack consists of WebShopDB (MySQLDatabase, DA: DB Schema), hostedOn MySQL-v5.7 (MySQLServer-v5.7, IA: init DB Schema), hostedOn another Ubuntu-v16.04 (UbuntuVM-v16.04, Property: IP-Address, IA: execute Script). Both Ubuntu Node Templates are hostedOn a Node Template of type vSphereHypervisor, and a connectsTo Relationship Template links the two stacks.]

For instantiating the modeled functional components, e.g., a customized web application, the implementation of the components has to be provided by the respective Node Template. TOSCA defines such implementations as Deployment Artifacts provided by Node Templates or Node Types. These Deployment Artifacts have to be shipped and installed in the target environment of the application to be instantiated.

In contrast, in the former example of the application server, the Node Template can be instantiated by executing a script that invokes apt-get install. In this example, the actual implementation of the Node Template is not provided with the model, but is downloaded by apt-get. Such a script is called Implementation Artifact because it is not solely deployed, but also executed. An Implementation Artifact can be processed in three flavors: (i) It is executed by the TOSCA Runtime Environment in its own environment, e.g., for accessing an operating system via the ssh-client. (ii) It is executed in the application's target environment, as is the case for the script that executes apt-get install. (iii) Implementation Artifacts that refer to, e.g., the interface of a cloud service provider run in their own environment and are called directly. To ship such models and implementations, TOSCA defines the Cloud Service Archive (CSAR). A CSAR is self-contained, contains or refers to all required artifacts, and is portable.

Importantly, the Topology Template shown in Figure 3 clearly shows the structure of the application that gets provisioned without showing any details about how this provisioning is technically executed. Consequently, such models can be understood more easily than technology-specific artifacts such as Chef cookbooks. Thus, TOSCA-based topology models provide a suitable abstraction layer for our purpose.

Figure 4: The prototype we implemented to validate the Topologize method. [Diagram: the Chef Supermarket, the Crawler with a Chef plug-in, local disk storage, the Topologizer (Component Recognizer and Builder, Topology Recognizer and Builder), TOSCAfy, MongoDB, Winery, and OpenTOSCA. The numbers 1 to 8 mark the steps referenced in the text.]

6 PROTOTYPE

In Section 4, we introduced an architecture for the Topologize method, cf. Section 3. In this section, we describe our prototypical implementation for the Topologize method using TOSCA, cf. Section 5. The prototype is implemented using Java 8³. Details about the Chef-specific implementation follow in Section 7.

For the step Specify Resources, the location definition (1) comprises the artifact type, location, and access protocol to a repository containing technology-specific artifacts and information. This definition (2) is provided to the Crawler, a Java program that accesses the repository using an appropriate plug-in. The identified artifacts (3) are downloaded to disk and some meta information, for example, name and version, is stored. This corresponds to the step Crawl Artifacts. We implemented a corresponding plug-in for Chef; details are provided in the next section.

Within the Topologizer, the Component Recognizer and Builder accesses the disk and processes the technology-specific artifacts. Each artifact is unpacked and its contents are analyzed, cf. Section 7. Then, (4) the artifact is transformed into a TOSCA Node Type using TOSCAfy⁴. TOSCAfy is an open-source framework for analyzing artifacts and generating TOSCA Node Types. The resulting CSAR (5) contains the Node Type and is downloaded to disk. The artifact is stored as Implementation Artifact at the Node Type. Furthermore, meta data resulting from the analysis (5) are stored in a MongoDB⁵ database. MongoDB was chosen because of its ability to handle documents that are not mapped to a relational schema, e.g., TOSCA XML and meta data JSON documents. This corresponds to the step Extract Components of the Topologize method.

³ https://www.java.com/
⁴ https://github.com/toscafy/
⁵ https://www.mongodb.com/

Within the Topologizer, the structure information available in the found artifacts is processed by the Topology Recognizer and Builder. For deriving topology models, it tries to satisfy the exposed requirements with the capabilities of components within Node Types that were found by the Component Recognizer and Builder and are located in the Components Repository. Through this exploration, topology models can be built. This is implemented with an adaptation of depth-first search (Tarjan, 1972).
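The exploration can be pictured as a plain depth-first traversal over requirement statements. The following Python sketch is not the adapted algorithm used in the prototype, whose details are not given here; the function `derive_topology`, its parameters, and the cookbook names and dependencies are hypothetical.

```python
def derive_topology(root, depends, known):
    """Depth-first traversal of requirements starting at a root component.

    depends: maps a component to the names of the components it requires.
    known:   components available in the (hypothetical) Components Repository.
    """
    nodes, edges, unsatisfied = [], [], []
    visited = set()

    def visit(component):
        visited.add(component)
        nodes.append(component)
        for required in depends.get(component, []):
            if required not in known:
                # No component provides this requirement; record and move on.
                unsatisfied.append((component, required))
                continue
            edges.append((component, required))
            if required not in visited:
                visit(required)

    visit(root)
    return nodes, edges, unsatisfied


# Illustrative data only; real cookbooks declare different dependencies.
depends = {"wordpress": ["php", "mysql"], "php": ["apache2"], "mysql": ["openssl"]}
known = {"wordpress", "php", "mysql", "apache2"}
nodes, edges, unsatisfied = derive_topology("wordpress", depends, known)
print(nodes)
print(edges)
print(unsatisfied)
```

The visited set keeps the traversal finite even when requirement statements form a cycle.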

Duplicate requirements stated by different artifacts within the same topology model are eliminated according to Binz et al. (2013b). This may lead to a topology model that does not exactly reflect the structure of the original artifact, but it is necessary to avoid representing duplicate requirements, e.g., Java, multiple times as a Node Template. Each derived topology model is serialized using TOSCA, cf. Section 5, for example, by using TOSCAfy to define and generate a TOSCA Definitions document containing the topology model, all components and relations, and the artifact resources. Finally, the CSAR (6) is stored to the local disk. This corresponds to the step Derive Topology Models of the Topologize method.
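The effect of eliminating duplicates can be shown with a simple name-based merge. This Python sketch is a deliberately naive stand-in and not the approach of Binz et al. (2013b); the function `merge_duplicates`, the node identifiers, and the example components are hypothetical.

```python
def merge_duplicates(nodes, edges):
    """Merge nodes that represent the same component.

    nodes: maps a node id to the component name it represents.
    edges: list of (source id, target id) pairs.
    """
    canonical = {}  # component name -> first node id seen for it
    remap = {}      # every node id -> surviving node id
    for node_id, name in nodes.items():
        remap[node_id] = canonical.setdefault(name, node_id)

    merged_nodes = {i: n for i, n in nodes.items() if remap[i] == i}
    merged_edges = []
    for source, target in edges:
        edge = (remap[source], remap[target])
        if edge not in merged_edges:
            merged_edges.append(edge)
    return merged_nodes, merged_edges


# Two artifacts each brought their own Java node (n2 and n4).
nodes_in = {"n1": "tomcat", "n2": "java", "n3": "jenkins", "n4": "java"}
edges_in = [("n1", "n2"), ("n3", "n4")]
merged_nodes, merged_edges = merge_duplicates(nodes_in, edges_in)
print(merged_nodes)
print(merged_edges)
```

After the merge, both dependents point at the same Java node, so the resulting graph no longer mirrors either original artifact exactly.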

Often, derived topology models are not provisionable directly because of missing information. Therefore, the CSARs are not only stored locally on disk, but also (8) sent to the TOSCA Modeling Tool Winery⁶, which enables the graphical and user-friendly customization of the derived topology models. Thus, a user can inject credentials and even customize the whole TOSCA Topology Template. Moreover, TOSCA completion algorithms can automatically complete the topology model if components are missing (Hirmer et al., 2014), for example, to inject a Cloud provider Node Template, which is typically not described in Chef cookbooks. This corresponds to the step Make Topology Models Deployable.

⁶ https://projects.eclipse.org/projects/soa.winery

Based on the standardized TOSCA metamodel, topology models generated out of technology-specific artifacts can be combined easily to compose more complex applications: the crawled topology models can be used as building blocks in a technology-agnostic manner for the development of new applications on the TOSCA layer. The graphical modeling tool Winery (Kopp et al., 2013) can be used, e.g., to merge such topology models. Thus, the resulting merged topology models may contain Implementation Artifacts of different technologies, which is inherently supported by the TOSCA standard. To deploy such merged topology models that contain Node Templates and Types having Implementation Artifacts implemented in different technologies, we developed a plan generator (Breitenbücher et al., 2014) within the OpenTOSCA Runtime Environment (Binz et al., 2013a) that is capable of executing different kinds of artifacts. The generator supports technologies such as Chef, Ansible, and Docker. Thus, with our approach, artifacts of different technologies can be combined in a technology-agnostic manner on the TOSCA layer, while the resulting topology model can be deployed automatically by a single runtime, in this prototype the OpenTOSCA ecosystem⁷.

7 CASE STUDY: CHEF

In Section 6, we introduced a prototype that generically implements the Topologize method. In this section, we provide a case study showing how to apply and implement the method for Chef cookbooks.

7.1 Specify the Chef Resources

In the first step, we specify the artifact location and type. Chef Software Inc. itself provides a publicly available repository for Chef cookbooks: the Chef Supermarket⁸. Therefore, we specified the HTTP API of the Chef Supermarket as location and the artifact type cookbook. For crawling the Chef Supermarket, we implemented a respective plug-in that processes the location specification.

7.2 Crawl for Chef Cookbooks

In the second step, the Crawler searches for artifacts at the specified location. The plug-in for the Chef Supermarket employs the JAX-RS Client API of Jersey⁹ to download the cookbooks and retrieve meta information. Most of the information and artifacts were accessible; some cookbooks could not be processed because they could not be downloaded.

⁷ http://www.opentosca.org
⁸ https://supermarket.chef.io/
⁹ https://jersey.java.net/

7.3 Extract the Components

In the third step, the found cookbooks are analyzed to identify components, extract their characteristics, and build TOSCA Node Types. A cookbook states its attributes and templates, includes recipes and files, and provides necessary extensions to the chef-client that enable it to instantiate the cookbook. Therefore, such a cookbook corresponds to a Node Type. The detailed mapping of cookbooks to Node Types can be found in (Wettinger et al., 2014b). For building TOSCA Node Types, we used TOSCAfy.

Besides the transformation to a Node Type, the requirement information of the cookbook has to be extracted. The chef version, ohai version, and depends entries are distinct requirements, but the supported platforms have to be treated differently: a cookbook instance cannot be installed on, e.g., Windows and Linux at once; thus, these requirements have to be treated as mutually exclusive. Chef has no cookbooks for installing, e.g., Linux; thus, the platform requirements cannot be satisfied by Chef cookbooks. This information is needed later for deriving the topology models with regard to the capabilities and requirements of the cookbooks.
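The distinction between requirements that must all hold and platform requirements that are alternatives can be sketched in Python as follows. This is a minimal illustration, not the prototype's data model: the input is a simplified, hypothetical cookbook description given as a dictionary, the keys chef_version, ohai_version, depends, and platforms merely mirror the terms used above, and the names Requirements, extract_requirements, and platform_satisfied are assumptions. Treating an empty platform set as "no restriction" is likewise an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Requirements:
    # Requirements that must all hold for a cookbook instance.
    required: dict = field(default_factory=dict)
    # Alternatives: one instance runs on exactly one of these platforms.
    platforms: frozenset = frozenset()


def extract_requirements(cookbook: dict) -> Requirements:
    """Split a simplified, hypothetical cookbook description into requirement groups."""
    required = {}
    for key in ("chef_version", "ohai_version"):
        if key in cookbook:
            required[key] = cookbook[key]
    # Each dependency on another cookbook is a distinct requirement.
    for name, constraint in cookbook.get("depends", {}).items():
        required[f"cookbook:{name}"] = constraint
    return Requirements(required, frozenset(cookbook.get("platforms", [])))


def platform_satisfied(requirements: Requirements, host_platform: str) -> bool:
    """One matching platform suffices; an empty set is assumed to mean no restriction."""
    return not requirements.platforms or host_platform in requirements.platforms


# Illustrative input, not taken from a real cookbook.
example = {
    "chef_version": ">= 12.1",
    "depends": {"apt": ">= 0.0.0"},
    "platforms": ["ubuntu", "windows"],
}
reqs = extract_requirements(example)
print(sorted(reqs.required))
print(platform_satisfied(reqs, "ubuntu"), platform_satisfied(reqs, "solaris"))
```

Keeping the platforms in a separate set makes the mutual exclusion explicit: the platform check asks whether the single host platform is among the alternatives, rather than requiring every listed platform at once.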

7.4 Derive Topology Models

The third step builds up an extensive repository of TOSCA Node Types that contains many cookbooks with their requirements. By transitively traversing all requirements, cf. Section 6, topology models are constructed that represent the requirements graph of the initial cookbook. Using TOSCAfy, the TOSCA Definitions containing the constructed topology model and the cookbooks are packaged into a CSAR.
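The transitive traversal of requirements can be sketched as an iterative depth-first search in Python. This is a minimal sketch under stated assumptions: the repository is modeled as a plain dictionary from a component name to the names it requires, requirements that are missing from the repository are skipped as unresolved, and the function name derive_topology and the sample data are hypothetical.

```python
def derive_topology(root, repository):
    """Collect all components reachable from root and the dependsOn edges between them.

    repository maps a component name to the list of component names it requires.
    Requirements absent from the repository are treated as unresolved and skipped.
    """
    nodes, edges = [], []
    visited = set()
    stack = [root]
    while stack:
        component = stack.pop()
        if component in visited:
            continue  # already handled; this also terminates cyclic requirements
        visited.add(component)
        nodes.append(component)
        for required in repository[component]:
            if required not in repository:
                continue  # unresolved: not crawled or not processable
            edges.append((component, required))
            stack.append(required)
    return nodes, edges


# Illustrative repository, not actual crawl results.
repository = {
    "java": ["apt", "windows", "homebrew"],
    "windows": ["chef_handler"],
    "chef_handler": [],
    "apt": [],
}
nodes, edges = derive_topology("java", repository)
print(nodes)
print(edges)
```

In this sample, homebrew is not part of the repository, so the derived topology contains four components and three edges. A cookbook whose requirements are all unresolved yields a singleton topology.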

In Figure 5, a Chef metadata.json file is depicted on the left side. This file defines the cookbook's name, version, some meta information, e.g., its description and license, and the cookbook's dependencies that refer to other cookbooks. In this example, the java cookbook defines the cookbooks apt, homebrew, and windows as requirements. Derived from this information, the java component and its resolved requirements are represented as Node Templates in a Topology Template, of which a snippet is depicted on the right. Further resolved requirements of cookbooks are omitted for the sake of brevity.


metadata.json (left):

    {
      "name": "java",
      "description": "Installs Java runtime.",
      "version": "1.47.0",
      "license": "Apache 2.0",
      "dependencies": {
        "apt": ">= 0.0.0",
        "homebrew": ">= 0.0.0",
        "windows": ">= 0.0.0"
      }
    }

TOSCA Topology Template Snippet (right):

    java-1.47.0_NodeTemplate (java-1.47.0_NodeType)
      dependsOn windows-2.1.1_NodeTemplate (windows-2.1.1_NodeType)
      dependsOn apt-5.0.0_NodeTemplate (apt-5.0.0_NodeType)

Figure 5: A Chef metadata.json file mapped to a TOSCA Topology Template representation.
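The mapping shown in Figure 5 can be approximated with a short Python sketch. It assumes that only ">=" version constraints occur and that the newest crawled version satisfying a constraint is chosen; both are simplifications made for illustration, not statements about the prototype. The function names and the dictionary of crawled versions are hypothetical; only the naming pattern name-version_NodeTemplate is taken from the figure.

```python
import json


def parse_version(text):
    """Turn '1.47.0' into the comparable tuple (1, 47, 0)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, constraint):
    """Check a version against a '>= x.y.z' constraint (the only operator handled here)."""
    operator, _, minimum = constraint.partition(" ")
    if operator != ">=":
        raise ValueError(f"unsupported constraint: {constraint!r}")
    return parse_version(version) >= parse_version(minimum)


def node_template_id(name, version):
    return f"{name}-{version}_NodeTemplate"


def map_cookbook(metadata_text, crawled_versions):
    """Return the Node Template id, its dependsOn targets, and unresolved dependency names."""
    metadata = json.loads(metadata_text)
    source = node_template_id(metadata["name"], metadata["version"])
    targets, unresolved = [], []
    for dep_name, constraint in metadata.get("dependencies", {}).items():
        candidates = [
            v for v in crawled_versions.get(dep_name, []) if satisfies(v, constraint)
        ]
        if candidates:
            newest = max(candidates, key=parse_version)
            targets.append(node_template_id(dep_name, newest))
        else:
            unresolved.append(dep_name)
    return source, targets, unresolved


METADATA = """{"name": "java", "version": "1.47.0",
 "dependencies": {"apt": ">= 0.0.0", "homebrew": ">= 0.0.0", "windows": ">= 0.0.0"}}"""

# Hypothetical crawl result; homebrew is left out only to show the unresolved case.
CRAWLED = {"apt": ["5.0.0"], "windows": ["2.1.1"]}

print(map_cookbook(METADATA, CRAWLED))
```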

7.5 Make the Topology Models Deployable

The constructed topology models are not provisionable, because Chef presumes a bootstrapped environment that is not installed by Chef itself. Thus, detailed information about the infrastructure layer is missing. Within the OpenTOSCA ecosystem, the fifth step, Make Topology Models Deployable, can be applied by using Winery, cf. Section 6. Using the topology modeler of Winery, the generated topology model can be completed by adding customized infrastructure information. Thus, using the Topologize method, cookbooks can be used to provision applications to environments that are not bootstrapped. Also, topology models can be composed into arbitrarily complex topology models without requiring expertise in the composed cookbooks. Additionally, Winery serves as the TOSCA repository for the provisioning engine OpenTOSCA, which enables automated provisioning.

7.6 Evaluation Results

In this section, we showed how the Topologize method can be applied to Chef. Using our prototype and the Topologize method, we crawled 3,191 Chef cookbook files from the Chef Supermarket on 17 February 2017 and derived 2,325 topology models from the cookbook artifacts. Figure 6 relates the size of each topology model, i.e., the number of contained components, to the number of topology models derived by the prototype. By a wide margin, most of the derived topology models are singletons: their component either states no requirements, or its stated requirements could not be resolved because they refer to older versions that were not crawled or to cookbooks that could not be processed.

[Plot: logarithmic y-axis "Sum of Topology Models" (ticks 1 to 1024), x-axis "Amount of Components Inside of Each Topology Model" (1 to 13); data labels: 1413, 405, 246, 79, 105, 35, 16, 10, 4, 4, 5, 2, 1.]

Figure 6: Relation of the amount of Node Templates contained in a topology model to the amount of topologies.

[Plot: y-axis "Duration in Seconds" (0 to 80), x-axis "Amount of Components Inside of Each Topology Model" (1 to 13); series "Median", with data labels ranging from 7.3 to 69.4.]

Figure 7: Relation of the amount of Node Templates contained in a topology model to the duration in seconds for analyzing the artifact and constructing the respective topology. The stated durations are the result of 10 measurements.

In Figure 7, we address the time it takes to analyze a cookbook for which a topology model shall be constructed and, based on that knowledge, to derive and construct the topology model from the initial cookbook and all its requirements. These values result from 10 measurements on Ubuntu 16.04 virtual machines with 8 cores at 3.0 GHz and 16 GB RAM. At first glance, a duration of up to 70 seconds for topology models seems long. However, as our goal is an efficient approach for transforming artifacts into technology-agnostic topologies, the Topologize method is suitable, because the time-consuming construction of topology models has to be done only once.
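As a consistency check, the data labels of Figure 6 can be summed and compared with the reported 2,325 topology models. The short Python sketch below assumes the labels read 1413, 405, 246, 79, 105, 35, 16, 10, 4, 4, 5, 2, 1 and that the first label belongs to the singleton topologies, which matches the statement that singletons are the largest group. The function name summarize is hypothetical.

```python
def summarize(labels):
    """Return the total number of topology models and the share of the first group."""
    total = sum(labels)
    return total, labels[0] / total


# Data labels of Figure 6, assumed to be ordered by topology size 1 to 13.
FIGURE_6_LABELS = [1413, 405, 246, 79, 105, 35, 16, 10, 4, 4, 5, 2, 1]

total, singleton_share = summarize(FIGURE_6_LABELS)
print(total)
print(f"{singleton_share:.1%}")
```

The total does not depend on the assumed ordering of the labels; only the singleton share relies on the first label being the count for size 1.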


8 RELATED WORK

Cloud Computing enables many benefits, e.g., on-demand provisioning and resource sharing (Leymann, 2009). Whilst some reduce Cloud Computing to merely packaging functionality into a virtual machine or container, the application in fact usually forms a complex structure that has to provide distinct properties (Leymann et al., 2016). A more detailed view of such architectures can be visualized as a graph of components and their relations, annotated with information about their functional capabilities and requirements, policies of non-functional requirements, interfaces, properties for customization, and definitions about included or referenced resources. Such a graph is called a topology and is defined in the industry-driven standard Topology and Orchestration Specification for Cloud Applications (TOSCA) (OASIS, 2013a; OASIS, 2013b) and refined in (OASIS, 2015).

TOSCA defines a topology model that describes the structure of an application in detail. A topology consists of components that are related to each other, e.g., a web application is hosted on an application server. The components, relations, and other elements are typed to foster reusability. Thus, TOSCA enables manifold benefits: the TOSCA Definitions may contain a model of components and structures with clear semantics that enables modeling complex applications in a visually encoded way (Breitenbücher et al., 2012) using specific tooling. With Winery, the whole complexity of TOSCA is accessible to human modelers in a visual and guided manner (Kopp et al., 2013). By matching unsatisfied requirements of components within the topology with capabilities of other components, topology models can be completed (Hirmer et al., 2014). Thus, TOSCA enables humans to understand and model such complex applications more easily. Additionally, TOSCA enables the automated management of arbitrarily complex applications. For application topologies consisting of components with well-defined provisioning and management functionality, the orchestrated provisioning logic can be generated (Breitenbücher et al., 2014; Eilam et al., 2011). For arbitrarily complex and customized applications, prepared workflows, for example, can be provided directly with the Service Template.

In contrast, Enterprise Topology Graphs (ETGs) define a formal model describing the structure of running enterprise IT to support tasks such as consolidation, migration, or outsourcing by enabling proven graph algorithms on the model (Binz et al., 2012a; Binz et al., 2012b). Such tasks are time-consuming and error-prone if the underlying ETG has to be created manually. Therefore, such ETGs can be generated by an automated discovery that crawls the running enterprise IT (Binz et al., 2013b). However, ETGs do not enable modeling and inferring automated provisioning and management of application instances.

The OpenTOSCA ecosystem enables interpreting TOSCA models and automated provisioning of the modeled applications (Binz et al., 2013a). Such provisioning can be divided into two opposing approaches. With imperative provisioning, all steps necessary for provisioning the application are modeled in full detail, e.g., by using workflow models (Breitenbücher et al., 2013; Keller and Badonnel, 2004; Mietzner et al., 2009). With declarative provisioning, only the application and its characteristics are modeled. Both approaches require a runtime that is able to interpret the models, infer management functionality, and execute it. There are various approaches for automating the provisioning of applications: Ansible, Chef, Planit, and Puppet, to name just a few (Arshad et al., 2007; Mohaan and Raithatha, 2014; Taylor and Vargo, 2014; Uphill, 2014). However, all these approaches are domain- and technology-specific; thus, the user needs specific expertise, and tooling that integrates the heterogeneity of all these technologies is lacking.

Although workflow-model-based orchestration is a working approach to integrate diverse provisioning technologies, the technologies have to be prepared for being orchestrated beforehand (Wettinger et al., 2014a; Wettinger et al., 2014b). All artifacts have to be provisionable and manageable and thus have to expose their interfaces. Despite the variety of interfaces, the Any2API¹⁰ approach allows the functionality of artifacts to be wrapped with high-level APIs, e.g., RESTful web services (Wettinger et al., 2015). Thus, generating distinct models of all components makes it possible to populate topology models, as shown in this paper.

Subsequent to the Any2API approach, TOSCAfy¹¹ is a publicly available, open-source framework that provides two major capabilities: (i) retrieving and analyzing existing technology-specific artifacts, e.g., Chef cookbooks and Docker container images, to extract and normalize their metadata; (ii) generating portable Cloud Service Archives (CSARs) comprising the artifacts. By using TOSCAfy, CSARs are no longer maintained manually as source artifacts, but are generated in a repeatable manner. TOSCAfy is implemented in JavaScript based on Node.js. Moreover, it is integrated with Any2API.

¹⁰ http://www.any2api.org
¹¹ https://github.com/toscafy

However, before the aforementioned approaches can be applied, artifacts have to be obtained. The basic idea of conventional Web crawling follows a straightforward process: "(1) select a URL to crawl, (2) fetch and parse page, (3) save the important content, (4) extract URLs from page, (5) add URLs to queue, and (6) repeat" (Matsudaira, 2014). Following this, a broad variety of crawling approaches exists in research and industry. Consequently, implementing a small-scale crawler, e.g., to fetch a distinct set of documents from the Web and store them 'as is', is merely a programming exercise. However, it is not trivial to implement large-scale crawlers that repeatedly fetch large sets of documents to semantically inspect and normalize their content, detect updates, and classify them. Therefore, several research efforts focus on the design of highly scalable and distributed crawlers to improve performance in large-scale crawling scenarios (Boldi et al., 2004; Da Silva et al., 1999; Shkapenyuk and Suel, 2002; Heydon and Najork, 1999; Thelwall, 2001; Edwards et al., 2001). There are two major categories of crawlers: (i) general-purpose crawlers that fetch and inspect any kind of document, e.g., to populate a search engine or analyze data using mining techniques (Matsudaira, 2014; Thelwall, 2001); (ii) specialized and focused crawlers that only inspect distinct documents (Chakrabarti et al., 1999). Focused crawling is utilized, e.g., to establish a domain-specific knowledge base, which is the purpose of this paper. Therefore, a domain-specific and specialized crawling framework is presented. However, up to now, none of the existing works analyzes application structure information inside of crawled artifacts of configuration management technologies. Therefore, the proposed approach is a novel contribution.
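The six-step crawling process quoted above can be sketched as a small breadth-first loop in Python. This is a toy illustration of the conventional process, not the crawler of the prototype: pages are fetched through an injected fetch function (here backed by an in-memory dictionary instead of HTTP), links are extracted with a simple regular expression, and the names crawl, fetch, is_important, and max_pages are hypothetical.

```python
import re
from collections import deque

HREF = re.compile(r'href="([^"]+)"')


def crawl(seed, fetch, is_important, max_pages=100):
    """Crawl from seed and return the pages judged important, keyed by URL."""
    queue = deque([seed])
    seen = {seed}
    saved = {}
    fetched = 0
    while queue and fetched < max_pages:   # (6) repeat
        url = queue.popleft()              # (1) select a URL to crawl
        page = fetch(url)                  # (2) fetch the page
        fetched += 1
        if page is None:
            continue                       # page could not be retrieved
        if is_important(page):             # (3) save the important content
            saved[url] = page
        for link in HREF.findall(page):    # (2)/(4) parse the page and extract URLs
            if link not in seen:
                seen.add(link)
                queue.append(link)         # (5) add URLs to queue
    return saved


# In-memory stand-in for the Web; URLs and contents are made up.
PAGES = {
    "/index": '<a href="/java">java</a> <a href="/about">about</a>',
    "/java": 'cookbook java <a href="/apt">apt</a> <a href="/index">home</a>',
    "/apt": "cookbook apt",
    "/about": "contact information",
}

result = crawl("/index", PAGES.get, lambda page: "cookbook" in page)
print(sorted(result))
```

A focused crawler in the sense described above differs mainly in the is_important predicate and in which links it chooses to follow.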

9 CONCLUSION

We presented Topologize, which enables (i) crawling technology-specific artifacts, (ii) extracting and abstracting contained component information, and (iii) inferring technology-agnostic topology models that (iv) are provisionable in an automated manner. These models are serialized in TOSCA, which enables the modeling and provisioning of complex, technology-specific applications whilst keeping a modular, customizable, and technology-agnostic topology model. The topologies are generated by satisfying dependencies of contained components. Benefits of topologies are (a) a technology-agnostic representation of the technology-specific implications by showing a component's transitive requirements, (b) automated maintenance of such topologies, (c) support for users in understanding and selecting artifacts and topology models, and (d) customization and combination of topologies in a technology-agnostic manner.

We validated our Topologize method and architecture by implementing a prototype and applying Topologize to Chef. On 17 February 2017, we crawled 3,191 Chef cookbooks from the Chef Supermarket and transformed these technology-specific artifacts into technology-agnostic components serialized in TOSCA. We derived 2,325 topologies that comprise the transitive dependencies as far as they could be satisfied. The constructed topologies contain up to 13 components within one topology model. We evaluated our prototype regarding the duration of analyzing and constructing topologies and packaging them as CSARs, which took between 7.3 seconds and 69.4 seconds depending on the size of the topology.

In this paper, we focused our case study on Chef, which is a well-known configuration management system. In the future, we plan to conduct case studies with other technologies, e.g., Docker. Also, to speed up the analysis and the inference of topology models, we plan to improve the prototype, e.g., by parallelizing the execution of the depth-first search.

ACKNOWLEDGMENTS

This work was partially funded by the BMWi project SePiA.Pro (01MD16013F).

REFERENCES

Arshad, N., Heimbigner, D., and Wolf, A. L. (2007). Deployment and Dynamic Reconfiguration Planning For Distributed Software Systems. Software Quality Journal, 15(3).

Bellavista, P., Corradi, A., Foschini, L., and Pernafini, A. (2013). Towards an Automated BPEL-based SaaS Provisioning Support for OpenStack IaaS. Scalable Computing, 14(4).

Binz, T., Breitenbücher, U., Haupt, F., Kopp, O., Leymann, F., Nowak, A., and Wagner, S. (2013a). OpenTOSCA - A Runtime for TOSCA-based Cloud Applications. In Proceedings of the 11th International Conference on Service-Oriented Computing. Springer.

Binz, T., Breitenbücher, U., Kopp, O., and Leymann, F. (2013b). Automated Discovery and Maintenance of Enterprise Topology Graphs. In Proceedings of the 6th IEEE International Conference on Service Oriented Computing & Applications. IEEE.

Binz, T., Fehling, C., Leymann, F., Nowak, A., and Schumm, D. (2012a). Formalizing the Cloud through Enterprise Topology Graphs. In Proceedings of 2012 IEEE International Conference on Cloud Computing. IEEE.

Binz, T., Leymann, F., Nowak, A., and Schumm, D. (2012b). Improving the Manageability of Enterprise Topologies Through Segmentation, Graph Transformation, and Analysis Strategies. In Proceedings of 2012 Enterprise Distributed Object Computing Conference. IEEE.

Boldi, P., Codenotti, B., Santini, M., and Vigna, S. (2004). Ubicrawler: A Scalable Fully Distributed Web Crawler. Software: Practice and Experience, 34(8).

Breitenbücher, U., Binz, T., Kepes, K., Kopp, O., Leymann, F., and Wettinger, J. (2014). Combining Declarative and Imperative Cloud Application Provisioning based on TOSCA. In International Conference on Cloud Engineering. IEEE.

Breitenbücher, U., Binz, T., Kopp, O., Leymann, F., and Schumm, D. (2012). Vino4TOSCA: A Visual Notation for Application Topologies based on TOSCA. In On the Move to Meaningful Internet Systems: OTM 2012. Springer.

Breitenbücher, U., Binz, T., Kopp, O., Leymann, F., and Wettinger, J. (2013). Integrated Cloud Application Provisioning: Interconnecting Service-Centric and Script-Centric Management Technologies. In On the Move to Meaningful Internet Systems: OTM 2013 Conferences. Springer.

Brown, A. B. and Hellerstein, J. L. (2005). Reducing the cost of IT operations: is automation always the answer? In Proceedings of the 10th Conference on Hot Topics in Operating Systems. USENIX.

Chakrabarti, S., Van den Berg, M., and Dom, B. (1999). Focused crawling: A new approach to topic-specific web resource discovery. Computer Networks, 31(11).

Da Silva, A. S., Veloso, E. A., Golgher, P. B., Ribeiro-Neto, B., Laender, A. H., and Ziviani, N. (1999). Cobweb – a crawler for the brazilian web. In Proceedings of the String Processing and Information Retrieval Symposium and International Workshop on Groupware. IEEE.

Edwards, J., McCurley, K., and Tomlin, J. (2001). An adaptive model for optimizing performance of an incremental web crawler. In Proceedings of the 10th International Conference on World Wide Web. ACM.

Eilam, T., Elder, M., Konstantinou, A., and Snible, E. (2011). Pattern-based Composite Application Deployment. In Proceedings of the 12th IFIP/IEEE International Symposium on Integrated Network Management. IEEE.

Heydon, A. and Najork, M. (1999). Mercator: A Scalable, Extensible Web Crawler. World Wide Web, 2(4).

Hirmer, P., Breitenbücher, U., Binz, T., and Leymann, F. (2014). Automatic Topology Completion of TOSCA-based Cloud Applications. In Proceedings des CloudCycle14 Workshops auf der 44. Jahrestagung der Gesellschaft für Informatik e.V.

Keller, A. and Badonnel, R. (2004). Automating the Provisioning of Application Services with the BPEL4WS Workflow Language. In Proceedings of the 15th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management. Springer.

Kopp, O., Binz, T., Breitenbücher, U., and Leymann, F. (2013). Winery – A Modeling Tool for TOSCA-based Cloud Applications. In Proceedings of the 11th International Conference on Service-Oriented Computing. Springer.

Leymann, F. (2009). Cloud Computing: The Next Revolution in IT. In Proceedings of the 52nd Photogrammetric Week. Wichmann Verlag.

Leymann, F., Fehling, C., Wagner, S., and Wettinger, J. (2016). Native Cloud Applications: Why Virtual Machines, Images and Containers Miss the Point! In Proceedings of the 6th International Conference on Cloud Computing and Service Science. SciTePress.

Matsudaira, K. (2014). Capturing and structuring data mined from the web. Communications of the ACM, 57(3).

Mietzner, R., Unger, T., and Leymann, F. (2009). Cafe: A Generic Configurable Customizable Composite Cloud Application Framework. In On the Move to Meaningful Internet Systems: OTM 2009. Springer.

Mohaan, M. and Raithatha, R. (2014). Learning Ansible. Packt Publishing.

OASIS (2013a). Topology and Orchestration Specification for Cloud Applications (TOSCA) Primer Version 1.0. Organization for the Advancement of Structured Information Standards (OASIS).

OASIS (2013b). Topology and Orchestration Specification for Cloud Applications (TOSCA) Version 1.0. Organization for the Advancement of Structured Information Standards (OASIS).

OASIS (2015). TOSCA Simple Profile in YAML Version 1.0. Organization for the Advancement of Structured Information Standards (OASIS).

Shkapenyuk, V. and Suel, T. (2002). Design and Implementation of a High-Performance Distributed Web Crawler. In Proceedings of the 18th International Conference on Data Engineering. IEEE.

Tarjan, R. (1972). Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1(2).

Taylor, M. and Vargo, S. (2014). Learning Chef: A Guide to Configuration Management and Automation. O'Reilly.

Thelwall, M. (2001). A Web Crawler Design for Data Mining. Journal of Information Science, 27(5).

Uphill, T. (2014). Mastering Puppet. Packt Publishing.

Wettinger, J., Binz, T., Breitenbücher, U., Kopp, O., Leymann, F., and Zimmermann, M. (2014a). Unified Invocation of Scripts and Services for Provisioning, Deployment, and Management of Cloud Applications Based on TOSCA. In Proceedings of the 4th International Conference on Cloud Computing and Services Science. SciTePress.

Wettinger, J., Breitenbücher, U., and Leymann, F. (2014b). Standards-based DevOps Automation and Integration Using TOSCA. In Proceedings of the 7th International Conference on Utility and Cloud Computing. IEEE.

Wettinger, J., Breitenbücher, U., and Leymann, F. (2015). Any2API - Automated APIfication. In Proceedings of the 5th International Conference on Cloud Computing and Service Science. SciTePress.

