API Usage Pattern Extraction using Semantic Similarity

Post on 17-Jun-2015

description

A project that extracts API usage patterns by exploiting semantic similarity among API usage code examples.

transcript

SEMANTIC NETWORK BASED API USAGE PATTERN EXTRACTION & LEARNING

Mohammad Masudur Rahman

mor543@mail.usask.ca

Department of Computer Science

University of Saskatchewan

PRESENTATION OVERVIEW

Introduction
Motivating Example
Background Concepts
Proposed Approach
Semantic Network of Source Code
API Usage Pattern Extraction
Pattern Learning & Visualization
Experimental Results & Discussions
Threats to Validity
Conclusion & Future Works

INTRODUCTION

API (Application Programming Interface) Libraries

API documentation, API browsers, forums
API usage learning for developers
Existing projects using APIs
API usage patterns

WHAT IS API USAGE PATTERN?

A frequent and consistent sequence of API method calls and field accesses

Performs a particular programming task
Widely used in multiple projects
Widely accepted by the developer community
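As a hedged illustration of this definition (not taken from the slides), the Java sketch below shows one call sequence on `java.io.BufferedInputStream` that recurs in many projects: construct the stream, call `read()` in a loop, then close it. The class name `BufferedReadPattern` and the method `countBytes` are hypothetical names chosen for this example.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferedReadPattern {
    // Counts the bytes of a stream using a common call sequence:
    // constructor -> repeated read() -> close() (via try-with-resources).
    static long countBytes(InputStream source) throws IOException {
        long count = 0;
        try (BufferedInputStream in = new BufferedInputStream(source)) {
            while (in.read() != -1) {   // read() returns -1 at end of stream
                count++;
            }
        }                               // close() is invoked automatically here
        return count;
    }
}
```

A pattern extractor would ideally recover the constructor, `read()`, and `close()` calls as one unit, because they appear together in most usages of the class.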

API USAGE PATTERN

BIG QUESTION?

How to extract the API usage patterns from the source code?

SEMANTIC WEB OR NETWORK

Where does the author of a particular software manual live?

MOTIVATING EXAMPLE

RESEARCH QUESTIONS

RQ 1: Can semantic network technologies represent the semantics of OO source code properly?

RQ 2: Can this representation be used for API usage pattern extraction and learning?

BACKGROUND CONCEPTS

API Usage Patterns
API Usage Violation & Anomalies
Semantic Web
Semantic Network of Source Code
Resource Description Framework (RDF)
RDF Statement or Triples

RDF TRIPLE (BUILDING BLOCK OF SEMANTIC WEB OR NETWORK)

Subject Predicate Object
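A minimal sketch of the subject, predicate, object idea, assuming plain strings for all three parts (real RDF uses URIs and literals). It answers the earlier question about where a manual's author lives by following two statements in sequence. The class names `Triple` and `TripleGraphDemo` and the predicates `hasAuthor` and `livesIn` are hypothetical and do not come from the slides.

```java
import java.util.List;

// One RDF-style statement: subject --predicate--> object.
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

class TripleGraphDemo {
    // Returns the object of the first triple with the given subject and
    // predicate, or null when no such statement exists.
    static String objectOf(List<Triple> graph, String subject, String predicate) {
        for (Triple t : graph) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                return t.object;
            }
        }
        return null;
    }

    // Follows manual --hasAuthor--> author --livesIn--> place.
    static String authorResidence(List<Triple> graph, String manual) {
        String author = objectOf(graph, manual, "hasAuthor");
        return author == null ? null : objectOf(graph, author, "livesIn");
    }
}
```

Chaining statements this way is what lets a network of simple triples answer questions that no single statement answers alone.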

PROPOSED APPROACH FOR API USAGE PATTERN EXTRACTION & LEARNING

Fig: Workflow of the proposed approach (nine numbered steps in the original diagram)

Components: API Class List, OSS Projects, Contains API? (decision: Yes / No), Source Code Parser, Semantic Network Builder, API Pattern Explorer, API Usage Pattern Manager, RDF Pattern Visualizer, Pattern Source Skeleton Builder

Data passed between components: API classes, source files, parsed expressions, RDF files, patterns

SOURCE CODE SEMANTIC NETWORK

Fig: Construction of the source code semantic network

Elements: Java source code, AST Parser (Javaparser), Java expressions, API expression selection rules, RDF Maker, Apache Jena Framework, RDF triples, RDF network
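The step from parsed expressions to RDF triples can be sketched as follows. This is a simplified illustration using only the Java standard library: it does not call Javaparser or Apache Jena, and it assumes the method-call expressions have already been parsed. The classes `CallExpr` and `RdfMakerSketch`, the predicates `hasType`, `invokes`, and `hasArgument`, and the selection rule (keep only calls whose receiver type is in the API class list) are all assumptions made for this example, not the rules used in the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// A parsed method-call expression such as: in.read(buffer)
final class CallExpr {
    final String receiver;      // variable name, e.g. "in"
    final String receiverType;  // declared type, e.g. "BufferedInputStream"
    final String method;        // invoked method, e.g. "read"
    final List<String> args;    // argument names

    CallExpr(String receiver, String receiverType, String method, List<String> args) {
        this.receiver = receiver;
        this.receiverType = receiverType;
        this.method = method;
        this.args = args;
    }
}

class RdfMakerSketch {
    // Converts the selected expressions into {subject, predicate, object} arrays.
    static List<String[]> toTriples(List<CallExpr> exprs, Set<String> apiClasses) {
        List<String[]> triples = new ArrayList<>();
        for (CallExpr e : exprs) {
            // Assumed selection rule: skip calls on non-API types.
            if (!apiClasses.contains(e.receiverType)) {
                continue;
            }
            String callNode = e.receiver + "." + e.method;
            triples.add(new String[] {e.receiver, "hasType", e.receiverType});
            triples.add(new String[] {e.receiver, "invokes", callNode});
            for (String arg : e.args) {
                triples.add(new String[] {callNode, "hasArgument", arg});
            }
        }
        return triples;
    }
}
```

In the actual tool chain the triples would be stored through an RDF library and written to RDF files. Plain arrays stand in for that here.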

API USAGE PATTERN EXTRACTION

Fig: API usage pattern extraction flow

All usages of an API class
Candidate API usage patterns
Common sub-graph selection
Pattern score > threshold?
Yes: selected API usage patterns
No: discarded
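The extraction flow above can be sketched under some strong simplifying assumptions. Each usage of an API class is modeled as a set of edges (triples encoded as strings), the common sub-graph of two usages is approximated by the intersection of their edge sets, and the pattern score is taken to be the fraction of usages that contain every edge of the candidate. The slides do not define the score or the sub-graph algorithm, so both choices, along with the class name `PatternExtractorSketch`, are assumptions for illustration only.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class PatternExtractorSketch {
    // Approximates the common sub-graph of two usages by edge-set intersection.
    static Set<String> commonSubgraph(Set<String> a, Set<String> b) {
        Set<String> common = new TreeSet<>(a);
        common.retainAll(b);
        return common;
    }

    // Assumed score: fraction of usages that contain all edges of the candidate.
    static double score(Set<String> candidate, List<Set<String>> usages) {
        if (usages.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (Set<String> usage : usages) {
            if (usage.containsAll(candidate)) {
                hits++;
            }
        }
        return (double) hits / usages.size();
    }

    // Builds candidates from every pair of usages and keeps those scoring
    // strictly above the threshold; the rest are discarded.
    static Set<Set<String>> extract(List<Set<String>> usages, double threshold) {
        Set<Set<String>> selected = new LinkedHashSet<>();
        for (int i = 0; i < usages.size(); i++) {
            for (int j = i + 1; j < usages.size(); j++) {
                Set<String> candidate = commonSubgraph(usages.get(i), usages.get(j));
                if (!candidate.isEmpty() && score(candidate, usages) > threshold) {
                    selected.add(candidate);
                }
            }
        }
        return selected;
    }
}
```

Pairwise intersection is quadratic in the number of usages and ignores node renaming, so a real implementation would need proper graph matching. The sketch only shows how a threshold separates selected patterns from discarded ones.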

EXPERIMENTAL RESULTS

25 open source projects
3 API libraries (java.io, java.util, java.awt)
250 API classes selected
API usages found for 113 API classes
Patterns found for 76 API classes
Total 776 patterns

API USAGE PATTERNS

SOURCE CODE SKELETON

Fig: BufferedInputStream Usage Pattern

EXPERIMENTAL RESULTS

Project       #Class   #M&C   #ATCF  #ADCF  #ATPF  #ADPF
Ant-Contrib      186   1388      96     23   1865    280
AOI              461   6489     218     55   1651    494
Groimp          1202  13875     132     41   1632    407
JFreechart      1059  12368     507     38   6841    410
JHotdraw7        689   7330     310     49   2547    462

#M&C = methods & constructors, #ATCF = total API classes, #ADCF = distinct API classes, #ATPF = total API patterns found, #ADPF = distinct API patterns found

PATTERNS PER CLASS

Fig: # patterns extracted per class comparison

RESULTS DISCUSSION

RQ 1: Can semantic network technologies represent the semantics of OO source code properly?

Graph-based API usage extraction by Nguyen et al. (FSE 2009): incomplete semantics for edges and attributes

Source code ontology by Wursch et al. (ICSE 2010): does not represent the complete source code

The proposed approach captures expression-level syntax and semantics

Focuses on API usage patterns

RESULTS DISCUSSION

RQ 2: Can this representation be used for API usage pattern extraction and learning?

Successfully extracts 776 patterns for 76 API classes from 25 open source projects

A promising approach that merits further exploration for API usage pattern extraction

Visualization of the RDF network helps in learning
Source code as visual entities rather than lines
More comprehensive idea about OO source code
Applicable for complex OO relationships
Very useful for quick learning

THREATS TO VALIDITY

Representing complete semantics: a non-trivial task.

More expressions for more accurate representation

RDF pattern visualization within limited display

Users need to be introduced to RDF conventions

CONCLUSION & FUTURE WORKS

Applicability of semantic web technologies for API usage pattern extraction

Semantic representation for learning by the developers

Real world user study
Extracted patterns for automatic code completion in the IDE
Extracted patterns for API violation and anomaly detection

THANK YOU!!!

REFERENCES

[1] Semantic web diagram. URL http://www.w3.org/Talks/2002/10/16-sw/slide7-0.html.
[2] Tung Thanh Nguyen, Hoan Anh Nguyen, Nam H. Pham, Jafar M. Al-Kofahi, and Tien N. Nguyen. Graph-based mining of multiple object usage patterns. In Proc. ESEC/FSE, 2009, pages 383-392.
[3] M. Wursch, G. Ghezzi, G. Reif, and H. C. Gall. Supporting developers with natural language queries. In Proc. ICSE, 2010, pages 165-174.
[4] Tao Xie and Jian Pei. MAPO: mining API usages from open source repositories. In Proc. MSR, 2006, pages 574-57.
[5] Semantic web technology. URL http://www.w3.org/2001/sw.
[6] Visual learning style. URL http://www.learning-styles-online.com/style/visual-spatial.
[7] Apache Jena framework. URL http://jena.apache.org/.
[8] Javaparser: Java 1.5 parser and AST. URL http://code.google.com/p/javaparser/.
[9] RDF-gravity tool. URL http://semweb.salzburgresearch.at/apps/rdf-gravity/.