EXPLORING THE INTERNAL STATE OF USER
INTERFACES BY
COMBINING COMPUTER VISION TECHNIQUES
USING SIKULI
Germiya K Jose
4MCA
Christ University Bangalore
AGENDA
Introduction
Basics of Python
Sikuli Script
How Sikuli Works
Technical capabilities
Hello world Program
Predefined Functions
Disadvantages
Exception Handling
Special Keys
Conclusion
IF YOU .............
Are not good at programming!
Want to avoid repetitive, boring, or annoying
coding?
Want to automate something but don’t have
access to its source?
WHAT DOES IT DO ?
Run a series of clicks and keystrokes with a single click
Make boring tasks easier and quicker
Testing
WHY SIKULI?
Sikuli automates anything you see on the screen.
It uses image recognition to identify and control
GUI components.
It is useful when there is no easy access to a GUI's
internals or source code.
Sikuli is an open-source research project originally
started at the User Interface Design Group at MIT.
Sikuli is a visual approach to searching and automating
graphical user interfaces using screenshots.
Sikuli allows users to take a screenshot of a GUI
element (such as a toolbar button, icon, or dialog
box) and query a help system using the screenshot
instead of the element’s name.
Sikuli also provides a visual scripting API for
automating GUI interactions, using screenshot
patterns to direct mouse and keyboard events.
SIKULI SCRIPT
Sikuli automates interaction with a GUI by
executing it, recognizing widgets
such as buttons and text fields from their visual
appearance on the screen, and interacting with
those widgets by simulating mouse pointer or
keyboard actions.
Sikuli uses Python for scripting.
PYTHON FEATURES
Easy-to-learn
Easy-to-read
A broad standard library
Interactive Mode
Portable
Databases
Comments: the # symbol is used
Quoting: single ('),
double ("),
triple (''' or """) [a triple-quoted string can span multiple lines]
List
Python's compound data type
Lists are similar to arrays
Items in a list can be of different data types
print list[1:3] or print list[2:] or print list[0] or print list
del list[2]
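The slicing and deletion forms listed above can be tried with a short sketch. The list contents are made-up sample values, and the variable is named items rather than list so that Python's built-in list type is not shadowed.

```python
# A list may mix data types (the values are illustrative only).
items = ['abcd', 786, 2.23, 'john', 70.2]

second_and_third = items[1:3]   # elements at index 1 and 2
from_third_on = items[2:]       # index 2 through the end
first = items[0]                # a single element

print(second_and_third)
print(from_third_on)
print(first)

del items[2]                    # remove the element at index 2
print(items)
```

Slices return new lists, so only the del statement changes items itself.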
Dictionary
A kind of hash table
Consists of key-value pairs
tinydict = {'name': 'john','code':6734, 'dept': 'sales'}
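A minimal sketch of working with the dictionary above: looking up a value by key, changing a value, and listing the keys. The new value 'marketing' is an illustrative assumption.

```python
tinydict = {'name': 'john', 'code': 6734, 'dept': 'sales'}

name = tinydict['name']            # look up a value by its key
tinydict['dept'] = 'marketing'     # overwrite the value stored under a key
keys = sorted(tinydict.keys())     # sorted, so the order is predictable

print(name)
print(tinydict['dept'])
print(keys)
```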
Membership operators
in
Evaluates to true if it finds a value in the specified sequence and
false otherwise.
not in
Evaluates to true if it does not find a value in the specified
sequence and false otherwise.
Identity operators
is
Evaluates to true if the variables on either side of the operator point to
the same object and false otherwise.
is not
Evaluates to false if the variables on either side of the operator point
to the same object and true otherwise.
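The difference between membership, identity, and plain equality is easiest to see side by side. This is a small sketch with made-up values.

```python
primes = [2, 3, 5, 7]

has_five = 5 in primes           # membership: 5 is in the list
lacks_four = 4 not in primes     # 4 is not in the list

a = [1, 2]
b = a                            # b refers to the very same object as a
c = [1, 2]                       # c is a separate list with equal contents

same_object = a is b             # identity: both names point to one object
different_object = a is not c    # equal contents, but two distinct objects
equal_contents = a == c          # equality compares values, not identity

print(has_five, lacks_four)
print(same_object, different_object, equal_contents)
```

All five expressions evaluate to True here: a and c are equal but not identical.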
LOOK:
Sikuli uses a system API to grab the pixel data from
the screen buffer and analyzes it.
This basic system function for screen capture is
available on most modern platforms
including Windows, Mac, Linux and Android.
RECOGNIZE:
Sikuli “recognizes” widgets on a GUI using
pattern matching based on visual appearance.
There are two use cases that must be dealt with
separately:
Recognizing a specific widget and recognizing a
class of widgets.
INTERACT:
Sikuli uses the Java Robot class to simulate mouse
and keyboard interaction.
After recognizing a widget, Sikuli knows that
widget’s location on the screen, and can move the
pointer to that location.
At that location, it can issue a click command, which
effectively clicks on that widget.
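The look, recognize, interact cycle can be illustrated with a toy sketch. It is not Sikuli's algorithm: the screen is a small grid of numbers and matching is a naive pixel-by-pixel comparison. The function names, the grid, and the min_similarity parameter are all assumptions made for illustration.

```python
def match_score(screen, template, top, left):
    """Fraction of template pixels that equal the screen pixels at (top, left)."""
    rows, cols = len(template), len(template[0])
    same = sum(
        1
        for r in range(rows)
        for c in range(cols)
        if screen[top + r][left + c] == template[r][c]
    )
    return same / float(rows * cols)


def find_widget(screen, template, min_similarity=0.7):
    """Return (row, col, score) of the best match, or None if it scores too low."""
    rows, cols = len(template), len(template[0])
    best = None
    for top in range(len(screen) - rows + 1):
        for left in range(len(screen[0]) - cols + 1):
            score = match_score(screen, template, top, left)
            if best is None or score > best[2]:
                best = (top, left, score)
    if best is not None and best[2] >= min_similarity:
        return best
    return None


def center_of(match, template):
    """Grid cell at the middle of the matched area: where a click would go."""
    top, left, _ = match
    return (top + len(template) // 2, left + len(template[0]) // 2)


# LOOK: a tiny stand-in for the pixel data grabbed from the screen buffer.
screen = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 2, 0, 0],
    [0, 0, 3, 4, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
button = [[1, 2], [3, 4]]              # the "screenshot" of the widget

# RECOGNIZE: locate the widget by its appearance.
match = find_widget(screen, button)

# INTERACT: compute where the pointer would be moved before clicking.
click_target = center_of(match, button)
print(match, click_target)
```

A real implementation would compare image regions with a tolerance for small rendering differences, which is why the match is scored rather than tested for exact equality.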
BASIC EXPLORATION STRATEGIES
(a) Random exploration first identifies all widgets on
the current screen image and interacts with one
chosen uniformly at random. The length of each interaction
is a user-controlled parameter.
(b) Depth-first exploration systematically explores
all interactions up to a given length, assuming that
the system is deterministic.
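Both strategies can be sketched on a toy model, assuming the GUI behaves like a deterministic state machine in which each screen offers a set of widgets and each widget leads to a next screen. TOY_GUI and both function names are hypothetical; in practice the widgets would come from recognizing the current screen image.

```python
import random

# Hypothetical GUI model: screen -> {widget: screen reached by clicking it}.
TOY_GUI = {
    "main":     {"settings": "settings", "help": "help"},
    "settings": {"back": "main", "apply": "settings"},
    "help":     {"back": "main"},
}


def random_exploration(gui, start, length, rng):
    """Interact with one widget chosen uniformly at random, 'length' times."""
    state, trace = start, []
    for _ in range(length):
        widget = rng.choice(sorted(gui[state]))
        trace.append(widget)
        state = gui[state][widget]
    return trace


def depth_first_exploration(gui, start, max_length):
    """Enumerate every interaction sequence of up to 'max_length' steps."""
    sequences = []

    def explore(state, path):
        if path:
            sequences.append(tuple(path))
        if len(path) == max_length:
            return
        for widget in sorted(gui[state]):
            path.append(widget)
            explore(gui[state][widget], path)
            path.pop()

    explore(start, [])
    return sequences


trace = random_exploration(TOY_GUI, "main", 5, random.Random(0))
sequences = depth_first_exploration(TOY_GUI, "main", 2)
print(trace)
print(sequences)
```

Depth-first exploration relies on determinism: replaying the same widget sequence must always lead to the same screen, otherwise the enumerated sequences do not describe reproducible behavior.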
SIKULI SCRIPT
Sikuli Script is a Jython and Java library that automates GUI interaction using image patterns to direct keyboard/mouse events.
The core of Sikuli Script is a Java library that consists of two parts:
java.awt.Robot, which delivers keyboard and mouse events to appropriate locations
A C++ engine based on OpenCV, which searches for given image patterns on the screen.
THE STRUCTURE OF A SIKULI
SOURCE/EXECUTABLE SCRIPT (.SIKULI, .SKL)
A Sikuli script (.sikuli) is a directory that consists of a Python source file (.py), and all the image files (.png) used by the source file.
Each image used in a Sikuli script is referenced simply by the path to its .png file in the .sikuli bundle.
Therefore, the Python source file can also be edited with any text editor.
When a script is saved using Sikuli IDE, an extra HTML file is also created in the .sikuli directory so that users can easily share the scripts on the web.
SIKULI IDE
Sikuli IDE edits and runs Sikuli source scripts. Sikuli
IDE integrates screen capturing and a custom text
editor (SikuliPane) to optimize the usability of
writing a Sikuli script.
To show embedded images in the SikuliPane, all
string literals that end with ".png" are replaced by a
custom JButton object, ImageButton.
If a user adjusts the image pattern’s similarity, a
Pattern() is automatically constructed on top of the
image.
HELLO WORLD (WINDOWS)
Let us begin with a customary Hello World
example!
You will learn how to capture a screenshot of a GUI
element and write a Sikuli Script to do two things:
1. Click on that element
2. Type a string in that element
THE GOAL OF THE HELLO WORLD SCRIPT IS TO
AUTOMATICALLY TYPE “HELLO WORLD” INTO THE START
MENU SEARCH BOX, LIKE THIS:
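A minimal sketch of the two steps, assuming the screenshot of the search box was saved as search_box.png (a hypothetical file name). Inside Sikuli IDE the script would call Sikuli's click() and type() directly. Here the two actions are passed in as parameters so the sketch also runs in plain Python, with stand-ins that only record what would happen.

```python
def hello_world(click, type_text, search_box_image):
    """Click the captured element, then type a string into it."""
    click(search_box_image)                       # 1. click on that element
    type_text(search_box_image, "Hello World")    # 2. type a string in that element


# Stand-ins for Sikuli's click() and type(); they only record the actions.
actions = []


def fake_click(image):
    actions.append(("click", image))


def fake_type(image, text):
    actions.append(("type", image, text))


hello_world(fake_click, fake_type, "search_box.png")
print(actions)
```

Under Sikuli, the same function could be called as hello_world(click, type, "search_box.png").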
CONTROLLING SIKULI SCRIPTS AND THEIR BEHAVIOR
setShowActions(False | True) If set to True, when a script is run, Sikuli shows a visual effect
(a blinking double-lined red circle) on the spot where the action will take place before executing the action.
exit([value ]) Stops the script gracefully at this point. The value is returned
to the calling environment.
Settings.MinSimilarity The default minimum similarity of find operations. By default, Sikuli
searches the region using a minimum similarity of 0.7.
Settings.MoveMouseDelay Controls the time taken for mouse movement to a target
location; set it to a decimal value (default 0.5). The unit is seconds. Setting it to 0 switches off any animation.
INTERACTING WITH THE USER AND OTHER APPLICATIONS
PopUps and input
popup(text[, title ])
Parameters
text – text to be displayed as message
title – optional title for the messagebox (default: Sikuli
Info)
Example:
popup("Hello World!\nHave fun with Sikuli!")
popError(text[, title ])
Same as popup() but with a different title (default Sikuli
Error) and alert icon.
Example:
popError("Uuups, this did not work")
This displays a dialog box with the message and an alert icon.
popAsk(text[, title ])
Returns True if user clicked Yes, False otherwise
Same as popup() but with a different title (default Sikuli
Decision) and alert icon.
There are two buttons, Yes and No, so the
message text should be phrased as an appropriate
question.
Example:
answer = popAsk("Should we really continue?")
if not answer:
    exit(1)
input([msg ][, default ][, title ][, hidden ])
Display a dialog box with an input field, a Cancel button, and
an OK button.
The script then waits for the user to click either the Cancel
or the OK button.
Parameters
msg – text to be displayed as message (default: nothing)
default – optional preset text for the input field
title – optional title for the messagebox (default: Sikuli Input)
hidden – (default: False) if True, the entered characters are shown as
asterisks
Returns
the text contained in the input field, if the user clicked
OK
None, if the user pressed the Cancel button or closed the
dialog
inputText([msg ][, title ][, lines ][, width ])
Parameters
msg – text to be displayed as message (default: nothing)
title – optional title for the messagebox (default: Sikuli Text)
lines – height of the text box in lines (default: 9)
width – width of the text box in characters
(default: 20)
Returns the possible multiline text entered by the user
(might be empty)
EXAMPLE:
story = inputText("please give me some lines of text")
lines = story.split("\n")  # split the text into a list of lines
for line in lines:
    print line
select([msg ][, title ][, options ][, default ])
Parameters
msg – text to be displayed as message (default:
nothing)
title – optional title for the messagebox (default: Sikuli
Selection)
options – a list of text items (default: empty list, nothing
done)
default – the preselected list item (default: first item)
Returns the selected item (might be the default)
EXAMPLE:
items = ("nothing selected", "item1", "item2", "item3")
selected = select("Please select an item from the list", options = items)
if selected == items[0]:
    popup("You did not select an item")
    exit(1)
STARTING AND STOPPING OTHER APPLICATIONS AND
BRINGING THEIR WINDOWS TO FRONT
Here we talk about the basic features of opening or
closing other applications and switching to them
(bring their windows to front).
openApp(application) Open the specified application.
openApp("cmd.exe")
openApp("c:\\Program Files\\Mozilla Firefox\\firefox.exe")
switchApp(application) Switch to the specified application.
switchApp("cmd.exe")
switchApp("c:\\Program Files\\Mozilla Firefox\\firefox.exe")
closeApp(application) Close the specified application.
closeApp("cmd.exe")
closeApp("c:\\Program Files\\Mozilla Firefox\\firefox.exe")
run(command)
Run command in the command line
Parameters command – a command that can be run from the
command line.
This function executes the command and the script waits for its
completion.
EXCEPTION HANDLING
setThrowException(False | True) Controls how Sikuli handles "not
found" situations in this region.
Parameters True – all subsequent find operations (explicit or implicit) will
raise the exception FindFailed (which is the default when a script is
started) if the target is not found.
False – all subsequent find operations will not raise exception
FindFailed. Instead, explicit find operations such as
Region.find() will return None. Implicit find operations (action
functions) such as Region.click() will do nothing and return 0.
getThrowException()
Returns True or False
Gets the current setting for this region (after the start of a
script, this is True by default).
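The two modes can be illustrated with a toy model. ToyRegion and the local FindFailed class below are hypothetical stand-ins written for this sketch, not Sikuli's classes; they only mimic the behavior described above so the difference is visible in plain Python.

```python
class FindFailed(Exception):
    """Local stand-in for the exception Sikuli raises when nothing is found."""


class ToyRegion(object):
    """Hypothetical model of a region that knows which images are visible."""

    def __init__(self, visible_images):
        self.visible_images = set(visible_images)
        self.throw_exception = True      # True by default at script start

    def setThrowException(self, flag):
        self.throw_exception = flag

    def getThrowException(self):
        return self.throw_exception

    def find(self, image):
        """Explicit find: return the match, raise FindFailed, or return None."""
        if image in self.visible_images:
            return image
        if self.throw_exception:
            raise FindFailed(image)
        return None

    def click(self, image):
        """Implicit find: 1 if the click happened, 0 if nothing was found."""
        return 1 if self.find(image) is not None else 0


region = ToyRegion(["ok.png"])

# Default mode: a missing image raises FindFailed.
try:
    region.find("cancel.png")
    outcome = "found"
except FindFailed:
    outcome = "FindFailed raised"

# After switching the mode off, the same calls fail quietly.
region.setThrowException(False)
result = region.find("cancel.png")
clicks = region.click("cancel.png")
print(outcome, result, clicks)
```

With exceptions switched off, the script has to check each return value itself, so the quiet mode suits optional elements that may or may not be on screen.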
SPECIAL KEYS
The methods supporting the use of special keys are
type(), keyDown(), and keyUp(). String concatenation with other text or other key constants
is possible using "+".
type("some text" + Key.TAB + "more text" + Key.TAB +
Key.ENTER)
or equivalent
type("some text\tmore text\t\n")
miscellaneous keys: ENTER, TAB, ESC, BACKSPACE, DELETE, INSERT, SPACE
function keys: F1, F2, F3, F4, F5, F6, F7, ...
navigation keys: HOME, END, LEFT, RIGHT, DOWN, UP, PAGE_DOWN, PAGE_UP
special keys: PRINTSCREEN, PAUSE, CAPS_LOCK, SCROLL_LOCK, NUM_LOCK
numpad keys: NUM0, NUM1, NUM2, NUM3, ..., SEPARATOR, ADD, MINUS, MULTIPLY, DIVIDE
modifier keys: ALT, CMD, CTRL, META, SHIFT, WIN
These modifier keys (Key.ALT, Key.CTRL, etc.) cannot be used as key modifiers with
functions like type(), rightClick(), etc. They can only be used with
keyDown() and keyUp(). If you need key modifiers, use
KeyModifier instead.
type(Key.ESC, KeyModifier.CTRL + KeyModifier.ALT)
or equivalent
type(Key.ESC, KeyModifier.CTRL | KeyModifier.ALT)
KeyModifier constants should only be used in the modifiers parameter of
functions like type(), rightClick(), etc.
They should never be used with keyDown() or keyUp().
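Combining modifiers with "+" and with "|" gives the same result when each modifier is a distinct bit flag. The sketch below uses illustrative flag values, which are an assumption and not taken from Sikuli, to show why the two forms agree and where they would differ.

```python
# Assumed flag values: one distinct bit per modifier (illustrative only).
SHIFT = 1 << 0
CTRL = 1 << 1
ALT = 1 << 3

combined_with_plus = CTRL + ALT   # adding distinct bits sets both bits
combined_with_or = CTRL | ALT     # bitwise OR also sets both bits

# The two forms differ only if the same flag is combined twice.
doubled_with_plus = CTRL + CTRL   # carries into the next bit
doubled_with_or = CTRL | CTRL     # still just CTRL

print(combined_with_plus == combined_with_or)
print(doubled_with_plus, doubled_with_or)
```

The "|" form is the safer habit, because repeating a flag by accident leaves the combination unchanged.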
CONCLUSION
Sikuli means “God’s eye” in the language of Mexico's Huichol Indians.
Sikuli currently uses Python as the scripting
language.
Sikuli is a visual technology to search and
automate GUIs using images (screenshots).
It automates anything you see on the screen
without the support of an internal API.
REFERENCE
SikuliX Documentation, Release 1.1.0-Beta1, by Raimund
Hocke aka RaiMan (October 19, 2014).
Abstracting Perception and Manipulation in End-User Robot
Programming using Sikuli (IEEE).
Exploring the Internal State of User Interfaces by Combining
Computer Vision Techniques with Grammatical Inference
(IEEE).
Sikuli: Using GUI Screenshots for Search and Automation
(IEEE).