Date post: | 02-Dec-2014 |
DEVIEW 2014
Lessons from developing a web browser on Raspberry Pi
ChangSeok [email protected]
About me
ChangSeok Oh
• IRC nickname : changseok
• Open source hacker : WebKit committer, GNOME Foundation member
• Affiliations : Collabora Ltd. (formerly SAMSUNG Electronics)
• Experience : SAMSUNG SmartTV, TIZEN, SmartTV Alliance SDK, WebKit-clutter, Raspberry Pi etc.
http://www.commitstrip.com/en/2014/05/07/the-truth-behind-open-source-apps/
Optimization is EVERYWHERE.
Do developers ever have enough memory and performance? We are always hungry.
http://images.google.com
Optimization
• Dictionary definition
‣ Make the best or most effective use of a situation or resource
‣ In short: improve performance and use resources efficiently
• Usually difficult and tedious work
• Depends on the developer’s experience & know-how
Possible approaches
1. Use better hardware: a faster CPU/GPU and more memory.
2. Use parallel programming to take advantage of multi-core CPUs.
3. Use the GPU through OpenGL/ES to improve rendering performance.
4. Just turn off the screen and go outside to play…?
But what if you can’t do them all?
Raspberry Pi is a good example of such a poor environment
• Old single-core CPU
‣ ARMv6, 700MHz
• Very limited system memory
‣ 512MB shared with GPU
• Storage is not abundant
• Bad OpenGL ES integration with the windowing system
All problems come from here
http://en.wikipedia.org/wiki/Raspberry_Pi#mediaviewer/File:Raspberrypi_pcb_overview_v04.svg
FWIW, Raspberry Pi needs a fast modern browser.
Extra information
Raspberry Pi already supports many browsers though…
Lynx …
Requirements
A modern & fast HTML5 browser
• Multi-Tab browsing
• HTML5 & CSS3
• HTML5 Video/Audio support (YouTube should run well)
• Responsive user interface
• Low memory footprint
Achievements
We’ve improved WebKit1 + Epiphany
• Progressive tiled rendering for smoother scrolling
• Avoid useless image format conversions
• Disk image cache
• Reduction of the number of memory copies to play videos
• Memory pressure handler support by using cgroup
• Better YouTube support including on-demand load of embedded YouTube videos for a much faster page load
• Faster fullscreen playback using dispmanx directly
• Hardware decoding of image & video through OMX
• Hardware scaling of video through gst-omx
• More responsive UI & scrolling even under heavy load
• Memory & CPU friendly tab management
• Startup is 3x faster
• JavaScript JIT fixes for ARMv6
Technologies
Progressive tiled rendering for smoother scrolling
• Scrolling doesn’t block even if the content is not available yet; instead, we fill the area with a checkered pattern.
http://ariya.ofilabs.com/2011/06/progressive-rendering-via-tiled-backing-store.html
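The idea can be sketched in a few lines of Python. This is only an illustration, not the WebKit code: the tile size, the class name `TiledBackingStore`, and the `"checker"` marker are all assumptions made for the example. Painting never waits for rendering; tiles that are missing are drawn as placeholders and queued to be rendered later.

```python
TILE = 256  # tile edge in pixels (an assumed size)


class TiledBackingStore:
    """Caches rendered tiles; missing tiles are painted as placeholders."""

    def __init__(self):
        self.tiles = {}     # (col, row) -> rendered tile content
        self.pending = []   # tiles waiting to be rendered, in request order

    def visible_tiles(self, x, y, width, height):
        """Return the (col, row) keys of every tile touching the viewport."""
        cols = range(x // TILE, (x + width - 1) // TILE + 1)
        rows = range(y // TILE, (y + height - 1) // TILE + 1)
        return [(c, r) for r in rows for c in cols]

    def paint(self, x, y, width, height):
        """Paint immediately: cached tiles as-is, missing ones as checkers."""
        frame = {}
        for key in self.visible_tiles(x, y, width, height):
            if key in self.tiles:
                frame[key] = self.tiles[key]
            else:
                frame[key] = "checker"
                if key not in self.pending:
                    self.pending.append(key)
        return frame

    def render_one(self, render):
        """Render a single pending tile; meant to run when the loop is idle."""
        if self.pending:
            key = self.pending.pop(0)
            self.tiles[key] = render(key)
```

Calling `paint` repeatedly while `render_one` runs in idle time makes the checkered area fill in tile by tile, which is the progressive effect described on the slide.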
Avoid useless image format conversions
• Try to use internal buffers of the same depth (16 or 32 bits) everywhere, to prevent format conversions.
‣ Raspberry Pi uses a 16-bit depth (RGB16_565) buffer by default.
‣ By default, images (JPEG, PNG, GIF) and video were decoded into 32-bit depth (ARGB32) buffers.
‣ By using the same depth, we could use a cairo image surface, which can be painted quickly to the target.
[Figure: JPEG, PNG, GIF and video decoded to 16 bit instead of 32 bit, matching the 16-bit TBS and GtkWidget]
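To make the cost concrete, here is a rough sketch of the per-pixel work that a depth mismatch implies. It is plain Python for illustration only; the function names are made up for this example, and the real conversion happens inside the graphics libraries.

```python
def argb32_to_rgb565(pixel):
    """Pack one 32-bit ARGB pixel into 16-bit RGB565, dropping alpha."""
    r = (pixel >> 16) & 0xFF
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    # Keep the top 5 bits of red, 6 bits of green and 5 bits of blue.
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def convert_buffer(argb_pixels):
    """The per-paint conversion that a matching buffer depth avoids."""
    return [argb32_to_rgb565(p) for p in argb_pixels]
```

A 1280x720 surface has 921,600 pixels, so every repaint of a mismatched buffer repeats this work that many times. Decoding straight into the target depth removes the step entirely.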
Disk image cache
• We enhanced the disk image cache module of WebKit for POSIX systems.
• Decoded images are kept in memory-mapped files as caches.
• Saved CPU by avoiding multiple decodings
• Saved memory by using local disk space
• Not a magic wand : Big image over 20KB, Animated GIF
[Figure: decoded images moved from physical memory to local disk space]
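A minimal sketch of the technique, assuming a simple one-file-per-URL layout: the class `DiskImageCache`, the SHA-1 file naming, and the temporary directory are choices made for this example, not details of the WebKit module. Decoded pixels are written once, and later loads map the file instead of decoding again, so the kernel can page the data in and out as needed.

```python
import hashlib
import mmap
import os
import tempfile


class DiskImageCache:
    """Keeps decoded pixel data in memory-mapped files, keyed by URL."""

    def __init__(self, directory=None):
        self.directory = directory or tempfile.mkdtemp(prefix="imgcache-")

    def _path(self, url):
        name = hashlib.sha1(url.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, name)

    def store(self, url, decoded):
        """Write decoded pixels once so later loads can skip decoding."""
        with open(self._path(url), "wb") as f:
            f.write(decoded)

    def load(self, url):
        """Map the cached pixels read-only, or return None on a miss."""
        path = self._path(url)
        if not os.path.exists(path) or os.path.getsize(path) == 0:
            return None
        with open(path, "rb") as f:
            # The mapping stays valid after the file object is closed.
            return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

The trade-off is the one the slide names: CPU and physical memory are saved at the price of local disk space and disk I/O.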
Reduction of the number of memory copies to play video
• The video needs to be blitted to the screen, and that involved memory copies for no reason.
• If the cairo surface of the backing store is in system memory, cairo creates an additional surface that wraps a shm pixmap and copies into this pixmap before copying into the final drawable.
‣ cairo_surface_create_similar
• When the GdkWindow already has a cairo surface that wraps an X drawable, it is friendly to cairo image surfaces.
‣ Ensured by calling gdk_cairo_create
‣ cairo_surface_create_similar_image
• When used correctly, this prevents cairo from calling XShmCreatePixmap every time the backing store is copied to the window.
• Available from gtk+3.10
[Figure: video copy path before and after. Before: Video, gst buffer, cairo surface for video, cairo surfaces for TBS, GtkWidget. After: Video, gst buffer, cairo image surface for video, cairo image surfaces for TBS, GtkWidget. An SHM pixmap stage also appears in the diagram.]
Memory pressure handler support through cgroups
• Control groups (cgroups) is a Linux kernel feature to limit, account for, and isolate the resource usage (CPU, memory, disk I/O, etc.) of process groups.
‣ Merged into kernel version 2.6.24
‣ Resource limiting : groups can be set not to exceed a memory limit
‣ Prioritization : some groups may get a larger share of CPU or disk I/O throughput
‣ Accounting : measures how many resources certain systems use
‣ Control : freezing groups, or checkpointing and restarting them
• We implemented a memory pressure handler for POSIX systems in WebKit by using cgroups.
• When the RPi system comes under memory pressure, we free all unnecessary caches and memory and, depending on the pressure level, also run the garbage collector to avoid OOM.
• Not a magic wand : what if the OOM is caused by other applications, not the browser?
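The handler logic can be sketched as follows. This is a simplified illustration, not the WebKit implementation: the 70% and 90% thresholds, the three level names, and the rule for which level triggers garbage collection are assumptions, and the sketch polls a cgroup v1 file such as /sys/fs/cgroup/memory/memory.usage_in_bytes rather than subscribing to kernel notifications.

```python
LOW, MEDIUM, CRITICAL = "low", "medium", "critical"


def pressure_level(usage_bytes, limit_bytes, medium=0.7, critical=0.9):
    """Classify memory pressure from cgroup usage; thresholds are assumed."""
    ratio = usage_bytes / limit_bytes
    if ratio >= critical:
        return CRITICAL
    if ratio >= medium:
        return MEDIUM
    return LOW


def respond(level, caches, collect_garbage):
    """Release more aggressively as the pressure level rises."""
    if level == LOW:
        return
    for cache in caches:
        cache.clear()        # drop everything that can be rebuilt later
    if level == CRITICAL:
        collect_garbage()    # additionally force a garbage-collection pass


def read_cgroup_value(path):
    """Read one integer from a cgroup v1 memory file."""
    with open(path) as f:
        return int(f.read().strip())
```

Because the handler only sees the browser's own caches, it cannot help when another application is the one consuming memory, which is the limitation the slide points out.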
Better YouTube support
• HTML5 video is required.
• YouTube has its own heavy UI.
• Inject some simple JavaScript code that gets the URL of the video stream and creates a <video> element for it.
• Get thumbnails through the YouTube Data API, and get the video in a way similar to youtube-dl.
• This allows us to block some extra JS on YouTube that was using a lot of CPU.
• Block the comment section on YouTube, since it took 30 seconds to fully load.
• Embedded YouTube videos took too much time to load as well.
• We just load a fake placeholder showing the thumbnail and a fake play button.
• When a user clicks on it, the real video is actually loaded. This made loading pages with a lot of videos much faster.
Mouse events are swallowed by this element because of StackingContext!
Stacking Context
<body>
  <div id="div1" style="z-index:5"></div>
  <div id="div2" style="z-index:2"></div>
  <div id="div3" style="z-index:4">
    <div id="div4" style="z-index:6"></div>
    <div id="div5" style="z-index:1"></div>
    <div id="div6" style="z-index:3"></div>
  </div>
</body>
https://developer.mozilla.org/en-US/docs/Web/Guide/CSS/Understanding_z_index/The_stacking_context
<div class="html5-video-container" style="z-index:900">
<video style="z-index:auto">
<div class="html5-video-controls" style="z-index:940">
<div class="html5-video-player">
<div class="html5-video-info-panel" style="z-index:960">
ShadowRoot (Container node)
MediaControls (HTMLDivElement)
<video width="xxx" height="yyy" src="A video URL extracted from youtube" controls />
1. Show a thumbnail and a fake play button.
2. On click, inject the video wrapper.
3. Then the actual video is loaded.
Pretty useful for heavy pages embedding many YouTube videos.
// Fetch the video metadata used for the poster from the YouTube Data API.
var posterData = download_webpage('http://gdata.youtube.com/feeds/api/videos/' + this.videoId + '?v=2&alt=json');
// Fetch the watch page, in a way similar to youtube-dl, to get the video stream URL.
var url = 'http://www.youtube.com/watch?v=' + video_id + '&gl=US&hl=en&has_verified=1';
var video_webpage = download_webpage(url);
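The placeholder step can be illustrated with a small page rewrite. The browser did this with injected JavaScript; the Python below is only a stand-in to show the transformation, and the regular expression, the `yt-placeholder` and `yt-fake-play` class names, and the `thumbnail_for` callback are all hypothetical.

```python
import re

# Matches an embedded YouTube player and captures its video id.
EMBED = re.compile(
    r'<iframe[^>]*src="https?://www\.youtube\.com/embed/([\w-]+)[^"]*"[^>]*>'
    r'\s*</iframe>')


def placeholder(video_id, thumbnail_url):
    """Build the lightweight stand-in for one embedded player."""
    return ('<div class="yt-placeholder" data-video-id="%s">'
            '<img src="%s"><button class="yt-fake-play"></button></div>'
            % (video_id, thumbnail_url))


def defer_embeds(html, thumbnail_for):
    """Replace every embedded YouTube iframe with a placeholder."""
    return EMBED.sub(
        lambda m: placeholder(m.group(1), thumbnail_for(m.group(1))), html)
```

The real player is created only when the user clicks a placeholder, so a page with many embedded videos pays for a few thumbnails instead of many full players.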
Faster fullscreen playback using dispmanx directly
• Fullscreen mode is a very independent feature.
‣ It just shows the video and controls.
‣ Nothing needs to be done except copying decoded video frames and drawing the controls when necessary.
‣ The backing store does not need to be updated at all in fullscreen mode.
• Dispmanx
‣ A subset of the VideoCore library
‣ A windowing system in the process of being deprecated in favor of OpenWF
‣ Provides useful APIs for creating layers the GPU understands, scaling and moving the layers, etc.
• We wrote raw video data directly into a dispmanx plane and scaled it to fit the screen through the GPU.
• Not updating the backing store and scaling video through the GPU saves a great deal of CPU.
• A fake cursor is required because of the bad integration of the GPU plane into the windowing system.
• TBS: cairo surface in GtkWidget. Completely hidden by the video plane, so we don’t need to update it at all.
• Video: dispmanx plane 1, filled with raw video data. Scaling is performed by the GPU.
• Controls: dispmanx planes 2 and 3, filled with control images.
• Cursor: dispmanx plane 4, a fake cursor image.
Hardware decoding of image & video through OpenMAX
• Raspberry Pi supports OpenMAX (shortened to “OMX”).
• OpenMAX
‣ A set of C-language programming interfaces that provides abstractions for routines especially useful for audio, video, and still image processing.
‣ Provides 3 layers of interfaces: AL (application layer), IL (integration layer) and DL (development layer)
• OpenMAX DL is especially useful for decoding images and video.
‣ AC : Audio codecs (MP3 decoder & AAC decoder components). Cannot be used because of licensing issues!
‣ IC : Image codecs (JPEG components)
‣ IP : Image processing (generic image processing functions)
‣ SP : Signal processing (generic audio processing functions)
‣ VC : Video codecs (H.264 & MP4 components)
• JPEG is decoded with OMX in WebKit.
• gst-omx is used to decode video with OMX in GStreamer.
‣ http://cgit.freedesktop.org/gstreamer/gst-omx
Hardware scaling of video through gst-omx
• Video on the web is often not displayed at its natural size, so it needs to be scaled.
• We enhanced gst-omx to scale the video through OMX as well.
<video width="760" height="340" controls>
More responsive UI and scrolling even under heavy load
• Progressive tiled backing store
‣ Progressive tile-based rendering on scroll, as mobile browsers do
‣ TBS reduces the absolute amount of drawing, so UI events have more chances to be handled.
• Suspend JavaScript and animation while scrolling
‣ WebKit1 runs JS and rendering on a single thread in a single process, so we could not get scroll events while JS was running.
‣ This is not perfect yet, since we cannot stop JavaScript functions that are already running.
• Tune priorities among events
‣ Make sure UI event handling has a higher priority than other things.
‣ Tweaking event priorities should be done very carefully. It’s quite conditional.
‣ e.g. wiggling the mouse may starve drawing events.
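The starvation risk is easy to reproduce with a toy priority queue. This is a sketch under assumed names and priorities (`UI`, `DRAW`, `TIMER`, `EventQueue`), not the event loop the browser actually uses.

```python
import heapq
import itertools

UI, DRAW, TIMER = 0, 1, 2   # smaller number = handled first (assumed order)


class EventQueue:
    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # keeps FIFO order within a priority

    def post(self, priority, name):
        heapq.heappush(self._heap, (priority, next(self._order), name))

    def run(self, budget):
        """Handle at most `budget` events, highest priority first."""
        handled = []
        while self._heap and len(handled) < budget:
            _, _, name = heapq.heappop(self._heap)
            handled.append(name)
        return handled
```

If mouse-move events keep arriving faster than the budget allows, a posted paint event is never reached. That is why raising the UI priority has to be combined with some limit or coalescing of UI events.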
Memory & CPU friendly tab management
• Unload tabs if too many (more than 3) are in use.
• Slow down JavaScript on background tabs.
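One possible unloading policy, sketched in Python: the slide only gives the threshold of 3 tabs, so the choice of evicting the least recently used tab, the `TabManager` name, and the `"page state"` stand-in value are assumptions.

```python
from collections import OrderedDict

MAX_LOADED_TABS = 3  # threshold from the slide: unload beyond 3 tabs


class TabManager:
    def __init__(self):
        self.loaded = OrderedDict()   # tab id -> page state, oldest first
        self.unloaded = set()         # tabs whose page state was dropped

    def activate(self, tab_id):
        """Mark a tab as most recently used, reloading it if needed."""
        self.unloaded.discard(tab_id)
        # Re-inserting moves the tab to the most-recently-used end.
        self.loaded[tab_id] = self.loaded.pop(tab_id, "page state")
        while len(self.loaded) > MAX_LOADED_TABS:
            victim, _ = self.loaded.popitem(last=False)
            self.unloaded.add(victim)
```

An unloaded tab keeps its entry in the tab bar and is simply loaded again the next time the user activates it.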
Startup is 3x faster
• Optimized Adblock
‣ Adblock is built into Epiphany. It’s loaded automatically at startup.
‣ Use regular expressions only when needed.
‣ Reuse parsed regular expressions instead of recreating the same one every time.
‣ Load filters for Adblock asynchronously.
‣ Avoid running the converter tool, which converts Epiphany config files from one version to another, if not needed.
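Two of the Adblock points, using regular expressions only when needed and reusing parsed ones, can be sketched as below. The filter format and the function names are invented for this example and do not reflect Epiphany's actual filter handling.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=None)
def compiled(pattern):
    """Compile each filter pattern once and reuse the parsed result."""
    return re.compile(pattern)


def is_blocked(url, plain_filters, regex_filters):
    """Try cheap substring filters first; use regexes only when needed."""
    if any(text in url for text in plain_filters):
        return True
    return any(compiled(p).search(url) for p in regex_filters)
```

A URL that already matches a plain substring filter never touches the regex engine, and a pattern that is needed is compiled on first use rather than at startup.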
JavaScript JIT fixes for ARMv6
• Backported the latest JIT-related changes into our working WebKit.
• Bug fixes for ARMv6
Lessons
Lesson 1. Profiling, Profiling & Profiling
• Measuring CPU, memory and time will show you the way to go.
• Profiling depends a lot on the developer’s experience.
• Do not hesitate to share your know-how with your colleagues.
• Do not be afraid of learning new tools.
• e.g. the perf tool is very useful on Linux.
‣ Install the relevant debug packages
‣ sudo apt-get install linux-tools
‣ sudo perf record -a -g -o perf.data
‣ sudo perf report -g -i perf.data
Debug package missing
Lesson 2. Keep watching upstream
ARMv6 is not a popular AP nowadays. Nobody cares. BUT…
• You’re not the only one who cares about the problem!
• JIT compiler enabled on ARMv6
• Optimized pixman and libav for ARMv6
Lesson 3. Suspect useless, stupid and repeated things
• Direct painting instead of a timer-based drawing mechanism
• Disk image cache
• Reduction of the number of memory copies to play video
• Unique feature: fullscreen mode
• Avoid useless image format conversions
Lesson 4. Just In Time
• Progressive tiled backing store
• Suspend JavaScript and animations if necessary.
• Optimized Adblock
Lesson 5. Hackish but feasible? Then O.K.
• Used mobile versions of pages for some sites.
• Better YouTube support by injecting a custom video tag wrapper.
• Faster fullscreen video.
Lesson 6. Utilize all available resources in the platform
• Disk image cache
‣ Trade-off between memory and local disk space
• OMX (OpenMAX) for decoding video and images
‣ Decodes video through the GPU, not the CPU
• OMX for scaling video and images
‣ Scales video through the GPU, not the CPU
Lesson 7. Careful resource reallocation
• Throttle video to at most 30fps.
• Tune priorities among events.
• Memory pressure handler using cgroups
• Unload tabs if too many are in use.
• Slow down JavaScript on background tabs.
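As an illustration of the first point, a frame-rate cap can be as simple as dropping frames that arrive too soon after the last displayed one. The function below is a sketch with assumed names; timestamps are integer microseconds to keep the arithmetic exact.

```python
def throttle(frame_times_us, max_fps=30):
    """Keep only frames at least 1/max_fps after the last shown frame."""
    interval_us = 1_000_000 // max_fps
    shown, last = [], None
    for t in frame_times_us:
        if last is None or t - last >= interval_us:
            shown.append(t)
            last = t
    return shown
```

Fed with a 60fps stream, this keeps roughly every second frame, which halves the painting work for that video.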
Conclusion
• Optimization is literally finding the best solution to fit your purpose or platform.
• It depends on your situation, so it can take various forms.
• Software engineers should not expect better hardware to do the work for them.
• No magic, no universal solution for optimization
• Imagine your own way; don’t be afraid of trying your ideas.
Thank you