This may be the author’s version of a work that was submitted/accepted for publication in the following source:

Truskinger, Anthony, Brereton, Margot, & Roe, Paul (2018) Visualizing five decades of environmental acoustic data. In Mendrik, A, van Werkhoven, B, & van Nieuwpoort, R (Eds.) Proceedings of the 14th International IEEE eScience Conference. IEEE, United States of America, pp. 501-510.

    This file was downloaded from: https://eprints.qut.edu.au/122730/

© Consult author(s) regarding copyright matters

This work is covered by copyright. Unless the document is being made available under a Creative Commons Licence, you must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the document is available under a Creative Commons License (or other specified license) then refer to the Licence for details of permitted re-use. It is a condition of access that users recognise and abide by the legal requirements associated with these rights. If you believe that this work infringes copyright please provide details by email to [email protected]

    License: Creative Commons: Attribution-Noncommercial 2.5

Notice: Please note that this document may not be the Version of Record (i.e. published version) of the work. Author manuscript versions (as Submitted for peer review or as Accepted for publication after peer review) can be identified by an absence of publisher branding and/or typeset appearance. If there is any doubt, please refer to the published source.

    https://doi.org/10.1109/eScience.2018.00140


Visualizing Five Decades of Environmental Acoustic Data

    Anthony Truskinger, Margot Brereton, Paul Roe

    QUT Ecoacoustics Research Group

    Science and Engineering Faculty, QUT

    Brisbane, Australia

    [email protected]

Abstract—Monitoring the environment with acoustic sensors is now practical; sensors are sold as commercial devices, storage is cheap, and the field of ecoacoustics is recognized as an effective way to scale monitoring of the environment. However, a pressing challenge faced in many eScience projects is how to manage, analyze, and visualize very large data so that scientists can benefit, with ecoacoustic data presenting its own particular challenges. This paper presents a new zoomable interactive visualization interface for the exploration of environmental audio data. The interface is a new tool in the Acoustic Workbench, an ecoacoustics software platform built for managing environmental audio data. This Google Maps-like interface for audio data enables zooming in and out of audio data by incorporating specialized, multi-resolution visual representations of audio data into the workbench website. The ‘zooming’ visualization allows scientists to surface the structure, detail, and patterns in content that would otherwise be opaque to them, from scales of seconds through to weeks of data. The Ecosounds instance of the Acoustic Workbench contains 52 years (108 TB) of audio data, from 1016 locations, which results in a 180 million-tile, 8.3 terapixel visualization. The design and implementation of this novel big audio data visualization is presented along with some design considerations for storing visualization tiles.

Keywords—Ecoacoustics; Visualization; Research Software Engineering; Application File Formats

    I. INTRODUCTION

    Ecoacoustics is a new branch of ecology and data science that investigates the natural and anthropogenic sounds of our environment [1]. Traditional field surveys of the environment are important but they do not scale; sending scientists to the field is expensive and captures only limited temporal information [2]. Ecoacoustics addresses this by monitoring a field site with passive acoustic recorders. These sensors capture a direct, permanent, and objective record of the acoustic environment in which they are deployed. With solar panels and large memory cards, these sensors can be configured to record continuously and can thus massively scale data collection both spatially and temporally.

The ecoacoustics field includes research on terrestrial, freshwater, and marine ecosystems and is commonly seen as a landscape ecology inspired extension of bioacoustics, focusing on soundscapes—the totality of sounds as a whole rather than the behavior of individual vocalizing fauna [1, 3]. These soundscapes exhibit unique signatures defined by their environments: the biophony, geophony, and anthropophony contained within all contribute [4].

    1 https://www.ecosounds.org


Collecting and storing acoustic data for these soundscapes is a solved problem [4, 5], particularly given that acoustic recorders are now easily purchasable commercial products, with the recent AudioMoth sensor being cheap, small, energy efficient, and open source [6]. The result of such commodity hardware is increased data collection which, in turn, necessitates a continuing investment in tools that manage, analyze, and visualize the massive amounts of data collected [7]. Given the scale of data collected, most will never be heard by a human.

    Although large scale audio data are less voluminous than other data formats (video, radio telescope) and are not as complex (bioinformatics), they pose some unique challenges. Audio data often have a high velocity (due to a multitude of sensors and high sample rates) and are generally opaque and unstructured. Audio data are typically difficult to conceptualize or manipulate at larger scales as they are intended to be listened to at natural (real-time) playback rates [8, 9].

To manage the data generated by these sensors, we have developed and host web-based software that enables traditional scientists to manage, share, manipulate, listen to, analyze, and visualize their audio data. This Acoustic Workbench software [5, 10] powers Ecosounds1—a repository of large-scale, passively recorded environmental audio data. Ecosounds hosts datasets collected by collaborating scientists, including continuous in-depth studies (e.g. 2 years of continuous recording at 2 sites) and short-term, large-breadth studies (e.g. 36 hours of continuous recording across 100 sites).

It is not enough to collect data; data must support development of insight. Thus, it’s critical that scientists can access, navigate, and interrogate their data. Unlike geographic maps, there is no established method for representing environmental sound at different scales. For geographic data there exist advanced transforms that are applied to ensure features are scaled effectively. Audio data does not have equivalents; merely scaling spectrograms or waveforms does not yield useful visualizations. To address this gap, this paper presents a novel audio visualization tool that allows scientists to inspect large-scale audio collections through a zooming false-color index spectrogram interface. These false-color index spectrograms are multi-resolution composite images that surface the typically opaque structure of audio data at scale. This visualization has been incorporated into the Acoustic Workbench so that all hosted datasets on Ecosounds—and thus the scientists that own the data—may benefit. To our knowledge, no previous technology has allowed for the navigation and exploration of ecoacoustic data at these scales. Additionally, we present a generalizable exploration and efficacy comparison of different application file formats for the storage and retrieval of visualization image tiles. Lastly, related work will be discussed and contrasted to the work introduced in this paper.

    II. ECOSOUNDS AND THE ACOUSTIC WORKBENCH

    The workbench is a web application that allows scientists to view, listen to, and annotate large volumes of environmental audio data2 [2, 5, 10]. The workbench is designed to scale ecological monitoring by providing scientists with tools to help them understand, share, and analyze their data. These tools target multiple temporal scales as biological and ecological patterns manifest at different scales; relevant patterns can be found in vocalization syllables, in hour-long choruses of vocalizations, and through to the diurnal patterns of a soundscape.

    Ecosounds, which runs an instance of the workbench, currently hosts over 108 TB of audio data3 (and another 30 TB of supplementary data), owned by 60 scientists, collected from sites across Australia (including Groote Eylandt, St Bees, Brisbane, Sturt National Park, Tasmania, and Cape Otway). Ecosounds also has small collections from the USA, Bhutan, Indonesia, and Papua New Guinea. The data span the equivalent of 52 years of continuous recording (≈ 452 thousand hours) from 1016 geographical locations, making it one of the largest repositories of environmental acoustic sensor data.

    A. Architecture

The Acoustic Workbench is an open source, horizontally scalable, cloud and vendor agnostic web application [5]. Ecosounds is currently hosted on an OpenStack cloud operated by the National eResearch Collaboration Tools and Resources (Nectar) research cloud and the Queensland Cyber Infrastructure Foundation (QCIF). QCIF provides access to cloud storage and cloud compute resources for data-driven collaborative research.

The workbench has three main components: the web server, the client application, and the workers. The web server is a Ruby on Rails application that exposes the REST API providing access to all metadata and data needed to manipulate the website. The client-side interface is an AngularJS single page web application that uses the REST API to render a dynamic and interactive user interface. The workers use the Resque job system to process data in a lightweight, distributed fashion [5, 11].

    2 Available from GitHub https://github.com/QutEcoacoustics/

    B. Major Features

    The Acoustic Workbench organizes recordings in a simple structure of projects and sites. Sites are the containers for audio data and have a geographical location. Projects are logical collections of sites (see Fig. 1 for a screenshot). Ecosounds has 138 projects, created by 48 users, which contain a total of 1477 sites.

The workbench features a random-access playback interface for audio data—audio is cut, mixed, and generated on demand. When listening to the audio data, a spectrogram is shown to a user (Fig. 1). Spectrograms are a common visual representation of audio data that show both time and frequency information. They are derived from a Fast Fourier Transform (FFT) and show time along the x-axis, frequency along the y-axis, and intensity through color or shading. Spectrograms shown on the Acoustic Workbench represent 30 seconds of data and are typically shown at a scale of 0.023 s/px—that is, each horizontal pixel represents 23 milliseconds of audio. Spectrograms can be annotated by users to classify faunal vocalizations of interest (Fig. 1).

    While previously users could listen to their data, there was no capability that allowed them to conceptualize their large collections of audio. This research introduces this capability for the Acoustic Workbench: a method to understand, navigate, and visualize collected audio data at multiple temporal scales.

    III. VISUALIZATION OF ENVIRONMENTAL AUDIO

    Acoustic data collected from multiple sources over extended periods of time can be fragmented both spatially and temporally. This may be due to explicit experimental design, practical considerations, or equipment failure. Regardless, navigating through large amounts of data and understanding the distribution and extent of acoustic data within a dataset can be a complex and unwieldy task. Visualization tools are therefore important for efficiently identifying where data exist, in terms of both spatial location, and temporal extent. Our visualization tool enables a user to obtain a spatial and temporal overview of data while they embark on multiscale exploration and interpretation of their audio data.

This section defines required terminology for classifying visualizations. Then prior work on visualizing environmental audio data is summarized, beginning with Index Spectrograms (our raster representations of compressed environmental audio data), followed by Variable Resolution Index Spectrograms (a prototype of temporally variable index spectrograms).

    3 September 2018 https://www.ecosounds.org/website_status

    Fig. 1. Screenshots of the Acoustic Workbench. Left: a project page, showing a collection of sites, at different locations. Scientists can make and manage their own projects. Right: the listen page features audio playback, spectrogram generation, and annotation capabilities for identifying acoustic events


The following section (Section IV) details our contribution: the novel adaptation and scaling of previous work into the Acoustic Workbench.

    Types of Visualization

    The visualizations implemented in the Acoustic Workbench show data in two formats: diagrammatic block-based charts that show metadata and raster-based images that show the structure and content of the data.

    Diagrammatic representations can show the extent, size, and pattern of collected audio, without needing to show the actual content of the audio data. Diagrammatic representations are easy to render as only audio metadata are needed to create them. The components used are shapes, like colored rectangles placed on an axis. The type of diagrammatic chart used for this research is known as a timeline chart [12] and is depicted in Fig. 2.

Raster representations (also known as dot matrix or bitmap) encode unique data in each pixel, just like a photo, and thus can be informationally dense. For ecoacoustic data, we use specialized raster images, which require a series of advanced transformation steps, to show structure, content, and distribution of acoustic energy within the recordings. Unlike diagrammatic representations, these raster representations require processing of audio data and thus are computationally expensive to generate. The raster representations are complex enough to require ahead-of-time generation and cached storage of results. Both images shown in Fig. 3 are raster images.

    A. Index Spectrograms

The traditional method for visualization of audio data is a spectrogram: a raster image encoded with the information produced by an FFT. Given typical settings (sample rate: 22 050 Hz, window size: 512), 30 seconds of audio is rendered as a 1292 px wide image (0.023 s/px)—a suitable size for most digital screens. However, standard spectrograms do not scale for large acoustic datasets. A standard spectrogram, rendered at standard scale, for a day’s worth of audio is 3.8 million pixels wide (or 1.3 km wide at 72 DPI). Compressing that standard spectrogram to fit into one screen produces a low-quality image where most detail of the underlying data is lost through aggregation (Fig. 3 bottom). To address this problem, Towsey, et al. [14] created a visualization of a day’s worth of environmental audio compressed to a scale appropriate for digital screens. Their innovation was the creation of long-duration false-color multi-index spectrograms (henceforth index spectrograms). See Fig. 3 for a visual comparison of an index spectrogram and a standard spectrogram.
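As a rough check of these figures, the short calculation below (a minimal sketch in Python, assuming non-overlapping FFT frames) reproduces the 0.023 s/px scale, the 1292 px width for 30 seconds of audio, and the multi-million-pixel width of a full day.

# Sketch: reproduce the spectrogram scale arithmetic quoted above.
# Assumes non-overlapping FFT frames (hop size == window size).

SAMPLE_RATE = 22_050   # Hz
WINDOW = 512           # FFT window size (samples)

seconds_per_pixel = WINDOW / SAMPLE_RATE             # ~0.023 s/px
pixels_for_30_s = 30 / seconds_per_pixel              # ~1292 px
pixels_for_24_h = 24 * 60 * 60 / seconds_per_pixel    # ~3.7 million px
width_km_at_72_dpi = pixels_for_24_h / 72 * 0.0254 / 1000  # ~1.3 km

print(f"{seconds_per_pixel:.3f} s/px, {pixels_for_30_s:.0f} px per 30 s")
print(f"{pixels_for_24_h / 1e6:.1f} Mpx per day, {width_km_at_72_dpi:.1f} km at 72 DPI")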

These index spectrograms use a triplet of summary statistics, known as acoustic indices, to show different aspects of the distribution of acoustic energy for a defined period. At a fixed scale of 60.0 s/px, an entire day’s worth of information is shown as a 1440 px wide image. Using three indices allows the image to highlight interesting aspects of the distribution of acoustic energy. The indices used include the Acoustic Complexity Index (ACI) [15], which measures the relative change in amplitude between frames; Temporal Entropy (ENT), which measures the concentration of energy [16]; and Events (EVN), which measures the portion of signal that is above an automatically calculated background noise threshold [8]. The index spectrograms are described as false-color images because each matrix of index values is assigned to a color channel, either red, green, or blue, creating a composite true-color raster image.
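To illustrate the false-color mapping described above, the following sketch composes three per-day index matrices into one RGB image. It is an illustration only: the matrix shapes and normalization bounds are placeholder assumptions, not the published index calculation.

# Sketch: compose ACI, ENT, and EVN index matrices into a false-color image.
# Each matrix has shape (frequency_bins, minutes_in_day); the normalization
# bounds below are illustrative placeholders, not the published values.
import numpy as np
from PIL import Image

def normalize(index_matrix, lo, hi):
    """Clip an index matrix to [lo, hi] and rescale to 0..255."""
    clipped = np.clip(index_matrix, lo, hi)
    return ((clipped - lo) / (hi - lo) * 255).astype(np.uint8)

def false_color(aci, ent, evn):
    """Map ACI -> red, ENT -> green, EVN -> blue to form one composite image."""
    rgb = np.dstack([
        normalize(aci, 0.4, 0.7),
        normalize(ent, 0.0, 0.6),
        normalize(evn, 0.0, 1.0),
    ])
    # Flip so low frequencies sit at the bottom of the image.
    return Image.fromarray(rgb[::-1], mode="RGB")

# Example: a day's worth of random indices (256 bins x 1440 minutes).
rng = np.random.default_rng(0)
img = false_color(rng.random((256, 1440)), rng.random((256, 1440)), rng.random((256, 1440)))
img.save("index_spectrogram.png")  # 1440 px wide => 60.0 s/px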

    Fig. 2. A diagrammatic timeline chart showing the temporal extent of audio recordings (x-axis) across three sites (y-axis).

Fig. 3. A 24-hour false-color index spectrogram (top) directly compared to a standard FFT spectrogram (bottom) at the same temporal scale (60.0 s/px). The y-axis shows frequency up to 9 kHz. The audio data shown were collected on the 13th of October 2010 from the Samford Ecological Reserve, located north-west of Brisbane, Queensland, Australia. Images reproduced with permission [8, 13].

Towsey et al. [3, 8] and Dema et al. [17, 18] have shown that index spectrograms are valuable constructs. With little training users can identify diurnal patterns, dawn (and the morning chorus), dusk, and periods of rain or wind based on the shape and color of the data. With some training, users can identify sources of biophony, geophony, and anthropophony. Many users learn to identify colors and patterns that signify interesting data, including patterns that uniquely identify vocalizing species. Index spectrograms succeed in making 24 hours of typically-opaque audio data understandable and interpretable by scientists.

    B. Variable Resolution Index Spectrograms

However, there are three orders of magnitude difference in scale (2 600×) between a 60.0 s/px index spectrogram and a standard spectrogram at a 0.023 s/px scale. While the index spectrograms succeed in their stated goal of visualizing 24 hours of audio data, they require some training to learn how to interpret their content. Anecdotally, the primary training method for new users is association; association works when a user navigates from an identifiable pattern in the index spectrogram to a detailed version of that pattern in a standard spectrogram. Once viewing a standard spectrogram, both visual and auditory (playback) inspection of the data are used to understand what acoustic events are forming the patterns identified in the index spectrograms. The associations are spatiotemporally invariant and applicable to different datasets, thus allowing users to scale their interpretation of a soundscape. However, feedback from users [17] indicated that the 2 600× scale jump was prohibitive to forming associations. While species with repetitive calls or distinctive frequencies can be seen together with weather and diurnal patterns, many other species are very difficult to see in the “day scale” 60.0 s/px index spectrogram.

    Towsey, et al. [8] determined that changing the calculation period for the acoustic indices used in the visualization allowed for scale-variant index spectrograms with varying temporal resolutions to be created. Towsey derived a series of temporal resolutions that would minimize the confusion experienced by users as they jumped through each zoom level. In these variable resolution index spectrograms, the time axis changes but the frequency axis remains fixed. A prototype interface using these variable resolution index spectrograms and an idealized single 24-hour audio recording was presented4 by Towsey, et al. [8].

    IV. BUILDING ZOOMABLE INDEX SPECTROGRAMS

    The variable resolution index spectrograms produced by Towsey et al. are not inherently zoomable but rather are only a collection of variable scale raster images (i.e. tiles). More work is required to use these images in an interactive, truly zoomable, production interface.

    Our contribution is a novel interface that incorporates variable-resolution index spectrograms into the Acoustic Workbench. The layout for this design is shown in Fig. 4. The interface is a heavily modified timeline chart. It features an interactive surface that can be panned and zoomed, via touch or mouse interaction, along the temporal x-axis. The x-axis has variable scales and extents while the y-axis is used to show frequency information and has a fixed scale.

    4 http://research.ecosounds.org/publications/supplementary/Zooming/zoomingDemo.html

    The zooming index spectrogram interface can be described as a “one-dimensional Google Maps for audio”. Just like Google Maps, a user can zoom in and out to explore audio data just as one would a geographic map, except that only the x-axis changes in resolution. This analogy frames the design of the interface and is also a useful analogy for new users.

In its initial state, the interface renders at the scale needed to show the entire selected dataset in one screen—typically all data in a project or site. The interface primarily shows the distribution of audio data via diagrammatic blocks on the timeline chart.

Fig. 4. The audio visualization interface showing data for three locations. Both diagrammatic and raster visualizations are used to show data. The currently selected location (the NE site) shows a variable resolution index spectrogram when the images have been generated. When raster images are not available, or a location is not selected, the visualization reverts to a diagrammatic representation. Each image, from top to bottom, is the same visualization of 2010-10-13 05:15:30 shown at different zoom steps. The continuous zoom shows interpolated raster tiles from scales of 60.0, 6.0, and 0.2 s/px at actual resolutions 169.33, 6.08, and 0.22 s/px, respectively.


This rendering method is a high-speed fallback that shows the temporal distribution, extent, and location of data when the raster index spectrograms are not available. This is analogous to Google Maps showing street data (diagrammatic) rather than satellite data (raster).

    As the user explores and zooms in, they will zoom in past a resolution threshold (the equivalent of 5% of the scale of the most zoomed-out index spectrogram): once past this threshold the rendering of the variable resolution index spectrograms is enabled. The index spectrograms, where available, are overlaid on the diagrammatic representation of the currently selected data. The index spectrograms are shown only after the threshold because their form of visualization is not an appropriate way to summarize more than a week of audio data at a time. Due to the diurnal patterns of soundscapes, other visualization methods are better choices for showing month and year-long trends [14, 19].

    The index spectrogram tiles are generated in a series of steps, where each step has a different resolution. While the image tiles themselves always have a fixed resolution, the zooming interface allows for smooth and continuous zooming and does not have fixed resolution steps. The interface chooses the set of tiles with the resolution below the current dynamic resolution and compresses the raster images to the correct scale using standard interpolation effects—thus creating a continuous zooming effect.
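A minimal sketch of this tile-selection rule is shown below. The resolution ladder is taken from Table I; the exact selection and interpolation logic of the production interface may differ.

# Sketch: choose which pre-rendered tile set to show for a given viewport scale.
# TILE_SCALES is the rendered ladder from Table I, in seconds per pixel
# (the 0.1 s/px step is listed in Table I but is not rendered).
TILE_SCALES = [240.0, 120.0, 60.0, 30.0, 15.0, 7.5, 3.2, 1.6, 0.8, 0.4, 0.2]

def choose_tile_scale(viewport_scale_s_per_px):
    """Return (tile scale to load, factor to stretch/shrink those tiles by)."""
    # Prefer the finest tile set whose scale is still <= the viewport scale,
    # so tiles are only ever compressed (never blown up) by interpolation.
    candidates = [s for s in TILE_SCALES if s <= viewport_scale_s_per_px]
    chosen = max(candidates) if candidates else min(TILE_SCALES)
    return chosen, chosen / viewport_scale_s_per_px

# e.g. a viewport currently at 9.3 s/px uses the 7.5 s/px tiles,
# drawn at ~0.81x their natural width to produce a continuous zoom.
print(choose_tile_scale(9.3))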

    A. Technical implementation

The visualization interface is rendered using D3.js [20], a JavaScript library that facilitates the creation of data-driven scalable vector graphics (SVG) documents. Each of the controls in the visualization interface is a dynamically rendered, interactive, in-browser SVG document. Since SVG is a vector-based technology it is an ideal choice for rendering diagrammatic representations. Conveniently, SVG also supports embedding other image types, including raster images, which makes it a great choice for implementing mixed-mode visualizations.

    B. Tile sizes and resolutions

    The size and resolutions (number of zoom levels) of the index tiles reflect a balance of competing concerns. The resolutions chosen for this work are summarized in Tables I and II.

The number of tiles required follows a power law, doubling for every halving of the temporal scale. Towsey, et al. [8] measured changes in scale using Fitts’ law [21]: Step = log2(A/B). With that equation, doubling in resolution will create a scale step, measured in bits, of 1.0, which is defined as ideal for scale transitions. Since Towsey’s original work we have adjusted the resolutions chosen so that the scale step is consistently set to 1.0, with the notable exception of the transition from 7.5 s/px to 3.2 s/px; the scale was slightly modified so that the tile count remained an easily factorable number. We also differ from Towsey’s method by: not rendering the most zoomed in step (the highest resolution), which reduces the complexity and size of the computation; and by rendering two extra low-resolution layers (120.0 s/px and 240.0 s/px), which allows for larger week-long blocks of audio to be visualized. The chosen resolutions result in 441 megapixels of data generated for each day of audio data. For the Ecosounds website approximately 8.3 terapixels of raster images will be generated in 1.8×10^8 tiles to visualize all 52 years of their audio data.
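The sketch below derives the scale steps and per-day tile and pixel counts of Tables I and II from the resolution ladder; the 18 828-day figure is the Ecosounds total quoted in Table II, and the computed totals agree with the tables to within rounding.

# Sketch: derive the scale steps and per-day tile/pixel counts of Tables I and II.
from math import log2

SCALES = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 7.5, 15.0, 30.0, 60.0, 120.0, 240.0]  # s/px
TILE_WIDTH, TILE_HEIGHT = 180, 256  # px
SECONDS_PER_DAY = 24 * 60 * 60

previous = None
tiles_per_day = 0
for scale in SCALES:
    step = float("nan") if previous is None else log2(scale / previous)  # Fitts' law scale step (bits)
    tile_duration = TILE_WIDTH * scale                                    # seconds covered by one tile
    tiles = SECONDS_PER_DAY / tile_duration
    megapixels = tiles * TILE_WIDTH * TILE_HEIGHT / 1e6
    tiles_per_day += tiles
    previous = scale
    print(f"{scale:6.1f} s/px  step={step:4.2f}  tiles/day={tiles:6.1f}  {megapixels:6.2f} Mpx")

# Totals for one day and for the whole Ecosounds collection (18 828 days).
print(f"per day: {tiles_per_day:.0f} tiles, {tiles_per_day * TILE_WIDTH * TILE_HEIGHT / 1e6:.2f} Mpx")
print(f"Ecosounds: {tiles_per_day * 18_828 / 1e6:.0f} M tiles, "
      f"{tiles_per_day * 18_828 * TILE_WIDTH * TILE_HEIGHT / 1e12:.2f} Tpx")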

    5 September 2018. Retrieved from https://www.ecosounds.org/website_status

The tile height of the index spectrograms is fixed to 256 px and is directly determined by the FFT window size used in index generation. However, an appropriate width is not naturally defined. Choosing fixed size tiles that have variable resolution simplifies the generation and rendering of the tiles. We chose a tile width that was a multiple of 60 so that the tiles would align optimally in the temporal axis with natural times of the day. We considered tile widths of 180 px and 360 px. The smaller tile size allows tiles to be streamed in smaller chunks, which in turn results in quicker incremental updates when rendered in the browser. However, the smaller size doubles the number of tiles that need to be stored and increases network overhead, as each tile has a header. Ultimately, we chose to prioritize incremental rendering, and proceeded with a tile width of 180 px. Thus far our choice performs within our requirements, typically rendering in under a second. However, in the case of future problems, the tile size can be changed.

    As a minor note, for streaming tiles efficiently to our interface, we ensure that the webserver has HTTP/2.0 enabled. The persistent connection feature [22], where a single TCP/IP session is maintained, provides far better performance when loading many small chunks of data6.

    C. Computational complexity

Given the magnitude of audio data that needs to be processed, focus was placed on keeping overall system complexity low. Thus, the system is designed to have no post-processing or aggregation steps after generating the tiles for each audio recording. Aggregation is often required to create a final composite of the data; however, avoiding an aggregation step means missing, imperfect, or even late data does not need to be considered, which vastly simplifies the back-end architecture. Avoiding an aggregation step leads to some ideal properties:

    • Easy parallelism: our typical source audio files are two or six-hour recordings which are conveniently sized parallel workloads

• Isolated parallelism: as audio recordings are added to the system, their index tiles can be generated asynchronously, independent of any other files already added. Error cases (bad code or bad files) only affect one audio file at a time

    6 See https://http2.akamai.com/demo, or http://http2.golang.org/gophertiles

TABLE I. TILE RESOLUTIONS FOR 24 HOURS OF AUDIO

All tiles at all resolutions are 180 px wide and 256 px high

Scale (s/px) | Scale Step (bits) | Tile Duration | Number of tiles (in 24 hours) | Megapixels (in 24 hours)
0.1 | N/A | 00:00:18 | 4800.0 | 221.18
0.2 | 1.00 | 00:00:36 | 2400.0 | 110.59
0.4 | 1.00 | 00:01:12 | 1200.0 | 55.30
0.8 | 1.00 | 00:02:24 | 600.0 | 27.65
1.6 | 1.00 | 00:04:48 | 300.0 | 13.82
3.2 | 1.00 | 00:09:36 | 150.0 | 6.91
7.5 | *1.23 | 00:22:30 | 64.0 | 2.95
15.0 | 1.00 | 00:45:00 | 32.0 | 1.47
30.0 | 1.00 | 01:30:00 | 16.0 | 0.74
60.0 | 1.00 | 03:00:00 | 8.0 | 0.37
120.0 | 1.00 | 06:00:00 | 4.0 | 0.18
240.0 | 1.00 | 12:00:00 | 2.0 | 0.09

TABLE II. TOTAL NUMBER OF TILES AND PIXELS

 | Number of tiles | Megapixels
1 day | 9576.0 | 441.26
All Ecosounds audio (18 828 days5) | 180 296 130.0 | 8 308 045.67



    • Immediacy: as soon as tiles are generated, the interface can show the tiles – there’s no need to wait for other processes to complete

    • Simpler computation: the tile generator does not need to orchestrate multiple files as only one source file is needed for computation

    D. Coordinate systems

    Despite the usefulness of the one-dimensional Google Maps analogy, there are distinct differences between time-series audio data and geographical data:

    • Geographical data uses polar coordinates to identify tiles whereas our tiles exist within a temporal coordinate system

    • The geographical coordinate system has an absolute limit (±180°/±90°) whereas the range of our audio data can continue indefinitely. We must cater for indefinite growth in the time dimension as more data is collected

    • Zoomable geographical data requires three coordinates to identify a tile (latitude, longitude, and zoom) whereas our data only requires two (datetime and zoom)

    These differences, coupled with our aforementioned computation method, result in the use of the following tuple to identify a tile:

tile_id = (recording_id, tile_datetime, tile_resolution)

    Practically, tiles are stored as files, where their names take the form of:

    4c77b524-1857-4550-afaa-c0ebe5e3960a_20101012-140000Z__BLENDED.Tile_20101012-140000Z_0.125.png

where the first portion is the recording id, the second portion is a tag describing the image format, the third portion is the datetime locating the start of the tile, and the fourth is the resolution of the tile (in seconds per pixel). It should be noted that the recording id is a UUID that matches a full metadata database record for each audio recording; only the metadata needed to disambiguate tiles are stored in the filename.
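For illustration, the sketch below builds and parses filenames of this shape. The grouping of fields is inferred from the example above, so treat the helper and its regular expression as hypothetical rather than the canonical naming code.

# Sketch: build and parse tile filenames of the form shown above.
# The exact field grouping is inferred from the example name; treat this as
# an illustration rather than the canonical naming code.
import re
from datetime import datetime, timezone

def tile_filename(recording_id, recording_start, tile_start, resolution, tag="BLENDED.Tile"):
    fmt = "%Y%m%d-%H%M%SZ"
    return (f"{recording_id}_{recording_start.strftime(fmt)}__{tag}"
            f"_{tile_start.strftime(fmt)}_{resolution}.png")

TILE_NAME = re.compile(
    r"(?P<recording_id>[0-9a-f-]{36})_(?P<recording_start>\d{8}-\d{6}Z)"
    r"__(?P<tag>.+?)_(?P<tile_start>\d{8}-\d{6}Z)_(?P<resolution>[\d.]+)\.png")

name = tile_filename("4c77b524-1857-4550-afaa-c0ebe5e3960a",
                     datetime(2010, 10, 12, 14, 0, 0, tzinfo=timezone.utc),
                     datetime(2010, 10, 12, 14, 0, 0, tzinfo=timezone.utc),
                     0.125)
print(name)
print(TILE_NAME.match(name).groupdict())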

    E. Imperfect data

Dealing with imperfect data is a core requirement for the visualization system. Frequently data are corrupt, missing, or, most commonly, simply not aligned neatly to tile boundaries. Imperfect data is dealt with by allowing multiple tiles from the same temporal index to be shown in the visualization via layered transparency. Fig. 5 shows that for each audio recording within a viewport (the view of currently visible data) a virtual layer of tiles exists. Audio recordings that do not align within the coordinate system are padded with transparency—blank sections of the image—so that their tiles always align. Because the transparency and content align through the layers the result appears to be a single composite image. Transparency padding can occur either before the data, after the data, or in the case of short recordings, on both sides of the data. The layered transparency occurs at each zoom step.

This layered transparency model is a product of the same reasoning used to choose the coordinate system: to avoid system complexity. Since the visualization interface aligns and composes the tiles in the browser, a final computational aggregation step is not needed during generation, thus catering for missing, misaligned, and variable duration recordings.
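The padding arithmetic can be sketched as follows; the function name and rounding behavior are illustrative assumptions, not the production implementation.

# Sketch: compute how much transparent padding a recording needs so that its
# tiles align to the global tile boundaries at a given scale. Names and the
# exact rounding behavior are illustrative, not taken from the production code.
from datetime import datetime, timezone

def tile_padding(recording_start, duration_s, scale_s_per_px, tile_width_px=180):
    tile_duration = scale_s_per_px * tile_width_px  # seconds covered by one tile
    epoch = recording_start.timestamp()
    # Seconds between the previous tile boundary and the start of the recording.
    lead_s = epoch % tile_duration
    # Seconds between the end of the recording and the next tile boundary.
    trail_s = (-(epoch + duration_s)) % tile_duration
    return {"lead_px": round(lead_s / scale_s_per_px),
            "trail_px": round(trail_s / scale_s_per_px)}

start = datetime(2010, 10, 13, 5, 15, 30, tzinfo=timezone.utc)
print(tile_padding(start, duration_s=7200, scale_s_per_px=60.0))
# The first and last tiles are padded with transparent pixels so that every
# layer lines up in the browser without a server-side compositing step.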

    V. EXPLORING TILE STORAGE TECHNIQUES

    The calculations for generating a set of zoomable index spectrograms are complex and time consuming. Eager generation of the tiles is less complex and better performing than on-demand generation. Initial implementations of the zooming interface simply generated tile images that were stored on disk. This worked well for prototypes, however, storing two hundred million (see Table II) small files (mean size 68 kB) on disk is not feasible. The method in which tiles are generated and stored has significant effects on system performance. This section introduces and discusses the requirements for storing ecoacoustic tile data.

Backup systems that rely on iterating files take longer to run if they must scan more files. Similar problems occur for NFS mounts: files must be partitioned to avoid poorly performing directory lookups, NFS client caches experience more churn, and NFS servers experience higher load. Additionally, storing tiles as files is a waste of resources: tiles don’t need inodes (standard filesystem metadata) as all the tiles have the same permissions and their individual metadata, such as date stamps, are irrelevant. Further, big data storage systems are optimized for large files and have large block sizes; small files often do not fill a block, and this results in wasted space. This wasted space varies based on the filesystem used. Our anecdotal measurements, based on the filesystems we regularly use, indicate an average of 3.6% wasted space for one of our tile sets.
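As a back-of-the-envelope check (assuming a common 4 KiB block size, which is an assumption about the underlying filesystem), an expected slack of roughly half a block per file lands near the measured figure:

# Sketch: rough estimate of block-size slack, assuming a 4 KiB filesystem block.
# On average each file wastes about half a block; with the 68 kB mean tile size
# quoted later this is close to the ~3.6% we measured anecdotally.
BLOCK = 4096          # bytes, a common default block size (assumption)
MEAN_TILE = 68_000    # bytes

expected_waste = (BLOCK / 2) / MEAN_TILE
print(f"expected slack ≈ {expected_waste:.1%}")   # ≈ 3.0%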

Fig. 5. Zooming interface tile diagram from two views; the top image represents a standard 2D render (with added guidelines) and the bottom image is a conceptual perspective of the SVG elements rendered. The diagram shows how tiles are organized while rendering a resolution (in this case 60.0 s/px). Three audio recordings (blue) are shown that are visualized in a viewport (white). There are five index spectrogram tiles shown in three layers – one layer per recording. Tile boundaries are shown in green. Four tiles do not need to be rendered (pink cross) and are omitted. Tiles that show partial data are padded with transparency (blank sections of tile).


    Storing scientific data is hardly a new problem: more established sciences commonly use HDF5 [23], NetCDF [24], custom file formats, or even relational database management systems (RDBMS) [25] to efficiently store and retrieve domain specific data. Nonetheless, ecoacoustics is a new field and there are no standards for storing data. Consequently, for the Acoustic Workbench, we needed to determine which method was ideal for storing our domain-specific tile sets.

    Continuing with the focus of keeping overall system complexity low, we defined the core requirements as:

    • Cloud and vendor agnosticism—a portable solution is required

    • Inherently cross platform

    • Easy to distribute—so batches of tiles for each input audio recording can be shared for offline use or analysis

    • No monolithic services or daemons

We considered solutions like HDFS, GPFS, Ceph, and filesystems tuned for small files. Those solutions require new infrastructure and ultimately do not allow for effective sharing of data. Arguably, the solution to this problem is to use cloud-based object stores such as Swift (the OpenStack object store), Azure blob storage, or Amazon S3. A critical analysis of our requirements reveals that the distributed and high availability properties of these systems aren’t needed. Additionally, they still store individual files; it is not easy to share batches of files for offline use.

    Further, within the geographic information systems community, tile servers, software dedicated to storing and serving image tiles, are their own category of software. We briefly explored existing tile server solutions, like the OpenStreetMap server but due to the differences in coordinate systems (see Coordinate systems above) and the nature of our unconstrained data, most were not appropriate.

    There is also precedent for storing tiles in large RDBMS [25]. However, the typical access pattern for our data fades over time. There is no need for the complexity of a monolithic, always-on service solution for the entirety of the tile set.

Ultimately, we chose to use an application file format as a container for the tiles. An application file format is a file format that is “used to persist application state to disk or to exchange information between programs” [26]. Reusing an established application file format is ideal because most are cross platform and allow for grouping files into distributable batches of tiles. Further, application file formats don’t require the setup of new infrastructure or services. We only investigated using existing file formats so that existing support for these formats could be taken advantage of. The objective of grouping these tiles is to reduce the number of files stored in the file system. With an application file format, we can group tiles by their associated recordings. The result is a reduction in the number of files required for storing the tiles from 185 million (one per tile), to 545 thousand (one per recording)—0.3% of the original.

We narrowed down the possible application file formats to the following: TAR, uncompressed Zip, HDF Version 5 (common in other sciences [27]), and SQLite databases [26].

    7 https://www.sqlite.org/withoutrowid.html

No compressed formats were tested as the tiles were PNG encoded and further compression would have no meaningful effect. All chosen formats have cross-platform libraries (including in scientific programming languages), command line tools, and support parallel read access, which is ideal for serving tiles in a webserver scenario. We excluded the NetCDF format from evaluation because it has the same feature set as HDF5. The SQLite format had parameters that needed to be tuned and tested: specifically, whether to use a ROWID7, and what page size was most appropriate (of 8192, 16384, and 32768) for storing BLOBs8. This results in six SQLite tests per experiment.
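A minimal sketch of such a per-recording SQLite container is shown below. The schema and column names are our illustration, not the Acoustic Workbench’s actual schema; note that the page_size PRAGMA must be issued before the first table is created.

# Sketch: a per-recording SQLite container for tiles, keyed by start time and
# resolution. The schema and column names are illustrative; page_size must be
# set before any table is created for the PRAGMA to take effect. Keeping the
# implicit rowid with a composite PRIMARY KEY matches the "RowId" variant
# tested below; adding WITHOUT ROWID would give the "NoRowId" variant.
import sqlite3

def open_tile_store(path, page_size=8192):
    conn = sqlite3.connect(path)
    conn.execute(f"PRAGMA page_size = {page_size}")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS tiles (
            tile_start TEXT NOT NULL,     -- e.g. '20101012-140000Z'
            resolution REAL NOT NULL,     -- seconds per pixel
            image      BLOB NOT NULL,     -- the PNG bytes
            PRIMARY KEY (tile_start, resolution)
        )""")
    return conn

def put_tile(conn, tile_start, resolution, png_bytes):
    conn.execute("INSERT OR REPLACE INTO tiles VALUES (?, ?, ?)",
                 (tile_start, resolution, png_bytes))

def get_tile(conn, tile_start, resolution):
    row = conn.execute("SELECT image FROM tiles WHERE tile_start = ? AND resolution = ?",
                       (tile_start, resolution)).fetchone()
    return row[0] if row else None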

    A. Tile format benchmark methodology

    To pick the best format we measured tile storage size and retrieval performance. For the performance benchmarks, the code used has been published to GitHub [28] and this experiment can be reproduced easily by the reader. The benchmarks test these factors:

    • File formats: HDF5, Raw (files on disk as a baseline), SQLite (for six cases), TAR, and uncompressed Zip

    • Two tile sets: small (8369 tiles), and large (50 225 tiles) to test format scalability (min size: 9 kB, avg size: 68 kB, max size: 141 kB)

    • Direct API invocation and the equivalent command line interface (CLI) tools

    • Retrieval speed (time to copy a tile from format to disk) and format size overhead

    • The benchmarks were run on two machines

o SSD: Windows 10 64-bit, 16 GB RAM, Intel i7-3770 CPU (3.4 GHz), Samsung 850 PRO SSD

    o HDD: Windows Server 2016, 256GB RAM, Intel Xeon E5-2665 (2.4 GHz), Hitachi HDS72303 HDD

The benchmarks were written in C# and used the BenchmarkDotNet library [29]. The library runs benchmarks continually until the calculated error is less than 99.9% of the confidence interval. The library also accounts for warmup effects, JIT caching, and outliers, ensuring stable and reliable benchmarks.
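The published benchmark is written in C# with BenchmarkDotNet [28, 29]; the sketch below only illustrates the shape of a single round (pick a random tile, extract it from each container, time the extraction) and assumes tiles keyed by filename, without the warmup, outlier, and error-threshold handling of the real harness.

# Sketch: a much-simplified version of one benchmark round. The container
# layouts here (a Zip archive and a SQLite table keyed by tile name) are
# illustrative assumptions, not the published benchmark code.
import random, sqlite3, time, zipfile

def time_once(fn):
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000  # milliseconds

def run_round(tile_names, zip_path, sqlite_path):
    name = random.choice(tile_names)
    with zipfile.ZipFile(zip_path) as zf:
        zip_ms = time_once(lambda: zf.read(name))
    conn = sqlite3.connect(sqlite_path)
    sqlite_ms = time_once(lambda: conn.execute(
        "SELECT image FROM tiles WHERE name = ?", (name,)).fetchone())
    conn.close()
    return {"zip_ms": zip_ms, "sqlite_ms": sqlite_ms}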

    B. Tile size storage overhead results

For storage overhead, we were interested in how much storage a format would need compared to the raw size of the tiles. See Table III for results. Generally, the formats introduced very little overhead in terms of storage. Most formats also exhibited linear or fixed overhead as the number of files changed; however, all SQLite formats showed a concerning non-linear increase in overhead. SQLite formats with large page sizes were the worst-performing formats in terms of size.

Additionally, the raw format listed is the actual raw size of the data, not the size of data on disk—which incurs a block size inefficiency overhead of 3.6%. This means that most formats use less space than storing files on disk, including the HDF5, Zip, Tar, and SQLite (8192 page size only) formats. Interestingly, not using a ROWID resulted in more space used by the SQLite formats.

    8 https://www.sqlite.org/intern-v-extern-blob.html


C. Tile retrieval performance results

Tile retrieval performance for the tested formats is reported in Table IV and Fig. 6. The benchmarks were conducted in rounds. Before each round a tile was randomly selected. Then, all benchmarks extracted the same tile from their respective formats and wrote the tile to disk. The rounds were repeated until the measured error was below the set threshold. Benchmarks that are considered stable are filtered out of subsequent rounds. An average of 24 rounds was run per benchmark, excluding the warmup stages.

All formats showed sensitivity to the drive speed differences of the different test computers; however, SQLite formats generally showed differences of only 2% between computers. Other formats like TAR and Zip were affected more. While the TAR format can only be linearly searched, we suspect that the very slow results for the TAR and Zip API tests are due to inefficient libraries used. We cannot explain the extremely poor performance of the HDF5 CLI tool, especially when compared to HDF5’s direct API performance.

All CLI methods incorporate a fixed cost of approximately 18 ms to start a process due to the high cost of creating a process on the Microsoft Windows operating system. Accounting for this cost, we see that many of the CLI tools—particularly SQLite—are roughly as fast to use as calling their APIs directly.

Fig. 6. Comparison of access times for the tested application file formats. Error bars are one standard deviation of the result for each benchmark.

TABLE III. FORMAT SIZE OVERHEAD COMPARISON

File Format | File Count | File Size (GB) | Size Overheada
HDF5 | 50 225 | 1.88 | 1.01
Raw (actual size) | 50 225 | 1.86 | 1.00
SQLite[RowId][PageSize=16384] | 50 225 | 2.02 | 1.09
SQLite[RowId][PageSize=32768] | 50 225 | 2.20 | 1.18
SQLite[RowId][PageSize=8192] | 50 225 | 1.94 | 1.05
SQLite[NoRowId][PageSize=16384] | 50 225 | 2.20 | 1.19
SQLite[NoRowId][PageSize=32768] | 50 225 | 2.46 | 1.32
SQLite[NoRowId][PageSize=8192] | 50 225 | 2.02 | 1.09
TAR | 50 225 | 1.89 | 1.02
Zip | 50 225 | 1.87 | 1.01
HDF5 | 8369 | 0.58 | 1.01
Raw (actual size) | 8369 | 0.57 | 1.00
SQLite[RowId][PageSize=16384] | 8369 | 0.59 | 1.04
SQLite[RowId][PageSize=32768] | 8369 | 0.64 | 1.12
SQLite[RowId][PageSize=8192] | 8369 | 0.58 | 1.02
SQLite[NoRowId][PageSize=16384] | 8369 | 0.62 | 1.09
SQLite[NoRowId][PageSize=32768] | 8369 | 0.69 | 1.21
SQLite[NoRowId][PageSize=8192] | 8369 | 0.60 | 1.05
TAR | 8369 | 0.58 | 1.01
Zip | 8369 | 0.57 | 1.00

a. Overhead calculated as File Size divided by Raw File Size, for the equivalent file counts

TABLE IV. TILE EXTRACTION PERFORMANCE FOR TESTED FILE FORMATS

All format performance columns are reported in milliseconds

Disk | Tile Count | CLI/API | HDF5 | Raw | SQLite NoRowId 16384 | SQLite NoRowId 32768 | SQLite NoRowId 8192 | SQLite RowId 16384 | SQLite RowId 32768 | SQLite RowId 8192 | Tar | Zip
SSD | 50 225 | API | 4.04 | 0.60 | 2.13 | 1.97 | 2.10 | 1.58 | 1.60 | 1.61 | 25 222.31 | 364.34
SSD | 50 225 | CLI | 11 510.84 | 21.33 | 19.28 | 19.45 | 19.59 | 18.70 | 18.89 | 18.41 | 380.22 | 107.20
SSD | 8369 | API | 1.59 | 0.63 | 2.20 | 2.29 | 2.22 | 1.50 | 1.48 | 1.51 | 876.23 | 55.33
SSD | 8369 | CLI | 444.16 | 21.23 | 19.53 | 19.72 | 19.65 | 18.98 | 18.64 | 18.63 | 83.46 | 56.78
HDD | 50 225 | API | 5.29 | 0.77 | 4.81 | 3.04 | 3.26 | 2.51 | 2.43 | 2.48 | 6311.46 | 780.80
HDD | 50 225 | CLI | 4302.37 | 55.65 | 49.06 | 48.61 | 49.02 | 47.23 | 46.70 | 46.92 | 879.96 | 456.29
HDD | 8369 | API | 2.50 | 0.69 | 3.83 | 3.88 | 3.76 | 2.37 | 2.46 | 2.50 | 1272.81 | 122.96
HDD | 8369 | CLI | 1088.51 | 55.57 | 49.59 | 49.31 | 49.13 | 46.90 | 46.77 | 47.07 | 244.13 | 129.90

[Fig. 6 chart data omitted: per-format access time in milliseconds (logarithmic scale), grouped by API and CLI access, with series for the SSD and HDD machines and the large (50 225) and small (8369) tile sets.]

D. Tile format conclusions

We chose SQLite (with ROWID and a page size of 8192) as our tile storage application file format. Anecdotally, SQLite was the easiest tool to use. It also comes with great indexing performance, future extensibility options (since additional data can be included as columns), and a low storage size overhead for small tile counts. Since the format is common, there are bindings for every language we need to support, which means we can distribute the tiles—possibly even to a browser’s local storage cache. Finally, SQLite is not an RDBMS and does not require a hosted service to run effectively. While HDF5 performed admirably, we found that shoehorning in PNG image data made for an awkward use of its API. We will likely investigate using HDF5 to store raw index data in the future. Moreover, we believe that the results of these application file format benchmarks are sufficiently generic that they can be applied to other similar problems outside of our field.

    VI. RELATED WORK

Visualization of acoustic data is a common feature of many bioacoustics software packages. Common audio software packages include Audacity, a general-purpose audio editing tool, and Raven, SongScope, and Kaleidoscope, which are acoustic event recognition software packages. These packages all visualize audio data as standard FFT based spectrograms and have various settings for configuring window size (the height of the spectrogram), the color palette used, and contrast/brightness settings. Of these programs, only Audacity is capable of rendering spectrograms for long durations of audio—which is how the 24-hour standard spectrogram in Fig. 3 was rendered, albeit requiring significant resources. More commonly these programs operate at a fixed scale which is fine enough to allow effective playback of target audio data; this is their primary purpose.

For a more direct comparison to our work, we refer to other online digital repositories of environmental audio data. Most do not offer large-scale visualization features, though for most, scale is not an explicit goal. For example, Xeno-canto [30] is an excellent online repository of acoustic recordings that has an active contributor community. Xeno-canto specializes in short high-quality recordings that capture just one target species. They aim to collect a wide range of birdcalls, for as many species and regions as possible. As such, they are considered a bioacoustics repository and the only visualizations they have, and arguably need, are standard FFT spectrograms.

Pumilio [31] is a digital sound archive designed to host collections of environmental audio data. Pumilio, unlike Xeno-canto, is designed for collecting large amounts of passively recorded audio data. Pumilio shows a standard FFT spectrogram for each audio recording in its database, either in a list format when showing many recordings, or by itself when showing one recording [31]. The generated spectrogram is compressed to fit within the container it is shown in. Pumilio has implicit assumptions about the size of recordings added to its database and assumes that many short recordings are typical. For short recordings (1–10 min) these visualizations show a reasonable amount of information. However, they are non-interactive images and are compressed to inconsistent scales; these factors combined result in visualizations that do not meet our requirements for exploring large audio datasets. We found one example [32] of a customized Pumilio website that had generated compressed spectrograms for 12 hour files. The resulting images are similar to the compressed spectrogram seen in Fig. 3: some detail can be seen, yet most of the fine structure of the audio is lost through aggregation.


    Other acoustics digital repositories, like R.E.A.L [33] and ARBIMON [34] also only use standard FFT spectrograms to visualize data.

    Ecoacoustics researchers have previously explored visualization for large scales of data. Their approach, also using acoustic indices, reduces the audio down to a series of low dimensional points that allow a single site to be expressed as a series on a line chart [35, 36]. Extensions to this concept apply the calculation to many sites (locations) in a region to plot a surface or contour map that changes over time [35-37]. These techniques argue that the acoustic indices used to represent their data have meaning [15, 38]. However, these visualizations aggregate over all the fine structure in the audio recordings, still treat the audio as an opaque data source, and are not interactive.

    Other examples of generic audio visualization do exist. Lin, et al. [39] created an interactive, zoomable, saliency-maximized spectrogram interface, entitled TimeLiner. Their method showed that enhancing certain aspects of audio data improved the detection time for acoustic events when compared to a standard spectrogram. This enabled users to scan data faster than they could otherwise listen to it and thus understand greater amounts of their data. Their work is important but differs from our interface in terms of scale; TimeLiner was tested on 80-minute recordings whereas our zoomable index spectrograms show up to a week of audio data. We do not believe their saliency-maximized spectrograms are comparable to index spectrograms at the 60.0 s/px scale, yet, we do think their method may have utility at higher resolutions (from 0.5 s/px to 7.5 s/px).

    To summarize, to the best of our knowledge, no other software can provide the interactive and scalable visualization capabilities that allow users to understand the content and fine structure of the soundscapes in their environmental audio data.

    VII. CONCLUSION

    Ecoacoustics is a dynamic field that is generating continually larger acoustic data sets. Our ecoacoustics software, the Acoustic Workbench, now provides a new soundscape visualization whose features were summarized in this paper. We introduced our design for a zooming visualization interface that allows scientists to understand their typically opaque datasets through variable resolution index spectrograms. We have anecdotal evidence that this interface helps scientists not only interrogate their data at scale but also allows them to make discoveries at variable temporal resolutions. Future work includes both quantitative and qualitative user studies to evaluate the effectiveness of the interface for extracting ecologically relevant information.

    An exploratory study for choosing an appropriate application file format for the visualization’s pre-rendered raster tiles was also presented, as this choice has a significant impact on system performance. Our results led us to choose the SQLite format as our tile storage application file format. We believe that our findings are generalizable and could benefit research software engineers outside of the ecoacoustics field.

    ACKNOWLEDGMENT

The authors thank Mark Cottman-Fields, Phil Eichinski, and Jessica Oliver for their contributions to Ecosounds and the Acoustic Workbench. Additionally, we thank Michael Towsey for the extensive work involved in developing the index spectrograms used in the Acoustic Workbench.

This work was supported by ARC Discovery Project DP170104004, Earth soundscapes: A human-computer approach to environmental sound analysis.

    This research was supported by the Queensland Cyber Infrastructure Foundation (QCIF) and by use of the Nectar Research Cloud. The Nectar Research Cloud is a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy (NCRIS). This research was also supported by QUT’s High Performance Computing group.

    REFERENCES

[1] J. Sueur and A. Farina, "Ecoacoustics: the Ecological Investigation and Interpretation of Environmental Sound," Biosemiotics, vol. 8, no. 3, pp. 493-502, 2015.

[2] J. Wimmer, M. Towsey, B. Planitz, I. Williamson, and P. Roe, "Analysing environmental acoustic data through collaboration and automation," Future Generation Computer Systems, vol. 29, no. 2, pp. 560-568, 2013.

[3] M. Towsey et al., "Long-duration, false-colour spectrograms for detecting species in large audio data-sets," Journal of Ecoacoustics, vol. 2, p. IUSWUI, 2018.

[4] B. C. Pijanowski, A. Farina, S. H. Gage, S. L. Dumyahn, and B. L. Krause, "What is soundscape ecology? An introduction and overview of an emerging new science," Landscape Ecology, vol. 26, no. 9, pp. 1213-1232, 2011.

[5] A. Truskinger, M. Cottman-Fields, P. Eichinski, M. Towsey, and P. Roe, "Practical Analysis of Big Acoustic Sensor Data for Environmental Monitoring," presented at the 2014 IEEE Fourth International Conference on Big Data and Cloud Computing (BdCloud), Sydney, Australia, 3-5 December 2014. Available: http://dx.doi.org/10.1109/BDCloud.2014.29

[6] A. Hill, P. Prince, E. Piña Covarrubias, P. Doncaster, J. Snaddon, and A. Rogers, "AudioMoth: Evaluation of a smart open acoustic device for monitoring biodiversity and the environment," Methods in Ecology and Evolution, vol. 9, no. 5, pp. 1199-1211, 2018.

[7] I. Potamitis, S. Ntalampiras, O. Jahn, and K. Riede, "Automatic bird sound detection in long real-field recordings: Applications and tools," Applied Acoustics, vol. 80, pp. 1-9, 2014.

[8] M. W. Towsey, A. M. Truskinger, and P. Roe, "The Navigation and Visualisation of Environmental Audio Using Zooming Spectrograms," in 2015 IEEE International Conference on Data Mining Workshop (ICDMW), 2015, pp. 788-797.

[9] J. Foote, "An overview of audio information retrieval," Multimedia Systems, vol. 7, no. 1, pp. 2-10, 1999.

[10] A. Truskinger, M. Cottman-Fields, and P. Roe, "Acoustic Workbench (Version 1.5.1)," Computer Software, 2018.

[11] GitHub and various authors, "Resque," 1.25.2 ed., 2014.

[12] M. Friendly, "A Brief History of Data Visualization," in Handbook of Data Visualization. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, pp. 15-56.

[13] M. W. Towsey, J. Wimmer, I. Williamson, P. Roe, and P. Grace, "The calculation of acoustic indices to characterise acoustic recordings of the environment," QUT ePrints, Brisbane, Australia, 2012. Available: http://eprints.qut.edu.au/53710/

[14] M. Towsey, L. Zhang, M. Cottman-Fields, J. Wimmer, J. Zhang, and P. Roe, "Visualization of Long-duration Acoustic Recordings of the Environment," Procedia Computer Science, vol. 29, pp. 703-712, 2014.

[15] N. Pieretti, A. Farina, and D. Morri, "A new methodology to infer the singing activity of an avian community: the Acoustic Complexity Index (ACI)," Ecological Indicators, vol. 11, no. 3, pp. 868-873, 2011.

[16] J. Sueur, S. Pavoine, O. Hamerlynck, and S. Duvail, "Rapid Acoustic Survey for Biodiversity Appraisal," PLoS ONE, vol. 3, no. 12, p. e4065, 2008.

[17] T. Dema, M. Brereton, J. L. Cappadonna, P. Roe, A. Truskinger, and J. Zhang, "Collaborative Exploration and Sensemaking of Big Environmental Sound Data," Computer Supported Cooperative Work (CSCW), May 31, 2017.

[18] T. Dema et al., "An Investigation into Acoustic Analysis Methods for Endangered Species Monitoring: A Case of Monitoring the Critically Endangered White-Bellied Heron in Bhutan," in 2017 IEEE 13th International Conference on e-Science (e-Science), 2017, pp. 177-186.

[19] Y. F. Phillips, M. Towsey, and P. Roe, "Revealing the ecological content of long-duration audio-recordings of the environment through clustering and visualisation," PLOS ONE, vol. 13, no. 3, p. e0193345, 2018.

[20] M. Bostock, "D3.js," in Data Driven Documents, 3.5.5 ed., 2012.

[21] P. M. Fitts and J. R. Peterson, "Information capacity of discrete motor responses," Journal of Experimental Psychology, vol. 67, no. 2, pp. 103-112, 1964.

[22] M. Belshe, R. Peon, and M. Thomson, "RFC 7540: Hypertext Transfer Protocol Version 2 (HTTP/2)," Internet Engineering Task Force (IETF), May 2015.

[23] M. Folk, A. Cheng, and K. Yates, "HDF5: A file format and I/O library for high performance computing applications," in Proceedings of Supercomputing, 1999, vol. 99, pp. 5-33.

[24] R. Rew and G. Davis, "NetCDF: an interface for scientific data access," IEEE Computer Graphics and Applications, vol. 10, no. 4, pp. 76-82, 1990.

[25] A. S. Szalay et al., "The SDSS SkyServer: public access to the Sloan digital sky server data," in Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, 2002, pp. 570-581.

[26] SQLite, "SQLite As An Application File Format," 2017. Available: https://sqlite.org/appfileformat.html

[27] The HDF Group, "Academic Research," 2017. Available: https://www.hdfgroup.org/our-industries/academic-research/

[28] A. Truskinger, "Index File Performance Benchmarks," v1.0. doi: 10.5281/zenodo.817630. Available: https://github.com/atruskie/index-file-perf-test/tree/v1.0

[29] A. Akinshin, "BenchmarkDotNet." Available: https://github.com/dotnet/BenchmarkDotNet/releases/tag/v0.10.8

[30] Xeno-canto Foundation, "Colophon and Credits," 2014. Available: http://www.xeno-canto.org/about/credits

[31] L. J. Villanueva-Rivera and B. C. Pijanowski, "Pumilio: A Web-Based Management System for Ecological Recordings," Bulletin of the Ecological Society of America, vol. 93, no. 1, pp. 71-81, 2012.

[32] Terrestrial Ecosystem Research Network, "BioAcoustics Portal," 2017. Available: https://bioacoustics.tern.org.au/

[33] E. P. Kasten, S. H. Gage, J. Fox, and W. Joo, "The remote environmental assessment laboratory's acoustic library: An archive for studying soundscape ecology," Ecological Informatics, vol. 12, pp. 50-67, 2012.

[34] T. M. Aide, C. Corrada-Bravo, M. Campos-Cerqueira, C. Milan, G. Vega, and R. Alvarez, "Real-time bioacoustics monitoring and automated species identification," PeerJ, vol. 1, p. e103, 2013.

[35] J. Sueur, A. Farina, A. Gasc, N. Pieretti, and S. Pavoine, "Acoustic Indices for Biodiversity Assessment and Landscape Investigation," Acta Acustica united with Acustica, vol. 100, no. 4, pp. 772-781, 2014.

[36] S. H. Gage and A. Farina, "The Role of Sound in Terrestrial Ecosystems: Three Case Examples from Michigan, USA," Ecoacoustics: The Ecological Role of Sounds, p. 31, 2017.

[37] A. Farina, E. Lattanzi, R. Malavasi, N. Pieretti, and L. Piccioli, "Avian soundscapes and cognitive landscapes: theory, application and ecological perspectives," Landscape Ecology, vol. 26, no. 9, pp. 1257-1267, 2011.

[38] A. Gasc et al., "Assessing biodiversity with sound: Do acoustic diversity indices reflect phylogenetic and functional diversities of bird communities?," Ecological Indicators, vol. 25, pp. 279-287, 2013.

[39] K.-H. Lin, X. Zhuang, C. Goudeseune, S. King, M. Hasegawa-Johnson, and T. S. Huang, "Improving faster-than-real-time human acoustic event detection by saliency-maximized audio visualization," in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 2277-2280.
