Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers
Johann Hauswald, Michael A. Laurenzano, Yunqi Zhang, Cheng Li, Austin Rovinski, Arjun Khurana, Ron Dreslinski, Trevor Mudge, Vinicius Petrucci, Lingjia Tang, Jason Mars
Intelligent Personal Assistants (IPAs) are standard in today’s mobile devices. The rapid rise in IPA-equipped devices means more compute-intensive queries will hit current datacenters, which are ill-suited to handle this type of workload.
1. Problem Statement: Redesigning the Datacenter for Intelligent Personal Assistants
2. Sirius: An Open End-to-End Voice and Vision Personal Assistant
[Figure 1 blocks. Mobile: Users, Voice Question or Action, Image, Execute Action, Display Answer. Server: Automatic Speech Recognition, Query Classifier (Action or Question), Question-Answering with Search Database, Image Matching with Image Database (Image Data), Answer.]
Figure 1: End-to-End Sirius Pipeline
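The control flow suggested by the Figure 1 labels can be sketched in a few lines of Python. This is a minimal illustration, not the Sirius implementation: the names `Query`, `Services`, `handle_query`, and the toy stand-in services are all hypothetical, and the way the image-matching result is appended to the question text is an assumption made only so the sketch runs end to end.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Query:
    audio: bytes                   # voice question or action
    image: Optional[bytes] = None  # optional image data (voice-image query)


@dataclass
class Services:
    # Hypothetical stand-ins for the server-side blocks in Figure 1.
    asr: Callable[[bytes], str]     # Automatic Speech Recognition
    classify: Callable[[str], str]  # Query Classifier: "action" or "question"
    qa: Callable[[str], str]        # Question-Answering
    imm: Callable[[bytes], str]     # Image Matching


def handle_query(query: Query, services: Services) -> dict:
    """Route one query through ASR, the classifier, and then QA or an action."""
    text = services.asr(query.audio)
    if services.classify(text) == "action":
        # Actions are handed back to the mobile device to execute.
        return {"type": "action", "payload": text}
    if query.image is not None:
        # Assumption: the matched image is folded into the question as text.
        text = f"{text} [{services.imm(query.image)}]"
    return {"type": "answer", "payload": services.qa(text)}


def toy_classify(text: str) -> str:
    # Toy rule for the sketch only: questions start with a question word.
    question_words = ("who", "what", "when", "where", "why", "how")
    return "question" if text.lower().startswith(question_words) else "action"


# Toy services: the "audio" is just UTF-8 text so the sketch is runnable.
toy_services = Services(
    asr=lambda audio: audio.decode("utf-8"),
    classify=toy_classify,
    qa=lambda question: f"answer to: {question}",
    imm=lambda image: "matched-entity",
)

if __name__ == "__main__":
    print(handle_query(Query(b"Call my doctor."), toy_services))
    print(handle_query(Query(b"Who's the lead singer of U2?"), toy_services))
    print(handle_query(Query(b"When does this bar close?", image=b"\x00"), toy_services))
```

Passing the services in as plain callables keeps the routing logic separate from any particular speech, question-answering, or image-matching backend, so each block could be swapped for a real component without touching `handle_query`.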
[Figure 2 layers, top to bottom]
Users
Query Taxonomy: Voice Command (VC), Voice Query (VQ), Voice-Image Query (VIQ)
IPA Services: Automatic Speech Recognition (ASR), Question Answering (QA), Image Matching (IMM)
Algorithmic Components: Gaussian Mixture Model (GMM) or Deep Neural Network (DNN) for ASR; Stemmer, Regular Expression, and Conditional Random Fields for QA; Feature Extraction and Feature Description for IMM
Tasks: Natural Language Processing, Image Processing, Signal Processing
Open Source Tools: CMU Sphinx
Figure 2: Top-down view of Sirius
Sirius is built from the latest open source tools and resembles current production intelligent personal assistants in its algorithmic components.
Figure 3: Sirius Service Cycle Breakdown
Clarity Lab, University of Michigan, Ann Arbor, MI, USA
3. Accelerating Sirius-suite
Sirius-suite: extracted from Sirius, a suite of the 7 most computationally demanding kernels in Sirius, representing 92% of the total execution time.
Figure 4: Heat-map of Sirius-suite Acceleration (color scale: speedup)
Platform    Model                    Clock      Threads
CMP         Intel Xeon E3-1240 V3    3.40 GHz   8
GPU         NVIDIA GTX 770           1.05 GHz   12288
Intel Phi   Phi 5110P                1.05 GHz   240
FPGA        Xilinx Virtex-6 ML605    400 MHz    N/A
Table 1: Sirius-suite Ported Platforms
4. Implications for Future Warehouse Scale Computers
Latency Reduction: FPGA: 16x, GPU: 10x
Figure 5: Latency Reduction Across Sirius Services
Total Cost of Ownership (TCO) Improvement: GPU: 2.6x, FPGA: 1.4x
Example queries: “Call my doctor.” (VC); “Who’s the lead singer of U2?” (VQ); “When does this bar close?” (VIQ)
Figure 6: Latency Reduction
Figure 7: Performance/Watt Improvement
Figure 8: TCO Reduction