Date post: | 09-Apr-2017 |
Upload: | muhammad-ahsan |
Architecture Framework for Trapped-ion Quantum Computer
Muhammad Ahsan [Ph.D. Defense], Department of Computer Science, Duke University
Agenda
Background and Motivation
Quantum Hardware and Architecture Models
Benchmark Application Circuits
Performance Simulation Tool
Results
My Research Background
[Diagram: research areas spanning Computer Science, Electrical Engineering, and Physics, from undergraduate work (Computer Systems, Computer Architecture) to Ph.D. research (Quantum Computing, Fault Tolerance and Quantum Error Correction, Quantum Computer Architecture, Resource Performance Estimation Tool). Marked on the diagram: topics of the defense, and some interesting findings.]
Quick Introduction to Quantum Computing (1)
A quantum computer consists of:
Quantum bits (qubits): store information in binary basis states |0>, |1> (e.g., energy levels of a trapped 171Yb+ ion)
Gates: process information (e.g., lasers that cause transitions between energy levels)
Gates are unitary operations: U†U = I
[Figure: a quantum circuit (qubits vs. time) next to the quantum hardware (ions as qubits, lasers as gates), with an X gate and a generic unitary U acting on a state |a>]
Quick Introduction to Quantum Computing (2)
What makes quantum computers interesting (non-conventional):
Superposition of two states: a|0> + b|1>, with |a|^2 + |b|^2 = 1
Entanglement between qubits: a|00> + b|11>
What makes quantum computers more powerful:
Phase gates: a|0> + e^{iπ/2} b|1>
Amplitudes (a, b) are complex (e.g., a|0> - b|1>)
Quantum speedup: amplitude cancellation can efficiently eliminate incorrect candidate solutions to a search problem
Universal Quantum Computation
Clifford gates {H, CNOT, X, Z, S}: H (Hadamard), S (π/2 phase-shift gate), Z (π phase-shift gate), CNOT. Insufficient for arbitrary quantum computation.
T (π/4 phase-shift gate) or Toffoli (controlled-CNOT) gate: double-edged swords. They enable practical quantum speedup but are practically resource-consuming.
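The gate definitions above can be checked numerically. The following is a minimal sketch, not taken from the talk: it builds the phase-shift gates as 2x2 matrices, checks that each is unitary, and applies them to a normalized state a|0> + b|1>. The helper names (`phase_gate`, `apply`, `is_unitary`, `norm_squared`) and the sample amplitudes are illustrative assumptions.

```python
import cmath
import math


def phase_gate(theta):
    """2x2 phase-shift gate diag(1, e^{i*theta}) as nested lists."""
    return [[1, 0], [0, cmath.exp(1j * theta)]]


# Gates named on the slide; the matrix forms are the standard textbook ones.
S = phase_gate(math.pi / 2)   # pi/2 phase-shift gate
Z = phase_gate(math.pi)       # pi phase-shift gate
T = phase_gate(math.pi / 4)   # pi/4 phase-shift gate
_r = 1 / math.sqrt(2)
H = [[_r, _r], [_r, -_r]]     # Hadamard


def apply(gate, state):
    """Apply a 2x2 gate to a single-qubit state (a, b) = a|0> + b|1>."""
    a, b = state
    return (gate[0][0] * a + gate[0][1] * b,
            gate[1][0] * a + gate[1][1] * b)


def is_unitary(gate, tol=1e-12):
    """Check U^dagger U = I entry by entry."""
    for i in range(2):
        for j in range(2):
            entry = sum(gate[k][i].conjugate() * gate[k][j] for k in range(2))
            expected = 1 if i == j else 0
            if abs(entry - expected) > tol:
                return False
    return True


def norm_squared(state):
    """|a|^2 + |b|^2, which must stay equal to 1."""
    return sum(abs(amp) ** 2 for amp in state)
```

For example, `apply(S, (0.6, 0.8))` leaves the |0> amplitude alone and multiplies the |1> amplitude by e^{iπ/2}, while `norm_squared` of the result stays at 1 up to rounding. Applying `T` twice gives the same state as applying `S` once.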
Quantum Computing in a Nutshell
Theoretically, quantum computers can solve certain important problems much faster than conventional (classical) computers, e.g., Shor's integer factorization algorithm (exponential speedup).
Practically, quantum device components (qubits, gates) are far noisier and less reliable than those of classical computers.
Mean time to failure: classical ~10^7 to 10^8 hours; quantum ~seconds to minutes.
Failure probability p = 10^-3: 1 in 1,000 quantum gates fails.
Error correction (redundancy) is needed to protect quantum information.
Example: Fault-Tolerant 3-Qubit (Toffoli) Gate
[Figure: an unprotected quantum gate compared with a fault-tolerant quantum gate. Each logical qubit is encoded, and error correction follows the gate. Ancilla qubits prepared in a special entangled state (4-cat states, with 4-cat decoding) perform parity checks, followed by recovery.]
A large number of additional qubits and gates is needed to reduce the effective noise level from O(p) to O(p^2), e.g., with the Steane [[7,1,3]] code.
Multiple Layers of Encoding in the [[7,1,3]] Code
(values listed as: no encoding / single-layer (L1) encoding / two-layer (L2) encoding)
Qubits: 1 / 7 / 7^2 = 49
Noise level, failure prob. (p): p [e.g., p = 10^-7] / O(p^2) / O(p^4) [e.g., 10^-16]
Clifford gates {H, X, Z, CNOT}: 1 / 7 / 7^2 = 49
Non-Clifford gates {e.g., Toffoli}: 1 / O(10^3) / O(10^5) [e.g., 10^-10]
Ancilla qubits: 0 / O(10^1) / O(10^2)
Good news: gain in reliability > qubit and gate overhead
[Slide annotations: 10^2 to 10^3x, 10^6x]
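The scaling in this table (failure probability O(p) -> O(p^2) -> O(p^4), data qubits 1 -> 7 -> 49) follows the usual leading-order model for a concatenated distance-3 code. The sketch below is an illustration under that model, not the error analysis used in the thesis; the constant `c` is a hypothetical prefactor standing in for the number of harmful fault pairs, and ancilla qubits are not counted.

```python
def logical_failure(p, levels, c=1.0):
    """Leading-order failure probability after `levels` layers of encoding.

    Uses p_L = (c * p) ** (2 ** L) / c, so L=0 returns p, L=1 gives c*p^2,
    and L=2 gives c^3 * p^4. The prefactor c is an assumed placeholder.
    """
    if levels < 0:
        raise ValueError("levels must be non-negative")
    return (c * p) ** (2 ** levels) / c


def physical_qubits(levels, block=7):
    """Data qubits per logical qubit for a [[7,1,3]] code: 7 ** levels."""
    if levels < 0:
        raise ValueError("levels must be non-negative")
    return block ** levels
```

With `c = 1` and `p = 1e-3`, one layer gives about 1e-6 and two layers about 1e-12, while the qubit count grows from 1 to 7 to 49. A larger `c` weakens the gain, which is why the model only helps when `c * p` is below 1.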
Fundamental Research Question
To estimate: how reliable, and how many, qubits and gates are needed to accomplish what a classical computer cannot in a realistic time scale (e.g., 2,048-bit factorization)?
The answer depends on:
Quantum application (e.g., Shor's algorithm)
Compilation of the quantum application into fault-tolerant gates (e.g., gate decomposition methods)
Fault-tolerance overhead (e.g., error-correcting codes)
Integration of qubits and gates on hardware (e.g., trapped-ion)
Research progress (theory), research progress (experiment)
A precise estimate needs information about the quantum hardware.
Loose lower bounds on the estimate.
Fundamental Research Question
How reliable, and how many, qubits and gates are needed to accomplish what a classical computer cannot in a realistic time scale (e.g., 2,048-bit factorization)?
The answer heavily depends on the architecture of the quantum hardware.
Impact of Hardware Assumptions on the Speed of a Quantum Computer
[Figure: classical vs. quantum time for 1,024-bit factorization under different architecture assumptions; annotations: days -> years, hours -> days. Architecture assumption matters! (Included with permission of rdv, TDL, KMI, quant-ph/0507023)]
Question
Why think about architecture for a large quantum computer when we do not know exactly how to build a small quantum computer?
Answer
Architecture can:
Compensate for technology limitations (memory hierarchy, multi-core designs)
Reveal performance-limiting factors
Guide future advances in technology
Example from History
Discrete transistors: slower, unreliable computers
Integrated circuits (IC): fast and reliable computers
MOORE'S LAW
Research Methodology
Quantum computer architecture: need to define the mechanism by which a very large number of qubits are allocated, functioned, protected, and connected in a realistically constructible quantum computer system.
Performance simulation: need a method to efficiently model the quantum computer architecture, map a quantum application onto the architecture, and evaluate the performance-limiting factors.
These are the crucial components of my research.
[Diagram label: Communication Channel]
Tool: Taxonomy of Important Terms
Device Parameters (DPs): e.g., physical gate times, failure probability
Resource Investment: e.g., total physical qubits used in the system
Architecture Parameters: functional allocation (data, ancilla) and connectivity of qubits
Performance Metrics: e.g., total execution time (TEXEC) and failure probability (PFAIL), the probability that the quantum circuit gives incorrect output
Design Space
Research Methodology
Quantum circuit (quantum adder), quantum hardware (e.g., trapped-ion), quantum architecture (MUSIQC)
Mapping (qubits -> physical resources)
Scheduling (gates -> sequence of physical operations)
Performance analysis (latency, reliability)
Papers/Publications
Performance simulator based on Hardware Resources Constraints for Ion-Trap Quantum Computer (ICCD 2013)
Optimization of a Quantum Computer Architecture Using Resource Performance Simulator (DATE 2015)
Designing Million-Qubit Quantum Computer Using Resource Performance Simulator [In Submission (2nd Attempt)]
Challenge: Target Community Mostly Unfamiliar with Quantum Computing
Philosophy of Performance Simulation Toolset (1)
[Diagram: tool flow. A quantum circuit passes through fault tolerance, resource allocation in hardware, and mapping and scheduling under lower-level hardware constraints, then performance evaluation. Device parameters, architecture parameters, and resource overhead enter as inputs.]
Prior work: Svore et al. (2004), Balensiefer et al. (2005), Whitney et al. (2007), Dousti and Pedram (2012)
Prior-work annotations on the diagram: FIXED, LESS FLEXIBLE, LESS FLEXIBLE, INSUFFICIENT INSIGHT
Philosophy of Performance Simulation Toolset (2)
[Same diagram, marked with my contribution: a fine-tuning knob and a magnifying glass]
[Diagram: feedback loop. Desired performance? If no, revisit: qubits, how reliable? Qubits, how many?]
What We Have Learned So Far
Question: Can a quantum computer be practically faster than a classical computer? What quality and amount of resources are needed?
Answer: It depends on the quantum computer architecture, which calls for an architecture study and a performance simulation tool to support it.
Quantum computer design cycle: a flexible tool to balance improvement in device parameters against investment in resources and architecture.
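Quantum Gate and Hardware Model
[Figure: trapped-ion hardware. Ions are the quantum bits (qubits) and lasers implement the quantum gates (U). Labeled parts: electrodes, ballistic shuttling channel, measurement M with outcomes {+1, -1}, and photons routed through an optical switch and beam splitter to photon detectors to produce an entangled pair (EPR pair). Video credit: Jason Amini]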
Fault-Tolerant and Scalable Quantum Computer Architecture
[Figure: planar ion traps built from basic ion-trap cells and grouped into segments that hold different qubit blocks, with empty channels for ballistic shuttling. Communication ports connect segments over photonic links through a Layer-1 optical switch (OS) and a Layer-2 optical switch (2x more time-expensive than the Layer-1 OS).]
Architecture idea: combine the best of
IONS: reliable storage and computation
PHOTONS: communication
Hardware Description and Device Parameters
Device Parameters (DPs) [figure labels: U, M, I]
2^L x 5000 µs: 5000 µs @ L=0, 10,000 µs @ L=1, 20,000 µs @ L=2
Speed and reliability of computation > speed and reliability of communication
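The level-dependent figure on this slide doubles with each level L. A minimal sketch of that rule follows; the function name is hypothetical, and the microsecond unit is an assumption because the unit symbol did not survive in the slide text.

```python
BASE_TIME = 5000  # value at L=0 from the slide; unit assumed to be microseconds


def level_time(level, base=BASE_TIME):
    """Return 2**level * base, the slide's doubling rule per level L."""
    if level < 0:
        raise ValueError("level must be non-negative")
    return (2 ** level) * base
```

`level_time(0)`, `level_time(1)`, and `level_time(2)` reproduce the three values listed on the slide.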
Shor's Integer Factorization Algorithm Circuit
For an n-bit integer N, choose a < N with GCD(a, N) = 1.
The period r is hidden in the eigenvalues of U(x) = a^x mod N.
(a^{r/2} - 1)(a^{r/2} + 1) ≡ 0 (mod N), so GCD(a^{r/2} ± 1, N) can yield factors of N.
Controlled modular exponentiation, U(x) = a^x mod N, acting on an n-qubit register and controlled by an m-qubit register (m = 2n): the bulk of Shor's algorithm. Contains O(n^2) adder calls: 512-bit ~1 million adders, 1024-bit ~4 million adders, 2048-bit ~16 million adders.
Inverse quantum Fourier transform, followed by measurements (MZ): contains O(n^2) small-angle phase rotations Rz(π/2^(n+1)); depth = O(n).
Classical complexity: exponential in n
Quantum complexity: polynomial, O(n^3)
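The relationship between the period r and the factors of N can be illustrated classically on a tiny example. This sketch finds the period by brute force, which is exactly the step the quantum circuit replaces, so it shows only the classical post-processing. The function names are illustrative, not from the talk.

```python
from math import gcd


def find_period(a, n):
    """Smallest r > 0 with a**r % n == 1, found by brute force.

    This takes time exponential in the bit length of n; the quantum
    circuit exists to avoid this step.
    """
    if n < 2 or gcd(a, n) != 1:
        raise ValueError("need n >= 2 and gcd(a, n) == 1")
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r


def factors_from_period(a, n):
    """Return a pair of nontrivial factors of n, or None if this `a` fails."""
    r = find_period(a, n)
    if r % 2 == 1:
        return None  # odd period: a**(r/2) is not an integer power
    half = pow(a, r // 2, n)
    if half == n - 1:
        return None  # a**(r/2) = -1 mod n gives only trivial factors
    return gcd(half - 1, n), gcd(half + 1, n)
```

For N = 15 and a = 7 the period is 4, 7^2 mod 15 = 4, and the two GCDs are 3 and 5. Choosing a = 14 gives period 2 with 14 ≡ -1 (mod 15), so the function returns `None` and another `a` has to be tried.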
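Benchmark Circuits (Approx. QFT)
[Figure: approximate QFT circuit in which a small-angle rotation is replaced by a long sequence of approximation gates (H, T, Z, X, ...). T gates are applied through magic states, with ancilla blocks A1, A2, A3 shown against time and execution delay.]
Magic state preparation (latency: 78 ms): 7-qubit cat state, logical |0>, decode, MZ, yielding T|+>
Data teleportation into magic state (latency: 12 ms): |Data> -> T|Data>
Rz(π/2^8) sufficient to factorize integers > 4096 bits long
Long sequence of approximation gates: ~375 gates (~150 T gates) for approximation accuracy within 10^-16
7 ancilla give delay -> 0
Cited on the slide: V. Kliuchnikov et al. (2013), Fowler et al. (2005)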
Benchmark Circuits (Quantum Adders)
Quantum Carry Look-Ahead Adder (QCLA), log depth, Draper et al. (2004): many long-distance gates
Quantum Ripple-Carry Adder (QRCA), linear depth, Cuccaro et al. (2004): only nearest-neighbor gates
Toffoli gate: magic state preparation, then data injection into the magic state; needs about 4 ancilla blocks
Non-local Toffoli gate: local quantum operations in Segment-1 and Segment-2, linked by quantum teleportation [figure labels: e1, e2]
Qubit types: DATA qubits, ancilla qubits, communication (COMM) qubits
Functions shown: communication, magic-state preparation, error correction
Fault-Tolerant and Scalable Quantum Computer Architecture
[Figure: planar ion traps built from basic ion-trap cells and grouped into segments. Each segment holds data qubit blocks, ancilla qubit blocks, and COMM qubit blocks, with empty channels for ballistic shuttling. Optical links connect segments through a Layer-1 optical switch (OS) and a Layer-2 optical switch (2x more time-expensive than the Layer-1 OS).]
Architecture Parameters List
NSeg: Number of Segments
NData: Data Tiles/Segment
NComm: Comm. Tiles /Segment
NAnc: Total Ancilla Tiles
[Figure legend: Comm. tile, ancilla tile, data tile]
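The four architecture parameters can be collected into a small record. The sketch below is an assumed representation, not the tool's actual data structure. It reads NData and NComm as per-segment counts and NAnc as a system-wide total, as the list states, and the `total_tiles` helper is a hypothetical convenience.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchitectureParameters:
    n_seg: int   # NSeg: number of segments
    n_data: int  # NData: data tiles per segment
    n_comm: int  # NComm: comm. tiles per segment
    n_anc: int   # NAnc: total ancilla tiles (whole system)

    def __post_init__(self):
        if self.n_seg < 1:
            raise ValueError("need at least one segment")
        if min(self.n_data, self.n_comm, self.n_anc) < 0:
            raise ValueError("tile counts must be non-negative")

    def total_tiles(self):
        """Per-segment tiles times segments, plus the shared ancilla total."""
        return self.n_seg * (self.n_data + self.n_comm) + self.n_anc
```

With made-up values, `ArchitectureParameters(n_seg=4, n_data=10, n_comm=1, n_anc=100).total_tiles()` counts 4 x 11 per-segment tiles plus 100 ancilla tiles. Freezing the dataclass keeps one design point immutable while a sweep generates many of them.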
Toolbox Description (1)
Input: device parameters (DPs), architecture parameters
Output: failure probability, latency, resource count
Tile Designer and Performance Analyzer (TDPA): Tile Designer, Tile Database, Low-Level Scheduler, Low-Level Mapper, Low-Level Error Analyzer, Fault-Tolerant Circuit Generator
Application-Level Designer and Performance Analyzer (ADPA): High-Level Scheduler, High-Level Mapper, High-Level Error Analyzer, Quantum Application Circuit Generator
Critical tool components (FLEXIBILITY): do not assume fixed device or architecture parameter values
Advanced output analysis components (DEEPER INSIGHTS): Visualizer, Performance Metrics Decomposer
Toolbox Description (2)
TDPA: error-correction circuit -> tile -> tile performance (TEXEC, PFAIL)
ADPA: Quantum Carry Look-Ahead Adder on the MUSIQC architecture
Low-Level Mapper: physical qubits to ions
Low-Level Scheduler: physical gates to ion manipulation and movement
Low-Level Error Analyzer: calculates logical failure prob. from physical failure prob.
High-Level Mapper: logical qubits to tiles
High-Level Scheduler: logical gates to tile operations and cross-segment movement
High-Level Error Analyzer: calculates application-level failure prob.
Tool and Analysis Method
[Figure: a quantum circuit on qubits q1 to q10 is grouped and mapped onto tiles T1 to T4. Inputs: quantum circuit, device parameters, architecture definition, and pre-computed tile performance (TDPA). These feed the HIGH-LEVEL MAPPER and HIGH-LEVEL SCHEDULER, followed by the PERFORMANCE METRICS DECOMPOSER/VISUALIZER.]
Main Algorithms/Heuristics
Mapper. Goal: circuit-level connectivity -> hardware-level proximity. Algorithm: polynomial-time graph-theory algorithm solving the Optimal Linear Arrangement problem.
Scheduler. Goal: reduce execution time; insert error correction to minimize failure prob. Algorithm: [greedy] dispatch a gate for execution AS SOON AS resources become available.
Error Analyzer. Goal: evaluate PFAIL by fully counting logical error events. Algorithm: O(n^3) fault-path counting.
Details in the thesis.
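The greedy dispatch rule can be sketched as simple list scheduling. This is a simplified stand-in, not the thesis scheduler: gates are visited in program order, each gate is assumed to occupy its qubits plus one unit from a shared ancilla pool for its whole duration, and error-correction insertion and cross-segment movement are left out. The gate tuple format and function name are assumptions.

```python
import heapq


def greedy_schedule(gates, n_ancilla):
    """Start each gate as soon as its qubits and one ancilla unit are free.

    gates: list of (name, qubits, duration) in program order.
    Returns (schedule, makespan), where schedule holds (name, start, end).
    """
    if n_ancilla < 1:
        raise ValueError("need at least one ancilla unit")
    qubit_free = {}                    # qubit -> time it becomes free
    ancilla_free = [0.0] * n_ancilla   # min-heap of ancilla release times
    heapq.heapify(ancilla_free)
    schedule = []
    for name, qubits, duration in gates:
        data_ready = max((qubit_free.get(q, 0.0) for q in qubits), default=0.0)
        ancilla_ready = heapq.heappop(ancilla_free)  # earliest free unit
        start = max(data_ready, ancilla_ready)
        end = start + duration
        for q in qubits:
            qubit_free[q] = end
        heapq.heappush(ancilla_free, end)
        schedule.append((name, start, end))
    makespan = max((end for _, _, end in schedule), default=0.0)
    return schedule, makespan
```

As a usage example, take `gates = [("g1", ("a", "b"), 2.0), ("g2", ("c", "d"), 2.0), ("g3", ("b", "c"), 1.0)]`. With two ancilla units, g1 and g2 run in parallel and the makespan is 3.0. With one unit they serialize and the makespan grows to 5.0, the kind of resource-induced delay the tool is meant to expose.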
Validity/Optimality of Scheduler
Correctness: detailed analysis of the complete schedule (smaller circuits); overall validation using Visualizer output (larger circuits)
Optimality: compare the TEXEC of a circuit with and without resource constraints. X-fold resources should yield an X-fold (or more) improvement in TEXEC, and TEXEC with constraints should approach TEXEC with no constraints.
64-bit Quantum Carry Look-Ahead Adder Circuit
[Figure: fewer-resource regime vs. sufficient-resource regime]
This shows that the Mapper and Scheduler are good at working with fewer resources and achieve optimal performance with sufficient resources.
Demo
Breakdown of TEXEC: magic state preparation overhead, Toffoli gate execution, CNOT gate execution, cross-segment swapping overhead
Breakdown of critical path
[Figure: a circuit on qubits A1, B1, C1 (segment S1) and A2, B2, C2 (segment S2) to be scheduled. Over time, each segment alternates magic state preparation with data teleportation into the magic state. Gates spanning both segments require cross-segment swapping.]
Demo visualization
[Figure: adder circuit (QCLA), qubits vs. time steps, and the scheduled adder circuit (QCLA), segment ID number vs. execution time (seconds). Horizontal lines: delays due to fewer ancilla qubits. Non-horizontal lines: delays due to fewer comm. qubits.]
What We Have Learned So Far
Benchmark circuits:
Approximate Quantum Fourier Transform (AQFT: long sequence of approx. gates)
Quantum Ripple Carry Adder (QRCA: nearest-neighbor gates)
Quantum Carry Look-Ahead Adder (QCLA: highly parallelizable circuit, long-distance gates)
Performance simulation tool:
Standard components: Mapper, Scheduler, Error Analyzer
Can work with varying device and architecture parameters
Advanced components: Visualizer, Performance Metrics Decomposer
Validation and optimality of Mapper, Scheduler
Remember This Picture
[Figure: the architecture again. Planar ion traps of basic ion-trap cells are grouped into segments with DATA, ANCILLA, and COMM tiles and empty channels for ballistic shuttling. Optical links connect segments through a Layer-1 optical switch (OS) and a Layer-2 optical switch (2x more time-expensive than the Layer-1 OS).]
Architecture Parameters List
NSeg: Number of Segments
NData: Data Tiles/Segment
NComm: Comm. Tiles /Segment
NAnc: Total Ancilla Tiles
First Set of Simulations (Setup)
Goal: study performance-limiting architecture and device parameters
Benchmark circuit: 1024-bit QCLA
Error correction (Steane [[7,1,3]] code): two layers of concatenation (L1, L2). L2 error correction is applied after each gate and to any qubit sitting idle for a long enough time. Only L2 error correction is performed (no L1 error correction). Qubits can decohere for ~4.8 ms.
Computation: CNOT (local, non-local); Toffoli (local), which can execute in any segment
Precomputed tile performance numbers:
L2 Toffoli: ~50 ms, O(10^-14)
L2 cross-segment teleportation: ~10 ms, O(10^-11)
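Per-operation failure probabilities like these can be combined into a rough circuit-level estimate. The sketch below is a simple independent-failure model, P_FAIL = 1 - Π(1 - p_i)^{n_i}, offered only as an illustration. It is not the fault-path counting method named in the talk, and the operation counts used with it are hypothetical.

```python
import math


def circuit_failure_probability(op_counts, op_fail):
    """1 - product over ops of (1 - p_op) ** count, assuming independence.

    op_counts: dict mapping operation name -> number of executions
    op_fail:   dict mapping operation name -> failure probability per execution
    Computed in log space so very small probabilities do not round to zero.
    """
    log_success = 0.0
    for op, count in op_counts.items():
        p = op_fail[op]
        if not 0.0 <= p < 1.0:
            raise ValueError("failure probability must be in [0, 1)")
        log_success += count * math.log1p(-p)
    return -math.expm1(log_success)
```

For instance, 1,000 operations that each fail with probability 1e-14 give a total of about 1e-11 under this model. Using `log1p` and `expm1` matters here because `1 - 1e-14` loses most of its precision in ordinary floating-point arithmetic.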
Labelling the Architecture Space
TEXEC as a function of architecture parameters
ANCILLA-REGIME: large segments (less distributed system); TEXEC depends on ancilla qubits
TEL-REGIME: small segments (highly distributed system); TEXEC depends on comm. qubits
Optimal Architecture Selection (Minimizing the TEXEC-Qubits Product)
Ancilla-regime architecture: minimum exec. time-qubit product for (NSeg = 4, NAnc = 1362, NComm = 1)
Tel-regime architecture: minimum exec. time-qubit product for (NSeg = 16, NAnc = 1362, NComm = 6)
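Selecting an architecture by the time-qubit product amounts to a minimum over candidate design points. A minimal sketch follows, assuming each candidate is a dictionary with a simulated execution time and a qubit count. The field names and all numbers in the usage note are made up for illustration and are not results from the talk.

```python
def best_by_time_qubit_product(candidates):
    """Return the candidate with the smallest texec_s * qubits product.

    candidates: non-empty list of dicts with keys "name", "texec_s", "qubits".
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return min(candidates, key=lambda c: c["texec_s"] * c["qubits"])
```

Given three hypothetical points, `{"name": "A", "texec_s": 10.0, "qubits": 1000}`, `{"name": "B", "texec_s": 4.0, "qubits": 2000}`, and `{"name": "C", "texec_s": 3.0, "qubits": 4000}`, the products are 10,000, 8,000, and 12,000, so B is selected. Neither the fastest design nor the smallest one wins, which is the point of optimizing the product.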
Reducing PFAIL Using Device Parameters (DP)
Improving DP: qubit memory 10x; failure prob. reduction ~100x to 1000x
4 million adder calls need failure prob.