
Ph.D. Defense

Date post: 09-Apr-2017
Upload: muhammad-ahsan
Architecture Framework for Trapped Ion Quantum Computer Muhammad Ahsan [Ph.D. Defense] Department of Computer Science, Duke University

Architecture Framework for Trapped Ion Quantum Computer

Muhammad Ahsan [Ph.D. Defense]
Department of Computer Science, Duke University

Agenda

Background and Motivation

Quantum Hardware and Architecture Models

Benchmark Application Circuits

Performance Simulation Tool

Results

My Research Background

Computer Architecture
Quantum Computing
Fault Tolerance, Quantum Error Correction
Quantum Computer Architecture
Resource Performance Estimation Tool
Topics of the defense
Some interesting findings!

Computer Systems
Undergraduate
Ph.D. Research
Computer Science, Electrical Engineering, Physics

Quick Introduction to Quantum Computing (1)

A quantum computer consists of:
Quantum bits (qubits): store information in binary basis states |0>, |1> (e.g. energy levels of a trapped 171Yb+ ion)
Gates: process information (e.g. lasers that cause transitions between energy levels)

[Figure: a quantum circuit (qubits versus time) next to the quantum hardware that implements it (ions as qubits, lasers as gates).]

Gates are unitary operations U, with U†U = I.

Quick Introduction to Quantum Computing (2)

What makes quantum computers interesting (non-conventional):
Superposition of two states: a|0> + b|1>, where |a|^2 + |b|^2 = 1
Entanglement between qubits: a|00> + b|11>

What makes quantum computers more powerful:
Phase gates: a|0> + e^(iπ/2) b|1>
Amplitudes (a, b) are complex (e.g. a|0> - b|1>)
Quantum speedup: amplitude cancellation can efficiently eliminate incorrect candidate solutions to a search problem

Universal Quantum Computation

Clifford gates {H, CNOT, X, Z, S}:
H: Hadamard
Z: π phase-shift gate
S: π/2 phase-shift gate
CNOT
Clifford gates alone are insufficient for arbitrary quantum computation.

Non-Clifford gates:
T: π/4 phase-shift gate (T gate), or
Controlled-CNOT (Toffoli gate)
Toffoli and T gates are double-edged swords: they enable practical quantum speedup, but they are practically resource consuming.
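As a quick illustration of how the phase-shift gates above relate to one another, the following sketch builds each gate as a 2x2 matrix in plain Python and checks a few standard identities (T·T = S, S·S = Z, H·H = I, and unitarity). The helper names (`phase_gate`, `matmul`, `dagger`, `close`) are introduced here for illustration only and do not come from the slides.

```python
import cmath

def matmul(a, b):
    """Multiply two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    """Element-wise comparison up to a small numerical tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def phase_gate(theta):
    """Phase-shift gate diag(1, e^{i*theta})."""
    return [[1 + 0j, 0j], [0j, cmath.exp(1j * theta)]]

IDENTITY = [[1 + 0j, 0j], [0j, 1 + 0j]]
Z = phase_gate(cmath.pi)        # pi phase shift
S = phase_gate(cmath.pi / 2)    # pi/2 phase shift
T = phase_gate(cmath.pi / 4)    # pi/4 phase shift
r = 2 ** -0.5
H = [[r + 0j, r + 0j], [r + 0j, -r + 0j]]  # Hadamard

checks = {
    "T*T == S": close(matmul(T, T), S),
    "S*S == Z": close(matmul(S, S), Z),
    "H*H == I": close(matmul(H, H), IDENTITY),
    "T is unitary": close(matmul(dagger(T), T), IDENTITY),
    "H is unitary": close(matmul(dagger(H), H), IDENTITY),
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

The check that two T gates compose into S shows why T is the finest-grained gate in this set: S and Z can be built from it, but not the other way around.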

Quantum Computing in a Nutshell

Theoretically, quantum computers can solve certain important problems much faster than conventional (classical) computers, e.g. Shor's integer factorization algorithm (exponential speedup).

Practically, quantum device components (qubits, gates) are much noisier and less reliable than those of classical computers.

Error correction (redundancy) is needed to protect quantum information.

Mean time to failure:
Classical: ~10^7 to 10^8 hours
Quantum: ~seconds to minutes
Failure probability p = 10^-3: 1 in 1,000 quantum gates fails

Example: Fault-Tolerant 3-qubit (Toffoli) Gate

[Figure: an unprotected quantum gate compared with a fault-tolerant quantum gate. The fault-tolerant version acts on a logical qubit with ancilla qubits and adds encoding, error correction (parity checks), 4-cat state preparation and decoding, a special entangled qubit state, and recovery.]

A large number of additional qubits and gates is needed to reduce the effective noise level from O(p) to O(p^2), e.g. with the Steane [[7,1,3]] code.

Multiple Layers of Encoding in the [[7,1,3]] Code

                                     No encoding          Single layer (L1)    Two layers (L2)
Qubits                               1                    7                    7^2 = 49
Clifford gates {H, X, Z, CNOT}       1                    7                    7^2 = 49
Non-Clifford gates {e.g. Toffoli}    1                    O(10^3)              O(10^5) [e.g. 10^-10]
Ancilla qubits                       0                    O(10^1)              O(10^2)
Noise level (failure prob.)          p [e.g. p = 10^-7]   O(p^2)               O(p^4) [e.g. 10^-16]

Good news: gain in reliability > qubit and gate overhead (figure annotations: 10^2 to 10^3x and 10^6x).
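The scaling in this slide can be sketched numerically. The snippet below assumes the usual concatenation model, in which each layer maps a failure probability p to c·p^2 and multiplies the qubit count by 7. The constant c = 10^4 is an assumption chosen here so that p = 10^-7 lands on the slide's order of magnitude of 10^-16 at L2; it is not a value stated in the slides, and the function name is hypothetical.

```python
def concatenated_code(p, levels, c=1e4, block=7):
    """Return one (level, qubits per logical qubit, failure prob.) row per level.

    Model (an assumption, not the author's simulator): p_next = c * p**2,
    and each layer of a [[7,1,3]] code multiplies the qubit count by 7.
    """
    rows = []
    prob = p
    for level in range(levels + 1):
        rows.append((level, block ** level, prob))
        prob = c * prob ** 2
    return rows

for level, qubits, prob in concatenated_code(p=1e-7, levels=2):
    print(f"L{level}: {qubits:2d} qubits per logical qubit, failure prob ~ {prob:.1e}")
```

Under these assumptions the qubit count grows as 1, 7, 49 while the failure probability falls as 10^-7, 10^-10, 10^-16, which is the trade-off the slide calls good news: reliability improves faster than the overhead grows.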

Fundamental Research Question

To estimate: how reliable, and how many, qubits and gates are needed to accomplish what a classical computer cannot in a realistic time scale (e.g. 2,048-bit factorization)?

The answer depends on:
Quantum application (e.g. Shor's algorithm)
Compilation of the quantum application into fault-tolerant gates (e.g. gate decomposition methods)
Fault-tolerance overhead (e.g. error-correcting codes)
Integration of qubits and gates on hardware (e.g. trapped ions)

Research progress (theory)
Research progress (experiment)
A precise estimate needs information about the quantum hardware
Loose lower bounds on the estimate

The answer depends heavily on the architecture of the quantum hardware.

Impact of Hardware Assumptions on the Speed of Quantum Computer

[Figure: time for 1,024-bit factorization, classical versus quantum, under different architecture assumptions. Annotated ranges: hours -> days and days -> years. Included with permission of rdv, TDL, KMI, quant-ph/0507023.]

Architecture assumption matters!

Question

Why think about the architecture of a large quantum computer when we do not yet know exactly how to build a small one?

Answer

Architecture can:
Compensate for technology limitations (memory hierarchy, multi-core designs)
Reveal performance-limiting factors
Guide future advances in technology

Example from History

Discrete transistors: slower, unreliable computers
Integrated circuits (IC): fast and reliable computers
Moore's Law

Research Methodology

Need to define the mechanism by which a very large number of qubits are allocated, operated, protected and connected in a realistically constructible quantum computer system.

Need a method to efficiently:
Model the quantum computer architecture
Map the quantum application onto the architecture
Evaluate the performance-limiting factors

Crucial components of my research: Quantum Computer Architecture, Performance Simulation

[Figure label: Communication Channel]

Tool: Taxonomy of Important Terms

Device Parameters (DPs): e.g. physical gate times, failure probability
Resource Investment: e.g. total physical qubits used in the system
Architecture Parameters: functional allocation (data, ancilla) and connectivity of qubits
Performance Metrics: e.g. total execution time (TEXEC) and failure probability (PFAIL), the probability that the quantum circuit gives an incorrect output

Design Space

Research Methodology

Quantum Circuit (Quantum Adder)

Quantum Hardware (e.g., Trapped-ion)

Quantum Architecture (MUSIQC)

Mapping (Qubits -> physical resources)

Scheduling (Gates -> Sequence of physical operations)

Performance Analysis (Latency, Reliability)

Papers/Publications

Performance simulator based on Hardware Resources Constraints for Ion-Trap Quantum Computer (ICCD 2013)

Optimization of a Quantum Computer Architecture Using Resource Performance Simulator (DATE 2015)

Designing Million-Qubit Quantum Computer Using Resource Performance Simulator [In Submission (2nd Attempt)]

Challenge: Target Community Mostly Unfamiliar with Quantum Computing

Philosophy of Performance Simulation Toolset (1)

[Figure: tool flow with the blocks Quantum Circuit; Fault Tolerance; Resource Allocation in Hardware; Mapping, Scheduling; Lower-level Hardware Constraints; Device Parameters; Architecture Parameters; Resource Overhead; Performance Evaluation.]

Prior work:
Svore et al. (2004)
Balensiefer et al. (2005)
Whitney et al. (2007)
Dousti and Pedram (2012)

Figure labels on the prior-work flow: fixed, less flexible, less flexible, insufficient insight.

Philosophy of Performance Simulation Toolset (2)

[Figure: the same tool flow (Quantum Circuit; Fault Tolerance; Resource Allocation in Hardware; Mapping, Scheduling; Lower-level Hardware Constraints; Device Parameters; Architecture Parameters; Resource Overhead; Performance Evaluation), now annotated with my contribution: a fine-tuning knob and a magnifying glass. A later build of the slide adds a decision labelled "Desired performance?" with "No" branches.]

Qubits: how reliable? Qubits: how many?

What We Have Learned So Far

Question: Can a quantum computer be practically faster than a classical computer? What quality and amount of resources are needed?

Answer: It depends on the quantum computer architecture, which calls for a performance simulation tool for architecture study.

Quantum computer design cycle: a flexible tool to balance improvement in device parameters against investment in resources and architecture.


Quantum Gate and Hardware Model

[Figure: trapped-ion hardware. Labels: ions (qubits), electrodes, laser (gate), quantum gates (U), measurement (M, outcomes {+1, -1}), ballistic shuttling channel, photons, optical switch, beam splitter, photon detectors, entangled pair (EPR pair). Video credit: Jason Amini.]

Fault-tolerant and Scalable Quantum Computer Architecture

[Figure: segments of planar ion traps built from basic ion-trap cells, with empty channels for ballistic shuttling, different qubit blocks and a communication port. Segments are connected by photonic links through a Layer-1 optical switch (OS) and a Layer-2 optical switch (2x more time-expensive than the Layer-1 OS).]

Architecture idea: combine the good of
Ions: reliable storage and computation
Photons: communication

Hardware Description and Device Parameters

[Figure: device parameters (DPs) annotated on the hardware diagram; operation labels U, M, I.]

Latency: 2^L x 5000 µs
5,000 µs @ L = 0
10,000 µs @ L = 1
20,000 µs @ L = 2

Speed and Reliability of Computation > Speed and Reliability of Communication


Shor's Integer Factorization Algorithm Circuit

For an n-bit integer N, with GCD(a, N) = 1 and a < N.

Controlled modular exponentiation: U(x) = a^x mod N
Registers: m-qubit register (m = 2n) and n-qubit register
Contains O(n^2) adder calls:
512-bit: ~1 million adders
1024-bit: ~4 million adders
2048-bit: ~16 million adders
This is the bulk of Shor's algorithm.

Inverse quantum Fourier transform, followed by Z-basis measurements (MZ)
Contains O(n^2) small-angle phase rotations Rz(π/2^(n+1))
Depth = O(n)

The period r is hidden in the eigenvalues of U(x) = a^x mod N.
Once r is known: (a^(r/2) - 1)(a^(r/2) + 1) ≡ 0 (mod N), so GCD(a^(r/2) ± 1, N) yields factors of N.

Classical complexity: exponential in n

Quantum complexity: polynomial, O(n^3)
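To make the role of the period r concrete, here is a small classical sketch of the post-processing step: given a with GCD(a, N) = 1, find the period of a^x mod N and read factors from GCD(a^(r/2) ± 1, N). The period is found by brute force, which is exactly the step the quantum circuit replaces. The `adder_calls` helper uses 4·n^2, a constant fitted here to the slide's three data points (an assumption, not a formula given in the slides); all function names are hypothetical.

```python
from math import gcd

def find_period(a, n):
    """Smallest r > 0 with a**r % n == 1 (brute force, classical)."""
    if gcd(a, n) != 1:
        raise ValueError("a and n must be coprime")
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r

def factors_from_period(a, n):
    """Return a pair of non-trivial factors of n, or None if this a fails."""
    r = find_period(a, n)
    if r % 2 == 1:
        return None                  # odd period: pick another a
    half = pow(a, r // 2, n)         # a^(r/2) mod n
    if half == n - 1:
        return None                  # a^(r/2) = -1 mod n: pick another a
    return tuple(sorted((gcd(half - 1, n), gcd(half + 1, n))))

def adder_calls(n_bits):
    """Rough adder count, assuming 4 * n^2 (fitted to the slide's figures)."""
    return 4 * n_bits ** 2

print(find_period(7, 15))            # period of 7^x mod 15
print(factors_from_period(7, 15))    # factors of 15
for bits in (512, 1024, 2048):
    print(bits, adder_calls(bits))
```

For a = 7 and N = 15 the period is 4, so a^(r/2) = 49 ≡ 4 (mod 15) and the two GCDs give 3 and 5.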

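Benchmark Circuits (Approx. QFT)

[Figure: a small-angle rotation approximated by a long sequence of gates (H T H T Z ... T ... T X T), executed over time with ancilla blocks A1, A2, A3; the time axis marks the execution delay.]

Fault-tolerant T gate:
Magic state preparation of T|+> (circuit labels: |0>L, 7-qubit cat state, decode, MZ); latency: 78 ms
Data teleportation into the magic state, |Data> -> T|Data> (circuit labels: S, X, M); latency: 12 ms

Rz(π/2^8) is sufficient to factorize integers > 4096 bits long
~375 gates (~150 T gates) for approximation accuracy within 10^-16: a long sequence of approximation gates
7 ancilla give delay -> 0

References on the slide: V. Kliuchnikov et al. (2013); Fowler et al. (2005)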
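One simple way to see why about 7 ancilla blocks can hide the magic-state latency is a throughput argument. The sketch below assumes that each ancilla block prepares one magic state every 78 ms, that consecutive T gates consume one state every 12 ms (the teleportation latency), and that a block can start its next preparation as soon as it hands a state off. These modelling assumptions and the function names are introduced here for illustration; they are not necessarily how the thesis derives the figure.

```python
from math import ceil

PREP_MS = 78.0       # magic state preparation latency (from the slide)
TELEPORT_MS = 12.0   # data teleportation latency (from the slide)

def blocks_for_zero_delay(prep_ms=PREP_MS, teleport_ms=TELEPORT_MS):
    """Smallest number of parallel ancilla blocks whose combined
    production rate keeps up with one T gate per teleportation slot."""
    return ceil(prep_ms / teleport_ms)

def stall_per_t_gate_ms(num_blocks, prep_ms=PREP_MS, teleport_ms=TELEPORT_MS):
    """Steady-state wait per T gate when num_blocks blocks work in parallel.

    With k blocks, a fresh magic state arrives every prep_ms / k on
    average; any excess over the teleportation time shows up as delay.
    """
    return max(0.0, prep_ms / num_blocks - teleport_ms)

print(blocks_for_zero_delay())
for k in (1, 3, 7):
    print(k, stall_per_t_gate_ms(k))
```

Under this model 78 / 12 = 6.5, so 7 blocks are the smallest count with no steady-state stall, while 3 blocks leave a 14 ms wait per T gate.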

Benchmark Circuits (Quantum Adders)

Quantum Carry Look-Ahead Adder (QCLA), log depth: Draper et al. (2004). Many long-distance gates.
Quantum Ripple-Carry Adder (QRCA), linear depth: Cuccaro et al. (2004). Only nearest-neighbor gates.

Toffoli gate: magic state preparation, then data injection into the magic state; needs about 4 ancilla blocks.

[Figure: non-local Toffoli gate between Segment-1 and Segment-2. Each segment performs local quantum operations, and the segments are linked by quantum teleportation over an entangled pair (e1, e2).]

Qubit roles:
Data qubits
Ancilla qubits: magic-state preparation, error correction
Communication (COMM) qubits: communication

Fault-tolerant and Scalable Quantum Computer Architecture

[Figure: segments of planar ion traps built from basic ion-trap cells with empty channels for ballistic shuttling. Each segment contains data qubit blocks, ancilla qubit blocks and COMM. qubit blocks (data, ancilla and comm. tiles). Segments are connected by optical links through a Layer-1 optical switch (OS) and a Layer-2 optical switch (2x more time-expensive than the Layer-1 OS).]

Architecture parameters list:
NSeg: number of segments
NData: data tiles per segment
NComm: comm. tiles per segment
NAnc: total ancilla tiles


Toolbox Description (1)

Tile Designer and Performance Analyzer (TDPA):
Tile Designer
Tile Database
Fault-Tolerant Circuit Generator
Low-Level Mapper
Low-Level Scheduler
Low-Level Error Analyzer

Application-Level Designer and Performance Analyzer (ADPA):
Quantum Application Circuit Generator
High-Level Mapper
High-Level Scheduler
High-Level Error Analyzer

Inputs: DPs, architecture parameters
Outputs: failure probability, latency, resource count

Critical tool components (flexibility): do not assume fixed device or architecture parameter values

Advanced output analysis components (deeper insights): Visualizer, Performance Metrics Decomposer

Toolbox Description (2)

[Figure: the error-correction circuit is the input to TDPA, which produces a tile and its performance (TEXEC, PFAIL). The Quantum Carry Look-Ahead Adder and the MUSIQC architecture are the inputs to ADPA.]

TDPA:
Low-Level Mapper: physical qubits to ions
Low-Level Scheduler: physical gates to ion manipulation and movement
Low-Level Error Analyzer: calculates the logical failure probability from the physical failure probability

ADPA:
High-Level Mapper: logical qubits to tiles
High-Level Scheduler: logical gates to tile operations and cross-segment movement
High-Level Error Analyzer: calculates the application-level failure probability

Tool and Analysis Method

[Figure: a quantum circuit on qubits q1 to q10 is grouped as (q1, q2), (q3), (q4, q5), (q6, q7), (q8, q9), (q10) and placed on tiles T1 to T4.]

Inputs: quantum circuit, device parameters, architecture definition, pre-computed tile performance (TDPA)
Stages: high-level mapper, high-level scheduler, performance metrics decomposer/visualizer

Main Algorithms/Heuristics

Mapper:
Goal: translate circuit-level connectivity into hardware-level proximity
Algorithm: polynomial-time graph-theory algorithm for the Optimal Linear Arrangement problem

Scheduler:
Goal: reduce execution time; insert error correction to minimize failure probability
Algorithm: [greedy] dispatch a gate for execution as soon as resources become available

Error Analyzer:
Goal: evaluate PFAIL by fully counting logical error events
Algorithm: O(n^3) fault-path counting

Details in the thesis
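The greedy dispatch rule of the scheduler can be illustrated with a minimal list-scheduling sketch. It is not the thesis scheduler: it assumes a single pool of identical execution resources, takes gates in an already topologically sorted order, and ignores error-correction insertion and cross-segment movement. The gate names, the 10 ms CNOT duration and the function names are hypothetical; the 50 ms Toffoli figure follows the tile numbers quoted later in the deck.

```python
import heapq

def greedy_schedule(gates, num_units):
    """Greedy list scheduling.

    gates: list of (name, duration, dependencies), topologically sorted.
    num_units: number of identical resources that can each run one gate.
    Returns {name: (start, finish)}.
    """
    free_at = [0.0] * num_units       # min-heap of times each unit frees up
    heapq.heapify(free_at)
    schedule = {}
    for name, duration, deps in gates:
        # A gate is ready once all gates it depends on have finished.
        ready = max((schedule[d][1] for d in deps), default=0.0)
        unit_free = heapq.heappop(free_at)      # earliest available unit
        start = max(ready, unit_free)           # dispatch as soon as possible
        finish = start + duration
        heapq.heappush(free_at, finish)
        schedule[name] = (start, finish)
    return schedule

def makespan(schedule):
    """Total execution time of a schedule."""
    return max(finish for _, finish in schedule.values())

# Two independent Toffoli gates followed by a CNOT that needs both results.
circuit = [
    ("toffoli_a", 50.0, []),
    ("toffoli_b", 50.0, []),
    ("cnot_ab", 10.0, ["toffoli_a", "toffoli_b"]),
]

for units in (1, 2):
    print(units, "unit(s):", makespan(greedy_schedule(circuit, units)), "ms")
```

With one unit the two Toffoli gates serialize (110 ms in total); with two units they run in parallel (60 ms), which is the kind of resource-versus-TEXEC trade-off the tool explores.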

Validity/Optimality of Scheduler

Correctness:
Detailed analysis of the complete schedule (smaller circuits)
Overall validation using the Visualizer output (larger circuits)

Optimality:
Compare the TEXEC of the circuit with and without resource constraints
An X-fold increase in resources should yield an X-fold (or greater) reduction in TEXEC
TEXEC with constraints should approach TEXEC with no constraints

64-bit Quantum Carry Look-Ahead Adder Circuit

[Figure: TEXEC versus resources, with a fewer-resource regime and a sufficient-resource regime.]

This shows that the Mapper and Scheduler are good at working with fewer resources and achieve optimal performance with sufficient resources.

Demo

[Figure: breakdown of TEXEC along the critical path of a small circuit to be scheduled on qubits A1, B1, C1 (segment S1) and A2, B2, C2 (segment S2). Time is split into magic state preparation overhead, data teleportation into the magic state, Toffoli gate execution, CNOT gate execution and cross-segment swapping overhead; part of the circuit requires cross-segment swapping.]

[Figure: demo visualization of the adder circuit (QCLA) and the scheduled adder circuit. Axes: qubits, time steps, execution time (seconds), segment ID number. Horizontal lines: delays due to fewer ancilla qubits. Non-horizontal lines: delays due to fewer comm. qubits.]

What We Have Learned So Far

Benchmark circuits:
Approximate Quantum Fourier Transform (AQFT: long sequence of approximation gates)
Quantum Ripple-Carry Adder (QRCA: nearest-neighbor gates)
Quantum Carry Look-Ahead Adder (QCLA: highly parallelizable circuit, long-distance gates)

Performance simulation tool:
Mapper, Scheduler, Error Analyzer (standard components)
Can work with varying device and architecture parameters
Advanced components: Visualizer, Performance Metrics Decomposer
Validation and optimality of Mapper, Scheduler

Remember this picture

[Figure: the segment-based architecture of planar ion traps, with DATA, ANCILLA and COMM tiles in each segment, basic ion-trap cells, empty channels for ballistic shuttling, and optical links through a Layer-1 optical switch (OS) and a Layer-2 optical switch (2x more time-expensive than the Layer-1 OS).]

Architecture parameters list:
NSeg: number of segments
NData: data tiles per segment
NComm: comm. tiles per segment
NAnc: total ancilla tiles


First Set of Simulations (Setup)

Goal: study performance-limiting architecture and device parameters

Benchmark circuit: 1024-bit QCLA

Error correction (Steane [[7,1,3]] code):
Two layers of concatenation (L1, L2)
L2 error correction after each gate, and for a qubit that has been sitting idle for a long enough time
Perform L2 error correction only (no L1 error correction); qubits can decohere for ~4.8 ms

Computation:
CNOT (local, non-local)
Toffoli (local); a Toffoli can be executed in any segment

Precomputed tile performance numbers:
L2 Toffoli: ~50 ms, O(10^-14)
L2 cross-segment teleportation: ~10 ms, O(10^-11)

Labelling the Architecture Space

TEXEC as a function of architecture parameters:
ANCILLA-REGIME: large segments (less distributed system); TEXEC depends on ancilla
TEL-REGIME: small segments (highly distributed system); TEXEC depends on comm. qubits

Optimal Architecture Selection (Minimizing the TEXEC x Qubits Product)

Ancilla-regime architecture: minimum execution time-qubit product for (NSeg = 4, NAnc = 1362, NComm = 1)
Tel-regime architecture: minimum execution time-qubit product for (NSeg = 16, NAnc = 1362, NComm = 6)
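The selection criterion, minimizing the product of execution time and resources, can be sketched as a search over candidate design points. Everything numeric below is a placeholder invented for illustration, not a simulation result from the talk, and the tile count is used as a stand-in for the qubit count. The tile formula follows the parameter definitions above (NData and NComm are per segment, NAnc is a total); the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DesignPoint:
    n_seg: int      # NSeg: number of segments
    n_data: int     # NData: data tiles per segment
    n_comm: int     # NComm: comm. tiles per segment
    n_anc: int      # NAnc: total ancilla tiles
    t_exec: float   # TEXEC in seconds (placeholder, not a measured value)

    def tiles(self):
        """Total tiles: per-segment tiles times segments, plus ancilla."""
        return self.n_seg * (self.n_data + self.n_comm) + self.n_anc

    def cost(self):
        """Execution time x resource product to be minimized."""
        return self.t_exec * self.tiles()

def best_design(points):
    """Return the design point with the smallest time-resource product."""
    return min(points, key=DesignPoint.cost)

# Hypothetical sweep over NComm: more comm. tiles shorten TEXEC at first,
# then the extra tiles stop paying for themselves.
candidates = [
    DesignPoint(n_seg=4, n_data=10, n_comm=1, n_anc=100, t_exec=2.0),
    DesignPoint(n_seg=4, n_data=10, n_comm=2, n_anc=100, t_exec=1.9),
    DesignPoint(n_seg=4, n_data=10, n_comm=4, n_anc=100, t_exec=1.85),
]

best = best_design(candidates)
print(best, best.tiles(), best.cost())
```

The product metric penalizes designs that buy a small speedup with many extra tiles, which is why the minimum can sit at a modest NComm rather than at the fastest configuration.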

Reducing PFAIL Using Device Parameters (DP)

Improving DP: qubit memory 10x
Failure probability reduction: ~100x to 1000x
4 million adder calls need failure prob.

