Self-driving Reconfigurable Silicon Photonic Interconnects (Flex-LIONS) with Deep Reinforcement Learning

Roberto Proietti, Yu Shang, Xian Xiao, Xiaoliang Chen, Yu Zhang, and S. J. Ben Yoo Electrical and Computer Engineering, University of California, Davis, CA, USA, [email protected]

ABSTRACT

We propose a self-driving reconfigurable optical interconnect architecture for HPC systems exploiting a deep reinforcement learning (DRL) algorithm and a reconfigurable silicon photonic (SiPh) switching fabric to adapt the interconnect topology to different traffic demands. Preliminary simulation results show that after training, the DRL-based SiPh fabric provides the lowest average end-to-end latency for time-varying traffic patterns.

CCS CONCEPTS

• Networks~Control path algorithms • Networks~Topology analysis and generation • Hardware~Photonic and optical interconnect • Hardware~Emerging optical and photonic technologies

KEYWORDS

Arrayed waveguide grating, deep reinforcement learning, silicon photonics

1 Introduction

Current high-performance computing (HPC) systems are increasingly exploiting heterogeneous computing nodes to improve performance in terms of latency and energy utilization for completing specific computation tasks [1, 2]. While the communication patterns driven by modern workloads exhibit temporal bursts and spatial non-uniformity [3, 4], today’s interconnection networks based on electronic switches and optical fibers are inherently rigid, incapable of changing the network topology or link bandwidth to adequately cope with the significant variations of traffic patterns. It would then be desirable to design a bandwidth-reconfigurable interconnection network that can adapt its connectivity to the various traffic demands [5-7].

There have been recent advances in silicon photonic (SiPh) integrated reconfigurable wavelength routing and space switching that allow the connectivity to be redefined on demand in both the spectral and spatial domains. Indeed, wavelength-and-space selective switching fabrics that can reconfigure the bandwidth between selected pairs of input and output ports have been demonstrated [8, 9]. Recently, we proposed and demonstrated a SiPh bandwidth-reconfigurable all-to-all interconnection switch, ‘Flexible Low-Latency Interconnect Optical Network Switch (Flex-LIONS),’ enabled by combining all-to-all interconnection using an arrayed waveguide grating router (AWGR) with multi-wavelength selective switches [10]. While Flex-LIONS has superior performance in terms of scalability and energy consumption when compared with other proposed architectures (see [10] for more details), specific reconfiguration policies and algorithms at the network and application layers are still needed to take advantage of this physical-layer reconfiguration capability. The aim of this paper is to propose the use of a DRL technique to drive the reconfiguration of the SiPh switches according to the traffic characteristics.

2 System Architecture and Algorithm

Figure 1. DRL-based Flex-LIONS architecture interconnecting multiple ToR switches.

Figure 1 shows the architecture of the DRL-based self-driving reconfigurable Flex-LIONS system. Flex-LIONS (see Figure 2 and the text below for more details) exploits a combination of wavelength routing and multi-wavelength switching to form different interconnect topologies such as all-to-all, mesh, torus, etc.

Traffic loads in HPC systems are dominated by specific applications that exhibit specific patterns. Hence, we developed a centralized controller for resource management and control, with a DRL agent that aims to learn the best network interconnection for each traffic pattern. We designed the DRL agent based on an Advantage Actor-Critic algorithm [11] that parameterizes the reconfiguration policy with deep neural networks (DNNs). The DRL agent learns policies with a reward-driven mechanism. We define an instant reward 𝑟 as a function of 𝑑 and 𝜃𝐶, where 𝑑 represents the average network end-to-end delay and 𝐶 is the cost of reconfiguration. At each training step, the agent collects samples from the experience buffer and trains the DNNs by reinforcing actions that lead to higher long-term rewards. In this way, we implement the adaptive reconfiguration of Flex-LIONS. The actions correspond to the different interconnection topologies that can be implemented by reconfiguring the Flex-LIONS architecture (see Figure 1).
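The reward-driven update can be illustrated with a minimal Python sketch of the quantities an advantage actor-critic agent typically computes. This is not the authors' implementation: the function names, the discount factor `gamma`, and the softmax policy over four topology actions are assumptions made for illustration, and the rewards are passed in as plain numbers rather than derived from the paper's reward definition.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.9, bootstrap=0.0):
    """Long-term return G_t = r_t + gamma * G_{t+1}, computed backwards in time."""
    returns = np.zeros(len(rewards), dtype=float)
    running = bootstrap  # critic's estimate of the value after the last sample
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def advantages(rewards, values, gamma=0.9, bootstrap=0.0):
    """Advantage A_t = G_t - V(s_t): how much better an action did than the critic expected."""
    return discounted_returns(rewards, gamma, bootstrap) - np.asarray(values, dtype=float)

def softmax(logits):
    """Numerically stable softmax turning actor logits into action probabilities."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def actor_logit_gradient(logits, action, advantage):
    """Gradient of advantage * log pi(action) with respect to the softmax logits."""
    grad = -softmax(logits)
    grad[action] += 1.0
    return advantage * grad

# Hypothetical example: three sampled rewards, a critic that predicted 1.0 each time,
# and an actor choosing among four topologies (assumed action space).
adv = advantages([1.0, 0.0, 2.0], values=[1.0, 1.0, 1.0], gamma=0.5)
step = actor_logit_gradient(np.zeros(4), action=2, advantage=adv[2])
```

A positive advantage pushes the logit of the chosen topology up and the others down, which is the sense in which actions leading to higher long-term rewards are reinforced.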

Figure 2. (Top) Flex-LIONS (N=4, b=3) architecture with AWGR, MRR add-drop filters and multi-wavelength MRR crossbar switch. (Bottom) Microscope image of fabricated eight-port SiPh Flex-LIONS (N=8, b=3) and transmission spectra of 8×8 AWGR from input port 4.

Figure 2 illustrates the working principle of Flex-LIONS. The SiPh Flex-LIONS has an N-port AWGR and b microring resonator (MRR) add-drop filters at each AWGR input/output port. For uniform traffic, all MRR add-drop filters can be set off-resonance so that each input port provides N wavelength division multiplexing (WDM) signals that interconnect it with all N output ports, according to the all-to-all wavelength routing property of the AWGR [12]. For other traffic patterns, the MRR filters can be tuned into resonance to select specific wavelength channels to be switched by the multi-wavelength switch (for the SiPh chip shown in Figure 2, the multi-wavelength switch is implemented as an MRR crossbar [10]). This creates a different topology and increases by a factor of b the bandwidth between the port pairs connected through the multi-wavelength switch.
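As a rough illustration of this working principle, the following Python sketch builds a per-port-pair bandwidth matrix, counted in wavelengths, for the Figure 2 example (N=4, b=3). It is a toy model under stated assumptions, not a description of the fabricated chip: the cyclic routing convention `(input + wavelength) mod N`, the function names, and the choice of steered wavelengths are all hypothetical, and wavelength contention at the output port is ignored.

```python
import numpy as np

def awgr_output_port(input_port, wavelength, n_ports):
    """Cyclic AWGR routing (assumed convention): wavelength w entering port i exits port (i + w) mod N."""
    return (input_port + wavelength) % n_ports

def all_to_all_bandwidth(n_ports):
    """Bandwidth matrix, in wavelengths, when every MRR add-drop filter is off-resonance."""
    bw = np.zeros((n_ports, n_ports), dtype=int)
    for i in range(n_ports):
        for w in range(n_ports):
            bw[i, awgr_output_port(i, w, n_ports)] += 1
    return bw

def steer(bw, src, dst, wavelengths):
    """Drop the given wavelengths at input `src` and deliver them to `dst` via the multi-wavelength switch."""
    n_ports = bw.shape[0]
    out = bw.copy()
    for w in wavelengths:
        out[src, awgr_output_port(src, w, n_ports)] -= 1  # no longer routed by the AWGR
        out[src, dst] += 1                                # re-inserted at the chosen output
    return out

N, b = 4, 3
uniform = all_to_all_bandwidth(N)                       # one wavelength per port pair
steered = steer(uniform, src=0, dst=2, wavelengths=[1, 2, 3])
```

In this model, port 0 keeps its N wavelengths in total, but b of them now land on port 2, so the 0-to-2 bandwidth grows from one wavelength to b while ports 1 and 3 are no longer reached directly from port 0.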

3 Results

We used OMNeT++ and TensorFlow to simulate the DRL-based reconfigurable architecture. We assumed 32 Top-of-Rack (ToR) switches interconnected with one 32-port Flex-LIONS. We considered four possible topologies that the DRL algorithm can choose from. We used time-varying traffic consisting of four traffic patterns: adversarial, neighbor exchange, inter-group all-to-all, and intra-group all-to-all (a group is composed of four racks). The four patterns appear periodically. For the training process, the four changing traffic patterns and the four network topologies are all included in the DNNs’ input features. The DNN models consisted of two convolutional layers and five fully connected layers, and each layer contains 128 neurons.
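To make the traffic setup concrete, the sketch below generates row-normalized 32×32 traffic matrices for the four patterns and cycles through them periodically. The paper does not define the patterns in detail, so every definition here is an assumption for illustration: in particular, "adversarial" is modeled as each rack sending all of its traffic to the rack at the same position in the next group, and the pattern order and `period` parameter are hypothetical.

```python
import numpy as np

N_RACKS, GROUP_SIZE = 32, 4  # 32 ToR switches, groups of four racks (from the text)

def _normalize(t):
    """Scale each row so that every rack injects one unit of traffic."""
    return t / t.sum(axis=1, keepdims=True)

def neighbor_exchange(n=N_RACKS):
    """Each rack splits its traffic between its two ring neighbors."""
    t = np.zeros((n, n))
    for i in range(n):
        t[i, (i + 1) % n] = 1.0
        t[i, (i - 1) % n] = 1.0
    return _normalize(t)

def all_to_all_intra_group(n=N_RACKS, g=GROUP_SIZE):
    """Each rack talks uniformly to the other racks of its own group."""
    groups = np.arange(n) // g
    t = (groups[:, None] == groups[None, :]).astype(float)
    np.fill_diagonal(t, 0.0)
    return _normalize(t)

def all_to_all_inter_group(n=N_RACKS, g=GROUP_SIZE):
    """Each rack talks uniformly to every rack outside its own group."""
    groups = np.arange(n) // g
    t = (groups[:, None] != groups[None, :]).astype(float)
    return _normalize(t)

def adversarial(n=N_RACKS, g=GROUP_SIZE):
    """Assumed definition: all traffic goes to the same-position rack in the next group."""
    t = np.zeros((n, n))
    for i in range(n):
        t[i, (i + g) % n] = 1.0
    return t

PATTERNS = [
    ("adversarial", adversarial),
    ("neighbor exchange", neighbor_exchange),
    ("inter-group all-to-all", all_to_all_inter_group),
    ("intra-group all-to-all", all_to_all_intra_group),
]

def pattern_at(step, period):
    """Periodic schedule: switch to the next pattern every `period` steps."""
    name, build = PATTERNS[(step // period) % len(PATTERNS)]
    return name, build()
```

Matrices like these could serve as the offered load in a simulator and, flattened or stacked with the current topology, as part of the agent's input features.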

Figure 3. (Top) Reward of the DRL scheme for different learning rates. (Bottom) Average end-to-end delay at different injection rates for fixed topologies and the DRL-based scheme under time-varying traffic.

Figure 3 (Top) shows how the reward value converges during training, which indicates that the DRL agent learns to maintain the lowest network end-to-end delay. In addition, the convergence behavior depends on the learning rate. We compared our DRL-based reconfigurable architecture to different fixed networks in terms of average end-to-end delay [see Figure 3 (Bottom)]. The proposed DRL-based reconfigurable architecture achieves the lowest average network latency at all packet injection rates considered.

The current preliminary results only leverage the capability of Flex-LIONS to rearrange the interconnect topology among four possible configurations, and only for a limited set of data traffic patterns. Further studies will aim to assess the effectiveness of the proposed approach when using a finer reconfiguration granularity under a larger set of traffic pattern scenarios, to include the cost of reconfiguration in terms of packet loss, and to leverage the capability of Flex-LIONS to enhance the bandwidth of the reconfigured links.

ACKNOWLEDGMENTS

This work was supported in part by DoD contract H98230-16-C-0820 and NSF grant 1611560.

SC19, November, 2019, Denver, Colorado, USA

REFERENCES

[1] Mittal, S., Vetter, J. S.: 'A survey of CPU-GPU heterogeneous computing techniques', ACM Computing Surveys (CSUR) 47.4 (2015): 69.

[2] Schulte, M. J., Ignatowski, M., Loh, G. H., et al.: 'Achieving exascale capabilities through heterogeneous computing', IEEE Micro 35.4 (2015): 26-36.

[3] Roy, A., Zeng, H., Bagga, J., et al.: 'Inside the social network's (datacenter) network', ACM SIGCOMM Computer Communication Review. Vol. 45. No. 4. ACM, 2015.

[4] Zhang, Q., Liu, V., Zeng, H., et al.: 'High-resolution measurement of data center microbursts'. Proceedings of the 2017 Internet Measurement Conference. ACM, 2017.

[5] Cao, Z., Proietti, R., Clements, M., et al.: 'Experimental demonstration of flexible bandwidth optical data center core network with all-to-all interconnectivity', Journal of Lightwave Technology 33.8 (2015): 1578-1585.

[6] Proietti, R., Liu, G., Xiao, X., et al.: 'FlexLION: A Reconfigurable All-to-All Optical Interconnect Fabric with Bandwidth Steering'. 2019 Conference on Lasers and Electro-Optics (CLEO). IEEE, 2019.

[7] Salman, S., Streiffer, C., Chen, H., Benson, T., Kadav, A.: 'DeepConf: Automating data center network topologies management with machine learning'. Proc. of NetAI, 2018, pp. 8–14.

[8] Seok, T. J., Luo, J., Huang, Z., et al.: 'MEMS-Actuated 8×8 Silicon Photonic Wavelength-Selective Switches with 8 Wavelength Channels'. 2018 Conference on Lasers and Electro-Optics (CLEO). IEEE, 2018.

[9] Khope, A. S. P., Saeidi, M., Yu, R., et al.: 'Multi-wavelength selective crossbar switch', Optics Express 27.4 (2019): 5203-5216.

[10] Xiao, X., Proietti, R., Liu, G., Lu, H., Zhang, Y., Yoo, S. J. B.: 'Experimental Demonstration of SiPh Flex-LIONS for Bandwidth-Reconfigurable Optical Interconnects'. ECOC, 2019.

[11] Lowe, R., Wu, Y., Tamar, A., Harb, J., Abbeel, P., Mordatch, I.: 'Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments', arXiv:1706.02275.

[12] Proietti, R., Cao, Z., Nitta, C. J., et al.: 'A scalable, low-latency, high-throughput, optical interconnect architecture based on arrayed waveguide grating routers', Journal of Lightwave Technology 33.4 (2015): 911-920.

