Presenter: Monia Ghobadi
Co-Authors: Ratul Mahajan, Amar Phanishayee, Nikhil Devanur, Janardhan Kulkarni, Gireeja Ranade, Pierre-Alexandre Blanche, Houman Rastegarfar, Madeleine Glick
Currently, datacenter networks use fiber cables to connect ToR (top-of-rack) switches to one another. These links have static capacity, and the only way to change a connection is to send someone into the field to physically re-wire it. However, according to the authors, many rack switches are either under-utilized or over-utilized: most racks exchange little traffic, while a few generate a lot of it. Hence, there is a need for reconfigurable interconnects that can adjust capacity dynamically.
Desirable properties of such interconnects include a way to augment the standard capacity by maintaining separate static and reconfigurable portions. There is also a need for high fan-out, i.e., a large number of direct links to other racks along which heavy traffic can be sent. Finally, switching time should be low so traffic can be moved quickly. The authors argue that ProjecToR achieves all of these.
The key insight in ProjecToR is to remove all the cables and use light instead. The transmission medium is free space, and the device used is a digital micromirror device (DMD) to direct the light, together with a magnifying lens to extend its reach. By changing the bit pattern uploaded to the DMD, the light can be redirected elsewhere. The number of accessible locations is proportional to the total number of micromirrors, but some of them must be skipped to eliminate interference, which leaves about 18K accessible locations, i.e., the fan-out (all of these within ±3 degrees, though). To address this last point, the narrow angular reach, ProjecToR uses angled mirrors to extend the reach, and the mirror assembly is designed according to the datacenter's requirements (e.g., a disco-ball structure).
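For a rough feel of the fan-out arithmetic, here is a minimal Python sketch; the concrete numbers (72,000 raw addressable positions, keeping one in four) are hypothetical, chosen only to land near the ~18K figure from the talk:

```python
# Back-of-the-envelope sketch of the fan-out argument. The raw number of
# addressable directions grows with the micromirror count, but neighboring
# positions are skipped to avoid interference, so only a fraction is usable.

def usable_fanout(addressable_positions: int, keep_every: int) -> int:
    """Keep one out of every `keep_every` addressable positions."""
    return addressable_positions // keep_every

# Hypothetical values, chosen to land near the ~18K fan-out cited above.
print(usable_fanout(addressable_positions=72_000, keep_every=4))  # -> 18000
```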
How feasible is a ProjecToR interconnect?
The authors built a small prototype and micro-benchmarked it. The prototype contained 3 ToR switches connected via ProjecToR, with a source laser to send light, an FPGA to program the DMD, and mirrors to reflect the light towards the receiving ToRs. The authors ran evaluations on this setup for over a day, in 10-second intervals. The throughput of the setup matches that of a wired link. The switching time, measured by changing the destination to ToR 3 and monitoring the light intensity at both receivers to detect when switching completes, was about 12 microseconds.
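A minimal sketch of the threshold-crossing idea behind that measurement, assuming a hypothetical sampled intensity trace from the old and new receivers (this is illustrative, not the authors' code):

```python
# Given intensity samples from the old and new receivers after the DMD is
# retargeted at t=0, find the first sample where the light has fully moved
# to the new ToR, i.e., when switching is complete.

def switch_completion_time(samples, threshold):
    """samples: list of (t_us, old_intensity, new_intensity) tuples."""
    for t_us, old_i, new_i in samples:
        if new_i >= threshold and old_i < threshold:
            return t_us
    return None

# Hypothetical trace: the beam has fully arrived at the new ToR by ~12 us.
trace = [(0, 1.0, 0.0), (6, 0.5, 0.4), (12, 0.05, 0.95), (20, 0.0, 1.0)]
print(switch_completion_time(trace, threshold=0.9))  # -> 12
```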
The topology can be configured via the connections between lasers and photodetectors. But the switching time is still larger than the packet transmission time, so the system keeps a dedicated default topology and builds an opportunistic topology on the fly. The dedicated topology uses k-shortest-paths routing with virtual queues and carries the smaller flows; opportunistic links are created on the fly and used immediately for elephant flows. Finding active links for the opportunistic topology is similar to classical switch-scheduling problems. The system uses a Gale-Shapley (stable matching) based algorithm and comes very close to an optimal offline scheduling algorithm.
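For intuition, here is a minimal Gale-Shapley (stable matching) sketch in Python. Treating lasers as proposers that rank photodetectors by queued traffic (and vice versa) is an assumption for illustration; this shows only the matching step, not the paper's full scheduler:

```python
# Gale-Shapley stable matching between lasers and photodetectors.
# Preference lists are assumed complete and, hypothetically, derived
# from how much traffic is queued between the corresponding racks.

def gale_shapley(laser_prefs, pd_prefs):
    """laser_prefs / pd_prefs: dicts mapping each side to an ordered
    preference list over the other side. Returns {photodetector: laser}."""
    free = list(laser_prefs)                      # unmatched lasers
    next_prop = {l: 0 for l in laser_prefs}       # next PD index to propose to
    rank = {pd: {l: i for i, l in enumerate(prefs)}
            for pd, prefs in pd_prefs.items()}
    match = {}                                    # pd -> laser
    while free:
        laser = free.pop()
        pd = laser_prefs[laser][next_prop[laser]]
        next_prop[laser] += 1
        if pd not in match:
            match[pd] = laser                     # PD was free: accept
        elif rank[pd][laser] < rank[pd][match[pd]]:
            free.append(match[pd])                # PD prefers new laser: bump
            match[pd] = laser
        else:
            free.append(laser)                    # rejected: try next PD
    return match

# Toy example with two lasers and two photodetectors.
lasers = {"L1": ["PD2", "PD1"], "L2": ["PD2", "PD1"]}
pds = {"PD1": ["L1", "L2"], "PD2": ["L2", "L1"]}
print(gale_shapley(lasers, pds))  # -> {'PD2': 'L2', 'PD1': 'L1'}
```

Stable matching fits here because each laser and each photodetector can participate in at most one opportunistic link at a time, so a schedule is exactly a matching between the two sides.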
Cost is estimated based on a cost breakdown of the individual components.
Simulations were run on 128 ToRs, each with 16 lasers and 16 photodetectors, using day-long traffic. ProjecToR's average flow completion time increases very slowly, thanks to reconfigurability, even on a skewed traffic matrix. Firefly and fat-tree are up to 95% worse: the former because of its high switching time and low fan-out (multi-hop paths are needed), while the fat-tree has no reconfigurability.
Q: Free-space optics is very vulnerable to vibrations and environmental variations, which is an issue in datacenters. Have you had any experience with this?
A: Firefly has shown that movements within 5 mm are tolerable, and we use an optical bench that is even more tolerant. If a photodetector misses something, you might be able to redirect and capture it.
Q: Have you discussed this with computational geometry people? They might be able to provide some insights.
A: I have run simulations with typical datacenter topologies and came up with the disco ball, but I haven't looked into reconfiguring the datacenter itself.
Q: Two practical points from an optical-communication perspective: crosstalk is one, and link recovery in the optical receiver is the other (a transceiver also has a link-detection phase and would be available only after milliseconds).
A: We eliminate adjacent angles to remove crosstalk (by skipping every 4 degrees). Prior work has shown that the link-detection latency might be reducible to nanoseconds.
Q: I am trying to compare the free-space design space to the switched-spectrum design space (you could get single-hop links and reconfigurability there too). I'm not sure why free space is more desirable.
A: One reason is scalability, since building a large switch in a closed form is complicated.