We’re Building the Future of Data Infrastructure

Posts Tagged 'AI'

  • June 18, 2024

    Custom Compute in the AI Era

    This article is the final installment in a series on talks delivered at Accelerated Infrastructure for the AI Era, a one-day symposium held by Marvell in April 2024.

    AI demands are pushing the limits of semiconductor technology, and hyperscale operators are at the forefront of adoption—they develop and deploy leading-edge technology that increases compute capacity. These large operators seek to optimize performance while simultaneously lowering total cost of ownership (TCO). With billions of dollars on the line, many have turned to custom silicon to meet their TCO and compute performance objectives.

    But building a custom compute solution is no small matter. Doing so requires a large IP portfolio, significant R&D scale and decades of experience to create the mix of ingredients that make up custom AI silicon. Today, Marvell is partnering with hyperscale operators to deliver custom compute silicon that’s enabling their AI growth trajectories.

    Why are hyperscale operators turning to custom compute?

    Hyperscale operators have always been focused on maximizing both performance and efficiency, but new demands from AI applications have amplified the pressure. According to Raghib Hussain, president of products and technologies at Marvell, “Every hyperscaler is focused on optimizing every aspect of their platform because the order of magnitude of impact is much, much higher than before. They are not only achieving the highest performance, but also saving billions of dollars.”

    With multiple business models in the cloud, including internal apps, infrastructure-as-a-service (IaaS), and software-as-a-service (SaaS)—the latter of which is the fastest-growing market thanks to generative AI—hyperscale operators are constantly seeking ways to improve their total cost of ownership. Custom compute allows them to do just that. Operators are first adopting custom compute platforms for their mass-scale internal applications, such as search and their own SaaS applications. Next up for greater custom adoption will be third-party SaaS and IaaS, where the operator offers their own custom compute as an alternative to merchant options.

    Progression of custom silicon adoption in hyperscale data centers.

  • June 12, 2024

    How AI Will Change the Building Blocks of Semis

    By Michael Kanellos, Head of Influencer Relations, Marvell

    Aaron Thean points to a slide featuring the downtown skylines of New York, Singapore and San Francisco along with a prototype of a 3D processor and asks, “Which one of these things is not like the other?”

    The answer? While most gravitate to the processor, San Francisco is a better pick. With a population well under 1 million, the city has internal transportation and communications systems that don’t come close to the level of complexity, performance and synchronization required by the other three.

    With future chips, “we’re talking about trillions of transistors on multiple substrates,” said Thean, the deputy president of the National University of Singapore and the director of SHINE, an initiative to expand Singapore’s role in the development of chiplets, during a one-day summit sponsored by Marvell and the university.

  • June 11, 2024

    How AI Will Drive Cloud Switch Innovation

    This article is part five in a series on talks delivered at Accelerated Infrastructure for the AI Era, a one-day symposium held by Marvell in April 2024. 

    AI has fundamentally changed the network switching landscape. AI requirements are driving foundational shifts in the industry roadmap, expanding the use cases for cloud switching semiconductors and creating opportunities to redefine the terrain.

    Here’s how AI will drive cloud switching innovation.

    A changing network requires a change in scale

    In a modern cloud data center, the compute servers are connected to one another and to the internet through a network of high-bandwidth switches. The approach is like that of the internet itself, allowing operators to build a network of any size while mixing and matching products from various vendors to create a network architecture specific to their needs.

    Such a high-bandwidth switching network is critical for AI applications, and a higher-performing network can lead to a more profitable deployment.

    However, expanding and extending the general-purpose cloud network to AI isn’t quite as simple as just adding more building blocks. In the world of general-purpose computing, one or more workloads can fit on a single server CPU. In contrast, AI’s large datasets don’t fit on a single processor, whether it’s a CPU, GPU or other accelerated compute device (XPU), making it necessary to distribute the workload across multiple processors. These accelerated processors must then function as a single computing element.
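
    To see why, consider a rough back-of-the-envelope sketch in Python (the model size, precision and memory figures below are illustrative assumptions, not Marvell data):

        # Sketch: why a large model cannot fit on one accelerator and must
        # be sharded across many XPUs. All figures are illustrative.
        PARAMS = 1_000_000_000_000      # a hypothetical 1-trillion-parameter model
        BYTES_PER_PARAM = 2             # fp16/bf16 weights
        XPU_MEMORY_GB = 80              # a typical high-end accelerator today

        weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
        min_xpus = -(-weights_gb // XPU_MEMORY_GB)  # ceiling division

        print(f"Weights alone: {weights_gb:.0f} GB")                 # 2000 GB
        print(f"Minimum XPUs to hold the weights: {min_xpus:.0f}")   # 25

    And that count is only the floor: activations, optimizer state and redundancy push real deployments far higher, which is why the network must make many processors behave as one.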

    AI calls for enhanced cloud switch architecture

    AI requires accelerated infrastructure to split workloads across many processors.

  • June 06, 2024

    Silicon Photonics Comes of Age

    This article is part four in a series on talks delivered at Accelerated Infrastructure for the AI Era, a one-day symposium held by Marvell in April 2024. 

    Silicon photonics—the technology of manufacturing the hundreds of components required for optical communications with CMOS processes—has been employed to produce coherent optical modules for metro and long-distance communications for years. The increasing bandwidth demands brought on by AI are now opening the door for silicon photonics to come inside data centers to enhance their economics and capabilities.  

    What’s inside an optical module?

    As the previous posts in this series noted, the critical semiconductors in optical modules, such as digital signal processors (DSPs), transimpedance amplifiers (TIAs) and drivers, have steadily improved in performance and efficiency with each new generation of chips, thanks to Moore’s Law and other factors.

    The same is not true for optics. Modulators, multiplexers, lenses, waveguides and other devices for managing light impulses have historically been delivered as discrete components.

    “Optics pretty much uses piece parts,” said Loi Nguyen, executive vice president and general manager of cloud optics at Marvell. “It is very hard to scale.”

    Lasers have been particularly challenging, with module developers forced to choose among a wide variety of technologies. Electro-absorption-modulated lasers (EMLs) are currently the only commercially viable option capable of meeting the 200 Gbps-per-lane speeds necessary to support AI models. Often used for longer links, EML is the laser of choice for 1.6T optical modules. Not only is fab capacity for EML lasers constrained, but they are also incredibly expensive. Together, these factors make it difficult to scale at the rate needed for AI.
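
    The scale of the problem follows from simple lane math. A minimal sketch, assuming a common 8 x 200G module design and one laser per lane (both are assumptions for illustration, not Marvell specifications):

        # Lane math for a 1.6T optical module (assumed 8 x 200G design).
        module_gbps = 1600
        lane_gbps = 200
        lanes_per_module = module_gbps // lane_gbps   # 8 lanes
        lasers_per_module = lanes_per_module          # assume one EML laser per lane
        modules_in_cluster = 100_000                  # illustrative AI-cluster scale

        print(lasers_per_module * modules_in_cluster) # 800000

    Every constrained, costly laser is multiplied by every lane in every module, which is why EML supply weighs so heavily on scaling optics for AI.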

  • June 02, 2024

    A Deep Dive into the Copper and Optical Interconnects Weaving AI Clusters Together

    This article is part three in a series on talks delivered at Accelerated Infrastructure for the AI Era, a one-day symposium held by Marvell in April 2024.

    Twenty-five years ago, network bandwidth ran at 100 Mbps, and it was aspirational to think about moving to 1 Gbps over optical. Today, links run at 1 Tbps over optical, or 10,000 times faster than the cutting-edge speeds of 25 years ago.

    Another interesting fact: “Every single large language model today runs on compute clusters that are enabled by Marvell’s connectivity silicon,” said Achyut Shah, senior vice president and general manager of Connectivity at Marvell.

    To keep ahead of what customers need, Marvell continually seeks to boost the capacity, speed and performance of the digital signal processors (DSPs), transimpedance amplifiers (TIAs), drivers, firmware and other components inside interconnects. It’s an interdisciplinary endeavor involving expertise in high-frequency analog, mixed-signal, digital, firmware, software and other technologies. The following is a map to the different components and challenges shaping the future of interconnects and how that future will shape AI.

    Inside the Data Center

    At a high level, optical interconnects perform the task their name implies: they deliver data from one place to another while keeping errors from creeping in during transmission. Another important task, however, is enabling data center operators to scale quickly and reliably.

    “When our customers deploy networks, they don’t start deploying hundreds or thousands at a time,” said Shah. “They have these massive data center clusters—tens of thousands, hundreds of thousands and millions of (computing) units—that all need to work and come up at the exact same time. These are at multiple locations, across different data centers. The DSP helps ensure that they don’t have to fine tune every link by hand.”
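
    A quick calculation shows why the DSP’s error-handling role matters at these speeds (the error-rate figure below is a generic illustration, not a product specification):

        # At 1.6 Tbps, even a tiny bit-error rate is a torrent of raw errors.
        link_rate_bps = 1.6e12
        raw_ber = 1e-5                  # illustrative pre-FEC bit-error rate
        errors_per_second = link_rate_bps * raw_ber

        print(f"{errors_per_second:.1e} raw bit errors per second")  # 1.6e+07

    Error correction inside the link must absorb all of those errors so each link looks clean to the application, across tens of thousands of links, without hand tuning.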

    Optical Interconnect Module

  • May 23, 2024

    Scaling AI Means Scaling Interconnects

    This article is part two in a series on talks delivered at Accelerated Infrastructure for the AI Era, a one-day symposium held by Marvell in April 2024.

    Interconnects have played a key role in enabling technology since the dawn of computing. During World War II, Alan Turing’s electromechanical Bombe performed the computations that broke the Nazis’ Enigma code. Fast for its time, the machine relied on massive parallelism and numerous interconnects. Eighty years later, interconnects play a similar role for AI—providing a foundation for massively parallel problems. However, with the growth of AI comes unique networking challenges—and Marvell is poised to meet the needs of this ever-growing market.

    What’s driving interconnect growth?

    Before 2023, the interconnect world was a different place. Interconnect speeds were set by the pace of cloud data center server upgrades: servers were refreshed roughly every four years, so interconnect speeds doubled on the same four-year cadence. In 2023, generative AI took the interconnect wheel, and demand for AI is now driving speeds to double every two years. And while copper remains a viable technology for chip-to-chip and other short-reach connections, optical is the dominant medium for AI.
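
    The change in cadence compounds quickly, as a small sketch shows (the 100G starting point and eight-year horizon are illustrative assumptions):

        # Interconnect speed growth: doubling every 4 years vs. every 2 years.
        start_gbps = 100
        years = 8
        server_cadence = start_gbps * 2 ** (years / 4)  # doubles every 4 years
        ai_cadence = start_gbps * 2 ** (years / 2)      # doubles every 2 years

        print(server_cadence, ai_cadence)  # 400.0 1600.0 (a 4x gap in 8 years)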

    “Optical is the only technology that can give you the bandwidth and reach needed to connect hundreds and thousands and tens of thousands of servers across the whole data center,” said Dr. Loi Nguyen, executive vice president and general manager of Cloud Optics at Marvell. “No other technology can do the job—except optical.”

    AI doubles interconnect speed in half the time

  • May 14, 2024

    The AI Opportunity at Marvell

    Two trillion dollars. That’s the GDP of Italy. It’s the rough market capitalization of Amazon, of Alphabet and of Nvidia. And, according to analyst firm Dell’Oro, it’s the amount of AI infrastructure CAPEX that data center operators are expected to invest over the next five years. It’s a historically massive investment, which raises the question: Does the return on AI justify the cost?

    The answer is a resounding yes.

    AI is fundamentally changing the way we live and work. Beyond chatbots, search results, and process automation, companies are using AI to manage risk, engage customers, and speed time to market. New use cases are continuously emerging in manufacturing, healthcare, engineering, financial services, and more. We’re at the beginning of a generational inflection point that, according to McKinsey, has the potential to generate $4.4 trillion in annual economic value. 

    In that light, two trillion dollars makes sense. It will be financed through massive gains in productivity and efficiency.

    Our view at Marvell is that the AI opportunity before us is on par with that of the internet, the PC, and cloud computing. “We’re as well positioned as any company in technology to take advantage of this,” said chairman and CEO Matt Murphy at the recent Marvell Accelerated Infrastructure for the AI Era investor event in April 2024.

  • October 19, 2023

    Shining a Light on Marvell Optical Technology and Innovation in the AI Era

    By Kristin Hehir, Senior Manager, PR and Marketing, Marvell

    The sheer volume of data traffic moving across networks daily is mind-boggling almost any way you look at it. During the past decade, global internet traffic grew by approximately 20x, according to the International Energy Agency. One contributing factor to this growth is the popularity of mobile devices and applications: smartphone users spend an average of five hours a day, or nearly a third of their waking hours, on their devices, up from three hours just a few years ago. The result is incredible amounts of data in the cloud that need to be processed and moved. Around 70% of data traffic is east-west traffic, or the data traffic inside data centers. Generative AI, and the exponential growth in the size of data sets needed to feed AI, will invariably continue to push the curve upward.
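
    For scale, 20x growth over a decade works out to roughly 35% compound annual growth (a quick sketch):

        # Annualized rate implied by ~20x traffic growth over 10 years.
        growth_factor = 20
        years = 10
        cagr = growth_factor ** (1 / years) - 1

        print(f"{cagr:.0%} per year")  # 35% per year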

    Yet, for more than a decade, total power consumption has stayed relatively flat thanks to innovations in storage, processing, networking and optical technology for data infrastructure. The debut of PAM4 digital signal processors (DSPs) for accelerating traffic inside data centers, and of coherent DSPs for pluggable modules, has played a large, but often quiet, role in paving the way for growth while reducing cost and power per bit.
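
    PAM4’s contribution is easy to see in the signaling math: encoding two bits per symbol instead of one doubles the data rate at a given symbol rate (a minimal sketch; the 100 GBd symbol rate is an illustrative assumption):

        # PAM4 carries log2(4) = 2 bits per symbol vs. 1 bit for NRZ.
        import math

        symbol_rate_gbaud = 100                    # illustrative symbol rate
        pam4_bits = math.log2(4)                   # 2 bits per symbol
        nrz_gbps = symbol_rate_gbaud * 1           # 100 Gbps with NRZ
        pam4_gbps = symbol_rate_gbaud * pam4_bits  # 200 Gbps with PAM4

        print(nrz_gbps, pam4_gbps)  # 100 200.0

    The same electronics bandwidth carries twice the bits, one reason DSPs have helped hold power per bit down while traffic soared.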

    Marvell at ECOC 2023

    At Marvell, we’ve been gratified to see these technologies get more attention. At the recent European Conference on Optical Communication, Dr. Loi Nguyen, EVP and GM of Optical at Marvell, talked with Lightwave editor-in-chief Sean Buckley about how Marvell 800 Gbps and 1.6 Tbps technologies will enable AI to scale.

  • September 05, 2023

    800G: An Inflection Point for Optical Networks

    By Samuel Liu, Senior Director, Product Line Management, Marvell

    Digital technology has what you could call a real estate problem. Hyperscale data centers now regularly exceed 100,000 square feet in size. Cloud service providers plan to build 50 to 100 edge data centers a year, and distributed applications like ChatGPT are further fueling the growth of data traffic between facilities. Meanwhile, this explosive surge in traffic means telecommunications carriers need to upgrade their wired and wireless networks, a complex and costly undertaking that will involve new equipment deployments in cities all over the world.

    Weaving all of these geographically dispersed facilities into a fast, efficient, scalable and economical infrastructure is now one of the dominant issues for our industry.

    Pluggable modules based on coherent digital signal processors (CDSPs) debuted in the last decade to replace the transponders and other equipment used to generate DWDM-compatible optical signals. These early modules didn’t match the performance of incumbent solutions: their large form factors limited density, and they could be deployed only in a narrow set of use cases. Over time, advances in technology improved the performance of pluggable modules, and CDSP speeds grew from 100 to 200 and then 400 Gbps. Continued innovation, and the development of an open ecosystem, helped expand the potential applications.

  • June 12, 2023

    AI and the Tectonic Shift Coming to Data Infrastructure

    By Michael Kanellos, Head of Influencer Relations, Marvell

    AI’s growth is unprecedented from any angle you look at it. The size of large training models is growing 10x per year. ChatGPT’s 173 million-plus users turn to the website an estimated 60 million times a day (compared to zero the year before). And daily, people are coming up with new applications and use cases.

    As a result, cloud service providers and others will have to transform their infrastructures in similarly dramatic ways to keep up, said Chris Koopmans, Chief Operations Officer at Marvell, in a conversation with Futurum’s Daniel Newman during the Six Five Summit on June 8, 2023.

    “We are at the beginning of at least a decade-long trend and a tectonic shift in how data centers are architected and how data centers are built,” he said.  

    The transformation is already underway. AI training, and a growing percentage of cloud-based inference, has already shifted from two-socket servers built around general-purpose processors to systems containing eight or more GPUs or TPUs optimized to solve a smaller set of problems more quickly and efficiently.
