There’s no question that the explosion of AI over the past few years has fundamentally changed how we work, think, and communicate. At the same time, it has driven data center costs and energy consumption through the roof. AI infrastructure is now significantly more expensive than traditional cloud data centers[1], driven by increasing energy demands[2] and growing strains on cooling systems and network bandwidth. However, what’s less obvious – and even more problematic – is why this is happening.
A common misconception is that these strains result from the end of Moore’s Law – i.e., computing power isn’t increasing as quickly today as it did historically. However, the true culprit isn’t a lack of computing power; it’s communication. To support the exponential growth and complexity of next-generation AI models, the world won’t just need faster (and more costly) processors, but rather faster networks that can efficiently and economically transport vast amounts of data at scale. That’s why we believe the most urgent challenge in AI infrastructure is not building more powerful processors but fixing how they talk to each other.
Before the era of AI, compute power was mostly dependent on the performance of a single CPU within a server. Traditional workloads were largely “serial,” meaning the computer processed one instruction after another in a specified sequence. However, over time – and especially with the rise of AI – computing requirements have grown more complex, prompting changes to the architecture behind them. AI workloads, for example, involve massive numbers of simple calculations (chiefly the multiply-and-add operations that make up matrix multiplication) that can be done in parallel. As a result, AI systems have shifted away from chips designed for complex sequential operations (e.g., CPUs) toward chips built to perform many smaller calculations simultaneously (e.g., GPUs). GPUs became the better fit because they contain thousands of smaller cores that process tasks in parallel. As AI gained traction, GPU adoption soared, fueling incredible sales growth and subsequent enterprise value for GPU companies like NVIDIA.
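To make the serial-versus-parallel distinction concrete, here is a minimal, purely illustrative Python sketch (not tied to any particular AI framework, with matrix sizes chosen arbitrarily): the same matrix multiplication can be computed one output element at a time, as a serial loop would, or as thousands of independent row-by-column products that a GPU can schedule across its cores at once.

```python
import numpy as np

# Illustrative sizes only; real AI workloads multiply far larger matrices.
N = 256
A = np.random.rand(N, N)
B = np.random.rand(N, N)

# "Serial" view: walk through the output one element at a time,
# the way a classic sequential loop would.
C_serial = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        C_serial[i, j] = np.dot(A[i, :], B[:, j])

# "Parallel" view: each of those N * N dot products is independent of the
# others, so a GPU can compute them simultaneously. A single matmul call
# stands in here for that one-shot, massively parallel operation.
C_parallel = A @ B

assert np.allclose(C_serial, C_parallel)
```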
However, a more critical change in architecture was not the type of chip used but rather how many. With the rise of Large Language Models (LLMs) such as ChatGPT, the amount of computation required rose exponentially, far beyond what a single GPU could deliver. In response, companies began clustering GPUs together to multiply compute power, shifting the cloud industry from vertical scaling (i.e., making individual chips faster) to horizontal scaling (i.e., connecting many chips so they can work in parallel). While traditional compute workloads ran on one or a few processors (collectively referred to as XPUs – a catch-all term for CPUs, GPUs, and ASICs), modern AI clusters span many servers and now require dozens (soon hundreds, or even thousands) of XPUs. As a result, performance now depends less on the speed of any single chip and more on how efficiently those chips communicate.
This horizontal shift, while a key unlock for the industry, has simultaneously revealed a new bottleneck. Performance is no longer constrained at the chip level but at the network level between the chips. In other words, AI performance is no longer bound by processing power but by communication bandwidth. The cost of this bottleneck at scale can be enormous, especially when training LLMs. If data can’t move fast enough across interconnects, GPUs sit partially idle, wasting power and capital. More powerful GPUs only intensify these utilization problems and the waste that comes with them. Reports from 2024 estimate that GPU utilization in large AI training workloads was as low as 30–40% due to these chip-to-chip and chip-to-memory bottlenecks.[3] The result is higher operating costs paired with lower efficiency. To solve this, AI compute requires faster data movement between chips, memory, and systems at scale.
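To see why utilization falls, consider a rough back-of-the-envelope sketch in Python. Every figure below (per-GPU throughput, interconnect bandwidth, compute work and traffic per training step) is a hypothetical assumption chosen only to illustrate the mechanism, not a measurement of any real system: once the data exchanged per step cannot be hidden behind computation, the GPU spends much of each step waiting rather than working.

```python
# Back-of-the-envelope sketch of a communication-bound training step.
# Every number here is a hypothetical assumption, for illustration only.

peak_flops_per_gpu = 1e15   # assumed usable compute per accelerator (FLOP/s)
interconnect_bw    = 900e9  # assumed chip-to-chip bandwidth (bytes/s)
flops_per_step     = 5e15   # assumed compute work per training step (FLOPs)
bytes_per_step     = 8e12   # assumed gradient/activation traffic per step (bytes)

compute_time = flops_per_step / peak_flops_per_gpu  # seconds spent computing
comm_time    = bytes_per_step / interconnect_bw     # seconds spent moving data

# If communication cannot be overlapped with compute, the chip idles while
# data is in flight, and utilization drops accordingly.
utilization = compute_time / (compute_time + comm_time)
print(f"compute: {compute_time:.1f}s, communication: {comm_time:.1f}s, "
      f"utilization: {utilization:.0%}")
```

Under these made-up numbers the chip is busy only about a third of the time, and it is the interconnect bandwidth – not the FLOPs – that has to improve for utilization to recover.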
One of the reasons this interconnect bottleneck is so difficult to solve lies in the interconnect material itself – copper wires. For decades, copper has been the standard material for most interconnect technology. While cheap and highly conductive, it suffers from signal degradation and heat buildup over longer distances. For AI, which requires high-density, high-speed interconnects, copper is failing to keep up.
To solve this, the semiconductor industry is moving to replace copper with the fastest transmission medium there is – light. Unlike electrical signals over copper, optical signals can travel hundreds of meters with almost no loss. While optical fibers have been used in data centers for years, they have traditionally been confined to connecting XPUs to the broader network (“scale out”). Today, with a new architecture known as co-packaged optics (“CPO”), optical interconnects are being embedded within the chip’s package to connect multiple XPUs together (“scale up”). This enables data to move between XPUs and memory without copper, unlocking high-speed, low-latency communication. For years, CPO was considered more of a “science project” than a realistic path to adoption, given how difficult it is to manufacture and engineer into modern data center infrastructure. But as the interconnect bottleneck becomes more pronounced and harder to overcome, the industry is embracing it.
KDT has been actively following the optical interconnect and CPO space since 2021, even before the major bottlenecks in AI became pronounced. One of KDT’s first investments in this space was Celestial AI, a company developing a “Photonic Fabric” – a CPO solution for scale-up networks. What sets Celestial apart isn’t just being early to market; it’s the company’s unique approach to interconnect. Most optical interconnect solutions are temperature-sensitive and can therefore only be placed at the edge of the chip (the “beach-front”), where space, heat, and fiber access are more manageable. Celestial overcomes that limitation with its Photonic Fabric, which is thermally stable and can be placed more flexibly across the chip, not just at its perimeter. This gives system designers more options when laying out chips and managing their thermal constraints. We believe the flexibility of Celestial’s architecture makes it the most adaptable and scalable path to widespread CPO adoption.
Since the initial investment, our team, with the help of photonics experts from Molex (a Koch operating company and a leader in “scale out” optical transceivers), has become increasingly bullish on the market need for CPO technologies like Celestial’s. We have doubled, tripled, and quadrupled down on our investments in the CPO space over the last few years and are proud to support teams ahead of the curve. At last month’s OFC (Optical Fiber Communication) conference, one thing was clear: CPO is finally here and will begin to scale in 2027.
The move to CPO isn’t just a hardware upgrade; it’s redefining how AI systems are built and scaled. As models grow and clusters expand, the limiting factor won’t be compute, but communication. While networking is not the most visible layer in the AI stack, it is becoming one of the most important. As demands on AI infrastructure intensify, CPO is emerging as a practical necessity, not a future bet.
At KDT, we’re focused on advancing technological breakthroughs like CPO, whether through innovation in photonics, data movement, or other data center infrastructure. With the help of strategic partners like Molex, we aim to support founders as they move beyond the lab to scaled, production-ready technology. If you’re a founder or partner working in this space, we’d love to talk and support your transition from emerging technology to deployment at scale.
About KDT and Molex:
Koch Disruptive Technologies, LLC and its subsidiaries (“KDT”) together form the investment arm of Koch Inc., focused on partnering with transformational companies solving foundational infrastructure challenges. To date, we have deployed over $400 million into technologies reshaping the future of compute, connectivity, and data center infrastructure. At KDT, we aim to bring value beyond capital. Through our relationship with Molex, a Koch subsidiary and a leader in electrical and interconnect solutions, KDT offers startups a unique mix of technical insight, go-to-market pathways, and commercialization opportunities. Molex’s rich expertise in optical systems and its global manufacturing footprint make it an invaluable resource for scaling next-generation infrastructure technology.
The collaboration between KDT and Molex exemplifies the mutually beneficial model we are striving to build: startups gain access to Molex’s deep technical expertise, extensive manufacturing capabilities, and global scale, while Molex stays close to the cutting edge of innovation. “Through our relationship with KDT, Molex engages with breakthrough technologies earlier in their lifecycle. These partnerships allow us to improve our roadmap, build the future of connectivity, and support companies redefining the future of infrastructure,” said Aldo Lopez, SVP of Data Center Solutions at Molex.
Simultaneously, the partnership gives startups access to otherwise inaccessible resources, enabling them to scale from innovative technologies into fully realized businesses. In the words of Celestial AI co-founder and CEO Dave Lazovsky, “Through KDT, we’ve gained access to operational depth, strategic relationships, and enterprise-level infrastructure that are unattainable by most startups.” By combining the agility of startups with the scale and capability of Koch, KDT is aiming to redefine the future of computing.
About the Authors:
- Isaac Sigron leads Koch’s investments in data infrastructure globally, with over 10 years of investment experience. Isaac focuses on picks-and-shovels solutions for the gold rush of our time – the mass adoption of Artificial Intelligence (AI). He is a Managing Director at Koch Disruptive Technologies (Israel), Ltd., an affiliate of KDT.
- Aliza Goldberg is an Investment Professional at Koch Disruptive Technologies (Israel), Ltd., covering data infrastructure – particularly technology in interconnect, power delivery, and hardware systems that support AI workloads.
- Deanna Grunfeld is an Investment Professional at Koch Disruptive Technologies (Israel), Ltd., covering AI infrastructure and deep tech, including semiconductors, optics, and software tools.
[1] McKinsey & Company. (2023). The cost of compute: A $7 trillion race to scale data centers. https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/the-cost-of-compute-a-7-trillion-dollar-race-to-scale-data-centers
[2] Joule Editorial Team. (2023). AI’s energy problem: The hidden costs of deep learning. Joule. https://www.cell.com/joule/fulltext/S2542-4351(23)00365-3
[3] Yeluri, S. (2024, January 2). GPU fabrics for GenAI workloads. Juniper Networks. https://community.juniper.net/blogs/sharada-yeluri/2024/01/02/gpu-fabrics-for-genai-workloads