Brian Venturo and a couple of fellow hedge fund buddies bought their first GPUs as part of an elaborate joke. It was late 2016, and for the fun of playing with something volatile that could be either trash or treasure, they had been staking their pool and fantasy football games with bitcoin instead of cash. When bitcoin prices surged, Venturo says, the group realized, “Maybe we should take this seriously.”
The friends rush-ordered GPUs, also known as graphics chips, for same-day delivery from Amazon to start mining bitcoin while their enthusiasm ran hot. The powerful processors are prized for their ability to crunch the math of crypto-mining at high speeds. Over time, the hobby turned into a business called Atlantic Crypto, which filled a garage and later warehouses with GPUs. Some of the beefy chips were rented out via the cloud to armchair cryptocurrency miners. As coin prices climbed, time was money, so the company’s technicians grew skilled at speedily installing GPUs and pushing their performance to the brink.
When crypto prices cratered heading into 2019, Venturo and team pivoted. They raised money to buy cut-price GPUs from struggling crypto miners, rebranded as CoreWeave, and offered up their tens of thousands of GPUs as a specialist cloud service for companies in need of the powerful chips, like visual effects studios and AI startups. GPUs are essential to machine learning projects like training algorithms to process images or text. CoreWeave was unknown in those spheres, but won over clients by dropping everything to address support queries. “We would resolve things in 15 minutes,” a manager from that era says. “The dedication to customers was phenomenal.”
Everything changed after OpenAI debuted ChatGPT late in 2022. A rush of interest and investment in all things generative AI caused a GPU shortage—especially of Nvidia’s prized chips. With its couple of data centers packed with GPUs and a long-standing relationship with Nvidia, CoreWeave quickly became vital to the accelerating AI hype train. The company picked up clients including Microsoft, OpenAI’s crucial sponsor, and image-generation startup Stability AI, according to former CoreWeave employees. Venturo’s biggest regret is not ordering more Nvidia chips back when he could. “If I had unlimited amounts, I could have it all sold in two weeks,” he says.
Things still worked out. As demand for CoreWeave’s GPUs surged through 2023, the company piled its new revenue into buying more of the chips and building out new data centers. A company that began as a beer-money side project now doesn’t touch crypto, and is all in on cloud computing. Over the past year the startup has picked up a $2.3 billion loan—using GPUs as collateral—and nearly $2.2 billion in investment funding, including $1.1 billion announced this week, from investors including Nvidia. With its latest infusion of cash, the startup’s valuation jumped to $19 billion.
By the end of this year, CoreWeave expects to have 28 data center campuses spread across a dozen or more US states, Europe, and potentially Asia—overall more than nine times its footprint at the beginning of 2023. Venturo, who is now CoreWeave’s chief strategy officer, says long-term contracts to supply AI giants justify the buildout. “It’s transformed this small cloud startup into one of the biggest juggernauts,” he says. “We’re lucky enough to be in this position because of this crazy origin story where we had so much expertise building at this rapid, frenetic pace.”
For some of CoreWeave’s 550 workers, that pace has translated into a culture of grueling hours and high-pressure management that feels unsustainable, according to six former workers speaking on the condition of anonymity to avoid retaliation. But with the AI boom continuing, CoreWeave doesn’t look likely to slow down.
Core Components
GPUs are needed to train and operate machine learning systems like OpenAI’s GPT-4 that analyze vast amounts of content, interpret prompts from users, and return eloquent writing or astonishing imagery. When startup Databricks recently trained an open source AI model designed to rival GPT-4, the project occupied 3,072 Nvidia H100 chips, which debuted last March, for three whole months, running up a cloud tab of $10 million. Similar bills are chewing through much of the $29.1 billion that generative AI companies raised last year from investors, according to market researcher PitchBook’s estimate.
CoreWeave’s blazing 2023 set an example that rivals old and new are eager to follow. Other cryptocurrency miners, such as Applied Digital, Hut 8 Mining, and Iris Energy, are trying to become AI prospectors by offloading excess GPU capacity to software developers. With the financial reward for mining bitcoin falling, more GPUs are set to be freed up for AI.
And leading cloud players are spending heavily to expand their cloud offerings of GPUs and other AI-centric chips. Amazon plans to unleash almost $150 billion on data centers over the next 15 years; Microsoft has talked of tripling its data center growth in the second half of this year; and Google just announced $3 billion in investments to build and expand two US data centers.
But CoreWeave is one of only seven companies that Nvidia calls “elite” cloud partners in the US; cloud giants like Amazon aren’t on the list. That would seem to give CoreWeave a shot at snatching up GPUs even when supplies are slim, although the head of Nvidia’s investment arm told the Financial Times last year that it doesn’t help anyone “jump the queue.” As long as CoreWeave can maintain its supply of GPUs, its biggest advantage may be its scrappiness. So far it hasn’t found too many hurdles it can’t get over—or slide past. “We survived due to our hustle and our creativity,” Venturo says.
Fulfilling the world’s surging demand for AI chatbots and image generators depends on mundane components as much as glitzy GPUs. Server cabinets, heavy-duty metal enclosures to store GPU systems, have at times been a crucial bottleneck.
As it scrambled to build up its facilities, CoreWeave once ordered 1,400 of the wrong cabinets. It was a costly mistake because supply chain backups have delayed new shipments by months. “You’re moving so fast, and somewhere along the line a process fails—and you don’t realize until you have 17 tractor trailers with cabinets outside the door and you have to turn them all away,” Venturo says.
But in an example of the shrewdness that’s been key to rapid expansion, Venturo’s team in that crisis quickly put aside frustration and decided to buy used cabinets off what he called the gray market. The move prevented a significant delay. “This was just one of the instances of challenges we faced and overcame to make sure we delivered for our partners,” he says.
To keep things moving, CoreWeave has turned to the “gray market” for more than just cabinets. It’s bought networking switches and routers from eBay to sidestep waits as long as two years for new gear, a former employee says. The security and reliability of used parts can be questionable, but amid the urgency of the AI boom, some conventional practices had to be brushed aside, the person says.
In Plano, Texas, CoreWeave last year outfitted four 1-megawatt data center halls in under three days each, a feat that normally takes weeks. “We can go in with the gloves off and build really quickly,” Venturo says.
CoreWeave was forced into another creative solution when an internet provider was slow to install a broadband connection at a new site—a problem familiar to many home internet users. Three senior executives met by phone one morning to decide how to avoid delaying the project. All happened to offer the same fix: Buy satellite internet through the convenient but not cheap Starlink service of Elon Musk’s SpaceX until the fiber provider showed up. It eliminated weeks of potential delay. “We have to be incredibly flexible,” Venturo says.
Helter Skelter
Lessons from early projects inform CoreWeave’s standard procedures today. CoreWeave has opted to pay a premium for custom manufacturing of tens of thousands of fiber-optic cables because its special design means installation takes just an hour, instead of 10. After US customs authorities held up one vital shipment of important equipment, CoreWeave immediately began pushing orders through multiple alternate ports. To avoid shortages, the company now orders far more parts than it needs, betting that the leftovers can be shipped to the next project that comes along.
Haste has sometimes had unwanted consequences. CoreWeave’s data center in Las Vegas still smells like burning plastic, one source says, because the company blew out some electrical components by firing up too many GPUs at once when the site was set up several years ago.
At the core of CoreWeave’s operations are its data center technicians. The most skilled of them operate like a special operations unit, flying from site to site to get new data centers fired up instead of working at one campus full-time. Venturo declines to say how many miles his crack squad of technicians has logged, but he says they installed about 6,000 miles of fiber-optic cabling last year. “I probably have the most interaction with that team, more than any other team in the company, just because they are so incredibly important,” he says.
Some former CoreWeave employees say that working at the company can be needlessly intense. There’s an expectation that employees are always available. Their personal phone numbers are visible on their company Slack profiles and can’t be removed. One former employee says that, to his frustration, this allowed colleagues to text him about work at odd hours.
Venturo says that a few years ago he introduced weekly massages as a perk for employees at CoreWeave’s headquarters in New Jersey, after recognizing his own back and neck pains at work. “If I feel like this, other people feel like this too,” he says. (One of the former employees says executives were mostly the only ones who felt comfortable getting treated at the office.)
In Venturo’s view, the big demands on workers have a reason. “Everything that we do is about identifying and removing problems and blockers,” he says. “We push our teams to move quickly, and when things go wrong, we lift them up by solving the problems together.”
Tight staffing and equipment supplies have sometimes affected the reliability of CoreWeave’s services as it has grown. The number of server outages started to increase as it shifted away from crypto because CoreWeave until recently had just one engineer focused on maintaining uptime, former employees say. The company sometimes didn’t have enough working GPUs to fulfill contract obligations, according to two of the sources. “They never had an appropriate number of spares,” one says. To compensate, the company skipped some testing of newly installed GPUs, the two sources say. “We don’t have another choice I think,” a vice president said in one internal Slack message seen by WIRED. Venturo says the company’s testing platform has been designed to ensure clients meet their timelines to get online.
CoreWeave’s fast pace has forced some considerations—like accounting for the company’s environmental impact—down the priority list, according to former employees. Venturo says most of CoreWeave’s data centers are operating on 100 percent renewable energy and that the company hasn’t pursued data centers in areas such as floodplains that aren’t insurable. “It’s not so frenetic that we’re put in a position where we have to make irrational choices,” he says. Asked whether CoreWeave would release a sustainability report, as some other cloud companies do, to report its water usage and efforts to reduce supply-chain emissions, Venturo told WIRED in October that it would have more to share soon. It’s yet to release any report.
Behind Your Copilot
If you’ve ever used ChatGPT, Microsoft’s Copilot, image creator Stable Diffusion, or other generative AI offerings, you may have reaped the fruits of CoreWeave’s frenetic labors.
The company has put together facilities for Microsoft, with input from OpenAI, sources say, and is hosting a supercomputer data center for Nvidia in Plano, Texas, according to the chipmaker. CoreWeave’s Plano campus cost about $1.6 billion and runs about 450,000 square feet, around the size of the largest Ikea store in the US. Another space dedicated to a single customer is in Oregon, where it has served business chatbot software startup Inflection AI. These sites tend to have between 16,000 and 32,000 GPUs, Venturo says.
Other CoreWeave locations provide GPU access to multiple companies, with customers including image generator developer Stability AI and Anlatan, which offers writing tool NovelAI. These sites tend to serve a broader range of uses by offering access to more affordable chips, such as Nvidia A100s, launched in 2020. They are typically located at the edge of major cities—Venturo calls them places that have National Football League teams—to minimize delays in customers getting their AI-generated essays and art. “We’re really trying to build our cloud to be as flexible and dynamic as our customers demand,” Venturo says.
CoreWeave’s locations have spanned new construction, empty buildings it filled in, and fully equipped data centers leased from others, including a bitcoin miner. That nimble strategy puts CoreWeave in good business shape. Raul Martynek, CEO of data center provider DataBank, estimates that companies like CoreWeave offering GPU access can have 70 percent gross profit margins by avoiding the costs of developing their own facilities from the dirt up. Venturo declines to comment on gross margins.
Big cloud providers such as Amazon and Google could squeeze out smaller rivals like CoreWeave by undercutting prices if and when demand for GPU time stabilizes. But Tony Harvey, a senior director analyst tracking data centers for Gartner, says a niche vendor like CoreWeave could survive. “It’s an interesting dance to play, and they’re going to have to work hard at it,” he says.
More immediately worrisome for CoreWeave and the entire data center industry is that growth could be stymied by electricity shortages, due to issues including a lack of new power and transmission construction. Today, US campuses of cloud giants such as Microsoft consume about 11 gigawatts of power, and smaller players including CoreWeave an additional 11 gigawatts, according to real estate firm JLL. It expects that usage to grow about 20 percent on average annually through 2030 to an estimated 79 gigawatts combined.
CoreWeave for its part hasn’t developed a data center in the popular region of Northern Virginia because it’s such “a food fight to get power” there, Venturo says. But the company could be just one fresh, clever idea away from getting ahead of the pack.
Source: Wired