Keep in mind what 200K H100s means when comparing Grok to other models
If Grok 3 was truly trained on that many GPUs, then it may be like comparing a hydrogen bomb to a coughing baby. GPT-4 was reportedly trained on 25,000 A100s, and an H100 is roughly equivalent to 5 A100s for training, so Grok 3 would have been trained on roughly 40x the compute of GPT-4, which is rumored to be a 1.8-trillion-parameter model. Keep in mind as well that GPT-4o and the o1/o3 base models are said to be around 200-billion-parameter models. In terms of cost, Grok 3 must have cost over $2 billion in hardware alone.
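Here's the back-of-envelope math as a quick sketch. The cluster sizes and the 5:1 H100-to-A100 ratio are the rumored figures above, and the ~$25k per-H100 price is my own assumption; it also ignores training time and utilization differences, so treat it as a rough order-of-magnitude check only.

```python
# Back-of-envelope comparison; all inputs are public estimates or rumors, not confirmed figures.
grok3_h100s = 200_000      # claimed Grok 3 cluster size
gpt4_a100s = 25_000        # reported GPT-4 training cluster
h100_per_a100 = 5          # rough training-throughput ratio assumed above

# Compare raw cluster compute, ignoring training duration and utilization.
grok3_a100_equiv = grok3_h100s * h100_per_a100
compute_ratio = grok3_a100_equiv / gpt4_a100s
print(f"Grok 3: ~{grok3_a100_equiv:,} A100-equivalents, ~{compute_ratio:.0f}x GPT-4's cluster")

# Rough hardware bill, assuming ~$25k per H100 (prices vary widely).
h100_unit_cost = 25_000
hardware_cost = grok3_h100s * h100_unit_cost
print(f"Hardware alone: ~${hardware_cost / 1e9:.0f} billion")
```

At those assumed numbers it works out to about 1,000,000 A100-equivalents (~40x GPT-4's cluster) and roughly $5 billion in GPUs, which is why "over $2 billion" is a conservative floor.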
Just something to consider.