Digits for Inference
I've been looking around and I see a lot of people saying they're disappointed with the bandwidth.
Is this really a major issue? Help me to understand.
Does it bottleneck the system?
What about the FLOPS?
For context, I aim to run an inference server with maybe two or three 70B-parameter models, handling inference requests from other services in the business.
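To make the bandwidth question concrete, here is my rough understanding, assuming people mean memory bandwidth: when generating tokens one at a time, each new token has to read roughly all of the model weights once, so single-stream decode speed is capped at about bandwidth divided by model size in bytes. Below is a minimal sketch of that ceiling. The bandwidth figure is a placeholder I made up for illustration, not a confirmed spec, and the bytes-per-parameter values are just the usual quantisation levels. It also ignores KV cache reads and batching, so treat it as an upper bound only.

```python
def max_decode_tokens_per_sec(params_billion, bytes_per_param, bandwidth_gb_s):
    """Rough upper bound on single-stream decode speed when memory-bound.

    Assumes every generated token reads all weights once and nothing else.
    """
    # billions of params * bytes per param = weight size in GB
    model_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_gb


# ASSUMPTION: placeholder bandwidth for illustration, not a published figure.
ASSUMED_BANDWIDTH_GB_S = 250.0

for label, bytes_per_param in [("FP16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    tps = max_decode_tokens_per_sec(70, bytes_per_param, ASSUMED_BANDWIDTH_GB_S)
    weights_gb = 70 * bytes_per_param
    print(f"70B {label}: {weights_gb:.0f} GB of weights, at most {tps:.1f} tok/s")
```

If that reasoning is right, the bandwidth number matters more than the FLOPS for single-request latency, and swapping in the real figure should show whether "a couple of seconds" is realistic for a typical response length.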
To me, a one-off £3000 compared with £500-1000 per month on AWS EC2 seems reasonable.
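For what it's worth, here is the break-even arithmetic behind that, as a quick sketch. It only uses the two figures above and assumes the £3000 is a one-off purchase. Electricity, maintenance, and my own time are deliberately left out, so the real break-even will be somewhat later.

```python
def break_even_months(hardware_cost_gbp, cloud_cost_per_month_gbp):
    """Months of cloud spend needed to equal the one-off hardware cost."""
    return hardware_cost_gbp / cloud_cost_per_month_gbp


HARDWARE_COST_GBP = 3000  # one-off purchase price from the post

# Low and high ends of my EC2 estimate
for monthly in (500, 1000):
    months = break_even_months(HARDWARE_COST_GBP, monthly)
    print(f"£{monthly}/month on EC2: break-even after {months:.0f} months")
```

So on these numbers the box pays for itself in three to six months, provided it can actually handle the load.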
So, be my devil's advocate and tell me why using Digits to serve <500 users (maybe scaling up to 1000) would be a problem. The 500 users would only interact with our system sparsely, so I'm not anticipating spikes in traffic, and they don't mind waiting a couple of seconds for a response.
Also, help me understand whether daisy-chaining these systems together is a good idea in my case.
Cheers.