Powerful Budget AI-Workstation Build Guide (48 GB VRAM @ $1.1k)
I built an AI workstation with 48 GB of VRAM, capable of running Llama 2 70B 4-bit at usable speeds, for a total build cost of $1,092. I got decent Stable Diffusion results as well, but this build is definitely focused on local LLMs; if fast Stable Diffusion were your only goal, you could build something better and cheaper for that. This build can do both, though, and I was just really excited to share it. The guide was just completed, and I'll be updating it over the next few months to add vastly more detail, but I wanted to share it now for those who are interested.
Public Github Guide Link:
https://github.com/magiccodingman/Magic-AI-Wiki/blob/main/Wiki/R730-Build-Sound-Warnnings.md
Note: I used GitHub simply because I'll be linking to other files, like the script I included in the guide that fixes the extremely common loud-fan issue you'll run into. These Dell servers don't recognize Tesla P40s by default and blast the fans to the point that it feels like a jet engine is in your freaking home. It's pretty obnoxious without the script.
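My actual script is in the guide, but for a rough idea of the approach: fan speed on these iDRAC-era Dell servers can be overridden over IPMI. Below is a minimal sketch, not my exact script; the iDRAC address, credentials, and the 30% duty cycle are placeholder assumptions, so double-check the raw commands against your server generation before relying on them.

```bash
#!/usr/bin/env bash
# Sketch: manually cap fan speed on a Dell PowerEdge via IPMI (not the exact script from the guide).
# IDRAC_IP, the credentials, and the 30% duty cycle below are placeholders.
IDRAC_IP="192.168.1.120"
IDRAC_USER="root"
IDRAC_PASS="calvin"

# Take fan control away from the automatic curve (manual mode).
ipmitool -I lanplus -H "$IDRAC_IP" -U "$IDRAC_USER" -P "$IDRAC_PASS" \
  raw 0x30 0x30 0x01 0x00

# Set all fans to roughly 30% duty cycle (0x1e hex = 30 decimal).
ipmitool -I lanplus -H "$IDRAC_IP" -U "$IDRAC_USER" -P "$IDRAC_PASS" \
  raw 0x30 0x30 0x02 0xff 0x1e

# To hand control back to the automatic curve later:
# ipmitool -I lanplus -H "$IDRAC_IP" -U "$IDRAC_USER" -P "$IDRAC_PASS" raw 0x30 0x30 0x01 0x01
```

If you go this route, keep an eye on GPU temperatures under load, since the server will no longer ramp the fans for you.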
Also, just as a note: I'm not an expert at this. I'm sure the community at large could improve this guide significantly. But I spent a good amount of money testing different parts to find the best overall configuration at a good price. The goal of this build was not to be the cheapest AI build possible, but to be a really cheap AI build that can step into the ring with many of the mid-tier and expensive AI rigs. Running Llama 2 70B 4-bit was a big goal of mine, to find the minimum hardware that could run it sufficiently, and I was personally quite happy with the results. To be honest, I spent a good bit more than the listed price, since I made some honest and some embarrassing mistakes along the way. So this guide shows you what I bought while helping you skip a lot of the mistakes I made and the lessons I had to learn.
As of right now, I've run my tests and the server is running great. If you have any questions about what I've done, or would like me to run additional tests, I'm happy to answer, since the machine is running right next to me!
Update 1 - 11/7/23:
I've already doubled the TPS numbers I put in the guide thanks to a_beautiful_rhind's comments, which brought the settings I was choosing to my attention. I haven't even begun properly optimizing the model, but note that I'm already getting much faster results than what I originally wrote, after only minor changes.
Update 2 - 11/8/23:
I will absolutely be updating the benchmarks in the guide after many of your helpful comments, and I'll work on being much more specific and detailed. I'll run multiple tests detailing my results with multiple models, and I'll also take multiple readings of power consumption. Dell servers track their own power consumption graphs, but I have some good tools to measure it more accurately, since those built-in readings often miss a good percentage of the power actually being drawn; I prefer recording power straight from the plug. I'll also get out my decibel reader and record the sound levels of the Dell server at idle and under load. I may also have an opportunity to test Noctua fans to reduce the noise. Thanks again for the help and patience! Hopefully the benchmarks I can achieve will be adequate, but maybe in the end we'll learn you want to aim for 3090s instead. Thanks again, y'all, it's really appreciated. I'm really excited that others were interested and excited as well.
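Measuring at the plug captures the whole box, but if you also want per-GPU numbers to compare against, nvidia-smi can log what the P40s themselves are drawing while a benchmark runs. Here's a small sketch; the 5-second interval and the log filename are just placeholders I picked for illustration:

```bash
# Sketch: log per-GPU power draw and temperature every 5 seconds during a benchmark run.
# The polling interval and output filename are arbitrary placeholders.
nvidia-smi --query-gpu=timestamp,name,power.draw,temperature.gpu \
  --format=csv -l 5 > p40_power_log.csv
```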
Update 3 - 11/8/23:
Thanks to CasimirsBlake for his comments and feedback! I'm still benchmarking, but I've already doubled my 7B and 13B performance within a short time span. Then candre23 gave me great feedback for the 70B model; he has a dual P40 setup as well and gave me instructions to reach TPS numbers 4x to 6x what I was getting. So I should hopefully see significantly better results in the next day or within a few days. My 70B results are already 5x what I originally posted. Thanks for all the helpful feedback!
Update 4 - 11/9/23:
I'm doing proper benchmarking that I'll present in the guide, so make sure you follow the GitHub guide if you want to stay updated. But here are the rough important numbers for y'all.
Llama 2 70b (nous hermes) - Llama.cpp:
Empty context TPS: ~7
Max 4k context TPS: ~4.5
Evaluation at 4k context TPS: ~101
Note: I do wish the evaluation TPS were roughly 6x faster, like what I get on my 3090s. At ~4k context (about 3.5k tokens on OpenAI's tokenizer), it takes roughly 35 seconds for the AI to evaluate all that text before it even begins responding, whereas my 3090s run at ~670+ TPS and start responding in roughly 6 seconds. It's still a great evaluation speed when we're talking about $175 Tesla P40s, but do be mindful that this is a thing. I've found some technical ways around it, but the 70B model at max context is where things got a bit slower. The P40s crushed it in the 2k-and-lower context range with the 70B model, and output TPS was about the same in both cases; I only started looking into evaluation speed when it was taking ~40 seconds to start responding after I slapped it with 4k of context. Once the prompt is in memory, though, it's quite fast, especially when regenerating a response.
Llama 2 13b (nous hermes) - Llama.cpp:
Empty context TPS: ~20
Max 4k context TPS: ~14
I'm running multiple scenarios for the benchmarks.
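If you want to reproduce numbers like these, llama.cpp ships a benchmarking tool that reports prompt-processing (evaluation) and text-generation TPS separately. Here's a minimal sketch; the GGUF filename and the prompt/generation lengths are placeholders rather than my exact invocation:

```bash
# Sketch: benchmark a 70B Q4 GGUF across both P40s with llama.cpp's llama-bench.
# The model path and token counts are placeholders, not my exact settings.
#   -ngl 99  -> offload all layers to the GPUs
#   -p 4096  -> prompt-processing (evaluation) run at 4k tokens
#   -n 128   -> text-generation run of 128 tokens
./llama-bench \
  -m ./models/nous-hermes-llama2-70b.Q4_K_M.gguf \
  -ngl 99 -p 4096 -n 128
```

The prompt-processing result corresponds roughly to the evaluation TPS above, and the text-generation result to the output TPS.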
Update 5 - 11/9/2023
Here's the link to my finalized benchmark scores. I haven't yet gotten benchmarks on power usage and such.
https://github.com/magiccodingman/Magic-AI-Wiki/blob/main/Wiki/2x-P40-Benchmarks.md
For some reason clicking the link won't work for me, but if you copy and paste it, it works.
Update 6 - 11/10/2023
Here's my completed "Sound" section. I'm still rewriting the entire guide to be much more concise, since the first version was me brain dumping, and I learned a lot from the community's help. But here's the section on my sound testing:
https://github.com/magiccodingman/Magic-AI-Wiki/blob/main/Wiki/R730-Build-Sound-Warnnings.md
Update 7 - 6/20/2024
SourceWebMD has been updating me on the progress of his build, and the guide is being updated based on his insights and knowledge sharing. SourceWebMD will likely be making a tutorial as well on his site https://sillytavernai.com, which will be cool to see. Expect updates to the guide as this happens.