Modern machine learning workloads present a paradox: GPUs are more powerful than ever, yet the performance cost, complexity, and expense of securing them are often prohibitively high.
This is the gap that Bomfather and NeuralRack AI are solving together!
We are solving a simple problem with significant implications: how do you give users access to extremely powerful, affordable GPUs without compromising security?
Why use NeuralRack AI?
NeuralRack offers bare metal GPU hosting designed for modern AI workloads, with RTX Pro 6000s and RTX 5090s readily available to rent, as well as custom clusters and on-premises full-stack AI for regulated sectors.
Bare Metal Should be the Standard for AI
Bare metal GPUs are necessary for running machine learning workloads efficiently. Virtualization, by contrast, incurs many small throughput losses that compound into significantly less efficient AI workloads.
Bare metal is also one of the most secure primitives for running any workload. When workloads run on VMs, the cloud provider could potentially snoop on them, since it controls the hypervisor.
The RTX Pro 6000 was built for 2026’s AI Workloads.
The RTX Pro 6000 has 96 GB of memory per GPU. That much memory reduces the need for tensor parallelism, model sharding, and other offloading techniques: 96 GB is enough to run quantized models of up to 100B parameters on a single card, or the largest open-source models, like the 1T-parameter Kimi K2.5, on an 8x Pro 6000 machine. The Pro 6000 currently offers the most VRAM per dollar on the market.
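To see why 96 GB goes a long way, here is a back-of-the-envelope sketch of a quantized model's weight footprint (the 20% headroom factor for activations and KV cache is an assumed rule of thumb, not a measured figure; real serving frameworks vary):

```python
def weight_footprint_gb(params_billion: float, bits_per_param: float,
                        headroom: float = 1.2) -> float:
    """Approximate GPU memory needed for a model's weights.

    headroom=1.2 adds ~20% for activations and KV cache -- an assumed
    rule of thumb, not a measured figure.
    """
    weight_bytes = params_billion * 1e9 * bits_per_param / 8
    return weight_bytes * headroom / 1e9  # decimal gigabytes


# A 100B-parameter model quantized to 4 bits per weight:
print(round(weight_footprint_gb(100, 4), 1))  # -> 60.0, comfortably under 96 GB
```

The same model at 16-bit precision would need roughly 240 GB, which is where sharding across multiple cards becomes unavoidable.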
Not only can a Pro 6000 server run almost every LLM, but it also excels at diffusion-based tasks such as image and video generation. The full-fat GB202 die (about 10% more transistors than the RTX 5090) at a 600 W power budget enables rapid iteration, and the 96 GB of VRAM supports high batch sizes. ECC memory reduces memory errors, helping deliver studio-quality visual fidelity with fewer artifacts. The RTX Pro 6000 is the 'Swiss army knife' of GPUs.
Most importantly, the RTX Pro 6000 is within reach for startups, researchers, and companies alike. A complete machine costs under $100k, while a traditional 8x H200 SXM server costs $250k+ and an 8x B200 SXM server costs $375k+. It can also run in standard office server rooms with air cooling, unlike most B200 deployments.
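Putting those list prices next to per-GPU memory makes the value gap concrete. The server prices are the approximate figures above; the per-GPU memory sizes (141 GB for the H200, 192 GB for the B200) are NVIDIA's published specs, and actual street prices vary:

```python
# (total VRAM in GB, approximate server price in USD)
servers = {
    "8x RTX Pro 6000": (8 * 96, 100_000),
    "8x H200 SXM": (8 * 141, 250_000),
    "8x B200 SXM": (8 * 192, 375_000),
}

for name, (vram_gb, price) in servers.items():
    # Pro 6000 lands near $130/GB; H200 ~$222/GB; B200 ~$244/GB
    print(f"{name}: {vram_gb} GB total, ~${price / vram_gb:.0f} per GB of VRAM")
```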
Additionally, bare metal RTX Pro 6000s outperform the H200 in both latency and cost efficiency. For more details, check out NeuralRack AI’s blog post: Emerging Edge and In-House Inference Use Cases: When RTX PRO 6000 Outperforms H200/B200 SXM.
Why use Bomfather?
GPU security is often a blind spot in AI pipelines. A malicious actor can steal and modify proprietary model data and user data (and they can do this without sudo privileges!).
We realized that GPUs are fundamentally insecure (until about five years ago, they were primarily used for gaming) and that there wasn't a good solution for GPU or runtime security. So Bomfather was born to fix three shortcomings of existing solutions: speed, security, and simplicity.
Speed
Confidential computing is one of the best ways to secure GPUs, but studies have shown that it can add over 900% performance overhead during inference and over 4,000% during training (https://bomfather.dev/blog/confidential-computing-overhead/). Bomfather, by contrast, adds only 1% to 3% overhead.
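To make those percentages concrete, overhead translates directly into wall-clock time and rental cost. A sketch, assuming a hypothetical $2/hour GPU rate and a 100-hour baseline job (both numbers are illustrative, not from the benchmarks above):

```python
def job_cost(base_hours: float, overhead_pct: float, rate_per_hour: float) -> float:
    """Cost of a job after slowdown: a 900% overhead means 10x the runtime."""
    return base_hours * (1 + overhead_pct / 100) * rate_per_hour


baseline = job_cost(100, 0, 2.0)        # $200 with no security overhead
confidential = job_cost(100, 900, 2.0)  # $2,000 at +900% (confidential computing)
bomfather = job_cost(100, 3, 2.0)       # ~$206 at Bomfather's worst-case +3%
print(baseline, confidential, bomfather)
```

The same 100-hour job becomes a 1,000-hour job under a 900% overhead, which is why this cost multiplier matters more for training than any per-hour price difference.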
Security
There are many runtime security solutions, but most run in userspace, leaving them vulnerable to tampering by other processes on the machine. The rest are vulnerable to policy manipulation (https://bomfather.dev/blog/attacking-and-securing-ebpf-maps/) or can be shut down outright by malicious actors (https://bomfather.dev/blog/stopping-kill-signals-against-your-ebpf-programs/). Bomfather's solution is built with eBPF, so it runs directly in the kernel, keeping us at the forefront of innovation in GPU and runtime security.
Simplicity
Confidential computing and existing runtime security solutions are painful to use. Confidential computing is hard to set up, and debugging inside it can feel like shooting yourself in the foot. Existing runtime security tools ship configs that are complex, convoluted, and painful to read, write, and understand; and when a workflow changes, those page-long configs must be rewritten. Bomfather, by contrast, runs as a background process (so it doesn't interfere with your workflow), starts with a single command, and uses an extremely short, simple policy. (Take a look at our policy inheritance demo for more: https://youtu.be/5hVO2S7aO6w?si=zKweTj-2A_QlMXLy.)
Bomfather's approach is unique, and no other company can offer it because it redefines how security is done. It isn't built on a single breakthrough or tweak: the Bomfather team has spent years of research perfecting a solution that is extremely fast, secure, and easy to use.
Why use Bomfather + NeuralRack AI?
Running Bomfather and NeuralRack together secures the stack from the hardware up. NeuralRack's GPUs are bare metal, so there is no hypervisor mediating GPU access, no noisy-neighbor problem, and no GPU scheduler shared across tenants. For deployments requiring high security, NeuralRack can even provide private internet uplinks, physical and virtual network isolation, and physical cabinet/cage isolation.
But bare metal doesn't protect against every risk. Viruses, malware, and exploits can hit organizations even with strict security practices, via social engineering, zero-day vulnerabilities, or operator error. Without proper enforcement, any process with access to the GPU can modify model weights in GPU memory and read user data. This is where Bomfather comes in: it blocks malicious actors from accessing GPU data.