Business AI Backup Archives

Last updated: June 2026. Part 2 of 4.

Setting up local AI as a continuity backup starts with hardware. The wrong GPU choice makes Ollama unusable; the right one makes it nearly invisible. This guide covers the exact hardware specifications for production use, the installation process on macOS, Linux, and Windows, and the configuration variables that turn a development setup into a reliable failover service.

Part 1 of this series established why every AI-dependent business needs a continuity plan and introduced Ollama as the most practical local AI runtime for that role. This part addresses the implementation. Hardware first, because every decision after it depends on hardware reality. Installation second, with the configuration choices that matter for production failover use.

By the end of this guide, Ollama will be running on the target hardware, models will be downloaded, and the foundation for a working continuity plan will be in place. The remaining two parts of the series cover model selection with monitoring, then security and auto-failover integration.

What hardware does Ollama actually need?

Ollama is software. The constraint on whether it works for a business is the hardware it runs on. Three hardware paths qualify for production use, one path does not, and the difference between them is roughly an order of magnitude in response speed.

NVIDIA GPUs (Linux and Windows)

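NVIDIA is the most common business path. Ollama requires Compute Capability 5.0 or higher, which covers Maxwell-generation cards such as the GTX 960 and nearly every GeForce, RTX, and data center card released since 2015¹. A few low-end cards sold after that date reuse older chips and fall below the threshold, so check the specific model. The driver version must be 535 or higher on Linux, or 531 or higher on Windows. Modern data center cards (A100, H100, L40S) work and provide significant headroom for larger models.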

Verification is a single command. Run nvidia-smi in a terminal. The output shows the driver version and lists available GPUs. If the command is not found or shows errors, the driver is missing or outdated and must be installed before Ollama will use GPU acceleration.
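The same check can be scripted so it runs identically on every candidate machine. The sketch below is illustrative rather than official tooling: driver_meets_minimum is a hypothetical helper name, the 535 threshold is the Linux minimum quoted above, and it assumes the installed nvidia-smi supports the --query-gpu option.

```shell
# Hypothetical helper: succeeds when the driver's major version is at least
# the given minimum (535 on Linux or 531 on Windows, per the text above).
driver_meets_minimum() {
  local version="$1" minimum="$2"
  local major="${version%%.*}"   # "550.54.14" -> "550"
  case "$major" in
    ''|*[!0-9]*) return 2 ;;     # not a number: report "unknown"
  esac
  [ "$major" -ge "$minimum" ]
}

# Only query the GPU when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader |
    while IFS=, read -r name version memory; do
      version="${version# }"     # strip the space that follows each comma
      if driver_meets_minimum "$version" 535; then
        echo "OK: $name (driver $version,$memory)"
      else
        echo "UPDATE DRIVER: $name (driver $version)"
      fi
    done
fi
```

On a machine without nvidia-smi the script prints nothing, which is itself the signal that the driver is missing.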

Apple Silicon (M1 through M4)

Apple Silicon is the simplest path. M1, M2, M3, and M4 chips all support Ollama through Metal GPU acceleration with zero configuration². Install Ollama and it uses the GPU automatically. The unified memory architecture is particularly effective for large models because GPU and CPU share the same memory pool. A 32 GB Mac can therefore load models that would not fit on a typical 12 or 16 GB discrete card, although the operating system and other applications draw on the same pool, so not all 32 GB is available to the model.

Intel Macs are not viable. Even on a high-end Intel i9 MacBook Pro, generation speed is in the 4 to 6 tokens-per-second range, similar to CPU-only operation on PC hardware.

AMD GPUs (Linux only, as of mid-2026)

AMD support is real but limited. ROCm 7 on Linux works for most modern AMD GPUs¹. ROCm on Windows is still classified as experimental and is not officially supported by Ollama. Organizations standardized on AMD GPUs on Windows should plan around this reality, either by switching the AI workload to Linux, using WSL2 with the understanding that performance and stability vary, or running Ollama on CPU as a stopgap.

Hardware sizing by model

Different models require different amounts of memory. The table below shows the recommended hardware tier for each common model at standard quantization (Q4_K_M, which is the default in Ollama and balances quality with memory efficiency).

Available Memory	Recommended Model	Typical Speed	Best For
8 GB RAM (CPU only)	`phi3:mini` or `gemma2:2b`	3 to 8 tokens/sec	Simple Q&A only, not viable for production
16 GB RAM (CPU only)	`llama3.1:8b`	5 to 10 tokens/sec	Still too slow for most workflows
8 to 12 GB VRAM (NVIDIA)	`llama3.1:8b`, `qwen2.5-coder:7b`	50 to 70 tokens/sec	Email, documents, code generation
16 GB unified memory (Apple)	`llama3.1:8b`	40 to 60 tokens/sec	General business workflows
24 GB VRAM (RTX 3090, 4090, A5000)	`qwen2.5-coder:14b`; `llama3.1:70b` only with partial CPU offload	30 to 100 tokens/sec when the model fits in VRAM	Complex reasoning, near-frontier quality
48 GB+ VRAM or 64 GB+ unified	`llama3.1:70b` with headroom	20 to 50 tokens/sec	Highest-quality local inference

The pattern in the table is consistent. For the same 8B model, GPU acceleration delivers roughly 5x to 14x faster generation than CPU-only operation (50 to 70 tokens/sec against 5 to 10), or about an order of magnitude. For business failover, only the GPU rows are viable.
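The sizing in the table follows from simple arithmetic. The sketch below assumes about 0.6 GB per billion parameters at Q4_K_M, a rule of thumb that lines up with the download sizes quoted in this guide's FAQ (4.7 GB for an 8B model, 40 to 45 GB for a 70B model), plus an assumed 10 percent headroom for context cache and runtime overhead. Both constants and both function names are illustrative assumptions, not Ollama settings.

```shell
# Rule-of-thumb assumptions, not Ollama settings.
GB_PER_BILLION_PARAMS=0.6
HEADROOM_FACTOR=1.1

# Print the estimated memory footprint in GB for a model of N billion parameters.
estimate_model_gb() {
  awk -v p="$1" -v g="$GB_PER_BILLION_PARAMS" -v h="$HEADROOM_FACTOR" \
    'BEGIN { printf "%.1f\n", p * g * h }'
}

# Succeed when the estimate fits in the given amount of memory (GB).
model_fits() {
  awk -v need="$(estimate_model_gb "$1")" -v have="$2" \
    'BEGIN { exit !(need <= have) }'
}

for params in 8 14 70; do
  for memory in 12 24 48; do
    if model_fits "$params" "$memory"; then verdict="fits"; else verdict="does not fit"; fi
    echo "${params}B model, ${memory} GB: $(estimate_model_gb "$params") GB needed, $verdict"
  done
done
```

When a model does not fit, Ollama can still run it by splitting layers between GPU and CPU, but generation speed then drops toward the CPU-only figures in the table.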

Should the organization self-host or stick with cloud AI?

Not every organization should build local AI infrastructure. The hardware investment and engineering time matter. A practical decision framework looks at three factors.

Existing hardware. If the team already runs machines with compatible GPUs (developer workstations with NVIDIA cards, Apple Silicon laptops, or Linux servers with discrete GPUs), the marginal cost of adding Ollama is engineering time only. If no suitable hardware exists, the conversation shifts to whether a continuity plan justifies a hardware purchase.

Operational criticality. If the business pauses meaningfully when cloud AI fails (development teams blocked, customer support degraded, content production stopped), local AI failover is justified. If AI use is exploratory or non-critical, the case for local infrastructure is weaker.

Data sensitivity. Organizations handling regulated data (healthcare, legal, financial) often need local AI for reasons beyond continuity. Local execution keeps prompts and responses inside the corporate network, which simplifies GDPR, HIPAA, and SOC 2 compliance.

How is Ollama installed on macOS?

macOS is the fastest path to a working installation. The graphical installer handles everything, including the system service setup that makes Ollama available after restart.

Step 1: Download the macOS installer

Visit ollama.com and download the macOS package. The download is approximately 200 MB.

Step 2: Run the installer and grant permissions

Open the downloaded file and drag Ollama to the Applications folder. Launch Ollama. macOS prompts for permission to install the command-line tools. Approve the prompt. Ollama now runs as a menu bar application and starts automatically at login.

Step 3: Verify the installation

Open Terminal and run the verification commands:

ollama --version               # Should output: ollama version is 0.x.x
curl http://localhost:11434    # Should output: Ollama is running

If both commands succeed, Ollama is installed and the API server is listening. The next step is downloading a model.

How is Ollama installed on Linux?

Linux installation requires a few more steps than macOS, but the result is a more robust production deployment. The official installer creates a systemd service with automatic restart on failure, which is the right baseline for business use.

Step 1: Run the installer

Execute the one-line install script. The script handles dependency detection, GPU driver verification, and systemd service creation:

curl -fsSL https://ollama.com/install.sh | sh

The installer creates an ollama system user and installs the binary to /usr/local/bin/ollama. Model storage defaults to /usr/share/ollama/.ollama/models.

Step 2: Verify the service is running

Check the systemd service status:

sudo systemctl status ollama   # Should show: active (running)
curl http://localhost:11434    # Should output: Ollama is running

Step 3: Configure environment variables for production use

The default installation binds Ollama to localhost only and stores models in the system partition. For production deployments, these defaults often need adjustment. Edit the systemd service:

sudo systemctl edit ollama.service

Add the configuration under the [Service] section. The most common production variables:

[Service]
# Bind to all interfaces (only do this with proper firewall rules in place)
Environment="OLLAMA_HOST=0.0.0.0:11434"
# Store models on a larger drive
Environment="OLLAMA_MODELS=/data/ollama/models"
# Keep models loaded in memory longer to reduce cold-start latency
Environment="OLLAMA_KEEP_ALIVE=30m"
# Limit concurrent loaded models if memory is tight
Environment="OLLAMA_MAX_LOADED_MODELS=2"

Save the file and reload the service:

sudo systemctl daemon-reload
sudo systemctl restart ollama
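For repeatable deployments, the same override can be generated by a script instead of typed into an editor. The sketch below is one way to do that under stated assumptions: render_ollama_override is a hypothetical helper name, the example binds to 127.0.0.1 as the conservative choice, and the drop-in path in the comments is where systemctl edit normally writes its override file. When the model directory is moved, the ollama service user also needs read and write access to the new location, which the commented chown step covers.

```shell
# Hypothetical helper: print a systemd drop-in for the Ollama service.
# Arguments: bind address, model directory, keep-alive duration.
render_ollama_override() {
  cat <<EOF
[Service]
Environment="OLLAMA_HOST=$1"
Environment="OLLAMA_MODELS=$2"
Environment="OLLAMA_KEEP_ALIVE=$3"
EOF
}

# Preview the drop-in; nothing is written to disk here.
render_ollama_override "127.0.0.1:11434" "/data/ollama/models" "30m"

# To apply it (requires root), something along these lines:
#   sudo mkdir -p /data/ollama/models /etc/systemd/system/ollama.service.d
#   sudo chown -R ollama:ollama /data/ollama/models
#   render_ollama_override "127.0.0.1:11434" "/data/ollama/models" "30m" |
#     sudo tee /etc/systemd/system/ollama.service.d/override.conf
#   sudo systemctl daemon-reload && sudo systemctl restart ollama
#   systemctl show ollama --property=Environment   # confirm the values took effect
```

Keeping the rendering step separate from the privileged steps makes it easy to review the exact file contents before anything on the host changes.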

Step 4: Confirm GPU detection (NVIDIA only)

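If the machine has an NVIDIA GPU, verify Ollama is using it. The systemd service already holds port 11434, so starting a second ollama serve process would fail with an address-in-use error. Read the service logs instead:

journalctl -u ollama --no-pager | grep -iE "cuda|gpu"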

The output should mention CUDA initialization and list the detected GPU. If it shows CPU mode despite an installed GPU, the driver version is likely below the minimum (535 on Linux). Update the NVIDIA driver and restart.

The systemd service includes Restart=always by default, which means Ollama recovers automatically from crashes or OOM kills. This is the single most important property for a continuity service, since the whole point is that Ollama is available when needed.

How is Ollama installed on Windows?

Windows installation uses the downloadable OllamaSetup.exe installer or the winget package manager. Both produce the same result: Ollama running as a system tray application with the API server listening on localhost:11434.

Step 1: Install Ollama

Two paths work. Either download the installer from ollama.com and run it, or install via PowerShell with winget:

winget install Ollama.Ollama

The installer adds Ollama to the PATH and starts the tray application, which runs the API server in the background.

Step 2: Verify the installation

Open a new PowerShell or Command Prompt window (a new session is required for PATH updates to take effect). In Windows PowerShell, curl is an alias for Invoke-WebRequest, so call curl.exe explicitly:

ollama --version
curl.exe http://localhost:11434

Both should succeed. The Ollama system tray icon should also be visible.

Step 3: Configure environment variables

Ollama on Windows reads environment variables from the user and system environment. Quit Ollama from the system tray, then open System Properties through the Settings app or Control Panel. Add environment variables:

OLLAMA_HOST = 0.0.0.0:11434 (only with firewall in place)
OLLAMA_MODELS = D:\OllamaModels (redirect to larger drive)
OLLAMA_KEEP_ALIVE = 30m

Restart Ollama from the Start menu. The new environment variables take effect on the next launch.

How are models downloaded and tested?

With Ollama installed and running, the next step is pulling the model library that the failover system will use. Pull all models during normal operations, while the network is available. Once downloaded, models live locally and require no internet access to run.

# General-purpose business model (8B parameters, works on most hardware)
ollama pull llama3.1
# Coding replacement for Claude Code workflows
ollama pull qwen2.5-coder
# Fast document and email drafting model
ollama pull mistral
# Lightweight model for low-spec hardware
ollama pull phi3
# Verify all models are present and ready
ollama list

Test each model with a real prompt to confirm output quality and response speed before relying on it for failover:

ollama run llama3.1 "Summarize the key risks of cloud AI dependency for a manufacturing business in 100 words."

A well-functioning installation responds within seconds and produces coherent output. If response time exceeds 30 seconds for a short prompt on a GPU-equipped machine, the model is probably running on CPU. Verify GPU acceleration is active.
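Response speed can be measured rather than judged by eye. A non-streaming call to Ollama's /api/generate endpoint returns eval_count (tokens generated) and eval_duration (nanoseconds), and dividing one by the other gives tokens per second. The sketch below assumes the server is on the default port and that llama3.1 has been pulled; tokens_per_second and json_number are hypothetical helper names, and the sed-based field extraction is a stand-in for a real JSON parser such as jq.

```shell
# Convert eval_count (tokens) and eval_duration (nanoseconds)
# into tokens per second.
tokens_per_second() {
  awk -v count="$1" -v ns="$2" \
    'BEGIN { if (ns <= 0) exit 1; printf "%.1f\n", count / ns * 1e9 }'
}

# Pull one numeric field out of a compact single-line JSON response.
# Good enough for this flat response; not a general JSON parser.
json_number() {
  printf '%s\n' "$1" | sed -n "s/.*\"$2\":\([0-9][0-9]*\).*/\1/p"
}

# Run the measurement only when a local Ollama server is answering.
if command -v curl >/dev/null 2>&1 && curl -fsS http://localhost:11434 >/dev/null 2>&1; then
  response="$(curl -fsS http://localhost:11434/api/generate \
    -d '{"model": "llama3.1", "prompt": "Name three failover risks.", "stream": false}')"
  tokens_per_second "$(json_number "$response" eval_count)" \
                    "$(json_number "$response" eval_duration)"
fi
```

A result near the CPU-only figures in the sizing table on a GPU-equipped machine points to the model running on CPU. Running ollama ps while the model is loaded shows whether it sits on GPU or CPU.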

Does PCG handle Ollama deployment for clients?

Phoenix Consultants Group has been deploying production software systems since 1995, and the operational discipline that applies to legacy migrations and compliance platforms applies equally to local AI infrastructure. A custom Ollama deployment engagement starts with a hardware audit (what compatible machines already exist on the network), continues through installation and configuration tailored to the client's operating systems, and ends with team training on the operational procedures that keep the failover ready.

The same engineering team that builds and maintains the FireFlight Data System manages Ollama deployments. Both involve infrastructure that has to run continuously without manual babysitting, which is what PCG has built for three decades.

Need help deploying Ollama in production?

PCG handles hardware assessment, multi-platform installation, monitoring integration, and team training as a single engagement.


Frequently Asked Questions

What GPU do I need to run Ollama for business use?

For business-grade inference speed, use an NVIDIA GPU with Compute Capability 5.0 or higher (GTX 960 and newer) and driver version 535 or higher on Linux, or 531 or higher on Windows. Apple Silicon M1 through M4 chips work automatically through Metal. AMD GPUs require ROCm 7 on Linux. The minimum VRAM for a 7-billion-parameter model at standard quantization is 6 GB.

Can Ollama run on an AMD GPU on Windows?

Not natively as of mid-2026. AMD GPU acceleration through ROCm is Linux-only. Windows users with AMD GPUs must either run Ollama on CPU, use WSL2 with experimental ROCm support, or switch to Linux for the AI workload.

How much disk space does an Ollama installation need?

The Ollama runtime itself uses approximately 1 GB. Models are the main storage cost. A baseline emergency library of llama3.1 (4.7 GB), qwen2.5-coder (8 GB), mistral (4.1 GB), and phi3 (2.2 GB) totals roughly 20 GB. Adding a 70B model adds another 40 to 45 GB. Plan for 60 to 80 GB of free disk space for a full business deployment.
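The storage estimate is plain addition, and it is worth comparing against the free space on the drive that will hold the models. A small sketch under stated assumptions: the sizes are the figures quoted in the answer above, library_total_gb and free_gb are hypothetical helper names, and the target path falls back to / when OLLAMA_MODELS is not set in the current shell.

```shell
# Sum model download sizes given in GB (figures taken from the answer above).
library_total_gb() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Free space, in whole GB, on the filesystem that holds the given path.
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Placeholder target: point this at the real model directory.
target="${OLLAMA_MODELS:-/}"
needed="$(library_total_gb 4.7 8 4.1 2.2)"
available="$(free_gb "$target")"
echo "Baseline library: ${needed} GB; free on ${target}: ${available} GB"
```

Passing the 70B download size as an extra argument to library_total_gb shows how quickly the total approaches the 60 to 80 GB planning figure.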

Should I install Ollama as a system service or run it manually?

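For production failover use, run Ollama as a managed background process rather than launching it by hand. On macOS the desktop application handles this automatically by starting at login. On Linux, the installer's systemd service restarts automatically on failure. On Windows, the installed tray application starts at login and keeps the API server running; it is a per-user application rather than a Windows service, so it starts only after that user signs in. Manual ollama serve invocations are appropriate for development testing but do not survive reboots or process crashes.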

Where does Ollama store the downloaded models?

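Default locations are ~/.ollama/models on macOS, /usr/share/ollama/.ollama/models on Linux when Ollama runs as the installer's systemd service (~/.ollama/models when ollama serve is started manually under a user account), and C:\Users\<user>\.ollama\models on Windows. The location is configurable through the OLLAMA_MODELS environment variable. For business deployments, redirecting model storage to a separate drive is recommended.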

Do I need to keep Ollama running all the time?

Yes, for failover scenarios. Ollama runs as a background service that listens on port 11434. Idle resource consumption is minimal because models load into memory only on request and unload after an inactivity period, five minutes by default, that OLLAMA_KEEP_ALIVE configures.
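Because the first request after an idle period pays the model load time, a failover host can warm the model ahead of need. Sending a generate request with a model name and no prompt loads the model, and the keep_alive field sets how long it stays resident for that request. The sketch below assumes the default port and the llama3.1 model; preload_body is a hypothetical helper name.

```shell
# Build the JSON body for a prompt-less request that loads a model and
# sets how long it stays in memory. Assumes neither value contains quotes.
preload_body() {
  printf '{"model": "%s", "keep_alive": "%s"}\n' "$1" "$2"
}

# Warm the model only when a local Ollama server is answering.
if command -v curl >/dev/null 2>&1 && curl -fsS http://localhost:11434 >/dev/null 2>&1; then
  curl -fsS http://localhost:11434/api/generate -d "$(preload_body llama3.1 30m)"
  # List loaded models and whether they sit on GPU or CPU.
  if command -v ollama >/dev/null 2>&1; then ollama ps; fi
fi
```

Use a duration string such as 30m or 24h for the second argument, matching the format of the OLLAMA_KEEP_ALIVE examples in this guide.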

About the Author

Allison Woolbert

CEO and Senior Systems Architect, Phoenix Consultants Group

Allison Woolbert is the principal of Phoenix Consultants Group, the custom software consultancy founded in 1995. PCG has run legacy migration projects across Microsoft Access, Visual FoxPro, Paradox, VB6, and other discontinued platforms for industrial, manufacturing, and environmental services clients since the late 1990s.

Allison leads PCG's discovery and architecture practice, where the first deliverable on every legacy engagement is an honest inventory of what the existing application actually does and what it should do next.


Sources

¹ Ollama official GPU support documentation, NVIDIA and AMD requirements: github.com/ollama/ollama/blob/main/docs/gpu.md

² Ollama documentation on Apple Silicon and Metal GPU acceleration: docs.ollama.com

³ Ollama Linux installation and systemd configuration: docs.ollama.com/linux

⁴ Ollama environment variable reference: github.com/ollama/ollama/blob/main/docs/faq.md

This article is informational and reflects industry observations as of June 2026. It is not legal, compliance, or financial advice for any specific situation. Phoenix Consultants Group, founded 1995, provides custom software development and AI infrastructure consulting. For guidance tailored to your organization's specific requirements, contact PCG directly.
