Phoenix Consultants Group | Custom Computer Programming

Tag: Cloud AI Backup

Last updated: June 2026 | Part 4 of 4
The previous three parts of this series got Ollama installed, configured, monitored, and ready. This final part closes the gap between "Ollama is available" and "Ollama is a reliable failover." Firewall configuration prevents accidental exposure. Auto-failover code makes the switch from cloud to local automatic. Drills and contingency procedures verify that the system actually works when needed.

A 2026 joint analysis by SentinelOne and Censys, based on 293 days of scanning the public internet, found 175,000 unique Ollama instances exposed across 130 countries, most with no authentication and no firewall protection [1]. Many had tool-calling capabilities enabled, meaning attackers could not only consume the host's compute resources but potentially execute commands on the underlying system.

This is the part of the implementation that gets skipped most often. Ollama works perfectly on the developer's laptop with default settings. The production deployment that survives a security audit, runs as automated failover, and is tested regularly requires the steps in this article.

Why does Ollama need a firewall?

By default, Ollama binds to 127.0.0.1:11434, so it accepts connections from localhost only [2]. This default is safe: the Ollama API is unreachable from other machines on the network, and no firewall configuration is strictly necessary.
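As a minimal sketch, a deployment script could check whether a given OLLAMA_HOST value keeps the API on loopback before deciding that no firewall work is needed. The function name is hypothetical, and the parsing is a simplified assumption rather than Ollama's own logic; anything it cannot classify is treated as exposed.

```python
import ipaddress


def is_loopback_only(ollama_host: str) -> bool:
    """Return True if an OLLAMA_HOST-style value binds to loopback only.

    Simplified, assumed parsing: handles "host", "host:port",
    "[ipv6]:port", and an optional "scheme://" prefix.
    """
    value = ollama_host.strip()
    # Drop an optional scheme such as "http://"
    if "://" in value:
        value = value.split("://", 1)[1]
    if value.startswith("[") and "]" in value:
        host = value[1:value.index("]")]   # bracketed IPv6, e.g. "[::1]:11434"
    elif value.count(":") == 1:
        host = value.split(":", 1)[0]      # "host:port"
    else:
        host = value                       # bare host or bare IPv6 address
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Empty host or a hostname that cannot be classified offline:
        # treat it as exposed to stay on the safe side
        return False
```

In practice this might be called as is_loopback_only(os.getenv("OLLAMA_HOST", "127.0.0.1:11434")), with a False result used as the trigger for the firewall steps that follow.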

The safe default no longer applies the moment OLLAMA_HOST is set to 0.0.0.0:11434, which is the usual setting for any deployment where Ollama needs to serve requests from other machines (the most common business use case). At that point, the API listens on every network interface of the host. Without authentication and without a firewall, any user on the local network or, worse, any internet host that can reach the machine, can:

  • Submit arbitrary inference requests that pin the GPU for minutes at a time, effectively a denial-of-service attack against the host machine.
  • Exfiltrate model outputs by sending crafted prompts designed to leak training data or sensitive information that was used in fine-tuning.
  • Map the environment by querying the API for installed models, GPU specifications, and other host details that inform a larger attack.

Critical: Ollama has no built-in authentication. If OLLAMA_HOST is set to 0.0.0.0, anyone who can reach port 11434 can use the API. Firewall rules are the primary access control.
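Because firewall rules are the primary access control, it helps to confirm from a machine outside the trusted subnet that the port really is closed. The following is a minimal sketch using only the Python standard library; the function name and the default timeout are illustrative assumptions.

```python
import socket


def is_port_open(host: str, port: int = 11434, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False
```

Run from an untrusted network segment against the Ollama host's address, a True result suggests the API is reachable from somewhere it should not be. A False result only shows that this particular vantage point is blocked, so it is worth repeating from more than one location.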

How is the firewall configured on Linux?

Linux has two common firewall tools. Ubuntu and Debian use ufw (Uncomplicated Firewall). Red Hat, CentOS, and Fedora use firewalld. Both achieve the same result with different syntax.

ufw on Ubuntu and Debian

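The pattern is straightforward: allow only the specific subnet or IP addresses that should have access, then deny port 11434 for everyone else. Order matters, because ufw applies the first rule that matches, so the allow rules must be added before the deny rule.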

    # Enable ufw if not already enabled
    sudo ufw enable

    # Allow Ollama API access from the corporate subnet (example: 192.168.1.0/24)
    sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp

    # Optional: allow access from a specific VPN subnet
    sudo ufw allow from 10.8.0.0/24 to any port 11434 proto tcp

    # Explicitly deny access from anywhere else to port 11434
    sudo ufw deny to any port 11434 proto tcp

    # Check the resulting rules
    sudo ufw status numbered
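The ufw rules above depend on ordering: ufw applies the first rule that matches a packet. The sketch below models that first-match behavior in Python so the effect of rule order can be reasoned about without touching a live firewall. It is an illustration of the evaluation order only, not a reimplementation of ufw, and the RULES list and decide function are hypothetical names.

```python
import ipaddress

# Ordered rules mirroring the ufw example: (action, source network)
RULES = [
    ("allow", ipaddress.ip_network("192.168.1.0/24")),
    ("allow", ipaddress.ip_network("10.8.0.0/24")),
    ("deny", ipaddress.ip_network("0.0.0.0/0")),
]


def decide(source_ip: str, rules=RULES) -> str:
    """Return the action of the first rule whose network contains source_ip."""
    addr = ipaddress.ip_address(source_ip)
    for action, network in rules:
        if addr in network:
            return action  # first match wins, later rules are ignored
    return "deny"  # assumed default policy for unmatched inbound traffic
```

Moving the deny rule to the front of the list makes decide return "deny" even for addresses in the corporate subnet, which is the same lockout that results from adding the ufw deny rule before the allow rules.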

firewalld on Red Hat, CentOS, Fedora

firewalld uses zones. The pattern is to add port 11434 to an "internal" zone that includes only trusted source addresses, and explicitly close that port in the "public" zone.

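    # Add trusted source to the internal zone
    sudo firewall-cmd --zone=internal --add-source=192.168.1.0/24 --permanent

    # Allow port 11434 only in the internal zone
    sudo firewall-cmd --zone=internal --add-port=11434/tcp --permanent

    # Make sure the port is not open in the public zone
    # (prints a warning and changes nothing if it was never opened there)
    sudo firewall-cmd --zone=public --remove-port=11434/tcp --permanent

    # Reload to apply
    sudo firewall-cmd --reload

    # Verify
    sudo firewall-cmd --list-all --zone=internal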

How is the firewall configured on Windows?

Windows uses Windows Defender Firewall. PowerShell as Administrator is the simplest way to configure rules consistently. The goal is the same: allow port 11434 only from trusted subnets.

    # Open PowerShell as Administrator

    # Allow Ollama API from the corporate subnet
    New-NetFirewallRule -DisplayName "Ollama API - Internal" `
        -Direction Inbound -Action Allow `
        -Protocol TCP -LocalPort 11434 `
        -RemoteAddress 192.168.1.0/24

    # Do not add a separate Block rule with -RemoteAddress Any for this port:
    # Windows Defender Firewall gives Block rules precedence over Allow rules,
    # so it would also block the trusted subnet. All other sources are covered
    # by the default inbound policy; confirm it is not set to Allow.
    Get-NetFirewallProfile | Select-Object Name, DefaultInboundAction

    # Verify rules
    Get-NetFirewallRule -DisplayName "Ollama API*"

How is the firewall configured on macOS?

macOS uses pf (Packet Filter) for firewall rules. The application firewall in System Settings does not provide enough granularity for port-level control. Editing the pf configuration directly is required.

    # Edit the pf configuration
    sudo nano /etc/pf.conf

    # Add these lines at the bottom of /etc/pf.conf:
    # Block all incoming on port 11434 by default
    block in proto tcp from any to any port 11434
    # Allow only the trusted subnet
    pass in proto tcp from 192.168.1.0/24 to any port 11434

    # Load the updated configuration
    sudo pfctl -f /etc/pf.conf

    # Enable pf if not already enabled
    sudo pfctl -e

    # Check active rules
    sudo pfctl -sr
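The pf rules above list the block before the pass because pf, unless a rule is marked quick, applies the last rule that matches a packet. The sketch below models that last-match behavior; it illustrates the evaluation order only and is not a pf parser. PF_RULES and pf_decide are hypothetical names.

```python
import ipaddress

# Ordered rules mirroring the pf.conf lines: (action, source network)
PF_RULES = [
    ("block", ipaddress.ip_network("0.0.0.0/0")),
    ("pass", ipaddress.ip_network("192.168.1.0/24")),
]


def pf_decide(source_ip: str, rules=PF_RULES, default: str = "pass") -> str:
    """Return the action of the last matching rule (pf without 'quick')."""
    addr = ipaddress.ip_address(source_ip)
    action = default  # pf passes traffic that no rule matches
    for rule_action, network in rules:
        if addr in network:
            action = rule_action  # a later match overrides an earlier one
    return action
```

Swapping the two rules would make the block the last match for every source, including the trusted subnet, so the order in /etc/pf.conf should stay as shown.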

What additional Ollama hardening matters?

Firewall rules are the first layer. Three more environment variables and configurations reduce the attack surface further.

Restrict CORS origins

Set OLLAMA_ORIGINS to the specific frontend URLs that should be allowed to call the API from a browser. This prevents arbitrary websites from making cross-origin requests to Ollama if a user visits them while on the corporate network [3].

Environment="OLLAMA_ORIGINS=https://docs.internal.corp,https://app.internal.corp"
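A small lint step can catch an overly broad OLLAMA_ORIGINS value before it reaches production. The sketch below is an assumption-laden example: the function name is hypothetical, and treating plain-http origins as risky is this sketch's own policy choice, not an Ollama requirement.

```python
from typing import List
from urllib.parse import urlparse


def risky_origins(ollama_origins: str) -> List[str]:
    """Return entries of a comma-separated origins value that look too broad."""
    risky = []
    for entry in (part.strip() for part in ollama_origins.split(",")):
        if not entry:
            continue  # ignore empty segments such as a trailing comma
        if "*" in entry:
            risky.append(entry)  # wildcard origin or wildcard host
        elif urlparse(entry).scheme != "https":
            risky.append(entry)  # plain http or missing scheme (assumed policy)
    return risky
```

An empty list for the value shown above means every entry is a specific https origin; any returned entry deserves a second look before deployment.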

Disable the built-in web UI in production

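According to the cited guide, Ollama includes a basic web UI that exposes model metadata and lacks role-based access control, and it should be disabled in production deployments [4]. Confirm that the installed Ollama version recognizes this variable before relying on it; ollama serve --help prints the environment variables the server supports.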

Environment="OLLAMA_NO_WEBSERVER=1"

Run Ollama as an unprivileged user

The official Linux installer already creates an ollama system user with no shell access. Verify this on existing installations and avoid running Ollama as root or as the primary user account. Resource limits via systemd cgroups prevent runaway processes from affecting the rest of the system.

Production hardening checklist

  • Firewall rules in place restricting port 11434 to trusted sources only
  • OLLAMA_ORIGINS set to specific allowed origins, not wildcard
  • OLLAMA_NO_WEBSERVER=1 set to disable the unauthenticated UI
  • Ollama running as an unprivileged system user, not root
  • Reverse proxy with authentication in front of Ollama if accessed across networks
  • Logs being collected and reviewed (see Part 3 monitoring script)
  • Disk encryption at rest for the model storage directory

How does automatic failover from cloud AI to local AI work?

The architecture is simple: a thin client library sits between the application and the AI provider. Every request goes through the client. The client tries cloud AI first, and if that fails for any reason, retries the same request against local Ollama. The application code calling the client never knows which backend served the response.

The failover client handles three cases:

Connection failure

Cloud AI endpoint is unreachable, DNS fails, or TCP connection times out. Switch to Ollama immediately.

HTTP error

Cloud AI returns a 5xx status code (server error or degraded service) or a specific 4xx code such as 429 (rate limited). Retry with Ollama.

Timeout

Cloud AI accepts the request but takes longer than the timeout threshold. Cancel and retry with Ollama.
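The three cases can be expressed as one small decision function. The following is a sketch under stated assumptions: the function name, the reason strings, and the choice of 408 and 429 as the 4xx codes worth a fallback are illustrative, not taken from any provider's documentation.

```python
from typing import Optional

# Assumed 4xx codes that indicate a transient provider-side problem
RETRYABLE_4XX = {408, 429}


def fallback_reason(status_code: Optional[int],
                    connection_failed: bool = False,
                    timed_out: bool = False) -> Optional[str]:
    """Return why a request should fall back to Ollama, or None if it should not."""
    if connection_failed:
        return "connection_failure"   # unreachable endpoint, DNS or TCP failure
    if timed_out:
        return "timeout"              # request accepted but too slow
    if status_code is None:
        return "connection_failure"   # no response received at all
    if status_code >= 500 or status_code in RETRYABLE_4XX:
        return "http_error"           # server error or rate limit
    return None                       # success, or a client error to surface
```

One design choice to note: a 401 or 400 returns None here, on the assumption that a bad API key or malformed request is a configuration problem that silent fallback would hide. A client that falls back on any non-200 response is simpler but masks those errors.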

A working failover client in Python

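The code below is the same pattern PCG uses for production deployments. It handles all three failure modes, logs which backend served each request, and exposes a single interface that can stand in for direct calls to an OpenAI-compatible chat completions endpoint. Anthropic's API uses different request headers and a different response shape, so call_cloud() needs to be adapted for that provider.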

    #!/usr/bin/env python3
    # ai_failover_client.py
    # A failover client that tries cloud AI first, then falls back to local Ollama.

    import logging
    import os
    from typing import Optional

    import requests

    # Configuration via environment variables
    CLOUD_API_URL = os.getenv("CLOUD_API_URL", "https://api.openai.com/v1/chat/completions")
    CLOUD_API_KEY = os.getenv("CLOUD_API_KEY")
    OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434/api/chat")
    OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "llama3.1:8b")
    CLOUD_TIMEOUT = int(os.getenv("CLOUD_TIMEOUT", "15"))
    OLLAMA_TIMEOUT = int(os.getenv("OLLAMA_TIMEOUT", "60"))

    logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")

    # Network errors, timeouts, and response bodies in an unexpected shape
    BACKEND_ERRORS = (requests.exceptions.RequestException, KeyError, IndexError, ValueError)


    def call_cloud(messages: list, model: str = "gpt-4o") -> Optional[str]:
        """Try the cloud AI provider. Returns response text or None on failure."""
        try:
            r = requests.post(
                CLOUD_API_URL,
                headers={"Authorization": f"Bearer {CLOUD_API_KEY}"},
                json={"model": model, "messages": messages},
                timeout=CLOUD_TIMEOUT,
            )
            if r.status_code == 200:
                return r.json()["choices"][0]["message"]["content"]
            logging.warning(f"Cloud returned status {r.status_code}")
            return None
        except BACKEND_ERRORS as e:
            logging.warning(f"Cloud request failed: {e}")
            return None


    def call_ollama(messages: list, model: str = OLLAMA_MODEL) -> Optional[str]:
        """Try the local Ollama instance. Returns response text or None on failure."""
        try:
            r = requests.post(
                OLLAMA_URL,
                json={"model": model, "messages": messages, "stream": False},
                timeout=OLLAMA_TIMEOUT,
            )
            if r.status_code == 200:
                return r.json()["message"]["content"]
            logging.error(f"Ollama returned status {r.status_code}")
            return None
        except BACKEND_ERRORS as e:
            logging.error(f"Ollama request failed: {e}")
            return None


    def ai_request(messages: list, cloud_model: str = "gpt-4o") -> dict:
        """
        Main entry point. Tries cloud first, falls back to Ollama on failure.
        Returns a dict with the response text and which backend served it.
        """
        response = call_cloud(messages, model=cloud_model)
        if response is not None:
            logging.info("Served by cloud")
            return {"backend": "cloud", "text": response}

        logging.info("Cloud unavailable, falling back to Ollama")
        response = call_ollama(messages)
        if response is not None:
            return {"backend": "ollama", "text": response}

        logging.error("Both cloud and Ollama failed")
        return {"backend": "none", "text": None, "error": "All backends failed"}


    # Usage example
    if __name__ == "__main__":
        result = ai_request([
            {"role": "user", "content": "Summarize the benefits of local AI in 50 words."}
        ])
        print(f"[{result['backend']}] {result['text']}")

The key property is that the application code calling ai_request() does not need to know whether the response came from cloud AI or local Ollama; the backend field in the result exists for logging and auditing. The failover is transparent, which is the whole point.

What is contingency mode and when does it activate?

Contingency mode is the operational state where all AI traffic routes to local Ollama by default, skipping the cloud AI attempt entirely. This is useful in two scenarios.

Known cloud outage. If the team knows the cloud provider is down (from a status page, social media, or repeated failover events in the logs), forcing contingency mode skips the wasted attempt at calling cloud AI and reduces latency for every request during the outage.

Compliance requirements. Some workflows handle data that should never touch cloud providers. Contingency mode can be enabled selectively for these workflows while other parts of the business continue using cloud AI.

Implementation is a single environment variable that the failover client checks before making any cloud request:

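    # In the client code, add at the top of ai_request()
    if os.getenv("AI_CONTINGENCY_MODE") == "true":
        logging.info("Contingency mode active, routing directly to Ollama")
        response = call_ollama(messages)
        if response is not None:
            return {"backend": "ollama", "text": response}
        # Do not fall through to the cloud attempt: contingency mode may be
        # protecting data that must stay local
        return {"backend": "none", "text": None, "error": "Ollama failed in contingency mode"}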
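The environment variable covers the manual trigger. For the automatic trigger on repeated cloud failures, one possible approach is a small failure counter with a cooldown, sketched below. The class name, the threshold of three failures, and the five-minute cooldown are illustrative assumptions, not values from PCG's production client.

```python
import time
from typing import Callable, Optional


class ContingencySwitch:
    """Skips the cloud for a cooldown period after repeated cloud failures."""

    def __init__(self, threshold: int = 3, cooldown_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = 0
        self._tripped_at: Optional[float] = None

    def record_failure(self) -> None:
        """Call when a cloud attempt fails."""
        self._failures += 1
        if self._failures >= self.threshold and self._tripped_at is None:
            self._tripped_at = self._clock()

    def record_success(self) -> None:
        """Call when a cloud attempt succeeds; clears the switch."""
        self._failures = 0
        self._tripped_at = None

    def active(self) -> bool:
        """Return True while cloud attempts should be skipped."""
        if self._tripped_at is None:
            return False
        if self._clock() - self._tripped_at >= self.cooldown_seconds:
            # Cooldown over: permit one trial cloud request, and re-trip
            # immediately if that request fails as well
            self._tripped_at = None
            self._failures = self.threshold - 1
            return False
        return True
```

Inside ai_request(), a module-level switch could be checked next to AI_CONTINGENCY_MODE, with record_failure() called whenever call_cloud() returns None and record_success() called when it returns text. The clock parameter exists so the behavior can be tested without waiting for real time to pass.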

How often should the failover system be tested?

Quarterly at minimum. Monthly for business-critical deployments. The test is straightforward and takes about 15 minutes.

Quarterly failover drill

  • Pick a low-traffic window (early morning, weekend, post-business hours)
  • Block outbound traffic to the cloud AI endpoint at the firewall level for 15 minutes
  • Have team members use AI-dependent workflows normally during the block
  • Verify that the failover client logged "Cloud unavailable, falling back to Ollama" for every request
  • Confirm response quality from local models was acceptable for the workflows tested
  • Confirm monitoring alerts fired correctly (the team got notified)
  • Remove the firewall block and verify automatic recovery to cloud
  • Document any failures, surprises, or workflow gaps for the next iteration

Untested failover is failover that does not work when needed. The drill exists so the team finds problems in a controlled 15-minute window, not during an actual 78-minute Anthropic outage.
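The log verification step of the drill can be partly automated. The sketch below counts the messages that the failover client writes and reports whether any request reached the cloud during the block. It assumes the client's log output was captured to a file or list of lines during the drill window; the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Message fragments written by the failover client shown earlier
CLOUD_MARK = "Served by cloud"
FALLBACK_MARK = "Cloud unavailable, falling back to Ollama"
BOTH_FAILED_MARK = "Both cloud and Ollama failed"


def summarize_drill(log_lines: Iterable[str]) -> Counter:
    """Count cloud-served, fallback, and total-failure events in log lines."""
    counts = Counter(cloud=0, fallback=0, both_failed=0)
    for line in log_lines:
        if FALLBACK_MARK in line:
            counts["fallback"] += 1
        elif CLOUD_MARK in line:
            counts["cloud"] += 1
        elif BOTH_FAILED_MARK in line:
            counts["both_failed"] += 1
    return counts


def drill_passed(counts: Counter) -> bool:
    """Pass if requests fell back, none reached the cloud, and none failed outright."""
    return (counts["fallback"] > 0
            and counts["cloud"] == 0
            and counts["both_failed"] == 0)
```

A drill with zero fallback events is treated as a failure here because it most likely means nobody exercised the AI workflows during the block, so nothing was actually tested.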

Does PCG build production AI continuity systems for clients?

Phoenix Consultants Group has been building production software systems for operational continuity since 1995, and three decades of experience in environments where business-critical software cannot stop translates directly to AI infrastructure. A custom AI continuity engagement covers everything in this series as a single deliverable: hardware assessment, Ollama deployment, monitoring integration, failover client development, security hardening, and team training on the contingency procedures.

The FireFlight Data System, PCG's modular platform for operational data, uses the same engineering discipline: continuous monitoring, automatic recovery, security defaults that assume the worst, and tested procedures for every failure mode. The Ollama deployment follows that same playbook because the goal is the same: a system that works when the team needs it most.

Need a turnkey AI continuity system?

PCG handles hardware, deployment, monitoring, failover code, security hardening, and team training as one engagement. The diagnostic call is with an engineer, not a sales tier.


Frequently Asked Questions

Does Ollama need a firewall for business use?
Yes. A 2026 SentinelOne and Censys analysis found 175,000 Ollama instances exposed publicly with no authentication or firewall protection. Default Ollama binds to localhost only, which is safe. The moment OLLAMA_HOST is changed to 0.0.0.0 to allow network access, a firewall becomes mandatory. Without one, anyone reaching the host can submit inference requests, consume GPU resources, and potentially exfiltrate model outputs.
How do I configure a firewall for Ollama on Linux?
Use ufw on Ubuntu and Debian or firewalld on Red Hat, CentOS, and Fedora. The pattern is identical: explicitly allow the specific IP addresses or subnets that should reach Ollama and block port 11434 for every other source. With ufw, that is one allow rule for the corporate subnet followed by one deny rule for everything else.
How does automatic failover from cloud AI to local AI work?
A small client library sits between the application and the AI provider. Each request goes to the cloud AI first. If the cloud AI returns an error, times out, or fails health checks, the same request is automatically retried against Ollama. The application sees one consistent API while the failover happens transparently. Typical implementation is 50 to 80 lines of code in any modern language.
Should failover be automatic or manual?
Automatic for most workflows. Manual failover requires someone to notice the outage and trigger the switch, which adds minutes or hours of delay. Automatic failover handles the switch in milliseconds. The exception is workflows with strict compliance or audit requirements where every model output must be logged with its source, in which case manual approval before falling back may be appropriate.
What is contingency mode for an AI continuity system?
Contingency mode is the operational state where all AI traffic routes to local Ollama instead of the cloud provider. It can be triggered automatically by repeated cloud failures or manually by an operator. While in contingency mode, the system logs all requests separately so the team has a record of what ran locally during the outage and can verify outputs after the cloud provider recovers.
How often should the failover system be tested?
Quarterly at minimum, monthly for business-critical deployments. A test drill blocks cloud AI access at the firewall level for 15 minutes during a low-traffic window. The team verifies that all applications continue working, monitors response quality from local models, and confirms that monitoring alerts fired correctly. Untested failover is failover that does not work when needed.

About the Author

Allison Woolbert

CEO and Senior Systems Architect, Phoenix Consultants Group

Allison Woolbert is the principal of Phoenix Consultants Group, the custom software consultancy founded in 1995. PCG has run legacy migration projects across Microsoft Access, Visual FoxPro, Paradox, VB6, and other discontinued platforms for industrial, manufacturing, and environmental services clients since the late 1990s.

Allison leads PCG's discovery and architecture practice, where the first deliverable on every legacy engagement is an honest inventory of what the existing application actually does and what it should do next.


Sources

[1] SentinelOne and Censys joint analysis on exposed Ollama instances, early 2026: serverman.co.uk/ai/ollama/ollama-security-guide

[2] Ollama default network binding documentation: github.com/ollama/ollama/blob/main/docs/faq.md

[3] Ollama environment variables reference, OLLAMA_ORIGINS for CORS control: docs.ollama.com

[4] Ollama production security configuration, web UI and authentication: markaicode.com/configure-ollama-firewall-rules-security

This article is informational and reflects industry observations as of June 2026. It is not legal, compliance, or financial advice for any specific situation. Phoenix Consultants Group, founded 1995, provides custom software development and AI infrastructure consulting. For guidance tailored to your organization's specific requirements, contact PCG directly.
