Data Collection and Management
PCG builds custom data collection and management systems for organizations at every scale, from small businesses replacing spreadsheets with a proper database, to mid-size operations moving to SQL Server, to enterprises running complex multi-source data warehouses. The right solution depends on how much data is generated, how many people need to access it simultaneously, what reporting and analysis it needs to support, and what security and compliance requirements apply. PCG has been building these systems since 1995.1
What questions should you answer before choosing a data collection solution?
The answers to these eight scoping questions (the factors mapped in the table below) determine which data collection and management architecture fits your operation. PCG asks all of them at the start of every engagement. Getting the answers wrong at the scoping stage produces a system that matches the specification on paper but not the actual data problem.
Which data collection and management scale is right for your operation?
There is no universal answer. A small business replacing a spreadsheet and a mid-size manufacturer tracking production across three facilities have fundamentally different requirements for data volume, concurrent users, transaction logging, and query performance. The table below maps the three scales of data solution PCG builds, with the specific platforms and operational characteristics that apply to each.2
| Factor | Small-Scale | Mid-Scale | Large-Scale |
|---|---|---|---|
| Typical users | 1 to 5 concurrent users, single location | 5 to 50 concurrent users, one or more locations | 50+ concurrent users, multi-site or enterprise |
| Data volume | Under 1 million records per table. Flat or lightly relational. | Millions of records. Fully relational across multiple tables. | Hundreds of millions of records. Multi-source. Potentially distributed. |
| Platforms | Microsoft Access, SQLite, Excel-backed systems | MySQL, Microsoft Access with SQL Server back-end, SQLite for web | Microsoft SQL Server, Amazon RDS, Azure SQL, Oracle |
| Transaction logging | Basic. Limited rollback capability. | Moderate. Query logging and some rollback. | Full transaction logging. Complete rollback and audit trail capability. |
| Query performance | Fast for small datasets. Degrades as records grow past ~100K per table. | Strong for typical business reporting. Handles complex joins across multiple tables. | Optimized for high-volume, high-concurrency queries. Index strategies required. |
| Migration path | PCG designs small-scale systems with SQL Server migration documented from day one. | Can migrate to large-scale SQL Server or cloud platform as data volumes grow. | Full enterprise architecture. Horizontal scaling available for cloud deployments. |
| Hardware investment | Minimal. Runs on standard workstation. | Moderate. Dedicated server recommended for multi-user environments. | Substantial. Dedicated database server or cloud infrastructure required. |
| PCG typical build time | 2-6 weeks for standard deployments | 4-12 weeks depending on complexity | 8-24 weeks for full architecture build-out |
What is a data warehouse and when does your operation need one?
A data warehouse is a centralized repository that consolidates data from multiple source systems into a single location structured specifically for analysis and reporting. It is not a replacement for operational databases. It is a separate layer that pulls from them, resolves the inconsistencies between them, and presents a unified view of the organization's data for business intelligence purposes.
The decision to build a data warehouse is driven by a specific operational problem: leadership cannot get a current, accurate picture of business performance because the data that would answer their questions lives in several systems that do not communicate with one another. The warehouse solves that by becoming the single source of truth that all of those systems feed.
A data warehouse is built to handle data volumes that would overwhelm operational databases. It is designed for read-heavy workloads: complex analytical queries that join across years of historical records, aggregation queries that summarize millions of transactions, and reporting queries that run across the entire dataset simultaneously.
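The sketch below shows the shape of one such read-heavy aggregation. It uses Python's built-in sqlite3 module and an invented sales_fact table as a stand-in for a real warehouse platform; every table name, column, and value is hypothetical, not taken from any PCG system.

```python
import sqlite3

# Illustrative only: an in-memory SQLite stand-in for a warehouse fact table.
# A real warehouse would run on SQL Server, Amazon RDS, Azure SQL, or similar.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (region TEXT, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [("East", "2023-01-15", 120.0), ("East", "2023-02-03", 80.0),
     ("West", "2023-01-22", 200.0), ("West", "2023-01-30", 50.0)],
)

# The shape of a typical warehouse query: a read-heavy aggregation across the
# whole history, grouped for reporting rather than single-record lookup.
rows = conn.execute(
    """
    SELECT region,
           strftime('%Y-%m', sale_date) AS month,
           COUNT(*)    AS transactions,
           SUM(amount) AS revenue
    FROM sales_fact
    GROUP BY region, month
    ORDER BY region, month
    """
).fetchall()

for region, month, transactions, revenue in rows:
    print(f"{region} {month}: {transactions} transactions, {revenue:.2f} revenue")
```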
Data from operational systems, external feeds, historical archives, and third-party platforms is extracted, transformed to a consistent format, and loaded into the warehouse on a defined schedule. The transformation step resolves the inconsistencies between source systems: different date formats, different customer ID structures, different product naming conventions.
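A minimal sketch of that transformation step, assuming two hypothetical source extracts that disagree on date format and customer ID structure. The field names, formats, and transforms below are illustrative, not taken from any PCG system.

```python
from datetime import datetime

def transform_crm_row(row):
    """Normalize a row from a hypothetical CRM extract (US-style dates,
    bare numeric customer IDs) into the warehouse's agreed format."""
    return {
        "customer_id": f"CUST-{int(row['cust_no']):06d}",
        "order_date": datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat(),
        "product": row["product_name"].strip().upper(),
    }

def transform_billing_row(row):
    """Normalize a row from a hypothetical billing extract (ISO dates,
    'C'-prefixed customer IDs) into the same warehouse format."""
    return {
        "customer_id": "CUST-" + row["customer"].lstrip("C").zfill(6),
        "order_date": row["invoice_date"],  # already ISO 8601
        "product": row["item"].strip().upper(),
    }

# Each source gets its own transform; the load step only ever sees rows in the
# single agreed-upon structure, so inconsistencies between systems are resolved
# before the data reaches the warehouse.
crm_row = {"cust_no": "4821", "date": "03/17/2024", "product_name": " Widget A "}
billing_row = {"customer": "C4821", "invoice_date": "2024-03-17", "item": "widget a"}

print(transform_crm_row(crm_row))        # same customer, same canonical form
print(transform_billing_row(billing_row))
```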
The warehouse exposes its data to reporting tools, business intelligence platforms, compliance systems, and analytical applications through a consistent interface. Multiple applications can query the same data simultaneously without affecting the performance of the operational systems that feed it. Metadata within the warehouse documents what each data element means and where it came from.
How do you prepare to identify the right data collection solution for your business?
Most data collection problems are not solved by choosing the right software. They are solved by understanding what data the business actually needs and what it is currently doing with the data it has. PCG works through these four preparation steps with every client before recommending a solution architecture.
Audit the data you are currently collecting
Map every data source in the organization: what is being collected, where it is stored, in what format, and by whom. This audit almost always surfaces data that is being collected in multiple places in incompatible formats, data that is collected but never used, and operational questions that leadership asks regularly but the current system cannot answer. The gap between what the data can answer and what the business needs to know is the requirement for the new system.
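The sketch below shows, under invented system and field names, the kind of finding this audit surfaces: the same field collected in two places in incompatible formats, and a field that is collected but never used in any report.

```python
from collections import defaultdict

# Hypothetical audit inventory: where each data element is collected today.
sources = [
    {"system": "Sales spreadsheet", "field": "customer_name", "format": "free text"},
    {"system": "Invoicing app",     "field": "customer_name", "format": "Last, First"},
    {"system": "Invoicing app",     "field": "invoice_total", "format": "currency"},
    {"system": "Shop-floor log",    "field": "batch_weight",  "format": "lbs, hand-keyed"},
]

# Fields that actually appear in current reports (also hypothetical).
used_in_reports = {"invoice_total"}

# Group collection points by field, then flag duplication and unused data.
by_field = defaultdict(list)
for s in sources:
    by_field[s["field"]].append((s["system"], s["format"]))

for field, places in by_field.items():
    if len(places) > 1:
        print(f"Collected in multiple places: {field} -> {places}")
    if field not in used_in_reports:
        print(f"Collected but never reported on: {field}")
```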
Separate active data from dead data
Not all data in your current system needs to migrate to a new one. Dead data is data that was collected for a purpose that no longer exists, data that is so structurally inconsistent it cannot be reliably used for any analysis, and data that is retained only because nobody has decided to remove it. Migrating dead data inflates the scope and cost of the new system without adding analytical value. PCG helps identify what moves forward and what gets archived or removed.
Define the outputs before designing the inputs
The reports, dashboards, compliance documents, and analytical outputs the organization needs to produce determine what data must be collected and in what structure. A system designed without knowing what it needs to produce will require structural changes when reporting requirements become clear after deployment. PCG establishes the required outputs before designing the collection structure, not after.
Map the priority order and the data flow sequence
Not all data is equally important. Some data drives decisions that affect revenue, compliance, or operational continuity. Other data is useful but not critical. PCG helps prioritize which data elements are required from day one versus which can be added in subsequent phases, and maps the sequence in which data flows through the organization to identify where collection happens and where it needs to happen for the system to work correctly.
What data collection and management services does PCG provide?
- Small-scale to full-scale data systems. PCG builds data collection and management systems ranging from single-user small-business deployments to multi-source enterprise data warehouses. The architecture matches the actual scale of the data problem, not a predefined tier.
- Data integrity, confidentiality, and security. PCG builds access controls, encryption, audit logging, and backup procedures into every data system from the design stage. For organizations with regulatory compliance requirements under HIPAA, EPA regulations, or financial compliance frameworks, these requirements are specified during design and verified during testing.
- Secure web services for database access. Web-accessible database interfaces with role-based access control, encrypted connections, and session management. Staff query and update the database through a browser without requiring client software installation on every machine.
- Multi-user access control. Access controls defined at the record level, the field level, and the function level. The compliance officer's view of the database is different from the operations manager's view, which is different from the data entry clerk's view, because each role has different data access requirements that the system enforces automatically (see the sketch after this list).
- Backup management and disaster recovery. Automated backup schedules with verified restore testing, off-site backup storage, and documented recovery procedures that define exactly how long it takes to restore the system to a specific point in time. For PCG-hosted systems, backup and recovery are included in the hosting arrangement.
- Inventory management across multiple platforms. PCG builds inventory management systems on Microsoft Access for small operations, MySQL and SQLite for web-integrated systems, and SQL Server for high-volume multi-site deployments. The platform is chosen based on the inventory operation's scale and access requirements, not on PCG's platform preferences.
- Data transformation and presentation management. Converting raw collected data into the formats required by reporting tools, compliance systems, and analytical platforms. This includes scheduled data exports, automated report generation, dashboard feeds, and API outputs for connected applications.
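A minimal sketch of the field-level portion of the access model described in the multi-user access control item above, with invented roles, fields, and values. A production system would typically enforce this in the database layer (views, column permissions, row-level security) rather than in application code.

```python
# Hypothetical role-to-field mapping; every role and field name is illustrative.
ROLE_FIELDS = {
    "compliance_officer": {"case_id", "inspection_date", "violation_code", "inspector"},
    "operations_manager": {"case_id", "inspection_date", "site", "status"},
    "data_entry":         {"case_id", "site", "status"},
}

def filter_record(record: dict, role: str) -> dict:
    """Return only the fields the given role is permitted to see."""
    allowed = ROLE_FIELDS.get(role, set())
    return {k: v for k, v in record.items() if k in allowed}

record = {
    "case_id": "2024-0117",
    "inspection_date": "2024-05-02",
    "site": "Plant 3",
    "status": "open",
    "violation_code": "VC-12",
    "inspector": "J. Smith",
}

# Each role sees a different view of the same underlying record.
print(filter_record(record, "compliance_officer"))
print(filter_record(record, "data_entry"))
```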
1 PCG data collection and management system history documented from project records across all scales and industries, 1995-2026.
2 Platform recommendations based on PCG deployment experience across small, medium, and large-scale data environments. Platform suitability thresholds reflect observed performance characteristics, not vendor specifications.
Frequently Asked Questions
Allison has been designing data collection and management systems since the early 1980s, predating PCG's founding in 1995. Her work spans every scale described on this page: small Access databases for family businesses, mid-scale SQL Server systems for manufacturing and environmental operations, and enterprise-level data collection platforms for ExxonMobil, Nabisco, and AXA Financial. The EPA pesticide inspection and case tracking system PCG built in 2004 has been in continuous production since January 2005 with practically zero downtime.
The lesson that holds across every scale: the quality of the data a system produces is determined at the collection stage, not the reporting stage. A database built on top of poorly structured data collection produces reports that require manual correction before they can be trusted. PCG builds the collection structure and the reporting structure together, so neither undermines the other.