Data Documentation

The Serava Database

How we source, structure, enrich, and score 2,000,000+ acquisition-ready businesses across 19 industries and 6 countries.

2M+ Companies
19 Industries
6 Countries
6+ Data Sources

Geographic Coverage

The database covers English-speaking markets where lower-middle-market acquisition activity is highest. Each country is sourced from multiple independent datasets to maximize completeness.

🇺🇸 United States: All 50 states, OSM + SBA + SAM sources
🇨🇦 Canada: All provinces, OSM + ODBus + Montréal registry
🇬🇧 United Kingdom: England, Scotland, Wales, N. Ireland via Companies House + OSM
🇦🇺 Australia: All states and territories via OSM
🇮🇪 Ireland: All counties via OSM
🇳🇿 New Zealand: All regions via OSM

Industry Coverage

Serava covers 19 industry verticals chosen for their prevalence in PE and search fund deal flow: fragmented, recurring-revenue businesses with owner-operated characteristics. All industries are active in all 6 covered countries.

🌡️ HVAC
🔧 Plumbing
Electrical
💻 Managed IT / MSP
🐛 Pest Control
🌿 Landscaping
🏭 Manufacturing
🏠 Roofing
🧹 Janitorial / Cleaning
🔩 Auto Repair
🖌️ Painting
🏊 Pool and Spa
🔐 Security Services
🧱 Concrete and Masonry
📦 Software Services
⚖️ Legal and Professional
👥 Staffing
🏢 Facility Management
🏗️ Property Management

Data Sources

Every company in the database originates from one of six public data sources. All sources are free, open-access, and updated on a regular cadence. No web scraping of private platforms is performed.

🌐 OpenStreetMap / Overpass API (Open Data)

The foundation of the database. Serava runs approximately 1,820 targeted Overpass API queries, one per industry-region combination, across the US, Canada, UK, Australia, Ireland, and New Zealand. Every business tagged in OSM with a matching amenity, shop, craft, or office category is captured with its name, coordinates, address, and any available contact fields. OSM is the only global public dataset that provides named business locations at street-address precision without a paywall.

Fields captured
  • Business name
  • Address and coordinates
  • Phone and website (when available)
  • Opening hours
  • Industry tags
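
As an illustration of the one-query-per-industry-region approach described above, the following is a minimal sketch of how such a query could be assembled. The tag mapping, the function name `build_overpass_query`, and the use of ISO 3166-2 codes to select a region are assumptions for illustration; the document does not list Serava's actual tag sets or region keys.

```python
# Hypothetical mapping from an industry to OSM tag filters.
# The real tag lists are not given in this document.
INDUSTRY_TAGS = {
    "hvac": [("craft", "hvac")],
    "plumbing": [("craft", "plumber")],
    "auto_repair": [("shop", "car_repair")],
}


def build_overpass_query(industry: str, iso_region: str, timeout_s: int = 180) -> str:
    """Build one Overpass QL query for an industry-region pair.

    iso_region is an ISO 3166-2 code such as "US-TX" (assumed region key).
    Only named nodes, ways, and relations are requested.
    """
    filters = "\n".join(
        f'  nwr["{key}"="{value}"]["name"](area.region);'
        for key, value in INDUSTRY_TAGS[industry]
    )
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        f'area["ISO3166-2"="{iso_region}"]->.region;\n'
        f"(\n{filters}\n);\n"
        "out center tags;"
    )


print(build_overpass_query("plumbing", "US-TX"))
```

The `out center tags;` statement asks Overpass for one representative coordinate per element plus its tags, which is enough to populate the name, coordinates, address, and contact fields listed above.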

🇬🇧 UK Companies House Bulk Data (Open Data)

Companies House publishes a full bulk CSV export of all registered UK companies, updated monthly. Serava streams this file, filters for active companies incorporated at least 3 years ago with a valid postcode, maps 32 SIC 2007 industry codes to our 19 industry types, and geocodes each company using the postcodes.io API. This adds 100,000 to 300,000 additional UK businesses per import cycle with legal registration data unavailable in OSM.

Fields captured
  • Company name and registration number
  • Registered address
  • SIC industry code
  • Incorporation date
  • Company status
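
The filtering step described above could look roughly like the sketch below. The column names are assumed to follow the Companies House bulk CSV header, the four SIC codes shown are a hypothetical subset of the 32 that Serava maps, and the postcode pattern is a simplified shape check rather than a full validator.

```python
import re
from datetime import date, datetime

# Hypothetical subset of the SIC 2007 -> industry mapping.
SIC_TO_INDUSTRY = {
    "43210": "electrical",   # Electrical installation
    "43220": "plumbing",     # Plumbing, heat and air-conditioning installation
    "43910": "roofing",      # Roofing activities
    "81300": "landscaping",  # Landscape service activities
}

# Simplified UK postcode shape check (assumption: good enough for pre-filtering).
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$")


def classify_row(row: dict, today: date, min_age_years: int = 3):
    """Return the mapped industry for a qualifying CSV row, else None."""
    if row.get("CompanyStatus") != "Active":
        return None
    postcode = (row.get("RegAddress.PostCode") or "").strip().upper()
    if not POSTCODE_RE.match(postcode):
        return None
    try:
        incorporated = datetime.strptime(
            row.get("IncorporationDate") or "", "%d/%m/%Y"
        ).date()
    except ValueError:
        return None
    # Whole years elapsed since incorporation.
    age = today.year - incorporated.year - (
        (today.month, today.day) < (incorporated.month, incorporated.day)
    )
    if age < min_age_years:
        return None
    # The SIC text field starts with the five-digit code.
    sic_code = (row.get("SICCode.SicText_1") or "")[:5]
    return SIC_TO_INDUSTRY.get(sic_code)
```

Rows that return an industry would then go on to geocoding; rows that return `None` are dropped before any postcodes.io request is made.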

🇨🇦 Statistics Canada Business Register (ODBus) (Open Data)

The Open Database of Businesses (ODBus) is Statistics Canada's public compilation of business records gathered from open government data sources. It contains business names, addresses, NAICS industry codes, and employee size class where the contributing source provides them. Serava imports this dataset and maps Canadian NAICS codes to our industry taxonomy, supplementing OSM coverage for Canadian provinces including Quebec and the Maritime provinces where OSM tagging density is lower.

Fields captured
  • Business name
  • Province and city
  • NAICS industry code
  • Employee size band
  • Business type

🏙️ Ville de Montréal Commercial Registry (Open Data)

The City of Montréal publishes a structured open-data file of all commercially registered businesses operating within the city limits. This registry captures small operators that may not appear in national datasets, adding granular coverage for one of Canada's largest commercial markets. The dataset is imported directly from Montréal's open data portal and geocoded to precise coordinates.

Fields captured
  • Business name
  • Commercial address
  • Business category
  • Registration date

🇺🇸 SBA 7(a) FOIA Loan Data (Open Data)

The US Small Business Administration releases historical 7(a) loan data under FOIA. Each record contains the borrower's business name, city, state, NAICS code, loan amount, and the number of jobs retained. Serava cross-references this against existing database entries to append employee estimates and revenue proxies. Because SBA loans skew toward established, owner-operated businesses, this dataset is a strong signal of acquisition-relevant companies.

Fields captured
  • Borrower name and location
  • NAICS code
  • Loan amount (revenue proxy)
  • Jobs retained (employee proxy)
  • Approval date
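
The cross-referencing step could be sketched as an exact match on a normalized name plus city and state. The field names (`borrower_name`, `jobs_retained`, `loan_amount`, `employee_proxy`) and the suffix list are assumptions for illustration, and a production matcher would likely need fuzzier comparison than this.

```python
import re


def match_key(name: str, city: str, state: str) -> tuple:
    """Normalize a business identity for exact cross-source matching."""
    name = re.sub(r"[^a-z0-9 ]", "", name.lower())
    # Drop common legal suffixes so "Acme LLC" and "Acme" compare equal.
    name = re.sub(r"\b(inc|llc|corp|co|ltd)\b", "", name)
    return (" ".join(name.split()), city.strip().lower(), state.strip().upper())


def append_sba_proxies(companies: list, loans: list) -> int:
    """Attach loan and jobs proxies to matching companies in place.

    Returns the number of loan records that matched a company.
    """
    index = {match_key(c["name"], c["city"], c["state"]): c for c in companies}
    matched = 0
    for loan in loans:
        key = match_key(loan["borrower_name"], loan["city"], loan["state"])
        company = index.get(key)
        if company is None:
            continue
        # Keep the largest loan as the revenue proxy (assumed policy).
        company["loan_amount"] = max(company.get("loan_amount", 0), loan["loan_amount"])
        # Keep the first jobs figure seen as the employee proxy (assumed policy).
        company.setdefault("employee_proxy", loan["jobs_retained"])
        matched += 1
    return matched
```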

🏛️ SAM.gov Federal Contractor Registry (Open Data)

SAM.gov is the US government's System for Award Management, containing every business registered to receive federal contracts. These registrations include detailed NAICS codes, employee counts, annual revenue, and owner demographic information. Serava cross-references SAM records to enrich matching companies in the database with the self-reported employee and revenue figures from the business's own federal registration.

Fields captured
  • Legal business name
  • NAICS codes
  • Annual revenue (self-reported)
  • Employee count (self-reported)
  • Owner demographics

Enrichment Layer

Raw registry data tells you a business exists and where it is. Enrichment tells you who owns it, how big it is, and whether it has the financial profile worth pursuing. Four enrichment sources run on top of the base database, each filling different fields in a priority order designed to minimize API quota usage.

1. Diffbot AI Knowledge Graph

Machine learning extraction of structured business data from across the web. For each company, Diffbot identifies the most likely owner or key principal, estimates annual revenue from public signals, counts employees from LinkedIn and job boards, and extracts founding year and a business description.

Fields appended
  • Owner / principal name
  • Revenue estimate
  • Employee count
  • Founding year
  • Business description

2. Google Places API

Authoritative contact data and reputation signals from Google's business index. Matches are made by name and coordinates. When a match is confirmed, Google's verified phone number, claimed website, star rating, and review count are written to the company record.

Fields appended
  • Verified phone number
  • Website URL
  • Google star rating
  • Review count
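
Matching by name and coordinates can be sketched as a distance check combined with a string-similarity check. The 150 m radius and the 0.85 similarity threshold below are assumed values for illustration, not Serava's documented settings.

```python
import math
from difflib import SequenceMatcher


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def is_confirmed_match(company: dict, place: dict,
                       max_distance_m: float = 150.0,
                       min_name_ratio: float = 0.85) -> bool:
    """Accept a candidate only if it is both close by and similarly named."""
    distance = haversine_m(company["lat"], company["lng"], place["lat"], place["lng"])
    ratio = SequenceMatcher(None, company["name"].lower(), place["name"].lower()).ratio()
    return distance <= max_distance_m and ratio >= min_name_ratio
```

Requiring both conditions guards against two common false positives: a same-named franchise location across town, and an unrelated business at the same address.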

3. Wikidata SPARQL

Free, unlimited SPARQL queries against Wikidata's structured knowledge graph. Useful for notable companies where founder, CEO, employee count, and revenue are recorded as structured facts. Serava queries Wikidata for any company whose name matches a known Wikidata entity.

Fields appended
  • Founder and CEO name
  • Official website
  • Employee count
  • Founding year
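
A query for the fields listed above might be built as in the sketch below. The property IDs are standard Wikidata properties (P112 founded by, P169 chief executive officer, P856 official website, P1128 employees, P571 inception); matching on an exact English label is an assumption about how the name lookup works, and `build_wikidata_query` is a hypothetical helper name.

```python
def build_wikidata_query(company_name: str, lang: str = "en") -> str:
    """Build a SPARQL query that looks up a company by exact label."""
    # Escape backslashes and double quotes for a SPARQL string literal.
    escaped = company_name.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT ?company ?founderLabel ?ceoLabel ?website ?employees ?inception WHERE {{
  ?company rdfs:label "{escaped}"@{lang} .
  OPTIONAL {{ ?company wdt:P112 ?founder . }}
  OPTIONAL {{ ?company wdt:P169 ?ceo . }}
  OPTIONAL {{ ?company wdt:P856 ?website . }}
  OPTIONAL {{ ?company wdt:P1128 ?employees . }}
  OPTIONAL {{ ?company wdt:P571 ?inception . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}" . }}
}}
LIMIT 5
""".strip()


print(build_wikidata_query("Acme Plumbing"))
```

Every field is wrapped in `OPTIONAL` so that an entity missing, say, a CEO still returns its website and founding date.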

4. ProPublica Nonprofit Explorer

IRS Form 990 data for US nonprofit organizations, associations, and foundations. ProPublica's API provides total revenue and total assets as reported on the most recent 990 filing. This is particularly useful for industry associations, healthcare nonprofits, and trade schools that appear in the home services and healthcare categories.

Fields appended
  • Total revenue (Form 990)
  • Total assets
  • Tax year
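
The priority ordering of the four enrichment sources can be sketched as a loop that calls a source only when it could still fill an empty field, which is one way to keep API quota usage down. The field names and the `fetchers` callables are hypothetical; treating Form 990 total revenue as the same field as the revenue estimate is also an assumption.

```python
# Sources in the priority order listed above, with the fields each can fill.
SOURCE_FIELDS = [
    ("diffbot", {"owner_name", "revenue_estimate", "employee_count",
                 "founding_year", "description"}),
    ("google_places", {"phone", "website", "rating", "review_count"}),
    ("wikidata", {"owner_name", "website", "employee_count", "founding_year"}),
    ("propublica", {"revenue_estimate", "total_assets", "tax_year"}),
]


def enrich(record: dict, fetchers: dict) -> tuple:
    """Fill empty fields in priority order; return (merged record, sources called)."""
    merged = dict(record)
    called = []
    for source, fields in SOURCE_FIELDS:
        missing = {f for f in fields if merged.get(f) is None}
        if not missing or source not in fetchers:
            continue  # Nothing left for this source to add: skip the API call.
        called.append(source)
        for field, value in fetchers[source](merged).items():
            # Earlier sources win; later ones only fill remaining gaps.
            if field in missing and value is not None:
                merged[field] = value
    return merged, called
```

With this shape, a record that already has a phone, website, rating, and review count never triggers a Google Places request.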

Acquisition Fit Score

Every company in the database receives a composite acquisition fit score between 0 and 100. The score is designed to surface businesses that exhibit the characteristics most correlated with a motivated owner and a clean acquisition process. Higher scores rank to the top of every mandate search by default.

Owner Tenure (30 pts)

How many years the business has been under the same owner. Longer tenure correlates with accumulated equity, retirement motivation, and clean operational history.

Years in Business (25 pts)

Total operating age of the business. Older businesses have demonstrated survival through multiple economic cycles and carry lower go-forward risk.

Revenue Estimate (20 pts)

When available from Diffbot, SBA, or SAM sources, revenue is scored against the mandate's target range. Businesses near the center of the range score highest.

Contact Completeness (15 pts)

Businesses with a verified phone number, website, and Google Places match score higher. This correlates with active operations and reachability for outreach.

Review Quality (10 pts)

Google star rating and review count signal customer satisfaction and market presence. High-rating, high-volume businesses indicate a stable customer base.
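
The five weights above sum to 100. How each component is converted into a fraction of its weight is not specified in this document, so the sketch below uses assumed shapes: tenure saturating at 20 years, business age at 30 years, a triangular revenue curve peaking at the center of the mandate range, an equal share for each of three contact signals, and rating scaled by review volume up to 100 reviews. It also assumes the upper bound of the revenue range is greater than the lower bound.

```python
# Weights taken from the component list above.
WEIGHTS = {
    "owner_tenure": 30,
    "years_in_business": 25,
    "revenue": 20,
    "contact": 15,
    "reviews": 10,
}


def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def fit_score(company: dict, revenue_range: tuple) -> float:
    """Composite 0-100 score; every sub-score shape here is an assumption."""
    low, high = revenue_range  # assumes high > low
    center, half_width = (low + high) / 2, (high - low) / 2
    revenue = company.get("revenue")
    parts = {
        "owner_tenure": clamp01(company.get("owner_tenure_years", 0) / 20),
        "years_in_business": clamp01(company.get("years_in_business", 0) / 30),
        # Peaks at the center of the range, falls to zero at its edges.
        "revenue": 0.0 if revenue is None
        else clamp01(1 - abs(revenue - center) / half_width),
        "contact": sum(bool(company.get(k))
                       for k in ("phone", "website", "places_match")) / 3,
        "reviews": clamp01(company.get("rating", 0) / 5)
        * clamp01(company.get("review_count", 0) / 100),
    }
    return round(sum(WEIGHTS[k] * parts[k] for k in WEIGHTS), 1)
```

A company with no revenue figure simply scores zero on that component rather than being excluded, which matches the "when available" wording above.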

Deduplication Methodology

Because multiple data sources cover overlapping geographies, deduplication is applied at the database layer. Each company is identified by a composite key of its normalized name, rounded latitude, and rounded longitude. When the same business appears in both OSM and a national registry, the record is merged rather than duplicated, and any additional fields from the second source are appended to the existing record.

-- Unique index prevents double-counting across sources (index name is illustrative)
CREATE UNIQUE INDEX IF NOT EXISTS idx_companies_dedup
    ON companies (name_norm, lat_round, lng_round);
-- Conflicting rows are silently ignored; the first insert wins
INSERT OR IGNORE INTO companies (...) VALUES (...);
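
The composite key and the merge behavior described above can be sketched end to end with Python's built-in `sqlite3` module. The rounding precision (3 decimal places, roughly 100 m), the normalization rule, and the follow-up `UPDATE` that fills empty fields from later sources are all assumptions; the document states only that conflicting inserts are ignored and that additional fields are appended to the existing record.

```python
import re
import sqlite3


def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", "", name.lower()).split())


def dedup_key(name: str, lat: float, lng: float, decimals: int = 3) -> tuple:
    return normalize_name(name), round(lat, decimals), round(lng, decimals)


def upsert_company(conn, name, lat, lng, phone=None, website=None) -> None:
    key = dedup_key(name, lat, lng)
    # First source to arrive creates the row; later conflicts are ignored.
    conn.execute(
        "INSERT OR IGNORE INTO companies (name_norm, lat_round, lng_round, name) "
        "VALUES (?, ?, ?, ?)",
        (*key, name),
    )
    # Assumed merge step: fill only the fields that are still NULL.
    conn.execute(
        "UPDATE companies SET phone = COALESCE(phone, ?), "
        "website = COALESCE(website, ?) "
        "WHERE name_norm = ? AND lat_round = ? AND lng_round = ?",
        (phone, website, *key),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE companies (name_norm TEXT, lat_round REAL, lng_round REAL, "
    "name TEXT, phone TEXT, website TEXT)"
)
conn.execute(
    "CREATE UNIQUE INDEX idx_companies_dedup "
    "ON companies (name_norm, lat_round, lng_round)"
)

# The same business arriving from two sources with slightly different details.
upsert_company(conn, "Acme Plumbing, Inc.", 30.26721, -97.74312, phone="555-0100")
upsert_company(conn, "ACME PLUMBING INC", 30.26749, -97.74288, website="https://acme.example")
```

One limitation of rounding-based keys is that two records for the same business can fall on opposite sides of a rounding boundary and so produce different keys; a fuzzier proximity check would be needed to catch those.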

Update Cadence

The database is not a static snapshot. All data sources refresh on independent schedules, and the enrichment layer runs continuously against the companies with the highest acquisition fit scores.

  • OpenStreetMap (Overpass API): quarterly full rebuild (~1,820 queries per run across all industry-region combinations)
  • UK Companies House: monthly (aligned to the Companies House monthly bulk data release)
  • Statistics Canada ODBus: annual (ODBus is updated by Statistics Canada on an annual cycle)
  • Ville de Montréal Registry: quarterly (open data portal updates periodically)
  • SBA Loan Data: annual (FOIA release follows the US fiscal year)
  • SAM.gov: on-demand (cross-referenced on enrichment runs)
  • Enrichment (Diffbot, Google, Wikidata): continuous (top 500 un-enriched companies by acquisition fit score per enricher run)
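
Selecting the batch for each enricher run could be as simple as the sketch below. The `enriched` flag and `fit_score` key are hypothetical field names; the batch size of 500 comes from the cadence described above.

```python
import heapq


def next_enrichment_batch(companies: list, batch_size: int = 500) -> list:
    """Return the highest-scoring companies that have not been enriched yet."""
    pending = (c for c in companies if not c.get("enriched"))
    # nlargest avoids sorting the full database just to take the top slice.
    return heapq.nlargest(batch_size, pending, key=lambda c: c["fit_score"])
```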

Search the Database

Request access to build a mandate and filter the full database against your acquisition criteria.