How we source, structure, enrich, and score 2,000,000+ acquisition-ready businesses across 19 industries and 6 countries.
The database covers English-speaking markets where lower-middle-market acquisition activity is highest. Each country is sourced from multiple independent datasets to maximize completeness.
Serava covers 19 industry verticals chosen for their prevalence in PE and search fund deal flow: fragmented, recurring-revenue businesses with owner-operated characteristics. All industries are active in all 6 covered countries.
Every company in the database originates from one of six public data sources. All sources are free, open-access, and updated on a regular cadence. No web scraping of private platforms is performed.
The foundation of the database. Serava runs approximately 1,820 targeted Overpass API queries, one per industry-region combination, across the US, Canada, UK, Australia, Ireland, and New Zealand. Every business tagged in OpenStreetMap (OSM) with a matching amenity, shop, craft, or office category is captured with its name, coordinates, address, and any available contact fields. OSM is the only global public dataset that provides named business locations at street-address precision without a paywall.
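As a rough illustration of what one industry-region query might look like, the sketch below assembles an Overpass QL string. The industry names, the tag mapping, and the use of ISO 3166-2 region codes are assumptions made for the example; the actual mappings and region granularity are not described here.

```python
# Hypothetical industry -> OSM tag mapping (illustrative only).
INDUSTRY_TAGS = {
    "plumbing": [("craft", "plumber")],
    "dental": [("amenity", "dentist")],
    "accounting": [("office", "accountant")],
}


def build_overpass_query(industry: str, region_iso: str, timeout: int = 180) -> str:
    """Build one Overpass QL query for a single industry-region pair."""
    lines = [
        f"[out:json][timeout:{timeout}];",
        # Resolve the region (assumed here to be an ISO 3166-2 code) to an area.
        f'area["ISO3166-2"="{region_iso}"]->.region;',
        "(",
    ]
    for key, value in INDUSTRY_TAGS[industry]:
        # nwr = nodes, ways, and relations; ["name"] keeps only named features.
        lines.append(f'  nwr["{key}"="{value}"]["name"](area.region);')
    lines.append(");")
    # "center" adds a single coordinate for ways and relations.
    lines.append("out center tags;")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_overpass_query("plumbing", "US-TX"))
```

The resulting string would be sent to an Overpass endpoint; the HTTP call is left out so the sketch stays independent of any particular server or rate limit.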
Companies House publishes a full bulk CSV export of all registered UK companies, updated monthly. Serava streams this file, filters for active companies incorporated at least 3 years ago with a valid postcode, maps 32 SIC 2007 industry codes to our 19 industry types, and geocodes each company using the postcodes.io API. Each import cycle adds 100,000 to 300,000 UK businesses, along with legal registration data unavailable in OSM.
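The filtering step can be sketched as a streaming pass over the CSV rows. The column names below are assumptions about the export's layout, the two-entry SIC mapping and its industry labels are a hypothetical excerpt of the 32-code table, and the postcode pattern is a loose shape check rather than full validation. Geocoding is omitted.

```python
import csv
import re
from datetime import date, datetime

# Hypothetical excerpt of the SIC 2007 -> industry mapping.
SIC_TO_INDUSTRY = {"43220": "plumbing_hvac", "86230": "dental"}

# Loose UK postcode shape check (assumption: good enough to drop blanks and junk).
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}$")


def eligible_companies(rows, today: date, min_age_years: int = 3):
    """Yield (name, postcode, industry) for rows that pass every filter."""
    for row in rows:
        if row["CompanyStatus"].strip().lower() != "active":
            continue
        try:
            incorporated = datetime.strptime(
                row["IncorporationDate"].strip(), "%d/%m/%Y"
            ).date()
        except ValueError:
            continue  # missing or malformed date
        # Whole years elapsed since incorporation.
        age = today.year - incorporated.year - (
            (today.month, today.day) < (incorporated.month, incorporated.day)
        )
        if age < min_age_years:
            continue
        postcode = row["RegAddress.PostCode"].strip().upper()
        if not POSTCODE_RE.match(postcode):
            continue
        # SIC text is assumed to look like "43220 - Plumbing, heat and ...".
        sic_code = row["SICCode.SicText_1"].split(" ", 1)[0]
        industry = SIC_TO_INDUSTRY.get(sic_code)
        if industry is None:
            continue
        yield row["CompanyName"].strip(), postcode, industry


def stream_file(path: str, today: date):
    """Stream the bulk file row by row instead of loading it into memory."""
    with open(path, newline="", encoding="utf-8") as handle:
        yield from eligible_companies(csv.DictReader(handle), today)
```

Because the function is a generator over rows, memory use stays flat regardless of the size of the export.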
The Open Database of Businesses (ODBus) is Statistics Canada's public extract of the National Business Register. It contains business names, addresses, NAICS industry codes, and employee size class for the majority of Canadian employer businesses. Serava imports this dataset and maps Canadian NAICS codes to our industry taxonomy, supplementing OSM coverage for Canadian provinces including Quebec and the Maritime provinces where OSM tagging density is lower.
The City of Montréal publishes a structured open-data file of all commercially registered businesses operating within the city limits. This registry captures small operators that may not appear in national datasets, adding granular coverage for one of Canada's largest commercial markets. The dataset is imported directly from Montréal's open data portal and geocoded to precise coordinates.
The US Small Business Administration releases historical 7(a) loan data under FOIA. Each record contains the borrower's business name, city, state, NAICS code, loan amount, and the number of jobs retained. Serava cross-references this against existing database entries to append employee estimates and revenue proxies. Because SBA loans skew toward established, owner-operated businesses, this dataset is a strong signal of acquisition-relevant companies.
SAM.gov is the US government's System for Award Management, containing every business registered to receive federal contracts. These registrations include detailed NAICS codes, employee counts, annual revenue, and owner demographic information. Serava cross-references SAM records to enrich matching companies in the database with verified employee and revenue figures sourced from the business's own federal registration.
Raw registry data tells you a business exists and where it is. Enrichment tells you who owns it, how big it is, and whether it has the financial profile worth pursuing. Four enrichment sources run on top of the base database, each filling different fields in a priority order designed to minimize API quota usage.
Machine learning extraction of structured business data from across the web. For each company, Diffbot identifies the most likely owner or key principal, estimates annual revenue from public signals, counts employees from LinkedIn and job boards, and extracts founding year and a business description.
Authoritative contact data and reputation signals from Google's business index. Matches are made by name and coordinates. When a match is confirmed, Google's verified phone number, claimed website, star rating, and review count are written to the company record.
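Matching by name and coordinates could be approximated as below: a candidate is accepted only if its name is similar enough and it lies within a short distance of the stored location. The similarity measure, the 0.8 threshold, and the 150 m radius are assumptions chosen for illustration, not the values Serava uses.

```python
import math
from difflib import SequenceMatcher


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * radius_m * math.asin(math.sqrt(a))


def is_match(
    company: dict,
    candidate: dict,
    max_distance_m: float = 150.0,      # assumed radius
    min_name_similarity: float = 0.8,   # assumed threshold
) -> bool:
    """Accept a candidate only if both the name and the location agree."""
    similarity = SequenceMatcher(
        None, company["name"].lower(), candidate["name"].lower()
    ).ratio()
    distance = haversine_m(
        company["lat"], company["lon"], candidate["lat"], candidate["lon"]
    )
    return similarity >= min_name_similarity and distance <= max_distance_m
```

Requiring both conditions keeps same-named branches in different towns, and unrelated neighbours at the same address, from being merged into one record.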
Free, unlimited SPARQL queries against Wikidata's structured knowledge graph. Useful for notable companies where founder, CEO, employee count, and revenue are recorded as structured facts. Serava queries Wikidata for any company whose name matches a known Wikidata entity.
IRS Form 990 data for US nonprofit organizations, associations, and foundations. ProPublica's API provides total revenue and total assets as reported on the most recent 990 filing. This is particularly useful for industry associations, healthcare nonprofits, and trade schools that appear in the home services and healthcare categories.
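One way to picture the priority ordering across these enrichment sources: walk the sources in a fixed order and call a source only if it could still fill a field that is empty. The source order, field names, and fetch functions in this sketch are hypothetical.

```python
from typing import Callable, Optional

# (source name, fields it can supply, fetch function) in priority order.
Source = tuple[str, list[str], Callable[[dict], dict]]


def enrich(record: dict, sources: list[Source]) -> tuple[dict, list[str]]:
    """Fill empty fields in priority order; return the record and the sources called."""
    merged = dict(record)
    called: list[str] = []
    for name, fields, fetch in sources:
        gaps = [field for field in fields if merged.get(field) is None]
        if not gaps:
            continue  # nothing left for this source to fill, so no quota is spent
        called.append(name)
        result = fetch(merged)
        for field in gaps:
            value: Optional[object] = result.get(field)
            if value is not None:
                merged[field] = value  # earlier sources are never overwritten
    return merged, called
```

A record that already has every field a source can provide never triggers a call to that source, which is where the quota saving would come from.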
Every company in the database receives a composite acquisition fit score between 0 and 100. The score is designed to surface businesses that exhibit the characteristics most correlated with a motivated owner and a clean acquisition process. Higher scores rank to the top of every mandate search by default.
How many years the business has been under the same owner. Longer tenure correlates with accumulated equity, retirement motivation, and clean operational history.
Total operating age of the business. Older businesses have demonstrated survival through multiple economic cycles and carry lower go-forward risk.
When available from Diffbot, SBA, or SAM sources, revenue is scored against the mandate's target range. Businesses near the centre of the range score highest.
Businesses with a verified phone number, website, and Google Places match score higher. This correlates with active operations and reachability for outreach.
Google star rating and review count signal customer satisfaction and market presence. A high rating sustained across a large number of reviews indicates a stable customer base.
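The factors above could be combined along the following lines. This is a sketch under stated assumptions: the weights are invented for illustration, each factor is assumed to be pre-scaled to the range 0 to 1, the revenue curve is a simple linear falloff from the midpoint of the mandate's range, and factors with no data are dropped with the remaining weights renormalized. None of these choices is confirmed by the text.

```python
from typing import Optional

# Hypothetical weights (sum to 100); the real weighting is not published here.
WEIGHTS = {
    "owner_tenure": 30,
    "business_age": 20,
    "revenue_fit": 20,
    "contact_completeness": 15,
    "reputation": 15,
}


def revenue_fit(
    revenue: Optional[float], target_min: float, target_max: float
) -> Optional[float]:
    """1.0 at the midpoint of the target range, falling linearly with distance.

    Assumed shape: 0.5 at either bound, 0.0 once the revenue is a full
    range-width away from the midpoint. Returns None when revenue is unknown.
    """
    if revenue is None:
        return None
    if target_max <= target_min:
        raise ValueError("target_max must exceed target_min")
    midpoint = (target_min + target_max) / 2
    width = target_max - target_min
    return max(0.0, 1.0 - abs(revenue - midpoint) / width)


def composite_score(factors: dict) -> float:
    """Weighted average of the available 0..1 factors, scaled to 0..100."""
    available = {
        name: value
        for name, value in factors.items()
        if name in WEIGHTS and value is not None
    }
    if not available:
        return 0.0
    total_weight = sum(WEIGHTS[name] for name in available)
    weighted = sum(WEIGHTS[name] * value for name, value in available.items())
    return round(100 * weighted / total_weight, 1)
```

Renormalizing over the available factors means a company with no revenue data is ranked on what is known about it rather than being penalized as if its revenue were a poor fit.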
Because multiple data sources cover overlapping geographies, deduplication is applied at the database layer. Each company is identified by a composite key of its normalized name, rounded latitude, and rounded longitude. When the same business appears in both OSM and a national registry, the record is merged rather than duplicated, and any additional fields from the second source are appended to the existing record.
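A minimal sketch of the composite key and merge behaviour described above. The normalization rules and the rounding precision of three decimal places (roughly 100 m of latitude) are assumptions; the text does not specify either.

```python
import re
import unicodedata


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and collapse punctuation and whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return re.sub(r"[^a-z0-9]+", " ", without_accents.lower()).strip()


def dedup_key(name: str, lat: float, lon: float, precision: int = 3) -> tuple:
    """Composite key: normalized name plus rounded coordinates."""
    return (normalize_name(name), round(lat, precision), round(lon, precision))


def merge_records(records: list[dict]) -> list[dict]:
    """Merge records sharing a key; later sources only fill empty fields."""
    merged: dict[tuple, dict] = {}
    for record in records:
        key = dedup_key(record["name"], record["lat"], record["lon"])
        if key not in merged:
            merged[key] = dict(record)
            continue
        existing = merged[key]
        for field, value in record.items():
            if existing.get(field) is None and value is not None:
                existing[field] = value
    return list(merged.values())
```

One limitation of any rounding-based key is that two points a few metres apart can fall on opposite sides of a rounding boundary and receive different keys, so a scheme like this trades some missed merges for a cheap, indexable lookup.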
The database is not a static snapshot. All data sources refresh on independent schedules, and the enrichment layer runs continuously against the companies with the highest acquisition fit scores.
Request access to build a mandate and filter the full database against your acquisition criteria.