
How we test contact data quality
By Ben Argeband, Founder & CEO of Heartbeat.ai
Who this is for
This is for buyers, reviewers, and ops leaders who are validating vendors and need a defensible, repeatable test—not a debate. If you manage recruiter throughput, connectability, deliverability, and compliance risk, this is the pilot structure we use and recommend.
You’ll leave with: (1) locked metric definitions and logging fields, (2) a fair two-week pilot design, and (3) a copy/paste pilot report template you can hand to procurement and leadership.
Quick Answer
- Core Answer: We test contact data quality by running a controlled two-week pilot, logging every attempt with fixed denominators, and separating phone connectability from email deliverability outcomes.
- Key Statistic: Typical results observed by Heartbeat: connect rate ~10% (connected calls / total dials); mobile accuracy 82% (correct first mobiles / first mobiles tested); email accuracy 95% (verification pass: verified emails / emails tested; separate from Deliverability Rate).
- Best For: Buyers, reviewers, and ops leaders validating vendors.
Compliance & Safety
This method is for legitimate recruiting outreach only. Always respect candidate privacy, opt-out requests, and local data laws. Heartbeat does not provide medical advice or legal counsel.
Framework: “Pilot, don’t argue”: measure outcomes instead of debating vendors
Vendor conversations go sideways when each side uses different definitions, different samples, and different logging. Our rule is simple: definitions first, then run a 2-week pilot, then decide based on outcomes you can audit.
- Definitions first: lock formulas, denominators, and required fields before you pull records.
- Separate signals: phone and email fail differently; measure them differently.
- Limits and variability: document what the pilot can’t prove so nobody over-promises.
The trade-off: you spend more time on instrumentation up front, but you stop paying for “accuracy” you can’t operationalize or defend.
Step-by-step method
Step 1: Lock metric definitions (and keep denominators fixed)
If your denominators drift, your pilot becomes a story, not a measurement. Use these canonical definitions and keep them identical across vendors, time windows, and recruiters:
- Connect Rate = connected calls / total dials (per 100 dials). Connected means the call completes to ringing, voicemail system, IVR, or a human (not necessarily a human answer).
- Answer Rate = human answers / connected calls (per 100 connected calls).
- Deliverability Rate = delivered emails / sent emails (per 100 sent emails).
- Bounce Rate = bounced emails / sent emails (per 100 sent emails).
- Reply Rate = replies / delivered emails (per 100 delivered emails).
If you want the extended definitions and examples, keep them centralized and link them into your SOPs: accuracy and metrics definitions.
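To see the fixed denominators in action, here is a small worked example with hypothetical counts (not real pilot data); the numbers exist only to show where each denominator comes from.

```python
# Hypothetical counts to illustrate the fixed denominators (not real pilot data).
total_dials, connected_calls, human_answers = 1000, 100, 30
sent_emails, delivered_emails, bounced_emails, replies = 500, 475, 25, 10

connect_rate = 100 * connected_calls / total_dials            # 10.0 per 100 dials
answer_rate = 100 * human_answers / connected_calls           # 30.0 per 100 connected calls
deliverability_rate = 100 * delivered_emails / sent_emails    # 95.0 per 100 sent emails
bounce_rate = 100 * bounced_emails / sent_emails              # 5.0 per 100 sent emails
reply_rate = 100 * replies / delivered_emails                 # ~2.1 per 100 delivered emails
```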
Step 2: Create a pilot data dictionary (minimum required fields)
Do not start outreach until you can log one row per attempt (dial or email). Here’s the minimum field set that makes results auditable:
- Identity: NPI (preferred), first/last name, specialty (as you use it), state.
- Source tagging: vendor_name, source_batch_id, pull_date.
- Channel attempt: attempt_id, channel (phone/email), timestamp (local time), recruiter_id.
- Phone outcomes: number_type (mobile/landline), dialed (Y/N), connected (Y/N), answered_human (Y/N), disposition (voicemail/IVR/wrong party/opt-out/etc.).
- Email outcomes: sent (Y/N), delivered (Y/N), bounced (Y/N), bounce_type (hard/soft), replied (Y/N), reply_type (positive/neutral/negative/auto-reply).
- Suppression: opted_out (Y/N), do_not_call (Y/N), do_not_email (Y/N), suppression_reason.
Sample CSV header (attempt log): npi,vendor_name,source_batch_id,pull_date,recruiter_id,channel,attempt_id,timestamp_local,number_type,dialed,connected,answered_human,phone_disposition,sent,delivered,bounced,bounce_type,replied,reply_type,opted_out,do_not_call,do_not_email,suppression_reason
If you need a starting point for field naming and reporting, align your pilot fields to your long-term schema: ATS field schema for outreach metrics.
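If you want to sanity-check an export before Day 1, a minimal sketch like the one below can confirm the required columns are present. It assumes a local file named attempt_log.csv (a placeholder path) and the field names from the sample CSV header above.

```python
import csv

# Placeholder path; point this at your pilot export.
ATTEMPT_LOG = "attempt_log.csv"

# Minimum columns from the pilot data dictionary / sample CSV header above.
REQUIRED = {
    "npi", "vendor_name", "source_batch_id", "pull_date", "recruiter_id",
    "channel", "attempt_id", "timestamp_local", "number_type", "dialed",
    "connected", "answered_human", "phone_disposition", "sent", "delivered",
    "bounced", "bounce_type", "replied", "reply_type", "opted_out",
    "do_not_call", "do_not_email", "suppression_reason",
}

with open(ATTEMPT_LOG, newline="") as f:
    header = set(next(csv.reader(f)))   # first row of the CSV is the header

missing = REQUIRED - header
print("Missing fields:", sorted(missing) if missing else "none")
```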
Step 3: Build a fair sample (so vendors are comparable)
Bad samples create fake wins. A fair sample is representative of what you actually recruit and comparable across vendors.
- Anchor identity to NPI where possible so you can dedupe across sources and avoid double-counting.
- Match the mix: same specialties, states, and role types you recruit.
- Pull at the same time: request exports on the same day to reduce decay bias.
- Apply the same suppression rules: remove prior opt-outs, known bad domains, and internal test addresses across all arms.
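As a sketch of the NPI-anchored dedupe, the snippet below stacks two hypothetical vendor exports, tags the source, drops duplicate NPIs within each arm, and reports the overlap between arms. File names and any columns beyond npi are placeholders.

```python
import pandas as pd

# Placeholder file names for two vendor exports pulled on the same day.
vendor_a = pd.read_csv("vendor_a_export.csv", dtype={"npi": str})
vendor_b = pd.read_csv("vendor_b_export.csv", dtype={"npi": str})
vendor_a["vendor_name"] = "vendor_a"
vendor_b["vendor_name"] = "vendor_b"

# Stack the arms and drop duplicate NPIs within each arm so no provider is
# double-counted in a single vendor's sample.
combined = pd.concat([vendor_a, vendor_b], ignore_index=True)
combined = combined.drop_duplicates(subset=["npi", "vendor_name"])

# Overlap report: NPIs present in both arms (useful for a matched comparison).
overlap = set(vendor_a["npi"]) & set(vendor_b["npi"])
print(f"Rows after dedupe: {len(combined)}; NPIs in both arms: {len(overlap)}")
```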
Step 4: Phone testing—what we test (and what we don’t)
Phone quality is not one number. We test:
- Connectability: does the call connect at all (ringing, voicemail system, IVR, or human)? This drives Connect Rate.
- Human answer likelihood: among connected calls, how often do you reach a person? This drives Answer Rate.
- Wrong-party risk: are you reaching the intended provider vs a clinic main line or unrelated person?
- Compliance handling: opt-outs and do-not-call flags are logged and honored.
For Heartbeat.ai specifically, we also evaluate whether we can provide ranked mobile numbers by answer probability. That’s a prioritization signal, not a promise—your cadence and local time windows still matter.
Step 5: Email testing—what we test (and what we don’t)
Email outcomes are a system result: your sending domain + your content + your list + recipient filters. In a pilot, we focus on measurable outcomes you can audit:
- Deliverability Rate and Bounce Rate (per 100 sent emails) to quantify acceptance vs rejection.
- Reply Rate (per 100 delivered emails) to separate list quality from sending volume.
We do not treat “delivered” as “in inbox.” Delivered means accepted by the receiving system; inbox placement is a separate layer.
Step 6: Run a 2-week pilot (controlled execution)
Here’s the structure that keeps pilots honest and fast:
- Setup (before Day 1): finalize definitions, fields, suppression rules, and sampling. Confirm your compliance review path (phone and email).
- Week 1 (baseline): run the same outreach play across Vendor A/B (or A vs current). Keep recruiter behavior consistent.
- Week 2 (repeat): repeat the same play to check stability and variability across days and time windows.
Fairness controls (do not skip):
- Same local-time call blocks and same day-of-week coverage across vendor arms.
- Same caller ID strategy and same number of attempts per record.
- Same email copy, same cadence, same sender domain.
- Same suppression list applied to all arms, with change control if updates are required.
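One way to verify the fairness controls after the fact is a coverage audit: count attempts per vendor by day of week and local-time block and look for gaps. This is a hedged sketch assuming the attempt-log fields from Step 2 and 2-hour call blocks; adjust the block size to your own call windows.

```python
import pandas as pd

# Placeholder path; assumes the attempt-log columns from Step 2.
attempts = pd.read_csv("attempt_log.csv", parse_dates=["timestamp_local"])
phone = attempts[attempts["channel"] == "phone"].copy()
phone["day_of_week"] = phone["timestamp_local"].dt.day_name()
phone["hour_block"] = phone["timestamp_local"].dt.hour // 2 * 2   # 2-hour blocks

coverage = phone.pivot_table(
    index=["day_of_week", "hour_block"],
    columns="vendor_name",
    values="attempt_id",
    aggfunc="count",
    fill_value=0,
)
print(coverage)   # large gaps between columns mean the comparison is not fair
```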
Step 7: Analyze results with denominator discipline (and variance checks)
Report phone and email separately, and keep denominators fixed:
- Phone: Connect Rate (connected calls / total dials, per 100 dials) and Answer Rate (human answers / connected calls, per 100 connected calls).
- Email: Deliverability Rate (delivered / sent, per 100 sent), Bounce Rate (bounced / sent, per 100 sent), Reply Rate (replies / delivered, per 100 delivered).
Measure this by exporting a daily attempt log with one row per dial and one row per email send, then computing each metric from those rows (not from a dashboard summary). Keep a change log for any suppression or cadence updates.
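A minimal sketch of that computation, assuming a pandas DataFrame loaded from the attempt log with the Y/N flags defined in Step 2 (the file path is a placeholder):

```python
import pandas as pd

def pilot_metrics(rows: pd.DataFrame) -> pd.Series:
    """Fixed-denominator metrics from one-row-per-attempt data with Y/N flags."""
    phone = rows[rows["channel"] == "phone"]
    email = rows[rows["channel"] == "email"]
    per_100 = lambda num, den: round(100 * num / den, 1) if den else None
    real_replies = (email["replied"] == "Y") & (email["reply_type"] != "auto_reply")
    return pd.Series({
        "connect_rate": per_100((phone["connected"] == "Y").sum(), (phone["dialed"] == "Y").sum()),
        "answer_rate": per_100((phone["answered_human"] == "Y").sum(), (phone["connected"] == "Y").sum()),
        "deliverability_rate": per_100((email["delivered"] == "Y").sum(), (email["sent"] == "Y").sum()),
        "bounce_rate": per_100((email["bounced"] == "Y").sum(), (email["sent"] == "Y").sum()),
        "reply_rate": per_100(real_replies.sum(), (email["delivered"] == "Y").sum()),
    })

attempts = pd.read_csv("attempt_log.csv")                      # placeholder path
print(attempts.groupby("vendor_name").apply(pilot_metrics))    # one metric row per vendor arm
```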
Decision logic (no thresholds):
- Stability: results should be directionally consistent across both weeks, not a one-day spike.
- Not recruiter-driven: improvements should hold across recruiters, not just one outlier.
- Wrong-party reviewed: spot-check “wins” to confirm you reached the intended person.
- Suppression honored: opt-outs and do-not-contact flags must be enforced across all vendor arms.
Step 8: Document what is not tested (and what is not guaranteed)
- Not guaranteed: future performance. Contact data decays and workflows change.
- Not guaranteed: inbox placement. Delivered means accepted, not necessarily inboxed.
- Not guaranteed: reply rates, interviews, or placements. Those depend on comp, schedule, message, and recruiter execution.
- Not guaranteed: identity match without verification. A connected call can still be wrong-party.
Diagnostic Table:
| Metric | Formula | Denominator | What to log | Common mistakes | Where it shows up |
|---|---|---|---|---|---|
| Connect Rate | connected calls / total dials | Per 100 dials | attempt_id, dialed=Y, connected=Y/N, vendor_name, recruiter_id, local_time_block | Using “answered” as connected; excluding no-answers from dials | Connect Rate vs Answer Rate |
| Answer Rate | human answers / connected calls | Per 100 connected calls | answered_human=Y/N, disposition (voicemail/IVR/wrong party), local_time_block | Dividing by total dials; mixing voicemail with human answers | Answer Rate definition |
| Deliverability Rate | delivered emails / sent emails | Per 100 sent emails | sent=Y, delivered=Y/N, sending_domain, vendor_name, timestamp | Counting opens as delivered; ignoring soft bounces | Metrics definitions |
| Bounce Rate | bounced emails / sent emails | Per 100 sent emails | bounced=Y/N, bounce_type (hard/soft), reason_code, vendor_name | Only counting hard bounces; changing suppression mid-pilot | Reply + deliverability tracking |
| Reply Rate | replies / delivered emails | Per 100 delivered emails | replied=Y/N, reply_type, time_to_reply, vendor_name | Dividing by sent; counting auto-replies as replies | Reply Rate tracking |
Denominator discipline worksheet: For every metric you report, store (1) the numerator event, (2) the denominator event, and (3) the raw attempt log that produced both. If you can’t trace a dashboard number back to attempt rows, don’t use it in vendor selection.
Keep definitions centralized and link out instead of redefining them in every SOP. This prevents “metric drift” across teams and pages.
Weighted Checklist:
Use this to score any vendor (including Heartbeat.ai) during a pilot. Weighting forces trade-offs into the open.
| Category | What “good” looks like | Weight | Score (1–5) | Evidence to attach |
|---|---|---|---|---|
| Definitions + logging | Agrees to your formulas/denominators; supports NPI dedupe; source tagging per record | 25% | | Pilot data dictionary + sample export |
| Phone outcomes | Higher connected calls per 100 dials without inflating wrong-party connections | 25% | | Attempt log with dispositions |
| Email outcomes | Higher delivered per 100 sent; lower bounces; clean suppression guidance | 20% | | Send/deliver/bounce export + suppression rules |
| Workflow fit | Field mapping, audit trail, opt-out handling, easy enrichment into ATS/CRM | 15% | | Mapping doc + screenshots |
| Compliance posture | Clear opt-out support, provenance, and outreach constraints documented | 15% | | Policy docs + DPA + suppression workflow |
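Scoring is simple arithmetic once the evidence is attached; the sketch below shows the weighted roll-up with placeholder scores, not real vendor results.

```python
# Placeholder scores (1–5) for one vendor; replace with your own pilot evidence.
weights = {"definitions_logging": 0.25, "phone_outcomes": 0.25,
           "email_outcomes": 0.20, "workflow_fit": 0.15, "compliance_posture": 0.15}
scores  = {"definitions_logging": 4, "phone_outcomes": 3,
           "email_outcomes": 4, "workflow_fit": 5, "compliance_posture": 4}

weighted_total = sum(weights[k] * scores[k] for k in weights)
print(f"Weighted score: {weighted_total:.2f} / 5")   # 3.90 / 5 with these placeholders
```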
Sample pilot report template (copy/paste)
Header
- Objective: Decide whether Vendor A/B improves phone and/or email outcomes in our recruiting workflow.
- Scope: specialties, states, role types, date range, channels (phone/email).
- Identity key: NPI match rules and dedupe rules.
- Suppression: opt-outs, do-not-call/do-not-email, known bad domains, internal exclusions.
Definitions (paste the exact formulas)
- Connect Rate = connected calls / total dials (per 100 dials)
- Answer Rate = human answers / connected calls (per 100 connected calls)
- Deliverability Rate = delivered emails / sent emails (per 100 sent emails)
- Bounce Rate = bounced emails / sent emails (per 100 sent emails)
- Reply Rate = replies / delivered emails (per 100 delivered emails)
Execution controls
- Call windows (local time): [list your blocks]
- Email cadence: [list your steps]
- Recruiter assignment rules: [how records were distributed]
- Change log: [any changes with date/time]
Results table skeleton
| Metric | Vendor A | Vendor B | Notes (variance, segments, anomalies) |
|---|---|---|---|
| Connect Rate (per 100 dials) | | | |
| Answer Rate (per 100 connected calls) | | | |
| Deliverability Rate (per 100 sent emails) | | | |
| Bounce Rate (per 100 sent emails) | | | |
| Reply Rate (per 100 delivered emails) | | | |
Decision
- What improved: [phone/email/both]
- What did not improve: [be explicit]
- Limits and variability: [what the pilot did not test; what could have biased results]
- Next action: rollout / extend pilot / stop
Outreach Templates:
These are designed for pilots: short, consistent, and easy to compare across vendor arms. Keep content constant so you’re testing data quality, not copywriting.
Phone voicemail (15–20 seconds)
Script: “Hi Dr. [Last Name]—this is [Name] with [Org]. I’m calling about a [role] opportunity in [location]. If you’re open to a quick chat, call or text me at [number]. If not, tell me and I’ll stop.”
Log fields: connected (Y/N), answered_human (Y/N), disposition, wrong_party (Y/N), opted_out (Y/N).
Email #1 (plain text)
Subject: Quick question — [role] in [location]
Body: “Dr. [Last Name], I recruit for [Org]. Are you open to a brief call about a [role] role in [location]? If not, reply ‘no’ and I’ll close the loop.”
Log fields: sent (Y/N), delivered (Y/N), bounced (Y/N), replied (Y/N), reply_type, opted_out (Y/N).
Email #2 (follow-up, 3–5 business days later)
Subject: Re: [role] in [location]
Body: “Bumping this in case it got buried. If you’re not interested, reply ‘pass’ and I won’t follow up again.”
Log fields: same as Email #1; keep cadence identical across vendor arms.
Common pitfalls
- Mixing metrics: treating “connected” and “answered” as the same. Keep Connect Rate (per 100 dials) separate from Answer Rate (per 100 connected calls) or you can’t diagnose the bottleneck.
- Denominator drift: excluding certain dispositions, only counting “qualified” replies, or changing suppression rules mid-pilot without a change log.
- Email pilot without sender hygiene: if SPF/DKIM/DMARC aren’t aligned, you’ll blame the list for a sending problem.
- Wrong-party inflation: counting clinic switchboards or unrelated people as “success.” Spot-check identity using NPI matching and recruiter notes.
- Static list thinking: buying static lists is risky because of decay. The modern standard is Access + Refresh + Verification + Suppression.
Disposition taxonomy (use the same labels across vendors)
- Phone: not_connected, connected_voicemail, connected_ivr, connected_human, wrong_party, opt_out
- Email: delivered, hard_bounce, soft_bounce, reply_positive, reply_neutral, reply_negative, auto_reply, opt_out
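If it helps, one possible mapping from the phone labels to the Y/N flags in the attempt log is sketched below. The wrong_party and opt_out rows are our assumptions (a human answered, but the attempt is not a win and suppression applies), so adjust to your own policy.

```python
# Assumed mapping from phone disposition labels to the attempt-log flags, so
# Connect Rate and Answer Rate stay consistent across recruiters and vendors.
PHONE_DISPOSITION_FLAGS = {
    "not_connected":       {"connected": "N", "answered_human": "N"},
    "connected_voicemail": {"connected": "Y", "answered_human": "N"},
    "connected_ivr":       {"connected": "Y", "answered_human": "N"},
    "connected_human":     {"connected": "Y", "answered_human": "Y"},
    # Assumptions: a wrong-party call did connect and reach a human, but it is
    # flagged separately and never counted as a win; an opt-out is logged as a
    # human answer and immediately added to suppression.
    "wrong_party":         {"connected": "Y", "answered_human": "Y"},
    "opt_out":             {"connected": "Y", "answered_human": "Y"},
}
```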
How to improve results
Improve phone outcomes without just “dialing more”
- Segment by number type: analyze mobile vs landline separately so you can see where connectability breaks.
- Time-window discipline: compare Connect Rate and Answer Rate by local time block and keep blocks consistent across vendor arms.
- Disposition hygiene: train recruiters on consistent dispositions so your metrics remain comparable week to week.
Improve email outcomes by separating list quality from sending quality
- Pre-flight your domain: confirm SPF/DKIM/DMARC alignment and monitor reputation signals.
- Use suppression aggressively: suppress hard bounces, opt-outs, and non-personal role accounts.
- Keep content stable during the pilot: you’re testing data quality first.
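A hedged sketch of that suppression pass is below. It assumes placeholder file names, an email column on the send list (not part of the minimum data dictionary), and example role-account prefixes; swap in your own lists and rules.

```python
import pandas as pd

# Placeholder paths; the "email" column on the send list is an assumed field.
contacts = pd.read_csv("pilot_send_list.csv", dtype={"npi": str})
prior = pd.read_csv("attempt_log.csv", dtype={"npi": str})

hard_bounced = set(prior.loc[prior["bounce_type"] == "hard", "npi"])
opted_out = set(prior.loc[prior["opted_out"] == "Y", "npi"])
ROLE_PREFIXES = ("info@", "admin@", "office@", "frontdesk@")   # example prefixes

keep = (
    ~contacts["npi"].isin(hard_bounced | opted_out)
    & (contacts["do_not_email"] != "Y")
    & ~contacts["email"].str.lower().str.startswith(ROLE_PREFIXES, na=False)
)
contacts[keep].to_csv("pilot_send_list_suppressed.csv", index=False)
print(f"Suppressed {len(contacts) - keep.sum()} of {len(contacts)} records")
```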
Measurement instructions (required)
- Export daily: one row per dial and one row per email send, including vendor_name, attempt_id, timestamp, and outcome fields.
- Compute metrics with fixed denominators:
- Connect Rate per 100 dials = (connected calls / total dials) * 100
- Answer Rate per 100 connected calls = (human answers / connected calls) * 100
- Deliverability Rate per 100 sent emails = (delivered emails / sent emails) * 100
- Bounce Rate per 100 sent emails = (bounced emails / sent emails) * 100
- Reply Rate per 100 delivered emails = (replies / delivered emails) * 100
- Variance check: compute each metric by day and by recruiter; investigate outliers before making a vendor decision.
- ROI linkage (optional): keep contact-quality metrics separate from funnel outcomes (screens/submittals/interviews). If you want a structured approach, see how to measure contact data ROI.
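For the variance check above, the sketch below recomputes Connect Rate by vendor and day and by vendor and recruiter straight from attempt rows; the file path is a placeholder and the same pattern applies to the other metrics.

```python
import pandas as pd

# Placeholder path; the same pattern works for the email metrics.
attempts = pd.read_csv("attempt_log.csv", parse_dates=["timestamp_local"])
phone = attempts[attempts["channel"] == "phone"].copy()
phone["date"] = phone["timestamp_local"].dt.date

def connect_rate(g: pd.DataFrame) -> float:
    dials = (g["dialed"] == "Y").sum()
    return round(100 * (g["connected"] == "Y").sum() / dials, 1) if dials else float("nan")

print(phone.groupby(["vendor_name", "date"]).apply(connect_rate))          # flag one-day spikes
print(phone.groupby(["vendor_name", "recruiter_id"]).apply(connect_rate))  # check recruiter outliers
```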
Legal and ethical use
Run pilots with the same compliance posture you’d use in production.
- Phone: follow applicable rules and internal policy for outreach, especially for mobile numbers. Maintain opt-out and do-not-call handling. Reference: FCC TCPA overview.
- Email: include required identification elements, honor opt-outs promptly, and avoid deceptive headers/subjects. Reference: FTC CAN-SPAM guide.
- Opt-out precedence: suppression takes priority over any vendor-provided contact. If someone opts out, suppress them across all vendor arms.
- Do not: bypass opt-outs, harass candidates, or use the pilot as a pretext to spam.
Evidence and trust notes
This page is part of our trust cornerstone. For how we think about sourcing, verification, suppression, and reporting standards across the platform, see our trust methodology hub.
Implementation Notes
- Monitor sending reputation: use Google Postmaster Tools to watch domain reputation and delivery signals during pilots.
- Sender authentication basics: confirm SPF is set correctly (and align DKIM/DMARC as applicable). Reference: Google Workspace Admin: Set up SPF.
- Email compliance context: CAN-SPAM compliance guide.
- Phone compliance context: FCC TCPA overview.
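As a pre-flight convenience (not a substitute for a proper deliverability review), a small sketch like this can confirm an SPF TXT record exists on your sending domain; it uses the third-party dnspython package, and example.com is a placeholder.

```python
# Requires the third-party dnspython package (pip install dnspython).
import dns.resolver

domain = "example.com"   # placeholder: use your actual sending domain
answers = dns.resolver.resolve(domain, "TXT")
spf = [r.to_text().strip('"') for r in answers if r.to_text().startswith('"v=spf1')]
print(spf if spf else f"No SPF record found for {domain}")
```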
If you’re comparing vendors, use a structured rubric so you don’t accidentally reward “easy-to-count” metrics over workflow outcomes. See how to evaluate provider contact data vendors and data quality verification.
FAQs
What’s the difference between phone connectability and phone answer outcomes?
Connectability is whether the call connects at all (Connect Rate = connected calls / total dials, per 100 dials). Answer outcomes are what happens after connection (Answer Rate = human answers / connected calls, per 100 connected calls). You need both to diagnose the bottleneck.
What does “delivered” mean for email?
Delivered means the receiving system accepted the message (Deliverability Rate = delivered emails / sent emails, per 100 sent emails). It does not guarantee inbox placement.
Can a vendor guarantee contact data quality?
No. Contact data changes constantly, and outcomes depend on your outreach execution and compliance constraints. A credible vendor will show how they measure, refresh, verify, and suppress—then let you validate in your workflow.
How do we run a fair 2-week pilot across two vendors?
Pull samples at the same time, dedupe by NPI where possible, apply the same suppression rules, keep outreach content and cadence constant, and log one row per attempt with consistent dispositions. Then compare metrics by day and by recruiter.
Which metrics should procurement care about?
Procurement should care about metrics that map to operational outcomes and can be audited: Connect Rate (per 100 dials), Answer Rate (per 100 connected calls), Deliverability Rate and Bounce Rate (per 100 sent emails), and Reply Rate (per 100 delivered emails). Pair those with documented limits and variability.
Next steps
- Copy the pilot data dictionary and add the fields to your ATS/CRM (or a pilot spreadsheet). Start with ATS field schema for outreach metrics.
- Run the sample pilot report template above and attach your attempt logs so the result is defensible.
- If you want to test Heartbeat.ai in your workflow, start here: create a Heartbeat.ai account.
About the Author
Ben Argeband is the Founder and CEO of Swordfish.ai and Heartbeat.ai. With deep expertise in data and SaaS, he has built two successful platforms trusted by over 50,000 sales and recruitment professionals. Ben’s mission is to help teams find direct contact information for hard-to-reach professionals and decision-makers, providing the shortest route to their next win. Connect with Ben on LinkedIn.