
The bots in your verified pool.

A patient signup count said 74. The real number was 61. The difference was sitting in the contacts everyone already trusted.

The pile everyone checks, and the pile nobody does

We were brought into a specialty launch brand’s CRM to sort out attribution — a block of verified contacts had no recorded origin, so they couldn’t be tied to a property or an audience. The spam folder was not the concern. The patient form had historically run 80 to 85 percent spam at the raw level, and that junk was being caught and suppressed. The system, everyone agreed, was working.

It was working on the pile everyone checks. Screening the verified pool — the contacts already labeled clean, the ones feeding the signup counts reported upward — turned up launch-window bots that had been classified as verified patients more than a year earlier. In the batch we were attributing, 13 of 74 “patient” signups were bots. The real count was 61.

The first question on any CRM account is not “how much spam do we have.” It is “what is hiding in the records we already trust.”

Origin and quality are different questions

Where a contact came from and whether it is real are independent axes. A bot that arrived through the patient form still legitimately originated at the patient site — its origin field is correct. What is wrong is its quality tier. The fix for spam lives in the tier field, never the origin field.

Collapsing these two into one bucket is the default error in CRM cleanups, and it is the reason most of them don’t hold. Keep origin and quality as separate fields with separate rules, and both stay auditable. Merge them and every future question — “how many real patients came from the brand site” — becomes unanswerable.
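A minimal sketch of what keeping the two axes apart might look like in code. The field names, enum values, and helper functions here are illustrative assumptions, not the schema of any particular CRM.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    # Hypothetical origin values for illustration.
    PATIENT_SITE = "patient_site"
    UNBRANDED_SITE = "unbranded_site"
    HCP_FORM = "hcp_form"


class Tier(Enum):
    # Hypothetical quality tiers for illustration.
    VERIFIED = "verified"
    SUPPRESSED_BOT = "suppressed_bot"


@dataclass
class Contact:
    email: str
    origin: Origin  # where the record entered; a spam cleanup never edits this
    tier: Tier      # whether the record is real; the only field a cleanup changes


def retier_as_bot(contact: Contact) -> None:
    """Mark a record as a bot without touching its origin."""
    contact.tier = Tier.SUPPRESSED_BOT


def count_real(contacts: list[Contact], origin: Origin) -> int:
    """Count verified (real) contacts that entered through one origin."""
    return sum(1 for c in contacts if c.origin is origin and c.tier is Tier.VERIFIED)
```

Because a re-tiered bot keeps its origin, a question like "how many real patients came from the patient site" stays a two-field filter rather than a guess.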

The order of operations is the method

Anyone can write a spam filter. The part that determines whether the numbers survive scrutiny is the sequence: screen, then re-tier, then attribute, then reconcile.

Screen the entire verified set first, not just the batch you were asked about. The full-set pass is what catches bots already sitting in the good pool; here it caught four more that nobody was looking for. Re-tier and suppress them before anything else touches them. Attribute origin only to the survivors: stamping origin onto a record you are about to suppress is wasted motion that briefly inflates the very number you are fixing. Then reconcile to a known total, because the reconciliation is the proof of completeness.

108 unattributed “verified” contacts, fully resolved

  Unbranded site      21
  HCP form            13
  Patients, real      61
  Bots, re-tiered     13

Reconciled: 21 + 13 + 61 + 13 = 108. The reconciliation is the proof.
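The screen, re-tier, attribute, reconcile sequence can be sketched as a single pass that refuses to finish unless the buckets sum to a known total. This is an illustration under stated assumptions, not the tooling used on the engagement: the dict keys (`id`, `tier`, `origin`), the `is_bot` and `infer_origin` callables, and the bucket names are all hypothetical.

```python
from collections import Counter


def run_cleanup(contacts, is_bot, infer_origin, expected_total):
    """Screen, re-tier, attribute survivors, then reconcile to a known total.

    contacts: list of dicts with 'id', 'tier', and 'origin' keys (hypothetical schema).
    is_bot / infer_origin: caller-supplied functions (hypothetical).
    expected_total: count taken independently, e.g. from the CRM filter itself.
    """
    # 1. Screen every record before anything else touches the set.
    bot_ids = {c["id"] for c in contacts if is_bot(c)}

    buckets = Counter()
    for c in contacts:
        if c["id"] in bot_ids:
            # 2. Re-tier: only the quality field changes, origin is left alone.
            c["tier"] = "suppressed_bot"
            buckets["bots_retiered"] += 1
        else:
            # 3. Attribute origin to survivors only.
            if not c.get("origin"):
                c["origin"] = infer_origin(c)
            buckets[c["origin"] or "unresolved"] += 1

    # 4. Reconcile: every record lands in exactly one bucket, none unresolved.
    total = sum(buckets.values())
    if total != expected_total or buckets["unresolved"]:
        raise ValueError(
            f"reconciliation failed: {dict(buckets)} sums to {total}, "
            f"expected {expected_total}"
        )
    return buckets
```

Run against the batch described above, the returned buckets would be 21, 13, 61, and 13, with `expected_total=108`. Any other total raises instead of quietly reporting a number.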

What pharma form spam actually looks like

Pharma form spam is industry-wide, not a sign your program is uniquely targeted, and it is more patterned than most teams expect. The signatures worth screening for: last names that are the first name plus a short all-uppercase suffix; keyboard-mash email locals; placeholder identities; and the behavioral cluster of a hard bounce, a signup date inside a launch window, and zero engagement since.

No single pattern catches everything. The name rule misses the junk emails and the junk-email rule misses the fabricated names, so run both passes. And exposure varies enormously by design: an open patient DTC form sees a different world than an HCP-gated or rep-driven system, and platform mechanics differ across CRMs. The honest version of this method names that range.
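As a rough sketch, the signatures above can be expressed as independent rules that each add a reason code, so the name pass and the email pass both run on every record. The thresholds are assumptions chosen for illustration: "short suffix" is read as 2 to 4 uppercase letters, "keyboard mash" as five or more consecutive consonants, and the placeholder list and contact field names are hypothetical.

```python
import re
from datetime import date

# Assumption: a "short" suffix means 2 to 4 uppercase letters.
SUFFIX_RE = re.compile(r"[A-Z]{2,4}")
# Assumption: 5+ consecutive consonants in the email local part suggests keyboard mash.
MASH_RE = re.compile(r"[bcdfghjklmnpqrstvwxz]{5,}")
# Hypothetical placeholder identities; a real list would come from your own data.
PLACEHOLDERS = {"test", "asdf", "none", "na"}


def screen(contact, window_start: date, window_end: date) -> list[str]:
    """Return the reason codes that fired for one contact (empty list = nothing fired)."""
    reasons = []
    first = contact["first_name"].strip()
    last = contact["last_name"].strip()
    local = contact["email"].split("@", 1)[0].lower()

    # Last name is the first name plus a short all-uppercase suffix.
    if first and last.lower().startswith(first.lower()) and SUFFIX_RE.fullmatch(last[len(first):]):
        reasons.append("name_suffix")
    # Keyboard-mash email local part (crude heuristic, expect false positives).
    if MASH_RE.search(local):
        reasons.append("mash_email")
    # Placeholder identity in either name field.
    if first.lower() in PLACEHOLDERS or last.lower() in PLACEHOLDERS:
        reasons.append("placeholder")
    # Behavioral cluster: hard bounce + launch-window signup + zero engagement.
    if (contact["hard_bounce"]
            and window_start <= contact["signup_date"] <= window_end
            and contact["engagement_count"] == 0):
        reasons.append("bounce_launch_no_engagement")
    return reasons
```

Returning reason codes rather than a single yes or no keeps the screen auditable: a reviewer can see which rule flagged a record and tune one heuristic without disturbing the others.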

Bad data is a capture problem wearing a cleanup costume

Every bad record traces upstream to something: a form with no bot protection, hidden attribution fields that got stripped in a rebuild, UTM capture that quietly broke. A cleanup without a capture fix just resets the clock — the same contamination accumulates again, on a schedule you can predict.

The capture-side work is unglamorous and decisive. Use score-based bot protection rather than image challenges: parts of your patient population cannot reliably pass visual puzzles, which makes this an accessibility requirement in pharma, not a preference. Add hidden fields that carry attribution at submit, and automations that stamp origin the moment a contact is born, so no record starts life unattributed.
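One way to picture the capture-side rule is a record builder that cannot produce a contact without an origin. Everything named here is an assumption for illustration: the hidden-field names, the tier labels, the per-form `form_origin` setting, and a bot score on a 0.0 to 1.0 scale where higher means more likely human. Check the convention and threshold guidance of whichever scoring service you use.

```python
from datetime import datetime, timezone

# Hypothetical hidden-field names carried by the form at submit.
ATTRIBUTION_FIELDS = ("utm_source", "utm_medium", "utm_campaign")


def create_contact(form_data, form_origin, bot_score, min_score=0.5):
    """Build a CRM record at submit time with origin stamped at creation.

    form_origin: the property the form lives on, configured per form (assumption).
    bot_score: 0.0-1.0 from a score-based service, higher = more human (assumption).
    min_score: illustrative threshold, not a recommended value.
    """
    attribution = {f: form_data.get(f, "") for f in ATTRIBUTION_FIELDS}
    return {
        "email": form_data["email"],
        "origin": form_origin,  # never blank: set from form config, not from the visitor
        "tier": "unscreened" if bot_score >= min_score else "suspected_bot",
        "attribution": attribution,
        # Empty hidden fields are recorded, so a rebuild that strips them shows up fast.
        "attribution_gap": [f for f, v in attribution.items() if not v],
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
```

Taking origin from the form's own configuration rather than from submitted data means a stripped hidden field degrades campaign detail but never leaves the origin blank.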

The deliverable is the honest number

What the client needed was not a clean list or a dashboard. It was a defensible sentence: 61 real signups, not 74, and here is the screen and the reconciliation behind it. In pharma the CRM has to carry that weight, because HIPAA keeps identifiable people out of analytics — your web numbers are directional, and the CRM is the conversion source of truth. If the source of truth is inflated, every downstream number inherits the inflation.

One practical note that saves real pain: when a corrected baseline is about to surface, brief your media partners first. A spam correction that lands unannounced reads as a performance drop, and you will spend a quarter explaining it.

Five checks for your own account

  1. Count the verified contacts with no origin. Filter your verified tiers for a blank source field. That number should be zero, and it should stay zero.
  2. Screen the whole verified pool, not just new arrivals. The dangerous contamination is old, trusted, and already inside your reported totals.
  3. Audit the capture path. Bot protection on every form, hidden attribution fields present, and both confirmed to have survived the last form rebuild.
  4. Confirm origin and quality are separate fields. If one field is doing both jobs, your next cleanup will not hold either.
  5. Pressure-test the reported number. If your verified count deflated 15 to 20 percent tomorrow, which numbers already reported upward would change — and who should hear it from you first?
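The first and last checks reduce to a filter and a subtraction. A small sketch, assuming contacts are exported as dicts with hypothetical `tier` and `origin` keys:

```python
def unattributed_verified(contacts, verified_tiers=("verified",)):
    """Check 1: verified contacts whose origin is blank. The target length is zero."""
    return [
        c for c in contacts
        if c["tier"] in verified_tiers and not (c.get("origin") or "").strip()
    ]


def deflation(reported, bots_found):
    """Check 5: corrected count and the share of the reported number that was bots."""
    corrected = reported - bots_found
    return corrected, bots_found / reported


corrected, share = deflation(reported=74, bots_found=13)
print(corrected, f"{share:.1%}")
```

For the case described in this piece, 13 bots in a reported 74 leaves 61, a deflation of about 17.6 percent, which sits inside the 15 to 20 percent range the fifth check asks you to rehearse.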

A pharma CRM is a measurement asset, not a marketing list. Treat the numbers in it as claims to verify, not facts to report.
