19 Appendix A — Canonical Values & Methodology
Appendix A · Key Metrics Reference
This appendix documents every canonical value used in the analysis, the methodology behind each, and reconciliation notes where values differ from earlier drafts. Its purpose is to let a reviewer verify any figure in the report against its source computation.
Account count distinction. Two counts exist for the UK segment: sales-side (3,914 accounts with at least one sales row) and full-pool (3,940 accounts including those with only orphan cancellation rows). The sales-side count applies where the analysis references accounts that placed an order. The full-pool count applies where it references the segment universe. The net-revenue identity holds under either framing.
19.0.1 A.1 — Cleaning rules applied
The cleaning rules applied to the raw UCI Online Retail Dataset are documented in full in appendix_a/01_cleaning_rules.md. In summary:
- ‘C’ prefix credit notes net against gross revenue at the segment and account level (aggregate netting, not invoice-level matching).
- ‘A’ prefix adjustments (3 rows) are excluded entirely from revenue.
- Negative quantities on non-‘C’ invoices are excluded.
- Zero or negative unit prices are excluded.
- Non-product stock codes (
POST,DOT,D,M,CRUK,BANK CHARGES,AMAZONFEE,PADS,S,SAMPLES, and strayDCGSSGIRL/DCGSSBOY) are excluded from product revenue. Postage is excluded from product revenue. - Flagged accounts 16446 and 12346 are quarantined from segment totals.
Country = "Unspecified"is not part of UK or International; rows withCustomerIDpresent andCountry = "Unspecified"form a separate Identified-Unspecified bucket (4 accounts).- December 2011 (partial month, dataset ends 2011-12-09) is included in full-period totals but flagged where trend comparability matters. H1→H2 growth uses a comparable Jul–Nov H2 window.
- Duplicate rows (5,268 exact duplicates across all 8 columns) are removed.
- Segment assignment: UK =
CustomerIDpresent ANDCountry = "United Kingdom"; International =CustomerIDpresent ANDCountry ∉ {"United Kingdom", "Unspecified"}; Anonymous =CustomerIDmissing.
19.0.2 A.2 — Segment revenue and counts
| Metric | UK | International | Anonymous |
|---|---|---|---|
| Gross revenue | £7.00M | £1.50M | £1.51M |
| Cancellations | £194,657 | £31,491 | £4,165 |
| Net revenue | £6,804,018 | £1,465,432 | £1,506,585 |
| Cancellation rate | 2.78% | 2.10% | n/a (invoice-level) |
Account count (distinct customer_id) |
3,914 | 414 | 1,371 invoices |
Total gross revenue (all buckets): £10.25M = UK gross (£7.00M) + International gross (£1.50M) + Anonymous gross (£1.51M) + Identified-Unspecified gross (£2.66K) + Quarantined gross (£245.66K). This identity ties cleanly; mixing net and gross components (as earlier drafts did) does not.
Identified-Unspecified bucket: This bucket contains four accounts with £2,661 gross. Reported separately per cleaning Rule 7. Quarantined accounts (16446 and 12346 combined): raw gross £245,656; raw net £2.90. See Appendix B.
19.0.3 A.2a — Segment scope: sales-side vs full-pool counts
Two customer/invoice counts are used in this series, answering different questions. Both are derived from the canonical cleaned dataset per Rule 1 (aggregate netting at segment scope).
| Segment | Sales-side count (placed at least one order) | Full-pool count (sales + orphan-cancel activity) | Orphan-cancel delta | Orphan-cancel value |
|---|---|---|---|---|
| UK Identified | 3,914 customers | 3,940 customers | +26 customers | £3.69K in cancellations |
| International Identified | 414 customers | 416 customers | +2 customers (IDs 12605, 12666) | £234.94 in cancellations |
| Anonymous | 1,371 invoices | 1,410 invoices | +39 invoices | £4,165.45 in cancellations |
Why two counts. Under Rule 1 aggregate netting, segment net revenue is computed as segment gross minus every cancellation attributed to the segment — including cancellations whose originating sales row falls outside the dataset window or is otherwise absent from the cleaned sales frame. These rows are legitimate segment-attributable credit notes: they reduce segment net revenue and, in a full-pool view, raise the segment’s customer/invoice count. Per Rule 1 they are not dropped.
What these orphan cancellations represent. The transaction data alone cannot determine which case any individual orphan cancellation is. Several possibilities are consistent with the data:
- The originating sale was placed before the dataset window opened (December 2010), and the credit note was issued within the window. The dataset begins December 2010, so any pre-window sales rows are not visible.
- The originating sale was captured without a CustomerID and later credited under a CustomerID — common for retail-floor or phone-order workflows where the customer is identified only at the return.
- The credit note was logged against the wrong customer ID or wrong invoice reference (data entry error).
- The originating sale was archived or pruned from this dataset extract before the cleaned data was produced.
These are candidate explanations, not findings. A pre-close due diligence audit should verify which of these is operationally accurate by reconciling against the seller’s order management system.
When each count applies.
- Sales-side count applies where narrative or analysis references accounts that placed an order — retention curves, cohort analysis, per-account LTV, frequency tiers, overdue-customer detection. These computations require at least one sales row per customer, so orphan-cancel customers are out of scope by definition.
- Full-pool count applies where narrative or analysis references the segment universe or the segment net revenue base — segment totals, concentration percentages, Pareto curves, segment-level cancellation rates. Excluding orphan-cancel rows in these contexts would understate segment net revenue and segment universe size.
Net-revenue identity. The segment-level gross − cancel = net identity holds under either framing because cancel is always the full segment-attributable total (Rule 1 scope). Per-customer aggregation affects only the account count, not the segment total.
Cross-references.
- Rule 1 is defined in
appendix_a/01_cleaning_rules.md(and mirrored in thecleaning_rules.mdaudit document at the project root). - Canonical values live in
segment_gross_canonicalwithinR_helpers/ltv_helpers.R. - In [Chapter 13 (Concentration)](s4_concentration.qmd), the UK concentration section renders the full-pool count (3,940) with a footnote pointing back to this subsection. All other UK-account citations across the report (narrative text in the Overview, Customer Base Profile, Retention & Growth, and segment program chapters) use the sales-side count (3,914) because they describe accounts with order activity.
19.0.4 A.3 — AOV (order value)
AOV is the median or mean invoice value in the segment.
| Segment | Median AOV | Mean AOV |
|---|---|---|
| UK | £298.58 | n/a (reported in source file) |
| International | £391.34 | n/a (reported in source file) |
| Anonymous | £361.97 | £1,101.93 |
19.0.5 A.4 — Retention and acquisition
| Metric | UK | International |
|---|---|---|
| Accounts with ≥1 order | 3,914 | 414 |
| Accounts with ≥2 orders | 2,562 | 262 |
| 1→2 retention rate | 65.46% | 63.29% |
| One-timer count | 1,352 | 152 |
| One-timer rate | 34.54% | 36.71% |
New-customer acquisition trend (UK, Jan–Aug 2011): Monthly counts ran 356 → 340 → 419 → 277 → 255 → 214 → 168 → 141. Literal Jan→Aug decline: 60.39% (356 → 141). Peak-to-trough (Mar 2011 peak of 419 → Aug 2011 trough of 141): 66.35%.
19.0.6 A.5 — Temporal (H1 vs H2 2011)
| Period | Gross (identified) |
|---|---|
| H1 2011 (Jan–Jun) | £3,278,981 |
| H2 2011 comparable (Jul–Nov) | £4,306,958 |
| H2 2011 full (Jul–Dec 9, asymmetric) | £4,651,168 |
H1→H2 growth (comparable window Jul–Nov): 31.35%.
19.0.7 A.6 — Concentration
| Metric | Value |
|---|---|
| International segment total net revenue | £1,465,432 |
| Top-4 combined net | £649,645 |
| Top-4 % of International segment net | 44.33% |
| Top-10 % of International segment net | 52.25% |
| Top-4 account IDs (descending net) | 14646, 14911, 12415, 14156 |
| UK Pareto-80 (accounts to reach 80% of net) | 1,115 |
| International Pareto-80 | 84 |
| Anonymous Pareto-80 (invoice-level) | 371 |
19.0.8 A.7 — 3-year LTV (chain model, Approach A)
Formula:
LTV_3yr = AOV_median × orders_per_customer_median × (1 + r + r²)
where AOV_median is the segment median invoice value, orders_per_customer_median is the segment median of distinct invoices per customer, and r is the 1→2 retention rate.
| Segment | AOV_median | orders_per_customer (median) | r | 3-yr LTV |
|---|---|---|---|---|
| UK | £298.58 | 2 | 65.46% | £1.24K |
| International | £391.34 | 2 | 63.29% | £1.59K |
LTV sensitivity note. The chain LTV formula uses median orders per customer as the per-year order volume (UK: 2, International: 2). Using mean orders per customer would produce materially different figures — UK mean orders is higher than the median because of a long tail of high-frequency wholesale accounts. The median-based formulation is preferred for portfolio-readable figures; the mean-based alternative is appropriate if the target reader is pricing acquisition economics against top-decile accounts. Earlier drafts of this series cited a £3,680 figure from a steady-state model — that figure is not reproduced here because it was a methodological outlier.
Incremental LTV per converted account (Y2+Y3): UK £646.75; International £808.78.
19.0.9 A.8 — Data quality
| Metric | Value |
|---|---|
| Anonymous invoices (orders missing CustomerID) | 1,371 |
| Share of all product invoices | 6.93% |
| Gross revenue attached | £1.51M |
| Products with cancel rate > 100% (flag-for-review) | 5 stock codes |
| Raw rows (pre-cleaning) | 541,909 |
| Product sales rows after cleaning | 522,686 |
| Product cancellation rows after cleaning | 8,670 |
Rows removed by cleaning rules (by rule): Rule 9 (deduplication) 5,268; Rule 2 (‘A’ adjustments) 3; Rule 3 (negative qty non-‘C’) 1,336; Rule 4 (non-positive price) 2,496; Rule 5 (non-product sales) 2,191; Rule 5 (non-product cancels) 581.
19.0.10 A.9 — Reconciliation notes
The following figures differ from values used in earlier drafts of this series. The values below are computed from the cleaned dataset and are the authoritative reference:
- UK gross revenue: earlier drafts cited £7.26M; computed £7.00M. The earlier figure included postage (POST + DOT); the current scope excludes non-product codes from product revenue.
- UK account count: earlier drafts cited 3,916; computed 3,914. Accounts 16446 and 12346 are quarantined per cleaning Rule 6.
- International account count: earlier drafts cited 418; computed 414. Four accounts with
Country = "Unspecified"are separated into the Identified-Unspecified bucket per cleaning Rule 7. - International median AOV: earlier drafts cited £385.06; computed £391.34. Follows from the Rule 7 narrowing.
- H1→H2 revenue growth: earlier drafts cited +29.5%; computed 31.35%. The comparable H2 window (Jul–Nov) is the defensible basis; the earlier figure may have used a different window.
- UK Jan→Aug 2011 new-customer decline: earlier drafts cited 79%; computed 60.39% (literal Jan→Aug) or 66.35% (peak Mar→Aug). The earlier 79% is not reproducible from in-year data.
- UK one-timer count: earlier drafts cited 1,353; computed 1,352. Follows from the account-count delta.
- Top-4 International concentration: earlier drafts cited 43.92%; computed 44.33%. Follows from the Identified-Unspecified exclusion narrowing the International segment.
- UK 3-year LTV: earlier drafts cited values ranging from £1,115 to £3,680; computed £1.24K. The chain-model formulation is documented in §A.7.
- International 3-year LTV: earlier drafts cited £1,290 or £3,680; computed £1.59K.
Appendix C · Methodology and Metric Definitions
This appendix documents the analytical methods, metric definitions, and known limitations of this series. Its purpose is to allow a reviewer to verify any figure in the series against its source computation and to understand where analytical assumptions were required.
19.0.11 C.1 — Scope and Data Source
All figures in this series derive from the UCI Online Retail Dataset: a transaction record covering December 1, 2010 through December 9, 2011. The dataset contains 8 fields per transaction: InvoiceNo, StockCode, Description, Quantity, InvoiceDate, UnitPrice, CustomerID (where captured), and Country (where captured).
Non-product line items excluded from all analyses: POST, DOT, M, BANK CHARGES, PADS, AMAZONFEE, D, S, SAMPLES, SAMPLE, and transactions with matching description strings (postage, dotcom postage, manual, bank charges, amazon fee, cruk commission, next day carriage, samples).
Cancellation transactions: identified by InvoiceNo strings beginning with “C”. Cancellation revenue = abs(Quantity x UnitPrice) for cancellation rows. Net revenue = gross revenue minus cancellation value.
19.0.12 C.2 — Metric Definitions
Gross revenue: sum(Quantity x UnitPrice) for all non-cancellation, non-excluded transactions in scope.
Net revenue: gross revenue minus the sum of cancellation credit note values attributed to the same scope.
Cancellation rate: sum(cancellation credit note values) / gross revenue, expressed as a percentage. Rates above 100% indicate prior-period credit notes issued within the analysis window against orders placed before the window start.
Median AOV (order-level): median(order-level gross revenue) where each order is the sum of all line items on a single InvoiceNo. This is the primary AOV benchmark used throughout — it is not the mean of line-item prices.
Top-N concentration: sum(net revenue of top N accounts ranked by net revenue) / total segment net revenue.
1 to 2 retention: count(accounts with 2 or more orders) / count(accounts with 1 or more orders), within segment.
Reorder interval: median(days between consecutive distinct order dates) for accounts with 2 or more orders. Uses distinct order dates to avoid counting same-day split invoices as separate reorder events.
CV (coefficient of variation): (standard deviation of unit prices / mean unit price) x 100, computed across all transactions for a given SKU.
19.0.13 C.3 — LTV Methodology
Three LTV models are used in this series. The chain model is the base case for all program gain calculations and portfolio totals.
Chain model (base case): LTV_3yr = AOV_median × orders_per_customer_median × (1 + r + r²), where r is the 1→2 retention rate. Full formulation in Appendix A §A.7. UK chain 3-year LTV: £1.24K. International chain 3-year LTV: £1.59K. Earlier drafts of this series cited £1,115 UK and £1,290 International as targets; the computed values are authoritative per the reconciliation in Appendix A §A.9.
Steady-state model (optimistic ceiling): Uses mean(n_orders > 5)^(1/5) as the steady-state retention rate. Appropriate only as a ceiling sensitivity — assumes the business reaches long-run retention stability, which cannot be confirmed from a 13-month window. UK steady-state 3-year LTV: approximately £2,990. International steady-state 3-year LTV: approximately £3,680.
Probability-weighted model (pessimistic floor): Expected value computation weighted by probability of reaching each order. UK probability-weighted 3-year LTV: approximately £355. International: approximately £574.
Decision rule: All program gain calculations, portfolio totals, and CAC ceilings in this series use the chain model. Steady-state and probability-weighted figures appear only in sensitivity analyses, clearly labeled.
19.0.14 C.4 — Program Gain Calculations
Day-30 follow-up program gain: (n_customers x delta_pp x incremental_forward_LTV_per_pp). Delta_pp is the assumed retention improvement (10 percentage points in base scenario). Incremental forward LTV per pp is computed from the chain model sensitivity function.
Overdue recovery gain: n_overdue_accounts x recovery_rate x segment_LTV. Recovery rate assumption: 50%. LTV input: chain model 3-year figure.
Lapsed reactivation gain: n_lapsed_accounts x reactivation_rate x segment_LTV. Reactivation rate assumption: 10%. LTV input: chain model 3-year figure.
Combined gain: simple sum of individual program gains. This is an upper bound — it assumes no interaction effects between programs and no cannibalization of the same accounts across programs.
19.0.15 C.5 — Known Limitations
Single-year observation window. This dataset covers 13 months. Multi-year retention inference is not supported. LTV projections assume observed retention rates persist — ownership transition effects, competitive changes, and market dynamics are not modeled.
Cohort observation truncation. Q3 and Q4 2011 acquisition cohorts had insufficient time to return before the December 9 data cutoff. One-time rates for these cohorts are not interpretable as true retention failures — customers may return in 2012. Do not use Q3/Q4 2011 cohorts as LTV model inputs.
Anonymous segment LTV. The anonymous segment LTV cannot be computed directly from transaction data — no CustomerID means no retention tracking. Program gain estimates for anonymous conversion use identified-segment LTV as a proxy, which overstates converted-account value to the extent that anonymous customers are episodic purchasers rather than relationship customers.
Regional LTV small samples. Regions with fewer than 10 repeat accounts (British Isles: 12 accounts, Africa: 1 account) produce high-variance per-account LTV estimates. Treat as directional only.
Cancellation rate interpretation. Rates above 100% indicate prior-period credit notes and do not reflect the current period’s order quality. These are flagged explicitly in the Priority 0 queue.
Non-product line item consistency. POST, DOT, and related non-product codes are excluded from all segment analyses. However, inclusion/exclusion treatment may vary slightly across the 23 parts of this series due to the analysis being built sequentially. Minor figure discrepancies traceable to this source are documented in C.6.
19.0.16 C.6 — Reconciliation Notes
Three LTV figures per segment. The series references three different LTV figures for each segment. These are not errors — they reflect three different models (chain, steady-state, probability-weighted) as documented in C.3. The chain model is always the base case. Where a figure differs from the chain model, the model name is stated explicitly.
AOV figures. Two AOV figures are used: order-level median (primary benchmark, used in LTV models) and order-sequence-weighted median (used in wallet share analysis). The order-level median for UK is £298.58 (International £391.34). The order-sequence-weighted figure varies by order position and is lower on average due to first-order and single-order customer weighting. Both are correct on their own terms — the difference is documented where it appears.
Regional account totals. The international segment contains 414 identified accounts after the Identified-Unspecified bucket (4 accounts, Country = "Unspecified") is separated per cleaning Rule 7. Regional breakdowns may not sum to 414 due to 1 account classified as European Community. The segment total of 414 is authoritative; regional subtotals are informational. Note: earlier drafts of this series cited 418 accounts — that figure included the Identified-Unspecified bucket.
19.0.17 C.7 — Research Questions Addressed in This Series
The acquisition question breaks into a set of business-health and acquisition-specific sub-questions that this series answers. The table below maps each question to the section of the series that addresses it.
| # | Category | Question | Addressed in |
|---|---|---|---|
| Q1 | Business health | Revenue concentration risk — how dependent is the business on its top customers and products? | Chapter 13 (Concentration); Chapter 21 (Appendix C — Products) |
| Q2 | Business health | Demand seasonality — when does revenue concentrate, and is the Q4 dependence a structural vulnerability? | Chapter 2 (Overview); Chapter 21 (Appendix C — Products) |
| Q3 | Business health | Customer base health — is the business growing and retaining, or churning through one-time customers? | Chapter 6 (Growth & Retention) |
| Q4 | Business health | Product performance — which products are the real revenue drivers net of cancellations, and what does near-term demand look like entering Q4? | Chapter 21 (Appendix C — Products); Chapter 8 (Wallet & Momentum) |
| Q5 | Business health | Segment differences — are there customer populations with materially different behavior requiring different management approaches? | Chapters 3–5 (Customer Base Profile) |
| Q6 | Business health | Growth composition — is revenue growth coming from new customers, returning customers, or both? | Chapter 6 (Growth & Retention) |
| Q7 | Acquisition | Is this business growing or contracting — and is growth sustainable without intervention? | Chapter 2 (Overview); Chapter 6 (Growth & Retention) |
| Q8 | Acquisition | Is the customer base earning at its commercial potential? | Chapters 9–12 (Customer Lifetime Value) |
| Q9 | Acquisition | What are the specific, quantified revenue improvement opportunities by segment and tier — and what would it cost to execute them? | Chapter 11 (Scenarios); Chapters 16–18 (Post-Close Action Plan) |
| Q10 | Acquisition | What are the material risks an acquirer must price in — and which are resolvable post-close? | Chapter 2 (Overview); Chapters 13–15 (Risk Assessment) |
| Q11 | Surfaced by the data | What does 6.93% of all orders having no CustomerID mean for an acquirer, and what does recovering that intelligence unlock? | Chapter 18 (Resolution Program) |
| — | Supporting analysis | Pricing consistency across segments | Chapter 14 (Cancellations & Pricing) |
| — | Supporting analysis | Cancellation quality — concentration by account or product | Chapter 14 (Cancellations & Pricing) |
| — | Supporting analysis | Dead stock risk — products with zero orders in trailing 12 months | Chapter 22 (Appendix D — Dead Stock) |
| — | Supporting analysis | Reorder predictability for LTV validation | Chapter 7 (Order Cadence) |
| — | Supporting analysis | International geographic concentration and transferability | Chapter 5 (International Landscape) |
19.1 Appendix D · Customer Segment Assignment Validation
This appendix validates the segment assignment methodology. Some customers place orders from both UK and international addresses — this section identifies those customers, quantifies the revenue involved, and confirms the primary-segment assignment rule used throughout the report.
Validation result: Clean. No identified customer placed orders from both UK and international addresses in this dataset. Every CustomerID maps to a single geographic segment — UK Identified or International Identified — with no cross-segment ambiguity. The segment assignment methodology requires no adjustment.
Author: Shawn Phillips | Lailara LLC
← Anonymous Order Resolution Program | Appendix B — Account Investigation Briefs →