# Memorygram

## The Family-Memory Foundation Model

**Heirloom · Technical Whitepaper v0.1**
**April 2026 · Joe Lewis Jr., Founder**

---

## Executive Summary

Every foundation model shipped to date was trained on public data: scraped web pages, licensed books, archived forums, open code repositories. None was trained on what families actually remember. The voices of grandparents. The stories passed across generations. The wisdom that matters in kitchens, at bedsides, in cars on the way home from the hospital. That corpus has never existed as a trainable asset. It does not exist in any dataset that can be bought, scraped, or licensed. It exists only inside families, and until now, nowhere else.

**Memorygram is the first AI foundation model being trained specifically to understand how families preserve voice, stories, and wisdom across generations.** It is built by Heirloom, a memory-preservation platform with a proprietary multi-modal corpus of consent-verified family data, growing daily through an explicit opt-in framework designed from the ground up for ethical training.

**Memorygram is built as augmented intelligence, not autonomous AI.** The dominant industry bet is scale toward general reasoning: systems designed to converge on human-level capability across every domain at once. Memorygram takes the opposite shape. Families do not need a model that knows everything. They need one that knows *this* family. The goal is not to automate the work of being someone's grandchild. It is to make sure no grandparent's voice is ever lost because the grandchild was too busy. Every architectural choice in this document, from the training corpus to the ethics framework to the deployment path, follows from that bet.

This document describes the model, the data asset, the technical architecture, the ethics framework, and the commercial opportunity. Memorygram is, we believe, the most defensible proprietary AI asset that can be built in the consumer memory space — and the category of "vertical foundation models" that OpenAI validated with GPT-Rosalind in April 2026 is the defining AI category of the next five years.

**Key targets**

| Target | Milestone | Timeline |
|---|---|---|
| Data rights infrastructure | Consent flow shipped | Q2 2026 (shipped) |
| v0 preview | Opus 4.7 → Llama 3.1 8B distillation | Q3 2026 |
| v1 production | Multi-modal fusion, Sage-integrated | Q4 2026 |
| Edge deployment | On-device inference for private mode | 2027 |
| Licensing partners | Hospice, grief tech, federal, academic | 2027–2028 |

**The ask.** Heirloom is raising seed capital and engaging partner firms for compute, distillation, and fine-tuning infrastructure. Target first training run cost: $30–50k, largely covered by startup credit programs already applied for.

---

## 1. Thesis: Vertical Foundation Models Are the Defining AI Category of 2026

In April 2026, OpenAI released GPT-Rosalind, a frontier reasoning model built specifically for biology, drug discovery, and protein engineering. It was the first time a major AI lab shipped a domain-native frontier model under its flagship brand. This was not a fine-tune. It was a foundation model trained from the start for a single vertical, with proprietary data partnerships, a specialized evaluation framework, and an acknowledgment that general-purpose models have hit a ceiling in domains where the data matters more than the compute.

That category now exists. It has a name, a playbook, and a validated market. The 2026 race belongs to founders who can plant a flag in an uncontested vertical before a general-purpose lab decides it matters.

**Family memory is the highest-value uncontested vertical.** Here is why:

1. **The training data does not exist anywhere else on earth.** Ancestry has family trees but zero voice, zero living story. ElevenLabs has voices but no personality, relationships, or context. StoryWorth has written memoirs but no multi-modal fusion. 23andMe has DNA but no qualitative memory. No general-purpose foundation model has ever seen a corpus of multi-generational family voice + photo + text + relationship graph. The raw material has never been assembled.

2. **The data compounds.** Families using Heirloom today capture memories daily. The corpus grows as a function of user-months, not as a function of crawl scale. This produces a network effect that defeats competitors with more compute. OpenAI cannot train GPT-Family. They have no family-scale data pipeline.

3. **The moat is defensible by construction.** Family data is the most sensitive data users will ever trust to a platform. The consent framework required to train on it is itself a two-year engineering investment. Any entrant arriving later starts from zero on both data and trust.

4. **The vertical has natural commercial adjacencies.** Hospice care, grief tech, family history research, academic longevity research, federal VA memorial programs, and private wealth legacy planning all need what a family-memory model does. There is a licensing market downstream of every good checkpoint.

Memorygram is what gets trained when a founder with a living, growing, consent-verified family memory corpus sits at the center of this category convergence.

---

## 2. The Data Asset

### 2.1 What the Heirloom corpus contains

Heirloom's production data (as of April 2026) includes, across the active beta cohort and growing:

- **Voice captures.** Voice memos, voicemails, and cloned-voice reference audio. Multi-speaker, multi-generational, consent-uploaded.
- **Photographic memory.** Family photos with vision-analyzed scene, era, faces, emotions, and narrative captions. Sourced from smartphones and scanned archives.
- **Written stories.** First-person reflections, memoir chapter drafts, and captured moments in free-form text. Every moment is tagged to a person in the family tree.
- **Relationship graphs.** Parent-child, spouse, and sibling edges across up to six generations per family. Augmented by FamilySearch imports (ancestor research).
- **Trait and theme annotations.** Inherited traits, family traditions, values, places, events. Derived through AI enrichment and human-confirmed.
- **Health heritage data.** Opt-in longitudinal family health patterns, useful for generational wellness research.

This is a six-dimensional corpus (voice × image × text × graph × trait × health) that, to our knowledge, exists nowhere else in any form suitable for model training. It is not harvested. It is contributed.

### 2.2 Why this corpus cannot be replicated

Three structural reasons:

**Data acquisition.** Building an equivalent corpus from scratch requires consumer trust, sustained engagement, multi-year capture windows, and deep consent infrastructure. An entrant starting today would need approximately four years to reach parity with the corpus Heirloom already has — during which Memorygram would already be shipping.

**Distribution advantage.** Heirloom's founder, Joe Lewis Jr., operates The Lewis Show, a distribution platform with 449,000 followers across Instagram, Facebook, and TikTok, 93% of whom are women aged 25–44, the sandwich-generation ideal customer profile (ICP). This is the most emotionally resonant and commercially valuable demographic for family memory capture. This distribution cannot be purchased; replicating it requires 12+ years of audience building.

**Consent as moat.** Memorygram is training on explicit opt-in data under a documented consent framework with auditable opt-out, deletion, and retention guarantees (Section 4). Any competitor racing to catch up without equivalent infrastructure will run into a regulatory wall. Under emerging state-level AI training disclosure laws (California AB 3030-class precedent, New York SHIELD amendments, EU AI Act Article 28B implementation), non-consented training is an existential legal risk.

### 2.3 Growth trajectory

Heirloom launched publicly on March 1, 2026, into a pre-existing audience of 449k followers, with an estimated 1–3% direct signup conversion on the founder's owned content. Target scale:

| Milestone | Active families | Voice minutes | Written moments | Corpus multiplier vs. v0 baseline |
|---|---|---|---|---|
| Launch (Mar 2026) | 100–500 | 1k–5k | 5k–25k | 0.01× |
| v0 training cutoff (Q3 2026) | 1,000–5,000 | 20k–100k | 50k–250k | 1× baseline |
| v1 training cutoff (Q4 2026) | 10,000+ | 200k–1M | 500k–2.5M | 10× |
| 2027 end-of-year | 50,000+ | 1M+ | 5M+ | 50×+ |

Every family who opts in to training consent (default off, explicit on) becomes a persistent contributor. No churn scenario we have modeled eliminates the compounding effect of the corpus. Memorygram gets smarter with time in a way no competitor can match.

### 2.4 The consent framework as a tradable asset

Training consent on `family_members.training_consent` (default false, explicit opt-in, audit-logged via `decision_records`) is not a feature. It is a first-class legal artifact. In an acquisition, the consent framework survives. Any acquirer inherits the promises made to contributing families, and the data retention rules that protect them.

This is the same legal construct Anthropic uses to protect its "no customer data in training" commitment. For Heirloom, the construct inverts: "only explicitly opted-in customer data, anonymized and aggregated." Either way, the framework is the asset, not an overhead.

---

## 3. Technical Architecture

### 3.1 Base model selection

Memorygram will be built on **Llama 3.1 8B** (or Llama 3.2 when stable for fine-tuning) as the initial base. Rationale:

- **Open weights.** Meta's license permits commercial use and derivative training for organizations under 700M MAU (Heirloom qualifies indefinitely). Full ownership of the trained weights.
- **Size-to-quality ratio.** 8B parameters is the sweet spot for fine-tuning with available compute budget while maintaining general reasoning capacity. Larger (70B) would require compute we do not yet have; smaller (1–3B) loses too much baseline knowledge.
- **Edge deployment viable.** 8B models can be quantized (4-bit, 8-bit) and run on modern smartphones. This matters for the Sage Speaks private mode use case where family voice data must never leave the device.
- **Active ecosystem.** Llama 3.1 8B has mature fine-tuning tooling across Together AI, Fireworks, Modal, Anyscale, and Hugging Face. Switching cost is low if we discover a better base.
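The edge-deployment claim above can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes weights dominate the footprint and applies a flat 20% overhead for activations and KV cache; both numbers are illustrative assumptions, not measurements of any specific runtime.

```python
def model_memory_gb(n_params: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    """Rough RAM footprint for model weights, with ~20% overhead
    for activations and KV cache (illustrative, not a benchmark)."""
    weight_bytes = n_params * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# An 8B-parameter model at common precisions:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{model_memory_gb(8e9, bits):.1f} GB")
```

At 4-bit the estimate lands under 5 GB, which is why 8B is roughly the upper bound for modern-smartphone inference, while 16-bit weights alone would not fit.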

Secondary evaluation: Mistral Small 3 and Qwen 2.5 7B will be benchmarked against Llama 3.1 8B on family-narrative tasks in Q3 2026. Final base model selection for v1 will depend on evaluation results, not a priori preference.

### 3.2 Distillation from Claude Opus 4.7

The first phase of Memorygram training is **distillation**: generating high-quality responses from Claude Opus 4.7 (Anthropic's most capable model, already integrated into Heirloom production in April 2026) on anonymized Heirloom vault data, and training Memorygram to reproduce those responses at a fraction of the compute cost.

Distillation pipeline:

1. **Anonymized sampling.** Extract consented moments, voice transcripts, and tree relationships. Scrub identifying details (names replaced with role tokens: `<mother>`, `<grandchild_7>`, etc.) via deterministic anonymization.
2. **Prompt generation.** For each sample, generate family-memory tasks: story-continuation, reminiscence-in-character, relationship-inference, trait-extraction, memoir-chapter-synthesis.
3. **Opus 4.7 response generation.** Run tasks through Claude Opus 4.7 with task-specific system prompts matching Heirloom's production Sage persona.
4. **Quality filtering.** Score each response with a smaller judge model (Claude Haiku 4.5) against a rubric focused on family-memory fidelity. Discard bottom 20%.
5. **Fine-tuning corpus.** Output: a ~1M-example supervised fine-tuning dataset, purely synthetic but grounded in real (anonymized) family patterns.
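Step 4 of the pipeline reduces to a rank-and-cut over judge scores. A minimal sketch, assuming each example already carries a scalar `score` from the judge model (the dict shape and field names here are hypothetical, not the production schema):

```python
def filter_bottom_quintile(examples):
    """Keep the top 80% of judged examples, mirroring the
    quality-filtering step: rank by judge score, discard the rest."""
    ranked = sorted(examples, key=lambda ex: ex["score"], reverse=True)
    cutoff = int(len(ranked) * 0.8)
    return ranked[:cutoff]

batch = [{"id": i, "score": s} for i, s in enumerate([0.9, 0.4, 0.7, 0.2, 0.8])]
kept = filter_bottom_quintile(batch)
print([ex["id"] for ex in kept])  # highest-scoring 4 of 5 survive
```

In practice the cut could also be an absolute rubric threshold rather than a fixed quantile; the quantile form shown here matches the "discard bottom 20%" framing.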

Estimated cost: $8–15k in Opus 4.7 inference. Covered entirely by the $10k Anthropic Startup credit already targeted.

### 3.3 Fine-tuning approach

Supervised fine-tuning (SFT) on the distilled corpus using LoRA adapters at first (fast iteration, cheap), graduating to full-parameter fine-tuning for v1. Tooling: Together AI's fine-tuning API or Modal + PyTorch FSDP for full control.
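The LoRA-first choice is largely a parameter-count argument: an adapter pair trains `r × (d_in + d_out)` weights per matrix instead of `d_in × d_out`. A back-of-envelope sketch (hidden size 4096 and rank 16 are illustrative of an 8B-class setup, not Memorygram's final config):

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one LoRA adapter pair
    (A: d_in x rank, B: rank x d_out)."""
    return rank * (d_in + d_out)

d = 4096                      # hidden size, illustrative
full = d * d                  # full-parameter update for one weight matrix
adapter = lora_params(d, d, rank=16)
print(f"full: {full:,}  adapter: {adapter:,}  ratio: {full / adapter:.0f}x")
```

Two orders of magnitude fewer trainable parameters per matrix is what makes v0 iteration cheap; full-parameter fine-tuning for v1 trades that savings for quality headroom.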

Estimated cost: $15–25k in compute for v0. Covered by Together AI + Modal credits.

Evaluation during training: held-out validation set of 10k family-memory tasks, benchmarked nightly against Claude Sonnet 4.6 (cheaper reference) and Opus 4.7 (quality ceiling). Success criterion: within 15% of Opus 4.7 quality on the validation set at 1% of the per-token cost.

### 3.4 Multi-modal fusion (v1)

v0 is text-only. v1 adds:

- **Voice encoder.** Fine-tuned Whisper-large-v3 on Heirloom voice corpus, producing voice-DNA embeddings (personality + vocal features together) that condition Memorygram generation.
- **Vision encoder.** A family-photo-tuned CLIP variant (or Claude Opus 4.7 vision as a teacher in early phases — see April 2026 vision pilot data).
- **Graph encoder.** Relationship-graph embedding over `family_members` + `family_relationships`, so Memorygram can reason about who-is-related-to-whom when generating.
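The who-is-related-to-whom reasoning the graph encoder must support is, at its simplest, edge composition over the relationship graph. A minimal sketch using a hypothetical child-to-parents mapping (a simplified stand-in for `family_relationships`, not the production schema):

```python
def grandparents(person, parent_of):
    """Derive grandparent edges by composing parent edges twice:
    the grandparents of X are the parents of X's parents."""
    return {gp for p in parent_of.get(person, set())
               for gp in parent_of.get(p, set())}

parent_of = {
    "Ava":   {"Maria", "James"},   # hypothetical family
    "Maria": {"Rose", "Henry"},
}
print(grandparents("Ava", parent_of))  # Rose and Henry, via Maria
```

The trained encoder learns an embedding of this structure rather than running symbolic traversal, but any fusion design has to preserve exactly these compositional facts.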

Fusion architecture: cross-attention layers between modalities at the middle of the transformer stack, similar to Flamingo and IDEFICS-3 approaches.

Estimated cost: $30–50k additional for v1 multi-modal training in Q4 2026.

### 3.5 Evaluation framework

Memorygram will be the first model evaluated on **family-memory-specific benchmarks**, none of which yet exist in public form. Heirloom will construct and release:

- **HeirloomBench-Voice**: 1,000 tasks matching voice-personality to new unseen questions. Measures whether a cloned grandfather's answer sounds like him.
- **HeirloomBench-Thread**: 500 multi-turn conversations tracking family references across sessions. Measures whether Memorygram remembers who is whose grandmother.
- **HeirloomBench-Memoir**: 200 memoir-chapter generation tasks scored by consented family members on faithfulness and emotional resonance.
- **HeirloomBench-Consent**: adversarial tests verifying Memorygram refuses to surface memories from non-consenting vaults.

HeirloomBench will be published as an open benchmark after v0 release, accompanying an academic-adjacent paper, contributing to the field and inviting scrutiny.

---

## 4. Ethics and Consent Framework

Memorygram is the first foundation model built under the assumption that training data is a contract, not a crawl.

### 4.1 Opt-in is explicit; the default is off

The `training_consent` flag defaults to `FALSE` on every new family account. No moment is eligible for training until the vault owner explicitly opts in from the Settings panel. The opt-in copy is plain-language, names Memorygram specifically, and links to this whitepaper.
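The gate reduces to a default-false flag filtered at corpus-assembly time. The dataclass below is an illustrative stand-in for the `family_members` row, not the production Supabase schema:

```python
from dataclasses import dataclass

@dataclass
class FamilyMember:
    """Illustrative mirror of a `family_members` row: training
    consent is opt-in, so the flag defaults to False."""
    person_name: str
    training_consent: bool = False

def training_eligible(members):
    """Only explicitly opted-in members ever reach the training corpus."""
    return [m for m in members if m.training_consent]

vault = [FamilyMember("Owner"), FamilyMember("Grandma", training_consent=True)]
print([m.person_name for m in training_eligible(vault)])  # only "Grandma"
```

The important property is that eligibility is computed from the flag at every run, so a newly created account contributes nothing until someone acts.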

### 4.2 Opt-out is honored across time

Toggling off `training_consent` immediately excludes all of a family's data from future training runs. For already-completed runs, individual-level unlearning is computationally intractable; instead, Heirloom commits to an annual retraining cycle that excludes any user who has opted out since the prior cycle, subject to operational feasibility.

### 4.3 Anonymization pipeline

Before any content enters the training corpus:

- Names are replaced with relational role tokens.
- Dates are offset by a per-family random delta (preserving relative order, obscuring specifics).
- Specific place names are generalized (`Boston` → `<Northeastern US city>`).
- Voice samples are processed through a speaker-scrambler layer (pitch/formant shift) before embedding extraction when training the voice encoder. Actual voice clones never train the encoder directly.
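The first two transformations above can be sketched in a few lines. The role-token mapping and the sha256-based per-family delta are illustrative choices showing the required properties (deterministic, relative-order-preserving), not the production implementation:

```python
import hashlib
from datetime import date, timedelta

ROLE_TOKENS = {"Maria": "<mother>", "Sam": "<grandchild_1>"}  # hypothetical mapping

def family_date_delta(family_id: str, max_days: int = 365) -> timedelta:
    """Deterministic per-family offset: same family id, same shift,
    so the relative order of a family's events is preserved."""
    h = int(hashlib.sha256(family_id.encode()).hexdigest(), 16)
    return timedelta(days=h % max_days)

def anonymize(text: str, when: date, family_id: str):
    """Replace names with role tokens and shift the date by the family delta."""
    for name, token in ROLE_TOKENS.items():
        text = text.replace(name, token)
    return text, when + family_date_delta(family_id)

story, shifted = anonymize("Maria drove Sam to school.", date(1998, 5, 4), "fam-42")
print(story)  # "<mother> drove <grandchild_1> to school."
```

Real name scrubbing would need NER rather than a static dictionary, and place generalization is a separate pass; the sketch only fixes the determinism contract.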

### 4.4 Deletion guarantees

Upon account deletion request, Heirloom commits to:
- Immediate removal of all raw moments from active databases.
- Exclusion of the user from all future training runs.
- A cryptographic deletion log written to `decision_records` for audit.
- Exclusion from the next scheduled retrain cycle.

### 4.5 No surveillance, no sale

Memorygram data will never be used to:
- Profile users for advertising.
- Sell inferences to third parties.
- Train models for non-Heirloom commercial products without explicit per-family re-consent.

These commitments are contractual and survive acquisition. Any successor entity must re-consent every participating family before changing data use.

---

## 5. Roadmap

### Q2 2026 — Data rights foundation (shipped April 2026)

- `training_consent` column on `family_members`, default false, audit-logged.
- Opt-in toggle shipped in Heirloom Settings modal.
- Privacy policy rewritten with Memorygram-specific data rights section.
- Compute credit applications submitted (Anthropic Startup, Together AI, Modal, Google Cloud for Startups, AWS Activate).
- Partner firm outreach (Together AI, Fireworks, Modal) initiated.

### Q3 2026 — v0 preview

- Distillation corpus generated (~1M examples) from Opus 4.7.
- Fine-tuning run on Llama 3.1 8B via preferred partner firm.
- Internal HeirloomBench-v0 defined and Memorygram scored against Claude Sonnet and Opus.
- Founding Families (first 100 opted-in families) receive early access to a Memorygram-powered beta feature.

### Q4 2026 — v1 multi-modal

- Voice encoder fine-tuned on Heirloom voice corpus.
- Cross-attention multi-modal fusion layers trained.
- Sage Speaks (voice-cloned family conversations) migrated from ElevenLabs + Opus 4.7 to ElevenLabs + Memorygram v1.
- First HeirloomBench paper drafted.

### 2027 — Edge deployment and licensing

- Quantized (4-bit) Memorygram distilled for on-device inference.
- Sage Speaks private mode launches: intimate family conversations never leave the phone.
- Licensing pilot with one hospice network, one grief tech startup, and one academic longevity research group.
- HeirloomBench paper published; open-source components released under a custom license that preserves the consent framework.

### 2028+ — The moat matures

- Memorygram becomes the default AI layer for family memory across Heirloom product surfaces.
- Federal licensing opportunities pursued (VA memorial programs, Department of Defense family history initiatives).
- Acquisition discussions with strategic buyers (see Section 7) become meaningful.

---

## 6. Go-to-Market and Licensing Strategy

Memorygram has three revenue surfaces:

**1. Heirloom consumer product (primary, today).**
Memorygram powers premium features in Heirloom's Family ($9.99/mo) and Legacy ($24.99/mo) tiers. Sage Speaks, memoir chapter generation, and Sage Discovers (proactive insights) all eventually run on Memorygram rather than rented APIs. Target: 10,000 paying families by end of 2027, $1.5–3M ARR.

**2. Licensed API for adjacent verticals (mid-term, 2027).**
Therapists, hospice providers, grief-care apps, family history institutions, and private wealth legacy firms all need AI that understands family memory and cannot legally use general-purpose models on their clients' family data. Memorygram becomes the HIPAA-appropriate family memory API. Target: 3–5 paying licensees in 2027 at $50k–$250k annual contract values.

**3. Federal and institutional deployments (long-term, 2027–2028).**
Per April 2026 reporting, federal agencies are accelerating frontier-model adoption. Specific fits for Memorygram:
- Department of Veterans Affairs memorial programs and family outreach.
- Library of Congress / National Archives oral history initiatives.
- Department of Defense family preservation programs for deployed service members.
- State-level preservation programs for historically marginalized communities' oral histories.

Federal contracts are large, slow, and defensible. A single five-year VA contract could meaningfully change Heirloom's valuation trajectory.

---

## 7. The Opportunity

### 7.1 Why this is worth $500M+

Four comparables define the valuation envelope for a proprietary AI asset with a defensible vertical data moat:

| Company | Category | Last known valuation | Key asset |
|---|---|---|---|
| Character.AI | Consumer AI with user-generated conversation corpus | $5B (2024) | User interaction corpus + model |
| ElevenLabs | Voice AI | $3.3B (2024) | Voice model + customer voice library |
| Ancestry | Genealogy data | $4.7B (2020 sale) | Historical records dataset |
| 23andMe | Genetic data | $1.4B (2025, post-crash) | DNA corpus |

Memorygram's defensibility profile is closer to Ancestry + Character.AI combined: a unique multi-modal proprietary corpus that grows through a product network effect. A serious acquirer in the $500M–$2B range would be paying for:

1. The trained model (technical asset).
2. The live data pipeline (continuing training signal).
3. The consent infrastructure (legal asset).
4. The Lewis Show distribution channel (go-to-market asset).
5. The founder's 12-year audience relationship (trust asset).

### 7.2 Acquirer map

**Tier 1 — strategic fit, most likely.**
- **Apple.** Needs proprietary family-memory AI to differentiate Siri, Apple Memories, and iCloud. Refuses to train on customer data. Memorygram is the only ethical source. Apple has never acquired a model company at this size; this could be its first.
- **Ancestry.com.** Needs to pivot from static records to active AI. Memorygram is the missing voice + story layer on top of Ancestry's tree data. Strategic + defensive (blocks a competitor from getting the corpus).

**Tier 2 — strategic fit, possible.**
- **Google (YouTube + Family Link + Gemini).** Family-memory is the emotional layer Gemini cannot credibly build alone. Google DeepMind's Gemini Robotics-ER 1.6 also indicates Google's appetite for multi-modal vertical models.
- **Amazon (Echo Show + Kindle).** The Echo Show is a family memory device without a family memory model. Memorygram plugs directly into the gap.

**Tier 3 — not strategic but high signal.**
- **Hallmark.** Distribution fit, brand fit, but likely can't afford the deal.
- **Private equity with family-tech thesis.** Genealogy PE firms that rolled up Ancestry-adjacent properties.

The most valuable outcome is not necessarily acquisition. A successful IPO at $1B+ on the back of Memorygram as a category-defining AI asset is the stronger founder outcome, and we will pursue it in parallel with strategic conversations.

---

## 8. Team and Next Steps

### Current team

- **Joe Lewis Jr. — Founder, CEO.** Solo builder. 12 years of audience development. Non-technical founder using AI-native development stack (Claude Code, Opus 4.7, Anthropic API, Netlify, Supabase). Product is live, paying customers exist, distribution is active.

### Planned additions (2026)

- **ML research partner firm** (Q2–Q3 2026). Together AI, Fireworks AI, or Modal as the infrastructure partner and initial fine-tuning collaborator. Target structure: compute credits + minority Memorygram warrants for skin-in-game alignment.
- **ML engineering advisor.** Part-time senior ML engineer retained 5–10 hours/week for architectural review and benchmark design. Target: a known researcher from the Llama / Mistral / Anthropic alumni network.
- **Advisory board.** 3–5 advisors spanning AI ethics, consumer product, genealogy industry, and capital markets. Compensated in equity.

### Immediate next steps (next 30 days)

1. Publish this whitepaper on `/memorygram` as the public v0.1 reference.
2. Close compute credit applications ($65k+ target).
3. Execute first partner-firm conversations with equity-for-compute structure proposed.
4. Raise pre-seed or seed capital specifically against the Memorygram asset ($500k–$1.5M target).

---

## Appendix A: Related Work

- OpenAI. *GPT-Rosalind: A Frontier Reasoning Model for Biology, Drug Discovery, and Protein Engineering.* April 2026.
- Anthropic. *Claude Opus 4.7.* April 2026.
- Meta. *The Llama 3 Herd of Models.* 2024.
- Alayrac, J.B. et al. *Flamingo: a Visual Language Model for Few-Shot Learning.* 2022.
- Project CETI. *Contextual and combinatorial structure in sperm whale vocalizations.* Nature Communications, 2024.
- ElevenLabs. *Voice Cloning Documentation.* 2025.

---

## Appendix B: Glossary

- **Corpus.** The total training data asset. For Memorygram, the consented, multi-modal vault of opted-in Heirloom families.
- **Distillation.** Technique where a smaller model is trained to mimic the outputs of a larger model, capturing quality at a fraction of the inference cost.
- **Fine-tuning.** Additional training applied to a pre-trained base model to specialize it for a domain or task.
- **LoRA.** Low-Rank Adaptation — a parameter-efficient fine-tuning technique that trains a small adapter instead of modifying the full model.
- **Memorygram.** Heirloom's proprietary family-memory foundation model. Not a product name exposed to end users directly (who experience it as "Sage"); an asset name for technical, legal, and commercial contexts.
- **Multi-modal fusion.** Training a single model to reason jointly across multiple data types (voice, image, text, graph).
- **Owner record.** The `family_members` row with `person_name = 'Owner'` that represents an account holder's primary identity. The training-consent flag is stored here.
- **Sage.** The family-facing AI persona in Heirloom products. Powered by Claude today; by Memorygram v1 from Q4 2026.
- **Vertical foundation model.** A frontier AI model trained specifically for a single domain rather than general-purpose use. GPT-Rosalind (biology) and Memorygram (family memory) are the canonical 2026 examples.

---

## Appendix C: Contact

- **Joe Lewis Jr.** — hello@tryheirloom.family
- **Heirloom** — https://tryheirloom.family
- **Memorygram public page** — https://tryheirloom.family/memorygram
- **The Lewis Show** — 449,000 followers across Instagram, Facebook, TikTok

---

*This whitepaper is a living document. v0.1 reflects the state of Memorygram planning as of April 2026. Major updates will be versioned and dated. For partnership, licensing, or investor conversations, contact the founder directly at the email above.*

*© 2026 Heirloom. This document may be shared in full for the purpose of commercial discussions, press coverage, investor review, academic reference, and partner firm evaluation. Re-distribution with modification is not permitted.*
