Internal plan. Design document for review, not a distribution copy. Draft, subject to change. Some infrastructure identifiers have been redacted.


Student data sync & backfill: one-time catch-up plan

Storage model v2 (the rules) + backfill of 211→293 children + standing intake. Live build/run status is in §12. Updated 2026-06-14.

Status: decisions administrator-approved 2026-06-13 (D1–D19; Hoa Sữa/D12 in scope; student_code is the single id; journey_id retired, D19 override). Build + first live run done 2026-06-13 overnight (see §12). 9/11 build items shipped, Track C OCR live (130 docs), 211 v2 handbooks regenerated, N6 re-key proven on real data; Track G + B/K/D/E staged supervised.

⭐⭐ 2026-06-14: TRACK G (Step 5) CODE + LIVE CUTOVER DONE (see §12.15). The journey_id→student_code re-key shipped end-to-end: committed a709996 (branch claude/care-trackG-rekey), full refactor of all 27 care tools + app/main.py/app/hub.py + 7 hub templates + every test file (pytest 775 green, ruff clean), and the supervised live cutover is complete: the live care DB is re-keyed (journey_id dropped, 311 children, FK-clean, funding 164/2.62B intact, verified no clobber), the survey DB is migrated (129 candidates), and the hub (00119) + care-dashboard (00033) + care-jobs are redeployed on the new code with schedulers resumed. Only the deferred Phase-2 bucket-FILE recode remains (rename Ho-so/journey-NNNN.md→<code>.md folders + rewrite benefit.ledger_path; reads work today via the unchanged paths).

2026-06-13 session 2 (see §12.1): N12 shipped + applied live: all 24 held OCR files resolved (15 essays filed school-scoped, 9 consolidated held by design), source_file written 130→161, 0 ambiguous left; Track D split-name parser shipped; the data-consumer audit (step 8) run, detail-parity patched, remaining punch-list recorded; journey-0028 name-order fix applied to the DB (Drive folder rename deferred).

2026-06-13 session 3 (see §12.2): Track B (CETB 2025-26 essays) DONE: all 6 schools' per-student originals filed, source_doc kind=student 1→138, counts unchanged (+2 fixes: op-aware skip, idempotent copy).

2026-06-13 session 4 (see §12.3): Track K (school funding agreements) DONE: built N11 care_benefit_extract + filed all 6 schools' BB-tài-trợ live: benefit 162→342 (+180 coverage rows, amount_vnd=NULL), register kind=funding 6→12, source_doc +6; 1.325 billion VND; 1 cross-school transfer (Mai Tiến Vũ) resolved.

2026-06-13 sessions 5–7 (see §12.4–§12.7): Track K extended to ALL years (2023-24 + 2024-25: 7 more agreements, benefit→495, funding→19, 18 cross-school transfers resolved, 20 graduates reported); audit fixes A1–A5 coded (A3 = register amount_vnd so funding rolls up in reports/area4); Track D DONE (care_roster footer-guard + status-from-Ghi-chú + rich-field capture → 82 Hoa Sữa NBTL trainees loaded, child→293); N8 cross-school dup/transfer detector shipped; Track E relocate-school-first fixed (năm-học/acronym; dry-run 79 moves, apply gated).

2026-06-13 session 8 (see §12.8): COPY-ORIGINALS DIRECTIVE COMPLETE (Step 1): care-jobs image rebuilt (+ --doc-type override, commit eec8ee4); 60 SE originals filed (Tam Thanh+Đại An; only 2 schools have SE folders, not ~180); 9 held essays force-filed (Track-B typos, each confirmed); Track E 80 consolidated docs relocated → Truong/<school>/<năm-học>/; older-year per-student essays = N/A (no Bài-viết folders). kind=student 138→207; N8/A5 checks = 0.

2026-06-13 session 8 cont. (see §12.9–§12.10): Steps 1–3 COMPLETE + LIVE. (2) A1–A5 go-live: register amount_vnd already present; 6 tài trợ 25-26 funding rows backfilled (funding non-null 7→13); deployed hub 00075 + care-dashboard 00018 (also ships parallel session 693180b UI); journey-2002 receipt verified. (2d) Drive cleanup: built rename-student-folder / prune-orphan-docs / trash-file; journey-0028 merged to one correct folder; 27 Đại An orphan PDFs trashed. (3) Regen: handbook-refresh 293/293, index-rebuild (A4 v2), consistency-check (concurrency lesson: first run clobbered 88/293 by a parallel session + un-quiesced dashboard; clean re-run under full quiesce).

2026-06-13 session 9 (see §12.11): Step 4 DONE: built tools/care_mail.py (email intake lane, §7 channel 3, mirroring finance-mail; inert-safe; 13 tests) + cloud/care_mail/ image; entrypoint handbook-refresh gained an incremental --since affordance; deployed care-handbook-refresh + care-mail Jobs live (DB-safe targeted deploy, smoke-tested inert). Conservative scheduling: no schedulers registered (handbook-refresh on-demand under quiesce; care-mail held until its mailbox exists); both schedules stay defined in deploy.ps1.

2026-06-14 session 9 cont. (see §12.12): Step 6 data fixes DONE: prize amount_vnd backfilled (144 individual + 6 pool rows; funding-w/-amount 13→19; 33M pool; Tam Thanh +300k / 1-prize gap flagged) + 10 Hoa Sữa K4 malformed names fixed (full name + DOB recovered from the K4 roster; 10 handbooks regenerated); child 293 unchanged, integrity ok.

2026-06-14 session 9 cont. (see §12.13): Step 6c republish DONE (care plan page rendered via the new committed build_care_sync_plan_page.py, redaction-scanned clean, deployed to hmt-media-review-3dv.pages.dev) + Track G execution PLAN captured: DB half proven (care_rekey.py), D19 Stream-C-independence guardrail CLEAR, full re-key sequence written.

2026-06-14 session 10 (see §12.14): Data-consumer verification + report-QA fixes + 2024-cohort reconciliation, all LIVE. Confirmed the migration is complete + consumable (census + exercised report_child/school + index). Then fixed live:
(a) SE "not available": 10 Tam Thanh SE notes had empty detail (content in the payload); backfilled + a consistency-check invariant added.
(b) School funding total missing from registers (fetch_register_timeline dropped amount_vnd).
(c) Two-part funding: reports/registers now split "Tài trợ cấp trường (Thỏa thuận tài trợ)" from "Khen thưởng & hỗ trợ học sinh" (prize pool no longer double-counted); child report no longer shows the school's total funding (per-student blank until known); fixed the "cấp lứa→cấp trường" heading.
(d) 2024 cohort showed 3/5 schools: re-cohorted 34 (Đại An + Nguyễn Bính, 2025→2024) + loaded 17 truly-absent 2024-25 students (withdrawn), 1 namesake held; cohort now 5 schools / 94 children. child 293→310. care-jobs + care-dashboard redeployed; registers + 71 handbooks regenerated.

▶ NEXT SESSION — START HERE

Track G is CODE-COMPLETE + LIVE (2026-06-14); the ONLY remaining work is the deferred Phase-2 bucket-FILE recode. Step 5 shipped: the ~820-hit refactor (all 27 care tools + app/main.py/app/hub.py + 7 hub templates + every test file; pytest 775 green, ruff clean), a latent care_rekey data-loss bug fixed (the rebuild schema was dropping the A3 funding columns — 164 rows / 2.62 B VND), and the supervised live cutover is done: the live care DB is re-keyed (journey_id dropped, 311 children, FK-clean, funding intact, verified no clobber), the survey DB migrated (129 candidates), and hub 00119 + care-dashboard 00033 + care-jobs are redeployed on the new code with schedulers resumed. Committed a709996. (Cutover gotchas captured in memory project_trackG-rekey-in-progress: bundles bucket is [bundles bucket]; new code crashes on the old DB at boot — upload the re-keyed DB first; hub + care-dashboard are manual-traffic; PowerShell shell state doesn't persist — load .env in the deploy command.)

REMAINING — Phase-2 bucket-FILE recode (its own focused session). The DB keys on student_code, but the per-student bucket folders are still journey-NNNN and child.profile_path + benefit.ledger_path still point there (reads work — the files exist at the old paths). Before running it: enhance care_rekey.recode_bucket to ALSO rewrite benefit.ledger_path (it currently rewrites only profile_path), and consider source_doc.drive_url/ bucket_path + register_entry/source_file paths. Then run under a quiesce (min-instances=0 + pause the 6 care schedulers + backup): a Cloud Run job is the cleanest place (the gcsfuse bucket is mounted there) since recode_bucket walks a filesystem root. Also clean up the pre-existing missing-ledger-file drift on the ben-2026-uni-* university benefits (their ledger .md files were never written by the lnquang load — unrelated to the re-key) and tidy the stale 0%-traffic hub revisions 00115/00116/00117/00118.

Then the final closing sweep: source-coverage + a clean consistency-check + a data-consumer sample (a child/school/cohort report, a handbook, the hub child page, the index, a watchlist run).

Parked, not blocking Track G — admin: enable the email lane (create the Care-Docs Gmail label + filter, then register care-mail-trigger hourly). cosmetic: §12 physical renumber (12.3 sits after 12.7; dated headers carry the reading order). (Coordinator-confirmed + fixed live 2026-06-14: (a) the held namesake "Nguyễn Gia Tuấn Anh" was a different person → loaded as Đại An journey-3036/cohort 2024; (b) Tam Thanh's missing 24th prize — Nguyễn An Minh Châu's Part-III academic 300k — added → Tam Thanh now reconciles to 24 giải / 5.5M; (c) the 18 loaded 2024-25 students re-statused withdrawn→graduated-exited (exit_reason=graduated, graduation_date 2025-05-31): Đại An grade 9 → THPT outside the programme, Nguyễn Bính grade 12 → finished THPT; the 11 NB classes (12A1–12A7) backfilled.)

Supersedes the 2026-06-12 sync-backfill draft by adding the target storage model (the rules), not just the gap list. Scope: Area-1 care only. Sinh viên (university students) and the standalone Hỗ trợ tài chính folder both stay out of scope. The care benefit facts come from the benefits file inside each cohort folder in the lnquang source (point 12e) — those are in scope: the handbook must show them (Track K). Three layers in one plan:

Target model (§1): how candidate / student / school / cohort data is stored, what is the source of truth, how the handbook and the journal work, and where the rules are recorded. This is the new core.

One-time backfill (§3–§6): getting the existing 211 children's data to obey the model.

Standing intake (§7): the periodic jobs so every future document stays compliant without a manual catch-up.

Build on what exists. Earlier migrations already shipped real tooling, jobs and data. §2 is an inventory of what is already built (reuse) vs new / changed, so nothing is rebuilt. Live counts (2026-06-12 re-run): 211 children, Ho-so/ 211 folders 1:1, source_doc 91, source_file ledger 43, profile_entry 542, school_report 293. Re-pull the live DB before any write (the year-end clobber lesson).

Single identity: student_code (administrator override, 2026-06-13). HSYYYYNNNN (e.g. HS2025-0055; HVK<k><NNN> for NBTL) is the one code used everywhere: Drive folders (Ho-so/HS2025-0055 — <Tên>/), bucket files (Ho-so/HS2025-0055.md, Tai-lieu-goc/HS2025-0055/), the DB primary key, handbook, reports, and hub. journey_id is retired; the earlier keep-it-as-a-hidden-surrogate recommendation is overridden (administrator confirms no PII concern in scope).

Two guardrails that keep student_code a sound key. (1) Frozen at mint: a primary key must be immutable, so the YYYY digits record the original cohort year and never change even if a child's cohort is later corrected (the correction lives in the cohort/enrolment fields, not in the code); this keeps the key stable for the foreign key across the ~7 child-keyed tables. (2) Stream C ("Hành trình của em") keeps its own anonymised id space on the media side; confirm it does not depend on the care key before journey_id is dropped (Track G).
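The frozen-at-mint guardrail can be illustrated with a minimal sketch. It assumes the code is rendered like the HS2025-0055 example; the separator and padding, the helper names mint_student_code and correct_cohort, and the Child shape are hypothetical and not the plan's actual schema.

```python
from dataclasses import dataclass, replace


def mint_student_code(cohort_year: int, seq: int) -> str:
    # Assumption: rendered like the HS2025-0055 example in the text above.
    return f"HS{cohort_year}-{seq:04d}"


@dataclass(frozen=True)
class Child:
    student_code: str  # minted once; never rewritten afterwards
    cohort_year: int   # may be corrected later


def correct_cohort(child: Child, new_year: int) -> Child:
    # The correction lives in the cohort field only; the key is left untouched,
    # so every table that references student_code stays valid.
    return replace(child, cohort_year=new_year)
```

A child minted into cohort 2025 and later re-cohorted to 2024 keeps the 2025 digits in the code, which is the behaviour the guardrail asks for.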

1. Target storage model (the rules, redesigned)

The model keeps the existing two spaces (human-readable Shared Drive 2_Hoc-sinh/ + machine care bucket) and the five rules of care-data.md, and sharpens four things the current rules under-specify: the student folder as source of truth, the candidate→student folder move, a school-first layout, and a record-update journal.

1.1 Student folder = the source of truth (point 4)

Every per-student document lives under one folder, and that folder is the authoritative record for the child. Target layout:

2_Hoc-sinh/Ho-so/<HSYYYYNNNN> — <Tên>/      # human space (Drive)
  So-tay · <HS…> · <Tên>                      # the handbook Doc (rule 3, §1.5)
  Tai-lieu-goc/                               # ALL originals for this child
    <original: PDF, ảnh, Word/Excel, …>        # ← source of truth (raw, never edited; ANY format, point 4)
    <original>__OCR  (Google Doc)              # readable OCR derivative (if transcribed)
  Khao-sat/                                    # the candidate's survey folder, moved here on admission (§1.2)

care/<bucket-prefix>/Ho-so/HS2025-0055.md      # machine: canonical profile (named by student code, point 1)
care/<bucket-prefix>/Tai-lieu-goc/HS2025-0055/ # machine: .txt mirror of each original

Originals are not PDF-only (point 4). Tai-lieu-goc/ holds the original in whatever format it arrived — PDF, scanned image (JPG/PNG/HEIC), Word/Excel, even a photo of a handwritten page. file-student-doc (N1) is format-agnostic; OCR/extraction runs on whatever can be transcribed, the raw file is always kept.

Source-of-truth precedence (define it once, everywhere): original file > OCR Doc > .txt mirror > extracted DB rows. Everything below the original is derived and regenerable; the original is read-only after filing. Each original is one source_doc row (kind=student), and every DB fact pulled from it carries source_ref = doc_id[#section] so any number traces back to the page it came from.

This is the part even Đại An does not yet satisfy: it has OCR Docs but the original scan PDFs were never copied out of the source, there is no .txt mirror, and only 1 source_doc kind=student row exists in the whole DB (see §12-explainer). Track B (§4) fixes it for all schools.
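The precedence rule and the source_ref = doc_id[#section] shape from §1.1 can be sketched as follows. The layer labels ("original", "ocr_doc", "txt_mirror", "db_row") and both function names are illustrative assumptions, not identifiers from the care tools.

```python
from typing import Iterable, NamedTuple, Optional

# Lower index = more authoritative, following: original > OCR Doc > .txt mirror > DB rows.
PRECEDENCE = ("original", "ocr_doc", "txt_mirror", "db_row")


class SourceRef(NamedTuple):
    doc_id: str
    section: Optional[str]


def parse_source_ref(ref: str) -> SourceRef:
    """Split 'doc_id[#section]' into its two parts; the section is optional."""
    doc_id, _, section = ref.partition("#")
    if not doc_id:
        raise ValueError(f"source_ref has no doc_id: {ref!r}")
    return SourceRef(doc_id, section or None)


def most_authoritative(available: Iterable[str]) -> str:
    """Return the layer to trust among those that exist for a document."""
    present = set(available)
    for layer in PRECEDENCE:
        if layer in present:
            return layer
    raise ValueError("no known layer available")
```

When a fact in the DB disagrees with the OCR Doc, the OCR Doc wins; when the original is filed, the original wins over both.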

1.2 Candidate → student promotion (point 2)

A candidate already has a survey folder 2_Hoc-sinh/Khao-sat/<ung-vien-NNNN>/ (survey Doc + documents + photos) and a row in the separate so-khao-sat.sqlite (candidate, state=cho-xet-chon). On admission, the existing survey_intake.admit_candidate() mints the student code (HSYYYYNNNN), writes the child row, and seeds the first progress note. Change to make: instead of relocating only the survey Doc, move the entire Khao-sat/<ung-vien>/ folder (all data therein) into the new student folder as Ho-so/<HS…> — <Tên>/Khao-sat/, and catalog the survey as a source_doc kind=student doc_type=khao-sat. Rejected candidates stay in Khao-sat/ with state=tu-choi (audit trail; never deleted).

Why a nested Khao-sat/ subfolder rather than dissolving it into Tai-lieu-goc/: it preserves the candidate package intact (provenance) and keeps "pre-admission survey" visibly distinct from school-life documents.

Existing candidates need the same treatment (point 3), two cases:

Still pending: normalise each existing Khao-sat/<ung-vien>/ folder to the v2 rules (folder naming, catalog the survey as a source_doc, ledger row), so the candidate space obeys the model before any future admission.

Already promoted: for candidates admitted before N4, the survey folder was left in Khao-sat/ (only the survey Doc, or nothing, was relocated). The backfill (Track I) moves those folders into the now-existing student folder and catalogs them, so no promoted student is missing their candidate package.

1.3 School folder, year-subfoldered (point 5) — replaces the year-first tree

Today consolidated docs live year-first (Tai-lieu-tong-hop/<năm>/<trường>/) and the school register is a separate Doc (So-tay-truong/<trường>); the result is the messy split the stocktake found (2025-2026 vs CETB 2025-2026). Move to school-first, collocating the register with the school's documents:

2_Hoc-sinh/Truong/<trường-slug>/
  So-tay-truong · <trường>           # the per-school register Doc (rule 4)
  <năm-học>/                          # 2023-2024, 2024-2025, 2025-2026, …
    <roster / KQ / sổ điểm / SE batch / chuyên cần / OCR derivatives>

care/<bucket-prefix>/Truong/<trường-slug>/<năm-học>/   # machine: .txt extractions

Rule: a school-scoped document is filed once here as one source_doc kind=consolidated row; the per-student facts it contains are extracted to that student's profile_entry/school_report with full detail and surface in the student's handbook (§1.5). The document is not copied into each child's folder — the fact goes to the child, the file stays with the school. This is the existing rule-2 discipline, just relocated school-first.

1.4 Cohort: register only, no docs subfolder (point 6 — my answer)

Recommendation: no cohort documents subfolder. A cohort (lứa = programme-entry year) is a cross-school analytic cut, not a document owner; its data is a roll-up of per-student and per-school records. A cohort docs folder would be almost always empty and would create a third filing location for documents that already belong to a school or a child. Keep the cohort as a register Doc only (So-tay-lua/<programme>-<năm>, backed by register_entry scope_type=cohort), which reads the register and rolls up the per-student data. If a genuinely cohort-level document ever appears (a cross-school cohort event write-up), file it lazily under So-tay-lua/<programme>-<năm>/ at that point; do not pre-build the tree.

1.5 The handbook (sổ tay) v2 — comprehensive (point 3)

The handbook is the durable, human-readable single-child record. Current so_tay.py produces 5 sections (Hồ sơ · Ghi danh · Học bạ tất cả các năm · Sổ phúc lợi · Nhật ký tiến độ), already all-years and chronological (rule 3 ✓). Review findings + target v2:

Handbook section	Today	v2 target
1 · Hồ sơ (identity)	id, school, class, programme, cohort, entry, status	+ exit reason, carer + phone, address, student code, view-class
2 · Ghi danh (enrolment)	programme/school/cohort/dates/status	+ exit reason per row
3 · Học bạ — tất cả các năm	all years, chronological, GPA trend, source filename	+ resolve `source_ref`→ source_doc: title + Drive link + OCR-disclaimer flag
4 · Sổ phúc lợi (benefits)	lifetime, date/category/amount/source	+ petal anchor + receipt link (point 12e)
5 · Nhật ký tiến độ (journal)	chronological, `raw_link` only	+ resolve provenance via `source_doc` (title + link), not just a bare link
6 · Tài liệu gốc (NEW)	—	index of every `source_doc` for this child: title · doc_type · kind · transcription flag · Drive link · date added. This is "references to other original files + their details."
7 · Theo dõi an toàn (NEW, gated)	—	watchlist signals — coordinator/safeguarding audience only
8 · Nhật ký cập nhật hồ sơ (NEW)	—	the record-update journal (§1.6)
footer	none	compiled-by / compiled-at line

Slug / layout checks (point 3): folder Ho-so/<HSYYYYNNNN> — <Tên> and Doc So-tay · <HS…> · <Tên> are consistent and idempotent (same student_code → same folder/Doc, upsert-in-place). One real risk to fix: student_code is minted lazily on first handbook upsert (ensure_student_code); if a bulk job runs the handbook before enrolment is recorded, the code timing can wobble. Fix: mint the student code at enrolment (ghi_danh / roster apply), not at handbook time, so the folder name is stable before any handbook write. The handbook is a live record (no version history), and that is accepted; the report (report_child) remains the versioned snapshot.

1.6 The record-update journal & the two timestamps (point 9)

The model must distinguish two timestamps and never collapse them:

Event date: when the thing happened (a học bạ is for HK1 2024-2025; a benefit was delivered on a date). Already captured: profile_entry.entry_date, school_report.term/period_month, benefit.date_delivered.

Ingest timestamp: when it entered our record. Captured on some tables (source_doc.added_at, source_file.processed_at, school_report.extracted_at) but missing on profile_entry and benefit (they only carry the event date).

Two changes:

Schema (additive): add an ingest created_at column to profile_entry and benefit, set at write time. No backfill of history required; new writes populate it.

Handbook §8 "Nhật ký cập nhật hồ sơ": a derived appendix, a single chronological log built by UNION-ing the ingest timestamps (source_doc.added_at, profile_entry.created_at, benefit.created_at, school_report.extracted_at, source_file.processed_at) → "on <ingest-ts>, <what> was added to this record from <source file / doc_id>." No new table; it reads existing columns. This gives the audit answer "when did we learn X, and from which original file?" that the handbook can't currently produce.
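A minimal sketch of the derived journal, using an in-memory SQLite database. The table shapes here are simplified assumptions (in particular the student_code and source_ref columns on each table); only three of the five timestamp sources are shown, and school_report.extracted_at and source_file.processed_at would join the UNION in the same way.

```python
import sqlite3

# One row per ingest event, ordered by when it entered the record (not when it happened).
JOURNAL_SQL = """
SELECT added_at AS ingest_ts, 'source_doc' AS what, doc_id AS source
  FROM source_doc WHERE student_code = :code
UNION ALL
SELECT created_at, 'profile_entry', source_ref
  FROM profile_entry WHERE student_code = :code
UNION ALL
SELECT created_at, 'benefit', source_ref
  FROM benefit WHERE student_code = :code
ORDER BY ingest_ts
"""


def demo_db() -> sqlite3.Connection:
    """Build a tiny illustrative DB; the schema is an assumption for this sketch."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source_doc (doc_id TEXT, student_code TEXT, added_at TEXT);
        CREATE TABLE profile_entry (student_code TEXT, entry_date TEXT,
                                    created_at TEXT, source_ref TEXT);
        CREATE TABLE benefit (student_code TEXT, date_delivered TEXT,
                              created_at TEXT, source_ref TEXT);
        INSERT INTO source_doc VALUES ('doc-1', 'HS2025-0055', '2026-06-13T10:00:00');
        INSERT INTO profile_entry VALUES
            ('HS2025-0055', '2024-12-20', '2026-06-13T11:00:00', 'doc-1#hk1');
        INSERT INTO benefit VALUES
            ('HS2025-0055', '2025-01-15', '2026-06-12T09:00:00', 'doc-2');
        INSERT INTO benefit VALUES
            ('HS2025-0001', '2025-01-15', '2026-06-11T09:00:00', 'doc-2');
    """)
    return conn


def update_journal(conn: sqlite3.Connection, student_code: str) -> list:
    """Return (ingest_ts, what, source) tuples for one child, oldest ingest first."""
    return conn.execute(JOURNAL_SQL, {"code": student_code}).fetchall()
```

Note that the profile entry in the demo has an event date of 2024-12-20 but sorts last, because the journal orders by ingest timestamp, which is the distinction §1.6 asks the model to preserve.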

1.7 Bucket naming — drop the `10_` (point 7 — my answer)

The bucket prefix is still care/10_Ho-so-thu-huong/ while the Drive band is now 2_Hoc-sinh; the 10_ is genuinely misleading. Recommendation: rename to care/ho-so-thu-huong/ (drop the numeric prefix; the prefix only ever pinned Drive sort order, which the bucket doesn't need). But it is a supervised live-data move, not a cosmetic edit — the bucket is the sole copy of beneficiary data and every tool reads it via profile_store.CARE_ROOT + HMT_BENEFICIARY_ROOT. Do it as an isolated, gated track (Track G): backup → quiesce hub → gcloud storage move objects under the new prefix → update the constant + env + redeploy → verify → delete old prefix last. Low priority; pure clarity. (Do not rename the Drive band; that already happened and the two namespaces are deliberately decoupled.)
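Before any objects move, the rename can be planned as a pure old-name to new-name mapping and checked for collisions. The sketch below assumes object names are plain strings listed from the bucket; plan_moves is a hypothetical helper, and the actual copy, verify and delete steps stay with the supervised track.

```python
OLD_PREFIX = "care/10_Ho-so-thu-huong/"
NEW_PREFIX = "care/ho-so-thu-huong/"


def plan_moves(object_names: list) -> list:
    """Return (old, new) pairs for objects under the old prefix.

    Raises if two objects would land on the same name, or if a target
    name already exists, so nothing is overwritten during the move.
    """
    moves = [
        (name, NEW_PREFIX + name[len(OLD_PREFIX):])
        for name in object_names
        if name.startswith(OLD_PREFIX)
    ]
    targets = [new for _, new in moves]
    if len(set(targets)) != len(targets) or set(targets) & set(object_names):
        raise ValueError("target name collision; resolve before moving")
    return moves
```

Producing the plan first keeps the delete-old-prefix-last step honest: the old objects are only removed once every pair in the plan has been verified at its new name.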

1.8 Where the rules are recorded (points 1, 10)

Canonical, agent-facing: .claude/rules/care-data.md, the "five rules" doc that every care tool reads. This plan's §1 becomes the v2 edit to that file once approved (school-first layout, candidate-folder move, handbook v2 sections, source-of-truth precedence, the two-timestamp rule). Until approved, the rules live here as a reviewable draft; care-data.md is not edited silently (it overrides agent behaviour).

Taxonomy detail: _system/plans/care-data-migration-plan.md Part 2b (the per-student / consolidated / multi-scope doc-type table); unchanged, still the reference.

Human copy on Drive: a 2_Hoc-sinh/_Huong-dan index Doc (filing rules + folder map) so a coordinator filing by hand follows the same model. Generated from the care-data.md markdown (same pattern as the Drive _DOC-TRUOC / _QUY-TAC-DAT-TEN reference Docs).

Enforced by: the consistency-check job (§7) asserting the three places agree, so the rules are checked, not just written (point 1).

1.9 `2_Hoc-sinh` — the organized target tree (point 11)

2_Hoc-sinh/
  _Huong-dan                       # filing-rules index Doc (the recorded rules, human copy)
  Ho-so/<HS…> — <Tên>/             # per-student: handbook + Tai-lieu-goc + Khao-sat
  Truong/<trường>/<năm-học>/       # per-school register Doc + consolidated docs by year (§1.3)
  So-tay-lua/<programme>-<năm>     # per-cohort register Doc only (§1.4)
  Khao-sat/<ung-vien>/             # PENDING candidates only (moves into Ho-so/ on admission)
  Bao-cao/<YYYY>/                  # generated reports

Cleanup folded into Track E: retire the "already-removed" Tai-lieu-so-hoa-OCR/ band, fold Tai-lieu-tong-hop/<năm>/<trường>/ and the stray CETB 2025-2026 tree into Truong/<trường>/<năm-học>/, delete the _ / _/_ artifact folders.

1.10 Document naming convention (point 2) — review & extend `naming-convention.md`

Today naming-convention.md covers event slugs, renditions and post-bundles but says nothing about care source documents, so files land with whatever name the field gave them. Add a care-document rule and have intake rename on file (ASCII kebab, diacritics dropped, đ→d, lowercase, like the rest of the convention):

Doc scope	Filename	Filed to
Per-student	`<HSYYYYNNNN>__<doc-type>__<năm-học>[__<seq>].<ext>`	`Ho-so/<HS…> — <Tên>/Tai-lieu-goc/`
Consolidated	`<trường-slug>__<doc-type>__<năm-học>[__<seq>].<ext>`	`Truong/<trường>/<năm-học>/`
Candidate	`<ung-vien-NNNN>__<doc-type>[__<seq>].<ext>`	`Khao-sat/<ung-vien>/` (→ student on admission)

<doc-type> is from the taxonomy (hoc-ba, so-diem, se, giay-khen, khao-sat, bien-ban-tai-tro, …); <ext> is the original extension, any format (point 4). The original name is preserved in source_doc.original_name so the rename is reversible and traceable. N7 covers the naming-convention.md edit; N8 (intake sweep) and N1 (file-student-doc) apply the rename.
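The per-student row of the table can be sketched as a small renaming helper. This is an illustration under assumptions: kebab and per_student_name are hypothetical names, the student code is kept verbatim, the extension is taken unchanged from the original name, and the sequence suffix is rendered as a plain integer because the plan does not fix its padding.

```python
import re
import unicodedata
from pathlib import PurePosixPath
from typing import Optional


def kebab(text: str) -> str:
    """ASCII kebab-case: đ→d, diacritics dropped, lowercase, hyphen-separated."""
    text = text.replace("đ", "d").replace("Đ", "D")
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks (category Mn) carry the Vietnamese tone and vowel signs.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return re.sub(r"[^a-z0-9]+", "-", stripped.lower()).strip("-")


def per_student_name(student_code: str, doc_type: str, school_year: str,
                     original_name: str, seq: Optional[int] = None) -> str:
    """Build <code>__<doc-type>__<năm-học>[__<seq>].<ext> from the original file name."""
    parts = [student_code, kebab(doc_type), school_year]
    if seq is not None:
        parts.append(str(seq))
    return "__".join(parts) + PurePosixPath(original_name).suffix
```

The original file name is only read for its extension here; in the plan it is preserved separately in source_doc.original_name so the rename stays reversible.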

2. What already exists (reuse) vs new / changed

Earlier migrations shipped most of the machinery. Reuse it; build only the gaps.

Already built — reuse as-is

Capability	Tool / job
Copy source archive → Drive + catalog `source_doc` + bucket stub	`care_migrate copy-source`
Move ONE file into a named student's `Tai-lieu-goc/`	`care_migrate relocate-one`
Move legacy per-student OCR/survey Docs into matching folders	`care_migrate relocate-legacy`
Mint `source_doc` from `raw_link`, set `source_ref`	`care_migrate backfill-provenance`
Seed school/cohort registers + emit register Docs	`care_migrate build-registers`
Apply gated coordinator fixes (grades, identity, phantoms)	`care_migrate reconcile`
Read-only Drive tree / migrated-vs-not classify	`care_migrate drive-ls`, `migrate-status`
Re-OCR scans, Gemini default, ledger skip-logic	`care_reocr` (`--write-db`)
Roster ingest (dump/propose/apply, school-scoped dedup, code mint)	`care_roster`
Structured extractors (hoc-ba, so-diem, chuyen-can, hop-phu-huynh, ket-qua-thi, se, giay-khen)	`school_doc_extract`
Classify → route → extract → file drops; `_needs-review.md` triage	`care_intake_convert`
Handbook + register Doc upsert, branded	`so_tay`
Candidate admit → mint student id + child row + first note	`survey_intake.admit_candidate`
OCR engine seam (Gemini 3.1-pro-preview default via `HMT_OCR_ENGINE`)	`gemini_ocr`
The migration ledger	`source_file` table
Deployed Cloud Run jobs	intake-convert · care-migrate · reocr · roster · index-rebuild · consistency-check · watchlist-monthly · watchlist-keyword · area4-emit

New or changed (this plan's build list)

#	Build / change	For
N1	`care_migrate file-student-doc` — copy original (any format, point 4) → `Tai-lieu-goc/` under the convention name (§1.10) + `.txt` mirror + `source_doc kind=student` + `source_file` row	Track B (point 12a, D1)
N2	`so_tay` handbook v2 — sections 6 (Tài liệu gốc index), 7 (watchlist, gated), 8 (update journal); source-link resolution on §3/§5; petal+receipt in §4; exit reason; compiled-at footer	§1.5 (point 3)
N3	School-first relocate — a `care_migrate` action (extend `relocate-legacy`) to fold `Tai-lieu-tong-hop/<năm>/<trường>` into `Truong/<trường>/<năm>` and rebuild paths	§1.3, Track E (point 11)
N4	`survey_intake.admit_candidate` — move the whole `Khao-sat/<ung-vien>/` folder into the student folder + catalog survey as `kind=student`	§1.2 (point 2)
N5	Schema additive — `created_at` ingest ts on `profile_entry` + `benefit`; mint `student_code` at enrolment not handbook time	§1.5, §1.6 (point 9)
N6	`student_code` as the single id — re-key the DB (`journey_id`→`student_code`, drop the column, re-point FK across the ~7 tables) + bucket prefix `10_Ho-so-thu-huong → ho-so-thu-huong` + rename per-student machine files to the student code (`HS….md` / `Tai-lieu-goc/HS…/`); update `profile_store.CARE_ROOT` + env + `profile_path`	§1, Track G (points 1, 7)
N7	`care-data.md` v2 edit + the care-document naming rule in `naming-convention.md` (§1.10) + the `_Huong-dan` human Doc	§1.8, §1.10 (points 1, 2, 10)
N8	Standing `care-intake` sweep (periodic, renames files on file) + enable the email lane for care	§7, §1.10 (points 2, 13)
N9	Handbook lifecycle — handbook on admission (wire `admit_candidate`/roster → `so_tay`) + a standing `handbook-refresh` job that regenerates changed students' handbooks	§11 (point 8)
N10	`care_migrate source-coverage` — walk each lnquang source band, reconcile every file against the `source_file` ledger, report "in source but not migrated"	§11 (point 9)
N11	Benefit-list extractor (only if absent) — parse the per-cohort benefits file → cohort/school `register_entry` funding + per-student `benefit` rows	Track K (point 12e)
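The reconciliation behind N10 (source-coverage) reduces to a set difference between what is on disk in a source band and what the ledger has recorded. The sketch below assumes the ledger can be read as a set of paths relative to the band root; not_migrated is a hypothetical name and the real source_file columns may differ.

```python
from pathlib import Path


def not_migrated(source_root: Path, ledger_paths: set) -> list:
    """List files present under source_root that the ledger has no row for.

    Paths are compared as POSIX-style strings relative to source_root.
    """
    found = {
        p.relative_to(source_root).as_posix()
        for p in source_root.rglob("*")
        if p.is_file()
    }
    return sorted(found - ledger_paths)
```

An empty result for every band is the "nothing in source but not migrated" condition the closing sweep looks for.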

3. The gaps (verified counts, 2026-06-12)

Gap	Rule	Status
G1 — children with no folder/handbook	1	✅ CLOSED (re-run: 142→211 folders, 1:1 with DB)
G2 — per-student originals not filed	3 a/b/c	172 source PDFs; filed 0; `kind=student` = 1 → Track B (needs N1)
G3 — per-student docs not OCR'd (5 schools)	3 d	only Đại An done (43 ledger); ~145 PDFs (Tam Thanh, Vĩnh Hào, Tân Khánh, Nguyễn Bính, LTV) → Track C
G4 — Hoa Sữa (NBTL) trainees not loaded	1, 3	7 source_doc, 0 trainees; ~81 need a split-name parser → Track D (held)
G5 — Drive disorganised	file-by-ownership	retired OCR band still present; `Tai-lieu-tong-hop` vs `CETB` split; `_`/`_/_` artifacts → Track E (now the school-first restructure)
G6 — no real migrated-vs-not register	5	`source_file` = 43 (Đại An only); copy-source/roster loads unrecorded → Track F
G7 — model not the canonical rule yet	1, 10	§1 not yet in `care-data.md`; no human filing Doc → N7
G8 — no standing compliant intake	1	future docs still rely on manual catch-up → §7 / N8
G9 — per-cohort benefit files not ingested	2/3, 2b	each lnquang cohort folder holds a benefits file of per-student support; the facts are needed in handbook §4 but were never extracted → Track K. (The standalone `Hỗ trợ tài chính` folder is out of scope.)

Per-student originals by school (source PDFs): LTV 24-25 = 27, LTV 25-26 = 13,
Tam Thanh 25-26 = 30, Tân Khánh = 27, Vĩnh Hào = 30, Đại An = 27, Nguyễn Bính
= 18. CETB 2023-2024 has no per-student PDFs (rosters/KQ only) — its originals
gap is empty by nature.

What "c — source_doc kind=student THIẾU. Toàn DB chỉ có 1 dòng kind=student" means (point 12d): source_doc.kind has exactly two values. consolidated = a group document (roster, KQ sheet, SE batch) → one catalog row that fans out to many per-student profile_entry/school_report rows. student = a document about one child (học bạ, essay, certificate, letter) → one catalog row. Almost everything loaded so far came from consolidated sources, so the whole DB has 1 kind=student row. The consequence: the per-student-document provenance layer is essentially absent — a child's record cannot point back from a fact to that child's own original document. Track B creates one kind=student row per filed original, closing it.

4. Work tracks (built on §2)

All writes run on Cloud Run (care-migrate-run), one DB writer at a time, each preceded by backup + hub quiesce + scheduler pause, verified after (rule 5). Re-pull the live DB immediately before every write.

Track A: human-space backfill (G1) ✅ DONE. 142→211 folders + branded handbooks; cohort/school reports regenerated. No further action.

Track B: file the 172 per-student originals (G2, rule 3 a/b/c). Match each source file to a student (name + class, school-scoped); copy the raw original (any format, point 4) into Ho-so/<HS…> — <Tên>/Tai-lieu-goc/ under the convention name (§1.10), write the .txt mirror, add source_doc kind=student + source_file. Includes Đại An (file its 27 originals; only OCR was done). Needs N1. File originals for all docs regardless of OCR status (point 12a).
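The school-scoped match (name + class) can be sketched as below. It is one possible reading, stated as an assumption: the name must match within the school, the class is used only to break ties between namesakes, and anything still ambiguous returns None so it can be held for review. The roster row shape and the names fold and match_student are hypothetical.

```python
import unicodedata
from typing import Optional


def fold(name: str) -> str:
    """Compare names without diacritics, case, or irregular spacing."""
    name = name.replace("đ", "d").replace("Đ", "D")
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return " ".join(stripped.lower().split())


def match_student(name: str, class_name: str, school: str, roster: list) -> Optional[str]:
    """Return the student_code of the single matching child, else None (hold for review)."""
    hits = [r for r in roster
            if r["school"] == school and fold(r["name"]) == fold(name)]
    if len(hits) > 1:
        # Namesakes in the same school: the class must disambiguate.
        hits = [r for r in hits if r["class"] == class_name]
    return hits[0]["student_code"] if len(hits) == 1 else None
```

Scoping the search to one school first is what stops a namesake at another school from capturing the file.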

Track C: OCR the not-yet-transcribed per-student docs (G3, rule 3 d). Per school, care_reocr … --write-db → readable OCR Doc in Tai-lieu-goc/ + SE/essay profile_entry.detail. Skip anything already done in the ledger (migrate-status first) so nothing is re-billed (point 12a). Engine: Gemini 3.1-pro-preview for both printed and handwriting; the Đại An handwriting A/B passed (on par with Claude, ~2.3× cheaper), so the earlier "keep Claude for handwriting" caveat is retired (point 12b). Captions stay Claude (not OCR). Meter per school, dry-run-sample first.

Track D: Hoa Sữa NBTL trainees (G4). Build the split-name parser (concatenate first/last columns, province Lào Cai/Sa Pa, capture CMND/DOB/carer/hardship) → ~81 trainees under Nâng bước tương lai, codes HVK<k><NNN>, code block 8xxx. Approved into this round (D12); build the parser, load after A–C.
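The core of a split-name parser is small: join the two name columns and skip rows that are not trainees. The sketch below is illustrative only; the column keys ho_dem, ten and dob are assumptions about the roster sheet, and the real parser also captures province, CMND, carer and hardship fields that are left out here.

```python
def join_split_name(family_and_middle: str, given: str) -> str:
    """Concatenate the two name columns and normalise whitespace."""
    return " ".join(f"{family_and_middle} {given}".split())


def parse_trainees(rows: list) -> list:
    """Turn roster rows into trainee dicts, skipping footer or blank rows."""
    trainees = []
    for row in rows:
        # Footer guard: totals or signature lines have no given-name cell.
        if not (row.get("ten") or "").strip():
            continue
        trainees.append({
            "name": join_split_name(row.get("ho_dem") or "", row["ten"]),
            "dob": row.get("dob") or None,
        })
    return trainees
```

Keeping the family-and-middle column first and the given name last preserves Vietnamese name order, which is the ordering the roster columns already imply.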

Track E: Drive restructure to school-first (G5, point 11). Fold Tai-lieu-tong-hop + CETB 2025-2026 into Truong/<trường>/<năm-học>/; retire the OCR band; delete the _ and _/_ artifact folders. Needs N3. Drive-mostly; rebuilds the source_doc.drive_url/bucket_path paths it moves.

Track F — make the ledger a real register (G6). Backfill source_file for

the earlier copy-source + roster loads; run migrate-status over every band for one authoritative migrated-vs-not view.

Track G — student_code as the single identity (D19 override; points 1, 7).

One supervised migration: re-key the DB from journey_id to student_code (re-point the FK across child + the ~7 child-keyed tables, then drop the journey_id column), move 10_Ho-so-thu-huong → ho-so-thu-huong, rename each per-student machine file/folder to the student code (HS….md, Tai-lieu-goc/HS…/), update profile_store.CARE_ROOT + env + child.profile_path; backup → verify → delete old paths last. Freeze student_code at mint (immutable key) and confirm Stream C is independent of the care key first. Do this FIRST (Wave 1, before Track B) so every later write keys natively on student_code and nothing is re-keyed twice. Needs N6.

Track H — handbook v2 (point 3). Ship N2 + N5, then regenerate all

211 handbooks. After Tracks B/C so the new Tài liệu gốc + journal sections have data to show.

Track I — candidate folders to v2 (points 2, 3). Ship N4 for every

future admission. Backfill the existing candidate space: normalise pending Khao-sat/<ung-vien>/ folders to the rules (naming + catalog), and move the folders of already-promoted candidates into their student folders (the pre-N4 admits that left their survey package behind).

Track J — canonical rules + standing intake (points 1, 10, 13). Ship N7

(care-data.md v2 + _Huong-dan Doc) and N8 (the care-intake sweep + email lane). This is what makes the model stay compliant.

Track K — benefit ingestion from the per-cohort benefits file (G9, point 12e).

Each lnquang cohort folder contains a benefits file (per-student support for that cohort). Parse it → fan out per-student benefit rows (category, amount_vnd/amount_inkind, petal, funding_source, receipt_path/ ledger_path), matching each student school-scoped; catalog the file once as kind=consolidated under the matching Truong/<trường>/<năm> (or the cohort register) + a register_entry funding row. The facts surface in handbook §4 (petal + receipt link, via N2). Care side only — the donor match / receipt issuance (Area 3/4) and the standalone Hỗ trợ tài chính folder are out of scope. Tooling: a consolidated benefit-list extractor (N11) if school_doc_extract lacks one.

5. Synchronisation discipline (every write follows this)

Re-pull the live DB right before applying (never a session-start snapshot).
Back up the live DB → care/_backups/<date>-<step>/.
Quiesce the hub (update … --update-labels) + pause the care schedulers.
Apply on Cloud Run, idempotent, dry-run reviewed first.
Upload profiles/txt then the DB; re-download & verify counts.
Resume schedulers; confirm the hub serves.
Record every file outcome in source_file (idempotent re-runs skip done).
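
The ordering above can be sketched as a small guard that refuses to resume the hub until the re-read counts match what the write expected. This is a minimal illustration only: `guarded_write` and its callable parameters are hypothetical stand-ins, not the real care tooling, and the real steps (Cloud Run job, bucket upload, scheduler pause) are assumed to sit behind those callables.

```python
from typing import Callable, Dict, List

Counts = Dict[str, int]


def guarded_write(
    read_counts: Callable[[], Counts],
    backup: Callable[[], None],
    quiesce: Callable[[], None],
    apply_change: Callable[[Counts], Counts],
    resume: Callable[[], None],
) -> List[str]:
    """Run one write in the §5 order and return the steps completed.

    apply_change receives the freshly pulled counts and returns the counts
    it expects afterwards. resume() is only reached once the re-read
    counts match, so a failed verify leaves the hub quiesced.
    """
    steps: List[str] = []
    before = read_counts()              # re-pull right before applying
    steps.append("re-pull")
    backup()
    steps.append("backup")
    quiesce()                           # hub quiesced + schedulers paused
    steps.append("quiesce")
    expected = apply_change(before)
    steps.append("apply")
    after = read_counts()               # re-download and verify
    if after != expected:
        raise RuntimeError(
            f"verify failed: expected {expected}, got {after}; hub left quiesced"
        )
    steps.append("verify")
    resume()
    steps.append("resume")
    return steps
```

Keeping `resume` strictly after the verify step mirrors the rule that the hub is restored only once the counts are confirmed to have persisted.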

6. Sequencing

Wave 1 (cheap, no OCR spend): ~~A (done)~~ → G first (student_code re-key + bucket recode + drop journey_id, N6) → B (file 172 originals + catalog + mirror, needs N1) → F (ledger backfill). G goes first so B/F write natively on student_code; then rule-3 a/b/c is done.

Wave 2 (metered OCR spend): C per school, dry-run-sample → review → apply, billed school by school. Closes rule-3 d. K (benefit ingestion) rides here — scanned receipts may need OCR; the structured fan-out to benefit rows is cheap.

Wave 3 (the model becomes permanent): H (handbook v2) → I (candidate move) → J (care-data.md v2 + standing intake). Do H after B/C so the new sections have data.

Wave 4 (separate builds, any time): D (Hoa Sữa parser) · E (school-first restructure). (G moved to Wave 1 — it is now the identity re-key, a foundation for everything after.)

7. Standing intake — keeping it compliant in future (point 13)

The backfill is one-time; future documents must land in the model without a catch-up. The pieces mostly exist — wire them into a periodic sweep.

How a document gets in (channels):

Hub upload (preferred, built): coordinator picks scope + doc_type → deterministic routing. Best for school sheets and per-student docs.
Drop folder (built): Tai-lieu-vao/<week>/<student-code>/ swept by ingest-watch → care_intake_convert. Good for bulk/ad-hoc.
Email lane (parked → enable, N8): a care-documents mailbox/label, mirroring cloud/finance_mail/. Lets coordinators forward a school's scans straight in. Recommended next channel — lowest friction for the field.
Form-attached (parked): onFormSubmit already knows form context; enable later if a structured care-doc form is wanted.

Periodic jobs (Cloud Run; all currently manual under Phase A):

| Job | Cadence (proposed) | Does |
| --- | --- | --- |
| `care-intake` (sweep, N8) | daily/weekly | pull new docs from drop + hub (+ email when live) → classify → route → extract → file → `source_doc` + `source_file`; unmatched → `_needs-review.md` triage |
| `watchlist-keyword` | daily | safeguarding keyword sweep over new notes |
| `watchlist-monthly` | monthly | full watchlist sweep |
| `consistency-check` | weekly | assert the three places agree — `Ho-so/` count == `child` count == folder count; every `source_doc` has a file; every fact has a `source_ref`; flag drift |
| `index-rebuild` | on change | rebuild the read index from the live DB |

consistency-check is the enforcement arm of §1.8: it turns the rules from written guidance into automated checks. Phase A is untouched: these are internal care jobs, not publishing; nothing reaches Facebook/website, and no channel-publisher cron is introduced.
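
A minimal sketch of the three-place agreement check, run against a toy SQLite schema. The table and column names (`child.student_code`, `source_doc.bucket_path`, `profile_entry.source_ref`) and the function `consistency_report` are assumptions made for illustration, and the `source_ref` rule is written as a dangling-reference check, the form §12.1 A5 settles on, rather than a presence check.

```python
import sqlite3
from typing import List, Set


def consistency_report(
    conn: sqlite3.Connection,
    profile_files: Set[str],
    folders: Set[str],
    stored_files: Set[str],
) -> List[str]:
    """Return one drift message per violated invariant (empty list = clean)."""
    drift: List[str] = []

    # Invariant 1: profile files, child rows and folders agree on the count.
    child_count = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
    if not (len(profile_files) == child_count == len(folders)):
        drift.append(
            f"count mismatch: profiles={len(profile_files)} "
            f"child={child_count} folders={len(folders)}"
        )

    # Invariant 2: every catalogued source_doc points at a file that exists.
    for doc_id, path in conn.execute(
        "SELECT id, bucket_path FROM source_doc ORDER BY id"
    ):
        if path not in stored_files:
            drift.append(f"source_doc {doc_id} has no file")

    # Invariant 3: a source_ref, when present, must resolve to a source_doc.
    dangling = conn.execute(
        "SELECT pe.id FROM profile_entry pe "
        "LEFT JOIN source_doc sd ON sd.id = pe.source_ref "
        "WHERE pe.source_ref IS NOT NULL AND sd.id IS NULL ORDER BY pe.id"
    ).fetchall()
    for (entry_id,) in dangling:
        drift.append(f"profile_entry {entry_id} has a dangling source_ref")

    return drift
```

Returning messages instead of raising lets a weekly run flag every drift in one pass.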

8. Constraints & risks

Single DB writer. One Cloud Run care write at a time; sessions have clobbered each other — re-pull before apply.
gcsfuse write race. Quiesce the hub before any so-dang-ky.sqlite upload.
OCR billing. Anthropic credits ran out mid-run before; meter Track C per school, keep the ledger so a partial run resumes. Gemini cuts this ~2.3×.
Name matching is school-scoped. Cross-school namesakes are a known trap; ambiguous matches go to triage, never auto-merged.
Bucket rename = live-data move (Track G). Sole copy of beneficiary data; backup + verify + delete-old-last; isolate from other writes.
Sensitive docs (giay-to, hoan-canh, health): store the source, **scope the fact** — do not copy identifying medical / precise-circumstance detail into the handbook body (beneficiary-care-subplan §3). Handbook §7 watchlist is coordinator/safeguarding audience only.
Firewall. Read into the workspace from the personal Drive only; never write back out. Phase A untouched (care data, not publishing).
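
The school-scoped matching rule in the constraints above can be sketched as follows: restrict the pool to one school, accept a unique exact match, fall back to a unique one-character fuzzy match that is flagged for confirmation, and hold everything else for triage. The function names, the roster tuple shape and the sample codes are hypothetical; the real matcher is not shown in this plan.

```python
import unicodedata
from typing import List, Optional, Tuple

Roster = List[Tuple[str, str, str]]  # (student_code, school_slug, full_name)


def _norm(name: str) -> str:
    """Collapse whitespace, normalise Unicode to NFC and ignore case."""
    return unicodedata.normalize("NFC", " ".join(name.split())).casefold()


def within_one_edit(a: str, b: str) -> bool:
    """True if a and b differ by at most one substitution, insert or delete."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]   # one substitution
    return a[i:] == b[i + 1:]           # one extra character in b


def match_student(name: str, school: str, roster: Roster) -> Tuple[str, Optional[str]]:
    """Return ("exact" | "fuzzy" | "held", student_code or None)."""
    pool = [(code, _norm(full)) for code, slug, full in roster if slug == school]
    target = _norm(name)
    exact = [code for code, full in pool if full == target]
    if len(exact) == 1:
        return ("exact", exact[0])
    if len(exact) > 1:
        return ("held", None)           # same-school namesakes: never guess
    fuzzy = [code for code, full in pool if within_one_edit(full, target)]
    if len(fuzzy) == 1:
        return ("fuzzy", fuzzy[0])      # candidate only; needs confirmation
    return ("held", None)
```

A `"fuzzy"` result is deliberately distinct from `"exact"` so that a caller can route it to a coordinator instead of writing it straight to the DB.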

9. Recommendations & best practices (point 14)

One source of truth per fact. The original file is authoritative; OCR Doc, .txt mirror and DB rows are derived and regenerable. Never edit an original; never let a derived copy become the only copy.
Provenance is non-optional going forward. Every new profile_entry / school_report / benefit must carry source_ref → source_doc → original file. Make the intake sweep refuse to write a fact with no source (soft gate → triage), so the chain never breaks again.
Two timestamps, always distinct (event vs ingest) — §1.6. Reports read event order; the update journal reads ingest order.
Stable IDs. student_code is frozen at mint and never reused; drive_file_id is the dedup key in source_file so re-ingest is idempotent and edits are detectable.
Idempotent, ledgered, dry-run-first for every migration (rule 5). The ledger is what lets a billing-interrupted OCR run resume.
Don't over-build. Cohort stays register-only; folders are lazy-created; reuse the 8 care_migrate actions before writing a new one.
Convergence as a check, not a hope. consistency-check weekly asserts the three places agree — the rules are enforced, not just documented.
Audience discipline. Sensitive detail is scoped, not transcribed; the safeguarding section is gated; the handbook is access-controlled by Drive permissions, not a watermark.

10. Decisions — administrator-approved 2026-06-13

All recommendations accepted. D1–D11 and D13–D18 are approved as
recommended; D12 = do it (Hoa Sữa in scope this round); **D19 = student_code
becomes the single id** (administrator override; journey_id retired). Recorded
below for the record; these now drive the build, no longer "open".

D1. Build care_migrate file-student-doc (N1, Track B needs it)? *Rec: yes.*
D2. Originals filed as the raw PDF in Tai-lieu-goc/ + a .txt mirror (vs a PDF→Doc conversion)? Rec: raw PDF + .txt.
D3. Gemini 3.1-pro-preview for all OCR including handwriting (retire the Claude-for-handwriting caveat — A/B passed)? Rec: yes.
D4. Adopt the school-first layout (Truong/<trường>/<năm>, N3, Track E), replacing the year-first Tai-lieu-tong-hop tree? Rec: yes — it fixes the G5 split and point 11.
D5. Candidate promotion moves the whole Khao-sat/ folder into the student folder (N4)? Rec: yes.
D6. Cohort = register only (no docs subfolder)? Rec: yes (point 6).
D7. Rename the bucket 10_Ho-so-thu-huong → ho-so-thu-huong (N6, Track G) now, or defer as cosmetic? Rec: do it, but as an isolated gated track.
D8. Handbook v2 sections + created_at/code-mint changes (N2, N5)? Rec: yes.
D9. Promote §1 into care-data.md v2 + the _Huong-dan human Doc (N7) once D1–D8 land? Rec: yes — this is "ensure compliance in future" (point 1).
D10. Enable the email intake lane + the periodic care-intake sweep (N8, §7)? Rec: yes — email next, sweep weekly.
D11. Per-cohort benefit files (G9, point 12e) — Track K: ingest the benefits file inside each cohort folder → per-student benefit rows + handbook §4; the standalone Hỗ trợ tài chính folder and the Area-3/4 donor side stay out of scope. Rec: yes — the handbook is incomplete without it.
D12. ✅ Do it this round. Hoa Sữa (Track D) is in scope; build the split-name parser and load the ~81 trainees after A–C.
D13. Bucket files and the DB primary key both on the student code (HS….md, Tai-lieu-goc/HS…/); journey_id dropped (N6; see D19, point 1)? Rec: yes.
D14. Adopt the care-document naming convention (§1.10) and rename files on file/intake (N7/N8, points 2, 4)? Rec: yes.
D15. Schedule the care jobs (consistency-check weekly, care-intake + handbook-refresh + watchlist) on Cloud Scheduler — care jobs are cron-eligible; only channel-publisher is gated in Phase A (point 5)? Rec: yes.
D16. Handbook lifecycle — auto-create on admission + standing handbook-refresh (N9, point 8)? Rec: yes.
D17. source-coverage verification after each wave (N10, point 9)? Rec: yes — it is the only proof nothing was dropped.
D18. Do NOT delete the lnquang source folders after migration (point 6) — verify + ledger them as fully-migrated; the coordinator decides on their own personal Drive? Rec: yes (we have read-only access and it is the upstream origin; deletion is neither ours to do nor wise).
D19. ✅ student_code is the single id (administrator override). Retire journey_id; re-key the DB and drop the column (Track G, Wave 1). Guardrails: freeze student_code at mint (immutable key) and confirm Stream C is independent of the care key before the drop. (This overrides the earlier keep-as-hidden-surrogate recommendation; the administrator confirms no PII concern in scope.)
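
The re-key that D19 authorises (Track G) can be illustrated on a toy two-table SQLite schema. This is a sketch under stated assumptions, not the shipped care_rekey tool: the schema, the column names and the function `rekey_to_student_code` are hypothetical, only one child-keyed table is shown instead of the ~7 real ones, and the connection is assumed to be in autocommit mode (`isolation_level=None`) so the transaction is controlled explicitly.

```python
import sqlite3


def rekey_to_student_code(conn: sqlite3.Connection) -> None:
    """Rebuild child + profile_entry keyed on student_code; drop journey_id."""
    unminted = conn.execute(
        "SELECT COUNT(*) FROM child WHERE student_code IS NULL"
    ).fetchone()[0]
    if unminted:
        raise ValueError(f"{unminted} children have no student_code yet")
    before = conn.execute("SELECT COUNT(*) FROM profile_entry").fetchone()[0]

    conn.execute("PRAGMA foreign_keys = OFF")  # must be set outside a transaction
    conn.execute("BEGIN")
    try:
        conn.execute("ALTER TABLE profile_entry RENAME TO profile_entry_old")
        conn.execute("ALTER TABLE child RENAME TO child_old")
        conn.execute(
            "CREATE TABLE child (student_code TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )
        conn.execute("INSERT INTO child SELECT student_code, name FROM child_old")
        conn.execute(
            "CREATE TABLE profile_entry ("
            " id INTEGER PRIMARY KEY,"
            " student_code TEXT NOT NULL REFERENCES child(student_code),"
            " summary_line TEXT)"
        )
        # Re-point every child-keyed row through the old journey_id mapping.
        conn.execute(
            "INSERT INTO profile_entry "
            "SELECT e.id, c.student_code, e.summary_line "
            "FROM profile_entry_old e JOIN child_old c ON c.journey_id = e.journey_id"
        )
        conn.execute("DROP TABLE profile_entry_old")
        conn.execute("DROP TABLE child_old")

        # Verify before committing: counts preserved and FK-clean.
        after = conn.execute("SELECT COUNT(*) FROM profile_entry").fetchone()[0]
        violations = conn.execute("PRAGMA foreign_key_check").fetchall()
        if after != before or violations:
            raise RuntimeError("re-key verification failed; rolling back")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    finally:
        conn.execute("PRAGMA foreign_keys = ON")
```

Rebuilding the tables rather than dropping the column in place keeps the sketch independent of the SQLite version, and verifying inside the transaction means a failed check leaves the original tables untouched.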

11. Operations — triggering, footprint, future handbooks, verification

11.1 When/how jobs run (point 5)

Care jobs are Cloud Run Jobs, today run on-demand (gcloud run jobs execute / Cloud Run UI / a hub button). Care jobs are cron-eligible — Phase A only forbids a channel-publisher cron; an internal care job that never publishes is fine on a schedule. Proposed triggers:

consistency-check — weekly Cloud Scheduler (e.g. Mon 06:00 Asia/Ho_Chi_Minh) and on-demand right after every migration write.
care-intake — daily or weekly sweep of the inbox channels.
handbook-refresh, watchlist-keyword — daily; watchlist-monthly — monthly.
Migration tools (care_migrate, care_reocr, care_roster) — **on-demand only**, never scheduled (one-time backfill work). Registering these is a /setup-schedules change; it must still refuse any channel-publisher job (autonomy-phase.md).

11.2 Operating footprint — is the tool/job count a problem? (point 7)

Not inherently — all care jobs share one container image, dispatched by the HMT_CARE_JOB env var (entrypoint.py), so more jobs ≠ more image builds; a job definition is thin. Keep it lean by:

One-time vs standing. Migration tools (care_migrate, care_reocr, care_roster) are backfill-only — on-demand, no schedule. The standing set is small: care-intake, handbook-refresh, consistency-check, watchlist-keyword/-monthly, index-rebuild, area4-emit.
Retire after backfill. Once Tracks B–G verify clean, the migration **job definitions** can be removed (the tools stay in the repo, runnable ad-hoc) to shrink the deploy surface.
One image, env-dispatched stays the rule — resist a per-tool image.
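
To make the "one image, env-dispatched" rule concrete, here is a minimal sketch of how an entrypoint might pick a job from the HMT_CARE_JOB variable. Only the variable name comes from this plan; the registry, the job functions and `dispatch` are hypothetical and do not reproduce the real entrypoint.py.

```python
import os
from typing import Callable, Dict, Mapping


def _consistency_check() -> str:
    return "ran consistency-check"


def _handbook_refresh() -> str:
    return "ran handbook-refresh"


# Hypothetical registry: one container image, many thin job definitions.
JOBS: Dict[str, Callable[[], str]] = {
    "consistency-check": _consistency_check,
    "handbook-refresh": _handbook_refresh,
}


def dispatch(env: Mapping[str, str] = os.environ) -> str:
    """Run the job named by HMT_CARE_JOB, or exit on an unknown name."""
    name = env.get("HMT_CARE_JOB", "")
    if name not in JOBS:
        raise SystemExit(f"unknown HMT_CARE_JOB: {name!r}")
    return JOBS[name]()
```

Adding a job under this pattern is one registry entry plus a job definition that sets the variable, which is why the job count does not multiply image builds.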

11.3 Handbooks for new & changing students (point 8)

At admission / roster apply: wire survey_intake.admit_candidate (and care_roster apply) to call so_tay.upsert_native_doc so every new student gets a handbook immediately (N9). Mint the student code here too (§1.5).
Ongoing refresh: a standing handbook-refresh job regenerates the handbook for any student whose data changed since last compile (using child.last_updated / the new created_at cols). The upsert is idempotent and cheap, so a daily/weekly pass keeps every handbook current after each intake-convert write. A hub "regenerate handbook" button covers one-offs.

11.4 Verifying nothing is missing vs the source (point 9)

After each wave, run care_migrate source-coverage (N10): it walks each lnquang source band read-only (drive-ls), lists every file, and reconciles against the source_file ledger — every source file must have a ledger row with a terminal result (written/skipped/held); any source file with no ledger row = a missed document, reported per folder. The report is the migration's completion proof and feeds Track F's authoritative migrated-vs-not view. Pair it with consistency-check (three-place agreement) for full coverage: one checks source → us, the other checks us internally consistent.
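
The reconciliation described above amounts to a set difference per folder. A minimal sketch, assuming the source listing is available as a folder → file-id mapping and the ledger as a file-id → result mapping; the function name and both data shapes are illustrative, not the real source-coverage interface.

```python
from typing import Dict, List

# Terminal ledger results named in this section.
TERMINAL = {"written", "skipped", "held"}


def source_coverage(
    source_files: Dict[str, List[str]],
    ledger: Dict[str, str],
) -> Dict[str, List[str]]:
    """Return, per folder, the source file ids with no terminal ledger row."""
    missed: Dict[str, List[str]] = {}
    for folder, file_ids in source_files.items():
        gaps = [fid for fid in file_ids if ledger.get(fid) not in TERMINAL]
        if gaps:
            missed[folder] = gaps
    return missed
```

An empty result is the completion proof for a band; a non-terminal ledger value (for example a run that was interrupted mid-file) is reported the same way as a missing row.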

11.5 Disposition of the source folders (point 6)

Do not delete the lnquang source after migration. Three reasons: (1) the SA has read-only access to a personal Drive — it is not ours to delete; (2) it is the upstream origin we reconcile against (§11.4) — deleting it removes the ability to re-verify; (3) the firewall is read-into only. Instead: once source-coverage reports 100% and the ledger marks the band fully-migrated, the workspace copy becomes the operating source of truth, and whether to archive the personal-Drive folder is the coordinator's call on their own Drive, not a workspace action. (Folder IDs are kept out of this public page per the redaction rule.)

12. Build & run status (2026-06-13 overnight)

Branch claude/website-redesign (the care v2 foundation lives here; basing off main would orphan it). Full run log: _system/runs/care-storage-v2-overnight-2026-06-13.md. Lesson saved: memory project_care-storage-v2-overnight.

Done (shipped + verified)

| Item | Status |
| --- | --- |
| N1 file-student-doc(s) | ✅ built + selftest |
| N2 handbook v2 (§6/§7/§8 + source resolution) | ✅ built; 211 handbooks regenerated live |
| N3 relocate-school-first | ✅ built + selftest |
| N4 admit → handbook + Khao-sat move | ✅ built + tests |
| N5 created_at ingest timestamps | ✅ built; live DB migrated to add the columns |
| N6 care_rekey (Track G DB half) | ✅ built + PROVEN on a copy of the live 211-child DB (counts preserved, FK-clean, journey_id dropped) — STAGED for live |
| N7 care-data.md v2 + naming rule | ✅ written |
| N9 mint-at-admission + handbook-refresh job | ✅ built |
| N10 source-coverage | ✅ built + selftest |
| google-genai in care-jobs image | ✅ added (Gemini was ImportError-ing) |
| Deploy: care-jobs image + Gemini wiring on care-migrate-run | ✅ rebuilt + wired (verified) |
| Track C OCR (Gemini) | ✅ 130 docs transcribed (~$2.1), source_file 43→154, 24 held; the 24 held resolved 2026-06-13 session 2 (§12.1) |
| Track H handbook v2 regen | ✅ all 211 regenerated as native Google Docs |
| Full test suite | ✅ 671 passed; ruff/mypy clean; 9 commits (overnight) |
| N12 school-scoped re-OCR + acronym index + entrypoint `--school` | ✅ shipped + applied live (§12.1); 24 held → 15 filed + 9 consolidated held |
| Track D parser split-name + column overrides | ✅ built + tests (the ~81-trainee load is still staged) |
| Audit step 8 (data-consumer) | ◑ run; `detail`-parity patched (handbook §5 + report digests); punch-list staged for a redeploy |
| Track B — file per-student originals (CETB 2025-26 essays) | ✅ DONE all 6 schools (§12.2); `source_doc kind=student` 1→138; +2 fixes (op-aware skip `27fea0f`, idempotent copy `cfabd7c`). SE forms + older years + no-match typos = later passes |

Next steps (staged — supervised session, all live DB writes)

Do these one DB-writer at a time under the sole-writer + backup + verify discipline (§5 + the lesson in §12 of the run log — management-hub --min-instances=0, drain care-dashboard, pause the 6 schedulers, raise the job task-timeout, verify the ledger persisted after). Suggested order:

N12 — school-scoped re-OCR matching — ✅ DONE (2026-06-13 session 2, §12.1). Shipped (school-scoped pool + 1-char fuzzy + a display-name acronym index so THPT LTV - Bài viết resolves to thpt-luong-the-vinh) + entrypoint HMT_REOCR_SCHOOL→--school; deployed in the care-jobs image and applied live: all 24 held resolved. (Track B's file-student-docs already school-scoped.)

OCR triage — ✅ DONE; Track B (raw originals) was still staged when this step was written; the CETB 2025-26 essays were filed in session 3 (§12.2). The **15 held essays** were filed 2026-06-13 (12 via the school-scoped batch, 3 via coordinator-confirmed forced execs — see §12.1); source_file written 130→161, 0 ambiguous left. Pending at that point: **filing the 172 raw per-student originals** (file-student-docs --school <slug> --school-year 2025-2026, dry-run → --apply) to create the source_doc kind=student rows, then handbook-refresh so §6 populates. The 24-held resolution (for the record):

15 student essays → file to these students (force `reocr --file-id <id> --journey <jid>`, or let N12's school-scoped pass do it automatically): 7A Nguyễn Thùy Linh→0032, 7A Nguyễn Thị Thuỳ→2022, 7B Nguyễn Hải Đăng→6026, 7B Nguyễn Ngọc Gia Huy→2017, 7B Trần Nhật Minh→6004, 9A Lê Thị Diệu Linh→6017, 9B Lê Thị Diệu Linh→0008 (two different girls, same name, different schools), 10A3 Trần Thị Huyền→0020, 11A4 Phạm Nhật Minh→1026, 11A4 Vũ Khánh Vân→0023, 6B Nguyễn An Minh Châu→0028, 9A Vũ Gia Bảo→2012 (confirm: DB class 8A vs file 9A); Lại Hia Huy→6024 Lại Gia Huy (coordinator-confirmed), Ngô Công Thiệu→6016 Ngô Công Thiện (1-char typo, confirm), Trần Viết Minh Tiềm→6027 Trần Viết Minh Tiền (1-char typo, confirm). (The numbers are journey ids, shown with the `journey-` prefix dropped.)

9 consolidated → not per-student: the 6 "BB tài trợ <school>" are the Track K funding source; KQ HK1 is already in school_report; 2× TDTT are school-level. Leave them out of per-student filing.

Track K (N11) — confirm the per-student benefit source first (the "Tài trợ <năm>" subfolder docs, NOT the "Cùng em tiến bước" rosters), then build the extractor → benefit rows + register funding → handbook §4.

Track D — K3 via care_roster (nang-buoc-tuong-lai, 8xxx block, Lào Cai); K1/K2/K4 need the split-name + header-offset parser (institutional letterhead).

Track E — relocate-school-first over Tai-lieu-tong-hop (N3).

Track G live re-key — the big one: run care_rekey on the live DB, then the ~650-ref journey_id→student_code code refactor (110 hub + 540 tools) + Drive/bucket recode (10_Ho-so-thu-huong→ho-so-thu-huong, Ho-so/journey-NNNN.→Ho-so/HS….) + CARE_ROOT/env + hub redeploy, with the hub exercised after the cutover.

N8 email lane + handbook-refresh scheduler (dedicated job in deploy.ps1).
Data-consumer completeness audit (do AFTER the data tracks B/C/K/D/E land).

Now that all relevant data is migrated, verify that every function/surface that uses or involves the care data actually sources it and presents what is necessary and sufficient — reports are one consumer, not the only one. Build a single consumer → source map (each surface × which tables/columns it reads) and flag any surface that under-sources, or any migrated source that no surface reads. The consumers to audit:

a. Reports (report_child, report_school, report_cohort) — per report section, confirm it reads the right table/column and surfaces:

Identity + enrolment — child (+ carer/phone/address/exit-reason) and ghi_danh history (programme transitions, lứa ≠ năm-học kept distinct).
Academic record — all-years school_report (hoc-ba/so-diem), chronological, GPA trend, with source-stamp.
Progress + SE — profile_entry including the full detail (the OCR'd essays + 3-perspective SE notes from Track C), not just summary_line.
Benefits — benefit with petal + amount + receipt + funding_source (incl. the Track K per-cohort benefits once loaded); school/cohort funding rolls up from benefit + register_entry funding rows, not recomputed.
Provenance — source_ref → source_doc resolves to a title + link, and a Tài liệu gốc view lists the child's filed originals (Track B output).
Events/register — register_entry (prizes/hardship/milestones/headcount) in the school/cohort reports; typical-case ("trường hợp tiêu biểu") picks.
Safeguarding — watchlist signals (gated audience).
Windows — per-student report defaults to full programme history (not a trailing slice); cohort = programme-entry year, năm-học is a separate cut.

b. Handbook (so_tay v2) — confirm §3 transcript, §4 benefits (petal + receipt), §5 journal-with-provenance, §6 Tài liệu gốc, §8 update journal all populate from live data once Track B/K land (re-run handbook-refresh).

c. Hub views (management-hub + care-dashboard) — the student list, the child profile page (transcript, notes incl. OCR detail, benefits, provenance links), Phân tích (analytics/charts), Cảnh báo (watchlist), and search each show the migrated data; no view silently drops a source.

d. Read index (beneficiary_index_rebuild) — includes every migrated student + the new fields; re-run after the data tracks.

e. Safeguarding watchlist (safeguarding_watchlist) — scans the new OCR profile_entry.detail (essays/SE), not just summary_line, so signals in the transcribed text are caught.

f. Area-4 / finance emit (benefit_report / area4-emit) — includes the Track K per-cohort benefits in the donor/finance roll-up.

g. Consistency-check (beneficiary_consistency_check) — asserts the v2 invariants over the now-complete data: Ho-so/ count == child count, every source_doc has a file, every fact carries a source_ref; extend it if a new invariant (e.g. every filed original has a kind=student row) isn't checked.

h. Registers (So-tay-truong / So-tay-lua Docs) — roll up the migrated per-student data, don't recompute from scratch.

Method: produce the consumer → source map; flag under-sourcing + orphaned sources; patch the offending function; then regenerate a sample of each surface (a child report, a school report, a cohort report, a handbook, the hub child page, the index, a watchlist run) and read each for sufficiency — a coordinator/principal/trustee can act from it without opening raw files. The handbook is the per-child companion; the report is the versioned snapshot; keep every surface's sourcing in sync so nothing migrated goes unseen.

12.1 Session 2 — N12 + remaining OCR + data-consumer audit (2026-06-13)

Shipped + applied live:

N12 school-scoped re-OCR matching: pool restricted to one school (resolves cross-school namesakes) + 1-char-typo fuzzy fallback + a display-name acronym index (THPT LTV → thpt-luong-the-vinh) + entrypoint HMT_REOCR_SCHOOL→--school. 3 commits.

Track D split-name / column-override roster parsing — 1 commit.

Remaining-OCR run (Path B): rebuilt the care-jobs image with N12, dry-ran (confirmed in-cloud), then backup → quiesce → 4 school-scoped batches (12 essays) + 3 coordinator-confirmed forced namesakes. source_file written 130→161, 0 ambiguous left; the 9 still-held are the consolidated non-essays (6 BB tài trợ = Track-K source, KQ already in school_report, 2 TDTT) — correctly excluded. child 211 / profile_entry 542 / school_report 293 unchanged. Backup care/_backups/20260613-pre-reocr-n12/.

Op-lessons (memory project_n12-reocr-remaining-done): backup is a hard gate before any so-dang-ky.sqlite write; forced namesake mappings need explicit per-file coordinator confirmation (the matcher holds, never guesses); run forced execs AFTER all batches (a batch re-running its folder re-holds a namesake and reverts an earlier forced written mark — Doc survives, ledger reverts); wrap each gcloud run jobs update/execute in a DNS-flap retry and never execute after a failed update (stale env).

journey-0028 name-order fix (Nguyễn Minh An Châu → Nguyễn An Minh Châu): DB done live (corrects hub/reports/search/CSV). Drive folder + handbook Doc rename deferred — local Drive write is unavailable (box ADC is cloud-platform-scoped; only the SA key in Cloud Run writes Drive). Do it as a Cloud Run folder-rename step (NOT handbook-refresh, which would duplicate the folder); ids in memory project_n12-reocr-remaining-done.
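
The last op-lesson (retry each update/execute on a transient DNS failure, and never execute after a failed update) can be sketched with plain callables. Everything here is a hypothetical stand-in: `retry_transient` and `update_then_execute` are illustrative names, the callables are assumed to wrap the real job update and job execution, and treating `OSError` as the transient failure is an assumption (DNS resolution errors in Python are `OSError` subclasses).

```python
import time
from typing import Callable


def retry_transient(
    step: Callable[[], None],
    attempts: int = 3,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run step, retrying on OSError; return True once it succeeds."""
    for attempt in range(1, attempts + 1):
        try:
            step()
            return True
        except OSError:
            if attempt < attempts:
                sleep(delay_s)
    return False


def update_then_execute(
    update: Callable[[], None],
    execute: Callable[[], None],
    attempts: int = 3,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Execute only after the update is known to have succeeded."""
    if not retry_transient(update, attempts, delay_s, sleep):
        # A failed update means the job still carries the previous env.
        raise RuntimeError("update failed; refusing to execute against a stale env")
    if not retry_transient(execute, attempts, delay_s, sleep):
        raise RuntimeError("execute failed after retries")
```

Retrying the execute step is only safe because the jobs are described as idempotent; a non-idempotent job would need a check for an already-started execution before retrying.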

Data-consumer audit (step 8) — fixes required

Ran the §12 step-8 audit (3 read-only auditors over reports / hub+index / watchlist+area4+consistency+handbook). Root cause: the migrated OCR'd note body (profile_entry.detail) was rendered as only summary_line by several surfaces. report_child + the watchlist keyword builder already read detail correctly (audited, no change).

Patched (committed dd1c2d4; offline — needs a hub/care-jobs redeploy to go live):

so_tay handbook §5 journal renders the full detail body as indented bullets.
report_school / report_cohort digests feed (detail or summary_line)[:500] to the LLM (capped so a long SE note can't bloat the aggregate prompt).

Still required (punch-list; most need a redeploy):

| # | Fix | File(s) |
| --- | --- | --- |
| A1 | Live hub child page: show `detail` in the journal; resolve `source_ref`→`source_doc` (title+Drive link) instead of the bare ref+`raw_link` | `app/main.py` child route + `app/templates/child.html:72,74` (care-dashboard, not `app/hub.py`) |
| A2 | Surface `benefit.receipt_path` in report_child benefits table, the hub child benefits, and the area4/IATI emit (handbook §4 already shows it) | `report_child.py:350`, `child.html:52`, `benefit_report.py` |
| A3 | Funding roll-up: add `funding_source` + `petal` to the register generator; have area4-emit also read `register_entry kind='funding'` (cohort/Track-K funding is under-counted reading `benefit` only) | `so_tay._fetch_benefits_for`, `benefit_report.py:79`, `report_school.py`/`report_cohort.py` funding sections |
| A4 | Read index to include v2 fields (`student_code` as a lookup key, `exit_reason`, `graduation_date`, `primary_carer`) | `beneficiary_index_rebuild.py:43-53` |
| A5 | consistency-check v2 invariants: a dangling-`source_ref` integrity check (not a presence check — pre-v2 facts legitimately lack it), "every `source_doc` has a file", "every filed original ↔ `kind=student`"; also update it off the OLD `Ho-so/<jid>.md` bucket path to the v2 `Ho-so/<mã> — <Tên>/` | `beneficiary_consistency_check.py` |
| A6 | (low) `register_entry` has no live hub view; report_child transcript lacks a source-stamp column; roster list shows frozen `class_label`; watchlist dead `profiles_root` param | various |

Full detail: memory project_data-consumer-audit-2026-06-13. Recommended: bundle A1–A5 + the journey-0028 Drive rename + the deferred detail-parity go-live into a single hub/care-jobs redeploy.

12.2 Session 3 — Track B (file per-student originals) DONE (2026-06-13)

All 6 schools' CETB 2025-26 per-student essay originals filed (Track B / G2, rule 3 a/b/c). source_doc kind=student 1 → 138 (dai-an 27, tam-thanh 29, tan-khanh 23, vinh-hao 28, luong-the-vinh 12, nguyen-binh 18 + 1 baseline); child 211 / profile_entry 542 / school_report 293 unchanged. Each child now has its raw original in Ho-so/<HS…>/Tai-lieu-goc/ + a kind=student catalog row (the source-of-truth + handbook §6 layer §1.1 said was absent). Ran on Cloud Run (file-student-docs per school, --school school-scoped) under backup + quiesce. Backup care/_backups/20260613-pre-trackB/.

Two fixes shipped (pushed):

27fea0f op-aware skip — file-student-doc(s) was skipping any file the ledger marked done, but the Track-C OCR pass records each essay as op='reocr' 'written' (→ 'done'); without the fix Track B would file nothing. Now it skips only on a prior FILING (op='copy-source').
cfabd7c idempotent copy_file — reuse an existing same-named copy instead of duplicating, so an interrupted/clobbered filing re-runs clean.
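
Both fixes can be illustrated in a few lines. This is a sketch of the two rules, not the code in commits 27fea0f and cfabd7c: the ledger row shape, the `written` result value as the "done" marker, and the dict standing in for a Drive folder are all assumptions.

```python
from typing import Dict, Iterable, Mapping


def already_filed(ledger: Iterable[Mapping[str, str]], file_id: str) -> bool:
    """Skip only when a prior FILING (op='copy-source') exists for this file.

    A row from another operation, such as the OCR pass (op='reocr'),
    does not count as the file having been filed.
    """
    return any(
        row["drive_file_id"] == file_id
        and row["op"] == "copy-source"
        and row["result"] == "written"
        for row in ledger
    )


def copy_once(folder: Dict[str, str], name: str, source_id: str) -> str:
    """Copy source_id into folder under name, reusing a same-named copy.

    folder maps file name -> id of the copy already in the target folder.
    """
    if name in folder:
        return folder[name]          # interrupted earlier run: reuse, no duplicate
    new_id = f"copy-of-{source_id}"  # stand-in for the real copy call
    folder[name] = new_id
    return new_id
```

Keying the skip on the operation as well as the file id is what lets the OCR pass and the filing pass share one ledger without masking each other.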

Op-lesson (cost a clobber): the first run restored the hub in the same breath as the last job (Đại An) → a hub instance raced gcsfuse and rolled back Đại An's 27 source_doc rows (Drive copies survived). Rule: after a care-DB job, settle + re-download + VERIFY counts persisted, THEN restore the hub. Recovered by re-running Đại An + verify-before-restore. (memory project_trackB-in-progress-resume.)

Outstanding (next sessions), in suggested order

Track K (N11) — ✅ DONE (2026-06-13 session 4, §12.3). Built care_benefit_extract + filed all 6 schools' funding agreements live.
Track D load — ~81 Hoa Sữa NBTL trainees: K3 via care_roster; K1/K2/K4 via the shipped split-name/header-offset parser (dump → set cols → propose → apply); codes HVK<k>, block 8xxx, programme nâng bước tương lai.
Track E — relocate-school-first over Tai-lieu-tong-hop.
One bundled hub/care-jobs redeploy — audit fixes A1–A6 (§12.1) + the journey-0028 Drive folder/Doc rename + the deferred detail-parity go-live (commit dd1c2d4) + the ~27 Đại An duplicate-orphan PDF cleanup (delete Tai-lieu-goc files not in any source_doc.drive_url; needs Cloud-Run/SA Drive write).
Track G — the big one: live care_rekey + the ~650-ref journey_id→student_code refactor + bucket recode + hub redeploy, hub exercised after. Its own session.
N8 — care email-intake lane + handbook-refresh scheduler.
handbook-refresh — run ONCE after K/D land (populates handbook §6 for the 138 filed originals + everyone), not per-track.
Track B remainder — the per-school no-match typos (forced `file-student-doc --file-id --journey`, per-file confirm), the SE forms (Cảm xúc xã hội subfolders), and CETB 2024-25 / 2023-24 essays = later `file-student-docs` passes.

After the data lands: re-run source-coverage + consistency-check + the data-consumer sufficiency check (regenerate a sample of each surface).

12.4 Session 5 — data-consumer audit fixes A1–A5 (code) + Track D findings (2026-06-13)

Audit fixes A1–A5 SHIPPED (code on main; 687 tests green; A6 deferred low-pri). They go live on the next hub/care-jobs redeploy (+ a live schema migration).

A3 (the big one) — funding now rolls up. register_entry gains additive amount_vnd/petal/funding_source (ensure_register_columns); care_benefit_extract writes the funding total; benefit_report (Area-4), report_school, report_cohort, and the so_tay school/cohort registers all sum register_entry kind='funding' so the 1.325 tỷ Track-K scholarship funding surfaces (the per-student coverage rows are amount_vnd=NULL by design, so before this they rolled up as 0đ).
A2 benefit.receipt_path shown in report_child, the Area-4 HXL/md, and the hub child benefits.
A1 hub child page renders the full detail body + resolves source_ref→source_doc (title + Drive link) + receipt link.
A4 read-index carries student_code/exit_reason/graduation_date/primary_carer (findable by code).
A5 consistency-check gains a dangling-source_ref integrity check (the file-existence + v2-path checks need Drive/bucket access → deferred to the care-intake job + Track G).
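
The A3 point, that summing `benefit` alone reports 0đ while the funding total lives in `register_entry kind='funding'`, can be shown on a toy schema. The `school` column, the sample amount and both function names are assumptions for illustration; only the table names, `amount_vnd` and `kind='funding'` come from this section.

```python
import sqlite3


def benefit_total_vnd(conn: sqlite3.Connection, school: str) -> int:
    """Sum per-student benefit amounts; NULL coverage rows contribute nothing."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_vnd), 0) FROM benefit WHERE school = ?",
        (school,),
    ).fetchone()
    return row[0]


def funding_total_vnd(conn: sqlite3.Connection, school: str) -> int:
    """Sum the school-level funding rows in the register."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount_vnd), 0) FROM register_entry "
        "WHERE kind = 'funding' AND school = ?",
        (school,),
    ).fetchone()
    return row[0]
```

`COALESCE` keeps both totals numeric when a school has no rows or only NULL amounts, so a roll-up can add them without special-casing.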

Live steps still owed for A1–A5 (bundle into the redeploy): run ensure_register_columns on the live DB (additive migration) + populate the 6 existing Track-K funding rows' amount_vnd (re-run care_benefit_extract apply over the 6 reviewed plan JSONs — idempotent; it now backfills amount_vnd on the existing rows) + deploy hub/care-jobs.

Track D (Hoa Sữa NBTL) — prepped, NOT loaded; needs a parser fix + a decision. Structures mapped (read-only): K3 clean (hdr row 0, name-col 1, sheet Danh sách, 8 trainees); K1 hdr row 7, split name (name-col 4 + given-col 5), sheet Danh sách toàn khóa, Khóa 1 (2023); K2 hdr row 3, split name (name-col 2 + given-col 3), sheet Danh sách HS, Khóa 2 (2023); K4 hdr row 7, split name (name-col 5 + given-col 6), 3 trade sheets Á/Bánh/Bàn, Khóa ~4 (2024). Per Khóa: code HVK<k><NNN>, id-block 8<k>00–8<k>99, programme nang-buoc-tuong-lai, school hoa-sua (Hà Nội). Blocker found: propose over-captures trailing footer rows as students (e.g. Người lập biểu = "prepared by") — care_roster needs a stop-at-footer / require-numeric-STT guard before any load. Decisions: trainees are adult, past, completed/dropped 2023–24 cohorts — load status derived per-row from Ghi chú (Đã hoàn thành→graduated/da-ket-thuc; Bỏ học/Nghỉ học→withdrawn; else active), which needs a small status-parse in care_roster. ~81 trainees total.

12.5 Session 6 — Track K is MULTI-YEAR; 2023-24 + 2024-25 loaded live (2026-06-13)

Track K (which had done only 2025-26) is multi-year: each CETB <năm> source folder has a Tài trợ <năm>/ subfolder of BB agreements + a school-totals summary sheet (no per-student amounts → coverage-row model holds). Loaded live the 7 prior agreements (5× 2024-25: Nguyễn Bính, Vĩnh Hào, LTV, Tam Thanh, Đại An; 2× 2023-24: LTV, Tam Thanh) via the same idempotent N11 pipeline + the A3 amount_vnd write. benefit 343→495 (+152 coverage), register kind=funding 12→19 (+7, all with amount_vnd), source_doc +7, source_file +7; child/profile_entry/school_report/kind=student unchanged; FK-clean; backup _backups/20260613-pre-trackK-prioryears/. Scholarship coverage by year now 2023: 44, 2024: 108, 2025: 181.

Cross-school transfers at scale (the Mai Tiến Vũ pattern): 18 funded students matched no same-school child but resolved to an existing journey whole-DB by exact name — almost the whole Tam Thanh 2023 founding cohort (journey-0013–0023) was funded at Tam Thanh in 23-24/24-25 and is now filed under LTV (moved THCS→THPT). Their coverage is correctly attached across years (e.g. Vũ Khánh Vân scholarship = [2023,2024,2025]; Mai Tiến Vũ = [2024,2025]). 1 OCR typo (Trần Thanhh→Thanh Ngoan, coordinator-confirmed → journey-7002).

Source-doc imperfections (faithful, not OCR errors): Đại An 24-25 Điều 1 commits 150M but Điều 3 itemises 137.4M (doc's own discrepancy — kept the committed 150M); Tam Thanh 23-24 states no per-semester amounts (installments null); LTV 23-24 has a blank date (→ 2023-09-01 fallback). Verified by reading the PDFs.

20 funded students with NO care record (graduates/leavers — Nguyễn Bính lớp 12 of 24-25, Đại An lớp 9; cohorts we never loaded) → reported, not created (coordinator's call): _system/runs/trackK-prioryear-absent-2026-06-13.md. The funding totals still count them. This is the wider gap: past CETB cohorts (23-24/24-25 graduates) are not in the care store; only current cohorts are.

12.6 Session 7 — Track D (Hoa Sữa NBTL) parser enhanced + 82 trainees loaded live (2026-06-13)

Parser (committed): care_roster gained (1) a footer-row guard (detect_stt_col + numeric-STT requirement — drops institutional signature lines like "Người lập biểu" that were polluting the list), (2) status-from-Ghi-chú (row_status scans the row text → graduated/withdrawn/paused/active, robust to column position), and (3) rich field capture (detect_fields + RosterStudent): DOB (ISO-normalised), gender, address, dân tộc, phone, hoàn cảnh, trình độ, parents — written to child columns + a searchable source='roster' intake note (vs the prior name/class-only thin records). open_enrollment gained additive trang_thai/ket_thuc/ly_do_roi. 687 tests green; verified live-read on K1/K3/K4.
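The footer guard and the status-from-Ghi-chú rule can be sketched as below. This is not the care_roster code: `is_student_row` and `status_from_row` are hypothetical helpers, the keyword table only covers the three mappings stated in this plan (Đã hoàn thành, Bỏ học, Nghỉ học), and the keyword for `paused` is not given here, so it is left out.

```python
import re
import unicodedata
from typing import Sequence

_NUMERIC_STT = re.compile(r"\d+(?:\.0+)?")  # "7" or a spreadsheet float such as "7.0"

# Keyword -> status, taken from the mappings stated in the plan.
_STATUS_RULES = (
    ("đã hoàn thành", "graduated"),
    ("bỏ học", "withdrawn"),
    ("nghỉ học", "withdrawn"),
)


def is_student_row(row: Sequence[object], stt_col: int) -> bool:
    """Keep a row only when its STT (ordinal) cell holds a plain number.

    Signature lines such as "Người lập biểu" have an empty or textual STT
    cell, so they are dropped instead of being read as students.
    """
    if stt_col >= len(row):
        return False
    return _NUMERIC_STT.fullmatch(str(row[stt_col]).strip()) is not None


def status_from_row(row: Sequence[object]) -> str:
    """Scan the whole row text, so the result does not depend on column position."""
    text = unicodedata.normalize("NFC", " ".join(str(c) for c in row)).casefold()
    for needle, status in _STATUS_RULES:
        if unicodedata.normalize("NFC", needle) in text:
            return status
    return "active"
```

Scanning the joined row text rather than one fixed column mirrors the "robust to column position" property described above, at the cost of matching a keyword that happens to appear in another cell.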

Loaded live (82 trainees): K1 34 (HVK1), K2 21 (HVK2 — 1 deduped, same person across Khóa), K3 8 (HVK3), K4 19 (HVK4, across the 3 trade sheets Á/Bánh/Bàn). School hoa-sua ("Trường Trung cấp Kinh tế - Du lịch Hoa Sữa", Hà Nội), programme nâng bước tương lai, id-block 8100–8499, cohort years K1/K2=2023, K3/K4=2024. child 211→293; FK-clean. Status (from Ghi chú): 52 active, 23 graduated, 7 withdrawn + matching NBTL enrollments. Rich fill: DOB/address 71/82, parents on K4. 71 intake notes. Backup care/_backups/20260613-pre-trackD-hoasua/. Supervised local-apply→upload (DB + 82 profile .md), verify-before-restore, post-restore re-pull confirmed no clobber. Handbook Docs/folders come with the deferred handbook-refresh.

12.7 Session 7 cont. — N8 detector + Track E fixes + the "copy originals to Workspace" directive (2026-06-13)

Shipped (committed): N8 cross-school duplicate/transfer detector in beneficiary_consistency_check (the Mai Tiến Vũ pattern as a standing weekly check; name + matching DOB across schools → flag; 0 on the live DB). Track E fixes to relocate-school-first: _norm_nam_hoc strips a programme prefix ("CETB 2023-2024" → "2023-2024") + the resolver gains a display-name acronym ("THPT LTV" → thpt-luong-the-vinh). Track E dry-run = 79 moves, target verified safe (the Foundation Shared-Drive Tai-lieu-tong-hop 1Of8xB, NOT the lnquang personal source [nguồn lnquang] owned by lnquang2016@gmail.com).
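A minimal sketch of the two ideas in this paragraph, under stated assumptions: the N8 rule (same name plus matching DOB across different schools is flagged) and the programme-prefix strip. The function names, the dict record shape and the diacritic folding are illustrative; the real `beneficiary_consistency_check` and `_norm_nam_hoc` may differ.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, Iterable, List


def fold_name(name: str) -> str:
    """Case- and diacritic-insensitive key, e.g. 'Mai Tiến Vũ' -> 'mai tien vu'."""
    decomposed = unicodedata.normalize("NFD", name.casefold())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 'đ' has no Unicode decomposition, so it is mapped by hand.
    return " ".join(stripped.replace("đ", "d").split())


def cross_school_duplicates(children: Iterable[Dict[str, str]]) -> List[List[Dict[str, str]]]:
    """Group records by (folded name, DOB); flag groups spanning more than one school."""
    groups = defaultdict(list)
    for child in children:
        dob = child.get("date_of_birth")
        if not dob:
            continue  # a name alone is too weak a signal to flag
        groups[(fold_name(child["real_name"]), dob)].append(child)
    return [
        group
        for group in groups.values()
        if len({c["school_slug"] for c in group}) > 1
    ]


def norm_nam_hoc_sketch(label: str) -> str:
    """Drop a leading programme prefix: 'CETB 2023-2024' -> '2023-2024'."""
    return re.sub(r"^\D+", "", label.strip())
```

Each flagged group is a candidate for coordinator review, not an automatic merge: a real transfer and an accidental duplicate look the same to this check.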

Directive (administrator): copy the original files from lnquang's folders into the HMT Workspace at the relevant places (care-data.md rule 1 — the Foundation owns its readable copy; the workspace copy becomes the operating source of truth). Scoped against current state:

Already copied: consolidated school docs → Foundation Tai-lieu-tong-hop (1Of8xB — verified a real Foundation-owned copy, ids differ from lnquang); per-student essays 2025-26 → Ho-so/<mã>/Tai-lieu-goc/ (Track B, 138).

Gap to copy (= Track B remainder + Track E): per-student SE forms ("Cảm xúc xã hội", ~180), per-student essays 2023-24/2024-25, the Track-B no-match/typo essays; then reorganize consolidated → Truong/<school>/<năm-học> (Track E apply). lnquang source per-school shape: <school>/ = loose consolidated sheets + a Bài viết/ folder (per-student essays) + a Cảm xúc xã hội/ folder (per-student SE). No separate per-student học-bạ scans exist (academic data is in consolidated KQ sheets), so essays + SE are the per-student originals to copy.

Tools: care_migrate file-student-docs --school <slug> --school-year <yr> per Bài viết/Cảm xúc xã hội folder (copies original → Tai-lieu-goc + bucket .txt + source_doc kind=student); relocate-school-first --apply for the reorg. All Cloud Run Drive-writes (local Drive-write unavailable) + DB catalog writes (backup/quiesce) + per-file namesake confirms → a focused supervised session.

12.3 Session 4 — Track K (school funding agreements) DONE (2026-06-13)

Built N11 care_benefit_extract (engine seam Gemini-default + school-scoped matcher + idempotent writes; selftest + 6 pytest tests; 687 suite green; commit on main) and filed all 6 schools' "Thỏa thuận tài trợ" (BB tài trợ) live.

Source (administrator-confirmed): the 6 BB tài trợ <school> 25-26 PDFs in the source "Tài trợ 2025-2026" folder (NOT the rosters; ids in source_file). Each is a school-level agreement: a school total + HK1/HK2 installments + a phụ lục of funded students, NO per-student amount.

Model (administrator-confirmed): money lives once on a **school-scope register_entry kind='funding'** (total + installments + funder + source_doc); each covered student gets a **coverage benefit** (category=scholarship, petal=1, amount_vnd=NULL — no fabricated split, claims.md; amount_inkind=coverage text; receipt_path=the BB doc). BB doc cataloged once as source_doc kind=consolidated doc_type=bien-ban-tai-tro.

Extraction: Gemini 3.1-pro on all 6 PDFs (~$0.18 total). Integrity check: every school's HK1+HK2 == total. 6 schools, 1.325.000.000đ, 181 funded.
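The integrity check above can be sketched as follows. The six totals are the 2025-26 amounts recorded later in this plan (210/210/200/215/240/250M); the `Agreement` record, the `school-1`… labels and the even HK1/HK2 split used in the demo are assumptions for illustration, since the per-school installment figures are not reproduced in this section.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Agreement:
    school: str
    total_vnd: int
    hk1_vnd: Optional[int]  # None when the document states no installments
    hk2_vnd: Optional[int]


def installment_status(a: Agreement) -> str:
    """'ok' when HK1 + HK2 equals the total, 'mismatch' otherwise."""
    if a.hk1_vnd is None or a.hk2_vnd is None:
        return "no-installments"  # nothing to reconcile; report rather than guess
    return "ok" if a.hk1_vnd + a.hk2_vnd == a.total_vnd else "mismatch"


def demo_agreements() -> List[Agreement]:
    totals_m = [210, 210, 200, 215, 240, 250]  # million VND, per the plan
    return [
        # The 50/50 split is a placeholder, not the documents' real installments.
        Agreement(f"school-{i + 1}", t * 1_000_000, t * 500_000, t * 500_000)
        for i, t in enumerate(totals_m)
    ]


def grand_total(agreements: List[Agreement]) -> int:
    return sum(a.total_vnd for a in agreements)
```

A `no-installments` result corresponds to agreements that state no per-semester amounts; it is reported separately instead of being counted as a pass or a failure.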

Matching: 174/181 auto-matched; 6 namesakes (token-subset false-positives, each with an exact-name candidate) + 1 no-match resolved per coordinator confirm (the 6 → journey-0027/0028, 2012/2032, 0023/1023, matching the N12 essay mappings; Mai Tiến Vũ (LTV) left unmatched — funded but no care profile).

Live write (local-apply→upload under quiesce, NO deploy): backup care/_backups/20260613-pre-trackK-benefits/ → hub min=0 + 6 schedulers paused → re-pull → apply locally → upload DB + 180 ledger md → settle + verify before restore → hub min=1 + resume. Counts: benefit 162→342 (+180), register_entry kind=funding 6→12 (+6), source_doc 228→234 (+6); child 211 / profile_entry 542 / school_report 293 / kind=student 138 unchanged. FK-clean, post-restore re-pull confirmed no clobber.

Surfacing pending: handbook §4 + report/area4 funding roll-up show this after the deferred handbook-refresh (after K/D) + the audit-A3 redeploy bundle.

Mai Tiến Vũ — cross-school transfer, NOT a missing student — ✅ RESOLVED (live). The one LTV funding line left unmatched was an existing beneficiary: journey-2002 / HS20240022, entered cohort 2024 at THCS Vĩnh Hào (home commune Vĩnh Hào; mồ côi cha), who moved up to THPT Lương Thế Vinh, lớp 10A6 in 2025-26. Root cause (verified): he is the only genuine LTV-roster name absent from the LTV DB (whole-DB diacritic-insensitive scan = exactly 1 record; survey DB = 1 candidate ung-vien-2002 → admitted to journey-2002 → no duplicate, no orphan LTV profile). Our loaders are school-scoped + per-cohort with no promotion/transfer step: when LTV 2025-26 data loaded, "Mai Tiến Vũ" matched nothing in the LTV pool and — because he already existed under Vĩnh Hào — no cross-school link was attempted and no LTV record was created. His 2024-25 Vĩnh Hào học bạ/notes are correct history; only his current school field was stale. Fix applied live (backup _backups/20260613-pre-mtv-move/ → quiesce → verify → restore): child.school_slug Vĩnh Hào→Lương Thế Vinh, class_label 9A→10A6 (cohort_year 2024 + student_code frozen), a coordinator transition profile_entry note, and the LTV scholarship coverage benefit attached (LTV 29→30; benefit 342→343, profile_entry 542→543). Lessons: (1) a school-scoped "no-match" may be a transfer, not an absence — check the whole-DB name first; (2) systemic gap to fix in standing intake / consistency-check (N8): detect "a new-roster name matching an existing child in another school at an adjacent grade-level" → flag as a probable promotion for coordinator confirm, so cross-school progressions are caught automatically; (3) school_slug is a single current-school field — an all-years enrolment history (per-year school) would model transfers without losing the prior-school record.

Track D prep status (read-only done; load pending go-ahead)

The 4 Hoa Sữa NBTL roster files are located (K1–K4 ids in the run log) and the care_roster parser (split-name + header-offset + --name-col/--given-col/--header-row/--sheet overrides) is shipped. Per-Khóa convention confirmed: codes HVK<k><NNN>, id-block 8<k>00–8<k>99, programme nang-buoc-tuong-lai, province Lào Cai/Sa Pa. K3 is a clean roster (care_roster direct); K1/K2/K4 carry an institutional letterhead (header offset) and K2 splits the name across columns (needs the overrides). Next: --mode dump each (needs --db), set per-file overrides, --mode propose → review NEW vs EXISTS → --mode apply under backup→quiesce→verify. ~81 trainees, new code block 8xxx, all under Nâng bước tương lai.

12.8 Session 8 — COPY-ORIGINALS DIRECTIVE COMPLETE (Step 1) (2026-06-13)

The standing "Foundation must own its readable copies" directive (care-data rule 1) is done. All Step-1 sub-items executed live + verified.

Rebuilt the care-jobs image (Cloud Build) from current committed code so the Cloud Run Drive-write jobs carry the latest tooling, and re-pointed care-migrate-run to it. Added a --doc-type override to file-student-docs (commit eec8ee4, +test) so the SE batch catalogs as se.

1a SE originals (60). The source has SE folders for only 2 schools (not ~180): Tam Thanh (30) + Đại An (30). Filed live via file-student-docs --doc-type se under backup→quiesce→verify→restore (_backups/20260613-pre-se-copy/): source_doc kind=student 138→198, copy-source written 137→197, doc_type=se 2→62; child 293 unchanged; no clobber. 1 namesake ("Nguyễn An Minh Châu 6B"→journey-0028) forced + coordinator-confirmed.

1b older-year essays = N/A. CETB 23-24 / 24-25 have **no Bài viết folders** (loose consolidated sheets only). Lone candidate LTV 24-25 "PHIẾU HỌC TẬP" (27, given-name-only filenames) deferred as unreliable-match.

1c held essays (9). The Track-B no-match/typo holds resolved + forced-filed (bai-viet), each coordinator-confirmed (pattern: filename drops middle "Thị" or a 1-char tone typo; class = current-year vs frozen-entry). kind=student 198→207, copy-source written 197→206; held 19→10 (_backups/20260613-pre-essays/). The remaining 10 held are all consolidated docs (DAY_DU / Tong_hop / Chấm điểm / Thưởng / KQ), zero per-student.

1d Track E relocate. relocate-school-first --apply (Drive-move-only, no DB write) moved 80 consolidated docs <năm>/<trường>/ → Truong/<school>/<năm-học>/; verify re-run = 0 moves left + 6 correctly-unresolved (Hoa sữa ×4 + _ placeholder ×2).

1e checks PASS. N8 cross-school-dup = 0; A5 dangling source_ref = 0.

Pre-existing (not introduced here): 11 id-outside-allocation flags = the Tam Thanh-2023→LTV transfer cohort (journey-0013–0023 keep their origin-school id); future consistency-check refinement should exempt transferred children.

Live state after Step 1: child 293, benefit 495, register kind=funding 19, source_doc 310 (kind=student 207), copy-source written 206. Discipline held throughout: backup → quiesce (hub+dashboard min=0 + 6 schedulers paused) → apply → settle → verify → restore (never in the same breath); per-file namesake confirms; job left inert (HMT_MIGRATE_APPLY=0).

12.9 Session 8 cont. — Step 2 bundled redeploy: A1–A5 LIVE (2026-06-13)

Funding backfill DONE. register_entry already had amount_vnd/petal/funding_source (ensure_register_columns = no-op). Re-ran care_benefit_extract apply --apply over the 6 25-26 plan JSONs (local-apply→upload under backup→quiesce→verify, _backups/20260613-pre-funding-backfill/): the 6 tài trợ 2025-26 rows got amount_vnd (210/210/200/215/240/250M); funding non-null 7→13; benefit 495 / source_doc 310 / kind=student 207 all unchanged. Still NULL (separate follow-up): the 6 Tổng giải thưởng tổng kết prize register rows (their amount is in detail; need their own backfill — out of the 6-funding-JSON scope).

Deployed A1–A5 LIVE. management-hub rev 00075-pt6 (`deploy.ps1 -Only management-hub`) + care-dashboard rev 00018-96n (image build + run deploy --image). Both healthy (hub /login 200, / 303, no 500s; the schema columns pre-existed so no "no such column"). The hub deploy also shipped the parallel session's committed UI (`693180b`: survey restructure + interactive finance charts).

journey-2002 render verified (local TestClient, auth-off): benefits table shows Giá trị / Nguồn / Biên nhận with the BB receipt drive link + coverage-row text → A2 receipt live. (A1 journal detail+provenance is unit-tested + deployed.)

Outstanding in Step 2: journey-0028 Drive folder/Doc rename (DB name already correct; Drive cosmetic — needs a small Cloud Run folder-rename care_migrate action, none exists yet; must precede handbook-refresh or a duplicate folder is created) + the ~27 Đại An orphan-PDF cleanup (delete Tai-lieu-goc files not in any source_doc.drive_url — needs a Cloud Run delete step).

12.10 Session 8 cont. — Step 2d Drive cleanup + Step 3 regen (2026-06-13)

Step 2d DONE. Built 3 new Cloud-Run Drive actions in care_migrate (rename-student-folder, prune-orphan-docs, trash-file + rename_file/trash_file on the Drive seam; commits 7282dbf, d61deb7; +tests). Findings:

journey-0028 was a duplicate-folder split, not a stale rename — this session's forced SE/essay filings created the correctly-named folder while the old-name folder (with the handbook) lingered. Merged per administrator ("only keep Nguyễn An Minh Châu"): moved the OCR essay Doc into the correct folder (relocate-one), trashed the old-name folder + its regenerable handbook (trash-file). Now a single folder; no data lost.

Đại An orphans: 27 trashed (prune-orphan-docs, uncataloged duplicate PDFs from a pre-idempotent re-run; recoverable trash; re-verify = 0 left).

Step 3 DONE. handbook-refresh 293/293 (0 failures) — but only after a hard lesson: the first run aborted at 88/293 because a concurrent external writer replaced the DB object mid-batch (2 generations in 2 min; care-dashboard was NOT quiesced that window + a parallel session was active). DB stayed intact (handbook-refresh is Drive-write only). Re-ran under a full quiesce (hub + dashboard + 6 schedulers) + administrator-confirmed the other session idle + DB-generation-stable precheck → clean 293, DB object generation unchanged throughout (proof the care jobs don't touch the DB when nothing else writes). index-rebuild wrote 14 Chi-muc/*.md indexes (A4 v2 fields); consistency-check ran (N8/A5 clean per §1e; remaining drift is pre-existing: Hoa Sữa malformed names + 11 transfer id-allocation flags). Backups _backups/20260613-pre-handbook-refresh/ + …-pre-handbook-rerun/. OPERATIONAL LESSON: a care job that only opens the DB (even read-only) is unsafe while ANY other writer (hub, dashboard, scheduler, OR a parallel session) can touch the gcsfuse-mounted DB — quiesce ALL of them + confirm no parallel session + verify DB-generation stability before a long batch.
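The "DB-generation-stable precheck" could look roughly like the sketch below. It is deliberately storage-agnostic: `get_generation` is a caller-supplied function (hypothetical) that returns the current generation number of the DB object, and the helper names, the number of reads and the interval are assumptions, not the values used in the actual run.

```python
import time
from typing import Callable


def wait_for_stable_generation(
    get_generation: Callable[[], int],
    checks: int = 3,
    interval_s: float = 20.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return the generation if it stays unchanged across `checks` reads.

    Raises RuntimeError as soon as two reads differ, which means some other
    writer is still replacing the object and the batch must not start.
    """
    first = get_generation()
    for _ in range(checks - 1):
        sleep(interval_s)
        current = get_generation()
        if current != first:
            raise RuntimeError(
                f"DB generation moved {first} -> {current}: another writer is active"
            )
    return first


def assert_generation_unchanged(before: int, get_generation: Callable[[], int]) -> None:
    """Post-batch check: the object must still be the one the batch started on."""
    after = get_generation()
    if after != before:
        raise RuntimeError(f"DB generation changed during the batch: {before} -> {after}")
```

Run the first helper after quiescing and before the batch, keep its return value, and pass it to the second helper once the batch ends. A stable precheck only shows that nothing wrote during the sampling window, so it complements the quiesce rather than replacing it.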

Still outstanding: Track D Hoa Sữa data-quality (journey-8415+ malformed real_name = DOB concatenated, missing v2 frontmatter — re-parse needed); 6 prize register rows still amount_vnd NULL (separate backfill); Step 5 Track G re-key (own session); Step 6 tidy (§12 resequence + review-hub republish).

12.11 Session 9 — Step 4: care-mail email intake lane + handbook job (2026-06-13)

Step 4 DONE (code + live, conservative scheduling). The email intake lane (§7 channel 3) + a dedicated standing handbook-refresh job.

tools/care_mail.py (new, committed). Mirrors bank_notification_extract: one --once pass reads a care-documents Gmail label read-only (admin@-minted OAuth token, the SAME secret finance-mail uses), saves each .eml + .yaml evidence to the Shared Drive human band 2_Hoc-sinh/Tai-lieu-vao-email/<year>/<year-month>/, and drops attachments into the care intake inbox Tai-lieu-vao/<YYYY-Www>/_email/<stem>/ (on the SAME care bucket the care-intake-convert sweep reads). Attachments land with no journey-id folder → the sweep flags them in _needs-review.md (never silently routed). Mailbox + Drive seams + Fakes, dedup ledger, offline selftest, 13 tests. _safe_name rewrites a leading _/. so the sweep can't skip a doc. INERT-safe: with no Gmail creds --once no-ops + exits 0 (quiet cron until provisioned). + cloud/care_mail/ Dockerfile + requirements (own image, Gmail-OAuth deps).

entrypoint.py handbook-refresh: HMT_HANDBOOK_INCREMENTAL=1 computes a ~26h --since cutoff (automation TZ, so the string compare orders right), so a standing run is short and the gcsfuse DB-clobber window stays tiny. A FULL 293-doc refresh stays on-demand under quiesce (the concurrency lesson).
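As a sketch of the incremental cutoff described above: the helper computes a timestamp about 26 hours back, rendered in a single fixed timezone so that plain string comparison orders the same way as time. The UTC+7 offset, the ISO format with offset, and the function names are assumptions; the plan only states that the cutoff is in the automation timezone and compared as a string.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the automation timezone is a fixed UTC+7 offset.
AUTOMATION_TZ = timezone(timedelta(hours=7))


def since_cutoff(now: datetime, hours: int = 26) -> str:
    """ISO-8601 cutoff `hours` before `now`, always rendered in AUTOMATION_TZ."""
    local = now.astimezone(AUTOMATION_TZ)
    return (local - timedelta(hours=hours)).isoformat(timespec="seconds")


def changed_since(last_updated: str, cutoff: str) -> bool:
    """Lexicographic compare; only valid when both strings share format and offset."""
    return last_updated >= cutoff
```

A 26-hour window on a daily schedule leaves a two-hour overlap, so a run that starts late still covers the previous day's changes. If stored timestamps used a different offset or format, the string comparison would no longer be safe and both sides would need parsing first.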

cloud/care/deploy.ps1: builds the care-mail image; adds `care-handbook-refresh` to the shared-image jobs (drive id + incremental + 3600s); deploys the `care-mail` Job (own image, Gmail secrets conditional → deploys inert if absent); defines both schedulers (handbook-refresh daily 03:00, care-mail hourly). Quiesce note bumped to reflect the new DB-opening job.

**LIVE** (targeted gcloud, administrator go-ahead; DB-safe — no care-migrate execute, nothing opened the care DB): rebuilt care-jobs (carries the entrypoint change) + built care-mail; deployed care-handbook-refresh + care-mail Jobs. No schedulers registered this run (administrator chose the conservative posture — handbook-refresh stays on-demand; care-mail stays unscheduled until its mailbox exists; both schedules remain defined in deploy.ps1 for the eventual full-provisioning run). **care-mail smoke-test:** fetched=0 … inbox=None, exit 0 — the lane is wired + inert.

Remaining to ENABLE the email lane (one-time admin): create a Care-Docs Gmail label + forwarding filter on the admin@ mailbox; then register care-mail-trigger (hourly). care-intake-convert schedulers already live.

12.12 Session 9 — Step 6 data fixes: prize amounts + Hoa Sữa names (2026-06-14)

Step 6 (a)+(b) DONE + verified live (one backup→quiesce→apply→verify→restore window; the local-apply→upload pattern, no deploy).

Prize amount_vnd backfill (administrator model: pool totals + individuals). The live DB held 144 individual kind=prize rows (each with a parseable thưởng Nđ) + 6 kind=funding "Tổng giải thưởng tổng kết 25-26" pool rows (one per school), all amount_vnd NULL. Backfilled all 150 from detail: individuals each their own amount; pools their doc total. register funding rows w/ amount 13→19; 144 prize rows now carry amounts. Integrity check: 5/6 schools' individual sum == pool total exactly; thcs-tam-thanh is +300,000đ / 1 giải short (pool 5,500,000đ / 24 giải vs 23 individual rows summing 5,200,000đ) — a faithful extraction gap (one 300k prize never captured as an individual row), NOT fabricated → coordinator follow-up (find the 24th Tam Thanh prize). Total prize pool recorded = 33,000,000đ.
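The backfill-then-reconcile step can be illustrated with the sketch below. The exact wording of the `detail` field is not shown in this plan, so the `thưởng <amount>đ` pattern with dot or comma thousands separators is an assumption, as are the helper names. The Tam Thanh figures in the usage note are the ones stated above.

```python
import re
import unicodedata
from typing import Optional

# Assumed detail format: "... thưởng 300.000đ ..." (separators optional).
_PRIZE = re.compile(
    unicodedata.normalize("NFC", r"thưởng\s+([\d.,]+)\s*đ"), re.IGNORECASE
)


def parse_prize_amount(detail: str) -> Optional[int]:
    """Extract the prize amount in VND from a detail string, or None."""
    match = _PRIZE.search(unicodedata.normalize("NFC", detail))
    if match is None:
        return None
    digits = re.sub(r"[.,]", "", match.group(1))
    return int(digits) if digits else None


def pool_gap(pool_total_vnd: int, individual_sum_vnd: int) -> int:
    """Positive result = money in the pool total not covered by individual rows."""
    return pool_total_vnd - individual_sum_vnd
```

For thcs-tam-thanh, `pool_gap(5_500_000, 5_200_000)` gives 300,000đ, the single uncaptured prize flagged for coordinator follow-up. Returning `None` for an unparseable detail keeps the row NULL instead of inventing an amount.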

Hoa Sữa K4 malformed names (10) fixed. HVK4010–HVK4019 (journey-8409–8418) had real_name = given-name fragment + DOB concatenated (e.g. 'Phinh 2009-01-01 00:00:00') with date_of_birth NULL. Root cause: the K4 file's 3 roster sheets have different header rows + a column offset, so the original parser kept only the given-name half and leaked the DOB. Fix: re-read the K4 roster via the SA (Drive read), reconstructed each full name (đệm col + given col) + DOB, cross-checked given+DOB against the DB fragment (10/10 unique matches), and updated real_name + date_of_birth + last_updated. e.g. Hùng 2008-12-29 → Vàng A Hùng (2008-12-29). 0 malformed HVK4 left.
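The repair logic described above (split the leaked DOB off the fragment, then accept a roster row only on a unique given-name + DOB match) can be sketched like this. The helper names and the roster record shape (`dem`, `given`, `dob` keys) are hypothetical; the two malformed strings are the examples quoted in this plan.

```python
import re
from typing import Dict, List, Optional, Tuple

# "<given fragment> YYYY-MM-DD" with an optional trailing " HH:MM:SS".
_MALFORMED = re.compile(
    r"^(?P<given>.+?)\s+(?P<dob>\d{4}-\d{2}-\d{2})(?:\s+\d{2}:\d{2}:\d{2})?$"
)


def split_malformed(real_name: str) -> Optional[Tuple[str, str]]:
    """Return (given fragment, ISO DOB) for a malformed name, else None."""
    match = _MALFORMED.match(real_name.strip())
    if match is None:
        return None
    return match.group("given"), match.group("dob")


def repair_name(real_name: str, roster: List[Dict[str, str]]) -> Optional[Tuple[str, str]]:
    """Rebuild (full name, DOB) only when exactly one roster row matches."""
    parts = split_malformed(real_name)
    if parts is None:
        return None
    given, dob = parts
    hits = [r for r in roster if r["given"] == given and r["dob"] == dob]
    if len(hits) != 1:
        return None  # ambiguous or missing: leave it for a human
    row = hits[0]
    return f"{row['dem']} {row['given']}", dob
```

Requiring exactly one hit is what makes the "10/10 unique matches" statement meaningful: a fragment that matched zero or several roster rows would stay untouched rather than be guessed.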

handbook-refresh (scoped). Ran care-handbook-refresh with HMT_HANDBOOK_SINCE pinned just below the apply timestamp → regenerated exactly the 10 corrected handbooks (then removed the override to restore the incremental default). The job opened the gcsfuse DB read-only and flushed it back on exit (new generation, content-identical — re-download verified all changes held).

Counts after: child 293 (unchanged), register funding-w/-amount 19, prize-w/-amount 144, malformed HVK4 0, integrity ok. Backup care/_backups/20260614-pre-step6/. DB generation stable post-restore; hub /login 200.

Note (deferred): §12 sub-blocks are still in append order (12.3 sits after 12.7 chronologically-but-not-numerically); the physical renumber was deferred as cosmetic (a 68-line cross-referenced block move) — the dated headers + this note carry the reading order.

12.13 Session 9 — Track G (re-key) EXECUTION PLAN captured (2026-06-14)

Step 5 = the journey_id → student_code re-key (D19). Planning done this session; execution deferred to its own focused session (it is the single most destructive op + a lockstep code/schema/redeploy change). Two preconditions confirmed ready:

DB half is BUILT + copy-proven — tools/care_rekey.py re-keys child to a student_code PK, re-points the FK across the 6 dependent tables (benefit, profile_entry, school_report, watchlist_item, register_entry, source_file), drops journey_id, gated by PRAGMA foreign_key_check in one transaction, idempotent, --dry-run default. Selftest green.

D19 Stream-C-independence guardrail: CLEAR. Stream C ("Hành trình của em") uses its own journey-<NNNN> id space + its own media-domain profile store (journey_leakage_check takes profile_path from its caller; no read of the care DB / CARE_ROOT / so-dang-ky). The media registry.sqlite post.journey_id column is unused (all-NULL) and never ATTACH-joined to care. No care tool reads a media journey id and no media tool reads the care key. Dropping the care journey_id does not touch Stream C.
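For orientation, the DB half of the re-key described above can be sketched on a two-table toy schema. This is not `tools/care_rekey.py`: it follows SQLite's generic rebuild-and-rename pattern (new tables, copy, drop, rename, `PRAGMA foreign_key_check`, commit), covers one dependent table instead of six, and the function names and reduced schema are assumptions. The journey-2002 / HS20240022 pairing is the one recorded in this plan.

```python
import sqlite3


def demo_db() -> sqlite3.Connection:
    # isolation_level=None: autocommit mode, so BEGIN/COMMIT below are explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript(
        """
        CREATE TABLE child (
            journey_id TEXT PRIMARY KEY,
            student_code TEXT NOT NULL UNIQUE,
            real_name TEXT
        );
        CREATE TABLE benefit (
            id INTEGER PRIMARY KEY,
            journey_id TEXT NOT NULL REFERENCES child(journey_id),
            category TEXT
        );
        INSERT INTO child VALUES ('journey-2002', 'HS20240022', '(demo)');
        INSERT INTO benefit (journey_id, category) VALUES ('journey-2002', 'scholarship');
        """
    )
    return conn


def rekey(conn: sqlite3.Connection) -> bool:
    """Re-key child + benefit to student_code. Returns False if already done."""
    if "journey_id" not in [r[1] for r in conn.execute("PRAGMA table_info(child)")]:
        return False  # idempotent: the old key is already gone
    conn.execute("PRAGMA foreign_keys = OFF")  # has no effect inside a transaction
    conn.execute("BEGIN")
    try:
        conn.execute(
            "CREATE TABLE child_new (student_code TEXT PRIMARY KEY, real_name TEXT)"
        )
        conn.execute("INSERT INTO child_new SELECT student_code, real_name FROM child")
        conn.execute(
            "CREATE TABLE benefit_new (id INTEGER PRIMARY KEY, "
            "student_code TEXT NOT NULL REFERENCES child(student_code), category TEXT)"
        )
        # LEFT JOIN: an orphan benefit yields NULL and trips NOT NULL, not a silent drop.
        conn.execute(
            "INSERT INTO benefit_new SELECT b.id, c.student_code, b.category "
            "FROM benefit b LEFT JOIN child c ON c.journey_id = b.journey_id"
        )
        conn.execute("DROP TABLE benefit")
        conn.execute("DROP TABLE child")
        conn.execute("ALTER TABLE child_new RENAME TO child")
        conn.execute("ALTER TABLE benefit_new RENAME TO benefit")
        if conn.execute("PRAGMA foreign_key_check").fetchall():
            raise RuntimeError("foreign_key_check reported violations")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    finally:
        conn.execute("PRAGMA foreign_keys = ON")
    return True
```

Because everything between `BEGIN` and `COMMIT` is one transaction, a failed check leaves the original tables untouched, and a second call is a no-op.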

Execution sequence (for the dedicated session):

Pre-flight (read-only): care_rekey.check_mapping on a live copy — every live child (293) must have a unique non-empty student_code (minted at N5). Copy-test the whole read pipeline (reports/handbook/hub/index/watchlist) against a re-keyed backup, not just care_rekey in isolation.

**Code refactor** (the long pole — ~820 prod journey_id hits / 40 files + ~198 test hits): rename DB-column reads/writes journey_id→student_code and the bucket/Drive path stems Ho-so/journey-NNNN→<code>; **do NOT touch** Stream-C journey-<NNNN> strings (none live in care code). Biggest files: app/main.py (91), care_migrate (62), report_child (46), survey_intake (39), care_intake_convert (36), beneficiary_registry_migrate (34, the schema source — align its SCHEMA/migrations to care_rekey._REKEYED_SCHEMA), safeguarding_watchlist (33), school_doc_extract (31), so_tay (30). Update in clusters; pytest -q tests + selftests after each; fix the ~198 fixture hits.

Bucket recode (live, supervised): rename Ho-so/journey-NNNN.md→<code>.md, Tai-lieu-goc/journey-NNNN/→<code>/, So-phuc-loi/journey-NNNN/→<code>/, and rewrite child.profile_path to the code path. (The Drive human folders are already code-named Ho-so/<mã> — <Tên>/ from v2; only the bucket machine mirror still uses journey-NNNN.md.) The bucket-prefix rename 10_Ho-so-thu-huong→ho-so-thu-huong (D7) is a SEPARATE deferred track — not part of this re-key. care_rekey emits the recode plan but performs no Drive/bucket moves → needs a recode step (a new care_migrate action or a one-off script).

Atomic live window (most destructive op — heavy backup, sole copy): build the refactored images FIRST, then in ONE quiesced window: backup → care_rekey --apply on the live DB → bucket recode + profile_path rewrite → checkpoint + upload → FK-check + count verify → deploy the refactored hub + care-dashboard + care-jobs together (old code on the re-keyed DB = no such column journey_id 500s) → exercise (hub child page, report_child/report_school/report_cohort, a handbook, index-rebuild, a watchlist run) → restore. Per-deploy go-ahead.

Recommendation: run as a dedicated session; the refactor is large and must land lockstep with the DB re-key + redeploy. Planning above is the entry point.

12.14 Session 10 — data-consumer verification + report-QA + 2024-cohort recon (2026-06-14)

A live QA pass: confirmed the migration is consumable, then fixed the issues that surfaced. All LIVE (backup→quiesce→apply→verify→restore; care-jobs + care-dashboard rebuilt + redeployed; registers + 71 handbooks regenerated).

Data-consumer verification (the plan's closing sweep, sans Track G). DB census (integrity ok, 0 FK violations, 293→310 children all student_code+profile_path, transcripts 293, benefits 495, funding 19 rolling up to 2.59 tỷ, 0 dangling provenance) + exercised report_child (transcript/benefits/receipt/funding/provenance/transfer notes), report_school (headcount/funding/prizes), and index-rebuild (all children keyed by student_code). Verdict: migration complete + consumable; honest gaps noted (DOB 183/293, 42 roster-only identity-only, pre-v2 source_ref sparse). File-existence half of consistency-check is the standing job's domain.

SE "không khả dụng" bug. 10 Tam Thanh SE assessments rendered as "not available" on the student report: the 3-perspective content was in school_report.payload_json but the petal-3 note detail was empty (a one-off loader bypassed care_reocr.write_se_detail, which fills it for the standing path — that is why Đại An's 30 were fine). Backfilled the 10 detail from the payload + hardened with check_se_notes_have_detail (source-agnostic consistency-check invariant; +test).

School funding total absent from registers. report_common.fetch_register_timeline never selected amount_vnd, so the school/cohort/handbook funding rollup (guarded by "amount_vnd" in r.keys()) silently rendered nothing. Fixed the query (defensive column check); +regression test.
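The failure mode above is easy to reproduce in miniature: when the SELECT omits `amount_vnd`, a roll-up guarded by `"amount_vnd" in r.keys()` skips every row without raising. The sketch below shows the silent case and one possible shape of a defensive query; the reduced `register_entry` schema, the `title` column and the function names are assumptions, not the real `report_common` code.

```python
import sqlite3
from typing import Iterable, List


def demo_conn() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows expose .keys() and name-based access
    conn.execute(
        "CREATE TABLE register_entry (title TEXT, kind TEXT, amount_vnd INTEGER)"
    )
    conn.execute(
        "INSERT INTO register_entry VALUES ('demo agreement', 'funding', 210000000)"
    )
    return conn


def funding_rollup(rows: Iterable[sqlite3.Row]) -> int:
    """Sum amounts, tolerating rows that do not carry the column at all."""
    total = 0
    for r in rows:
        if "amount_vnd" in r.keys() and r["amount_vnd"] is not None:
            total += r["amount_vnd"]
    return total


def fetch_timeline_buggy(conn: sqlite3.Connection) -> List[sqlite3.Row]:
    # The column is never selected, so the guard above is always False.
    return conn.execute(
        "SELECT title, kind FROM register_entry WHERE kind = 'funding'"
    ).fetchall()


def fetch_timeline_fixed(conn: sqlite3.Connection) -> List[sqlite3.Row]:
    # Defensive: select the column when it exists, otherwise a NULL placeholder,
    # so an un-migrated DB still answers instead of failing with "no such column".
    cols = {r[1] for r in conn.execute("PRAGMA table_info(register_entry)")}
    amount = "amount_vnd" if "amount_vnd" in cols else "NULL AS amount_vnd"
    return conn.execute(
        f"SELECT title, kind, {amount} FROM register_entry WHERE kind = 'funding'"
    ).fetchall()
```

The guard in `funding_rollup` is what turned a missing column into a silent zero, which is why a regression test on the rendered total is a useful complement to the query fix.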

Two-part funding presentation (Editor). School/cohort reports + so_tay registers now show "Tài trợ cấp trường/lứa (Thỏa thuận tài trợ)" (school grant, per năm học) and "Khen thưởng & hỗ trợ học sinh" (year-end prizes) as two separate parts; new is_school_grant_funding/school_grant_by_year classifier EXCLUDES the prize-pool funding rows from the grant total (no double-count vs the per-student prizes). The per-year agreement table now reconciles with the timeline.

Child report drops the school total. A scholarship coverage benefit (school-level total, no per-student split) no longer prints the school's funding on a student report — the per-student "Giá trị" is blank ("—"), filled when an individual amount exists. Also fixed the funding heading keying off with_school (the roster-column flag) → now scope_type, so a school register reads "cấp trường" not "cấp lứa".

2024-cohort reconciliation. The cohort report showed 3 of 5 schools — Đại An + Nguyễn Bính students were all tagged cohort 2025, though both schools appear in lnquang CETB 2024-2025. Cross-checked both 2024-25 rosters vs the DB (the care_roster matcher): re-cohorted 34 continuing students (Đại An 18 + Nguyễn Bính 16) 2025→2024 (codes left frozen per D19; a provenance note on each), loaded 17 truly-absent 2024-25 students (Đại An 6 + Nguyễn Bính 11) as new cohort-2024 / status=withdrawn records via care_roster.apply_roster, and held 1 namesake (Nguyễn Gia Tuấn Anh — Đại An vs Tam Thanh journey-0045) for the coordinator. Result: the 2024 cohort now spans 5 schools / 94 em (Đại An 24, Tam Thanh 6, Vĩnh Hào 23, LTV 14, Nguyễn Bính 27); child 293→310, 0 dup codes/ids.

Commits (main): [system] SE harden, school-funding-total fix, two-part funding + child-report. Scratch: Sandbox/_scratch/_step6/. Backups care/_backups/20260614-pre-{step6,se-backfill,build-registers,cohort2024}/.

12.15 Session 11 — Track G (Step 5): the journey_id→student_code re-key, CODE + LIVE CUTOVER (2026-06-14)

The last build/migration item in the plan. Executed as its own focused session.

Code (committed a709996, branch claude/care-trackG-rekey). D19: student_code (HSYYYYNNNN / HVK<k><NNN>) is the single id; journey_id retired. The ~820-hit refactor touched all 27 care tools + app/main.py + app/hub.py + 7 hub templates + every test file; pytest 775 passed, ruff check + format clean. Two identity-minting lifecycles rewritten (survey admission, NBTL roster — the per-school journey-NNNN block allocator → year-monotonic student_code); consistency-check per-school id-block invariant retired → shape-only check_student_code_shape. index_registry.post.journey_id / decisions_core (Stream-C media id) left untouched per the D19 guardrail (one wrong rename caught + reverted by the test suite).
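A shape-only check for the two code families named above might look like the sketch below. The regular expression is inferred from the stated patterns `HSYYYYNNNN` and `HVK<k><NNN>` and from the codes that appear in this plan (HS20240022, HVK4010); the function name is hypothetical and the real `check_student_code_shape` may accept or reject more than this.

```python
import re

# HS + 4-digit year + 4-digit sequence, or HVK + 1-digit Khóa + 3-digit sequence.
_STUDENT_CODE = re.compile(r"(?:HS\d{4}\d{4}|HVK\d\d{3})")


def has_student_code_shape(code: str) -> bool:
    """True when the whole string matches one of the two code shapes."""
    return _STUDENT_CODE.fullmatch(code) is not None
```

A shape check says nothing about which school or block a code belongs to, which is the point of retiring the per-school id-block invariant: transferred children keep their code without raising allocation flags.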

Bug fixed in the re-key tool itself: care_rekey._REKEYED_SCHEMA was dropping register_entry's A3 funding columns (amount_vnd/petal/funding_source) — row counts matched so it passed silently, but the live cutover would have wiped the 164-row / 2.62 B VND funding roll-up. Now carried + asserted. (Also fixed a care_roster 17-values-for-16-columns INSERT.)
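Row counts cannot catch a dropped column, so an assertion of the kind "now carried + asserted" has to compare schemas. The sketch below shows one way to do that with `PRAGMA table_info`; the helper names and the reduced `register_entry` definitions are assumptions, and the actual assertion in `care_rekey` is not shown in this plan.

```python
import sqlite3
from typing import AbstractSet, Set


def make_db(ddl: str) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    return conn


def table_columns(conn: sqlite3.Connection, table: str) -> Set[str]:
    # `table` is a trusted identifier here; PRAGMA arguments cannot be bound.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def assert_columns_carried(
    old: sqlite3.Connection,
    new: sqlite3.Connection,
    table: str,
    retired: AbstractSet[str] = frozenset(),
) -> None:
    """Fail if the rebuilt table lost any column that was not retired on purpose."""
    dropped = table_columns(old, table) - table_columns(new, table) - set(retired)
    if dropped:
        raise AssertionError(f"{table}: rebuilt schema drops {sorted(dropped)}")


OLD_DDL = (
    "CREATE TABLE register_entry (id INTEGER PRIMARY KEY, journey_id TEXT, "
    "kind TEXT, amount_vnd INTEGER, petal INTEGER, funding_source TEXT)"
)
# The buggy rebuild: same rows would copy over, but three columns vanish.
BAD_DDL = "CREATE TABLE register_entry (id INTEGER PRIMARY KEY, student_code TEXT, kind TEXT)"
GOOD_DDL = (
    "CREATE TABLE register_entry (id INTEGER PRIMARY KEY, student_code TEXT, "
    "kind TEXT, amount_vnd INTEGER, petal INTEGER, funding_source TEXT)"
)
```

Passing `retired={"journey_id"}` lets the one intended removal through while any other missing column fails the check. Comparing `SUM(amount_vnd)` before and after the rebuild would be a natural second assertion alongside this one.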

Cutover tooling: care_rekey gained --recode-bucket + --migrate-survey + build_jmap, with selftests; cloud/care/jobs/entrypoint.py passes the renamed --student-code flag.

LIVE cutover (supervised): backup care + survey DB → re-pull (caught a one-off 13:27 write) → re-key live DB (--apply) → migrate survey (jmap from the pre-rekey backup) → verify (FK-clean, 311 children, journey_id dropped, funding 164/2.62B, 129/129 candidates) → upload (DB quiescent, no stale -wal) → lockstep redeploy: hub 00119-k97, care-dashboard 00033-pcl (new code), care-jobs new image (9 care-DB jobs re-pinned; consistency-check ran clean = new code reads the re-keyed DB) → schedulers resumed → no clobber confirmed post-swap. Backups gs://[care bucket]/care/_backups/20260614-trackG/.

Gotchas (memory project_trackG-rekey-in-progress): bundles bucket is [bundles bucket] (not hmt-bundles); new code crashes on the OLD DB at boot (HMT_HUB_AUTOSTART reads the care DB) → upload the re-keyed DB BEFORE any new-code instance starts; hub needs min-instances=1 (cold-start too slow for the probe at 0); hub + care-dashboard are manual-traffic → update-traffic is required, deploy alone does not route; PowerShell shell state does NOT persist across calls → load .env in the same command as the deploy.

DEFERRED → Phase-2 bucket-FILE recode (see the NEXT-SESSION block): the bucket per-student folders are still journey-NNNN; recode_bucket must be enhanced to rewrite benefit.ledger_path (it only does profile_path today) before running. A PRE-EXISTING missing-ledger-file drift on the ben-2026-uni-* benefits (uni-funding ledger .md never written by the lnquang load) is unrelated to the re-key.

Internal — review surface. First build + live run executed 2026-06-13 (§12). Sources: 2026-06-12 stocktake (memory project_three-place-stocktake-2026-06-12) · run log _system/runs/care-storage-v2-overnight-2026-06-13.md · rules care-data.md · taxonomy care-data-migration-plan.md Part 2b · schema so-dang-ky.sqlite · handbook so_tay.py. Quỹ Hoa Mặt Trời.