Internal plan. Design document for review, not a distribution copy. Draft, subject to change. Some infrastructure identifiers have been redacted.


Student data sync & backfill: one-time catch-up plan

Storage model v2 (the rules) + backfill 211→293 children + standing intake. Live build/run status is in §12. Updated 2026-06-14.

Status: decisions administrator-approved 2026-06-13 (D1–D19; Hoa Sữa/D12 in scope; student_code is the single id, journey_id retired, D19 override). Build + first live run done 2026-06-13 overnight; see §12. 9/11 build items shipped, Track C OCR live (130 docs), 211 v2 handbooks regenerated, N6 re-key proven on real data; Track G + B/K/D/E staged supervised.

⭐⭐ 2026-06-14: TRACK G (Step 5) CODE + LIVE CUTOVER DONE (see §12.15). The journey_id → student_code re-key shipped end-to-end: committed a709996 (branch claude/care-trackG-rekey), full refactor of all 27 care tools + app/main.py/app/hub.py + 7 hub templates + every test file (pytest 775 green, ruff clean). The supervised live cutover is complete: the live care DB is re-keyed (journey_id dropped, 311 children, FK-clean, funding 164/2.62B intact, verified no clobber), the survey DB is migrated (129 candidates), and the hub (00119) + care-dashboard (00033) + care-jobs are redeployed on the new code with schedulers resumed. Only the deferred Phase-2 bucket-FILE recode remains (rename Ho-so/journey-NNNN.md → <code>.md folders + rewrite benefit.ledger_path; reads work today via the unchanged paths).

2026-06-13 session 2 (see §12.1): N12 shipped + applied live, with all 24 held OCR files resolved (15 essays filed school-scoped, 9 consolidated held by design), source_file written 130→161, 0 ambiguous left; Track D split-name parser shipped; the data-consumer audit (step 8) run, with detail-parity patched and the remaining punch-list recorded; journey-0028 name-order fix applied to the DB (Drive folder rename deferred).

2026-06-13 session 3 (see §12.2): Track B (CETB 2025-26 essays) DONE. All 6 schools' per-student originals filed, source_doc kind=student 1→138, counts unchanged (+2 fixes: op-aware skip, idempotent copy).

2026-06-13 session 4 (see §12.3): Track K (school funding agreements) DONE. Built N11 care_benefit_extract + filed all 6 schools' BB-tài-trợ live: benefit 162→342 (+180 coverage rows, amount_vnd=NULL), register kind=funding 6→12, source_doc +6; 1.325 billion VND; 1 cross-school transfer (Mai Tiến Vũ) resolved.

2026-06-13 sessions 5–7 (see §12.4–§12.7): Track K extended to ALL years (2023-24 + 2024-25: 7 more agreements, benefit→495, funding→19, 18 cross-school transfers resolved, 20 graduates reported); audit fixes A1–A5 coded (A3 = register amount_vnd so funding rolls up in reports/area4); Track D DONE (care_roster footer-guard + status-from-Ghi-chú + rich-field capture → 82 Hoa Sữa NBTL trainees loaded, child→293); N8 cross-school dup/transfer detector shipped; Track E relocate-school-first fixed (năm-học/acronym; dry-run 79 moves, apply gated).

2026-06-13 session 8 (see §12.8): COPY-ORIGINALS DIRECTIVE COMPLETE (Step 1). care-jobs image rebuilt (+ --doc-type override, commit eec8ee4); 60 SE originals filed (Tam Thanh+Đại An; only 2 schools have SE folders, not ~180); 9 held essays force-filed (Track-B typos, each confirmed); Track E 80 consolidated docs relocated → Truong/<school>/<năm-học>/; older-year per-student essays = N/A (no Bài-viết folders). kind=student 138→207; N8/A5 checks = 0.

2026-06-13 session 8 cont. (see §12.9–§12.10): Steps 1–3 COMPLETE + LIVE. (2) A1–A5 go-live: register amount_vnd already present; 6 tài trợ 25-26 funding rows backfilled (funding non-null 7→13); deployed hub 00075 + care-dashboard 00018 (also ships parallel session 693180b UI); journey-2002 receipt verified. (2d) Drive cleanup: built rename-student-folder/prune-orphan-docs/trash-file; journey-0028 merged to one correct folder; 27 Đại An orphan PDFs trashed. (3) Regen: handbook-refresh 293/293, index-rebuild (A4 v2), consistency-check (concurrency lesson: first run clobbered 88/293 by a parallel session + un-quiesced dashboard; clean re-run under full quiesce).

2026-06-13 session 9 (see §12.11): Step 4 DONE. Built tools/care_mail.py (email intake lane, §7 channel 3, mirroring finance-mail; inert-safe; 13 tests) + cloud/care_mail/ image; entrypoint handbook-refresh gained an incremental --since affordance; deployed care-handbook-refresh + care-mail Jobs live (DB-safe targeted deploy, smoke-tested inert). Conservative scheduling: no schedulers registered (handbook-refresh on-demand under quiesce; care-mail until its mailbox exists); both schedules stay defined in deploy.ps1.

2026-06-14 session 9 cont. (see §12.12): Step 6 data fixes DONE. Prize amount_vnd backfilled (144 individual + 6 pool rows; funding-w/-amount 13→19; 33M pool; Tam Thanh +300k/1-giải gap flagged) + 10 Hoa Sữa K4 malformed names fixed (full name + DOB recovered from the K4 roster; 10 handbooks regenerated); child 293 unchanged, integrity ok.

2026-06-14 session 9 cont. (see §12.13): Step 6c republish DONE (care plan page rendered via the new committed build_care_sync_plan_page.py, redaction-scanned clean, deployed to hmt-media-review-3dv.pages.dev) + Track G execution PLAN captured: DB half proven (care_rekey.py), D19 Stream-C-independence guardrail CLEAR, full re-key sequence written.

2026-06-14 session 10 (see §12.14): Data-consumer verification + report-QA fixes + 2024-cohort reconciliation, all LIVE. Confirmed the migration is complete + consumable (census + exercised report_child/school + index). Then fixed live:
(a) SE "not available": 10 Tam Thanh SE notes had empty detail (content in the payload); backfilled + a consistency-check invariant added.
(b) School funding total missing from registers (fetch_register_timeline dropped amount_vnd).
(c) Two-part funding: reports/registers now split "Tài trợ cấp trường (Thỏa thuận tài trợ)" from "Khen thưởng & hỗ trợ học sinh" (prize pool no longer double-counted); child report no longer shows the school's total funding (per-student blank until known); fixed the "cấp lứa→cấp trường" heading.
(d) 2024 cohort showed 3/5 schools: re-cohorted 34 (Đại An + Nguyễn Bính, 2025→2024) + loaded 17 truly-absent 2024-25 students (withdrawn), 1 namesake held; cohort now 5 schools / 94 children. child 293→310. care-jobs + care-dashboard redeployed; registers + 71 handbooks regenerated.


▶ NEXT SESSION — START HERE

Track G is CODE-COMPLETE + LIVE (2026-06-14); the ONLY remaining work is the deferred Phase-2 bucket-FILE recode. Step 5 shipped: the ~820-hit refactor (all 27 care tools + app/main.py/app/hub.py + 7 hub templates + every test file; pytest 775 green, ruff clean), a latent care_rekey data-loss bug fixed (the rebuild schema was dropping the A3 funding columns — 164 rows / 2.62 B VND), and the supervised live cutover is done: the live care DB is re-keyed (journey_id dropped, 311 children, FK-clean, funding intact, verified no clobber), the survey DB migrated (129 candidates), and hub 00119 + care-dashboard 00033 + care-jobs are redeployed on the new code with schedulers resumed. Committed a709996. (Cutover gotchas captured in memory project_trackG-rekey-in-progress: bundles bucket is [bundles bucket]; new code crashes on the old DB at boot — upload the re-keyed DB first; hub + care-dashboard are manual-traffic; PowerShell shell state doesn't persist — load .env in the deploy command.)

REMAINING: Phase-2 bucket-FILE recode (its own focused session). The DB keys on student_code, but the per-student bucket folders are still named journey-NNNN, and child.profile_path + benefit.ledger_path still point there (reads work because the files exist at the old paths). Before running it: enhance care_rekey.recode_bucket to ALSO rewrite benefit.ledger_path (it currently rewrites only profile_path), and consider whether source_doc.drive_url/bucket_path and the register_entry/source_file paths need the same rewrite. Then run under a quiesce (min-instances=0 + pause the 6 care schedulers + backup): a Cloud Run job is the cleanest place (the gcsfuse bucket is mounted there) since recode_bucket walks a filesystem root. Also clean up the pre-existing missing-ledger-file drift on the ben-2026-uni-* university benefits (their ledger .md files were never written by the lnquang load; this is unrelated to the re-key) and tidy the stale 0%-traffic hub revisions 00115/00116/00117/00118.
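The path rewrite that the enhanced recode step needs can be sketched as follows. This is a minimal illustration, not the real care_rekey.recode_bucket: the function name recode_path, the sample mapping and the sample ledger path are all hypothetical, and it assumes both child.profile_path and benefit.ledger_path embed a four-digit journey-NNNN token.

```python
import re

# A journey token is "journey-" plus exactly four digits (assumption based on
# the journey-NNNN naming in this plan); the lookahead avoids matching a
# longer number.
JOURNEY_TOKEN = re.compile(r"journey-\d{4}(?!\d)")


def recode_path(path: str, mapping: dict[str, str]) -> str:
    """Replace every journey-NNNN token in `path` with its student code.

    Tokens that are missing from the mapping are left untouched, so an
    incomplete mapping never produces a half-rewritten path.
    """
    return JOURNEY_TOKEN.sub(lambda m: mapping.get(m.group(0), m.group(0)), path)


# Hypothetical mapping and paths, for illustration only.
mapping = {"journey-0028": "HS2025-0028"}
profile_path = recode_path("Ho-so/journey-0028.md", mapping)
ledger_path = recode_path("Ho-so/journey-0028/ledger.md", mapping)
```

Applying the same function to both columns keeps profile_path and ledger_path in step, which is the gap the paragraph above describes.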

Then the final closing sweep: source-coverage + a clean consistency-check + a data-consumer sample (a child/school/cohort report, a handbook, the hub child page, the index, a watchlist run).

Parked, not blocking Track G. Admin: enable the email lane (create the Care-Docs Gmail label + filter, then register care-mail-trigger hourly). Cosmetic: §12 physical renumber (12.3 sits after 12.7; dated headers carry the reading order). (Coordinator-confirmed + fixed live 2026-06-14: (a) the held namesake "Nguyễn Gia Tuấn Anh" was a different person → loaded as Đại An journey-3036/cohort 2024; (b) Tam Thanh's missing 24th prize, Nguyễn An Minh Châu's Part-III academic 300k, was added → Tam Thanh now reconciles to 24 prizes / 5.5M; (c) the 18 loaded 2024-25 students were re-statused withdrawn → graduated-exited (exit_reason=graduated, graduation_date 2025-05-31): Đại An grade 9 → THPT outside the programme, Nguyễn Bính grade 12 → finished THPT; the 11 NB classes (12A1–12A7) backfilled.)


Supersedes the 2026-06-12 sync-backfill draft by adding the target storage model (the rules), not just the gap list. Scope: Area-1 care only. Sinh viên (university students) and the standalone Hỗ trợ tài chính folder both stay out of scope. The care benefit facts come from the benefits file inside each cohort folder in the lnquang source (point 12e) — those are in scope: the handbook must show them (Track K). Three layers in one plan:

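(1) The model: what is stored, what is the source of truth, how the handbook and the journal work, and where the rules are recorded. This is the new core.

(2) The one-time backfill: bring the existing data up to date so that it obeys the model.

(3) Standing intake: keep new documents flowing in so the data stays compliant without a manual catch-up.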

Build on what exists. Earlier migrations already shipped real tooling, jobs and data. §2 is an inventory of what is already built (reuse) vs new / changed, so nothing is rebuilt. Live counts (2026-06-12 re-run): 211 children, Ho-so/ 211 folders 1:1, source_doc 91, source_file ledger 43, profile_entry 542, school_report 293. Re-pull the live DB before any write (the year-end clobber lesson).

Single identity: student_code (administrator override, 2026-06-13).

HSYYYYNNNN (e.g. HS2025-0055; HVK<k><NNN> for NBTL) is the **one code used everywhere**: Drive folders (Ho-so/HS2025-0055 — <Tên>/), bucket files (Ho-so/HS2025-0055.md, Tai-lieu-goc/HS2025-0055/), the DB primary key, handbook, reports, and hub. journey_id is retired; the earlier keep-it-as-a-hidden-surrogate recommendation is overridden (administrator confirms no PII concern in scope).

Two guardrails that keep student_code a sound key. (1) Frozen at mint: a primary key must be immutable, so the YYYY digits record the original cohort year and never change even if a child's cohort is later corrected (the correction lives in the cohort/enrolment fields, not in the code); this keeps the key stable for the foreign key across the ~7 child-keyed tables. (2) Stream C ("Hành trình của em") keeps its own anonymised id space on the media side; confirm it does not depend on the care key before journey_id is dropped (Track G).
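A minimal sketch of the "frozen at mint" guardrail, under stated assumptions: the class and function names are hypothetical, only the HS form is handled (HVK codes are left out), and the hyphen is treated as optional because the pattern is written HSYYYYNNNN while the example is HS2025-0055.

```python
import re
from dataclasses import dataclass

# Assumption: the hyphen between year and sequence is optional.
_HS_CODE = re.compile(r"^HS(\d{4})-?(\d{4})$")


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after mint
class StudentCode:
    raw: str
    mint_year: int  # the original cohort year, never updated
    seq: int


def parse_student_code(raw: str) -> StudentCode:
    """Split an HS code into its mint year and sequence number."""
    m = _HS_CODE.match(raw)
    if m is None:
        raise ValueError(f"not an HS student code: {raw!r}")
    return StudentCode(raw=raw, mint_year=int(m.group(1)), seq=int(m.group(2)))


@dataclass
class Child:
    code: StudentCode  # immutable key
    cohort_year: int   # mutable: cohort corrections land here


child = Child(code=parse_student_code("HS2025-0055"), cohort_year=2025)
child.cohort_year = 2024  # cohort corrected later; the code is untouched
```

The point of the split is that a cohort correction changes cohort_year while the key that the child-keyed tables reference stays byte-for-byte the same.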


1. Target storage model (the rules, redesigned)

The model keeps the existing two spaces (human-readable Shared Drive 2_Hoc-sinh/ + machine care bucket) and the five rules of care-data.md, and sharpens four things the current rules under-specify: the student folder as source of truth, the candidate→student folder move, a school-first layout, and a record-update journal.

1.1 Student folder = the source of truth (point 4)

Every per-student document lives under one folder, and that folder is the authoritative record for the child. Target layout:

2_Hoc-sinh/Ho-so/<HSYYYYNNNN> — <Tên>/      # human space (Drive)
  So-tay · <HS…> · <Tên>                      # the handbook Doc (rule 3, §1.5)
  Tai-lieu-goc/                               # ALL originals for this child
    <original: PDF, ảnh, Word/Excel, …>        # ← source of truth (raw, never edited; ANY format, point 4)
    <original>__OCR  (Google Doc)              # readable OCR derivative (if transcribed)
  Khao-sat/                                    # the candidate's survey folder, moved here on admission (§1.2)
care/<bucket-prefix>/Ho-so/HS2025-0055.md      # machine: canonical profile (named by student code, point 1)
care/<bucket-prefix>/Tai-lieu-goc/HS2025-0055/ # machine: .txt mirror of each original

Originals are not PDF-only (point 4). Tai-lieu-goc/ holds the original in whatever format it arrived — PDF, scanned image (JPG/PNG/HEIC), Word/Excel, even a photo of a handwritten page. file-student-doc (N1) is format-agnostic; OCR/extraction runs on whatever can be transcribed, the raw file is always kept.

Source-of-truth precedence (define it once, everywhere): original file > OCR Doc > .txt mirror > extracted DB rows. Everything below the original is derived and regenerable; the original is read-only after filing. Each original is one source_doc row (kind=student), and every DB fact pulled from it carries source_ref = doc_id[#section] so any number traces back to the page it came from.
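The precedence rule and the source_ref format can be illustrated with a short sketch. The layer labels, function names and sample ids below are hypothetical; only the ordering and the doc_id[#section] shape come from the paragraph above.

```python
from typing import NamedTuple, Optional

# Highest authority first, in the order the precedence rule states.
PRECEDENCE = ("original", "ocr_doc", "txt_mirror", "db_row")


def most_authoritative(available: set[str]) -> str:
    """Return the highest-ranked layer among those that exist."""
    for layer in PRECEDENCE:
        if layer in available:
            return layer
    raise ValueError("no known layer available")


class SourceRef(NamedTuple):
    doc_id: str
    section: Optional[str]  # None when the ref has no "#section" part


def parse_source_ref(ref: str) -> SourceRef:
    """Parse 'doc_id[#section]' into its two parts."""
    doc_id, sep, section = ref.partition("#")
    if not doc_id:
        raise ValueError(f"source_ref has no doc_id: {ref!r}")
    return SourceRef(doc_id, section if sep else None)


winner = most_authoritative({"db_row", "ocr_doc"})
ref = parse_source_ref("doc-17#p3")
```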

This is the part even Đại An does not yet satisfy: it has OCR Docs but the original scan PDFs were never copied out of the source, there is no .txt mirror, and only 1 source_doc kind=student row exists in the whole DB (see §12-explainer). Track B (§4) fixes it for all schools.

1.2 Candidate → student promotion (point 2)

A candidate already has a survey folder 2_Hoc-sinh/Khao-sat/<ung-vien-NNNN>/ (survey Doc + documents + photos) and a row in the separate so-khao-sat.sqlite (candidate, state=cho-xet-chon). On admission, the existing survey_intake.admit_candidate() mints the student code (HSYYYYNNNN), writes the child row, and seeds the first progress note. Change to make: instead of relocating only the survey Doc, move the entire Khao-sat/<ung-vien>/ folder (all data therein) into the new student folder as Ho-so/<HS…> — <Tên>/Khao-sat/, and catalog the survey as a source_doc kind=student doc_type=khao-sat. Rejected candidates stay in Khao-sat/ with state=tu-choi (audit trail; never deleted).

Why a nested Khao-sat/ subfolder rather than dissolving it into Tai-lieu-goc/: it preserves the candidate package intact (provenance) and keeps "pre-admission survey" visibly distinct from school-life documents.

Existing candidates need the same treatment (point 3), two cases:

(a) Pending candidates: normalise each pending Khao-sat/<ung-vien>/ folder to the v2 rules (folder naming, catalog the survey as a source_doc, ledger row), so the candidate space obeys the model before any future admission.

(b) Already-promoted candidates: for students admitted before this change, the survey folder was left in Khao-sat/ (only the survey Doc, or nothing, was relocated). The backfill (Track I) moves those folders into the now-existing student folder and catalogs them, so no promoted student is missing their candidate package.

1.3 School folder, year-subfoldered (point 5) — replaces the year-first tree

Today consolidated docs live year-first (Tai-lieu-tong-hop/<năm>/<trường>/) and the school register is a separate Doc (So-tay-truong/<trường>); the result is the messy split the stocktake found (2025-2026 vs CETB 2025-2026). Move to school-first, collocating the register with the school's documents:

2_Hoc-sinh/Truong/<trường-slug>/
  So-tay-truong · <trường>           # the per-school register Doc (rule 4)
  <năm-học>/                          # 2023-2024, 2024-2025, 2025-2026, …
    <roster / KQ / sổ điểm / SE batch / chuyên cần / OCR derivatives>
care/<bucket-prefix>/Truong/<trường-slug>/<năm-học>/   # machine: .txt extractions

Rule: a school-scoped document is filed once here as one source_doc kind=consolidated row; the per-student facts it contains are extracted to that student's profile_entry/school_report with full detail and surface in the student's handbook (§1.5). The document is not copied into each child's folder — the fact goes to the child, the file stays with the school. This is the existing rule-2 discipline, just relocated school-first.

1.4 Cohort: register only, no docs subfolder (point 6 — my answer)

Recommendation: no. A cohort (lứa = programme-entry year) is a cross-school analytic cut, not a document owner — its data is a roll-up of per-student and per-school records. A cohort docs folder would be almost always empty and would create a third filing location for documents that already belong to a school or a child. Keep the cohort as a register Doc only (So-tay-lua/<programme>-<năm>, backed by register_entry scope_type=cohort), which reads the register and rolls up the per-student data. If a genuinely cohort-level document ever appears (a cross-school cohort event write-up), file it lazily under So-tay-lua/<programme>-<năm>/ then — do not pre-build the tree.

1.5 The handbook (sổ tay) v2 — comprehensive (point 3)

The handbook is the durable, human-readable single-child record. Current so_tay.py produces 5 sections (Hồ sơ · Ghi danh · Học bạ tất cả các năm · Sổ phúc lợi · Nhật ký tiến độ), already all-years and chronological (rule 3 ✓). Review findings + target v2:

| Handbook section | Today | v2 target |
| --- | --- | --- |
| 1 · Hồ sơ (identity) | id, school, class, programme, cohort, entry, status | + exit reason, carer + phone, address, student code, view-class |
| 2 · Ghi danh (enrolment) | programme/school/cohort/dates/status | + exit reason per row |
| 3 · Học bạ tất cả các năm (school records, all years) | all years, chronological, GPA trend, source filename | + resolve source_ref → source_doc: title + Drive link + OCR-disclaimer flag |
| 4 · Sổ phúc lợi (benefits) | lifetime, date/category/amount/source | + petal anchor + receipt link (point 12e) |
| 5 · Nhật ký tiến độ (journal) | chronological, raw_link only | + resolve provenance via source_doc (title + link), not just a bare link |
| 6 · Tài liệu gốc (NEW) | | index of every source_doc for this child: title · doc_type · kind · transcription flag · Drive link · date added. This is "references to other original files + their details." |
| 7 · Theo dõi an toàn (NEW, gated) | | watchlist signals; coordinator/safeguarding audience only |
| 8 · Nhật ký cập nhật hồ sơ (NEW) | | the record-update journal (§1.6) |
| footer | none | compiled-by / compiled-at line |

Slug / layout checks (point 3): folder Ho-so/<HSYYYYNNNN> — <Tên> and Doc So-tay · <HS…> · <Tên> are consistent and idempotent (same student_code → same folder/Doc, upsert-in-place). One real risk to fix: student_code is minted lazily on first handbook upsert (ensure_student_code), so if a bulk job runs the handbook before enrolment is recorded, the moment the code is minted becomes unpredictable. Fix: mint the student code at enrolment (ghi_danh / roster apply), not at handbook time, so the folder name is stable before any handbook write. The handbook is a live record (no version history); accept that. The report (report_child) remains the versioned snapshot.

1.6 The record-update journal & the two timestamps (point 9)

The model must distinguish two timestamps and never collapse them:

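(1) Event timestamp: when the thing happened in the child's life (for example, a benefit was delivered on a date). Already captured: profile_entry.entry_date, school_report.term/period_month, benefit.date_delivered.

(2) Ingest timestamp: when the fact entered the system. Present on some tables (source_doc.added_at, source_file.processed_at, school_report.extracted_at) but missing on profile_entry and benefit (they only carry the event date).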

Two changes:

(1) Add a created_at ingest timestamp to profile_entry and benefit, set at write time. No backfill of history required; new writes populate it.

(2) Add the record-update journal as handbook section 8: a chronological log built by UNION-ing the ingest timestamps (source_doc.added_at, profile_entry.created_at, benefit.created_at, school_report.extracted_at, source_file.processed_at) → "on <ingest-ts>, <what> was added to this record from <source file / doc_id>." No new table; it reads existing columns. This gives the audit answer "when did we learn X, and from which original file?" that the handbook can't currently produce.
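The UNION-based journal can be sketched against an in-memory SQLite database. The column names for the timestamps come from this section; everything else (the reduced table shapes, the student_code key column, the sample rows) is a hypothetical stand-in, and only three of the five tables are shown to keep the example short.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Reduced, hypothetical schemas: just enough columns to build the journal.
con.executescript("""
CREATE TABLE source_doc    (student_code TEXT, doc_id TEXT, added_at TEXT);
CREATE TABLE profile_entry (student_code TEXT, source_ref TEXT, created_at TEXT);
CREATE TABLE benefit       (student_code TEXT, source_ref TEXT, created_at TEXT);
INSERT INTO source_doc    VALUES ('HS2025-0055', 'doc-1',    '2026-06-13T01:00:00');
INSERT INTO profile_entry VALUES ('HS2025-0055', 'doc-1#p2', '2026-06-13T02:00:00');
INSERT INTO benefit       VALUES ('HS2025-0055', 'doc-9',    '2026-06-14T09:00:00');
INSERT INTO benefit       VALUES ('HS2025-0001', 'doc-3',    '2026-06-12T00:00:00');
""")

# One SELECT per table, all projected to (ingest_ts, what, source).
# ISO-8601 text timestamps sort chronologically as plain strings.
JOURNAL_SQL = """
SELECT added_at AS ingest_ts, 'source_doc' AS what, doc_id AS source
  FROM source_doc WHERE student_code = :code
UNION ALL
SELECT created_at, 'profile_entry', source_ref
  FROM profile_entry WHERE student_code = :code
UNION ALL
SELECT created_at, 'benefit', source_ref
  FROM benefit WHERE student_code = :code
ORDER BY ingest_ts
"""

journal = con.execute(JOURNAL_SQL, {"code": "HS2025-0055"}).fetchall()
for ingest_ts, what, source in journal:
    print(f"on {ingest_ts}, {what} was added to this record from {source}")
```

Because the journal is a read-only query over existing columns, it needs no new table, which matches the "No new table" constraint above.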

1.7 Bucket naming — drop the 10_ (point 7 — my answer)

The bucket prefix is still care/10_Ho-so-thu-huong/ while the Drive band is now 2_Hoc-sinh; the 10_ is genuinely misleading. Recommendation: rename to care/ho-so-thu-huong/ (drop the numeric prefix; the prefix only ever pinned Drive sort order, which the bucket doesn't need). But it is a supervised live-data move, not a cosmetic edit — the bucket is the sole copy of beneficiary data and every tool reads it via profile_store.CARE_ROOT + HMT_BENEFICIARY_ROOT. Do it as an isolated, gated track (Track G): backup → quiesce hub → gcloud storage move objects under the new prefix → update the constant + env + redeploy → verify → delete old prefix last. Low priority; pure clarity. (Do not rename the Drive band; that already happened and the two namespaces are deliberately decoupled.)

1.8 Where the rules are recorded (points 1, 10)

care-data.md: the canonical rules doc that every care tool reads. This plan's §1 becomes the v2 edit to that file once approved (school-first layout, candidate-folder move, handbook v2 sections, source-of-truth precedence, the two-timestamp rule). Until approved, the rules live here as a reviewable draft; care-data.md is not edited silently (it overrides agent behaviour).

The doc-type taxonomy (the per-student / consolidated / multi-scope doc-type table): unchanged, still the reference.

_Huong-dan: a human-readable filing-rules Doc in 2_Hoc-sinh/ (the rules plus the folder map) so a coordinator filing by hand follows the same model. Generated from the care-data.md markdown (same pattern as the Drive _DOC-TRUOC / _QUY-TAC-DAT-TEN reference Docs).

consistency-check (§7): asserts that the three places agree, so the rules are checked, not just written (point 1).

1.9 2_Hoc-sinh — the organized target tree (point 11)

2_Hoc-sinh/
  _Huong-dan                       # filing-rules index Doc (the recorded rules, human copy)
  Ho-so/<HS…> — <Tên>/             # per-student: handbook + Tai-lieu-goc + Khao-sat
  Truong/<trường>/<năm-học>/       # per-school register Doc + consolidated docs by year (§1.3)
  So-tay-lua/<programme>-<năm>     # per-cohort register Doc only (§1.4)
  Khao-sat/<ung-vien>/             # PENDING candidates only (moves into Ho-so/ on admission)
  Bao-cao/<YYYY>/                  # generated reports

Cleanup folded into Track E: retire the "already-removed" Tai-lieu-so-hoa-OCR/ band, fold Tai-lieu-tong-hop/<năm>/<trường>/ and the stray CETB 2025-2026 tree into Truong/<trường>/<năm-học>/, delete the _ / _/_ artifact folders.

1.10 Document naming convention (point 2) — review & extend naming-convention.md

Today naming-convention.md covers event slugs, renditions and post-bundles but says nothing about care source documents, so files land with whatever name the field gave them. Add a care-document rule and have intake rename on file (ASCII kebab, diacritics dropped, đ→d, lowercase, like the rest of the convention):

| Doc scope | Filename | Filed to |
| --- | --- | --- |
| Per-student | <HSYYYYNNNN>__<doc-type>__<năm-học>[__<seq>].<ext> | Ho-so/<HS…> — <Tên>/Tai-lieu-goc/ |
| Consolidated | <trường-slug>__<doc-type>__<năm-học>[__<seq>].<ext> | Truong/<trường>/<năm-học>/ |
| Candidate | <ung-vien-NNNN>__<doc-type>[__<seq>].<ext> | Khao-sat/<ung-vien>/ (→ student on admission) |

<doc-type> is from the taxonomy (hoc-ba, so-diem, se, giay-khen, khao-sat, bien-ban-tai-tro, …); <ext> is the original extension, any format (point 4). The original name is preserved in source_doc.original_name so the rename is reversible and traceable. N7 covers the naming-convention.md edit; N8 (intake sweep) and N1 (file-student-doc) apply the rename.
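The naming rule can be sketched in a few lines. The function names are hypothetical, the sequence number is assumed to be a plain integer (the convention does not fix its format), and the student code is passed through unchanged because the table above shows it in upper case.

```python
import re
import unicodedata
from typing import Optional


def ascii_kebab(text: str) -> str:
    """Lowercase ASCII kebab: diacritics dropped, đ→d."""
    # đ/Đ have no Unicode decomposition, so map them by hand first.
    text = text.replace("đ", "d").replace("Đ", "D")
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return re.sub(r"[^a-z0-9]+", "-", stripped.lower()).strip("-")


def care_doc_name(owner: str, doc_type: str, school_year: Optional[str],
                  ext: str, seq: Optional[int] = None) -> str:
    """Join the parts with '__'; school_year is None for candidate docs."""
    parts = [owner, doc_type]
    if school_year:
        parts.append(school_year)
    if seq is not None:
        parts.append(str(seq))
    return "__".join(parts) + "." + ext.lstrip(".")


per_student = care_doc_name("HS2025-0055", "hoc-ba", "2025-2026", "pdf")
consolidated = care_doc_name(ascii_kebab("Tân Khánh"), "so-diem",
                             "2024-2025", ".xlsx", seq=2)
```

Keeping the original name in source_doc.original_name, as described above, is what makes this rename safe to apply at intake time.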


2. What already exists (reuse) vs new / changed

Earlier migrations shipped most of the machinery. Reuse it; build only the gaps.

Already built — reuse as-is

| Capability | Tool / job |
| --- | --- |
| Copy source archive → Drive + catalog source_doc + bucket stub | care_migrate copy-source |
| Move ONE file into a named student's Tai-lieu-goc/ | care_migrate relocate-one |
| Move legacy per-student OCR/survey Docs into matching folders | care_migrate relocate-legacy |
| Mint source_doc from raw_link, set source_ref | care_migrate backfill-provenance |
| Seed school/cohort registers + emit register Docs | care_migrate build-registers |
| Apply gated coordinator fixes (grades, identity, phantoms) | care_migrate reconcile |
| Read-only Drive tree / migrated-vs-not classify | care_migrate drive-ls, migrate-status |
| Re-OCR scans, Gemini default, ledger skip-logic | care_reocr (--write-db) |
| Roster ingest (dump/propose/apply, school-scoped dedup, code mint) | care_roster |
| Structured extractors (hoc-ba, so-diem, chuyen-can, hop-phu-huynh, ket-qua-thi, se, giay-khen) | school_doc_extract |
| Classify → route → extract → file drops; _needs-review.md triage | care_intake_convert |
| Handbook + register Doc upsert, branded | so_tay |
| Candidate admit → mint student id + child row + first note | survey_intake.admit_candidate |
| OCR engine seam (Gemini 3.1-pro-preview default via HMT_OCR_ENGINE) | gemini_ocr |
| The migration ledger | source_file table |
| Deployed Cloud Run jobs | intake-convert · care-migrate · reocr · roster · index-rebuild · consistency-check · watchlist-monthly · watchlist-keyword · area4-emit |

New or changed (this plan's build list)

| # | Build / change | For |
| --- | --- | --- |
| N1 | care_migrate file-student-doc: copy original (any format, point 4) → Tai-lieu-goc/ under the convention name (§1.10) + .txt mirror + source_doc kind=student + source_file row | Track B (point 12a, D1) |
| N2 | so_tay handbook v2: sections 6 (Tài liệu gốc index), 7 (watchlist, gated), 8 (update journal); source-link resolution on §3/§5; petal+receipt in §4; exit reason; compiled-at footer | §1.5 (point 3) |
| N3 | School-first relocate: a care_migrate action (extend relocate-legacy) to fold Tai-lieu-tong-hop/<năm>/<trường> into Truong/<trường>/<năm> and rebuild paths | §1.3, Track E (point 11) |
| N4 | survey_intake.admit_candidate: move the whole Khao-sat/<ung-vien>/ folder into the student folder + catalog survey as kind=student | §1.2 (point 2) |
| N5 | Schema additive: created_at ingest ts on profile_entry + benefit; mint student_code at enrolment not handbook time | §1.5, §1.6 (point 9) |
| N6 | student_code as the single id: re-key the DB (journey_id → student_code, drop the column, re-point FK across the ~7 tables) + bucket prefix 10_Ho-so-thu-huong → ho-so-thu-huong + rename per-student machine files to the student code (HS….md / Tai-lieu-goc/HS…/); update profile_store.CARE_ROOT + env + profile_path | §1, Track G (points 1, 7) |
| N7 | care-data.md v2 edit + the care-document naming rule in naming-convention.md (§1.10) + the _Huong-dan human Doc | §1.8, §1.10 (points 1, 2, 10) |
| N8 | Standing care-intake sweep (periodic, renames files on file) + enable the email lane for care | §7, §1.10 (points 2, 13) |
| N9 | Handbook lifecycle: handbook on admission (wire admit_candidate/roster → so_tay) + a standing handbook-refresh job that regenerates changed students' handbooks | §11 (point 8) |
| N10 | care_migrate source-coverage: walk each lnquang source band, reconcile every file against the source_file ledger, report "in source but not migrated" | §11 (point 9) |
| N11 | Benefit-list extractor (only if absent): parse the per-cohort benefits file → cohort/school register_entry funding + per-student benefit rows | Track K (point 12e) |

3. The gaps (verified counts, 2026-06-12)

| Gap | Rule | Status |
| --- | --- | --- |
| G1: children with no folder/handbook | 1 | CLOSED (re-run: 142→211 folders, 1:1 with DB) |
| G2: per-student originals not filed | 3 a/b/c | 172 source PDFs; filed 0; kind=student = 1 → Track B (needs N1) |
| G3: per-student docs not OCR'd (5 schools) | 3 d | only Đại An done (43 ledger); ~145 PDFs (Tam Thanh, Vĩnh Hào, Tân Khánh, Nguyễn Bính, LTV) → Track C |
| G4: Hoa Sữa (NBTL) trainees not loaded | 1, 3 | 7 source_doc, 0 trainees; ~81 need a split-name parser → Track D (held) |
| G5: Drive disorganised | file-by-ownership | retired OCR band still present; Tai-lieu-tong-hop vs CETB split; _/_/_ artifacts → Track E (now the school-first restructure) |
| G6: no real migrated-vs-not register | 5 | source_file = 43 (Đại An only); copy-source/roster loads unrecorded → Track F |
| G7: model not the canonical rule yet | 1, 10 | §1 not yet in care-data.md; no human filing Doc → N7 |
| G8: no standing compliant intake | 1 | future docs still rely on manual catch-up → §7 / N8 |
| G9: per-cohort benefit files not ingested | 2/3, 2b | each lnquang cohort folder holds a benefits file of per-student support; the facts are needed in handbook §4 but were never extracted → Track K. (The standalone Hỗ trợ tài chính folder is out of scope.) |

Per-student originals by school (source PDFs): LTV 24-25 = 27, LTV 25-26 = 13, Tam Thanh 25-26 = 30, Tân Khánh = 27, Vĩnh Hào = 30, Đại An = 27, Nguyễn Bính = 18. CETB 2023-2024 has no per-student PDFs (rosters/KQ only), so its originals gap is empty by nature.

What "c: source_doc kind=student MISSING. The whole DB has only 1 kind=student row" means (point 12d): source_doc.kind has exactly two values. consolidated = a group document (roster, KQ sheet, SE batch) → one catalog row that fans out to many per-student profile_entry/school_report rows. student = a document about one child (học bạ, essay, certificate, letter) → one catalog row. Almost everything loaded so far came from consolidated sources, so the whole DB has 1 kind=student row. The consequence: the per-student-document provenance layer is essentially absent, so a child's record cannot point back from a fact to that child's own original document. Track B creates one kind=student row per filed original, closing the gap.


4. Work tracks (built on §2)

All writes run on Cloud Run (care-migrate-run), one DB writer at a time, each preceded by backup + hub quiesce + scheduler pause, verified after (rule 5). Re-pull the live DB immediately before every write.
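The write discipline above can be expressed as a small context manager. This is only a sketch: the five hooks are hypothetical callables supplied by the caller, and the exact ordering of backup, quiesce and re-pull is an assumption, since the text lists the steps without fixing their order.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

Step = Callable[[], None]


@contextmanager
def supervised_write(backup: Step, quiesce: Step, repull: Step,
                     verify: Step, resume: Step) -> Iterator[None]:
    """Wrap one DB write in the backup / quiesce / re-pull / verify steps."""
    backup()   # snapshot first, so there is always a way back
    quiesce()  # hub quiesce + scheduler pause: one DB writer at a time
    repull()   # re-pull the live DB immediately before the write
    yield      # the caller performs the single write here
    verify()   # raises on drift; resume() is then skipped
    resume()   # only reached after a clean write and a clean verify


# Demonstration with logging stand-ins for the real steps.
log: list[str] = []
steps = {name: (lambda n=name: log.append(n))
         for name in ("backup", "quiesce", "repull", "verify", "resume")}

with supervised_write(**steps):
    log.append("write")
```

A design choice worth noting: if the write or the verify step raises, the system is deliberately left quiesced for a human to inspect rather than resumed automatically.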

Track A (G1, folders + handbooks): closed. All 211 children have folders and handbooks; cohort/school reports regenerated. No further action.

Track B (G2, file per-student originals): match each source file to a student (name + class, school-scoped); copy the raw original (any format, point 4) into Ho-so/<HS…> — <Tên>/Tai-lieu-goc/ under the convention name (§1.10), write the .txt mirror, add source_doc kind=student + source_file. Includes Đại An (file its 27 originals; only OCR was done). Needs N1. File originals for all docs regardless of OCR status (point 12a).

Track C (G3, OCR the remaining schools): school by school, care_reocr … --write-db → readable OCR Doc in Tai-lieu-goc/ + SE/essay profile_entry.detail. Skip anything already done in the ledger (migrate-status first) so nothing is re-billed (point 12a). Engine: Gemini 3.1-pro-preview for both printed text and handwriting. The Đại An handwriting A/B passed (on par with Claude, ~2.3× cheaper), so the earlier "keep Claude for handwriting" caveat is retired (point 12b). Captions stay Claude (not OCR). Meter per school, dry-run-sample first.

Track D (G4, Hoa Sữa NBTL trainees): build a split-name parser (concatenate first/last columns, province Lào Cai/Sa Pa, capture CMND/DOB/carer/hardship) → ~81 trainees under Nâng bước tương lai, codes HVK<k><NNN>, code block 8xxx. Approved into this round (D12); build the parser, load after A–C.

Track E (G5, school-first restructure): fold Tai-lieu-tong-hop + CETB 2025-2026 into Truong/<trường>/<năm-học>/; retire the OCR band; delete _/_/_ artifacts. Needs N3. Drive-mostly; rebuilds the source_doc.drive_url/bucket_path paths it moves.

Track F (G6, ledger backfill): write source_file ledger rows for the earlier copy-source + roster loads; run migrate-status over every band for one authoritative migrated-vs-not view.

Track G (identity re-key + bucket recode): one supervised migration. Re-key the DB from journey_id to student_code (re-point the FK across child + the ~7 child-keyed tables, then drop the journey_id column), move 10_Ho-so-thu-huong → ho-so-thu-huong, rename each per-student machine file/folder to the student code (HS….md, Tai-lieu-goc/HS…/), update profile_store.CARE_ROOT + env + child.profile_path; backup → verify → delete old paths last. Freeze student_code at mint (immutable key) and confirm Stream C is independent of the care key first. Do this FIRST (Wave 1, before Track B) so every later write keys natively on student_code and nothing is re-keyed twice. Needs N6.

Track H (handbook v2): once N2 is built, regenerate all 211 handbooks. After Tracks B/C so the new Tài liệu gốc + journal sections have data to show.

Track I (candidate move): N4 changes admit_candidate so the whole survey folder moves on every future admission. Backfill the existing candidate space: normalise pending Khao-sat/<ung-vien>/ folders to the rules (naming + catalog), and move the folders of already-promoted candidates into their student folders (the pre-N4 admits that left their survey package behind).

Track J (rules + standing intake): N7 (care-data.md v2 + _Huong-dan Doc) and N8 (the care-intake sweep + email lane). This is what makes the model stay compliant.

Track K (G9, benefit ingestion): each lnquang cohort folder contains a benefits file (per-student support for that cohort). Parse it → fan out per-student benefit rows (category, amount_vnd/amount_inkind, petal, funding_source, receipt_path/ledger_path), matching each student school-scoped; catalog the file once as kind=consolidated under the matching Truong/<trường>/<năm> (or the cohort register) + a register_entry funding row. The facts surface in handbook §4 (petal + receipt link, via N2). Care side only: the donor match / receipt issuance (Area 3/4) and the standalone Hỗ trợ tài chính folder are out of scope. Tooling: a consolidated benefit-list extractor (N11) if school_doc_extract lacks one.
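Both the originals track and the benefit track depend on matching a name to a student within one school. A minimal sketch of that school-scoped match follows; the function names, the roster shape and the sample names are hypothetical, and the tie-break by class is an assumption drawn from the "name + class, school-scoped" wording.

```python
import unicodedata
from typing import Optional


def fold(name: str) -> str:
    """Case- and diacritic-insensitive form of a Vietnamese name."""
    name = name.replace("đ", "d").replace("Đ", "D")
    nfd = unicodedata.normalize("NFD", name)
    plain = "".join(c for c in nfd if unicodedata.category(c) != "Mn")
    return " ".join(plain.lower().split())


def match_student(name: str, class_name: str, school: str,
                  roster: list[dict]) -> Optional[str]:
    """Return the one student_code that matches inside `school`, else None."""
    hits = [r for r in roster
            if r["school"] == school and fold(r["name"]) == fold(name)]
    if len(hits) > 1:
        # Namesakes in the same school: the class is the tie-breaker.
        hits = [r for r in hits if r["class"] == class_name]
    # Anything other than exactly one hit is left for human review.
    return hits[0]["student_code"] if len(hits) == 1 else None


roster = [
    {"student_code": "HS2025-0001", "name": "Nguyễn Văn A", "class": "9A", "school": "dai-an"},
    {"student_code": "HS2025-0002", "name": "Nguyễn Văn A", "class": "9B", "school": "dai-an"},
    {"student_code": "HS2025-0003", "name": "Nguyễn Văn A", "class": "9A", "school": "tam-thanh"},
]
code = match_student("nguyen van a", "9B", "dai-an", roster)
```

Returning None for zero or multiple hits, instead of guessing, keeps ambiguous files out of the wrong child's folder; they would go to triage instead.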


5. Synchronisation discipline (every write follows this)


6. Sequencing

re-key + bucket recode + drop journey_id, N6) → B (file 172 originals + catalog + mirror, needs N1) → F (ledger backfill). G goes first so B/F write natively on student_code; then rule-3 a/b/c is done.

apply, billed school by school. Closes rule-3 d. K (benefit ingestion) rides here — scanned receipts may need OCR; the structured fan-out to benefit rows is cheap.

(candidate move) → J (care-data.md v2 + standing intake). Do H after B/C so the new sections have data.

(school-first restructure). (G moved to Wave 1 — it is now the identity re-key, a foundation for everything after.)


7. Standing intake — keeping it compliant in future (point 13)

The backfill is one-time; future documents must land in the model without a catch-up. The pieces mostly exist — wire them into a periodic sweep.

How a document gets in (channels):

deterministic routing. Best for school sheets and per-student docs.

ingest-watch → care_intake_convert. Good for bulk/ad-hoc.

mirroring cloud/finance_mail/. Lets coordinators forward a school's scans straight in. Recommended next channel — lowest friction for the field.

later if a structured care-doc form is wanted.

Periodic jobs (Cloud Run; all currently manual under Phase A):

| Job | Cadence (proposed) | Does |
|---|---|---|
| care-intake (sweep, N8) | daily/weekly | pull new docs from drop + hub (+ email when live) → classify → route → extract → file → source_doc + source_file; unmatched → _needs-review.md triage |
| watchlist-keyword | daily | safeguarding keyword sweep over new notes |
| watchlist-monthly | monthly | full watchlist sweep |
| consistency-check | weekly | assert the three places agree: Ho-so/ count == child count == folder count; every source_doc has a file; every fact has a source_ref; flag drift |
| index-rebuild | on change | rebuild the read index from the live DB |

consistency-check is the enforcement arm of §1.8: it turns the rules from written into checked. Phase A is untouched — these are internal care jobs, not publishing; nothing reaches Facebook/website, no channel-publisher cron.
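A minimal sketch of the three-place agreement that consistency-check asserts, assuming the job has already collected the student codes seen in each place. The function name and argument shapes are hypothetical; only the invariants themselves come from the job table above.

```python
from typing import Dict, List, Optional, Set


def check_three_places(
    profile_files: Set[str],                    # codes with a Ho-so/ machine file
    child_rows: Set[str],                       # codes with a child row in the DB
    drive_folders: Set[str],                    # codes with a per-student folder
    source_doc_has_file: Dict[str, bool],       # source_doc id -> file exists
    fact_source_ref: Dict[str, Optional[str]],  # fact id -> source_ref (or None)
) -> List[str]:
    """Return human-readable drift findings; an empty list means consistent."""
    findings = []
    counts = (len(profile_files), len(child_rows), len(drive_folders))
    if len(set(counts)) != 1:
        findings.append("count drift: Ho-so=%d child=%d folders=%d" % counts)
    for code in sorted(child_rows - profile_files):
        findings.append(f"{code}: child row has no Ho-so file")
    for code in sorted(child_rows - drive_folders):
        findings.append(f"{code}: child row has no folder")
    for doc_id, has_file in sorted(source_doc_has_file.items()):
        if not has_file:
            findings.append(f"{doc_id}: source_doc has no file")
    for fact_id, ref in sorted(fact_source_ref.items()):
        if not ref:
            findings.append(f"{fact_id}: fact has no source_ref")
    return findings
```

Returning findings rather than raising lets the weekly run report all drift in one pass instead of stopping at the first mismatch.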


8. Constraints & risks

clobbered each other — re-pull before apply.

school, keep the ledger so a partial run resumes. Gemini cuts this ~2.3×.

ambiguous matches go to triage, never auto-merged.

backup + verify + delete-old-last; isolate from other writes.

the fact: do not copy identifying medical / precise-circumstance detail into the handbook body (beneficiary-care-subplan §3). Handbook §7 watchlist is coordinator/safeguarding audience only.

write back out. Phase A untouched (care data, not publishing).


9. Recommendations & best practices (point 14)

OCR Doc, .txt mirror and DB rows are derived and regenerable. Never edit an original; never let a derived copy become the only copy.

school_report / benefit must carry source_ref → source_doc → original file. Make the intake sweep refuse to write a fact with no source (soft gate → triage), so the chain never breaks again.

event order; the update journal reads ingest order.

is the dedup key in source_file so re-ingest is idempotent and edits are detectable.

ledger is what lets a billing-interrupted OCR run resume.

reuse the 8 care_migrate actions before writing a new one.

three places agree — the rules are enforced, not just documented.

safeguarding section is gated; the handbook is access-controlled by Drive permissions, not a watermark.
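The soft gate recommended in this list (a fact with no resolvable source goes to triage instead of being written) could look roughly like this. The function names and the dict shape of a fact are assumptions for illustration; only source_ref and source_doc are names from this plan.

```python
def gate_fact(fact: dict, cataloged_source_docs: set) -> str:
    """Return 'write' only when the fact's source_ref resolves to a source_doc."""
    ref = fact.get("source_ref")
    if not ref:
        return "triage"  # no source at all
    if ref not in cataloged_source_docs:
        return "triage"  # dangling reference: the source_doc is not cataloged
    return "write"


def sweep(facts: list, cataloged_source_docs: set) -> tuple:
    """Split facts into (to_write, to_triage) without dropping any of them."""
    to_write, to_triage = [], []
    for fact in facts:
        if gate_fact(fact, cataloged_source_docs) == "write":
            to_write.append(fact)
        else:
            to_triage.append(fact)
    return to_write, to_triage
```

The gate is soft in the sense that nothing is discarded: an unsourced fact is held for a coordinator rather than rejected.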


10. Decisions — administrator-approved 2026-06-13

All recommendations accepted. D1–D11 and D13–D18 are approved as recommended; D12 = do it (Hoa Sữa in scope this round); **D19 = student_code becomes the single id** (administrator override; journey_id retired). Recorded below for the record; these now drive the build and are no longer "open".

yes.

(vs a PDF→Doc conversion)? Rec: raw PDF + .txt.

Claude-for-handwriting caveat — A/B passed)? Rec: yes.

E), replacing the year-first Tai-lieu-tong-hop tree? Rec: yes — it fixes the G5 split and point 11.

student folder (N4)? Rec: yes.

G) now, or defer as cosmetic? Rec: do it, but as an isolated gated track.

Rec: yes.

once D1–D8 land? Rec: yes — this is "ensure compliance in future" (point 1).

(N8, §7)? Rec: yes — email next, sweep weekly.

benefits file inside each cohort folder → per-student benefit rows + handbook §4; the standalone Hỗ trợ tài chính folder and the Area-3/4 donor side stay out of scope. Rec: yes — the handbook is incomplete without it.

split-name parser and load the ~81 trainees after A–C.

(HS….md, Tai-lieu-goc/HS…/); journey_id dropped (N6; see D19, point 1)? Rec: yes.

on file/intake (N7/N8, points 2, 4)? Rec: yes.

+ handbook-refresh + watchlist) on Cloud Scheduler — care jobs are cron-eligible; only channel-publisher is gated in Phase A (point 5)? Rec: yes.

handbook-refresh (N9, point 8)? Rec: yes.

Rec: yes — it is the only proof nothing was dropped.

— verify + ledger them as fully-migrated; the coordinator decides on their own personal Drive? Rec: yes (we have read-only access and it is the upstream origin; deletion is neither ours to do nor wise).

journey_id; re-key the DB and drop the column (Track G, Wave 1). Guardrails: freeze student_code at mint (immutable key) and confirm Stream C is independent of the care key before the drop. (This overrides the earlier keep-as-hidden-surrogate recommendation; the administrator confirms no PII concern in scope.)


11. Operations — triggering, footprint, future handbooks, verification

11.1 When/how jobs run (point 5)

Care jobs are Cloud Run Jobs, today run on-demand (gcloud run jobs execute / Cloud Run UI / a hub button). Care jobs are cron-eligible — Phase A only forbids a channel-publisher cron; an internal care job that never publishes is fine on a schedule. Proposed triggers:

Asia/Ho_Chi_Minh) and on-demand right after every migration write.

only, never scheduled (one-time backfill work). Registering these is a /setup-schedules change; it must still refuse any channel-publisher job (autonomy-phase.md).

11.2 Operating footprint — is the tool/job count a problem? (point 7)

Not inherently — all care jobs share one container image, dispatched by the HMT_CARE_JOB env var (entrypoint.py), so more jobs ≠ more image builds; a job definition is thin. Keep it lean by:

care_roster) are backfill-only — on-demand, no schedule. The standing set is small: care-intake, handbook-refresh, consistency-check, watchlist-keyword/-monthly, index-rebuild, area4-emit.

definitions can be removed (the tools stay in the repo, runnable ad-hoc) to shrink the deploy surface.
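To illustrate why extra jobs stay cheap, here is a sketch of one image dispatching on the HMT_CARE_JOB environment variable. It is not the contents of entrypoint.py: the handler functions and the registry are hypothetical stand-ins, and only the variable name and the job names come from this plan.

```python
import os


def run_care_intake() -> str:
    return "care-intake: ran"  # placeholder for the real sweep


def run_consistency_check() -> str:
    return "consistency-check: ran"  # placeholder for the real check


# One registry, one image: adding a job is one more entry, not one more build.
JOBS = {
    "care-intake": run_care_intake,
    "consistency-check": run_consistency_check,
}


def dispatch(env=None) -> str:
    """Run the handler named by HMT_CARE_JOB, or exit on an unknown name."""
    env = os.environ if env is None else env
    name = env.get("HMT_CARE_JOB", "")
    handler = JOBS.get(name)
    if handler is None:
        raise SystemExit(
            f"unknown HMT_CARE_JOB {name!r}; expected one of {sorted(JOBS)}"
        )
    return handler()
```

Because anything not in the registry is refused, retiring a job definition does not require touching the image, and an unexpected job name fails loudly instead of running a default.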

11.3 Handbooks for new & changing students (point 8)

care_roster apply) to call so_tay.upsert_native_doc so every new student gets a handbook immediately (N9). Mint the student code here too (§1.5).

handbook for any student whose data changed since last compile (using child.last_updated / the new created_at cols). The upsert is idempotent and cheap, so a daily/weekly pass keeps every handbook current after each intake-convert write. A hub "regenerate handbook" button covers one-offs.
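The "changed since last compile" selection for handbook-refresh can be sketched as a timestamp comparison. This assumes ISO-8601 strings for both child.last_updated and a per-student last-compile time; where the compile time is stored is not specified in this plan, so the `compiled` mapping below is hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional


def needs_refresh(last_updated: str, last_compiled: Optional[str]) -> bool:
    """A handbook is stale if it was never compiled or the data is newer."""
    if last_compiled is None:
        return True
    return datetime.fromisoformat(last_updated) > datetime.fromisoformat(last_compiled)


def students_to_refresh(children: Dict[str, str], compiled: Dict[str, str]) -> List[str]:
    """children: student_code -> last_updated; compiled: student_code -> last compile."""
    return sorted(
        code
        for code, updated in children.items()
        if needs_refresh(updated, compiled.get(code))
    )
```

Since the upsert is idempotent, a false positive here only costs one redundant recompile, so the comparison can err on the side of refreshing.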

11.4 Verifying nothing is missing vs the source (point 9)

After each wave, run care_migrate source-coverage (N10): it walks each lnquang source band read-only (drive-ls), lists every file, and reconciles against the source_file ledger — every source file must have a ledger row with a terminal result (written/skipped/held); any source file with no ledger row = a missed document, reported per folder. The report is the migration's completion proof and feeds Track F's authoritative migrated-vs-not view. Pair it with consistency-check (three-place agreement) for full coverage: one checks source → us, the other checks us internally consistent.
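The reconciliation that source-coverage performs reduces to a lookup of every listed source file in the ledger. A minimal sketch, assuming the Drive listing and the source_file ledger have already been loaded into plain dicts (the function name and input shapes are hypothetical; the three terminal results are the ones named above):

```python
TERMINAL_RESULTS = {"written", "skipped", "held"}


def source_coverage(source_listing: dict, ledger: dict) -> dict:
    """source_listing: folder -> list of file ids; ledger: file id -> result.

    A file counts as covered only if its ledger row has a terminal result.
    """
    missed = {}
    total = covered = 0
    for folder, file_ids in source_listing.items():
        for file_id in file_ids:
            total += 1
            if ledger.get(file_id) in TERMINAL_RESULTS:
                covered += 1
            else:
                missed.setdefault(folder, []).append(file_id)
    return {"total": total, "covered": covered, "missed": missed}
```

A band is fully migrated when `missed` is empty; a file with a non-terminal ledger row is reported the same way as a file with no row at all.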

11.5 Disposition of the source folders (point 6)

Do not delete the lnquang source after migration. Three reasons: (1) the SA has read-only access to a personal Drive — it is not ours to delete; (2) it is the upstream origin we reconcile against (§11.4) — deleting it removes the ability to re-verify; (3) the firewall is read-into only. Instead: once source-coverage reports 100% and the ledger marks the band fully-migrated, the workspace copy becomes the operating source of truth, and whether to archive the personal-Drive folder is the coordinator's call on their own Drive, not a workspace action. (Folder IDs are kept out of this public page per the redaction rule.)


12. Build & run status (2026-06-13 overnight)

Branch claude/website-redesign (the care v2 foundation lives here; basing off main would orphan it). Full run log: _system/runs/care-storage-v2-overnight-2026-06-13.md. Lesson saved: memory project_care-storage-v2-overnight.

Done (shipped + verified)

| Item | Status |
|---|---|
| N1 file-student-doc(s) | ✅ built + selftest |
| N2 handbook v2 (§6/§7/§8 + source resolution) | ✅ built; 211 handbooks regenerated live |
| N3 relocate-school-first | ✅ built + selftest |
| N4 admit → handbook + Khao-sat move | ✅ built + tests |
| N5 created_at ingest timestamps | ✅ built; live DB migrated to add the columns |
| N6 care_rekey (Track G DB half) | ✅ built + PROVEN on a copy of the live 211-child DB (counts preserved, FK-clean, journey_id dropped); STAGED for live |
| N7 care-data.md v2 + naming rule | ✅ written |
| N9 mint-at-admission + handbook-refresh job | ✅ built |
| N10 source-coverage | ✅ built + selftest |
| google-genai in care-jobs image | ✅ added (Gemini was ImportError-ing) |
| Deploy: care-jobs image + Gemini wiring on care-migrate-run | ✅ rebuilt + wired (verified) |
| Track C OCR (Gemini) | 130 docs transcribed (~$2.1), source_file 43→154, 24 held; the 24 held resolved 2026-06-13 session 2 (§12.1) |
| Track H handbook v2 regen | all 211 regenerated as native Google Docs |
| Full test suite | ✅ 671 passed; ruff/mypy clean; 9 commits (overnight) |
| N12 school-scoped re-OCR + acronym index + entrypoint --school | shipped + applied live (§12.1); 24 held → 15 filed + 9 consolidated held |
| Track D parser split-name + column overrides | ✅ built + tests (the ~81-trainee load is still staged) |
| Audit step 8 (data-consumer) | ◑ run; detail-parity patched (handbook §5 + report digests); punch-list staged for a redeploy |
| Track B: file per-student originals (CETB 2025-26 essays) | DONE all 6 schools (§12.2); source_doc kind=student 1→138; +2 fixes (op-aware skip 27fea0f, idempotent copy cfabd7c). SE forms + older years + no-match typos = later passes |

Next steps (staged — supervised session, all live DB writes)

Do these one DB-writer at a time under the sole-writer + backup + verify discipline (§5 + the lesson in §12 of the run log — management-hub --min-instances=0, drain care-dashboard, pause the 6 schedulers, raise the job task-timeout, verify the ledger persisted after). Suggested order:

Shipped (school-scoped pool + 1-char fuzzy + a display-name acronym index so THPT LTV - Bài viết resolves to thpt-luong-the-vinh) + entrypoint HMT_REOCR_SCHOOL→--school; deployed in the care-jobs image and applied live: all 24 held resolved. (Track B's file-student-docs already school-scoped.)

essays were filed 2026-06-13 (12 via the school-scoped batch, 3 via coordinator-confirmed forced execs; see §12.1); source_file written 130→161, 0 ambiguous left. Still pending: filing the 172 raw per-student originals (file-student-docs --school <slug> --school-year 2025-2026, dry-run → --apply) to create the source_doc kind=student rows, then handbook-refresh so §6 populates. The 24-held resolution (for the record):

--journey <jid>, or let N12's school-scoped pass do it automatically): 7A Nguyễn Thùy Linh→0032, 7A Nguyễn Thị Thuỳ→2022, 7B Nguyễn Hải Đăng→6026, 7B Nguyễn Ngọc Gia Huy→2017, 7B Trần Nhật Minh→6004, 9A Lê Thị Diệu Linh→6017, 9B Lê Thị Diệu Linh→0008 (two different girls, same name, different schools), 10A3 Trần Thị Huyền→0020, 11A4 Phạm Nhật Minh→1026, 11A4 Vũ Khánh Vân→0023, 6B Nguyễn An Minh Châu→0028, 9A Vũ Gia Bảo→2012 (confirm: DB class 8A vs file 9A); Lại Hia Huy→6024 Lại Gia Huy (coordinator-confirmed), Ngô Công Thiệu→6016 Ngô Công Thiện (1-char typo, confirm), Trần Viết Minh Tiềm→6027 Trần Viết Minh Tiền (1-char typo, confirm). (The numbers are journey ids, shown with the journey- prefix dropped.)

the Track K funding source; KQ HK1 is already in school_report; 2× TDTT are school-level. Leave them out of per-student filing.

"Tài trợ <năm>" subfolder docs, NOT the "Cùng em tiến bước" rosters), then build the extractor → benefit rows + register funding → handbook §4.

K1/K2/K4 need the split-name + header-offset parser (institutional letterhead).

the ~650-ref journey_id → student_code code refactor (110 hub + 540 tools) + Drive/bucket recode (10_Ho-so-thu-huong → ho-so-thu-huong, Ho-so/journey-NNNN. → Ho-so/HS….) + CARE_ROOT/env + hub redeploy, with the hub exercised after the cutover.

Now that all relevant data is migrated, verify that every function/surface that uses or involves the care data actually sources it and presents what is necessary and sufficient; reports are one consumer, not the only one. Build a single consumer → source map (each surface × which tables/columns it reads) and flag any surface that under-sources, or any migrated source that no surface reads. The consumers to audit:

a. Reports (report_child, report_school, report_cohort) — per report section, confirm it reads the right table/column and surfaces:

ghi_danh history (programme transitions, lứa ≠ năm-học kept distinct).

chronological, GPA trend, with source-stamp.

OCR'd essays + 3-perspective SE notes from Track C), not just summary_line.

(incl. the Track K per-cohort benefits once loaded); school/cohort funding rolls up from benefit + register_entry funding rows, not recomputed.

a Tài liệu gốc view lists the child's filed originals (Track B output).

in the school/cohort reports; typical-case ("trường hợp tiêu biểu") picks.

a trailing slice); cohort = programme-entry year, năm-học is a separate cut.

b. Handbook (so_tay v2): confirm §3 transcript, §4 benefits (petal + receipt), §5 journal-with-provenance, §6 Tài liệu gốc, §8 update journal all populate from live data once Track B/K land (re-run handbook-refresh).

c. Hub views (management-hub + care-dashboard): the student list, the child profile page (transcript, notes incl. OCR detail, benefits, provenance links), Phân tích (analytics/charts), Cảnh báo (watchlist), and search each show the migrated data; no view silently drops a source.

d. Read index (beneficiary_index_rebuild): includes every migrated student + the new fields; re-run after the data tracks.

e. Safeguarding watchlist (safeguarding_watchlist): scans the new OCR profile_entry.detail (essays/SE), not just summary_line, so signals in the transcribed text are caught.

f. Area-4 / finance emit (benefit_report / area4-emit): includes the Track K per-cohort benefits in the donor/finance roll-up.

g. Consistency-check (beneficiary_consistency_check): asserts the v2 invariants over the now-complete data: Ho-so/ count == child count, every source_doc has a file, every fact carries a source_ref; extend it if a new invariant (e.g. every filed original has a kind=student row) isn't checked.

h. Registers (So-tay-truong / So-tay-lua Docs): roll up the migrated per-student data, don't recompute from scratch.

Method: produce the consumer → source map; flag under-sourcing + orphaned sources; patch the offending function; then regenerate a sample of each surface (a child report, a school report, a cohort report, a handbook, the hub child page, the index, a watchlist run) and read each for sufficiency — a coordinator/principal/trustee can act from it without opening raw files. The handbook is the per-child companion; the report is the versioned snapshot; keep every surface's sourcing in sync so nothing migrated goes unseen.

12.1 Session 2 — N12 + remaining OCR + data-consumer audit (2026-06-13)

Shipped + applied live:

cross-school namesakes) + 1-char-typo fuzzy fallback + a display-name acronym index (THPT LTV → thpt-luong-the-vinh) + entrypoint HMT_REOCR_SCHOOL→--school. 3 commits. Track D split-name / column-override roster parsing: 1 commit.

(confirmed in-cloud), then backup → quiesce → 4 school-scoped batches (12 essays) + 3 coordinator-confirmed forced namesakes. source_file written 130→161, 0 ambiguous left; the 9 still-held are the consolidated non-essays (6 BB tài trợ = Track-K source, KQ already in school_report, 2 TDTT) — correctly excluded. child 211 / profile_entry 542 / school_report 293 unchanged. Backup care/_backups/20260613-pre-reocr-n12/.

before any so-dang-ky.sqlite write; forced namesake mappings need explicit per-file coordinator confirmation (the matcher holds, never guesses); run forced execs AFTER all batches (a batch re-running its folder re-holds a namesake and reverts an earlier forced written mark — Doc survives, ledger reverts); wrap each gcloud run jobs update/execute in a DNS-flap retry and never execute after a failed update (stale env).

DB done live (corrects hub/reports/search/CSV). Drive folder + handbook Doc rename deferred — local Drive write is unavailable (box ADC is cloud-platform-scoped; only the SA key in Cloud Run writes Drive). Do it as a Cloud Run folder-rename step (NOT handbook-refresh, which would duplicate the folder); ids in memory project_n12-reocr-remaining-done.
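The update/execute lesson in this list (retry through a DNS flap, and never execute after a failed update) can be sketched as a small wrapper around the gcloud CLI. The function names and the injectable `runner`/`sleep` parameters are assumptions made for testability. This sketch also retries on any non-zero exit, which is broader than the lesson intends: a real wrapper should retry only when the failure is a transient network error, so that a job which genuinely failed mid-write is not re-run blindly.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, delay_s=5.0, runner=subprocess.run, sleep=time.sleep):
    """Run cmd up to `attempts` times; True as soon as one attempt exits 0."""
    for attempt in range(1, attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return True
        if attempt < attempts:
            sleep(delay_s)  # give a transient failure time to clear
    return False


def update_then_execute(job, env_pairs, runner=subprocess.run, sleep=time.sleep):
    """Update the job's env, then execute it; skip execute if the update failed."""
    update = ["gcloud", "run", "jobs", "update", job,
              "--update-env-vars", ",".join(env_pairs)]
    execute = ["gcloud", "run", "jobs", "execute", job, "--wait"]
    if not run_with_retry(update, runner=runner, sleep=sleep):
        return False  # env is stale: executing now would run the previous config
    return run_with_retry(execute, runner=runner, sleep=sleep)
```

For example, `update_then_execute("care-migrate-run", ["HMT_REOCR_SCHOOL=<slug>"])` would run one school-scoped batch, assuming the project and region are already set in the gcloud configuration.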

Data-consumer audit (step 8) — fixes required

Ran the §12 step-8 audit (3 read-only auditors over reports / hub+index / watchlist+area4+consistency+handbook). Root cause: the migrated OCR'd note body (profile_entry.detail) was rendered as only summary_line by several surfaces. report_child + the watchlist keyword builder already read detail correctly (audited, no change).

Patched (committed dd1c2d4; offline — needs a hub/care-jobs redeploy to go live):

to the LLM (capped so a long SE note can't bloat the aggregate prompt).

Still required (punch-list; most need a redeploy):

| # | Fix | File(s) |
|---|---|---|
| A1 | Live hub child page: show detail in the journal; resolve source_ref → source_doc (title+Drive link) instead of the bare ref+raw_link | app/main.py child route + app/templates/child.html:72,74 (care-dashboard, not app/hub.py) |
| A2 | Surface benefit.receipt_path in report_child benefits table, the hub child benefits, and the area4/IATI emit (handbook §4 already shows it) | report_child.py:350, child.html:52, benefit_report.py |
| A3 | Funding roll-up: add funding_source + petal to the register generator; have area4-emit also read register_entry kind='funding' (cohort/Track-K funding is under-counted reading benefit only) | so_tay._fetch_benefits_for, benefit_report.py:79, report_school.py/report_cohort.py funding sections |
| A4 | Read index to include v2 fields (student_code as a lookup key, exit_reason, graduation_date, primary_carer) | beneficiary_index_rebuild.py:43-53 |
| A5 | consistency-check v2 invariants: a dangling-source_ref integrity check (not a presence check; pre-v2 facts legitimately lack it), "every source_doc has a file", "every filed original ↔ kind=student"; also update it off the OLD Ho-so/<jid>.md bucket path to the v2 Ho-so/<mã> — <Tên>/ | beneficiary_consistency_check.py |
| A6 | (low) register_entry has no live hub view; report_child transcript lacks a source-stamp column; roster list shows frozen class_label; watchlist dead profiles_root param | various |

Full detail: memory project_data-consumer-audit-2026-06-13. Recommended: bundle A1–A5 + the journey-0028 Drive rename + the deferred detail-parity go-live into a single hub/care-jobs redeploy.

12.2 Session 3 — Track B (file per-student originals) DONE (2026-06-13)

All 6 schools' CETB 2025-26 per-student essay originals filed (Track B / G2, rule 3 a/b/c). source_doc kind=student 1 → 138 (dai-an 27, tam-thanh 29, tan-khanh 23, vinh-hao 28, luong-the-vinh 12, nguyen-binh 18 + 1 baseline); child 211 / profile_entry 542 / school_report 293 unchanged. Each child now has its raw original in Ho-so/<HS…>/Tai-lieu-goc/ + a kind=student catalog row (the source-of-truth + handbook §6 layer §1.1 said was absent). Ran on Cloud Run (file-student-docs per school, --school school-scoped) under backup + quiesce. Backup care/_backups/20260613-pre-trackB/.

Two fixes shipped (pushed):

ledger marked done, but the Track-C OCR pass records each essay as op='reocr' 'written' (→ 'done'); without the fix Track B would file nothing. Now it skips only on a prior FILING (op='copy-source').

of duplicating, so an interrupted/clobbered filing re-runs clean.
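The op-aware skip can be illustrated with a ledger lookup that only treats a prior filing as "already done". A sketch under assumptions: the row shape and the set of results that count as success are hypothetical; the two op values ('reocr', 'copy-source') are the ones named above.

```python
FILING_OP = "copy-source"
SUCCESS_RESULTS = {"written", "done"}  # assumed success markers in the ledger


def already_filed(ledger_rows: list, file_id: str) -> bool:
    """True only if this file has a successful FILING row.

    An OCR row (op='reocr') for the same file must not cause a skip,
    otherwise every essay transcribed by Track C would never be filed.
    """
    return any(
        row["file_id"] == file_id
        and row["op"] == FILING_OP
        and row["result"] in SUCCESS_RESULTS
        for row in ledger_rows
    )
```

Keying the skip on (file, op) rather than on the file alone is what lets two different passes over the same source file coexist in one ledger.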

Op-lesson (cost a clobber): the first run restored the hub in the same breath as the last job (Đại An) → a hub instance raced gcsfuse and rolled back Đại An's 27 source_doc rows (Drive copies survived). Rule: after a care-DB job, settle + re-download + VERIFY counts persisted, THEN restore the hub. Recovered by re-running Đại An + verify-before-restore. (memory project_trackB-in-progress-resume.)
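The verify-before-restore rule amounts to comparing row counts in the re-downloaded DB against what the job should have produced, and restoring the hub only on an exact match. A minimal sketch using sqlite3; the function names are hypothetical, and the table names are passed in by the caller (they are interpolated into SQL, so they must come from trusted code, not user input).

```python
import sqlite3


def count_mismatches(conn: sqlite3.Connection, expected: dict) -> dict:
    """expected: table -> row count. Returns table -> (expected, actual) on drift."""
    mismatches = {}
    for table, want in expected.items():
        got = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        if got != want:
            mismatches[table] = (want, got)
    return mismatches


def safe_to_restore_hub(conn: sqlite3.Connection, expected: dict) -> bool:
    """Only restore the hub once every expected count has persisted."""
    return not count_mismatches(conn, expected)
```

In the Đại An incident above, a check like this run against the re-downloaded DB would have shown source_doc short by 27 rows before the hub was brought back.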

Outstanding (next sessions), in suggested order

care_benefit_extract + filed all 6 schools' funding agreements live.

the shipped split-name/header-offset parser (dump → set cols → propose → apply); codes HVK<k>, block 8xxx, programme nâng bước tương lai.

journey-0028 Drive folder/Doc rename + the deferred detail-parity go-live (commit dd1c2d4) + the ~27 Đại An duplicate-orphan PDF cleanup (delete Tai-lieu-goc files not in any source_doc.drive_url; needs Cloud-Run/SA Drive write).

refactor + bucket recode + hub redeploy, hub exercised after. Its own session.

138 filed originals + everyone), not per-track.

--file-id --journey, per-file confirm), the SE forms (Cảm xúc xã hội subfolders), and CETB 2024-25 / 2023-24 essays = later file-student-docs passes.

After the data lands: re-run source-coverage + consistency-check + the data-consumer sufficiency check (regenerate a sample of each surface).

12.4 Session 5 — data-consumer audit fixes A1–A5 (code) + Track D findings (2026-06-13)

Audit fixes A1–A5 SHIPPED (code on main; 687 tests green; A6 deferred low-pri). They go live on the next hub/care-jobs redeploy (+ a live schema migration).

amount_vnd/petal/funding_source (ensure_register_columns); care_benefit_extract writes the funding total; benefit_report (Area-4), report_school, report_cohort, and the so_tay school/cohort registers all sum register_entry kind='funding' so the 1.325 tỷ Track-K scholarship funding surfaces (the per-student coverage rows are amount_vnd=NULL by design, so before this they rolled up as 0đ).

child benefits. A1 hub child page renders the full detail body + resolves source_ref→source_doc (title + Drive link) + receipt link. A4 read-index carries student_code/exit_reason/graduation_date/primary_carer (findable by code). A5 consistency-check gains a dangling-source_ref integrity check (the file-existence + v2-path checks need Drive/bucket access → deferred to the care-intake job + Track G).

Live steps still owed for A1–A5 (bundle into the redeploy): run ensure_register_columns on the live DB (additive migration) + populate the 6 existing Track-K funding rows' amount_vnd (re-run care_benefit_extract apply over the 6 reviewed plan JSONs — idempotent; it now backfills amount_vnd on the existing rows) + deploy hub/care-jobs.

Track D (Hoa Sữa NBTL) — prepped, NOT loaded; needs a parser fix + a decision. Structures mapped (read-only): K3 clean (hdr row 0, name-col 1, sheet Danh sách, 8 trainees); K1 hdr row 7, split name (name-col 4 + given-col 5), sheet Danh sách toàn khóa, Khóa 1 (2023); K2 hdr row 3, split name (name-col 2 + given-col 3), sheet Danh sách HS, Khóa 2 (2023); K4 hdr row 7, split name (name-col 5 + given-col 6), 3 trade sheets Á/Bánh/Bàn, Khóa ~4 (2024). Per Khóa: code HVK<k><NNN>, id-block 8<k>00–8<k>99, programme nang-buoc-tuong-lai, school hoa-sua (Hà Nội). Blocker found: propose over-captures trailing footer rows as students (e.g. Người lập biểu = "prepared by") — care_roster needs a stop-at-footer / require-numeric-STT guard before any load. Decisions: trainees are adult, past, completed/dropped 2023–24 cohorts — load status derived per-row from Ghi chú (Đã hoàn thành→graduated/da-ket-thuc; Bỏ học/Nghỉ học→withdrawn; else active), which needs a small status-parse in care_roster. ~81 trainees total.
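The per-row status rule stated above (Đã hoàn thành → graduated; Bỏ học/Nghỉ học → withdrawn; else active) could be sketched as below. The function name is hypothetical and this is not the care_roster implementation; Unicode normalisation is added because spreadsheet text may arrive in either composed or decomposed form.

```python
import unicodedata


def _norm(text: str) -> str:
    """Case-fold and compose so Vietnamese markers compare reliably."""
    return unicodedata.normalize("NFC", text.casefold())


GRADUATED_MARKERS = [_norm("Đã hoàn thành")]
WITHDRAWN_MARKERS = [_norm("Bỏ học"), _norm("Nghỉ học")]


def status_from_ghi_chu(ghi_chu) -> str:
    """Derive the load status from the Ghi chú cell; default to active."""
    text = _norm(ghi_chu or "")
    if any(marker in text for marker in GRADUATED_MARKERS):
        return "graduated"
    if any(marker in text for marker in WITHDRAWN_MARKERS):
        return "withdrawn"
    return "active"
```

An empty or missing Ghi chú falls through to active, which matches the "else active" rule.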

12.5 Session 6 — Track K is MULTI-YEAR; 2023-24 + 2024-25 loaded live (2026-06-13)

Track K (which had done only 2025-26) is multi-year: each CETB <năm> source folder has a Tài trợ <năm>/ subfolder of BB agreements + a school-totals summary sheet (no per-student amounts → coverage-row model holds). Loaded live the 7 prior agreements (5× 2024-25: Nguyễn Bính, Vĩnh Hào, LTV, Tam Thanh, Đại An; 2× 2023-24: LTV, Tam Thanh) via the same idempotent N11 pipeline + the A3 amount_vnd write. benefit 343→495 (+152 coverage), register kind=funding 12→19 (+7, all with amount_vnd), source_doc +7, source_file +7; child/profile_entry/school_report/ kind=student unchanged; FK-clean; backup _backups/20260613-pre-trackK-prioryears/. Scholarship coverage by year now 2023: 44, 2024: 108, 2025: 181.

matched no same-school child but resolved to an existing journey whole-DB by exact name — almost the whole Tam Thanh 2023 founding cohort (journey-0013–0023) was funded at Tam Thanh in 23-24/24-25 and is now filed under LTV (moved THCS→THPT). Their coverage is correctly attached across years (e.g. Vũ Khánh Vân scholarship = [2023,2024,2025]; Mai Tiến Vũ = [2024,2025]). 1 OCR typo (Trần Thanhh→Thanh Ngoan, coordinator-confirmed → journey-7002).

150M but Điều 3 itemises 137.4M (doc's own discrepancy — kept the committed 150M); Tam Thanh 23-24 states no per-semester amounts (installments null); LTV 23-24 has a blank date (→ 2023-09-01 fallback). Verified by reading the PDFs.

of 24-25, Đại An lớp 9; cohorts we never loaded) → reported, not created (coordinator's call): _system/runs/trackK-prioryear-absent-2026-06-13.md. The funding totals still count them. This is the wider gap: past CETB cohorts (23-24/ 24-25 graduates) are not in the care store; only current cohorts are.

12.6 Session 7 — Track D (Hoa Sữa NBTL) parser enhanced + 82 trainees loaded live (2026-06-13)

Parser (committed): care_roster gained (1) a footer-row guard (detect_stt_col + numeric-STT requirement — drops institutional signature lines like "Người lập biểu" that were polluting the list), (2) status-from-Ghi-chú (row_status scans the row text → graduated/withdrawn/paused/active, robust to column position), and (3) rich field capture (detect_fields + RosterStudent): DOB (ISO-normalised), gender, address, dân tộc, phone, hoàn cảnh, trình độ, parents — written to child columns + a searchable source='roster' intake note (vs the prior name/class-only thin records). open_enrollment gained additive trang_thai/ket_thuc/ly_do_roi. 687 tests green; verified live-read on K1/K3/K4.
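The footer-row guard can be pictured as a filter that keeps only rows whose STT (ordinal) cell is a positive whole number. This is a sketch, not the shipped detect_stt_col logic: the helper names are hypothetical and the STT column index is assumed to be known already.

```python
def is_numeric_stt(cell) -> bool:
    """True for 1, 1.0, '1', '1.' and similar; False for text, blanks, NaN."""
    if cell is None or isinstance(cell, bool):
        return False
    if isinstance(cell, (int, float)):
        return float(cell).is_integer() and cell > 0
    text = str(cell).strip().rstrip(".")
    return text.isdecimal() and int(text) > 0


def keep_student_rows(rows: list, stt_col: int) -> list:
    """Drop letterhead, footer and signature rows that carry no ordinal."""
    kept = []
    for row in rows:
        cell = row[stt_col] if stt_col < len(row) else None
        if is_numeric_stt(cell):
            kept.append(row)
    return kept
```

A signature line such as "Người lập biểu" has text (or nothing) in the STT column, so it is dropped without needing a list of known footer phrases.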

Loaded live (82 trainees): K1 34 (HVK1), K2 21 (HVK2 — 1 deduped, same person across Khóa), K3 8 (HVK3), K4 19 (HVK4, across the 3 trade sheets Á/Bánh/Bàn). School hoa-sua ("Trường Trung cấp Kinh tế - Du lịch Hoa Sữa", Hà Nội), programme nâng bước tương lai, id-block 8100–8499, cohort years K1/K2=2023, K3/K4=2024. child 211→293; FK-clean. Status (from Ghi chú): 52 active, 23 graduated, 7 withdrawn + matching NBTL enrollments. Rich fill: DOB/address 71/82, parents on K4. 71 intake notes. Backup care/_backups/20260613-pre-trackD-hoasua/. Supervised local-apply→upload (DB + 82 profile .md), verify-before-restore, post-restore re-pull confirmed no clobber. Handbook Docs/folders come with the deferred handbook-refresh.

12.7 Session 7 cont. — N8 detector + Track E fixes + the "copy originals to Workspace" directive (2026-06-13)

Shipped (committed): N8 cross-school duplicate/transfer detector in beneficiary_consistency_check (the Mai Tiến Vũ pattern as a standing weekly check; name + matching DOB across schools → flag; 0 on the live DB). Track E fixes to relocate-school-first: _norm_nam_hoc strips a programme prefix ("CETB 2023-2024" → "2023-2024") + the resolver gains a display-name acronym ("THPT LTV" → thpt-luong-the-vinh). Track E dry-run = 79 moves, target verified safe (the Foundation Shared-Drive Tai-lieu-tong-hop 1Of8xB, NOT the lnquang personal source [nguồn lnquang] owned by lnquang2016@gmail.com).
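The school-year normalisation mentioned above ("CETB 2023-2024" → "2023-2024") can be sketched with a regular expression. This is not the shipped _norm_nam_hoc: the function name differs, and the expansion of a two-digit end year ("2025-26") and the consecutive-year check are assumptions added for illustration.

```python
import re
from typing import Optional

# A four-digit start year, a hyphen or en dash, then a 2- or 4-digit end year
# at the end of the label. Anything before the start year is ignored.
_YEAR_SPAN = re.compile(r"(\d{4})\s*[-–]\s*(\d{2}|\d{4})\s*$")


def norm_nam_hoc(label: str) -> Optional[str]:
    """Return 'YYYY-YYYY' for a school-year label, or None if none is found."""
    match = _YEAR_SPAN.search(label.strip())
    if match is None:
        return None
    start = int(match.group(1))
    end_raw = match.group(2)
    end = int(end_raw) if len(end_raw) == 4 else (start // 100) * 100 + int(end_raw)
    if end != start + 1:
        return None  # not a plausible single school year
    return f"{start}-{end}"
```

Returning None instead of guessing keeps an unparseable folder name out of the Truong/<trường>/<năm-học>/ tree and leaves it for manual review.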

Directive (administrator): copy the original files from lnquang's folders into the HMT Workspace at the relevant places (care-data.md rule 1 — the Foundation owns its readable copy; the workspace copy becomes the operating source of truth). Scoped against current state:

(1Of8xB, verified a real Foundation-owned copy, ids differ from lnquang); per-student essays 2025-26 → Ho-so/<mã>/Tai-lieu-goc/ (Track B, 138).

xã hội", ~180), per-student essays 2023-24/2024-25, the Track-B no-match/typo essays; then reorganize consolidated → Truong/<school>/<năm-học> (Track E apply). lnquang source per-school shape: <school>/ = loose consolidated sheets + a Bài viết/ folder (per-student essays) + a Cảm xúc xã hội/ folder (per-student SE). No separate per-student học-bạ scans exist (academic data is in consolidated KQ sheets), so essays + SE are the per-student originals to copy.

Bài viết/Cảm xúc xã hội folder (copies original → Tai-lieu-goc + bucket .txt + source_doc kind=student); relocate-school-first --apply for the reorg. All Cloud Run Drive-writes (local Drive-write unavailable) + DB catalog writes (backup/quiesce) + per-file namesake confirms → a focused supervised session.

12.3 Session 4 — Track K (school funding agreements) DONE (2026-06-13)

Built N11 care_benefit_extract (engine seam Gemini-default + school-scoped matcher + idempotent writes; selftest + 6 pytest tests; 687 suite green; commit on main) and filed all 6 schools' "Thỏa thuận tài trợ" (BB tài trợ) live.

the source "Tài trợ 2025-2026" folder (NOT the rosters; ids in source_file). Each is a school-level agreement: a school total + HK1/HK2 installments + a phụ lục of funded students, NO per-student amount.

register_entry kind='funding' (total + installments + funder + source_doc); each covered student gets a coverage benefit (category=scholarship, petal=1, amount_vnd=NULL: no fabricated split, claims.md; amount_inkind=coverage text; receipt_path=the BB doc). BB doc cataloged once as source_doc kind=consolidated doc_type=bien-ban-tai-tro.

every school's HK1+HK2 == total. 6 schools, 1.325.000.000đ, 181 funded.

each with an exact-name candidate) + 1 no-match resolved per coordinator confirm (the 6 → journey-0027/0028, 2012/2032, 0023/1023, matching the N12 essay mappings; Mai Tiến Vũ (LTV) left unmatched — funded but no care profile).

care/_backups/20260613-pre-trackK-benefits/ → hub min=0 + 6 schedulers paused → re-pull → apply locally → upload DB + 180 ledger md → settle + verify before restore → hub min=1 + resume. Counts: benefit 162→342 (+180), register_entry kind=funding 6→12 (+6), source_doc 228→234 (+6); child 211 / profile_entry 542 / school_report 293 / kind=student 138 unchanged. FK-clean, post-restore re-pull confirmed no clobber.

the deferred handbook-refresh (after K/D) + the audit-A3 redeploy bundle.
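The two checks described in this section (installments sum to the school total; coverage rows carry no per-student amount) can be expressed as a validation pass over a reviewed plan before apply. The plan layout and field names below are hypothetical; the plan JSON format used by care_benefit_extract is not shown in this document.

```python
def check_agreement_plan(plan: dict) -> list:
    """Return a list of problems; an empty list means the plan may be applied."""
    problems = []
    installments = plan.get("installments")
    # Some agreements state no per-semester amounts; skip the sum check then.
    if installments:
        if sum(installments.values()) != plan["total_vnd"]:
            problems.append("installments do not sum to the stated total")
    for row in plan.get("coverage", []):
        # Agreements give a school total only, so a per-student amount
        # would be a fabricated split.
        if row.get("amount_vnd") is not None:
            problems.append(f"{row['student_code']}: amount_vnd must stay NULL")
    return problems
```

A mismatch is reported rather than corrected, since a discrepancy may be in the source document itself and needs a human decision on which figure to keep.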

Mai Tiến Vũ — cross-school transfer, NOT a missing student — ✅ RESOLVED (live). The one LTV funding line left unmatched was an existing beneficiary: journey-2002 / HS20240022, entered cohort 2024 at THCS Vĩnh Hào (home commune Vĩnh Hào; mồ côi cha), who moved up to THPT Lương Thế Vinh, lớp 10A6 in 2025-26. Root cause (verified): he is the only genuine LTV-roster name absent from the LTV DB (whole-DB diacritic-insensitive scan = exactly 1 record; survey DB = 1 candidate ung-vien-2002 → admitted to journey-2002 → no duplicate, no orphan LTV profile). Our loaders are school-scoped + per-cohort with no promotion/transfer step: when LTV 2025-26 data loaded, "Mai Tiến Vũ" matched nothing in the LTV pool and — because he already existed under Vĩnh Hào — no cross-school link was attempted and no LTV record was created. His 2024-25 Vĩnh Hào học bạ/notes are correct history; only his current school field was stale. Fix applied live (backup _backups/20260613-pre-mtv-move/ → quiesce → verify → restore): child.school_slug Vĩnh Hào→Lương Thế Vinh, class_label 9A→10A6 (cohort_year 2024 + student_code frozen), a coordinator transition profile_entry note, and the LTV scholarship coverage benefit attached (LTV 29→30; benefit 342→343, profile_entry 542→543). Lessons: (1) a school-scoped "no-match" may be a transfer, not an absence — check the whole-DB name first; (2) systemic gap to fix in standing intake / consistency-check (N8): detect "a new-roster name matching an existing child in another school at an adjacent grade-level" → flag as a probable promotion for coordinator confirm, so cross-school progressions are caught automatically; (3) school_slug is a single current-school field — an all-years enrolment history (per-year school) would model transfers without losing the prior-school record.
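Lessons (1) and (2) suggest a whole-DB, diacritic-insensitive name check whenever a school-scoped match finds nothing. A minimal sketch of that check, with hypothetical function names and record shape; it matches on name only, whereas the standing detector would also compare DOB or grade adjacency before flagging.

```python
import unicodedata


def fold_name(name: str) -> str:
    """Strip Vietnamese diacritics, case and extra spaces for comparison."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Đ/đ are distinct letters, not base + combining mark, so map them by hand.
    stripped = stripped.replace("Đ", "D").replace("đ", "d")
    return " ".join(stripped.casefold().split())


def probable_transfers(roster_names: list, roster_school: str, children: list) -> list:
    """Flag roster names with no same-school match but a match elsewhere.

    children: dicts with 'name', 'school_slug', 'student_code'.
    Returns (roster_name, student_code, existing_school) for coordinator confirm.
    """
    by_name = {}
    for child in children:
        by_name.setdefault(fold_name(child["name"]), []).append(child)
    flagged = []
    for name in roster_names:
        matches = by_name.get(fold_name(name), [])
        if any(c["school_slug"] == roster_school for c in matches):
            continue  # ordinary same-school match, nothing to flag
        for c in matches:
            flagged.append((name, c["student_code"], c["school_slug"]))
    return flagged
```

The output is a list of candidates, never an automatic move: namesakes across schools are real, so each flag still needs coordinator confirmation.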

Track D prep status (read-only done; load pending go-ahead)

The 4 Hoa Sữa NBTL roster files are located (K1–K4 ids in the run log) and the care_roster parser (split-name + header-offset + --name-col/--given-col/--header-row/--sheet overrides) is shipped. Per-Khóa convention confirmed: codes HVK<k><NNN>, id-block 8<k>00–8<k>99, programme nang-buoc-tuong-lai, province Lào Cai/Sa Pa. K3 is a clean roster (care_roster direct); K1/K2/K4 carry an institutional letterhead (header offset) and K2 splits the name across columns (needs the overrides). Next: --mode dump each (needs --db), set per-file overrides, --mode propose → review NEW vs EXISTS → --mode apply under backup→quiesce→verify. ~81 trainees, new code block 8xxx, all under Nâng bước tương lai.

12.8 Session 8 — COPY-ORIGINALS DIRECTIVE COMPLETE (Step 1) (2026-06-13)

The standing "Foundation must own its readable copies" directive (care-data rule 1) is done. All Step-1 sub-items executed live + verified.

the Cloud Run Drive-write jobs carry the latest tooling, and re-pointed care-migrate-run to it. Added a --doc-type override to file-student-docs (commit eec8ee4, +test) so the SE batch catalogs as se.

~180): Tam Thanh (30) + Đại An (30). Filed live via file-student-docs --doc-type se under backup→quiesce→verify→restore (_backups/20260613-pre-se-copy/): source_doc kind=student 138→198, copy-source written 137→197, doc_type=se 2→62; child 293 unchanged; no clobber. 1 namesake ("Nguyễn An Minh Châu 6B"→journey-0028) forced + coordinator-confirmed.

folders (loose consolidated sheets only). Lone candidate LTV 24-25 "PHIẾU HỌC TẬP" (27, given-name-only filenames) deferred as unreliable-match.

(bai-viet), each coordinator-confirmed (pattern: filename drops middle "Thị" or a 1-char tone typo; class = current-year vs frozen-entry). kind=student 198→207, copy-source written 197→206; held 19→10 (_backups/20260613-pre-essays/). The remaining 10 held are all consolidated docs (DAY_DU / Tong_hop / Chấm điểm / Thưởng / KQ), zero per-student.

no DB write) moved 80 consolidated docs <năm>/<trường>/ → Truong/<school>/<năm-học>/; verify re-run = 0 moves left + 6 correctly-unresolved (Hoa sữa ×4 + _ placeholder ×2).

Pre-existing (not introduced here): 11 id-outside-allocation flags = the Tam Thanh-2023→LTV transfer cohort (journey-0013–0023 keep their origin-school id); future consistency-check refinement should exempt transferred children.

Live state after Step 1: child 293, benefit 495, register kind=funding 19, source_doc 310 (kind=student 207), copy-source written 206. Discipline held throughout: backup → quiesce (hub+dashboard min=0 + 6 schedulers paused) → apply → settle → verify → restore (never in the same breath); per-file namesake confirms; job left inert (HMT_MIGRATE_APPLY=0).

12.9 Session 8 cont. — Step 2 bundled redeploy: A1–A5 LIVE (2026-06-13)

funding_source (ensure_register_columns = no-op). Re-ran care_benefit_extract apply --apply over the 6 25-26 plan JSONs (local-apply→upload under backup→quiesce→verify, _backups/20260613-pre-funding-backfill/): the 6 tài trợ 2025-26 rows got amount_vnd (210/210/200/215/240/250M); funding non-null 7→13; benefit 495 / source_doc 310 / kind=student 207 all unchanged. Still NULL (separate follow-up): the 6 Tổng giải thưởng tổng kết prize register rows (their amount is in detail; need their own backfill — out of the 6-funding-JSON scope).

management-hub) + care-dashboard rev 00018-96n (image build + run deploy --image). Both healthy (hub /login 200, / 303, no 500s; the schema columns pre-existed so no "no such column"). The hub deploy also shipped the parallel session's committed UI (693180b: survey restructure + interactive finance charts).

Giá trị / Nguồn / Biên nhận with the BB receipt drive link + coverage-row text → A2 receipt live. (A1 journal detail+provenance is unit-tested + deployed.)

correct; Drive cosmetic — needs a small Cloud Run folder-rename care_migrate action, none exists yet; must precede handbook-refresh or a duplicate folder is created) + the ~27 Đại An orphan-PDF cleanup (delete Tai-lieu-goc files not in any source_doc.drive_url — needs a Cloud Run delete step).

12.10 Session 8 cont. — Step 2d Drive cleanup + Step 3 regen (2026-06-13)

(rename-student-folder, prune-orphan-docs, trash-file + rename_file/trash_file on the Drive seam; commits 7282dbf, d61deb7; +tests). Findings:

session's forced SE/essay filings created the correctly-named folder while the old-name folder (with the handbook) lingered. Merged per administrator ("only keep Nguyễn An Minh Châu"): moved the OCR essay Doc into the correct folder (relocate-one), trashed the old-name folder + its regenerable handbook (trash-file). Now a single folder; no data lost.

PDFs from a pre-idempotent re-run; recoverable trash; re-verify = 0 left).

hard lesson: the first run aborted at 88/293 because a concurrent external writer replaced the DB object mid-batch (2 generations in 2 min; care-dashboard was NOT quiesced that window + a parallel session was active). DB stayed intact (handbook-refresh is Drive-write only). Re-ran under a full quiesce (hub + dashboard + 6 schedulers) + administrator-confirmed the other session idle + DB-generation-stable precheck → clean 293, DB object generation unchanged throughout (proof the care jobs don't touch the DB when nothing else writes). index-rebuild wrote 14 Chi-muc/*.md indexes (A4 v2 fields); consistency-check ran (N8/A5 clean per §1e; remaining drift is pre-existing: Hoa Sữa malformed names + 11 transfer id-allocation flags). Backups _backups/20260613-pre-handbook-refresh/ + …-pre-handbook-rerun/. OPERATIONAL LESSON: a care job that only opens the DB (even read-only) is unsafe while ANY other writer (hub, dashboard, scheduler, OR a parallel session) can touch the gcsfuse-mounted DB — quiesce ALL of them + confirm no parallel session + verify DB-generation stability before a long batch.
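The DB-generation-stable precheck in the operational lesson above can be sketched as a guard around the batch loop. This is an illustration under assumptions: `run_guarded_batch` and `GenerationChanged` are hypothetical names, and `get_generation` is a caller-supplied callable standing in for however the job reads the DB object's generation (the lookup itself is not shown).

```python
class GenerationChanged(RuntimeError):
    """Raised when the DB object is replaced while a batch is running."""


def run_guarded_batch(items, process, get_generation):
    """Process items one by one, aborting as soon as the DB generation moves.

    get_generation: zero-argument callable returning the current generation
    of the DB object (assumed to be cheap enough to call per item).
    """
    baseline = get_generation()
    done = 0
    for item in items:
        current = get_generation()
        if current != baseline:
            raise GenerationChanged(
                f"DB generation moved {baseline} -> {current} after {done} items"
            )
        process(item)
        done += 1
    # Final check: a writer may have landed during the last item.
    if get_generation() != baseline:
        raise GenerationChanged(f"DB generation moved after the final item ({done} done)")
    return done


if __name__ == "__main__":
    # Stable generation: the whole batch runs.
    print(run_guarded_batch(range(5), lambda item: None, lambda: 7))
```

The guard detects a concurrent writer, it does not prevent one; the full quiesce remains the actual protection.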

Still outstanding: Track D Hoa Sữa data-quality (journey-8415+ malformed real_name = DOB concatenated, missing v2 frontmatter — re-parse needed); 6 prize register rows still amount_vnd NULL (separate backfill); Step 5 Track G re-key (own session); Step 6 tidy (§12 resequence + review-hub republish).

12.11 Session 9 — Step 4: care-mail email intake lane + handbook job (2026-06-13)

Step 4 DONE (code + live, conservative scheduling). The email intake lane (§7 channel 3) + a dedicated standing handbook-refresh job.

one --once pass reads a care-documents Gmail label read-only (admin@-minted OAuth token, the SAME secret finance-mail uses), saves each .eml + .yaml evidence to the Shared Drive human band 2_Hoc-sinh/Tai-lieu-vao-email/<year>/<year-month>/, and drops attachments into the care intake inbox Tai-lieu-vao/<YYYY-Www>/_email/<stem>/ (on the SAME care bucket the care-intake-convert sweep reads). Attachments land with no journey-id folder → the sweep flags them in _needs-review.md (never silently routed). Mailbox + Drive seams + Fakes, dedup ledger, offline selftest, 13 tests. _safe_name rewrites a leading _ or . so the sweep can't skip a doc. INERT-safe: with no Gmail creds --once no-ops + exits 0 (quiet cron until provisioned). Also adds a cloud/care_mail/ Dockerfile + requirements (own image, Gmail-OAuth deps).

~26h --since cutoff (automation TZ, so the string compare orders right), so a standing run is short and the gcsfuse DB-clobber window stays tiny. A FULL 293-doc refresh stays on-demand under quiesce (the concurrency lesson).
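The reason a plain string comparison is enough for the --since cutoff can be shown in a few lines. This sketch assumes that last_updated is stored as a fixed-width `YYYY-MM-DD HH:MM:SS` string in the same timezone the job runs in; the format and both helper names are assumptions for illustration.

```python
from datetime import datetime, timedelta

STAMP = "%Y-%m-%d %H:%M:%S"  # assumed storage format: fixed width, most significant field first


def since_cutoff(now: datetime, hours: int = 26) -> str:
    """Cutoff string for an incremental run looking back `hours` hours."""
    return (now - timedelta(hours=hours)).strftime(STAMP)


def changed_since(rows, cutoff: str):
    """Keep rows whose last_updated sorts at or after the cutoff.

    Lexicographic order equals chronological order only because every
    timestamp has the same width and the same timezone.
    """
    return [row for row in rows if row["last_updated"] >= cutoff]


now = datetime(2026, 6, 14, 3, 0, 0)
cutoff = since_cutoff(now)
rows = [
    {"student_code": "HS20240022", "last_updated": "2026-06-13 09:15:00"},
    {"student_code": "HS20240001", "last_updated": "2026-06-11 22:40:00"},
]
print(cutoff, [row["student_code"] for row in changed_since(rows, cutoff)])
```

With a daily schedule, a 26-hour window overlaps the previous run by two hours, so a record updated just before the last run started is still picked up.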

refresh to the shared-image jobs (drive id + incremental + 3600s); deploys the care-mail Job (own image, Gmail secrets conditional → deploys inert if absent); defines both schedulers (handbook-refresh daily 03:00, care-mail hourly). Quiesce note bumped to reflect the new DB-opening job.

execute, nothing opened the care DB): rebuilt care-jobs (carries the entrypoint change) + built care-mail; deployed care-handbook-refresh + care-mail Jobs. No schedulers registered this run (administrator chose the conservative posture: handbook-refresh stays on-demand; care-mail stays unscheduled until its mailbox exists; both schedules remain defined in deploy.ps1 for the eventual full-provisioning run). care-mail smoke-test: fetched=0 … inbox=None, exit 0; the lane is wired + inert.

Gmail label + forwarding filter on the admin@ mailbox; then register care-mail-trigger (hourly). care-intake-convert schedulers already live.

12.12 Session 9 — Step 6 data fixes: prize amounts + Hoa Sữa names (2026-06-14)

Step 6 (a)+(b) DONE + verified live (one backup→quiesce→apply→verify→restore window; the local-apply→upload pattern, no deploy).

The live DB held 144 individual kind=prize rows (each with a parseable thưởng Nđ) + 6 kind=funding "Tổng giải thưởng tổng kết 25-26" pool rows (one per school), all amount_vnd NULL. Backfilled all 150 from detail: individuals each their own amount; pools their doc total. register funding rows w/ amount 13→19; 144 prize rows now carry amounts. Integrity check: 5/6 schools' individual sum == pool total exactly; thcs-tam-thanh is 300,000đ / 1 prize short (pool 5,500,000đ / 24 prizes vs 23 individual rows summing 5,200,000đ): a faithful extraction gap (one 300k prize never captured as an individual row), NOT fabricated → coordinator follow-up (find the 24th Tam Thanh prize). Total prize pool recorded = 33,000,000đ.
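The backfill and its integrity check can be sketched as two small functions: one pulls the amount out of a detail string, the other reconciles a school's pool row against its individual rows. The exact wording of the detail text is not shown in this plan, so the `thưởng <digits>đ` pattern with `.` or `,` thousands separators is an assumption, as are both function names.

```python
import re
import unicodedata

# Assumed detail shape: "... thưởng 300.000đ ..." (separator may be "." or ",").
AMOUNT_RE = re.compile(r"thưởng\s+([\d.,]+)\s*đ")


def parse_prize_vnd(detail: str):
    """Return the prize amount in VND, or None when no amount is parseable."""
    # Normalise to NFC so composed and decomposed Vietnamese text match alike.
    match = AMOUNT_RE.search(unicodedata.normalize("NFC", detail))
    if not match:
        return None
    digits = re.sub(r"[.,]", "", match.group(1))
    return int(digits) if digits else None


def reconcile(pool_total, pool_count, individual_sum, individual_count):
    """Return (missing VND, missing prize count); (0, 0) means the school balances."""
    return pool_total - individual_sum, pool_count - individual_count


print(parse_prize_vnd("Giải nhì, thưởng 300.000đ"))   # hypothetical detail text
print(reconcile(5_500_000, 24, 5_200_000, 23))         # the thcs-tam-thanh figures above
```

For thcs-tam-thanh the reconciliation yields a gap of 300,000đ and one prize, which is the single uncaptured row the coordinator follow-up is looking for.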

had real_name = given-name fragment + DOB concatenated (e.g. 'Phinh 2009-01-01 00:00:00') with date_of_birth NULL. Root cause: the K4 file's 3 roster sheets have different header rows + a column offset, so the original parser kept only the given-name half and leaked the DOB. Fix: re-read the K4 roster via the SA (Drive read), reconstructed each full name (middle-name (đệm) col + given col) + DOB, cross-checked given+DOB against the DB fragment (10/10 unique matches), and updated real_name + date_of_birth + last_updated. e.g. 'Hùng 2008-12-29' → Vàng A Hùng (2008-12-29). 0 malformed HVK4 left.
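The repair step can be sketched as: split the malformed real_name into given name and DOB, then accept a roster row only when given name + DOB match exactly one candidate. The regex, the function names and the roster row layout (`middle`, `given`, `dob` keys) are assumptions for illustration; the two sample values are the ones quoted in this section.

```python
import re

# Matches "<given> YYYY-MM-DD" with an optional trailing " HH:MM:SS".
MALFORMED_RE = re.compile(
    r"^(?P<given>.+?)\s+(?P<dob>\d{4}-\d{2}-\d{2})(?:\s+\d{2}:\d{2}:\d{2})?$"
)


def split_malformed(real_name: str):
    """Return (given, dob) for a leaked-DOB name, or None if the name looks clean."""
    match = MALFORMED_RE.match(real_name.strip())
    return (match["given"], match["dob"]) if match else None


def repair(db_fragment: str, roster_rows):
    """Rebuild real_name + date_of_birth, but only on a unique given+DOB match."""
    parsed = split_malformed(db_fragment)
    if parsed is None:
        return None
    given, dob = parsed
    matches = [r for r in roster_rows if r["given"] == given and r["dob"] == dob]
    if len(matches) != 1:
        return None  # missing or ambiguous: leave for manual review
    row = matches[0]
    return {"real_name": f"{row['middle']} {row['given']}", "date_of_birth": dob}


roster = [{"middle": "Vàng A", "given": "Hùng", "dob": "2008-12-29"}]
print(split_malformed("Phinh 2009-01-01 00:00:00"))
print(repair("Hùng 2008-12-29", roster))
```

Refusing anything other than exactly one match is what makes the cross-check meaningful: a fragment with zero or several candidates stays untouched.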

HMT_HANDBOOK_SINCE pinned just below the apply timestamp → regenerated exactly the 10 corrected handbooks (then removed the override to restore the incremental default). The job opened the gcsfuse DB read-only and flushed it back on exit (new generation, content-identical — re-download verified all changes held).

prize-w/-amount 144, malformed HVK4 0, integrity ok. Backup care/_backups/20260614-pre-step6/. DB generation stable post-restore; hub /login 200.

12.7 chronologically-but-not-numerically); the physical renumber was deferred as cosmetic (a 68-line cross-referenced block move) — the dated headers + this note carry the reading order.

12.13 Session 9 — Track G (re-key) EXECUTION PLAN captured (2026-06-14)

Step 5 = the journey_id→student_code re-key (D19). Planning done this session; execution deferred to its own focused session (it is the single most destructive op + a lockstep code/schema/redeploy change). Two preconditions confirmed ready:

student_code PK, re-points the FK across the 6 dependent tables (benefit, profile_entry, school_report, watchlist_item, register_entry, source_file), drops journey_id, gated by PRAGMA foreign_key_check in one transaction, idempotent, --dry-run default. Selftest green.

uses its own journey-<NNNN> id space + its own media-domain profile store (journey_leakage_check takes profile_path from its caller; no read of the care DB / CARE_ROOT / so-dang-ky). The media registry.sqlite post.journey_id column is unused (all-NULL) and never ATTACH-joined to care. No care tool reads a media journey id and no media tool reads the care key. Dropping the care journey_id does not touch Stream C.
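The shape of the re-key described in the first precondition (new primary key, FK re-pointed, old column dropped, gated by PRAGMA foreign_key_check in one transaction, idempotent, dry-run by default) can be sketched on a two-table toy schema. This is not care_rekey itself: the schema is reduced to child + benefit with a handful of columns, and `OLD_SCHEMA`, `open_db` and `rekey` are names invented for the example.

```python
import sqlite3

OLD_SCHEMA = """
CREATE TABLE child (
    journey_id   TEXT PRIMARY KEY,
    student_code TEXT NOT NULL UNIQUE,
    real_name    TEXT NOT NULL
);
CREATE TABLE benefit (
    benefit_id TEXT PRIMARY KEY,
    journey_id TEXT NOT NULL REFERENCES child(journey_id),
    amount_vnd INTEGER
);
"""


def open_db(path=":memory:"):
    # isolation_level=None: no implicit transactions, BEGIN/COMMIT are explicit.
    return sqlite3.connect(path, isolation_level=None)


def rekey(conn, dry_run=True):
    """Re-key child/benefit from journey_id to student_code; returns a status string."""
    child_cols = [row[1] for row in conn.execute("PRAGMA table_info(child)")]
    if "journey_id" not in child_cols:
        return "already re-keyed"            # idempotent: nothing left to do
    conn.execute("PRAGMA foreign_keys=OFF")  # ignored inside a transaction, so set it first
    conn.execute("BEGIN")
    try:
        # Dependent table first: carry EVERY column, swap the key through a join.
        conn.execute("""CREATE TABLE benefit_new (
            benefit_id   TEXT PRIMARY KEY,
            student_code TEXT NOT NULL REFERENCES child(student_code),
            amount_vnd   INTEGER)""")
        conn.execute("""INSERT INTO benefit_new
            SELECT b.benefit_id, c.student_code, b.amount_vnd
            FROM benefit b JOIN child c ON c.journey_id = b.journey_id""")
        conn.execute("DROP TABLE benefit")
        conn.execute("ALTER TABLE benefit_new RENAME TO benefit")
        # Then the parent: student_code becomes the primary key, journey_id goes.
        conn.execute("""CREATE TABLE child_new (
            student_code TEXT PRIMARY KEY,
            real_name    TEXT NOT NULL)""")
        conn.execute("INSERT INTO child_new SELECT student_code, real_name FROM child")
        conn.execute("DROP TABLE child")
        conn.execute("ALTER TABLE child_new RENAME TO child")
        violations = conn.execute("PRAGMA foreign_key_check").fetchall()
        if violations:
            raise RuntimeError(f"{len(violations)} FK violation(s); nothing committed")
        conn.execute("ROLLBACK" if dry_run else "COMMIT")
        return "dry-run ok" if dry_run else "applied"
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    finally:
        conn.execute("PRAGMA foreign_keys=ON")
```

Because SQLite DDL is transactional, the dry run exercises the full rebuild and the FK check and then rolls everything back, so a dry run and an apply follow the same code path up to the final statement.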

Execution sequence (for the dedicated session):

live child (293) must have a unique non-empty student_code (minted at N5). Copy-test the whole read pipeline (reports/handbook/hub/index/watchlist) against a re-keyed backup, not just care_rekey in isolation.

test hits): rename DB-column reads/writes journey_id→student_code and the bucket/Drive path stems Ho-so/journey-NNNN→<code>; do NOT touch Stream-C journey-<NNNN> strings (none live in care code). Biggest files: app/main.py (91), care_migrate (62), report_child (46), survey_intake (39), care_intake_convert (36), beneficiary_registry_migrate (34, the schema source; align its SCHEMA/migrations to care_rekey._REKEYED_SCHEMA), safeguarding_watchlist (33), school_doc_extract (31), so_tay (30). Update in clusters; pytest -q tests + selftests after each; fix the ~198 fixture hits.

Tai-lieu-goc/journey-NNNN/→<code>/, So-phuc-loi/journey-NNNN/→<code>/, and rewrite child.profile_path to the code path. (The Drive human folders are already code-named Ho-so/<mã> — <Tên>/ from v2; only the bucket machine mirror still uses journey-NNNN.md.) The bucket-prefix rename 10_Ho-so-thu-huong→ho-so-thu-huong (D7) is a SEPARATE deferred track, not part of this re-key. care_rekey emits the recode plan but performs no Drive/bucket moves → needs a recode step (a new care_migrate action or a one-off script).

refactored images FIRST, then in ONE quiesced window: backup → care_rekey --apply on the live DB → bucket recode + profile_path rewrite → checkpoint + upload → FK-check + count verify → deploy the refactored hub + care-dashboard + care-jobs together (old code on the re-keyed DB = no such column journey_id 500s) → exercise (hub child page, report_child/report_school/report_cohort, a handbook, index-rebuild, a watchlist run) → restore. Per-deploy go-ahead.

land lockstep with the DB re-key + redeploy. Planning above is the entry point.
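The first precondition in the execution sequence above (every live child has a unique, non-empty student_code) is cheap to verify before the window opens. A minimal sketch follows, assuming the two code shapes named in this plan, HSYYYYNNNN and HVK<k><NNN>, with a single-digit k; `CODE_RE` and `student_code_problems` are hypothetical names, not the shipped check.

```python
import re
from collections import Counter

# HS + 4-digit year + 4-digit serial, or HVK + 1-digit Khóa + 3-digit serial (assumed widths).
CODE_RE = re.compile(r"^(HS\d{8}|HVK\d{4})$")


def student_code_problems(codes):
    """Return (index, reason) for every empty, malformed or duplicated code."""
    counts = Counter(code for code in codes if code)
    problems = []
    for index, code in enumerate(codes):
        if not code:
            problems.append((index, "empty"))
        elif not CODE_RE.match(code):
            problems.append((index, f"bad shape: {code}"))
        elif counts[code] > 1:
            problems.append((index, f"duplicate: {code}"))
    return problems


# HS20240022 appears in this plan; HVK4001 is a made-up example of the HVK shape.
print(student_code_problems(["HS20240022", "HVK4001"]))
print(student_code_problems(["HS20240022", "HS20240022", "", "journey-0028"]))
```

An empty result is the go signal; any entry means the re-key would either fail the primary-key constraint or silently merge two children.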

12.14 Session 10 — data-consumer verification + report-QA + 2024-cohort recon (2026-06-14)

A live QA pass: confirmed the migration is consumable, then fixed the issues that surfaced. All LIVE (backup→quiesce→apply→verify→restore; care-jobs + care-dashboard rebuilt + redeployed; registers + 71 handbooks regenerated).

(integrity ok, 0 FK violations, 293→310 children all student_code+profile_path, transcripts 293, benefits 495, funding 19 rolling up to 2.59B VND, 0 dangling provenance) + exercised report_child (transcript/benefits/receipt/funding/provenance/transfer notes), report_school (headcount/funding/prizes), and index-rebuild (all children keyed by student_code). Verdict: migration complete + consumable; honest gaps noted (DOB 183/293, 42 roster-only identity-only, pre-v2 source_ref sparse). File-existence half of consistency-check is the standing job's domain.

on the student report: the 3-perspective content was in school_report.payload_json but the petal-3 note detail was empty (a one-off loader bypassed care_reocr.write_se_detail, which fills it for the standing path — that is why Đại An's 30 were fine). Backfilled the 10 detail from the payload + hardened with check_se_notes_have_detail (source-agnostic consistency-check invariant; +test).

never selected amount_vnd, so the school/cohort/handbook funding rollup (guarded by "amount_vnd" in r.keys()) silently rendered nothing. Fixed the query (defensive column check); +regression test.
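The failure mode above is easy to reproduce with sqlite3.Row: a guard on `"amount_vnd" in r.keys()` cannot tell "column absent from the table" apart from "column not selected by this query", so a narrow SELECT makes the rollup quietly come out as zero. The table below is a cut-down stand-in for register_entry; `entry_id` and the two rows are illustrative (the amounts reuse two of the 2025-26 funding figures listed earlier).

```python
import sqlite3


def funding_total(rows):
    """Sum amount_vnd, skipping rows that do not expose the column at all."""
    return sum(r["amount_vnd"] or 0 for r in rows if "amount_vnd" in r.keys())


conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute(
    "CREATE TABLE register_entry (entry_id TEXT, kind TEXT, detail TEXT, amount_vnd INTEGER)"
)
conn.executemany(
    "INSERT INTO register_entry VALUES (?, ?, ?, ?)",
    [
        ("r1", "funding", "tài trợ 2025-26", 210_000_000),
        ("r2", "funding", "tài trợ 2025-26", 200_000_000),
    ],
)

# Buggy query: amount_vnd exists in the table but is never selected.
narrow = conn.execute(
    "SELECT entry_id, detail FROM register_entry WHERE kind = 'funding'"
).fetchall()
# Fixed query: the column the rollup depends on is selected explicitly.
wide = conn.execute(
    "SELECT entry_id, detail, amount_vnd FROM register_entry WHERE kind = 'funding'"
).fetchall()

print(funding_total(narrow))  # 0
print(funding_total(wide))    # 410000000
```

A regression test that asserts a non-zero total on seeded rows catches this class of bug, because the guard itself never raises.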

now show "Tài trợ cấp trường/lứa (Thỏa thuận tài trợ)" (school grant, per school year) and "Khen thưởng & hỗ trợ học sinh" (year-end prizes) as two separate parts; new is_school_grant_funding/school_grant_by_year classifier EXCLUDES the prize-pool funding rows from the grant total (no double-count vs the per-student prizes). The per-year agreement table now reconciles with the timeline.

total, no per-student split) no longer prints the school's funding on a student report — the per-student "Giá trị" is blank ("—"), filled when an individual amount exists. Also fixed the funding heading keying off with_school (the roster-column flag) → now scope_type, so a school register reads "cấp trường" not "cấp lứa".

Đại An + Nguyễn Bính students were all tagged cohort 2025, though both schools appear in lnquang CETB 2024-2025. Cross-checked both 2024-25 rosters vs the DB (the care_roster matcher): re-cohorted 34 continuing students (Đại An 18 + Nguyễn Bính 16) 2025→2024 (codes left frozen per D19; a provenance note on each), loaded 17 truly-absent 2024-25 students (Đại An 6 + Nguyễn Bính 11) as new cohort-2024 / status=withdrawn records via care_roster.apply_roster, and held 1 namesake (Nguyễn Gia Tuấn Anh: Đại An vs Tam Thanh journey-0045) for the coordinator. Result: the 2024 cohort now spans 5 schools / 94 students (Đại An 24, Tam Thanh 6, Vĩnh Hào 23, LTV 14, Nguyễn Bính 27); child 293→310, 0 dup codes/ids.

child-report. Scratch: Sandbox/_scratch/_step6/. Backups care/_backups/20260614-pre-{step6,se-backfill,build-registers,cohort2024}/.

12.15 Session 11 — Track G (Step 5): the journey_id→student_code re-key, CODE + LIVE CUTOVER (2026-06-14)

The last build/migration item in the plan. Executed as its own focused session.

(HSYYYYNNNN / HVK<k><NNN>) is the single id; journey_id retired. The ~820-hit refactor touched all 27 care tools + app/main.py + app/hub.py + 7 hub templates + every test file; pytest 775 passed, ruff check + format clean. Two identity-minting lifecycles rewritten (survey admission, NBTL roster — the per-school journey-NNNN block allocator → year-monotonic student_code); consistency-check per-school id-block invariant retired → shape-only check_student_code_shape. index_registry.post.journey_id / decisions_core (Stream-C media id) left untouched per the D19 guardrail (one wrong rename caught + reverted by the test suite).

register_entry's A3 funding columns (amount_vnd/petal/funding_source) — row counts matched so it passed silently, but the live cutover would have wiped the 164-row / 2.62 B VND funding roll-up. Now carried + asserted. (Also fixed a care_roster 17-values-for-16-columns INSERT.)
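Row counts alone could not catch this, because a table rebuilt without its funding columns keeps the same number of rows. A before/after fingerprint that also compares the column set and a numeric total would have failed loudly. The sketch below is an illustration of that idea, not the assertion that was added to care_rekey; `fingerprint` and `migration_problems` are hypothetical names, and table/column names are interpolated as trusted constants.

```python
import sqlite3


def fingerprint(conn, table, sum_col):
    """Capture column names, row count and one numeric total for a table."""
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    rows = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    total = None
    if sum_col in cols:
        total = conn.execute(
            f"SELECT COALESCE(SUM({sum_col}), 0) FROM {table}"
        ).fetchone()[0]
    return {"columns": set(cols), "rows": rows, "total": total}


def migration_problems(before, after, dropped_on_purpose=frozenset()):
    """List every difference that is not an intentionally dropped column."""
    problems = []
    lost = before["columns"] - after["columns"] - set(dropped_on_purpose)
    if lost:
        problems.append(f"columns lost: {sorted(lost)}")
    if before["rows"] != after["rows"]:
        problems.append(f"row count {before['rows']} -> {after['rows']}")
    if before["total"] != after["total"]:
        problems.append(f"total {before['total']} -> {after['total']}")
    return problems


if __name__ == "__main__":
    old = sqlite3.connect(":memory:")
    old.execute("CREATE TABLE register_entry (journey_id TEXT, student_code TEXT, amount_vnd INTEGER)")
    old.execute("INSERT INTO register_entry VALUES ('journey-0001', 'HS20240001', 210000000)")
    new = sqlite3.connect(":memory:")
    new.execute("CREATE TABLE register_entry (student_code TEXT)")  # funding column dropped by mistake
    new.execute("INSERT INTO register_entry VALUES ('HS20240001')")
    before = fingerprint(old, "register_entry", "amount_vnd")
    after = fingerprint(new, "register_entry", "amount_vnd")
    print(migration_problems(before, after, dropped_on_purpose={"journey_id"}))
```

Passing journey_id as an intentional drop keeps the check strict about everything else: any other missing column, or a changed total, is reported even when the row counts agree.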

build_jmap, with selftests; cloud/care/jobs/entrypoint.py passes the renamed --student-code flag.

13:27 write) → re-key live DB (--apply) → migrate survey (jmap from the pre-rekey backup) → verify (FK-clean, 311 children, journey_id dropped, funding 164/2.62B, 129/129 candidates) → upload (DB quiescent, no stale -wal) → lockstep redeploy: hub 00119-k97, care-dashboard 00033-pcl (new code), care-jobs new image (9 care-DB jobs re-pinned; consistency-check ran clean = new code reads the re-keyed DB) → schedulers resumed → no clobber confirmed post-swap. Backups gs://[care bucket]/care/_backups/20260614-trackG/.

[bundles bucket] (not hmt-bundles); new code crashes on the OLD DB at boot (HMT_HUB_AUTOSTART reads the care DB) → upload the re-keyed DB BEFORE any new-code instance starts; hub needs min-instances=1 (cold-start too slow for the probe at 0); hub + care-dashboard are manual-traffic → update-traffic is required, deploy alone does not route; PowerShell shell state does NOT persist across calls → load .env in the same command as the deploy.

per-student folders are still journey-NNNN; recode_bucket must be enhanced to rewrite benefit.ledger_path (it only does profile_path today) before running. A PRE-EXISTING missing-ledger-file drift on the ben-2026-uni-* benefits (uni-funding ledger .md never written by the lnquang load) is unrelated to the re-key.


Internal — review surface. First build + live run executed 2026-06-13 (§12). Sources: 2026-06-12 stocktake (memory project_three-place-stocktake-2026-06-12) · run log _system/runs/care-storage-v2-overnight-2026-06-13.md · rules care-data.md · taxonomy care-data-migration-plan.md Part 2b · schema so-dang-ky.sqlite · handbook so_tay.py. Quỹ Hoa Mặt Trời.