The Foundation's student data lives in a coordinator's personal Google Drive archive ("Thông tin học sinh"), outside the Foundation Shared Drive, across 3 school years — uploaded by year, then by school, with many consolidated files (a roster / KQ sheet / SE batch that covers a group of students, not one).
Over 2026-06-09/10, prior sessions already loaded 142 children into the live care store via one-off scratch scripts. That work is real but ad-hoc: the raw source still sits outside the Foundation Drive, provenance back to source documents is partial, there are known unresolved gaps (phantom students, grade-label mismatches), and there is no documented repeatable process.
This plan does two things, in order:

- HSYYYYNNNN: already implemented; already names per-student Drive folders.
- partner_school, child, benefit, profile_entry, school_report, watchlist_item, ghi_danh. profile_entry already carries provenance: source, by_person, raw_link, source_ref. No register tables yet.
- 2_Hoc-sinh/Ho-so/<mã> — <Tên>/, idempotent.
- …/Bao-cao/.
- 2_Hoc-sinh/ vs internal machine store. This is the "two spaces" rule, which is already the architecture; this plan fills the gaps.

| Rule | Gap |
|---|---|
| 1 — two spaces (human Doc/Sheet/PDF + machine md/sqlite) | Architecture exists, but raw source docs are not in the Foundation Drive and have no machine md/txt mirror. |
| 2 — consolidated → per-student extraction + journal records entry+source; a folder for consolidated docs | Extraction happened and source_ref exists, but there is no consolidated-docs Drive folder and no catalog linking a journal entry to its consolidated source file. |
| 3 — student docs in student folder + bucket md + key details to sổ tay; reports full-review, early→latest | Per-student folders exist; but no per-student source-doc subfolder, and the handbook + child report order the journal newest-first (the reverse of early→latest). |
| 4 — school + cohort register | No school/cohort register exists; reports aggregate per-child each run. |
| 5 — migrate, reuse existing data | Done ad-hoc via scratch scripts; needs one documented idempotent tool + gap reconciliation. |
New rule file .claude/rules/care-data.md: the canonical, agent-facing reference for student-data storage. Codifies:

- Rule 1, two spaces: human (Doc/Sheet/PDF) + machine (.md/.txt/.sqlite).
- Rule 2, consolidated docs: 2_Hoc-sinh/Tai-lieu-tong-hop/ (human) + a .md/.txt extraction (machine), cataloged in a new source_doc table. Each per-student fact pulled from a consolidated doc is written to that student's sổ tay (handbook) as a journal entry whose source_ref = the source-doc id/section. The journal records both the entry and its source.
- Rule 3, student-specific docs: 2_Hoc-sinh/Ho-so/<mã> — <Tên>/Tai-lieu-goc/ (human) + a machine md/txt mirror; key details → sổ tay. Reports do a full record review and present every time-series chronologically, early→latest.
- Rule 4, school + cohort registers (backed by the register_entry log) that accrete high-level events (prizes, hardships, milestones, headcount/funding roll-ups) as documents arrive. Reports read the register and roll up per-student data.

Cross-references: a pointer in CLAUDE.md §Pointers and a short §16 "Storage & migration rules" in the beneficiary-care subplan (which stays the design-of-record).
2_Hoc-sinh/

    2_Hoc-sinh/
      Ho-so/<mã> — <Tên>/               # per-student folder (EXISTS)
        So-tay · <mã> · <Tên>           # the handbook Doc (EXISTS)
        Tai-lieu-goc/                   # NEW: this student's source docs (rule 3)
      Tai-lieu-tong-hop/                # NEW: consolidated/group source docs (rule 2)
        <school-year>/<school>/…
      So-tay-truong/<school>            # NEW: per-school register Doc (rule 4)
      So-tay-lua/<programme>-<year>     # NEW: per-cohort register Doc (rule 4)
      Bao-cao/<YYYY>/…                  # generated reports (EXISTS)
      Khao-sat/<ung-vien>/…             # candidate surveys PRE-admission only

Tai-lieu-goc/; group sheet → Tai-lieu-tong-hop/). The Tai-lieu-so-hoa-OCR/ band is retired: its 41 Docs (all per-student) relocate into student folders.

Survey report + information is student-specific selection data: once a candidate is admitted, the survey Doc + key details move into that student's Tai-lieu-goc/ + sổ tay; Khao-sat/ keeps only un-promoted candidates. The "AI transcription" fact lives on source_doc, not a folder name.

    so-dang-ky.sqlite                   # registry (the queryable shape)
    Ho-so/journey-NNNN.md               # canonical profile (EXISTS)
    So-phuc-loi/journey-NNNN/…          # benefits ledger (EXISTS)
    Tai-lieu-vao/<week>/<journey>/…     # weekly intake inbox (EXISTS)
    Tai-lieu-tong-hop/                  # NEW: .md/.txt extraction of each consolidated doc
    Tai-lieu-goc/journey-NNNN/          # NEW: .md/.txt mirror of student-specific docs

The ad-hoc loads added one document type at a time. To stop retrofitting, every source doc is typed (source_doc.doc_type) and classified by scope (per-student → rule 3 / consolidated → rule 2 / multi-scope → register, rule 4), with a routing target and a petal anchor known up front. Only the four highest-volume, recognisable types have structured extractors today (hoc-ba, so-diem, chuyen-can, hop-phu-huynh); we add a new one only where volume + shape justify it.
Sources (profile_entry.source, unchanged): school | coordinator | student | family | partner.
| Document type | doc_type | Routing → target | Petal |
|---|---|---|---|
| School record book / report slip (học bạ / phiếu liên lạc) | hoc-ba | extractor → school_report | 1 |
| Exam results (e.g. Đại An) | ket-qua-thi (NEW) | NEW extractor → school_report | 1 |
| Essay / competition entry | bai-viet | prose → profile_entry | 4 |
| Social-emotional assessment (SE) | se (NEW) | NEW extractor → profile_entry (+school_report) | 3 |
| Certificate of merit / award title | giay-khen (NEW) | prose → profile_entry + register | varies |
| Survey / candidate file | khao-sat | survey → student folder on admission | 1 |
| Home-visit / case-selection report | tham-gia-dinh | prose → profile_entry | 1 |
| Birth certificate / identity papers (sensitive) | giay-to | identity → child (DoB) | — |
| Poor-household / circumstances certificate (sensitive) | hoan-canh | background + register hardship | 1 |
| Thank-you letter / family letter | thu | context-link | varies |
| Receipt / gift-handover photo | bien-nhan | → benefits ledger / benefit | 1 |
| School-transfer / graduation certificate | chuyen-tot-nghiep | child status + register milestone | — |

Tai-lieu-tong-hop/ + per-student extraction)

| Document type | doc_type | Routing → target | Petal |
|---|---|---|---|
| Class list / roster | roster | → child rows | — |
| Gradebook / monthly grade sheet | so-diem | extractor (multi-child) → school_report | 1 |
| Ranking table / semester results | xep-hang | extractor → school_report | 1 |
| Class attendance report | chuyen-can | extractor → school_report | 1 |
| Parent-meeting report (multiple children) | hop-phu-huynh | extractor → school_report | 1/3 |
| Monthly skills-session minutes | sinh-hoat-ky-nang | prose → per-student entries | 2/3 |
| Field trip / experience / enterprise visit | tham-quan | prose → per-student entries | 6/7 |
| Sports results / jump rope | the-thao | prose → per-student entries | 5 |
| Community-work report | cong-dong | prose → per-student entries | 8 |
| Attendance / activity-participation sheet | tham-gia | list → per-student entries | varies |
| Funding-receipt minutes (school-level funding receipt) | bien-ban-tai-tro | school folder + register → fan out per-student benefit rows (funding_source links donor/finance) | 1 |

Multi-scope (rule 4 → register_entry): a school/cohort aggregate prize, an event recap, a hardship affecting a group, and the headcount/funding roll-ups — written to the school or cohort register.
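
The typing and scope classification described above can be pictured as a small lookup table. This is a minimal sketch, not the plan's implementation: the `DocType` record, the `TAXONOMY` dict and the `classify` helper are hypothetical names, and only a handful of rows from the tables are included for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DocType:
    """One taxonomy row (hypothetical shape, not from the plan)."""
    scope: str            # "student" (rule 3) or "consolidated" (rule 2)
    target: str           # routing target as listed in the tables
    petal: Optional[str]  # petal anchor; None where the table shows a dash


# Illustrative subset only; the real taxonomy has more rows.
TAXONOMY = {
    "hoc-ba": DocType("student", "school_report", "1"),
    "bai-viet": DocType("student", "profile_entry", "4"),
    "giay-to": DocType("student", "child", None),
    "roster": DocType("consolidated", "child", None),
    "so-diem": DocType("consolidated", "school_report", "1"),
    "chuyen-can": DocType("consolidated", "school_report", "1"),
}


def classify(doc_type: str) -> Optional[DocType]:
    """Return the taxonomy row, or None so the caller can send the doc to triage."""
    return TAXONOMY.get(doc_type)
```

Returning `None` for an unknown type, rather than a default row, keeps the "never silently filed" discipline visible at the call site.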
3_Nha-tai-tro/ + 4_Tai-chinh/ + the finance registry), handled by existing finance/donor tooling, not the care store. bien-ban-tai-tro is the bridge: one school-level funding receipt creates per-student benefit rows on the care side and ties to a donor contribution on the finance side via funding_source. This plan owns only the care side of that link.

Build implication: extend the school-doc extractor with three new schemas (ket-qua-thi, se, giay-khen) + their intake routing tokens. Sensitive types (giay-to, hoan-canh, health) honour subplan §3: the source doc is stored, but identifying medical / precise-circumstance detail is not copied into the profile body; only the care-relevant fact is, and it is scoped.
Four ways a document arrives; all funnel into one classify → route → extract → file pipeline; anything unmatched goes to a triage queue, never silently filed (the flag-on-failure discipline already built).
- ingest_watch): same triage.
- onFormSubmit doorbell already knows the form's context (which candidate/student/event), moves the files into the matching folder + machine store tagged with that context, then queues extraction, with no guessing.

Routing/association: explicit metadata (hub pick or form context) → deterministic; else a filename token (<journey-id|student_code>__<doc-type>); a consolidated doc fans out per student via the roster name-matcher. Unmatched / low-confidence / multi-child-needing-split → the triage queue (a hub screen), never silently filed.
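
The filename-token fallback can be sketched as a single regular expression plus a whitelist check. This is an illustration under stated assumptions: the identifier shapes (`journey-NNNN`, `HSYYYYNNNN` read as `HS` plus eight digits) are inferred from names used elsewhere in this plan, and `route_by_filename`, `Route` and `KNOWN_DOC_TYPES` are hypothetical names. Anything that fails to parse returns `None`, which the caller would send to the triage queue.

```python
import re
from typing import NamedTuple, Optional

# Assumed token shape: <journey-id|student_code>__<doc-type>, followed by
# "." or "_" or end of name. Identifier formats are assumptions.
TOKEN_RE = re.compile(
    r"^(?P<key>journey-\d{4}|HS\d{8})__(?P<doc_type>[a-z]+(?:-[a-z]+)*)(?=[._]|$)"
)

# Illustrative subset: the four types the plan says have extractors today.
KNOWN_DOC_TYPES = {"hoc-ba", "so-diem", "chuyen-can", "hop-phu-huynh"}


class Route(NamedTuple):
    key: str       # journey id or student code
    doc_type: str  # taxonomy slug


def route_by_filename(name: str) -> Optional[Route]:
    """Parse the routing token; None means 'send to triage', never a guess."""
    match = TOKEN_RE.match(name)
    if match is None or match.group("doc_type") not in KNOWN_DOC_TYPES:
        return None
    return Route(match.group("key"), match.group("doc_type"))
```

For example, `route_by_filename("HS20240012__hoc-ba.pdf")` yields a route keyed on the student code, while `route_by_filename("scan_001.pdf")` yields `None`.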
Extend the registry migration tool, following the existing IF NOT EXISTS / additive-column discipline.

source_doc: catalog of every source file

    doc_id (PK, src-YYYY-NNNN) · title · kind (consolidated|student)
    doc_type (Part-2b taxonomy: hoc-ba|se|giay-khen|…) · scope_type · scope_key
    school_year · drive_url (human copy) · bucket_path (machine extraction)
    original_name · transcription (1 = AI/OCR) · added_at · added_by

The journal's source_ref and the school-report's source field reference doc_id.

register_entry: one lightweight log for both register kinds

    reg_id (PK) · scope_type (school|cohort) · scope_key · entry_date
    kind (prize|hardship|milestone|headcount|funding|note) · detail
    journey_id (nullable) · source_doc_id (nullable) · by_person · created_at
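
The two tables could be created with an idempotent migration along these lines. This is a hedged sketch: the column names come from the field lists above, but the column types, the CHECK constraints, the index name `idx_register_scope` and the `migrate` function are assumptions, not the existing migration tool's API.

```python
import sqlite3

# Column types and constraints are assumptions; only the names are from the plan.
SCHEMA = """
CREATE TABLE IF NOT EXISTS source_doc (
    doc_id        TEXT PRIMARY KEY,   -- src-YYYY-NNNN
    title         TEXT,
    kind          TEXT CHECK (kind IN ('consolidated', 'student')),
    doc_type      TEXT,
    scope_type    TEXT,
    scope_key     TEXT,
    school_year   TEXT,
    drive_url     TEXT,               -- human copy
    bucket_path   TEXT,               -- machine extraction
    original_name TEXT,
    transcription INTEGER DEFAULT 0,  -- 1 = AI/OCR
    added_at      TEXT,
    added_by      TEXT
);
CREATE TABLE IF NOT EXISTS register_entry (
    reg_id        INTEGER PRIMARY KEY,
    scope_type    TEXT CHECK (scope_type IN ('school', 'cohort')),
    scope_key     TEXT,
    entry_date    TEXT,
    kind          TEXT CHECK (kind IN
                    ('prize', 'hardship', 'milestone', 'headcount', 'funding', 'note')),
    detail        TEXT,
    journey_id    TEXT,               -- nullable
    source_doc_id TEXT REFERENCES source_doc (doc_id),  -- nullable
    by_person     TEXT,
    created_at    TEXT
);
CREATE INDEX IF NOT EXISTS idx_register_scope
    ON register_entry (scope_type, scope_key, entry_date);
"""


def migrate(conn: sqlite3.Connection) -> None:
    """Apply the schema; safe to run repeatedly thanks to IF NOT EXISTS."""
    conn.executescript(SCHEMA)
    conn.commit()
```

Because every statement uses `IF NOT EXISTS`, running `migrate` a second time against the same store is a no-op, which matches the additive discipline named above.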
- source_doc + register_entry (tables, index, helper fns); extend the selftest.
- source_ref → source_doc (traceable to its source); the school + cohort reports read the standing register_entry log (prizes/hardships/milestones) and roll up per-student data; and the two register compilers get the same delivery path as reports: a one-click hub trigger + scheduled run, tracked in report_index, upserted as native Docs.
- register_entry event timeline), upserted to So-tay-truong/ and So-tay-lua/ via the existing Drive seam.
- tools/care_migrate.py (NEW; supersedes the scratch one-offs): one documented idempotent CLI with --dry-run / --selftest and subcommands:
  - copy-source: copy the personal-Drive archive into the Foundation Drive + catalog it
  - backfill-provenance: link the loaded 142 to their source docs
  - reconcile: apply held-back fixes from a coordinator decisions manifest
  - build-registers: seed register_entry + emit register Docs

From the administrator's output notes. The current tools cover part of this; the new cuts are the school-year report, the all-programme Tổng hợp report, the grade-level (khối 6–12) breakdown, and window as a first-class parameter. Note: lứa (cohort = programme-entry year) ≠ năm học (academic year); both are reportable. Window: tu-dau (lifetime) · nam (trailing 12 mo) · nam-hoc (one academic year) · latest.

| Report | Tool | Window | Breakdown | Content / audience |
|---|---|---|---|---|
| Media posts (Bài viết cho media) | Area-2 pipeline | — | per event/student | dignified post copy · public (via /approve) |
| By school (BC theo trường) | report_school | lifetime | one school | student count · funding amount · time in programme · general remarks · principal/trustee |
| By cohort (BC theo lứa) | report_cohort | lifetime | school × grade (khối) 6–12 | + headcount/funding/time/remarks · trustee |
| By school year (BC theo năm học) | report_cohort (+nam-hoc, NEW) | one school year | school × grade | as cohort · trustee |
| Overall (BC tổng hợp) | NEW report_overall | lifetime; 1 year | all schools × all programmes | as above, every programme · trustee/board |
| By student (BC theo học sinh) | report_child | lifetime; 1 year | one student | full record review · sponsor |
| By student, talking points | report_child (latest) | latest | one student | what's new + "Gợi ý đồng hành" (coordinator guidance) · coordinator (CTV) |
Build deltas: --window first-class on child/cohort (adds nam-hoc, latest); khối grouping in cohort/năm-học/overall (derive current khối from latest học bạ, already done in the banner); a thin Tổng hợp report across all programmes; the student talking-points variant = report_child latest + the already-built coordinator guidance block. "Bài viết cho media" stays the Area-2 output, listed for completeness.
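
To make "window as a first-class parameter" concrete, here is a minimal sketch of a window filter that also enforces the early→latest ordering. Everything in it is illustrative: `Entry`, `months_back` and `apply_window` are hypothetical names, the academic-year bounds are passed in by the caller because this plan does not state them, and treating `latest` as "entries sharing the most recent date" is an assumption.

```python
from datetime import date
from typing import Iterable, List, Optional, Tuple

Entry = Tuple[date, str]  # (entry_date, detail): hypothetical minimal journal row


def months_back(d: date, months: int) -> date:
    """Date `months` earlier; the day is clamped to 28 so the result is always valid."""
    total = d.year * 12 + (d.month - 1) - months
    return date(total // 12, total % 12 + 1, min(d.day, 28))


def apply_window(
    entries: Iterable[Entry],
    window: str,
    today: date,
    school_year: Optional[Tuple[date, date]] = None,
) -> List[Entry]:
    """Filter entries by window and return them ordered early→latest."""
    ordered = sorted(entries, key=lambda e: e[0])
    if window == "tu-dau":  # lifetime
        return ordered
    if window == "nam":  # trailing 12 months
        start = months_back(today, 12)
        return [e for e in ordered if start <= e[0] <= today]
    if window == "nam-hoc":  # one academic year, bounds supplied by the caller
        if school_year is None:
            raise ValueError("nam-hoc needs explicit (start, end) bounds")
        start, end = school_year
        return [e for e in ordered if start <= e[0] <= end]
    if window == "latest":  # assumption: entries on the most recent date
        if not ordered:
            return []
        last = ordered[-1][0]
        return [e for e in ordered if e[0] == last]
    raise ValueError(f"unknown window: {window!r}")
```

Sorting once before filtering means every window returns its entries in the same chronological order, so the report code does not need a separate ordering step.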
Per the known write-race discipline: quiesce before any machine-store write, back up first, verify after.
- 2_Hoc-sinh/.
- copy-source --dry-run → eyeball consolidated-vs-student classification → execute; populate source_doc.
- backfill-provenance over the 142.
- reconcile against the coordinator decisions manifest (phantom-student call is the coordinator's).
- build-registers → register_entry + School/Cohort Docs.

reconcile manifest); this plan wires the mechanism, not the verdict.
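
The command-line surface of tools/care_migrate.py, as described in this plan, might be laid out as follows. This is a sketch of the argument parsing only: the subcommand names and the --dry-run / --selftest flags come from the plan, while `build_parser`, `main`, the help strings and the placeholder behaviour are assumptions. --dry-run is attached to each subcommand so that the `copy-source --dry-run` ordering used in the run steps parses.

```python
import argparse
from typing import List, Optional

# Subcommand names are from the plan; the help strings paraphrase it.
SUBCOMMANDS = {
    "copy-source": "copy the personal-Drive archive into the Foundation Drive and catalog it",
    "backfill-provenance": "link the loaded children to their source docs",
    "reconcile": "apply held-back fixes from a coordinator decisions manifest",
    "build-registers": "seed register_entry and emit register Docs",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="care_migrate.py")
    parser.add_argument("--selftest", action="store_true", help="run built-in checks and exit")
    # Shared flags live on a parent parser so they can follow the subcommand.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--dry-run", action="store_true", help="report planned changes, write nothing")
    sub = parser.add_subparsers(dest="command")
    for name, help_text in SUBCOMMANDS.items():
        sub.add_parser(name, parents=[common], help=help_text)
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.selftest:
        print("selftest: placeholder, no checks implemented in this sketch")
        return 0
    if args.command is None:
        parser.error("a subcommand is required")
    mode = "dry-run" if args.dry_run else "execute"
    print(f"[{mode}] {args.command}")  # placeholder for the real step
    return 0
```

A script entry point would call `main()` and pass its return value to `sys.exit`; a run such as `main(["copy-source", "--dry-run"])` prints the planned step without touching any store.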