docs: track design docs, plans, audits, coordination infrastructure, handoffs
- 21 docs/*.md: audits, design notes, deployment plans, checklists, memos
- Coordination: ROLES, runbooks (DGX reboot, Lea live), patches, registry, summaries, systemd, QG template
- Handoffs: 6 Codex handoff documents + README + template
145
docs/CHECKLIST_DGX_PRE_CLINIQUE.md
Normal file
@@ -0,0 +1,145 @@
# DGX CHECKLIST: Checks before clinical installation

- `Author`: Qwen
- `Date`: 2026-06-19
- `Version`: v1, to be verified point by point before deployment at the clinical site

---
## 1. SERVICES: All start at reboot

| # | Service | Port | Expected status | Check |
|---|---|---|---|---|
| 1.1 | rpa-streaming | 5005 | `health=200` | `curl http://127.0.0.1:5005/health` |
| 1.2 | rpa-vision-v3-dashboard | 5001 | `401 without creds, 200 with creds` | `curl -u lea:<password> http://127.0.0.1:5001/api/system/status` |
| 1.3 | rpa-vision-v3-vwb-backend | 5002 | `401 LAN, 200 loopback` | `curl http://127.0.0.1:5002/health` then `curl http://192.168.x.x:5002/health` |
| 1.4 | rpa-agent-chat | 5004 | `200` | `curl http://127.0.0.1:5004/api/status` |
| 1.5 | rpa-vision-v3-api | 8000 | `closed on LAN` | `curl http://192.168.x.x:8000` → timeout/refused |
| 1.6 | rpa-vision-v3-vwb-frontend | 3002 | `200` | `curl http://127.0.0.1:3002` |
| 1.7 | rpa-vision-v3-stream-worker | 5099 | `running` | `systemctl status rpa-vision-v3-stream-worker` |
| 1.8 | rpa-vision-v3-worker | n/a | `running` | `systemctl status rpa-vision-v3-worker` |
| 1.9 | rpa-firewall | n/a | `active (exited)` | `systemctl status rpa-firewall` |
| 1.10 | Dashboard systemd | 5001 | **system service ACTIVE** (not the user fallback) | ✅ **VALIDATED at reboot 20/06**: system service active, user fallback masked |

**Reboot check**: `systemctl list-units --type=service | grep rpa` → all `active running` or `active exited`

---
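The HTTP rows of the section 1 table (1.1 to 1.6) all compare a status code against an expected value, so they lend themselves to a small helper. The following is a minimal sketch, not the project's tooling: the function names `verdict`, `http_code` and `check_http` are hypothetical, and the 5-second timeout is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the HTTP rows of section 1 (names are illustrative).

# verdict EXPECTED ACTUAL -> prints PASS or FAIL (pure helper, no network access).
verdict() {
  if [ "$1" = "$2" ]; then
    echo "PASS"
  else
    echo "FAIL (expected $1, got $2)"
  fi
}

# http_code URL [extra curl args...] -> prints the HTTP status code.
# curl prints 000 when the connection is refused or times out.
http_code() {
  local url="$1"; shift
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$@" "$url" || true
}

# check_http LABEL EXPECTED URL [extra curl args...] -> prints one result line.
check_http() {
  local label="$1" expected="$2" url="$3"; shift 3
  echo "$label: $(verdict "$expected" "$(http_code "$url" "$@")")"
}

# Example calls, using rows from the table:
#   check_http 1.1 200 http://127.0.0.1:5005/health
#   check_http 1.2-no-creds 401 http://127.0.0.1:5001/api/system/status
#   check_http 1.4 200 http://127.0.0.1:5004/api/status
```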
## 2. NETWORK: Sensitive ports closed on the LAN

| # | Port | Risk | Expected status | Check |
|---|---|---|---|---|
| 2.1 | 5900 (VNC GNOME) | Remote desktop | **closed on LAN, loopback OK** | `nmap 192.168.x.x -p 5900` → filtered/closed |
| 2.2 | 5902 (VNC Windows VM) | Remote desktop to the VM | **closed on LAN, SSH tunnel only** | `nmap 192.168.x.x -p 5902` → filtered/closed |
| 2.3 | 3389 (RDP/xrdp) | Remote desktop | **closed on LAN** | `nmap 192.168.x.x -p 3389` → filtered/closed |
| 2.4 | 22220 (SSH Windows VM) | Shell on the VM | **closed on LAN** | `nmap 192.168.x.x -p 22220` → filtered/closed |
| 2.5 | 8000 (upload API) | Unprotected API | **closed on LAN** | `nmap 192.168.x.x -p 8000` → filtered/closed |
| 2.6 | 11434 (Ollama) | AI models | **closed on LAN** | `nmap 192.168.x.x -p 11434` → filtered/closed |
| 2.7 | 5002 (VWB backend) | Workflow data | **LAN: auth required (401)** | `curl http://192.168.x.x:5002/api/workflows/` → 401 |
| 2.8 | 5004 (Agent chat) | Chat interface | **To be decided**: open or closed? | Dom's decision |
| 2.9 | 3002 (VWB frontend) | Web interface | **To be decided**: open or closed? | Dom's decision |

---
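The nmap rows in section 2 (2.1 to 2.6) pass only when no listed port is reported `open`. One way to check that mechanically is to parse nmap's grepable output (`-oG -`). This is a sketch under that assumption; the function name `open_ports` is hypothetical.

```bash
#!/usr/bin/env bash
# open_ports: reads `nmap -oG -` output on stdin and prints one open port per line.
# An empty result means every scanned port was filtered or closed.
open_ports() {
  grep 'Ports:' \
    | sed 's/.*Ports: //' \
    | tr ',' '\n' \
    | awk -F/ '$2 == "open" { gsub(/^[ \t]+/, "", $1); print $1 }'
}

# Example call, to be run from another machine on the LAN:
#   nmap -oG - -p 5900,5902,3389,22220,8000,11434 192.168.x.x | open_ports
```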
## 3. SECURITY: Authentication + access

| # | Item | Expected status | Check |
|---|---|---|---|
| 3.1 | Dashboard Basic Auth | `401 without creds` | `curl http://192.168.x.x:5001/api/system/status` → 401 |
| 3.2 | VWB Basic Auth | `401 LAN, 200 loopback` | Verified ✅ (commit cf81ce4c7) |
| 3.3 | Streaming Bearer Auth | `401 without token` | `curl http://127.0.0.1:5005/api/v1/...` → 401 |
| 3.4 | SSH key only | No password login | `grep PasswordAuthentication /etc/ssh/sshd_config` → no |
| 3.5 | Firewall persists across reboot | Ports closed after reboot | ✅ **VALIDATED at reboot 20/06**: sensitive ports filtered, open services OK |
| 3.6 | RPA_SIGNING_KEY defined | FAISS metadata valid | ⚠️ **TO FIX**: HMAC mismatch, Option A pending |

---
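Row 3.4 relies on a plain `grep`, which also matches commented-out lines. The sketch below is a stricter variant that looks only at the first active `PasswordAuthentication` directive. The function name `password_auth_disabled` is hypothetical, and the check ignores `Include` files and `Match` blocks; `sshd -T` (run as root) prints the effective configuration if those are in use.

```bash
#!/usr/bin/env bash
# password_auth_disabled: reads sshd_config text on stdin.
# Succeeds only if the first uncommented PasswordAuthentication directive is "no".
# A missing directive counts as a failure, since OpenSSH then defaults to "yes".
password_auth_disabled() {
  awk '
    tolower($1) == "passwordauthentication" { value = tolower($2); exit }
    END { if (value == "no") exit 0; exit 1 }
  '
}

# Example call:
#   password_auth_disabled < /etc/ssh/sshd_config && echo "PASS 3.4"
```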
## 4. WINDOWS VM: Autostart + stability

| # | Item | Expected status | Check |
|---|---|---|---|
| 4.1 | VM boots automatically at DGX reboot | systemd user service for `aivanov` | ✅ **VALIDATED at reboot 20/06**: `win11-arm-lea.service` starts automatically, linger=yes |
| 4.2 | VM reachable over VNC | SSH tunnel `localhost:5902` | Verified ✅ |
| 4.3 | VM not run under libvirt in parallel | No ownership conflict on disk.qcow2 | ⚠️ **TO DOCUMENT**: do not start the libvirt VM |
| 4.4 | disk.qcow2 owner = aivanov | Not libvirt-qemu | `ls -la disk.qcow2` → aivanov:aivanov |
| 4.5 | swtpm started by script | Not manually | Standalone script manages swtpm ✅ |
| 4.6 | Léa config.txt points to the DGX | Not a cloud URL | `cat config.txt` → DGX IP |

---
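Row 4.6 can be checked without reading the file by eye. The sketch below assumes nothing about the format of `config.txt` beyond what the checklist states: it fails if the `CONFIGURE_ME` placeholder is still present and passes if the expected host appears as a literal string. The function name `config_points_to` is hypothetical.

```bash
#!/usr/bin/env bash
# config_points_to FILE HOST -> prints PASS/FAIL and returns 0 only on PASS.
config_points_to() {
  local file="$1" host="$2"
  if [ ! -r "$file" ]; then
    echo "FAIL: $file is not readable"
    return 1
  fi
  if grep -q 'CONFIGURE_ME' "$file"; then
    echo "FAIL: $file still contains CONFIGURE_ME"
    return 1
  fi
  if grep -qF -- "$host" "$file"; then
    echo "PASS: $file references $host"
  else
    echo "FAIL: $file does not reference $host"
    return 1
  fi
}

# Example call (replace the address with the DGX IP of the site):
#   config_points_to config.txt 192.168.x.x
```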
## 5. DATA: Persistence + integrity

| # | Item | Expected status | Check |
|---|---|---|---|
| 5.1 | workflows.db | 24 live workflows | `curl -u lea:<pw> http://127.0.0.1:5001/api/workflows \| jq '.total'` → 24 |
| 5.2 | FAISS index | 13666 vectors | `curl ... /api/knowledge-base/stats \| jq '.vectors_indexed'` → 13666 |
| 5.3 | FAISS metadata HMAC | Test endpoint returns 200 | ⚠️ **TO FIX**: Option A (re-sign) |
| 5.4 | Training sessions | Not tracked by git → safe on reset | `ls data/training/sessions/` |
| 5.5 | Git aligned | HEAD = latest P0 commit | `git log -1` → cf81ce4c7 |
| 5.6 | workflows.db preserved on git reset | Backup before reset | ⚠️ **Procedure to follow** |

---
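Row 5.6 asks for a backup of `workflows.db` before any git reset but does not spell out the procedure. The following is one possible sketch, not the documented procedure: the function name `backup_file`, the backup directory and the timestamped naming are all assumptions. If the file is a database being written to, copying it while the owning service is stopped is the safer option.

```bash
#!/usr/bin/env bash
# backup_file SRC DEST_DIR
# Copies SRC to DEST_DIR/<name>.<UTC timestamp>.bak, verifies the copy byte for byte,
# and prints the backup path. Returns non-zero if anything fails.
backup_file() {
  local src="$1" dest_dir="$2" stamp target
  if [ ! -f "$src" ]; then
    echo "missing: $src" >&2
    return 1
  fi
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  target="$dest_dir/$(basename "$src").$stamp.bak"
  mkdir -p "$dest_dir" \
    && cp -p "$src" "$target" \
    && cmp -s "$src" "$target" \
    && echo "$target"
}

# Example call (both paths are placeholders), run before the reset:
#   backup_file path/to/workflows.db "$HOME/backups" || exit 1
```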
## 6. STABILITY: Reboot test (✅ run for real on 2026-06-20)

| # | Item | Check | Result | Verdict |
|---|---|---|---|---|
| 6.1 | DGX reboot | Mains power cut at 02:07 | 9 services come back up | ✅ PASS |
| 6.2 | Windows VM auto-start | `win11-arm-lea.service` | VM starts automatically | ✅ PASS |
| 6.3 | Firewall persisted | Ports after reboot | Sensitive ports filtered, services open | ✅ PASS |
| 6.4 | Dashboard systemd | After reboot | System service active, user fallback masked | ✅ PASS |
| 6.5 | Worker healthy | After reboot | PID 2267 active, last_cycle continuous | ✅ PASS |
| 6.6 | **DHCP IP drift** | `.45` → `.46` | Static IP `.45` applied (Dom) | ⚠️ **G1: static IP mandatory at the clinic** |
| 6.7 | **VM OVMF corruption** | Abrupt power cut | OVMF corrupted, manual recovery (Codex) | ⚠️ **G2: OVMF self-repair to be implemented** |
| 6.8 | **Léa guest reconnects** | config.txt | CONFIGURE_ME, not the DGX | ⚠️ **G4: config.txt to be filled in** |

---
## 7. IT DEPARTMENT (DSI) PREREQUISITES (sent to Nicolas PORQUET)

| # | Item | Status | Check |
|---|---|---|---|
| 7.1 | HTTPS proxy | To be installed at the clinic | Architecture validated |
| 7.2 | Docker | To be installed | n/a |
| 7.3 | VLAN isolation | To be configured | n/a |
| 7.4 | SSH key only | ✅ Configured on DGX | `PasswordAuthentication no` |
| 7.5 | 100% on-premise | ✅ No cloud calls | Check Léa config |
| 7.6 | No exposed secrets | ✅ .env.local permissions | `ls -la .env.local` → 600 (`-rw-------`) |

---
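Row 7.6 expects `.env.local` to have mode 600, which `ls -la` shows as `-rw-------` rather than as a number. A short sketch that reads the octal mode directly, assuming GNU coreutils `stat` (the `-c` flag is not available on BSD or macOS); the function names are hypothetical.

```bash
#!/usr/bin/env bash
# file_mode FILE -> prints the octal permission bits, e.g. 600 (GNU stat only).
file_mode() {
  stat -c '%a' "$1"
}

# secrets_file_private FILE -> succeeds only if the mode is exactly 600.
secrets_file_private() {
  [ "$(file_mode "$1")" = "600" ]
}

# Example call:
#   secrets_file_private .env.local && echo "PASS 7.6" || echo "FAIL 7.6"
```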
## ⚠️ ITEMS TO FIX BEFORE THE CLINIC

1. **Dashboard user fallback** → ✅ **FIXED 20/06** (persistent mask, system service active)
2. **VM auto-start** → ✅ **VALIDATED 20/06** (proven by a real reboot)
3. **FAISS Option A** → ✅ **FIXED 19/06** (metadata re-signed, 13666 vectors, test success=true)
4. **DGX Git aligned**: DGX is on ec1fb81, target is cf81ce4c7 → align after backing up workflows.db
5. **Reboot test** → ✅ **run for real 20/06** (5 PASS, 3 gaps identified)
6. **G1 DHCP IP drift**: lab static IP `.45` OK; at the clinic, Ethernet `.178` is mandatory
7. **G2 OVMF self-repair**: healthy snapshot at boot + automatic restore on a TianoCore loop → **TO BE IMPLEMENTED**
8. **G4 Léa auto-resume**: persistent config.txt with DGX + token + auto-login → **TO BE FILLED IN**

---
## Quick smoke commands (run on the DGX)

```bash
# Services
systemctl list-units --type=service | grep rpa

# Health endpoints
# DASHBOARD_PASSWORD and STREAMING_TOKEN are placeholder variable names:
# export them in the shell first and never commit the real values.
curl -s http://127.0.0.1:5002/health
curl -s http://127.0.0.1:5005/health
curl -s -u "lea:${DASHBOARD_PASSWORD}" http://127.0.0.1:5001/api/system/status | jq '{workflows_count,status}'
curl -s -H "Authorization: Bearer ${STREAMING_TOKEN}" http://127.0.0.1:5005/api/v1/traces/stream/processing/status | jq '{status,processing_ready}'

# Firewall LAN
nmap 192.168.1.45 -p 5900,5902,3389,22220,8000,11434

# VM
virsh -c qemu:///system list   # must be EMPTY (standalone, not libvirt)
ps aux | grep qemu-system-aarch64 | grep win11

# Git
cd ~/ai/rpa_vision_v3 && git log -1 --oneline
```
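The smoke commands above print raw output that has to be read by eye. A possible wrapper turns each command into a PASS/FAIL line and counts failures, so the whole run can gate a deployment. This is a sketch only: `run_check` and the `failures` counter are hypothetical names, and `curl -f` is used so that HTTP errors count as failures.

```bash
#!/usr/bin/env bash
failures=0

# run_check LABEL COMMAND [ARGS...]
# Runs the command with its output discarded, prints PASS or FAIL,
# and increments the global failure counter on a non-zero exit status.
run_check() {
  local label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    failures=$((failures + 1))
  fi
}

# Example calls, reusing endpoints from the smoke block:
#   run_check "vwb health"       curl -sf --max-time 5 http://127.0.0.1:5002/health
#   run_check "streaming health" curl -sf --max-time 5 http://127.0.0.1:5005/health
#   run_check "firewall unit"    systemctl is-active --quiet rpa-firewall
#   [ "$failures" -eq 0 ] || echo "$failures check(s) failed"
```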