Channels/agents: main · ralph · spawn · charlie · opus · work · firma

Routing: the trigger channel (e.g. #charlie or #2_gcc) doesn't limit the output channel — a job triggered in #2_gcc can send its infrastructure output to #charlie. This is an opt-in behavior — needs explicit instruction or a routing rule.
💡 Team Vision
Ben is the daily interface — Discord #ben for everything. Charlie and Ralph are the operational backbone. Telegram agents are specialists: pull in Ring Opus for heavy thinking, Work for isolated BA context, Firma for company admin. Any agent can write to any channel — routing is flexible. Future: local models on Link + Mac mini for overnight processing and GPU-heavy tasks.
🧠 How It Works
- Each session starts fresh — files are continuity
- Daily notes capture raw events, decisions, findings
- MEMORY.md = distilled long-term knowledge
- Periodic heartbeat reviews consolidate daily → long-term
- All 4 Telegram bots share the same workspace + memory files
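The continuity loop above can be sketched as a consolidation pass. This is a minimal sketch, assuming daily notes live at `memory/YYYY-MM-DD.md` beside MEMORY.md; that path and the DECISION:/FINDING: line markers are hypothetical placeholders, not the actual workspace layout:

```python
from pathlib import Path

WORKSPACE = Path(".")  # shared workspace all four Telegram bots point at

def consolidate(today: str) -> str:
    """Fold one day's raw notes into MEMORY.md, returning what was kept."""
    daily = WORKSPACE / "memory" / f"{today}.md"   # hypothetical daily-notes path
    memory = WORKSPACE / "MEMORY.md"
    notes = daily.read_text() if daily.exists() else ""
    # Hypothetical distillation rule: keep only lines flagged as decisions/findings.
    keep = [line for line in notes.splitlines()
            if line.startswith(("DECISION:", "FINDING:"))]
    if keep:
        with memory.open("a") as f:
            f.write(f"\n## {today}\n" + "\n".join(keep) + "\n")
    return "\n".join(keep)
```

In practice the distillation step is what the heartbeat review (or the planned overnight agent) does with judgment; the mechanical part is just read, filter, append.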
🌙 Planned: Overnight Memory Agent
Run a local model (Llama 3.2 / Qwen 7B) on the Mac mini overnight to review daily notes, consolidate MEMORY.md, and clean up stale entries — zero API cost.
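A sketch of what that overnight pass could look like, assuming the local model is served by Ollama (whose REST endpoint is `/api/generate` on port 11434); the model tag and the prompt wording are placeholders:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's local REST endpoint

def build_payload(notes: str, model: str = "qwen2.5:7b") -> dict:
    """Consolidation request: non-streaming, whole day's notes in one prompt."""
    return {
        "model": model,
        "prompt": ("Review these daily notes. Keep only entries worth carrying "
                   "into long-term memory and flag stale ones:\n\n" + notes),
        "stream": False,
    }

def review_notes(notes: str, model: str = "qwen2.5:7b") -> str:
    """POST to the local model and return its text. No API cost involved."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(notes, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Wrapped in a nightly cron entry on the Mac mini, this would chew through the daily notes while the machine is otherwise idle.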
- 0 8,12,18 * * * · Europe/Vienna · Agent: Charlie (Haiku)
- 0 8 * * * · Europe/Vienna · Channel: Telegram
- 0 7 * * * · Europe/Vienna · Channel: Discord
- 0 12 * * * · Europe/Vienna · Channel: Discord — loadHere.md with work activity (noon run); work-only entries → Discord #general
- 0 18 * * * · Europe/Vienna · Channel: Discord — loadHere.md with work activity (evening run); work-only entries → Discord #general
- 0 18 * * * · Europe/Vienna · Channel: Discord

Plan: macOS Screen Sharing + Tailscale mesh VPN
What's done: Nothing yet — both steps require sudo/admin and couldn't run headlessly.
TODO (Iggy does manually):
1. System Settings → General → Sharing → toggle Screen Sharing ON
2. Install Tailscale on Mac mini — Mac App Store (search "Tailscale") OR Terminal:
brew install --cask tailscale
3. Open Tailscale → Log in (tailscale.com, free account)
4. Install Tailscale on MacBook + Link PC (same account)
5. From MacBook: use built-in Screen Sharing app → connect to Mac mini's Tailscale IP
6. From Link (Windows): install RealVNC Viewer (free) → connect to Mac mini's Tailscale IP
0 7 * * * · Europe/Vienna
Last run: timed out (120s limit exceeded)
Consecutive errors: 1
Delivery: → Discord #daily-digest

0 12 * * * · Europe/Vienna
Last run: completed but delivery failed
Consecutive errors: 2
Delivery: → Discord #general

0 18 * * * · Europe/Vienna
Last run: delivery failed repeatedly
Consecutive errors: 9
Delivery: → Discord #general
| Job | Schedule | Model | Last Run Duration | Status |
|---|---|---|---|---|
| 🔍 Tunnel Watchdog | Every 5 min | Haiku 3.5 | ~12s | ● OK |
| ☀️ Morning Brief | Daily 08:00 | Sonnet | ~285s | ● OK |
| 📧 CDOTTZ Email (08:00) | Daily 08:00 | Haiku 3.5 | ~11s | ● OK |
| 📧 CDOTTZ Email (12:00) | Daily 12:00 | Haiku 3.5 | ~13s | ● OK |
| 📧 CDOTTZ Email (18:00) | Daily 18:00 | Haiku 3.5 | ~12s | ● OK |
| 📝 Auto Load (18:00) | Daily 18:00 | Sonnet | ~48s | ● OK |
| 🔍 Ralph Repo Monitor | Mon 09:00 | Sonnet | ~18s | ● OK |
| 🔍 Google Docs API Check | Every 14 days | Sonnet | ~119s | ● OK |
🤖 Charlie's Daily Monitoring Sweep
Every weekday at 09:00, Charlie runs a monitoring sweep: lists all cron jobs, checks for errors, compares against the dashboard calendar, and reports any discrepancies. If everything is healthy, he stays silent. If something's broken, he alerts in #charlie.
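The sweep can be sketched as a pure check, assuming job records carry a name and a consecutive-error count (these field names, and the calendar as a set of expected job names, are hypothetical shapes, not the actual cron API):

```python
def sweep(jobs: list[dict], calendar: set[str]) -> list[str]:
    """Return alert lines for #charlie, or an empty list (stay silent)
    when every job is healthy and cron matches the dashboard calendar."""
    alerts = []
    names = {j["name"] for j in jobs}
    # Any job with accumulated failures gets flagged.
    for j in jobs:
        if j.get("consecutive_errors", 0) > 0:
            alerts.append(f'{j["name"]}: {j["consecutive_errors"]} consecutive errors')
    # Discrepancies in either direction between cron and the calendar.
    for missing in sorted(calendar - names):
        alerts.append(f"{missing}: on dashboard calendar but not in cron")
    for extra in sorted(names - calendar):
        alerts.append(f"{extra}: in cron but missing from dashboard calendar")
    return alerts
```

An empty return is the "stay silent" case; anything else gets posted to #charlie.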
Qwen 3 235B — Thinking + non-thinking modes · Agentic tool use · 100+ languages
VRAM: ~14GB active (Q4) — fits RTX 5090 easily · ~130GB full model load for Mac Studio
Kimi K2 (1T/32B active) — Best-in-class agentic capabilities · Tool use, reasoning, autonomous problem-solving
VRAM: ~20GB active (Q4) — fits RTX 5090 · ~550GB full model → needs 512GB Mac Studio
DeepSeek-R1 (671B/37B active) — Competitive with o1 on math/code benchmarks
VRAM: ~22GB active (Q4) — fits RTX 5090 · ~370GB full → fits 512GB Mac Studio
Llama 3.1 405B — GPT-4 class · Largest truly open-weight dense model
VRAM: ~230GB Q4 — needs Mac Studio 512GB (won't fit any single GPU)
💡 Why MoE Changes Everything
MoE (Mixture of Experts) models have huge total parameter counts but only activate a fraction per token. This means:
- Speed: Only 22-37B params compute per token → fast inference
- Quality: 1T total params = the model has vastly more knowledge stored
- Catch: ALL parameters must be loaded into memory, even though only a few activate. So you need the RAM to hold the full model.
- RTX 5090 (32GB): Can only hold active params → runs MoE models at "small model" speed but loses quality of full model (offloading to system RAM kills speed)
- Mac Studio 512GB: Loads the ENTIRE model → full quality at moderate speed
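Rough numbers behind the catch, assuming Q4-style quantization averages about 4.5 bits per weight once format overhead is included:

```python
def q4_gb(params_billion: float) -> float:
    """Approximate Q4 memory footprint in GB: ~4.5 bits per weight."""
    return params_billion * 4.5 / 8  # 1B params at 1 byte each would be 1 GB

# Full-model load (what must fit in RAM) vs active set (what computes per token):
full_kimi   = q4_gb(1000)  # Kimi K2, 1T total   -> ~560 GB
active_kimi = q4_gb(32)    # 32B active per token -> ~18 GB
full_r1     = q4_gb(671)   # DeepSeek-R1 total    -> ~380 GB
```

The gap between those two numbers per model is exactly why the 5090's 32GB is enough to *compute* a token but nowhere near enough to *hold* the model.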
RTX 5090 (32GB)
Bandwidth: 1,792 GB/s
Price: ~€2,000
Max model: Qwen 2.5 32B (Q4), Qwen 3 22B active
Speed: ⚡⚡⚡⚡⚡ Fastest
Best for: Interactive tasks, image gen, Whisper
Mac Studio M3 Ultra (512GB)
Bandwidth: 819 GB/s
Price: ~€10,000+
Max model: Kimi K2 1T, Llama 405B, DeepSeek-R1 671B
Speed: ⚡⚡⚡ Moderate
Best for: Overnight batch, full MoE models, zero API cost
RTX 6000 Ada (48GB)
Bandwidth: 960 GB/s
Price: ~€7,000
Max model: 32B Q8 or 70B Q3
Speed: ⚡⚡⚡⚡ Fast
Best for: Bigger models than 5090, still fast
2× RTX 5090 (64GB total)
Bandwidth: ~3,584 GB/s combined
Price: ~€4,500 (GPUs) + PC
Max model: 70B Q4 across both GPUs
Speed: ⚡⚡⚡⚡ Fast (with tensor parallelism)
Best for: 70B models at high speed
📊 What Makes Sense?
| Use Case | Best Hardware | Why |
|---|---|---|
| Interactive daily use | RTX 5090 ✅ | Already own it, fastest for ≤32B |
| Overnight memory agent (light) | Mac mini ✅ | Already own it, 7B model is enough |
| Overnight deep reasoning (70B+) | Mac Studio 512GB | Silent, low power, huge model capacity |
| Full Kimi K2 (1T params) | Mac Studio 512GB | Only option that holds 550GB+ model |
| Fast 70B interactive | 2× RTX 5090 | 3.5 TB/s bandwidth, 70B fits in 64GB |
| Image gen / Whisper | RTX 5090 ✅ | CUDA optimized, fastest option |
🍎 The 512GB Sweet Spot
- Kimi K2 (1T/32B active) — Best agentic model; ~550GB at Q4 slightly exceeds the 512GB, so it needs tighter quantization to fit fully loaded.
- DeepSeek-R1 (671B/37B active) — Best open reasoning model. Fits comfortably at Q4 (~370GB).
- Llama 3.1 405B — Dense, all params active. ~230GB Q4. Fits easily. GPT-4 class.
- Qwen 3 235B (full load) — ~130GB Q4. Could run alongside Whisper + other tools simultaneously.
- Multiple models at once: Load Qwen 3 + Whisper + embedding model all in memory
⚡ Speed Reality Check
819 GB/s bandwidth on M3 Ultra. For a 405B Q4 model (~230GB):
- ~3.5 tokens/sec — readable but not instant
- Good enough for: overnight batch processing, long analysis, memory consolidation
- Not great for: interactive chat, real-time responses
- Waiting for M4 Ultra? Likely ~1,000+ GB/s → ~4.5 tok/sec, meaningful improvement
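These figures fall out of decode being memory-bandwidth-bound: each generated token streams the weights once, so throughput is roughly bandwidth divided by bytes touched per token. A quick sanity check:

```python
def tok_per_sec(bandwidth_gbs: float, weights_gb: float) -> float:
    """Memory-bound decode estimate: each token reads the weights once,
    so throughput is roughly bandwidth / bytes touched per token."""
    return bandwidth_gbs / weights_gb

m3_ultra = tok_per_sec(819, 230)    # ~3.6 tok/s for a 405B Q4 model
m4_ultra = tok_per_sec(1000, 230)   # ~4.3 tok/s at 1,000 GB/s
```

For MoE models the relevant `weights_gb` is only the active set per token, which is why they run fast despite huge total sizes.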
💰 Cost Comparison (once, not recurring)
- Mac Studio M3 Ultra 512GB: ~€10,000-12,000
- 2× RTX 5090 PC build: ~€6,000-7,000 (but only 64GB VRAM, can't run 405B)
- RTX 6000 Ada (48GB): ~€7,000 for GPU alone (still only 48GB)
- API cost equivalent: At ~$15/M tokens (Opus), €10k buys ~600M tokens. If you burn 1M tokens/day = 600 days of API. Mac Studio pays for itself in ~2 years of heavy use.
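The break-even arithmetic, treating EUR and USD as roughly equal for this back-of-envelope cut:

```python
def breakeven_days(hardware_eur: float, price_per_m_usd: float,
                   tokens_per_day_m: float, eur_usd: float = 1.0) -> float:
    """Days of API spend the hardware price would have covered."""
    tokens_m = hardware_eur * eur_usd / price_per_m_usd   # million tokens bought
    return tokens_m / tokens_per_day_m

days = breakeven_days(10_000, 15, 1)   # ~667 days at 1M tokens/day
```

At ~667 days the "~2 years of heavy use" figure holds up; lighter usage stretches it out proportionally.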
Storage: ~/Documents/meeting-copilot/ on Mac mini
Collaborator on think-ai-link/0_CORE · Agent branch only
Git identity: ben.machina / ben.machinanode@gmail.com
Used for morning briefs, reports, service registrations.
Link (Private PC) — RTX 5090 · OBS · Cursor · Elgato
Work laptop — Jira · Confluence
Free for personal use. Not yet configured.
Forwarded to: ben.machina@agentmail.to
Trigger: Sender match + has attachment
Naming: {prefix}_YYYYMMDD.pdf
Prefixes: izvod / ira / ura / ura_priv / putni
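A sketch of the naming convention as code (the helper name is hypothetical, and the bank-statement variant izvod_YYYY_NNN.pdf from the table below is deliberately left out):

```python
from datetime import date

PREFIXES = {"izvod", "ira", "ura", "ura_priv", "putni"}

def attachment_name(prefix: str, d: date) -> str:
    """Apply the {prefix}_YYYYMMDD.pdf convention for routed attachments."""
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix: {prefix}")
    return f"{prefix}_{d:%Y%m%d}.pdf"
```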
| Trusted Sender | Drive Folder | Type | Naming | Status |
|---|---|---|---|---|
| kdi@pbz.hr | CDOTTZ / 0_izvodi | Bankovni izvodi (bank statements) | izvod_YYYY_NNN.pdf | ✅ Live |
| office@cdottz.com + body: i | CDOTTZ / 01_IRA | IRA — izlazni računi (outgoing invoices) | ira_YYYYMMDD.pdf | ✅ Live |
| office@cdottz.com + body: u | CDOTTZ / 02_URA | URA — ulazni računi (incoming invoices) | ura_YYYYMMDD.pdf | ✅ Live |
| office@cdottz.com + body: p | CDOTTZ / 03_putniNalog | Putni nalozi (travel orders) | putni_YYYYMMDD.pdf | ✅ Live |
| office@cdottz.com + body: up | CDOTTZ / 02_URA_PRIV | URA PRIV — private incoming invoices | ura_priv_YYYYMMDD.pdf | ✅ Live |
| Received | Processed | Subject | Folder | File | Status | Drive | |
|---|---|---|---|---|---|---|---|