Journal of AI, ML & Cloud Architecture
Latent Thoughts
AI, ML, cloud architecture, and engineering — decoded.
All entries
16 posts
Fine-Tuning Qwen3.5 Instruct on SageMaker: 8 Cells, 1 OOM, 6 Gotchas
The Instruct recipes carried over from Base with one line changed. Sizing the GPUs didn't — 4xL40S OOMs on 9B full SFT, and g7e.12xlarge is the cheapest box that actually fits.
JEPA: The Architecture Behind LeCun's Vision for World Models
JEPA predicts in latent space, not pixel space. That one difference underpins Yann LeCun's entire blueprint for machines that learn world models, plan hierarchically, and reason by simulation.
Three Ways to Run an Agent on AWS: AgentCore Runtime, AgentCore Harness, and OpenAI Managed Agents
AWS now offers three distinct paths to deploy production agents: bring your own code (Runtime), configure a managed loop (Harness), or use OpenAI's optimized agent orchestration (Managed Agents). Here's when to choose which — and what each costs.
One Blackwell GPU Beats Four L40S: Benchmarking Qwen3.6-27B on SageMaker
A single $2.49/hr RTX PRO 6000 Blackwell GPU delivers 44.82 req/s — 2.2x faster and 14x cheaper than four L40S GPUs at $15.68/hr. We benchmark Qwen3.6-27B across g6e vs g7e instances, three containers (vLLM, LMI, SGLang), FP8 vs FP16, and short to 16K-token contexts.
Project Deal: What Happens When AI Agents Trade with Each Other
Anthropic gave 69 employees Claude agents that autonomously negotiated real trades on Slack. Stronger models got better deals — and nobody noticed. A deep-research analysis of Project Deal and what it means for AI-mediated commerce.
World Models: From Cognitive Science to Biological Simulation
A comprehensive survey of world models — from Ha & Schmidhuber's dream-training agents to AlphaFold, Evo 2, and the AI Virtual Cell. Covering three architectural generations, the JEPA debate, and how biology is recapitulating AI's history.
Writing Your First Agent Skill: From SKILL.md to AWS Agent Registry
Agent skills are the portable plugin format that works across Claude Code, GitHub Copilot, Strands Agents, and dozens more. Here's how to write one, wire it into a Strands agent, and register it in AWS Agent Registry so your whole org can discover it.
You Don't Need a Real-Time Endpoint to Predict 100 GB Every Sunday Morning
Stop paying for a 24/7 inference endpoint when all you need is a weekly batch run. A simple architecture change can cut your costs by up to 99% and your inference time from days to minutes.
The AWS Playbook: From AgentCore to Agent Registry
AWS has been building the managed infrastructure for agentic AI at enterprise scale — from AgentCore's runtime and governance services to the newly announced Agent Registry. Here's how the pieces fit together, what a real production deployment looks like, and where the gaps remain.
The Platform Engineering Playbook for AI Agents
AI agents create two distinct relationships with your Internal Developer Platform — agents *on* the platform and agents *in* the platform. Here's the technical architecture for both.
Why Your AI Program Stalls Between Pilot and Production
71% of CDOs are experimenting with generative AI. Only 6% have it in production. The gap isn't a model problem — it's an infrastructure problem, and platform engineering is how you close it.
The Self-Improving Stack — From CLI to Platform to Paradigm
The teams that win in the agentic era won't have the best agents — they'll have the best optimization loops and the governance to trust them. Here's the full platform design and the argument for why the eval is the product.
autoresearchctl — Ship the Loop as a CLI
A pip-installable CLI that bakes the seven principles, dual eval harness, and six mutation operators into six verbs: init, eval, run, log, diff, rollback.
Beyond ML Training — Autoresearch as a Universal Optimization Pattern
The autoresearch loop has nothing inherently to do with ML. I generalized it to optimize docs for SEO (22/40 → 30/40 in 5 cycles, zero LLM calls) and distilled seven principles that make the loop reliable.
Autoresearch on SageMaker — Sleep While Your GPU Fleet Experiments
Porting Karpathy's autoresearch to SageMaker with parallel hypothesis testing, warm pools, and per-experiment cost tracking. We run real experiments and look at real results.
The Autoresearch Pattern — What Karpathy Got Right (and What's Missing)
Karpathy's 630-line Python file hit 50k stars. Here's why the pattern matters, what it gets right, and the five gaps that need closing.