Journal of AI, ML & Cloud Architecture
Latent Thoughts
AI, ML, cloud architecture, and engineering — decoded.
All entries
16 posts
Fine-Tuning Qwen3.5 Instruct on SageMaker: 8 Cells, 1 OOM, 6 Gotchas
The Instruct recipes carried over from Base with one line changed. Sizing the GPUs didn't — 4xL40S OOMs on 9B full SFT, and g7e.12xlarge is the cheapest box that actually fits.
JEPA: The Architecture Behind LeCun's Vision for World Models
JEPA predicts in latent space, not pixel space. That one difference underpins Yann LeCun's entire blueprint for machines that learn world models, plan hierarchically, and reason by simulation.
Three Ways to Run an Agent on AWS: AgentCore Runtime, AgentCore Harness, and OpenAI Managed Agents
AWS now offers three distinct paths to deploy production agents: bring your own code (Runtime), configure a managed loop (Harness), or use OpenAI's optimized agent orchestration (Managed Agents). Here's when to choose which — and what each costs.
One Blackwell GPU Beats Four L40S: Benchmarking Qwen3.6-27B on SageMaker
A single $2.49/hr RTX PRO 6000 Blackwell GPU delivers 44.82 req/s — 2.2x faster and 14x cheaper than four L40S GPUs at $15.68/hr. We benchmark Qwen3.6-27B across g6e vs g7e instances, three containers (vLLM, LMI, SGLang), FP8 vs FP16, and short to 16K-token contexts.
Project Deal: What Happens When AI Agents Trade with Each Other
Anthropic gave 69 employees Claude agents that autonomously negotiated real trades on Slack. Stronger models got better deals — and nobody noticed. A deep-research analysis of Project Deal and what it means for AI-mediated commerce.
World Models: From Cognitive Science to Biological Simulation
A comprehensive survey of world models — from Ha & Schmidhuber's dream-training agents to AlphaFold, Evo 2, and the AI Virtual Cell. Covering three architectural generations, the JEPA debate, and how biology is recapitulating AI's history.
Writing Your First Agent Skill: From SKILL.md to AWS Agent Registry
Agent skills are the portable plugin format that works across Claude Code, GitHub Copilot, Strands Agents, and dozens more. Here's how to write one, wire it into a Strands agent, and register it in AWS Agent Registry so your whole org can discover it.
You Don't Need a Real-Time Endpoint to Predict 100 GB Every Sunday Morning
Stop paying for a 24/7 inference endpoint when all you need is a weekly batch run. A simple architecture change can cut your costs by up to 99% and your inference time from days to minutes.
The AWS Playbook: From AgentCore to Agent Registry
AWS has been building the managed infrastructure for agentic AI at enterprise scale — from AgentCore's runtime and governance services to the newly announced Agent Registry. Here's how the pieces fit together, what a real production deployment looks like, and where the gaps remain.
The Platform Engineering Playbook for AI Agents
AI agents create two distinct relationships with your Internal Developer Platform — agents *on* the platform and agents *in* the platform. Here's the technical architecture for both.
Why Your AI Program Stalls Between Pilot and Production
71% of CDOs are experimenting with generative AI. Only 6% have it in production. The gap isn't a model problem — it's an infrastructure problem, and platform engineering is how you close it.
The Self-Improving Stack — From CLI to Platform to Paradigm
The teams that win in the agentic era won't have the best agents — they'll have the best optimization loops and the governance to trust them. Here's the full platform design and the argument for why the eval is the product.
autoresearchctl — Ship the Loop as a CLI
A pip-installable CLI that bakes the seven principles, dual eval harness, and six mutation operators into six verbs: init, eval, run, log, diff, rollback.
Beyond ML Training — Autoresearch as a Universal Optimization Pattern
The autoresearch loop has nothing inherently to do with ML. I generalized it to optimize docs for SEO (22/40 → 30/40 in 5 cycles, zero LLM calls) and distilled seven principles that make the loop reliable.
Autoresearch on SageMaker — Sleep While Your GPU Fleet Experiments
Porting Karpathy's autoresearch to SageMaker with parallel hypothesis testing, warm pools, and per-experiment cost tracking. We run real experiments and look at real results.
The Autoresearch Pattern — What Karpathy Got Right (and What's Missing)
Karpathy's 630-line Python file hit 50k stars. Here's why the pattern matters, what it gets right, and the five gaps that need closing.