ABOUT

Oddur Sigurdsson

Independent AI Researcher · Reykjavík, Iceland

Building Deepwork Research: an autonomous AI research platform that uses Claude Code agents to conduct literature reviews, develop formal frameworks, design experiments, and write papers targeting top ML/AI venues — NeurIPS, ICML, ICLR, ACL. The premise is straightforward: give AI agents the tools, structure, and oversight to do real research, then publish the results under rigorous peer review.

Before Deepwork, I spent years working at the intersection of software engineering and machine learning, building systems that needed to reason about uncertainty, language, and scale. The platform is the natural extension of that work: instead of using AI as a tool within a larger system, the AI is the researcher, and the system exists to keep it honest.

Research Agenda

LLM Reasoning Capabilities & Limitations

Characterizing what large language models can and cannot reason about. Formal analysis of reasoning gaps — the systematic failures that persist even as models scale. Current work targets NeurIPS 2026 with a paper on computational complexity bounds for chain-of-thought reasoning.

Computational Complexity of AI Verification

Understanding the theoretical limits of verifying AI-generated outputs. When can we efficiently check that an LLM's reasoning is correct? When is verification fundamentally harder than generation? These questions connect to classical complexity theory in ways that have practical implications for AI safety.

Autonomous Agent Architectures

Designing and studying multi-agent systems that can conduct sustained, independent research. This includes failure modes (an agent failure taxonomy targeting ACL 2027), coordination protocols, and the metacognitive capabilities required for an AI system to do genuinely novel work.

Scaling Laws & Emergent Capabilities

Empirical and theoretical work on how capabilities emerge (or fail to emerge) as models scale. Particularly interested in phase transitions — the sharp thresholds where qualitative changes in behavior appear — and whether these can be predicted from first principles.

The Platform

Deepwork Research is an autonomous research platform built on Claude Code. It manages multiple research projects simultaneously, each running in an isolated git worktree with its own agent sessions, brief, and status tracking. The platform operates 24/7 — agents read papers, develop theoretical frameworks, write LaTeX, run experiments, and iterate on drafts without waiting for human input.

The architecture is deliberately simple. A TypeScript orchestrator manages project state and coordinates agent sessions. Each project follows a structured workflow: literature review, framework development, experimentation, paper writing, and revision. Agents make decisions autonomously and log their reasoning. Human oversight happens through pull request review, status dashboards, and periodic check-ins — not through constant supervision.
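As a minimal sketch of that workflow, the project lifecycle above could be modeled as a linear stage machine. This is illustrative only, not the platform's actual code: the type names, fields, and the `advance` function are hypothetical, though the stage names follow the workflow described here.

```typescript
// Hypothetical model of a project's staged workflow.
// Stage names mirror the workflow described above; everything else is illustrative.
type Stage =
  | "literature_review"
  | "framework_development"
  | "experimentation"
  | "paper_writing"
  | "revision";

const STAGES: Stage[] = [
  "literature_review",
  "framework_development",
  "experimentation",
  "paper_writing",
  "revision",
];

interface Project {
  name: string;
  worktree: string; // path to the project's isolated git worktree
  stage: Stage;
  log: string[];    // agent decisions, logged for later human review
}

// Advance a project to its next stage, recording the agent's reasoning.
// The final stage ("revision") is terminal: advancing from it is a no-op.
function advance(project: Project, reasoning: string): Project {
  const i = STAGES.indexOf(project.stage);
  const next = STAGES[Math.min(i + 1, STAGES.length - 1)];
  return {
    ...project,
    stage: next,
    log: [...project.log, `${project.stage} -> ${next}: ${reasoning}`],
  };
}

const p: Project = {
  name: "cot-complexity",
  worktree: "worktrees/cot-complexity",
  stage: "literature_review",
  log: [],
};
const p2 = advance(p, "survey complete; key gaps identified");
console.log(p2.stage); // "framework_development"
```

Keeping projects as immutable records like this makes every transition auditable: the log grows append-only, which is what lets oversight happen through review rather than supervision.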

The goal is not to remove humans from research but to test a hypothesis: that AI agents, given sufficient structure and autonomy, can produce work that survives peer review at top venues. Every paper produced by the platform will be submitted under full disclosure of its AI-driven methodology.

Links