Idea for an advanced code-optimization harness with an agent in the loop: the agent changes code, the code is built and benchmarked, and the results are fed back to the agent.
Overall design:
Metrics collection
- Metrics are streamed by appending to a CSV file
- First column is always the iteration count
- A column name can optionally indicate the direction of optimization by including the string “lower is better” or “higher is better”. If neither is present, the default is higher is better
- With multiple metrics, priority runs left to right: the leftmost metric is optimized ahead of the others
TODO: weighted metrics?
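A minimal sketch of how the harness could consume such a CSV, assuming the format above (function names and the sample data are illustrative, not part of the design):

```python
import csv
from io import StringIO

def parse_direction(column_name):
    """Default is 'higher is better' when no direction string is present."""
    return "lower" if "lower is better" in column_name else "higher"

def best_iteration(csv_text):
    """Return the iteration whose metrics are best.

    First column is the iteration count; remaining columns are metrics,
    prioritized left to right.
    """
    rows = list(csv.reader(StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    directions = [parse_direction(name) for name in header[1:]]

    # Build a sort key: negate "higher is better" metrics so that smaller
    # keys always mean better, preserving left-to-right priority via tuples.
    def key(row):
        return tuple(
            float(v) if d == "lower" else -float(v)
            for v, d in zip(row[1:], directions)
        )

    return min(data, key=key)[0]

sample = (
    "iteration,throughput,latency_ms (lower is better)\n"
    "1,120,35\n"
    "2,150,40\n"
    "3,150,31\n"
)
print(best_iteration(sample))  # "3": ties on throughput, wins on latency
```

Tuple comparison gives the left-to-right priority for free; a weighted scheme (the TODO above) would replace the tuple with a single scalar.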
Code access
The agent is configured to have read/write access only to a subset of the codebase.
The rest can be read-only or not accessible.
It should be possible to restrict to individual files, in addition to directories.
The configured restrictions are also injected into the system prompt, so the agent is “aware” of which files can be modified.
TODO: look into agent sandboxes, for example nono
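One way the access policy could be represented and checked, a sketch only: the path lists, tool name, and helper names below are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical policy: writable entries may be individual files or
# directory subtrees; everything readable-but-not-writable is read-only.
WRITABLE = [
    "src/hot_loop.c",   # individual file
    "src/kernels/",     # directory subtree
]
READABLE = ["src/", "include/"]

def _under(path, entry):
    """True if `path` equals `entry` or lies inside the `entry` subtree."""
    p = PurePosixPath(path)
    e = PurePosixPath(entry.rstrip("/"))
    return p == e or e in p.parents

def access(path):
    """Return 'rw', 'ro', or 'none' for a repo-relative path."""
    if any(_under(path, e) for e in WRITABLE):
        return "rw"
    if any(_under(path, e) for e in READABLE):
        return "ro"
    return "none"

print(access("src/kernels/gemm.c"))  # rw
print(access("src/main.c"))          # ro
print(access("secrets/key.pem"))     # none
```

The same structure could be serialized into the system prompt so the agent sees exactly the lists the enforcement code uses.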
Optimization tree
The optimization process starts from a baseline (the root of the tree) and generates branches as different options / “lines of inquiry” are explored
Each node of the tree stores inputs of the experiment (what code was changed, with a short description) and outputs (metrics and analysis of results)
Nodes are identified by dot-separated numeric IDs like 1.1 and 3.5.21
With the following meaning:
- First level is the optimization run. While exploring the solution space, we should be able to tweak optimization criteria and re-run the loop
- Every next level is the identifier of the branch taken
Example:
- 1 : baseline for the very first run
- 1.3 : experiment for the third independent change from the baseline
- 1.3.2 : a promising change on top of 1.3
- 5.1.23 : we tweaked the optimization criteria four times, so this is the fifth invocation of the optimizer. After the first change over that run’s baseline (5.1), we are now testing the 23rd variant on top of it
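The ID scheme above maps to a few small helpers; a sketch (function names are illustrative):

```python
def parse_node_id(node_id):
    """Split a dot-separated node ID into integer components."""
    return [int(part) for part in node_id.split(".")]

def run_number(node_id):
    """The first component identifies the optimization run."""
    return parse_node_id(node_id)[0]

def parent(node_id):
    """Parent node ID, or None for a run baseline (single component)."""
    parts = node_id.split(".")
    return ".".join(parts[:-1]) if len(parts) > 1 else None

def child(node_id, branch):
    """ID of the `branch`-th experiment under node_id."""
    return f"{node_id}.{branch}"

print(run_number("5.1.23"))  # 5
print(parent("5.1.23"))      # 5.1
print(child("1.3", 2))       # 1.3.2
```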
Code changes and revision control
TODO: investigate best way to track code changes
Options:
- in branches: the same tree structure defined above, with branch names that include node IDs (example: loopty-5.1.23)
- as diffs stored in each node (directory) of the optimization tree
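If the branch option is chosen, each experiment branches off its parent's branch; a sketch assuming a plain git repo (the `loopty` prefix follows the example above, the function names are hypothetical):

```python
import subprocess

def branch_name(node_id, prefix="loopty"):
    """Branch naming from the example above: loopty-5.1.23."""
    return f"{prefix}-{node_id}"

def create_experiment_branch(node_id, parent_id):
    """Create the branch for node_id, starting from its parent's branch."""
    subprocess.run(
        ["git", "checkout", "-b", branch_name(node_id), branch_name(parent_id)],
        check=True,
    )
```

The diff-per-node option avoids polluting the repo with hundreds of branches, but reconstructing a node's full state then requires replaying diffs from the root.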
Context management
Before every experiment, the tool builds context for the agent.
- Initial statement from the user, describing the problem
- Short descriptions of each code change, from the root down to the parent of the new experiment
- Short summaries of each result, along the same root-to-parent path
TODO: what if we provided summaries of all experiments, not just the current line of inquiry?
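The context assembly above can be sketched as follows, assuming nodes are kept in a mapping from node ID to its stored change description and result summary (the structure and sample data are illustrative):

```python
def lineage(node_id):
    """Node IDs from the run baseline down to node_id itself."""
    parts = node_id.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]

def build_context(problem_statement, parent_id, nodes):
    """Assemble the agent prompt for a new experiment under parent_id.

    `nodes` maps node ID -> {"change": short description, "result": summary}.
    """
    lines = [problem_statement, ""]
    for nid in lineage(parent_id):
        node = nodes[nid]
        lines.append(f"[{nid}] change: {node['change']}")
        lines.append(f"[{nid}] result: {node['result']}")
    return "\n".join(lines)

nodes = {
    "1": {"change": "baseline", "result": "120 req/s"},
    "1.3": {"change": "unrolled inner loop", "result": "150 req/s"},
}
print(build_context("Optimize request throughput.", "1.3", nodes))
```

The TODO variant would iterate over all of `nodes` instead of only `lineage(parent_id)`, trading context size for cross-branch awareness.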
Protections - rate limits and token budget
An unbounded loop that keeps consuming tokens could become very expensive under unexpected failure modes.
Include at least two protections:
- rate limit invocations of the main optimization loop
- token budget: the optimization run is terminated if a configured maximum number of tokens has been used across all agent invocations
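Both protections fit in a small guard object; a sketch with illustrative names and default limits:

```python
import time

class Budget:
    """Sketch of the two protections: call-rate limit and total token budget."""

    def __init__(self, min_seconds_between_calls=30, max_total_tokens=2_000_000):
        self.min_interval = min_seconds_between_calls
        self.max_tokens = max_total_tokens
        self.tokens_used = 0
        self._last_call = None

    def before_invocation(self):
        """Sleep if the loop is running faster than the configured rate."""
        now = time.monotonic()
        if self._last_call is not None:
            wait = self.min_interval - (now - self._last_call)
            if wait > 0:
                time.sleep(wait)
        self._last_call = time.monotonic()

    def record(self, tokens):
        """Account for tokens used; abort the run when the budget is exhausted."""
        self.tokens_used += tokens
        if self.tokens_used > self.max_tokens:
            raise RuntimeError(
                f"token budget exceeded: {self.tokens_used} > {self.max_tokens}"
            )
```

The main loop would call `before_invocation()` before each agent call and `record()` with the tokens reported afterwards; the raised error is the run-termination signal.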