# have2lemma
Extract have statements from proofs and convert them into standalone lemmas.
## Python API

```python
result = await axle.have2lemma(
    content=lean_code,
    environment="lean-4.26.0",
    names=["main_theorem"],      # Optional
    include_have_body=False,     # Optional: use sorry instead
    include_whole_context=True,  # Optional
    reconstruct_callsite=False,  # Optional
    verbosity=0,                 # Optional: 0-2
)
print(result.content)
print(result.lemma_names)  # ["main_theorem.h1", "main_theorem.h2"]
```
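Since the call is awaited, it needs to run inside an event loop. A minimal end-to-end sketch, assuming the client is importable as axle (the snippet above takes that import for granted):

```python
import asyncio

import axle  # assumed import name, matching the call above

LEAN_CODE = """import Mathlib
theorem foo : 1 = 1 ∧ 2 = 2 := by
  have h1 : 1 = 1 := by rfl
  have h2 : 2 = 2 := by rfl
  exact ⟨h1, h2⟩
"""

async def main() -> None:
    result = await axle.have2lemma(content=LEAN_CODE, environment="lean-4.26.0")
    print(result.content)
    print(result.lemma_names)

asyncio.run(main())
```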
## CLI

```
Usage: axle have2lemma CONTENT [OPTIONS]
```

```bash
# Extract all have statements
axle have2lemma theorem.lean --environment lean-4.26.0

# Extract from specific theorems
axle have2lemma theorem.lean --names main_proof,helper --environment lean-4.26.0

# Include proof bodies in extracted lemmas
axle have2lemma theorem.lean --include-have-body --environment lean-4.26.0

# Reconstruct callsites (replace have with lemma call)
axle have2lemma theorem.lean --reconstruct-callsite --environment lean-4.26.0

# Skip context cleanup
axle have2lemma theorem.lean --no-include-whole-context --environment lean-4.26.0

# Pipeline usage
cat theorem.lean | axle have2lemma - --environment lean-4.26.0 | axle check - --environment lean-4.26.0
```
## HTTP API

```bash
curl -s -X POST http://10.1.10.100:8989/v1/axle/have2lemma \
  -d '{"content": "import Mathlib\ntheorem foo : 1 = 1 ∧ 2 = 2 := by\n have h1 : 1 = 1 := by rfl\n have h2 : 2 = 2 := by rfl\n exact ⟨h1, h2⟩", "environment": "lean-4.26.0"}' | jq
```
## Input Parameters

| Name | Type | Required | Description |
|---|---|---|---|
| content | str | Yes | A string of Lean code |
| environment | str | Yes | Environment to use (e.g., "lean-4.26.0", "lean-4.25.1") |
| names | list[str] | No | List of theorem names to process |
| indices | list[int] | No | List of theorem indices (0-based, negative allowed) |
| include_have_body | bool | No | Include have bodies in extracted lemmas (default: false, uses sorry) |
| include_whole_context | bool | No | Include whole context when extracting (default: true) |
| reconstruct_callsite | bool | No | Replace have statement with lemma call (default: false) |
| verbosity | int | No | Pretty-printer verbosity: 0=default, 1=robust, 2=extra robust |
| ignore_imports | bool | No | If true, ignore import statement mismatches and use the environment's default imports. If false, raise an error on mismatch. Default: false |
| timeout_seconds | float | No | Maximum execution time (default: 120) |

Note: Specify only one of names or indices. If neither is specified, all theorems are processed.
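For example, the same selection can be expressed either way (a sketch reusing the Python call from above; the theorem name and index are placeholders):

```python
# Select theorems by name...
by_name = await axle.have2lemma(
    content=lean_code,
    environment="lean-4.26.0",
    names=["main_theorem"],
)

# ...or by 0-based position; negative indices count from the end.
by_index = await axle.have2lemma(
    content=lean_code,
    environment="lean-4.26.0",
    indices=[-1],  # last theorem in the file
)
```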
## Output Fields

| Name | Type | Description |
|---|---|---|
| lean_messages | dict | Messages from Lean compiler with errors, warnings, infos lists |
| tool_messages | dict | Messages from have2lemma tool with errors, warnings, infos lists |
| content | str | The transformed Lean code with have statements extracted as lemmas |
| lemma_names | list[str] | List of names of the newly created lemmas |
| timings | dict | Timing info: total_ms, parse_ms |
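As a quick illustration, a caller might check both message lists before trusting the rewritten file. A sketch, assuming the Python result object exposes these fields as attributes (as result.content and result.lemma_names are above):

```python
result = await axle.have2lemma(content=lean_code, environment="lean-4.26.0")

if result.lean_messages["errors"] or result.tool_messages["errors"]:
    # Something went wrong; inspect the messages instead of using the output.
    print(result.lean_messages["errors"])
    print(result.tool_messages["errors"])
else:
    print(f"extracted {len(result.lemma_names)} lemmas "
          f"in {result.timings['total_ms']} ms")
    print(result.content)
```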
## Example Response

```json
{
  "lean_messages": {
    "errors": [],
    "warnings": [],
    "infos": []
  },
  "tool_messages": {
    "errors": [],
    "warnings": [],
    "infos": []
  },
  "content": "import Mathlib\n\nlemma foo.h1 : 1 = 1 := sorry\n\nlemma foo.h2 (h1 : 1 = 1) : 2 = 2 := sorry\n\ntheorem foo : 1 = 1 ∧ 2 = 2 := by\n have h1 : 1 = 1 := by rfl\n have h2 : 2 = 2 := by rfl\n exact ⟨h1, h2⟩",
  "lemma_names": ["foo.h1", "foo.h2"],
  "timings": {
    "total_ms": 95,
    "parse_ms": 88
  }
}
```
## Demo
There are a lot of configurable options for have2lemma. Let's go through them and discuss why they exist.
### Options

There are three main options to discuss:

- include_have_body: Whether to include have bodies in extracted lemmas. If false, lemmas will use sorry instead. Defaults to false.
- include_whole_context: Whether to include the whole context (skip cleanup) when extracting have statements. Defaults to true.
- reconstruct_callsite: Whether to reconstruct the callsite (replace the have statement with a lemma call). Defaults to false.
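These map one-to-one onto the keyword arguments of the Python API shown earlier; a sketch with all three spelled out explicitly:

```python
result = await axle.have2lemma(
    content=lean_code,
    environment="lean-4.26.0",
    include_have_body=False,     # sorry out the extracted proofs
    include_whole_context=True,  # keep the full local context
    reconstruct_callsite=False,  # leave the original theorem untouched
)
```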
### Default behavior

Let's look at the following simple example.
```lean
theorem example_theorem (p q r : Prop) : p ∧ r → p ∨ q := by
  intro hpr
  have h1 : p := by simp_all
  have h2 : r := by simp_all
  left
  assumption
```
The default behavior uses include_have_body=false, include_whole_context=true, and reconstruct_callsite=false, giving us:
```lean
lemma example_theorem.h1 (p q r : Prop) (hpr : p ∧ r) : p := sorry

lemma example_theorem.h2 (p q r : Prop) (hpr : p ∧ r) (h1 : p) : r := sorry

theorem example_theorem (p q r : Prop) : p ∧ r → p ∨ q := by
  intro hpr
  have h1 : p := by simp_all
  have h2 : r := by simp_all
  left
  assumption
```
- Both generated lemmas are sorried out -- this is the result of include_have_body=false.
- In both lemmas, the entire local context is provided, which is the result of include_whole_context=true. This might include redundant variables -- in this case, q isn't relevant to the goal.
- The main theorem is left unchanged -- this is the result of reconstruct_callsite=false.
### include_have_body

Let's see what happens if we set this value to true:

```lean
lemma example_theorem.h1 (p q r : Prop) (hpr : p ∧ r) : p := by simp_all

lemma example_theorem.h2 (p q r : Prop) (hpr : p ∧ r) (h1 : p) : r := by simp_all
```
Why bother making this configurable?
This option is NOT guaranteed to be robust, and might introduce errors into the file. In this example:
```lean
theorem complex_types : ∀ (n : Nat), n + 0 = n := by
  intro n
  have base : 0 + 0 = 0 := by rfl
  have step : ∀ m, m + 0 = m → (m + 1) + 0 = m + 1 := by
    intro m ih
    rfl
  sorry
```

With include_have_body=true, the extracted lemma for step comes out as:

```lean
lemma complex_types.step : ∀ (n : ℕ), 0 + 0 = 0 → ∀ (m : ℕ), m + 0 = m → m + 1 + 0 = m + 1 := by
  intro m ih
  rfl
```

Notice that the lemma statement now also quantifies over the local context, including n in the type. This means the proof will fail, because there is a missing intro n ... at the start of the proof body.
### include_whole_context

Now let's set this option to false. In our original example, this gives us:

```lean
lemma example_theorem.h1 (p r : Prop) (hpr : p ∧ r) : p := sorry

lemma example_theorem.h2 (p r : Prop) (hpr : p ∧ r) (h1 : p) : r := sorry
```
Notice that the tool has now removed the q variable from both lemmas, as it is irrelevant to the goal and hypotheses.
Why make this configurable?
In general, Lean's dependency analysis is purely based on heuristics. See the source:
> A variable is relevant if (1) it occurs in the target type, (2) there is a relevant variable that depends on it, or (3) the type of the variable is a proposition that depends on a relevant variable.
Therefore, it's possible that a hypothesis in the context is useful even though Lean judges it to be irrelevant. In rare cases, it can break the proof when used in conjunction with include_have_body=true. For example:
```lean
theorem foo : Odd 5 ∨ Even 5 := by
  have odd : Odd 5 := by exists 2
  have sol : Odd 5 ∨ Even 5 := by
    left
    assumption
  exact sol
```
With include_have_body=true and include_whole_context=false, the tool again extracts both have statements as lemmas. Notably, in the second lemma, Lean judged the hypothesis odd as irrelevant and dropped it -- no good! The proof body now breaks on assumption.
### reconstruct_callsite

Our final option is the most intricate. Let's try enabling this option:

```lean
...

theorem example_theorem (p q r : Prop) : p ∧ r → p ∨ q := by
  intro hpr
  have h1 : p := example_theorem.h1 p q r hpr
  have h2 : r := example_theorem.h2 p q r hpr h1
  left
  assumption
```
Why make this configurable?
Let's make a very small change to our original proof. Instead of running intro hpr, we'll have Lean generate the name for us, and just run intros.
Then we run have2lemma again:
```lean
lemma example_theorem.h1 (p q r : Prop) (a : p ∧ r) : p := sorry

lemma example_theorem.h2 (p q r : Prop) (a : p ∧ r) (h1 : p) : r := sorry

theorem example_theorem (p q r : Prop) : p ∧ r → p ∨ q := by
  intros
  have h1 : p := sorry /- try using example_theorem.h1 here -/
  have h2 : r := sorry /- try using example_theorem.h2 here -/
  left
  assumption
```
By running intros, we introduce a new hypothesis with type p ∧ r -- but we haven't given it a name! This means we can't ever refer to it explicitly (i.e., it is inaccessible). (This is a Lean quirk which can be disabled, but hygienic names are generally a good thing.) have2lemma automatically generated the name a in the lemmas, but we can't assign anything to it -- so our tool complains that we've encountered an inaccessible variable, and gives up.
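When reconstruction gives up like this, the complaint presumably lands in the tool's own message lists rather than disappearing silently. A sketch of surfacing it, assuming tool_messages is the dict of errors/warnings/infos documented in the output fields:

```python
result = await axle.have2lemma(
    content=lean_code,
    environment="lean-4.26.0",
    reconstruct_callsite=True,
)

# Print anything the tool itself reported (e.g., about an inaccessible
# variable), separately from the Lean compiler's messages.
for level in ("errors", "warnings", "infos"):
    for msg in result.tool_messages.get(level, []):
        print(f"[tool/{level}] {msg}")
```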
### Verbosity
The verbosity parameter controls how explicit the pretty-printer is when generating lemma signatures. Higher verbosity levels produce more explicit output, which can help avoid ambiguity in complex type situations.
- verbosity=0 (default): Standard pretty-printing options
- verbosity=1: Robust options with additional explicitness
- verbosity=2: Extra robust options with maximum explicitness
#### When to use higher verbosity
Consider this example involving coercions:
```lean
theorem explicit_coercion_test (n : ℕ) (hn : n > 0) : True := by
  have h : (∑ i : Fin n, (1 : ℝ) / (i.val + 1)) ≤ (harmonic n : ℝ) + 1 := by
    sorry
  trivial
```
With default verbosity (verbosity=0), the coercion (harmonic n : ℝ) may be pretty-printed as Rat.cast (harmonic n), losing the target type ℝ. This causes Lean to fail with errors like "failed to synthesize RatCast ℕ" because it can't infer the correct target type for the coercion.
With verbosity=2, the pretty-printer uses pp.explicit=true, which preserves the target type information and produces a valid lemma signature.
Rule of thumb: If you encounter type inference errors in generated lemmas—especially involving coercions, casts, or polymorphic functions—try increasing the verbosity level.
Do note that at verbosity=2, type signatures may become incredibly complex and unreadable, so it should be used sparingly.
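A practical pattern is to retry extraction at increasing verbosity only when the generated output fails to compile. A sketch, assuming lean_messages reflects compilation of the transformed content (the helper name and control flow are illustrative, not part of the API):

```python
async def have2lemma_with_retry(lean_code: str):
    result = None
    # Try the default pretty-printer first, then progressively more
    # explicit settings if the generated lemmas do not elaborate.
    for verbosity in (0, 1, 2):
        result = await axle.have2lemma(
            content=lean_code,
            environment="lean-4.26.0",
            verbosity=verbosity,
        )
        if not result.lean_messages.get("errors"):
            break
    return result  # may still contain errors; check result.lean_messages
```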
## Summary
These configuration options provide some flexibility around usage, at the cost of correctness in some cases. Try to keep this in mind when generating bug reports -- some of these errors aren't fixable without significant effort.