From Esterman Disagreement to Rule-Checking Web App

Sydney ophthalmologist Dr Simon Chen explains how a borderline case, a quiet correction from an optometrist, and a failed image-parsing artificial intelligence (AI) experiment led to a free Austroads driving-fields tool for Australian clinicians.

A patient came into clinic after his driver’s licence had been refused on the basis of his Esterman visual field. When I looked at the printout, the horizontal extent appeared to be well above the Austroads cut-off for a private licence. I asked two experienced ophthalmology colleagues and a couple of optometrists what they thought. All four agreed with me. Between us, we concluded that the licensing authority had probably read the field wrong.

He came back a month later and told me the decision had been upheld on review. I assumed the authority had been stubborn. Then another optometrist pointed out, quietly, that I had been measuring it wrong. Austroads measures horizontal extent from the last point seen on each side, not from the last point not seen. Once you reread the field with the correct convention, the licensing decision was right. All five of us, looking at the same Esterman, had been confidently wrong.

That kind of case stays with you. The interesting question wasn’t whose reading was correct. It was that five experienced clinicians had looked at the same printout and disagreed with the licensing authority’s reading of it.

Since then, I have shown difficult fields to colleagues and discussed the rules with both optometrists and ophthalmologists. My impression is that the disagreement was not unusual. Competent clinicians can read the same field differently because the operational details are easy to misremember.

What’s at stake cuts both ways. Get the rules wrong in one direction and a patient who actually meets the standard loses their licence: a tradesperson who can’t drive to jobs, an aged-care worker on a 6am start, a parent who can no longer ferry children, an older patient who becomes dependent on family, anyone in a rural area with no realistic public transport. Get it wrong the other way and someone who shouldn’t be driving stays on the road, putting their passengers, other motorists, pedestrians, and cyclists at risk of serious injury.

Why the Rules are Harder Than They Look

Most Esterman fields are easy to interpret. There is a clear pass or a clear fail, and the rule application is unambiguous. But a meaningful minority sit close to the thresholds, and those are the cases where the operational details of Austroads start to matter.

The Australian standard, Austroads Assessing Fitness to Drive,¹ is the source document. The visual-field section is not especially long, but it is more nuanced than many of us remember when a printout lands on the desk between patients.

For private drivers, a binocular Esterman with at least 110° of horizontal extent may meet the unconditional numeric field requirement, assuming the central field and reliability criteria are also satisfactory. A field between 90° and 109° may support a conditional licence, depending on the broader clinical situation and the licensing authority’s decision. Commercial driving is stricter: a commercial unconditional licence is not normally compatible with a significant field defect, and a conditional pathway requires at least 140° of horizontal extent, no field loss likely to impede driving, and careful consideration of the driving task.
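The numeric thresholds quoted above can be written as a small lookup. The Python sketch below is illustrative only and is not taken from DRIVE Fields: the enum and function names are hypothetical, it assumes that any extent from 90° up to (but not including) 110° falls in the conditional range, and it deliberately ignores the central-field, reliability, and clinical-context criteria that also apply.

```python
from enum import Enum


class NumericBand(Enum):
    """Where a horizontal extent sits relative to the figures quoted in the text."""
    UNCONDITIONAL_RANGE = "may meet the unconditional numeric requirement"
    CONDITIONAL_RANGE = "may support a conditional licence"
    BELOW_CONDITIONAL_RANGE = "below the quoted numeric thresholds"


def private_numeric_band(horizontal_extent_deg: float) -> NumericBand:
    """Band for a private licence, using only the 110 and 90 degree figures.

    Assumption: values between 109 and 110 are treated as conditional.
    """
    if horizontal_extent_deg >= 110:
        return NumericBand.UNCONDITIONAL_RANGE
    if horizontal_extent_deg >= 90:
        return NumericBand.CONDITIONAL_RANGE
    return NumericBand.BELOW_CONDITIONAL_RANGE


def commercial_conditional_numeric_ok(horizontal_extent_deg: float) -> bool:
    """True if the extent reaches the 140 degree figure for a commercial
    conditional pathway. The other commercial requirements are not modelled."""
    return horizontal_extent_deg >= 140
```

A result from either function is only the numeric part of the picture. It says nothing about whether the licensing authority would grant the licence.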

Those numbers sound straightforward, but the 120-point Esterman grid hides its complexity. Horizontal extent is measured from seen point to seen point within a band around the meridian. Small clusters of missed points on or across the meridian may be disregarded in specific circumstances. Central field rules turn on topology, not point counts. Reliability is not a footnote: the false-positive rate matters and fixation monitoring should be recorded. None of this is hidden in the document. It is just easier to remember roughly than to apply exactly when a borderline field is in front of you between patients.
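The seen-point convention is easier to see with numbers. The Python sketch below uses a made-up row of test points, not the real Esterman coordinates, and is not the DRIVE Fields implementation. The `Point` type, both function names, and the `band_half_height_deg` parameter are hypothetical, and the sketch ignores the cluster-disregard rules entirely. It simply contrasts measuring between the outermost seen points with the mistaken habit of measuring out to the neighbouring missed points.

```python
from typing import List, NamedTuple


class Point(NamedTuple):
    x_deg: float   # horizontal position, negative = left of fixation
    y_deg: float   # vertical position relative to the horizontal meridian
    seen: bool


def extent_seen_to_seen(points: List[Point], band_half_height_deg: float) -> float:
    """Distance between the leftmost and rightmost SEEN points in the band."""
    xs = [p.x_deg for p in points
          if p.seen and abs(p.y_deg) <= band_half_height_deg]
    return max(xs) - min(xs) if xs else 0.0


def extent_to_missed_points(points: List[Point], band_half_height_deg: float) -> float:
    """The mistaken reading: measure out to the first MISSED point on each side."""
    in_band = [p for p in points if abs(p.y_deg) <= band_half_height_deg]
    seen_xs = [p.x_deg for p in in_band if p.seen]
    if not seen_xs:
        return 0.0
    right_missed = [p.x_deg for p in in_band if not p.seen and p.x_deg > max(seen_xs)]
    left_missed = [p.x_deg for p in in_band if not p.seen and p.x_deg < min(seen_xs)]
    right = min(right_missed) if right_missed else max(seen_xs)
    left = max(left_missed) if left_missed else min(seen_xs)
    return right - left


# Toy data: points every 10 degrees along the meridian, seen out to 50 degrees.
toy_row = [Point(float(x), 0.0, abs(x) <= 50) for x in range(-70, 71, 10)]

correct = extent_seen_to_seen(toy_row, band_half_height_deg=5.0)       # 100.0
mistaken = extent_to_missed_points(toy_row, band_half_height_deg=5.0)  # 120.0
```

On this toy row the two conventions differ by 20°, which is enough to move a field from one side of a threshold to the other.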

This is particularly relevant in glaucoma. Many glaucoma patients drive safely for years, but the conversation becomes difficult when field loss approaches the thresholds. Optometrists often carry the front-line burden of that discussion: detecting and monitoring disease, and explaining why a seemingly technical result can have a very practical licensing consequence.

Why I Built DRIVE Fields

I wanted a second set of eyes for this problem. Not a tool that replaced clinical judgement, and certainly not something that pretended to be the licensing authority. Something narrower: take the field data the clinician has already reviewed, apply the rules consistently, show the working, and produce a report that can be checked against Austroads.

I’m an ophthalmologist by training, with the bulk of my clinical work in cataract and retinal surgery. I am not a software developer. Over recent years I have become increasingly interested in how AI can be applied to day-to-day work and life. I had been using it for writing, lecture preparation, personal finance, business administration, knowledge management, personal workflow tools, and a tool for analysing genomic data.

As the tools have matured, the bar for what a non-developer clinician can build has shifted. A clinician can now describe a precise clinical problem in ordinary language and, with enough correction and iteration, turn the specification into working software. The clinical truth still has to come from the clinician: the source document, the edge cases, and the uncomfortable question of what is safe enough for a real clinical record.

The Clever Version That Wasn’t Safe Enough

The first version of DRIVE Fields (DRIVE stands for Driving Rules Interpreter – Visual-field Engine) was the obvious one: upload a photo of the printed Esterman and let AI do the rest. The clinician would photograph the printout with a phone, an AI image parser would read off the missed points, apply the Austroads logic, and produce a verdict. No data entry. Just upload and wait.

It was attractive because it was frictionless. In early testing it worked often enough to be tempting. Then I started feeding it the kinds of photos clinicians actually take: glare across the page, slight crops, phones held at angles. Grid layouts differed between Humphrey and Medmont reports. Marker conventions for seen versus missed varied. The AI image parser sometimes made confident-looking errors.

I spent a few hours trying to push it further. Image preprocessing helped. Better grid registration helped. I applied Andrej Karpathy’s ‘auto-research’ workflow, where an AI reviews its own outputs and autonomously adjusts its prompts to iterate towards a better result. I also used different large language models to critique each other’s outputs and refine the parsing logic from there. Accuracy climbed into the 80–90% region.

Ninety per cent accuracy sounds impressive. It is useless for clinical decision support. A one-in-10 error rate on a medico-legally significant judgement is not something you ship, even with a clinician review step bolted on top, because the wrong answer can quietly bias the reviewer.

There was a deeper problem. An Esterman is already a grid of yes-or-no answers. Asking a probabilistic AI to reconstruct that discrete information from a photograph adds uncertainty to data that didn’t need to be uncertain. It is much more reliable to have the clinician enter the points directly.

The Boring Version is Better

So, I built the manual grid. The clinician looks at the printed Esterman and taps the missed points on a 120-point on-screen grid that mirrors the printed layout, which takes less than a minute. The clinician then enters the relevant clinical details: licence class, state or territory, visual acuity, false-positive rate, fixation monitoring, monocular status, test type, and the source of the printout.

That minute of data entry buys something important: a structured, clinician-confirmed data set fed into a deterministic rule engine. The same input produces the same output every time. There is no language model interpreting the case at runtime, and no image parser deciding which points were missed. The code applies the Austroads rules, reports the measurements, and shows its working.

There is another benefit of staying away from image upload. De-identifying clinical photographs is easy to get wrong. Markers, stickers, and partial labels can all leave identifying information in the picture. The safest way to avoid that mistake is not to need the image at all, and the manual grid removes the question entirely.

The result is not simply ‘pass’ or ‘fail’. It may indicate that the entered findings appear to meet Austroads unconditional criteria, may support conditional licensing, do not meet criteria, are unreliable, or require manual review. The manual-review category is deliberate. Borderline cases should not be made falsely neat just because software prefers a tidy answer. If the field is near a threshold, the reliability data are missing, or the clinical context falls outside the rule engine’s safe operating area, the app should stop short of confidence.

The same caution applies to device selection. Perimeters do not always test identical point layouts, even when the printout looks familiar. The app supports common and less common devices. Where a device’s test pattern hasn’t been confirmed equivalent to the standard Esterman, the result is flagged for manual review rather than reported as a confident verdict.
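The result categories and the manual-review gate described above can be sketched as a pure function. This is a simplified illustration under stated assumptions, not the app's code: every name is hypothetical, the reliability limit and the near-threshold margin are left as caller-supplied parameters because the text does not give them, only private-licence horizontal extent is modelled, and central-field rules are omitted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    MEETS_UNCONDITIONAL = "appears to meet unconditional criteria"
    MAY_SUPPORT_CONDITIONAL = "may support conditional licensing"
    DOES_NOT_MEET = "does not meet criteria"
    UNRELIABLE = "unreliable test"
    MANUAL_REVIEW = "requires manual review"


@dataclass(frozen=True)
class Findings:
    horizontal_extent_deg: float
    false_positive_rate: Optional[float]   # None means not recorded
    fixation_monitoring_recorded: bool
    device_pattern_confirmed: bool         # pattern known to match the standard grid


def assess_private(f: Findings,
                   max_false_positive_rate: float,
                   review_margin_deg: float) -> Verdict:
    """Deterministic: the same Findings and parameters always give the same Verdict."""
    # Missing reliability data or an unconfirmed device pattern: no confident answer.
    if f.false_positive_rate is None or not f.fixation_monitoring_recorded:
        return Verdict.MANUAL_REVIEW
    if not f.device_pattern_confirmed:
        return Verdict.MANUAL_REVIEW
    # Reliability is checked before any threshold is applied.
    if f.false_positive_rate > max_false_positive_rate:
        return Verdict.UNRELIABLE
    # Fields close to either threshold are handed back to the clinician.
    if any(abs(f.horizontal_extent_deg - t) <= review_margin_deg for t in (90, 110)):
        return Verdict.MANUAL_REVIEW
    if f.horizontal_extent_deg >= 110:
        return Verdict.MEETS_UNCONDITIONAL
    if f.horizontal_extent_deg >= 90:
        return Verdict.MAY_SUPPORT_CONDITIONAL
    return Verdict.DOES_NOT_MEET
```

Because the function has no hidden state and calls no model at runtime, repeating a case reproduces the verdict exactly. The ordering of the checks is a design choice in this sketch: anything that undermines confidence is handled before a threshold is applied.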

Figure 1. Esterman printout and DRIVE Fields grid, side by side. The DRIVE Fields manual grid (right) mirrors the printed Esterman report (left). The clinician taps the missed points onto a 120-point on-screen grid, which takes less than a minute. The same input produces the same Austroads verdict every time.

Figure 2. Manual grid in use. The on-screen Esterman grid in use, with missed points marked in red. The app shows live measurements, including the horizontal extent and the position of the cursor in degrees from fixation, as the clinician enters points.

What the Clinician Sees

The tool is live at drivefields.com.au. It is free, runs in the browser, requires no account, and has no commercial interests behind it: no advertising, no sponsorship, no affiliation with any perimetry manufacturer. All processing happens locally on the clinician’s device. No patient data is collected or transmitted. The full Austroads section 10.2 chapter is reproduced verbatim inside the app, so any clinician who wants to audit the logic against the source can do so in a single click.

The output is designed for the clinic. It shows the computed measures, the relevant Austroads clauses, the points contributing to each finding, and a printable summary or detailed report. That helps with documentation. It also helps with patient explanation. Many patients find a difficult driving conversation easier to understand when the reasoning is visible. “The licensing authority decides, but here is how the field aligns with the national standard” is a better conversation than “I think it probably passes, but I’m not sure how they will read it”.

The app also has a Learning Centre. I started with a small help section, then realised the educational part was part of the tool rather than a side note. It now covers test selection, printout checks, Esterman anatomy, the roving Esterman, common rule traps, device guidance, a glossary, and demo cases. A clinician using the app repeatedly should become more familiar with the rules, not more dependent on the tool.

Figure 3. Print-ready clause-referenced report. The print-ready DRIVE Fields report documents the verdict, the computed measures (horizontal extent, central misses, false-positive rate, central cluster size), and the Austroads clause supporting each finding. The report attaches directly to the patient’s notes.

What AI Actually Did – and Didn’t – Do

In this project, AI coding agents (Claude Code and OpenAI Codex) helped create the interface, write the rule functions, generate tests, and challenge assumptions. The development loop was roughly as follows: I would describe the clinical behaviour I needed in plain English. The agent would propose an implementation and write the code. I would review the output as a clinician: Does this actually match what Austroads says?

A second AI would then review the same code from a ‘what could break this?’ angle, raising objections the first agent hadn’t considered. What if fixation monitoring is missing? What if the patient is applying for a commercial licence? What if the field source isn’t actually equivalent to the standard Esterman grid? Some of the most useful AI outputs in the project weren’t code at all, but these kinds of objections. For one of the more fiddly components, the adversarial pass surfaced four real bugs I would otherwise have overlooked.

The honest division of labour was something like 80/20. The agents handled the scaffolding, the plumbing, and the first pass at the user interface. The remaining 20% was clinician work and could not have been outsourced: the clinical specification, the edge-case judgement, the final reading of Austroads, and the question of what counts as safe enough. The non-developer clinician’s job here is to be precise about clinical truth and to argue with what the agent produces. The code becomes secondary.

AI also helped me iterate the user interface design. A key priority was making the app feel intuitive, visually pleasant, and efficient to use in a busy clinic. I also made a short walkthrough that runs through the workflow. I wanted it to feel like a proper software launch rather than a silent screen recording. Good tech-product launches use upbeat music that builds anticipation, so I asked an AI to write a production brief for a tech-launch soundtrack and used a separate AI music generator to produce the audio. AI prompting AI to score a video about AI helping a non-developer build a clinical decision-support tool. That was useful, and slightly amusing, but it was packaging. The clinical core still had to be boring in the right way.

But AI was not the right mechanism for the clinical decision itself. The deployed tool does not ask an AI model whether the patient appears fit to drive. It asks the clinician to enter the facts, then applies deterministic rules to those facts. That distinction matters. AI was helpful in the workshop. It should not be the judge in the clinic.

The failed image parser was not wasted work. It taught me something I probably would not have learnt from a purely theoretical discussion of AI safety: Better AI is not always the safer answer. Sometimes the safer answer is a simpler interface, a transparent rule engine, and a clinician who remains firmly in the loop.

Where It Sits Now

DRIVE Fields is live and being used in clinics around Australia. Representatives from Optometry Australia, the Royal Australian and New Zealand College of Ophthalmologists, and Glaucoma Australia, along with numerous optometrists and ophthalmologists in clinical practice, have reviewed it very positively.

The Austroads visual-field guidelines themselves may also be revised in 2027. If they are, the app’s rule engine will be updated to match.

Where This Fits In Practice

I see DRIVE Fields as a practical adjunct for clinicians who already perform these assessments. It is most useful when the field is borderline and when the clinician wants a clear record of the rule basis. The printable report, with the relevant Austroads clauses sitting next to the computed measures, makes a useful addition to the patient’s notes. It documents the logical basis for the advice you give about the result, which is harder to reconstruct later from memory if the assessment is ever queried. The cases are also useful for teaching.

The tool is not a Therapeutic Goods Administration-listed medical device, not a licensing authority, and not a guarantee of any outcome. The final licensing decision sits with the driver licensing authority. Clinical judgement sits with the clinician. As the population ages and field-affecting conditions remain common, patients deserve a better workflow than memory, rough counting, and hope.

The tool will improve as clinicians argue with it. I would be particularly interested in de-identified examples where the app appears too strict, too generous, unclear, or inconsistent with a licensing authority’s interpretation. Send them to feedback@drivefields.com.au.

Dr Simon Chen MBBS BSc (Hons) FRANZCO is an experienced cataract and vitreoretinal surgeon at Vision Eye Institute, Chatswood, Sydney, and an Adjunct Senior Lecturer at the University of New South Wales. His practice focuses on cataract and retinal surgery, with particular experience in complex cataract surgery in eyes with retinal disease or trauma.

DRIVE Fields is available at drivefields.com.au, with a 90-second walkthrough at drivefields.com.au/video. It is free to use, has no commercial sponsorship, and is decision support only.

Reference

1. Austroads and National Transport Commission, Assessing fitness to drive, 2022. Available at: austroads.gov.au/publications/assessing-fitness-to-drive/ap-g56 [accessed May 2025].
