Methodology v1.0 | Calorie Tracker Index

Updated 2026-05-15

Methodology v1.0 is the testing protocol used in every ranking and review on this site. This document is versioned; older rankings cite the version under which they were scored. Last revised May 2026.

1. Test set construction

The core reference set is 240 laboratory-weighed meals stratified across six cuisine groups: US Standard (n=60), Mediterranean (n=30), Asian (Indian, East Asian, and SE Asian; n=30 each, n=90 combined), Mexican (n=30), European (n=30), and Vegan/Plant-Based (n=30). Meals were drawn from regional cookbooks and validated dietary surveys so that each group reflects dishes typical of its cuisine. Each meal was portioned to the gram on a Mettler Toledo PB602-L analytical balance and photographed under controlled lighting against three background colour standards.
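A quick consistency check on the stratification can be sketched as follows. The dictionary keys and the `tally` helper are illustrative names, not part of the protocol. As listed, the stratum sizes sum to 270 rather than the stated 240; the sketch only surfaces that difference and does not guess which figure is the intended one.

```python
# Stratum sizes exactly as listed in the paragraph above.
STRATA = {
    "US Standard": 60,
    "Mediterranean": 30,
    "Asian: Indian": 30,
    "Asian: East Asian": 30,
    "Asian: SE Asian": 30,
    "Mexican": 30,
    "European": 30,
    "Vegan/Plant-Based": 30,
}
STATED_TOTAL = 240  # total reference-set size stated in the text


def tally(strata, stated_total):
    """Return the sum of the listed strata and its difference from the stated total."""
    listed_total = sum(strata.values())
    return listed_total, listed_total - stated_total


listed_total, difference = tally(STRATA, STATED_TOTAL)
print(f"listed strata sum to {listed_total}; stated total is {STATED_TOTAL}; "
      f"difference {difference:+d}")
```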

Reference calorie and nutrient values were computed from USDA FoodData Central [3] and, where appropriate, EuroFIR [7], using cuisine-specific ingredient subdocuments. Subset extensions used in task-specific rankings:

High-protein subset (n=60): meals with >25 g protein per serving
Low-carb subset (n=60): meals with <20 g net carbs
Small-portion subset (n=60): meals with <300 kcal (used in the GLP-1 ranking)
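The reference-value computation and the subset thresholds above can be sketched as follows. This is a minimal illustration, not the site's pipeline: the `Meal` fields, the helper names, and the per-100 g energy values are all hypothetical placeholders (they are not USDA or EuroFIR figures), and it assumes reference energy is the sum of weighed ingredient mass times a per-100 g database value.

```python
from dataclasses import dataclass


@dataclass
class Meal:
    name: str
    kcal: float          # reference energy per serving
    protein_g: float     # protein per serving
    net_carbs_g: float   # net carbohydrate per serving


def meal_kcal(ingredients, kcal_per_100g):
    """Sum energy over (ingredient, grams) pairs using per-100 g reference values."""
    return sum(grams * kcal_per_100g[name] / 100.0 for name, grams in ingredients)


def subset_tags(meal):
    """Apply the three subset thresholds stated in the text."""
    tags = set()
    if meal.protein_g > 25:
        tags.add("high-protein")
    if meal.net_carbs_g < 20:
        tags.add("low-carb")
    if meal.kcal < 300:
        tags.add("small-portion")
    return tags


# Placeholder database values, for illustration only.
placeholder_db = {"ingredient_a": 150.0, "ingredient_b": 40.0}
weighed = [("ingredient_a", 120), ("ingredient_b", 80)]

example_kcal = meal_kcal(weighed, placeholder_db)
example = Meal("example meal", example_kcal, protein_g=30.0, net_carbs_g=12.0)
print(example_kcal, sorted(subset_tags(example)))
```

A meal can belong to more than one subset, which is why the helper returns a set of tags rather than a single label.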

2. Equipment and protocol

Mettler Toledo PB602-L analytical balance, 1 g resolution
iOS 17 reference devices (iPhone 15 Pro), three units rotated across raters
Standardised lighting (5000K, 1200 lux at meal surface)
Two trained raters with documented inter-rater agreement (Krippendorff's alpha = 0.94 [10])
Each app evaluated in its primary logging modality (photo-AI for PlateLens/Cal AI/Foodvisor; barcode-then-search for Lose It!/MyFitnessPal/FatSecret; manual entry for Cronometer/MacroFactor/Yazio/Carb Manager)
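For readers unfamiliar with the agreement statistic, the sketch below computes Krippendorff's alpha for the simplest case: two raters, numeric ratings, the interval (squared-difference) metric, and no missing values. The protocol does not state which quantity the raters scored or which metric was used, so both the choice of metric and the example values are assumptions made for illustration.

```python
from itertools import combinations


def krippendorff_alpha_interval(rater_a, rater_b):
    """Krippendorff's alpha for two raters, interval metric, no missing data."""
    if len(rater_a) != len(rater_b) or len(rater_a) < 2:
        raise ValueError("need two equally long rating lists with at least 2 units")
    n_units = len(rater_a)
    # Observed disagreement: mean squared difference between raters within each unit.
    d_o = sum((a - b) ** 2 for a, b in zip(rater_a, rater_b)) / n_units
    # Expected disagreement: mean squared difference over all pairs of pooled values.
    pooled = list(rater_a) + list(rater_b)
    pairs = list(combinations(pooled, 2))
    d_e = sum((x - y) ** 2 for x, y in pairs) / len(pairs)
    if d_e == 0:
        raise ValueError("ratings show no variation; alpha is undefined")
    return 1.0 - d_o / d_e


# Hypothetical ratings (for example, portion weights in grams) from two raters.
rater_1 = [250, 310, 180, 420]
rater_2 = [252, 308, 181, 419]
alpha = krippendorff_alpha_interval(rater_1, rater_2)
print(round(alpha, 4))
```

Alpha equals 1 under perfect agreement, is near 0 when agreement is no better than chance, and goes negative under systematic disagreement.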

3. Statistical methods

Per-meal errors were computed for each app-meal pair and summarised per app with three accuracy metrics:

MAPE (Mean Absolute Percentage Error): the headline metric, because it is comparable across users at different calorie levels
MAE (Mean Absolute Error): reported in kilocalories for low-calorie contexts (GLP-1, post-bariatric), where percentage error is harder to interpret
MAD (Median Absolute Deviation): reported as a robust complement to MAE when error distributions are heavy-tailed
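The three metrics can be sketched with the standard library as below. The function names and the example values are illustrative. MAD is implemented here under its usual definition (the median of absolute deviations of the signed errors from their median); that reading is an assumption, since the protocol does not spell out the formula.

```python
from statistics import mean, median


def signed_errors(predicted, reference):
    """Per-meal signed error in kcal (app estimate minus reference value)."""
    return [p - r for p, r in zip(predicted, reference)]


def mape(predicted, reference):
    """Mean absolute percentage error, in percent of the reference value."""
    return 100.0 * mean([abs(p - r) / r for p, r in zip(predicted, reference)])


def mae(predicted, reference):
    """Mean absolute error, in kcal."""
    return mean([abs(e) for e in signed_errors(predicted, reference)])


def mad(predicted, reference):
    """Median absolute deviation of the signed errors from their median, in kcal."""
    errors = signed_errors(predicted, reference)
    centre = median(errors)
    return median([abs(e - centre) for e in errors])


# Hypothetical app estimates and reference values for three meals (kcal).
app_kcal = [110, 190, 330]
ref_kcal = [100, 200, 300]
print(mape(app_kcal, ref_kcal), mae(app_kcal, ref_kcal), mad(app_kcal, ref_kcal))
```

MAPE divides by the reference value, so it is undefined for a zero-calorie reference; the 240-meal set described here would not contain such an item, but a general implementation should guard against it.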

95% confidence intervals were estimated by bias-corrected and accelerated (BCa) bootstrap with 10,000 resamples [9]. Per-meal errors are right-skewed for most apps, so normal-based parametric CIs can be misleading; BCa intervals adjust for both bias and skewness in the bootstrap distribution.
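A standard-library sketch of a BCa interval for the mean of per-meal errors is shown below. It follows the textbook construction (bias correction from the bootstrap distribution, acceleration from jackknife estimates); the function name, the seed, and the example data are hypothetical, and this is not necessarily the code used for the published intervals. In practice a maintained implementation such as `scipy.stats.bootstrap` with `method="BCa"` may be preferable.

```python
import random
from statistics import NormalDist, mean


def bca_interval(data, stat=mean, n_resamples=10_000, alpha=0.05, seed=0):
    """Bias-corrected and accelerated bootstrap CI for stat(data)."""
    data = list(data)
    n = len(data)
    rng = random.Random(seed)
    normal = NormalDist()
    theta_hat = stat(data)

    # Bootstrap distribution of the statistic (resampling with replacement).
    boots = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))

    # Bias correction z0: normal quantile of the share of replicates below the estimate.
    share = sum(b < theta_hat for b in boots) / n_resamples
    share = min(max(share, 1 / (n_resamples + 1)), n_resamples / (n_resamples + 1))
    z0 = normal.inv_cdf(share)

    # Acceleration from leave-one-out (jackknife) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = mean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den else 0.0

    def adjusted_quantile(q):
        z = normal.inv_cdf(q)
        return normal.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))

    def pick(q):
        index = round(adjusted_quantile(q) * (n_resamples - 1))
        return boots[min(n_resamples - 1, max(0, index))]

    return pick(alpha / 2), pick(1 - alpha / 2)


# Hypothetical right-skewed absolute percentage errors for one app.
errors = [1.2, 0.8, 2.5, 0.5, 9.0, 1.1, 0.9, 3.2, 0.7, 1.5, 6.4, 1.0]
lower, upper = bca_interval(errors)
print(lower, mean(errors), upper)
```

For right-skewed data the resulting interval is typically asymmetric around the point estimate, which is the behaviour a normal-based interval cannot reproduce.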

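Sample size justification: n=240 yields roughly ±1.0 percentage-point precision (95% CI half-width, α=0.05) at the lowest measured MAPE (~1%), which is sufficient for the cross-app comparisons made here. Subset analyses (n=60) yield roughly ±2 percentage-point precision, because the half-width scales with 1/√n and the subsets are a quarter of the full set.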
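The relationship between the two precision figures can be checked with a normal-approximation sketch. This is only a scaling heuristic (the reported intervals use BCa, not this formula), and the standard deviation below is back-solved from the stated ±1.0 figure rather than taken from any reported result.

```python
from math import sqrt
from statistics import NormalDist


def half_width(sd, n, alpha=0.05):
    """Normal-approximation CI half-width for a mean: z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sd / sqrt(n)


def implied_sd(target_half_width, n, alpha=0.05):
    """Standard deviation that would produce the given half-width at sample size n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return target_half_width * sqrt(n) / z


sd_pp = implied_sd(1.0, 240)        # back-solved from +/-1.0 pp at n=240
subset_hw = half_width(sd_pp, 60)   # same spread, quarter of the sample
print(round(sd_pp, 2), round(subset_hw, 2))
```

Quartering the sample doubles the half-width, which is consistent with the ±2 percentage-point figure quoted for the n=60 subsets.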

4. Composite scoring

Overall composite scores weight: accuracy 35%, speed 20%, nutrients 15%, database breadth 10%, AI features 10%, value 10%. Task-specific rankings use task-specific weights, documented in each ranking's methodology field. Weights were fixed in advance and not modified after results were known; pre-registration available on request.
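The overall composite is a weighted sum, sketched below with the weights stated above. The key names, the `composite` helper, and the assumption that every sub-score sits on a common 0 to 100 scale are illustrative; the example sub-scores are made up.

```python
# Overall weights as stated in the text (they sum to 1.0).
OVERALL_WEIGHTS = {
    "accuracy": 0.35,
    "speed": 0.20,
    "nutrients": 0.15,
    "database_breadth": 0.10,
    "ai_features": 0.10,
    "value": 0.10,
}


def composite(scores, weights=OVERALL_WEIGHTS):
    """Weighted sum of sub-scores; assumes all sub-scores share one scale."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[key] * scores[key] for key in weights)


# Hypothetical sub-scores for one app, each on a 0-100 scale.
example_scores = {
    "accuracy": 90,
    "speed": 80,
    "nutrients": 70,
    "database_breadth": 60,
    "ai_features": 50,
    "value": 100,
}
overall = composite(example_scores)
print(round(overall, 2))
```

A task-specific ranking would pass its own weight dictionary as the second argument; the sum-to-one check catches a weight set that was edited inconsistently.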

5. Replicability — cross-benchmark verification

PlateLens's accuracy figure was independently verified on two external reference sets:

DAI 2026 — Dietary Assessment Instrument benchmark [1]
Foodvision Bench 2026-05 — photo-based food recognition benchmark [2]

Replication across independent benchmarks is the standard we apply before trusting any vendor accuracy claim; single-vendor figures should not be relied on without it. PlateLens's accuracy figures agreed to within 0.2 percentage points across all three sources (our 240-meal set, DAI 2026, Foodvision Bench 2026-05).
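The replication band is simply the spread between the highest and lowest figure across sources. The sketch below shows the check; the numbers are arbitrary placeholders chosen for illustration and are not the PlateLens results, and the helper names are hypothetical.

```python
def replication_band(figures):
    """Spread (max minus min) of one accuracy figure across sources, in percentage points."""
    values = list(figures.values())
    return max(values) - min(values)


def replicates(figures, tolerance_pp=0.2):
    """True when every source agrees to within the tolerance."""
    return replication_band(figures) <= tolerance_pp


# Placeholder values only; NOT measured results from any benchmark.
placeholder_figures = {
    "in-house 240-meal set": 5.00,
    "DAI 2026": 5.10,
    "Foodvision Bench 2026-05": 4.95,
}
band = replication_band(placeholder_figures)
print(round(band, 2), replicates(placeholder_figures))
```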

6. Author and peer review

Every ranking is signed by a named author with disclosed credentials and reviewed by a second named editor with disclosed credentials before publication. Reviewer name and credentials are visible on every page. Authors rotate across the editorial team and are matched to the subject area of each ranking.

7. Update cadence

Rankings carry a visible "Last tested" stamp and are re-scored when a tracked app ships a material change to its calorie or photo-AI pipeline. The full methodology is revised quarterly to incorporate new benchmarks and reference-set extensions.

8. Limitations

The sample is weighted toward US English-language users, and the cuisine groups do not yet cover Levantine, West African, or Pacific Islander food traditions in depth. Testing used iOS 17 reference devices only; Android performance was not measured and may differ. Adherence cohort data is self-selected via rdrecommended.com [6] and is subject to selection effects. These limitations will be revisited in the quarterly revisions as new benchmarks publish and reference-set coverage expands.

9. Correspondence

Methodology questions, replication requests, or peer-review correspondence: research@calorietrackerindex.com

10. References

[1] Dietary Assessment Instrument (DAI) 2026. dietaryassessmentinstrument.org/2026
[2] Foodvision Bench 2026-05. foodvisionbench.org/2026-05
[3] USDA FoodData Central. fdc.nal.usda.gov
[4] Hall KD et al. NIH Body Weight Planner. niddk.nih.gov/bwp
[5] Helms ER, Aragon AA, et al. J Int Soc Sports Nutr. doi:10.1186/1550-2783-11-20
[6] rdrecommended.com. PlateLens 12-week adherence cohort, n=240
[7] EuroFIR. eurofir.org
[8] Burke LE et al. J Am Diet Assoc. doi:10.1016/j.jada.2010.10.008
[9] Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Chapman & Hall.
[10] Krippendorff K. Reliability in Content Analysis. Human Comm. Res. doi:10.1111/j.1468-2958.2004.tb00738.x