Lexicase Selection

Lexicase selection is a selection method first developed for genetic programming and since applied to other machine learning algorithms. Rather than aggregating performance measures across multiple training cases, it considers performance on each training case separately, and it considers the cases in a different random order, and therefore with different priorities, for each selection event.

Python Package Available: Install from PyPI

How It Works

The following describes the ordinary lexicase selection method. See the Variants section below for other versions.

  1. Random Ordering: For each selection event, the training cases are randomly shuffled
  2. Sequential Filtering: Candidates are filtered based on performance on each training case in order
  3. Best Performance: Only individuals with the best performance on the current training case advance
  4. Final Selection: The process continues until only one individual remains or all training cases have been considered; if more than one candidate remains at that point, one of the remaining candidates is selected at random (see the worked example below)
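
As a concrete illustration (hypothetical values, with lower errors being better), consider three candidates, A, B, and C, evaluated on three training cases:

# Hypothetical error values for three candidates on three training cases (lower is better).
errors = {
    "A": [0, 5, 2],
    "B": [0, 1, 9],
    "C": [3, 1, 0],
}
# Suppose the shuffle puts case 1 first, then case 0, then case 2:
#   case 1: the best error is 1, so B and C advance and A is filtered out
#   case 0: among B and C, the best error is 0, so only B remains
# B is selected; a different shuffle could instead select A or C.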

Advantages

  • Avoids averaging traps: Non-aggregating selection prevents search from getting stuck in local optima caused by balancing mediocre performance across training cases, enabling richer exploration of the solution space
  • Preserves diversity and specialists: Random ordering of training cases combined with non-aggregation allows specialized solutions to survive and reproduce based on excellence in specific areas, maintaining population diversity
  • Handles multidimensional problems naturally: Eliminates the need to weight competing objectives or collapse multiple criteria into a single fitness score

Applications

Lexicase selection has been successfully applied to:

  • Program synthesis
  • Symbolic regression (using epsilon lexicase selection)
  • Deep learning (using gradient lexicase selection)
  • Multi-objective optimization
  • Reinforcement learning
  • Learning classifier systems

Algorithm (Python)

import random

def lexicase_selection(population, errors):
    """Select one parent by lexicase selection.

    errors[i][j] is assumed to be the error of population[i] on training
    case j, with lower values indicating better performance.
    """
    case_order = list(range(len(errors[0])))   # one index per training case
    random.shuffle(case_order)                 # random ordering of training cases
    candidates = list(range(len(population)))  # indices of surviving candidates

    for case in case_order:
        if len(candidates) <= 1:
            break
        best = min(errors[i][case] for i in candidates)                  # best (lowest) error on this case
        candidates = [i for i in candidates if errors[i][case] == best]  # keep only the best

    return population[random.choice(candidates)]
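
As a minimal usage sketch (hypothetical programs and error values), each call performs one independent selection event:

population = ["prog_A", "prog_B", "prog_C"]
errors = [
    [0, 5, 2],   # errors of prog_A on cases 0, 1, 2
    [0, 1, 9],   # errors of prog_B
    [3, 1, 0],   # errors of prog_C
]

# Repeated calls can return different individuals, because each call
# shuffles the training cases independently.
parents = [lexicase_selection(population, errors) for _ in range(10)]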

Variants

Epsilon Lexicase

Designed specifically for continuous-valued errors (while standard lexicase is best suited for problems in discrete domains, for example with Boolean or integer errors), epsilon lexicase selection allows individuals within an epsilon threshold to be considered equivalent:

best_error = min(errors[i][case] for i in candidates)                 # lowest error on this case
threshold = best_error + epsilon                                      # errors within epsilon of the best
candidates = [i for i in candidates if errors[i][case] <= threshold]

In the recommended form of epsilon lexicase selection, we keep not only the candidates whose error on the given case exactly matches the best error of any individual currently in the set, but also any other candidates whose error on that case differs from the best by less than epsilon. Here epsilon is computed for each case, once per generation, as the median absolute difference between the population's errors on the case and the median error for the case. This is sometimes called MAD epsilon lexicase selection, for Median Absolute Deviation (from the median).
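
A minimal sketch of this per-generation epsilon computation, assuming an errors matrix like the one in the Algorithm section (numpy is used only for convenience):

import numpy as np

def mad_epsilons(errors):
    """Per-case epsilon values: the median absolute deviation (from the median)
    of the population's errors on each case, computed once per generation.

    errors is assumed to have shape (population_size, number_of_cases).
    """
    errors = np.asarray(errors, dtype=float)
    case_medians = np.median(errors, axis=0)                 # median error for each case
    return np.median(np.abs(errors - case_medians), axis=0)  # MAD for each case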

Down-sampled Lexicase

Uses a random subset of the training cases each generation to reduce computational cost, often allowing better solutions to be found when the savings are used to run the learning/evolutionary algorithm for additional epochs/generations. Random down-sampling often works well, but informed down-sampling can work better.
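
A minimal sketch of random down-sampling, assuming the errors matrix and the lexicase_selection function from the Algorithm section; the 10% rate is illustrative:

import random

def downsample_cases(num_cases, downsample_rate=0.1):
    """Choose the random subset of training cases used for one whole generation."""
    k = max(1, int(downsample_rate * num_cases))
    return random.sample(range(num_cases), k)

# Once per generation (reusing population, errors, and lexicase_selection from above):
# case_subset = downsample_cases(len(errors[0]))
# sub_errors = [[row[j] for j in case_subset] for row in errors]
# parent = lexicase_selection(population, sub_errors)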

DALex (Diversely Aggregated Lexicase)

In DALex, errors on the training cases are aggregated into a single score, but only after the cases receive randomized weights. When the differences among the weights are large, DALex's behavior approaches that of ordinary lexicase selection. This permits significant efficiency improvements and provides a "particularity pressure" hyperparameter that can be adjusted for better performance in some environments.
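
A minimal sketch of the idea, assuming the same errors matrix; the weighting scheme here (normally distributed importance scores passed through a softmax) follows the general approach, but standardization and other details from the DALex paper are omitted:

import numpy as np

def dalex_selection(population, errors, particularity_pressure=20.0):
    """Select one individual by aggregating errors under randomized case weights.

    Larger particularity_pressure spreads the sampled importance scores out, so
    the softmax weights concentrate on a few cases and behavior approaches that
    of ordinary lexicase selection.
    """
    errors = np.asarray(errors, dtype=float)
    scores = np.random.normal(0.0, particularity_pressure, size=errors.shape[1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax: one weight per training case
    aggregated = errors @ weights                  # one weighted error per individual
    return population[int(np.argmin(aggregated))]  # lowest aggregated error is selected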

Plexicase (Probabilistic Lexicase)

Approximates the selection probabilities for each individual under lexicase selection, and then samples from that distribution instead of performing lexicase selection's repeated filtering steps. Behaves similarly to lexicase selection but is significantly more efficient. Also allows for adjustment of the kurtosis of the distribution, which can be beneficial in some problem environments.

Gradient Lexicase

Integrates lexicase selection into deep learning by simultaneously performing gradient descent on several copies of the model, each trained with respect to a different subset of the training data, and then performing lexicase selection to determine which model to carry forward to the next learning epoch.
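
A structural sketch of one epoch, based on the description above; train_on, error_vector, and the data-splitting scheme are hypothetical placeholders rather than the exact procedure from the gradient lexicase paper:

import copy

def gradient_lexicase_epoch(model, train_subsets, validation_cases, train_on, error_vector):
    """One epoch: train copies of the model on different data subsets, then
    keep the copy chosen by lexicase selection over per-case errors."""
    copies = [copy.deepcopy(model) for _ in train_subsets]
    for m, subset in zip(copies, train_subsets):
        train_on(m, subset)                               # gradient descent on this copy only
    errors = [error_vector(m, validation_cases) for m in copies]
    return lexicase_selection(copies, errors)             # reuse the function defined above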

Batch Lexicase

Aggregates errors within randomly-chosen batches of training cases, and then treats the batches as individual training cases for lexicase selection. May be useful for noisy training data.
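
A minimal sketch, again reusing the errors matrix and the lexicase_selection function from the Algorithm section; the batch size is illustrative:

import random

def batch_lexicase_selection(population, errors, batch_size=5):
    """Group the cases into random batches, sum errors within each batch, and
    run ordinary lexicase selection over the batch-level errors."""
    case_order = list(range(len(errors[0])))
    random.shuffle(case_order)
    batches = [case_order[i:i + batch_size] for i in range(0, len(case_order), batch_size)]
    batch_errors = [[sum(row[j] for j in batch) for batch in batches] for row in errors]
    return lexicase_selection(population, batch_errors)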

Time Complexity

The worst-case time complexity of a single ordinary lexicase selection event is O(n × m), where n is the population size and m is the number of training cases; selecting parents for an entire population therefore takes O(n² × m) in the worst case. However, empirically observed runtimes are generally significantly better than this basic analysis would lead one to expect. Explanations for the better-than-expected performance are explored by Helmuth, Lengler, and La Cava (2022; see References below). DALex and Plexicase provide other high-performance options.

References

  • Spector, L. (2012). Assessment of Problem Modality by Differential Performance of Lexicase Selection in Genetic Programming: A Preliminary Report. In Companion Publication of the 2012 Genetic and Evolutionary Computation Conference, GECCO’12 Companion, pp. 401-408.
  • Helmuth, T., Spector, L., & Matheson, J. (2014). Solving Uncompromising Problems with Lexicase Selection. IEEE Transactions on Evolutionary Computation, vol. 19, no. 5, pp. 630-643.
  • La Cava, W., Spector, L., & Danai, K. (2016). Epsilon-lexicase selection for regression. In GECCO '16: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 741-748.
  • Hernandez, J. G., Lalejini, A., Dolson, E., & Ofria, C. (2019). Random subsampling improves performance in lexicase selection. In GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference Companion, pp. 2028-2031.
  • Aenugu, S., & Spector, L. (2019). Lexicase selection in learning classifier systems. In GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 356–364.
  • Ding, L., & Spector, L. (2022). Optimizing Neural Networks with Gradient Lexicase Selection. In The Tenth International Conference on Learning Representations, ICLR 2022.
  • Helmuth, T., Lengler, J., & La Cava, W. (2022). Population Diversity Leads to Short Running Times of Lexicase Selection. In Parallel Problem Solving from Nature – PPSN XVII: 17th International Conference, PPSN 2022, pp. 485-498.
  • Ding, L., Pantridge, E., & Spector, L. (2023). Probabilistic lexicase selection. In GECCO '23: Proceedings of the Genetic and Evolutionary Computation Conference, pp. 1073-1081.
  • Spector, L., Ding, L., & Boldi, R. (2023). Particularity. In Genetic Programming Theory and Practice XX. New York: Springer. pp. 159-176.
  • Ni, A., Ding, L., & Spector, L. (2024). DALex: Lexicase-like selection via diverse aggregation. In European Conference on Genetic Programming (EuroGP, part of EvoStar), pp. 90-107.
  • Boldi, R., Briesch, M., Sobania, D., Lalejini, A., Helmuth, T., Rothlauf, F., Ofria, C., & Spector, L. (2024). Informed Down-Sampled Lexicase Selection: Identifying Productive Training Cases for Efficient Problem Solving. Evolutionary Computation, 32 (4): 307–337.
  • Bahlous-Boldi, R., Ding, L., Spector, L., & Niekum, S. (2025). Pareto Optimal Learning from Preferences with Hidden Context. In Reinforcement Learning Conference, RLC 2025.
  • See also: Lexicase Selection on Google Scholar.

Contact

Lee Spector, lspector@amherst.edu, leespector.com


Last updated: October 2025