One of the more common ways to train and evaluate an artificial intelligence model is a method called k-fold cross-validation. Ordinarily, you train the algorithm on one subset of the data (the training set) and validate it on data it has not seen (the testing set). With k-fold cross-validation, instead of dividing the data into just two sets, you divide it into k subsets, say 10. This lets you train your models on more diverse subsets of the training data (or “folds”) and test their fitness on more diverse sets as well.
In each round, you train on k minus one folds and validate on the fold that was held out. In our ten-fold example, you train the algorithm on 90% of the data (nine folds) and validate it on the remaining 10% (the final fold). Then you repeat the process with a different held-out fold: after training on folds one through nine, you can train another model on folds two through ten, or on every fold but the fourth, and so on. Both the unseen data and the training data differ each time. This means you’ll get more diversity from your models and, hopefully, better results. It also removes much of the bias inherent in how the seen and unseen datasets happen to be composed, improving your estimate of how well the system generalizes.
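To make the standard procedure concrete, here is a minimal Python sketch of how k-fold splits are typically built. The function name `k_fold_splits` and its optional `seed` parameter are our own illustrative choices, not part of any library; in practice you would more likely use an existing utility such as scikit-learn’s `KFold`.

```python
import random

def k_fold_splits(n_samples, k, seed=None):
    """Return k (train_indices, validation_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    if seed is not None:
        # Shuffle so the folds are not biased by the original ordering of the data.
        random.Random(seed).shuffle(indices)
    # Spread samples as evenly as possible: the first n_samples % k folds get one extra.
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    splits = []
    for i, validation in enumerate(folds):
        # Train on every fold except the held-out one (k - 1 folds in total).
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits

for train, validation in k_fold_splits(100, k=10, seed=0):
    print(len(train), len(validation))  # prints "90 10" on each of the ten lines
```

With 100 samples and `k=10`, each of the ten splits trains on 90 indices and validates on the other 10, and every index is held out exactly once across the ten rounds.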
Our recently awarded patent describes a different but fundamentally effective twist on this process. Inventors Babak Hodjat, Hormoz Shahrzad, Kaivan Kamali and Daniel Edward Fink were awarded a US patent for “Evolutionary technique with n-pool evolution,” and we’d like to take a moment to explain not just how it works, but why we think it’s important.
Essentially, the patent covers a method that starts by using smaller folds for initial training. Instead of training on k minus one folds of the dataset (90% in our example), you train your model on a single fold: just 10%. If that model is generalizing (i.e., it’s performing successfully), you can graduate it to learning on the next 10%, then the next, and so on. In other words, you judge each model first on a small sliver of the data, and only suitable candidates graduate to be validated on the remaining folds, the other 90% of the data.

What’s important is that the technique is highly parallelizable, meaning you can train many more models (and, importantly, potentially more diverse ones) from the same original data. While some candidates are graduating, you’re also evolving more models on the same dataset, just using different folds as their starting points. So while you’re looking for graduation candidates from the first fold, you’re also, in parallel, looking for other fit models from the second fold, the third, the fourth, and so on. Training and validation happen simultaneously in a massively distributed manner.
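The toy Python sketch below illustrates the general shape of this idea under heavy simplifying assumptions; it is not the patented implementation. A “model” is just a threshold rule on one-dimensional data, “evolution” is a simple keep-the-fitter-half-and-mutate loop, and a candidate graduates only if it reaches a hypothetical `min_fitness` on its starting fold and then on every other fold in turn. Graduation here only re-evaluates a candidate on later folds; it does not continue training it. All function names, parameters and the data are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def make_folds(data, k, seed=0):
    """Shuffle the data and split it into k mutually exclusive folds."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def fitness(threshold, fold):
    """Toy fitness (assumption): accuracy of 'predict 1 if x >= threshold' on one fold."""
    correct = sum((x >= threshold) == bool(y) for x, y in fold)
    return correct / len(fold)

def evolve_on_fold(fold, rng, pop_size=20, generations=15):
    """Toy evolution (assumption): keep the fitter half, mutate it, repeat."""
    population = [rng.random() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, fold), reverse=True)
        parents = population[: pop_size // 2]
        children = [min(1.0, max(0.0, p + rng.gauss(0, 0.05))) for p in parents]
        population = parents + children
    return population

def graduates(candidate, folds, start, min_fitness):
    """Evaluate the candidate on each later fold in turn; stop at the first failure."""
    k = len(folds)
    for step in range(1, k):
        if fitness(candidate, folds[(start + step) % k]) < min_fitness:
            return False
    return True

def run_pool(start, folds, min_fitness, seed):
    """One pool: evolve using only fold `start`, then try to graduate the fit candidates."""
    rng = random.Random(seed + start)
    population = evolve_on_fold(folds[start], rng)
    fit = [c for c in population if fitness(c, folds[start]) >= min_fitness]
    return [c for c in fit if graduates(c, folds, start, min_fitness)]

def n_pool_sketch(data, k=10, min_fitness=0.9, seed=0):
    """Run one independent pool per starting fold and collect every graduate."""
    folds = make_folds(data, k, seed)
    # Pools evolve on disjoint folds, so each could run on its own machine;
    # threads merely stand in for that distribution here.
    with ThreadPoolExecutor() as executor:
        per_pool = list(executor.map(lambda s: run_pool(s, folds, min_fitness, seed), range(k)))
    return [c for pool_graduates in per_pool for c in pool_graduates]

# Toy data (assumption): the true label is 1 when x >= 0.6.
data = [(i / 200, int(i / 200 >= 0.6)) for i in range(200)]
winners = n_pool_sketch(data)
print(len(winners), "candidates graduated")
```

Because each pool reads only its own starting fold while evolving, the ten pools are independent until graduation, which is what makes the approach easy to spread across many workers.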
One key point is that the training sets do not overlap, as they would in traditional k-fold cross-validation. This lets us assign specific folds to specific training nodes, and send trained models, or solution candidates, to different compute nodes depending on whether they are still being trained or have graduated and are now being validated. The fact that the folds are mutually exclusive can promote diversity, but it can also lead to overfitting, since each candidate initially learns from only a small slice of the data. This risk is mitigated, however, by the fact that we validate generalization on a much larger portion of the dataset.
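As a rough illustration of the bookkeeping that disjoint folds make possible, here is a small Python sketch that checks the folds are mutually exclusive, pins each fold to one training node, and routes a candidate depending on whether it has graduated. The `Candidate` class, the node names and the round-robin placement policy are all hypothetical; the post does not describe how nodes are actually chosen.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Candidate:
    model_id: str
    home_fold: int           # the one fold this candidate is trained on
    graduated: bool = False  # set once it passes its home fold and moves on to validation

def folds_are_disjoint(folds):
    """True if no sample index appears in more than one fold."""
    return all(set(a).isdisjoint(b) for a, b in combinations(folds, 2))

def assign_folds(num_folds, training_nodes):
    """Pin each fold to exactly one training node (hypothetical round-robin policy)."""
    return {f: training_nodes[f % len(training_nodes)] for f in range(num_folds)}

def route(candidate, fold_to_node, validation_nodes):
    """Return the node that should receive this candidate next."""
    if not candidate.graduated:
        # Still training: it only needs the data held by its home fold's node.
        return fold_to_node[candidate.home_fold]
    # Graduated: hand it to a validation node (hypothetical spread by home fold).
    return validation_nodes[candidate.home_fold % len(validation_nodes)]

folds = [list(range(i, 100, 10)) for i in range(10)]  # ten disjoint folds of 10 indices
fold_to_node = assign_folds(len(folds), ["train-a", "train-b", "train-c"])
print(folds_are_disjoint(folds))  # True
print(route(Candidate("m1", home_fold=4), fold_to_node, ["val-a", "val-b"]))  # train-b
print(route(Candidate("m1", home_fold=4, graduated=True), fold_to_node, ["val-a", "val-b"]))  # val-a
```

Because no sample lives in two folds, a training node never needs data held by another node; only graduated candidates, which are small compared with the data, have to move.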
Oh, and also: a reminder. If you’re in the AI field and want to work with some smart folks to apply massively distributed AI to real-world problems, we know the place. Actually, we are the place. Let us know if you’re interested. And we’ve got another patent award to announce soon, so do stay tuned.