Active Learning Review Case Study
Active Learning is the machine-learning engine within RelativityOne's analytics suite.
We recently completed a large-scale review for an investigation. Time and cost were, as they always are, huge issues, so this was a fantastic project to show off the capabilities and accuracy of Active Learning.
The case initially had over a million and a half documents. After filtering and keyword
and concept searching, about 600,000 documents were left to review.
When you start a case, visualize that you have a huge mountain of documents to look at.
With Active Learning running in the background, the system watches and analyses the decisions you make, right from the very first document that is reviewed and tagged relevant or not relevant.
It says, “Hey! Let me go find some documents that look like this!”
So now, instead of having your one big mountain, you have two smaller mountains: one relevant and one not relevant.
The system is going to serve up the documents it considers relevant first, based on the decisions you have been making and each document's percentage likelihood of being relevant; this is called a prioritised review.
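The ordering behind a prioritised review can be sketched in a few lines. This is an illustrative simplification, not Relativity's actual API: it assumes each unreviewed document carries a model-assigned relevance score from 0 to 100, and the document IDs and scores are made up.

```python
# Illustrative sketch of prioritised review ordering (assumed data, not
# Relativity's real interface). Each unreviewed document carries a
# model-assigned relevance score from 0 to 100.

docs = [
    {"id": "DOC-001", "score": 92},
    {"id": "DOC-002", "score": 14},
    {"id": "DOC-003", "score": 67},
]

# Prioritised review: serve the documents the model ranks
# most likely to be relevant first.
queue = sorted(docs, key=lambda d: d["score"], reverse=True)
print([d["id"] for d in queue])
```

As reviewers tag the served documents, the scores are recalculated and the queue reordered, which is the "learning" half of Active Learning.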
All the while, Active Learning is still working in the background, analysing your
decisions and improving its own accuracy.
In a typical review, the two document mountains start out fairly close together. As the review goes on, you want them to move further and further apart, and for the valley between them to empty out, because the documents left in the valley are the ones the system is not sure about.
As your mountains separate, the risk of relevant
or hot documents remaining unreviewed and lost among the not-relevant documents shrinks.
You carry on with prioritised review until the flow of relevant documents that the system is serving you dries up. Then you would move to a coverage review.
A coverage review is a review of those documents in the 'valley' that the system is unsure about. The system analyses the decisions you make on these documents and so learns and improves its own accuracy, further separating those mountains and reducing the number of documents between them.
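Selecting the 'valley' documents amounts to picking the ones the model is least certain about. Here is a minimal sketch of that idea, assuming (as above) a 0-to-100 relevance score per document; the scores and IDs are invented for illustration.

```python
# Sketch of coverage-review selection (assumed scoring scheme, not
# Relativity's real implementation). Valley documents are the ones whose
# relevance score sits nearest the 50-point midpoint, i.e. the documents
# the model is least certain about.

docs = {"DOC-A": 95, "DOC-B": 52, "DOC-C": 8, "DOC-D": 49}

# Rank by distance from the midpoint: smallest distance = most uncertain.
valley = sorted(docs, key=lambda d: abs(docs[d] - 50))
print(valley)
```

Reviewing these borderline documents gives the model the most informative training decisions, which is why a coverage review separates the mountains faster than continuing to review documents it is already confident about.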
Now is probably a good time to run what we call an elusion test. What’s an elusion test?
This is where we test whether the set of documents marked not relevant, which is therefore being discarded, might actually contain relevant documents.
The system starts serving the reviewer documents that it believes are not relevant. If we find a relevant document, the test is considered a fail and we go back to a prioritised review.
During elusion testing, using the magic of statistical analysis, we measure how well the system is performing, and ask ourselves, “If we stop reviewing now, how many documents could we potentially miss?” and of course this becomes a matter of proportionality.
You would continue until you feel that the chance of finding any further relevant documents is disproportionate to the resources and cost required to find them.
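The arithmetic behind that proportionality judgment can be sketched simply. This is my own simplification of the idea, not Relativity's exact statistical method: review a random sample drawn from the discard pile, observe how many relevant documents turn up, and project that rate across everything you would be leaving unreviewed. The sample sizes and counts below are invented for illustration.

```python
# Rough sketch of the elusion-test arithmetic (a simplification with
# invented numbers, not Relativity's exact statistical method).

def elusion_estimate(sample_size, relevant_found, discard_pile_size):
    rate = relevant_found / sample_size          # observed elusion rate
    projected_missed = rate * discard_pile_size  # documents we might miss
    return rate, projected_missed

# e.g. 6 relevant documents found in a 1,000-document sample
# from a 500,000-document discard pile:
rate, missed = elusion_estimate(
    sample_size=1000, relevant_found=6, discard_pile_size=500_000
)
print(f"elusion rate {rate:.2%}, roughly {missed:.0f} documents potentially missed")
```

In practice a defensible stopping decision would also attach a confidence interval to that point estimate, but the core question is the one in the text: if we stop now, how many documents could we miss, and is finding them worth the cost?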
In the review mentioned earlier, over half a million documents remained after filtering and searches, yet fewer than 20 percent of them, about 90,000, were actually reviewed, as our elusion tests were showing just a 0.59 percent chance that we might find anything else relevant.
This equated to an accuracy of 99.44 percent.