Artificial Intelligence in eDiscovery: what is it and why do we need it?
Artificial intelligence, or AI, is the ability of a machine to perform tasks that would require intelligence if they were performed by a human. In simple terms, it is intelligence inspired by humans and powered by machines, and it plays a part in almost every aspect of 21st-century life.
A Little History:
Artificial intelligence first began to be postulated in the early 20th century, when the science fiction of the era popularised the idea of machines personified as robots and similar artificial beings. It was not long before people began to ask whether AI could become science fact. Alan Turing was one of the first pioneers on this front, exploring the possibilities of AI in his 1950 paper ‘Computing Machinery and Intelligence’, which considered whether machines could think and how such intelligent machines might be built. However, it was not until 1956 that what is widely regarded as the first artificially intelligent program was created, sparking real interest in AI and proving that artificial intelligence was indeed possible. Since then, despite its ups and downs, AI has progressed on a steady upward trajectory, ultimately giving us the AI that we know and love today, with Alexa and Siri being two key examples.
Within artificial intelligence sits machine learning: the use of AI to give computers the ability to learn and progressively improve their performance on specific tasks without being explicitly programmed for them. Machine learning can be subdivided into three main categories: supervised learning, unsupervised learning, and reinforcement learning.
Supervised learning – The computer is taught using a training set of example inputs paired with their desired outcomes, from which it learns a general rule that it then applies to new data.
Unsupervised learning – The computer is not given a labelled training set; instead, it finds structure in the input data by itself. Uncovering hidden patterns within data is often the very purpose of unsupervised learning.
Reinforcement learning – The computer interacts with a dynamic environment in which it performs tasks and receives ‘rewards’ for completing them. The program aims to maximise these rewards.
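To make supervised learning concrete, here is a minimal Python sketch under a deliberately toy assumption: a perceptron that learns one weight per word from a handful of hand-labelled example documents (1 = relevant, 0 = not relevant) and then applies that learned rule to documents it has never seen. The training texts and function names are invented for illustration; real review platforms use far richer models and text processing.

```python
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Deliberately naive: lower-case and split on whitespace.
    return text.lower().split()

def train_perceptron(examples: list[tuple[str, int]], epochs: int = 10) -> tuple[Counter, int]:
    """Learn a weight per word from (text, label) pairs; label 1 = relevant, 0 = not."""
    weights: Counter = Counter()
    bias = 0
    for _ in range(epochs):
        for text, label in examples:
            words = tokenize(text)
            predicted = 1 if bias + sum(weights[w] for w in words) > 0 else 0
            error = label - predicted  # +1, 0 or -1
            if error:
                # Nudge the rule towards the desired outcome for this example.
                for w in words:
                    weights[w] += error
                bias += error
    return weights, bias

def predict(weights: Counter, bias: int, text: str) -> int:
    """Apply the learned rule to a new, unlabelled document."""
    return 1 if bias + sum(weights[w] for w in tokenize(text)) > 0 else 0

# Hypothetical hand-labelled training set.
training_set = [
    ("invoice for the merger payment", 1),
    ("merger agreement draft attached", 1),
    ("lunch menu for friday", 0),
    ("office party on friday", 0),
]
weights, bias = train_perceptron(training_set)
print(predict(weights, bias, "signed merger agreement"))  # 1 (relevant)
print(predict(weights, bias, "friday lunch plans"))       # 0 (not relevant)
```

After training, a word such as "merger" carries a positive weight and "friday" a negative one; together those weights are the "general rule" the computer has learned from the training set.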
Within supervised learning we then find active learning…
Active learning – A special case of machine learning in which the learning algorithm can interactively query a user or another information source (in eDiscovery, typically a human reviewer) to label new data points with the desired outcomes.
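The sketch below illustrates the "interactively query" part of active learning in Python. It assumes a deliberately crude scoring model (count the words a document shares with documents already coded relevant, minus those it shares with documents coded not relevant) and a query strategy known as uncertainty sampling: the algorithm asks the reviewer to label whichever document it is least sure about, i.e. the one scoring closest to zero. The reviewer is simulated by a hypothetical `ask_reviewer` function, and every name here is illustrative rather than taken from any product.

```python
def words(text: str) -> set[str]:
    return set(text.lower().split())

def score(text: str, labelled: list[tuple[str, int]]) -> int:
    """Toy model: overlap with relevant vocabulary minus overlap with non-relevant vocabulary."""
    relevant_vocab = set().union(*(words(t) for t, y in labelled if y == 1))
    other_vocab = set().union(*(words(t) for t, y in labelled if y == 0))
    w = words(text)
    return len(w & relevant_vocab) - len(w & other_vocab)

def active_learning(pool, labelled, ask_reviewer, rounds):
    """Each round, query the reviewer about the most uncertain unlabelled document."""
    pool = list(pool)
    labelled = list(labelled)
    for _ in range(min(rounds, len(pool))):
        # Uncertainty sampling: a score near 0 means the model cannot decide.
        query = min(pool, key=lambda t: abs(score(t, labelled)))
        pool.remove(query)
        labelled.append((query, ask_reviewer(query)))  # new labelled data point
    return labelled

seed = [("merger agreement draft", 1), ("lunch menu", 0)]
pool = ["merger payment schedule", "friday lunch", "board minutes"]
result = active_learning(pool, seed, ask_reviewer=lambda t: int("merger" in t), rounds=1)
print(result[-1])  # ('board minutes', 0)
```

The model asks about "board minutes", the one document its current rule cannot place, rather than spending the reviewer's time on documents it is already confident about.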
Active Learning in eDiscovery:
Artificial intelligence has found its way into almost every industry in the world, and the legal industry is no different. Over the past 20 years, we have seen AI and machine learning algorithms take over many of the more mundane day-to-day tasks and processes within the legal industry. AI is now used at all levels of the law, from aiding judges in their decision-making to helping businesses operating within the confines of the law manage GDPR and other compliance issues.
The most prominent area in which AI is used, however, is eDiscovery, specifically Technology-Assisted Review (TAR). What is Technology-Assisted Review? It is the process of using technology to support your reviewers through the document review process. TAR has gone through several iterations as active learning has advanced. Today, when most people say TAR, they are referring to continuous active learning, but this has not always been the case.
Predictive coding, now commonly referred to as TAR 1.0, is the predecessor of active learning. It is the process of training an algorithm on a training set of documents, from which the algorithm learns your decision-making process and replicates it on an unseen set of data. From this training, the algorithm derives a base set of criteria that relevant documents meet and filters out of your ‘review’ pile the documents that do not meet these criteria, greatly reducing the number of documents your reviewers must review. In most cases, active learning is now used in place of predictive coding, as it is more convenient and puts the most relevant documents in front of reviewers sooner. The chief difference between the two is that predictive coding simply identifies the relevant and not relevant documents, whereas active learning identifies these documents and presents them to you in a prioritised view, so you get the most relevant documents first.
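The practical difference between the two outputs can be sketched in a few lines of Python. Assuming some trained model has already given each document a relevance score between 0 and 1 (the document IDs, the scores and the 0.5 cut-off below are all made up), predictive coding produces a two-way split, whereas active learning produces a prioritised queue.

```python
def predictive_coding_split(scores: dict[str, float], threshold: float = 0.5):
    """TAR 1.0 style: a review pile and a discard pile, with no ordering."""
    review = [doc for doc, s in scores.items() if s >= threshold]
    discard = [doc for doc, s in scores.items() if s < threshold]
    return review, discard

def prioritised_queue(scores: dict[str, float]) -> list[str]:
    """Active learning style: every document, most likely relevant first."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical relevance scores from an already-trained model.
scores = {"DOC-001": 0.91, "DOC-002": 0.12, "DOC-003": 0.67, "DOC-004": 0.45}
print(predictive_coding_split(scores))  # (['DOC-001', 'DOC-003'], ['DOC-002', 'DOC-004'])
print(prioritised_queue(scores))        # ['DOC-001', 'DOC-003', 'DOC-004', 'DOC-002']
```

Both views come from the same scores; the prioritised queue simply keeps the ordering information that the two-way split throws away.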
Active Learning in RelativityOne:
Continuous Active Learning, or CAL (active learning that continually updates itself based on reviewers' coding decisions), is used within the RelativityOne software to profound effect, drastically reducing the time and money spent on review for cases that suit the capabilities of the algorithm. However, active learning is not infallible, and a case must meet certain criteria for active learning to be successful:
- Your case should have a large number of documents of a similar type – this is where the algorithm is of most use to you, as a small number of sporadic documents is easy enough for a reviewer to code by hand.
- Your case should not have too many image files – as the algorithm is searching for keywords and phrases that the reviewers have deemed relevant, image files, which contain no readable plain text, cannot be interpreted.
- Your case documents should all be in the same language – if your case is international and involves multiple languages, it is best to segment the data by language and review each segment separately. The algorithm can only review in one language at a time, so by combining languages you may receive false positives or miss crucial data.
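The image and language criteria suggest a simple pre-processing step before active learning begins. The Python sketch below assumes each document is a dictionary with hypothetical `id`, `language` and `text` fields, where the language has already been identified by some earlier processing step. It groups documents into one review set per language and sets aside documents with no extracted text (such as images) for human review.

```python
from collections import defaultdict

def segment_for_review(documents: list[dict]) -> tuple[dict[str, list[str]], list[str]]:
    """Split documents into per-language sets; text-less files go to human review."""
    by_language = defaultdict(list)
    needs_human_review: list[str] = []
    for doc in documents:
        if not (doc.get("text") or "").strip():
            # No readable plain text (e.g. an image): nothing for the model to learn from.
            needs_human_review.append(doc["id"])
        else:
            # One review set per language, to be reviewed separately.
            by_language[doc["language"]].append(doc["id"])
    return dict(by_language), needs_human_review

# Hypothetical documents; the field names are assumptions for this sketch.
documents = [
    {"id": "A", "language": "en", "text": "merger terms"},
    {"id": "B", "language": "de", "text": "Fusionsvertrag"},
    {"id": "C", "language": "en", "text": ""},  # e.g. a scanned image
    {"id": "D", "language": "en", "text": "payment schedule"},
]
print(segment_for_review(documents))  # ({'en': ['A', 'D'], 'de': ['B']}, ['C'])
```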
Where you have either few documents, or many documents of diverse types or languages, it is best to conduct a human review, as this ensures the greatest accuracy. If, on the other hand, your case requires the review of many documents that share similar formatting and are in the same language, active learning is perfect for you!
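For cases that do suit it, the "continuous" part of CAL can be illustrated with a small Python loop. This is a simplified sketch, not RelativityOne's implementation: it uses a toy word-overlap model as a stand-in for the real classifier, re-ranks the unreviewed documents after every single coding decision, and always serves the highest-ranked document next. The `ask_reviewer` function simulates a human reviewer and is hypothetical.

```python
def words(text: str) -> set[str]:
    return set(text.lower().split())

def rank(unreviewed: list[str], coded: list[tuple[str, bool]]) -> list[str]:
    """Toy model rebuilt from all coding decisions so far; highest score first."""
    relevant_vocab = set().union(*(words(t) for t, y in coded if y))
    other_vocab = set().union(*(words(t) for t, y in coded if not y))

    def score(text: str) -> int:
        w = words(text)
        return len(w & relevant_vocab) - len(w & other_vocab)

    return sorted(unreviewed, key=score, reverse=True)

def continuous_active_learning(docs, seed, ask_reviewer, budget):
    """Serve documents one at a time, re-ranking after every reviewer decision."""
    coded = list(seed)
    unreviewed = list(docs)
    review_order = []
    for _ in range(min(budget, len(unreviewed))):
        next_doc = rank(unreviewed, coded)[0]  # most likely relevant first
        unreviewed.remove(next_doc)
        coded.append((next_doc, ask_reviewer(next_doc)))  # feeds the next ranking
        review_order.append(next_doc)
    return review_order

seed = [("merger agreement", True)]
docs = ["lunch plans", "merger invoice", "invoice payment", "party invite"]
order = continuous_active_learning(
    docs, seed, ask_reviewer=lambda t: "merger" in t or "invoice" in t, budget=2
)
print(order)  # ['merger invoice', 'invoice payment']
```

In this run, "invoice payment" only rises to the top because the reviewer has just coded "merger invoice" as relevant, teaching the model that "invoice" matters: each decision immediately shapes the next ranking.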
The Accuracy of Active Learning:
As mentioned above, active learning is not fool-proof, and errors can sometimes occur! The algorithm is only as good as the material it learns from: if reviewers train the algorithm on a 10% sample set containing lots of similar data, the algorithm will work well at identifying data with those commonalities among the rest of the review pile. Unfortunately, this also means that if the review pile contains relevant data of a different type, or under a different keyword, that was not part of the training sample, the algorithm will not know to code it as relevant.
This is where the ability to check the accuracy of your algorithm becomes extremely useful, and where the Elusion Test comes in. The Elusion Test is a method of checking the accuracy of your active learning algorithm: it takes a random sample of the documents that the algorithm has coded ‘not relevant’ and provides them to your reviewers. Your reviewers then review these documents for relevance, and any that are found to be relevant are counted as ‘missed’ by the algorithm. At the end of the test, the proportion of missed documents in the sample is extrapolated to the whole discard pile to estimate how many relevant documents it still contains. From this, your team can decide whether it is worth reviewing the rest of the documents to find these relevant ones, or whether it is reasonable to end the review there.
This is a valuable safeguard, not only for ensuring that the algorithm maintains the accepted level of accuracy but also as a reliable way of determining the endpoint of a review. If the test shows that only a few relevant documents remain among a large number of discarded ones, you can save a significant amount of money by ending the review at that point, a luxury you would not have in a purely human review.
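The extrapolation step of the Elusion Test is simple arithmetic, sketched here in Python. The figures are hypothetical: if reviewers find 5 relevant documents in a random sample of 1,000 drawn from a discard pile of 100,000, the elusion rate is 5 / 1,000 = 0.5%, and the estimate of missed documents in the whole pile is 0.5% × 100,000 = 500. The `is_relevant` function stands in for your reviewers' decisions, and this sketch gives a point estimate only, omitting the statistical margin of error that a production tool would typically report.

```python
import random

def estimate_missed(missed_in_sample: int, sample_size: int, discard_pile_size: int) -> float:
    """Extrapolate the sample's elusion rate to the whole discard pile."""
    return missed_in_sample * discard_pile_size / sample_size

def elusion_test(discard_pile: list, sample_size: int, is_relevant, seed=None):
    """Randomly sample the discard pile, count the 'missed' relevant documents, extrapolate."""
    rng = random.Random(seed)
    sample = rng.sample(discard_pile, sample_size)
    missed = sum(1 for doc in sample if is_relevant(doc))
    elusion_rate = missed / sample_size
    return missed, elusion_rate, estimate_missed(missed, sample_size, len(discard_pile))

print(estimate_missed(5, 1_000, 100_000))  # 500.0
```

A team might then compare the estimate against an acceptable level of missed documents when deciding whether to stop the review.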
Some Final Comments:
Active learning and AI have massively changed the landscape for legal technology and for the industry as a whole, but it is important to remember that without human input to oversee and correct these algorithms, and to provide the basis from which they learn, they would be worse than useless. Many of us in the industry worry about the implications that this advancing technology has for our jobs and livelihoods, but I am here to reassure you that we are a long way off ever being replaceable. By some estimates, the current cognitive limit of AI is that of a two-year-old child. Until toddlers can be hired to do our jobs, I would like to think we are safe!