By Dr Jonathan Richard Schwarz; Dr Ahmed Izzidien; Dr Poorna Mysoor; Professor Felix Steffek; and Professor Jodi Gardner
At Safe Sign, we believe that the unprecedented explosion in the capabilities of Generative AI systems has, for the first time in human history, shown us the potential to transform the rule of law from a governing philosophy into a practical principle protecting everyone, at all times, everywhere. Our mission is to make the world's best legal advice available at the click of a button - safeguarding all online transactions, the signing of every lease or mortgage, the drafting of each contract in business, the application for every visa and the comparison of each flight or broadband deal.
As it stands, this promise holds only in theory: expert legal advice is attainable only by a small minority at high expense, leaving billions of consumers and businesses poorly advised or entirely unaware of their rights. This is bad for all of us, as such opacity makes it easy for producers or large businesses to hide unattractive terms & conditions, preventing market forces from incentivising healthy competition for the most beneficial agreements.
Instead, we believe that the labour- and knowledge-intensive review and drafting of legal documents can be largely automated by building world-leading legal AI systems. We therefore aim to develop the world's best Legal AI foundation model, rooted in the most accurate and helpful analysis, while outperforming competitors through a safety-first approach that builds the confidence in our products on which our success depends. Our legal foundation model can then be specialised to a wide range of legal tasks, bringing these benefits to all aspects of life. Safe Sign draws its unique strength from the synergy between the world's leading legal experts and AI researchers and innovators, ensuring that our systems hold up to the highest degrees of scrutiny while staying one step ahead of the AI curve.
Our mission encapsulates the promise of trustworthiness, robustness, explainability and transparency that people expect from their lawyers. This is a uniquely challenging requirement for current AI systems. Models such as those developed by OpenAI, Mistral or Meta are all known to suffer from hallucinations, overconfidence and adversarial exploitability, and have a reputation as black-box predictors whose outputs cannot be explained.
Instead of treating these problems as an afterthought, we make their mitigation a guiding principle at every stage of the development of our Legal AI model. To this end, we develop our AI system using cutting-edge and nascent research paradigms enabled by our strong academic connections. Safety is enabled through a combination of technical approaches: the targeted collection of annotated data based on Active Learning, the iterative improvement of model predictions through Multi-Agent Dialogue Systems and Prompt Engineering, and uncertainty estimation through conformal prediction (sketched below). In addition, any identified incorrect predictions are fixed through fine-grained Model Editing. We will furthermore specialise the model to a particular jurisdiction through Retrieval-Augmented Generation (RAG), which also improves the explainability and factual correctness of predictions by forcing the model to cite from approved legislative documents. Finally, each iteration of our model will undergo rigorous adversarial testing against complex edge cases and uncommon inputs, enabled by the expertise of our legal team. This testing will also be routinely performed on competing methods (e.g. ChatGPT) to ensure our competitive advantage.
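To illustrate one of these components, here is a minimal sketch of split conformal prediction for a classification-style legal task (for instance, labelling clauses by risk category). The function names, the coverage level alpha and the NumPy-based setup are illustrative assumptions, not our production code.

import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal prediction: calibrate a score threshold on a
    held-out set so that prediction sets contain the true label with
    probability >= 1 - alpha (a distribution-free coverage guarantee)."""
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability assigned to the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile level, capped at 1 for small n.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return np.quantile(scores, q_level, method="higher")

def prediction_set(test_probs, qhat):
    """All labels whose nonconformity score passes the calibrated
    threshold; a larger set signals higher uncertainty on that input."""
    return np.where(1.0 - test_probs <= qhat)[0]

The size of the returned set is itself an uncertainty signal: confident predictions yield singleton sets, while ambiguous inputs yield larger sets that can be escalated to a human lawyer.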
The rapidly developing AI landscape, and the leadership turmoil of recent weeks at OpenAI and Microsoft, make it evident that a strategy relying purely on models supplied by third parties is unsustainable. Companies fully exposed to drastic strategic shifts in pricing, open-source policy or API availability can easily find themselves in an untenable position, with their business models fundamentally shaken. This makes the multitude of recently emerging "ChatGPT wrapper" companies both easily replaceable and uniquely exposed to market risks. Instead, we are strong proponents of strategic independence, aiming to fully develop our own foundation model for the legal setting by training on the highest-quality data relevant to our domain.
This gives us a strategic advantage: instead of claiming to build a general-purpose AI, which necessarily involves performance trade-offs across domains, we build upon models with common knowledge trained on the web and then specialise them exclusively to the legal domain. This makes it significantly easier to build the best finetuning tools for our field and allows us to use legal domain insight to reduce the amount of training data required. Finally, this also yields a major safety advantage: instead of having to foresee the myriad errors a general model can make, we can specifically and more thoroughly test our system in the legal domain.
Sweeping international regulation of AI is on its way, and AI start-ups, as well as large industry players, are not prepared for it. Our team includes lawyers who are advising governments and parliaments on the drafting of AI regulations. The legality and ownership of data used in generative AI systems are coming under increasing scrutiny, and class action suits are expected to confront much of the generative AI industry in the coming 24 months. Safe Sign understands these legal concerns and intends to lead the way in ethical dataset creation, developing industry best practice that reduces the risk of AI litigation and sanctions. This also yields higher data quality and better explainability, leading to better overall performance and safer products.
Legal foundation models for protection in all aspects of life
At Safe Sign, we believe that the world's best Legal Foundation Model will enable the development of legal products for a wide variety of settings, allowing us to create a broad network of products and partnerships, whether with businesses or by providing services directly to consumers. This allows us to fund the development of an ever-improving foundation model through an immediate path to commercialisation.
Our immediately realisable products include:
Connecting the brightest minds in Law & AI
The Safe Sign founding team brings together the best minds in Law, AI and business: tenured professors, leading scholars, award-winning lawyers, mathematicians, coders and influential business leaders. Our talent holds current or former affiliations with the world's best institutions, including the University of Cambridge, DeepMind, MIT, Harvard University, UCL, Lenovo and Sony.
Alexander (Sami) Kardos-Nyheim - Founder & CEO. University of Cambridge (Law); Allen & Overy. Founder of CJLPA (international legal research organisation) and UDRO (international law reform body).
Dr Jonathan Richard Schwarz - Co-Founder & Chief Scientist. Research Fellow at Harvard University. Former Senior Research Scientist at Google DeepMind (7 years of experience). PhD from the prestigious joint UCL / DeepMind programme (advisor: Prof Yee Whye Teh, Oxford). More than 3,600 academic citations and over 20 peer-reviewed publications in the world's leading AI venues (NeurIPS, ICML, ICLR, JMLR).
Terence Nguyen - Co-Founder & COO. Global Head of Software and Managed Services at Lenovo. Twenty years of experience in cross-platform ecosystem partnerships across leading consumer electronics, information technology and mobile companies. Leader and innovator in device + service ecosystems and partnerships.
Professor Jodi Gardner - Co-Founder & Chief Legal Officer. Brian Coote Chair in Private Law at the University of Auckland. Fellow of St John's College, University of Cambridge. Senior Adjunct Research Fellow at the Centre for Banking & Finance Law, National University of Singapore. Mediator in consumer law matters.
Professor Felix Steffek - Co-Founder & Lead Adviser. Professor, University of Cambridge. Co-Director, Centre for Corporate and Commercial Law, University of Cambridge. Former Senior Research Fellow, Max Planck Institute for Comparative and International Private Law. Member of the Cambridge Law Faculty Legal Tech Group.
Dr Ahmed Izzidien - Co-Founder & Head of Legal AI. Senior Research Associate, University of Cambridge. Regarded as a global expert in Legal AI. Degrees from King's College London, UCLA, the University of Cambridge, the University of Manchester and Cardiff University. Former research positions at the University of Cambridge and Harvard University.
Dr Poorna Mysoor - Co-Founder & IP Adviser. Fellow, Lucy Cavendish College, University of Cambridge. Regarded as a global expert in the regulation of AI and data.
Jonathan Barton - Co-Founder & CRO. Vice President International for mobile and PC gaming at leading gaming companies. Corporate strategy in satellite and telecoms.
Elliot Wright - Co-Founder & Chief Legal Innovation Officer. University of Cambridge (Law); UCL (Law).
We already have a shortlist of future employees, all engineers and researchers drawn from leading institutions with several years of experience in AI. As the hunt for global AI talent continues at an unprecedented pace, the value of our networks in hiring the best technical talent will be a crucial contributor to our success.
Much of the future progress of Generative AI (especially once pre-training on publicly available data is exhausted) will depend on the availability of proprietary high-quality data, the value of which only increases with the proliferation of techniques such as RLHF. The wide network and unique composition of our founding team give us access to proprietary data sources and expert annotation, significantly improving upon the quality of common data collection methods based on low-pay manual annotation used by companies such as OpenAI.
We also aim to fundamentally challenge the existing paradigm of LLM foundation model training, in which a small subset of a huge dataset is chosen at random at each training step. Instead, we will use methods that estimate the expected improvement in performance from each candidate data point and prioritise the most promising data at each stage of training, a strategy known as curriculum learning that is already showing huge promise in delivering better results and significant savings in training costs [1, 2]; a minimal sketch follows.
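As a simple illustration of the idea, the PyTorch sketch below scores a pool of candidate examples by their current per-example training loss and keeps the hardest ones, using loss as a crude proxy for expected improvement. The selection criteria in [1, 2] are considerably more refined; this snippet and its function names are purely illustrative.

import torch
import torch.nn.functional as F

def select_batch(model, inputs, targets, k):
    """Curriculum-style selection sketch: rank a candidate pool by
    per-example loss and keep the k examples the model currently
    finds hardest (a simple proxy for expected improvement)."""
    with torch.no_grad():
        logits = model(inputs)
        # reduction="none" keeps one loss value per example.
        losses = F.cross_entropy(logits, targets, reduction="none")
    top = torch.topk(losses, k=min(k, len(losses))).indices
    return inputs[top], targets[top]

The selected sub-batch is then used for the actual gradient step, concentrating compute on the data expected to teach the model the most.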
Safe Sign's founding members are world experts both in specialised AI technologies such as Meta-Learning (optimal learning from a small number of examples) and in Hohfeldian legal theory (high-quality data annotation targeted at extracting essential rights and duties).
In our existing work, we have already shown that we can train high-quality predictive ranking models for iwantbetter.com (surpassing expert human performance) using only a few hundred data points, by combining annotation instructions grounded in legal theory with our expertise in efficient AI finetuning. Comparable systems typically require tens of thousands of examples, making this an efficiency improvement of 10-100x.
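Parameter-efficient finetuning is one family of techniques that makes such small-data specialisation practical. The snippet below is a hedged sketch using LoRA via the Hugging Face peft library; the base model name and hyperparameters are assumptions for illustration, not a description of our actual pipeline.

# Illustrative LoRA finetuning setup (assumed model and hyperparameters).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
config = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of all weights

Because only the small adapter matrices are trained, a few hundred carefully annotated examples can meaningfully specialise a large base model.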
Our first year
The development of our legal foundation model proceeds in three stages within the first year, culminating in the best legal foundation model in the world. First, by Q1 2024, we will have implemented and improved upon the best-known automated prompting strategies, significantly accelerating this otherwise labour-intensive process and automatically detecting the best prompts (a sketch of the idea follows below). This will result in Safe Sign v1, a state-of-the-art legal model evaluated on publicly available academic benchmarks.
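In its simplest form, automated prompt selection scores candidate prompt templates on a held-out validation set and keeps the best performer. The sketch below assumes a generate(prompt) callable and an exact-match-style scoring rule; both are hypothetical placeholders rather than our actual system.

def best_prompt(generate, templates, val_set):
    """Pick the template with the highest validation accuracy.
    generate(prompt) -> model answer; val_set is (question, answer) pairs."""
    def accuracy(template):
        hits = sum(
            answer.lower() in generate(template.format(q=question)).lower()
            for question, answer in val_set
        )
        return hits / len(val_set)
    return max(templates, key=accuracy)

# Hypothetical candidate templates for a legal question-answering task.
templates = [
    "Answer the legal question concisely.\nQ: {q}\nA:",
    "You are a careful lawyer. State the applicable rule, then answer.\nQ: {q}\nA:",
]

More sophisticated approaches generate and mutate candidate templates automatically, but this evaluate-and-select loop remains the core of the method.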
Second, by Q2 2024, Safe Sign v2 will have exceeded this performance by making use of additional publicly available legal data on the web in combination with our existing proprietary legal documents. We will also incorporate a thorough error and weakness analysis of Safe Sign v1, using our curriculum learning algorithms to counter known issues by automatically excluding low-quality or toxic data sources. Finally, we will employ state-of-the-art sparsity and quantisation schemes, ensuring our foundation models can easily run on consumer-grade laptops or be reached from mobile devices through calls to our servers; the sketch below shows the core idea behind weight quantisation.
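As a minimal illustration, the PyTorch snippet below maps a float weight matrix to 8-bit integers with a single per-tensor scale. Production schemes typically use per-channel or per-group scales and lower bit-widths, so this is a sketch of the principle only.

import torch

def quantise_int8(weight: torch.Tensor):
    """Symmetric post-training quantisation: store int8 weights plus
    one float scale, cutting memory roughly 4x versus float32."""
    # Clamp the scale away from zero to avoid division by zero.
    scale = weight.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp((weight / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantise(q: torch.Tensor, scale: torch.Tensor):
    """Recover approximate float weights for use at inference time."""
    return q.float() * scale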
Finally, by Q4 2024, Safe Sign v3 will be released as the world's most accurate, safe and explainable Legal AI foundation model, featuring the combination of safety techniques outlined above.
Our goal of strategic independence from competing players will ultimately require our own web-scale LLM pre-training before the techniques outlined above are employed in the model. The purpose of our work in the first year is thus to establish ourselves as a current world leader in Legal AI (at a stage of semi-independence) while laying the foundation for full independence through significantly more efficient, independent training runs enabled by our exploration of curriculum learning.
Our north star for Safe Sign (full independence, maximum safety, highest accuracy) will thus be achieved in the following year, provided a further funding round of significantly higher magnitude, ensuring a significant edge and moat over any future versions of GPT-X or competing legal tech.