Demystifying AI: Self-Supervised Learning.

January 13, 2022

Let’s start by thinking about how we learn new skills. Suppose you’ve downloaded and installed a new piece of software, and you’re loading it up for the first time. When you enter the interface, everything is new. As you explore the menus, options, and functions, you start to realize just how complicated this program really is, and you feel lost and confused.

Thankfully, the home page guides you to a short tutorial. There, you learn the basic rules for controlling the software. You learn how to navigate through it, how to make simple changes, and other preliminaries. Half an hour later, you’re back at the home page, and the program looks a little different than it did before. Sure, there’s still plenty that you don’t know or understand, but you’re feeling confident that you can figure it out.

Armed with foundational knowledge, you start clicking around. Your skills gradually grow, building on top of each other. You keep practicing, you make some mistakes, and soon you’ve come to master it.

Building an Algorithm: The Basics of Self-Supervised Learning

The process we’ve described is a lot like what we AI engineers try to mimic when we build a class of algorithms called self-supervised learning. Essentially, we want to teach a computer to teach itself. That means we need to provide a launchpad, which we call pretext tasks, and then we let it loose; from there, it uses a built-in mechanism to collect clean data, learn the underlying concepts, and begin generalizing. After that, it snowballs through the downstream tasks and progressively becomes better at its assigned job. In this way, it progresses from the least noisy data until it can eventually handle the messiest data we can throw at it.

In his keynote address at AAAI 2020, Turing Award winner and leading researcher Yann LeCun described self-supervised learning as “the idea of learning to represent the world before learning a task.”

This points to why so many researchers are now interested in self-supervised learning. Not only is it an intuitive model of learning, but it also mirrors the same progression within the AI field itself: it builds on what we’ve already learned about AI. That’s because self-supervised learning tries to synthesize supervised and unsupervised learning, compensating for their respective shortcomings to create a better, more robust solution.

Self-supervised learning seeks to combine the best elements of both worlds. It requires only unlabeled data, sparing us the costly hand-labeling that supervised learning depends on, yet it can do much more than unsupervised learning because we start it off with a tutorial and the ability to teach itself.

The Process: Self-Supervised Learning from Start to End

Now let’s break down the self-supervised learning process. It starts with two key components: a sizable amount of unstructured data and an algorithm.

We start by encoding two sets of rules into the algorithm. The first is for training the algorithm during the pretext task, where it will gain a foundation for moving forward. These rules let the machine find “clean data” within the sea of unorganized data. Let’s say, for instance, that we want to do some natural language processing (NLP) and teach our computer how to read emails. We’ll tell it to find messages that are nicely formatted, follow certain patterns, and are generally easier to parse.

The machine then treats this data subset as labeled data, and it learns from it in the same way that a supervised learning algorithm would. We’re left with a kernel of intelligence that’s ready to move into uncharted territory.
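To make this concrete, here is a minimal Python sketch of a pretext task in this spirit. It is purely illustrative: the sample data and the choice of model are assumptions made for the example, but the idea is the same as in the text. The “clean” emails label themselves, because each snippet of text is tagged with the header field it came from, and an ordinary supervised classifier learns to predict that field.

```python
# Illustrative pretext task: well-formatted emails supply their own labels.
# Each piece of text is tagged with the header field it came from, and a
# standard supervised classifier learns to predict that field.
# (Sample data and model choice are hypothetical, for illustration only.)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

clean_emails = [
    {"From": "Jane Doe <jane.doe@example.com>",
     "To": "John Smith <john.smith@example.com>",
     "Date": "Mon, 3 Jan 2022 09:15:00 +0000",
     "Body": "Hi John, attached is the draft agreement for your review."},
    {"From": "John Smith <john.smith@example.com>",
     "To": "Jane Doe <jane.doe@example.com>",
     "Date": "Mon, 3 Jan 2022 10:02:00 +0000",
     "Body": "Thanks Jane, I will send my comments back by Friday."},
]

# The data labels itself: the field name is the target, its text is the input.
texts, labels = [], []
for email in clean_emails:
    for field, value in email.items():
        texts.append(value)
        labels.append(field)

# Train exactly as a supervised learner would, but on self-generated labels.
pretext_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
pretext_model.fit(texts, labels)

print(pretext_model.predict(["Tue, 4 Jan 2022 14:30:00 +0000"]))  # likely ['Date']
```

The particular model doesn’t matter; the point is that the labels come from the data’s own structure rather than from human annotators.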

Before we send our computer into the wilds, however, our second ruleset takes effect. This adds noise to the data. It mixes things up, changes the data, and adds an element of chaos.

We do this to prevent overfitting, to stop the machine from “thinking too narrowly” or “memorizing” the data. Let’s take a moment to explore this concept.

Another great analogy is studying for a math test. If you just memorized the answers from the back of the book, you’d fail on test day, because you’d only know the answers to specific problems rather than how to solve them for yourself. If, during your studies, you changed numbers, rearranged the syntax, or introduced new elements, you’d be forced to work through the problems on your own. By adding noise, you learn to generalize your problem-solving approach. That’s what it means to prevent overfitting.

So, now that we’ve trained our machine on the clean data and the noise-added data, we’re ready to attack the more difficult data. The machine has learned to generalize and has grasped the underlying concepts, all without any human-labeled data. The self-supervised model is ready! The machine then snowballs through the data, parsing emails without any supervision. This is what ultimately produces results.
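Continuing the toy sketch above (still purely illustrative), once the model has also seen noise-corrupted versions of the clean examples, working through the rest of the collection is just inference: each line of a raw, unlabeled message gets tagged with the field it most resembles.

```python
# Continuing the illustrative sketch: the pretext-trained model now works
# through messy, unlabeled text with no supervision, tagging each line
# with the header field it most resembles.
messy_message = """jane.doe@example.com
john.smith@example.com
3 Jan 2022 09:15
please find the revised indemnity clause below"""

for line in messy_message.splitlines():
    field = pretext_model.predict([line])[0]
    print(f"{field:>4}: {line}")
```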

What kind of results? Let’s go through a real example of how Text IQ uses self-supervised learning to create genuine insights.

Example: Adding a Layer of Intelligence to Email Data

We begin with one of Text IQ’s advanced self-learning algorithms and a hard drive containing 5,000,000 emails. We want to search this correspondence for potentially privileged, sensitive, or private data, but since there’s no structure, it’s impossible to find the information we’re looking for without going through it by hand.

It’s time to set our algorithm to work. We’ll encode rules for the computer to find clean email chains, and then we’ll set parameters for noise-inducing randomization. For the first ruleset, we can include features such as full names and addresses in the “To:” and “From:” lines, consistent formatting in the “Date:” line, and other similar markers.
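In code, that first ruleset might look roughly like the simplified sketch below. The regular expressions are assumptions made for illustration, not Text IQ’s actual criteria.

```python
# Simplified sketch of a "clean email" ruleset: a full name plus address in
# the To: and From: lines, and a consistently formatted Date: line.
# These patterns are illustrative only; real criteria would be far richer.
import re

FULL_NAME_AND_ADDRESS = re.compile(r"^[A-Z][a-z]+ [A-Z][a-z]+ <[\w.+-]+@[\w.-]+\.\w+>$")
CONSISTENT_DATE = re.compile(r"^[A-Z][a-z]{2}, \d{1,2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}:\d{2}")

def is_clean(email: dict) -> bool:
    """Keep only emails whose headers follow the strict, easy-to-parse format."""
    return (
        bool(FULL_NAME_AND_ADDRESS.match(email.get("From", "")))
        and bool(FULL_NAME_AND_ADDRESS.match(email.get("To", "")))
        and bool(CONSISTENT_DATE.match(email.get("Date", "")))
    )

example = {"From": "Jane Doe <jane.doe@example.com>",
           "To": "John Smith <john.smith@example.com>",
           "Date": "Mon, 3 Jan 2022 09:15:00 +0000"}
print(is_clean(example))  # True
```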

The machine then searches through all the data and pulls out the few emails that fit all our criteria. This becomes the tutorial, the clean data set from which our computer can learn the ropes and build that essential foundation.

From there, we add noise by mixing up the order of different fields, removing names, adding random names, and other chaotic elements that make the formatting difficult to parse. Once it’s done training on the noisy data, it dives into the ocean of emails at its disposal. When it’s done, we’re left with a layer of intelligence that’s derived from the data’s underlying structure.
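That noise step might look something like the sketch below, again a simplified assumption rather than the production logic: the clean layout is shuffled and perturbed so the model can’t simply memorize it.

```python
# Simplified sketch of the noise ruleset: shuffle the field order, sometimes
# strip the display name, sometimes swap in a random one, so the model
# cannot simply memorize the clean layout. Illustrative only.
import random
import re

RANDOM_NAMES = ["Alex Chan", "Priya Patel", "Sam Lee"]

def strip_display_name(value):
    """'Jane Doe <jane@example.com>' -> '<jane@example.com>'"""
    return re.sub(r"^[^<]*<", "<", value)

def add_noise(email):
    fields = list(email.items())
    random.shuffle(fields)                          # mix up the field order
    noisy = []
    for field, value in fields:
        if field in ("To", "From"):
            roll = random.random()
            if roll < 0.3:
                value = strip_display_name(value)   # remove the name
            elif roll < 0.6:                        # add a random name
                value = f"{random.choice(RANDOM_NAMES)} {strip_display_name(value)}"
        noisy.append((field, value))
    return noisy

clean = {"From": "Jane Doe <jane.doe@example.com>",
         "To": "John Smith <john.smith@example.com>",
         "Date": "Mon, 3 Jan 2022 09:15:00 +0000"}
print(add_noise(clean))
```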

That layer of intelligence enables us to make connections between people, documents, and concepts, find sensitive data, and complete our mission.

Conclusion: The Next Step in Artificial Intelligence

Self-supervised learning is an exciting advancement that’s just getting started. By overcoming the shortcomings of both supervised and unsupervised learning, we’re opening up opportunities for greater levels of intelligence than ever before.

The use cases speak for themselves. Beyond the example we walked through above, we’re seeing great strides in using self-supervised learning for NLP, as in the case of ULMFiT, a language-model fine-tuning approach that’s dramatically altering the landscape of a field we encounter every day in household devices like Apple’s Siri and Amazon’s Alexa.

Researchers are also using self-supervised learning to improve computer vision programs.

For instance, we’re making progress on robotic surgical platforms and teaching self-driving cars how to detect and respond to rough terrain. These problems are hard to solve with supervised learning because generating enough high-quality labeled data is so difficult, and they go well beyond the scope of what’s possible with unsupervised learning.

That’s what makes self-supervised learning such an ideal candidate.

Self-supervised learning is a paradigm shift in AI technology, though it actually began with a 1989 paper by Jürgen Schmidhuber titled Making the World Differentiable. There he “described how the algorithm can be augmented by dynamic curiosity and boredom. This can be done by introducing (delayed) reinforcement for controller actions that increase the model network’s knowledge about the world. This in turn requires the model network to model its own ignorance, thus showing a rudimentary form of introspective behavior.”

This takeaway gets at what’s so profound about self-supervised learning. Curiosity and boredom are two forces that drive our own learning because they point to something outside of us, something that we don’t yet have or know, yet something that we can work toward.

Knowing our own ignorance compels us to learn more and to grow, but opening our eyes to that Socratic wisdom remains a challenging prerequisite.

Ultimately, that’s what self-supervised learning is trying to do. We want machines to hunger for knowledge so that they can learn on their own.

Written By Apoorv Agarwal

As chief executive officer and founder of Text IQ, Apoorv Agarwal drives the strategy, product vision, and technology innovation at one of the fastest-growing technology startups in the US. With a natural curiosity about using AI to solve organizational and business problems, Apoorv has dedicated a significant part of his early business and academic career to research and development. Today, he works closely with chief executives of highly regulated Fortune 500 companies, helping them implement AI as a tool for risk mitigation and a driver of digital transformation.
