What is machine learning?
We generate more and more information every day, thanks to the multiplicity of technologies we use (smartphones, computers, tablets, connected devices, etc.). All of these devices generate a massive amount of data: in 2020, the average person generated 1.7 MB of data per second. Big data is a huge source of information, stored in digital databases. But without adequate processing and an effective strategy, this mass of data would be nothing more than an unwieldy pile of bytes. This is where machine learning comes in and makes the most of it.

What is machine learning?
The first machine learning algorithms were developed in the 1950s. Machine learning is both a technology and a science (within data science) that allows a computer to carry out a learning process without having been explicitly programmed for that purpose. This technique, which is linked to artificial intelligence (AI), is designed to highlight patterns of statistical repetition and derive statistical predictions from them. Data mining, the extraction of information from a large amount of data, provides the raw material that machine learning uses to highlight patterns for statistical prediction. This is why big data (all of the data generated and stored) is an integral part of machine learning. The larger the set that reveals trends, the more accurate the predictions.
More specifically, the learning algorithm applied enables the computer to refine its analysis and responses, based on empirical data from the associated database. Machine learning is a powerful learning model for businesses, because it allows them to harness the power of the data generated by their customers or activity. Mastering artificial intelligence has thus become a key challenge for their success.
There are several types of learning, classified according to the data available during the learning phase. If the response to the defined task is already known, the data is referred to as 'labelled'. This is what is known as supervised learning. Depending on whether the data is discrete or continuous, classification or regression is used. If the learning takes place step by step, with a reward system in place for each task performed correctly, then it is known as reinforcement learning. Finally, unsupervised learning involves searching data without labels: it aims to predict a result without relying on known answers beforehand.
Two approaches to machine learning
Supervised machine learning
Supervised machine learning is a type of machine learning where a model is trained on a labelled data set. This means that each example in the data set has an input (or set of features) and a corresponding output (or label). The goal is to learn a function that, from the input features, correctly predicts the output labels for new data.
The basic process of supervised machine learning is as follows.
- Data collection: gather a data set with labelled samples.
- Data division: separate the data into training and test sets.
- Training: use the training set to learn a model that maps input features to output labels.
- Validation and testing: evaluate the model's performance on the test set to verify its accuracy and generalisability.
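The four steps above can be sketched in a few lines of code. This is a minimal illustration only: the data set is invented, and a trivial 1-nearest-neighbour classifier stands in for a real learning algorithm.

```python
# Minimal sketch of the supervised learning workflow: collect labelled
# data, split it, "train" a 1-nearest-neighbour model, and test it.
# The samples below are invented for illustration.

# 1. Data collection: each sample is (features, label).
data = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.9, 5.1), "B"),
]

# 2. Data division: hold out the last sample of each class for testing.
train = [data[0], data[1], data[3], data[4]]
test = [data[2], data[5]]

# 3. Training: a 1-NN model simply memorises the training samples and,
# for a new input, returns the label of the closest one.
def predict(x, training_set):
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(training_set, key=lambda sample: sq_dist(sample[0], x))[1]

# 4. Validation and testing: measure accuracy on the held-out set.
correct = sum(predict(x, train) == y for x, y in test)
accuracy = correct / len(test)
print(accuracy)  # 1.0 on this toy data
```

In practice, a library such as scikit-learn would replace the hand-written model, but the collect / split / train / evaluate structure stays the same.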
Supervised machine learning is used for two main types of task: classification, i.e. determining a category (e.g. whether an email is spam), and regression, i.e. predicting a numerical value (e.g. estimating the price of a house based on its characteristics).
Supervised learning is used in many practical applications, such as speech recognition, fraud detection and recommendation systems.
Unsupervised machine learning
Unsupervised machine learning is a type of machine learning where a model is trained on unlabelled data. Unlike supervised learning, there is no predefined output. The goal is to find hidden structures or patterns in the data.
Main types of unsupervised learning:
- Clustering: dividing data into groups, or clusters, based on similarity (e.g. grouping customers with similar buying behaviours);
- Dimensionality reduction: simplifying data by reducing the number of features while retaining most of the information (for example, principal component analysis, or PCA).
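Clustering can be illustrated with a compact, from-scratch version of the classic k-means algorithm. The 2-D points, the number of clusters and the starting centroids below are all assumptions chosen purely for the example.

```python
# Hedged sketch of clustering: a few iterations of k-means on toy 2-D
# points. Data, cluster count and initial centroids are illustrative.

points = [(1.0, 1.0), (1.5, 1.2), (0.8, 0.9),
          (8.0, 8.0), (8.3, 7.8), (7.9, 8.2)]

def kmeans(points, centroids, iterations=5):
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters

centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(len(clusters[0]), len(clusters[1]))  # 3 3: two groups of similar points
```

The algorithm discovers the two natural groups without ever being told a label, which is exactly what distinguishes unsupervised from supervised learning.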
Common examples of unsupervised machine learning use:
- Customer segmentation: identifying groups of customers with similar behaviours or characteristics;
- Anomaly detection: identifying unusual data points that do not follow the general pattern (e.g. detecting fraudulent transactions).
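The anomaly-detection use case can be sketched with a very simple statistical rule: flag any value that lies far from the mean. The transaction amounts and the 2-standard-deviation threshold are assumptions for illustration; real systems use far more sophisticated models.

```python
# Illustrative sketch of unlabelled anomaly detection: flag values
# lying more than 2 standard deviations from the mean.
import statistics

# Toy transaction amounts; the last one is unusually large.
amounts = [12.0, 15.0, 11.0, 14.0, 13.0, 12.5, 500.0]

mean = statistics.mean(amounts)
stdev = statistics.stdev(amounts)

# Anything more than 2 standard deviations from the mean is flagged.
anomalies = [a for a in amounts if abs(a - mean) > 2 * stdev]
print(anomalies)  # [500.0]
```

No labelled examples of "fraud" were needed: the unusual transaction stands out purely from the structure of the data itself.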
Unsupervised learning is useful for exploring data and discovering patterns or relationships without the need for prior knowledge of expected labels or outcomes.
What is machine learning used for?
The power and advantage of machine learning lies in its ability to process a huge volume of data that is impossible for the human brain to process. Industries that gather a high volume of data need a solution for processing it, and extracting information that can be used for decision-making. Predictive analysis of this data enables the computer to anticipate specific situations. This is what machine learning is all about. Let us consider the financial services sector, for example. Machine learning is used to detect fraud, illegal conduct and other irregularities that financial institutions must identify in order to operate properly.
The growing volume of transactional data we generate is also used by companies to target their customers based on their purchasing behaviour, by identifying repetitions. The websites and pages we visit also generate data that can be used by machine learning to infer our preferences. It is therefore clear that this data processing technique, without the need for human intervention, is a major asset for companies wishing to take advantage of the mass of information available to them. A human being alone could not make use of this data, simply because the volume to process is so high. Take the large companies of GAFAM, for example: the implementation of AI and machine learning in their processes has become a necessity, due to the large usable data stream that they generate.
With data being generated in ever-increasing volumes, a growing number of companies will also need to integrate this technology into their structure in order to make use of the information available to them. Connected devices, for example, are becoming increasingly present in our daily lives. By 2019, more than 8 billion connected objects had entered our society, allowing more data to be collected on our rhythm of life, our consumption and our habits, even through voice recognition. All of this represents a huge mass of critical data for companies, and machine learning helps identify the elements that are relevant and useful. Without a doubt, there is a lot at stake here. Big data plays a vital part in the development of many technologies for modern society, like facial recognition, self-driving cars, robotics, and smart home technology. But to create this technology, companies must learn how to implement this asset in a suitable way. This technology isn't just for AI-savvy development teams. Many companies are embarking on the adventure of machine learning by choosing turn-key solutions that are adapted to fit their objectives.
How machine learning works
Machine learning works based on "experience". The computer retrieves a high volume of data, and uses it to analyse and predict situations. The goal of the process is for the machine to independently build an "internal model", which it can use to identify the key elements that the user wants to target. It will need to experiment with different examples and tests in order to progress. This is why we talk about learning.
To train itself and learn, the computer needs learning data. Data mining is the basis for how machine learning works, and the data used is called a training data set. The computer also needs analytical software and algorithms, as well as a deployment environment, usually a server that is adapted to meet the user's computing needs. There are different types of learning, which vary depending on whether or not the expected response is known, the type of data being analysed, the data environment under consideration, and the type of analytical action being taken (statistics, comparisons, image recognition, etc.). The learning algorithms differ depending on the task at hand, as does the computing power they require.
Machine learning usually involves two steps. The first is the development of the model from a set of observation data. This step involves defining the task that the user wants to handle (detecting the presence of an element in a photo, detecting a statistical recurrence, responding to a sensor's signal, etc.). This is the testing or "training" phase. The second step involves putting the model into production, where it can be optimised with new data. Some systems continue learning during the production phase, but the user needs to ensure that they get feedback on the results produced, so that they can optimise the model and manage the machine. Others can continue their learning alone, and develop independently.
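The two steps can be sketched with the standard-library `pickle` module: build a model from observation data, then "put it into production" by serialising it and reloading it to score new inputs. The trivial threshold "model" and the numbers are assumptions for illustration only.

```python
# Sketch of the two-step process: develop a model from observation
# data (step 1), then deploy it to score new data (step 2).
import pickle

# Step 1: training - learn a decision threshold from labelled observations.
observations = [(0.2, 0), (0.3, 0), (0.8, 1), (0.9, 1)]
positives = [x for x, y in observations if y == 1]
negatives = [x for x, y in observations if y == 0]
model = {"threshold": (max(negatives) + min(positives)) / 2}

# Step 2: production - persist the model, reload it, and score a new input.
blob = pickle.dumps(model)
deployed = pickle.loads(blob)
prediction = int(0.7 >= deployed["threshold"])
print(prediction)  # 1: the new input falls above the learned threshold
```

Real deployments use dedicated model formats and serving infrastructure, but the separation between a training phase and a production phase is the same.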
The quality of the learning is dependent on several factors:
- The number of relevant examples the computer can consider: the more data, the more accurate the results.
- The number of characteristics describing the examples: the simpler and more precise they are (size, weight, quantity, speed, etc.), the quicker and more accurate the analysis will be.
- The quality of the database used: if too much data is missing, this will affect the process, and false or exaggerated data can also distort results.
If these elements are taken into account, the prediction algorithm will be more accurate and the analysis more relevant. Once the machine learning project is defined and the databases are ready, you can start the machine learning process.
Make your machine learning project a success with OVHcloud
We have always been committed to bringing technology to all business sectors. We believe that with the potential AI represents, it should not be reserved solely for IT giants or major companies. We want to help you and support you as much as possible in launching ambitious AI and machine learning projects. Artificial intelligence boosts efficiency for businesses, and facilitates decision-making. OVHcloud offers tools to help you meet business challenges, such as predictive analysis of datasets, and make it easy to use for all user profiles. We support our customers in developing their artificial intelligence system.
With OVHcloud, you can collect and prepare your data using our Data Analytics solutions. You can model your machine learning project step by step, and deploy your model in just a few clicks. You can choose from a range of tools, frameworks and model formats, such as TensorFlow, PMML and ONNX.
OVHcloud solutions offer a number of advantages when it comes to developing your machine learning project:
Confidentiality for your data
We are committed to keeping your personal data confidential. Data sovereignty is a vital aspect of our company philosophy, so you can recover your data whenever you need to.
Computing power
By automating deployments and our infrastructures, we can offer you unrivalled computing power at competitive prices.
Open source
In the world of data, open-source solutions are now the most mature and high-performance products on the market. OVHcloud values the importance of basing its solutions on open-source software, like the Apache Hadoop and Apache Spark suites.
Explore our range of Public Cloud products

AI & machine learning
Artificial intelligence (AI) is often seen as an aspect of data science reserved only for those who are experienced in the field. At OVHcloud, we believe in the outstanding potential of this practice in all business sectors. And we believe that its complexity should not stand in the way of the use of big data and machine learning.

GPU
GPU instances integrate NVIDIA graphics processors to meet the requirements of massively parallel processing. Since they are integrated into the OVHcloud solution, you get the advantages of on-demand resources and hourly billing.

AI Training
Launch your AI training tasks in the cloud, without having to worry about how the infrastructure works. AI Training enables data scientists to focus on their core business, without having to worry about orchestrating computing resources.