Data Processing OVHcloud

Quick, simple data analysis with Apache Spark

When you want to process your business data, you have a volume of data stored in one place, and a query, in the form of a few lines of code, in another. With Data Processing, OVHcloud deploys an Apache Spark cluster in just a few minutes to run that query.

Parallel processing

Apache Spark is one of the most popular frameworks on the market for massively parallel data processing. It coordinates multiple computing nodes and keeps intermediate data in RAM rather than on disk, and you choose the level of parallelism that matches your workload.
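As an illustration, here is a minimal PySpark sketch of how the parallelism level maps onto partitions and how cache() keeps data in RAM between operations. The file path, column name, and partition count are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parallelism-demo").getOrCreate()

# Read a dataset and spread it over 64 partitions: each partition can be
# processed by a separate executor core in parallel.
df = spark.read.parquet("data/events.parquet").repartition(64)

# cache() keeps the partitions in executor memory, so repeated
# computations avoid re-reading from storage.
df.cache()

print(df.count())                             # first action: loads and caches
print(df.filter(df.status == "ok").count())   # served from RAM

spark.stop()
```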

You write the code, we deploy it

Make your life easier. We manage the cluster deployment, so that you can focus on your business needs. Once you have written your Java or Python code, it is executed directly on your cluster.
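For example, a self-contained Python job of the kind you would hand over for execution might look like the following sketch. The input and output paths are assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def main():
    spark = SparkSession.builder.appName("word-count").getOrCreate()

    # Read raw text, split it into words, and count occurrences.
    lines = spark.read.text("input/corpus.txt")
    counts = (lines
              .select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))
              .groupBy("word")
              .count()
              .orderBy(F.desc("count")))

    # Write the result, then release the session.
    counts.write.mode("overwrite").csv("output/word-counts")
    spark.stop()

if __name__ == "__main__":
    main()
```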

Cost reduction

Rather than maintaining a permanent Apache Spark cluster for occasional computing jobs, you can use Data Processing to create a dedicated cluster in just a few minutes, whenever you need one. Once the analysis is complete, the cluster is released.

Uses for our Data Processing solution

Performance reporting

Whether you want to process millions of lines of tabular data, analyse thousands of tweets, or calculate KPIs, you can use Data Processing to aggregate massive volumes of data for strategic reports, used in data science or in other fields.
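As a sketch, a KPI aggregation over millions of rows of tabular data could look like this in PySpark. The sales dataset and its columns (region, amount, order_id) are assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kpi-report").getOrCreate()

sales = spark.read.parquet("data/sales.parquet")

# Aggregate millions of rows into a handful of per-region KPIs.
kpis = (sales.groupBy("region")
             .agg(F.sum("amount").alias("revenue"),
                  F.countDistinct("order_id").alias("orders"),
                  F.avg("amount").alias("avg_basket")))

kpis.show()
spark.stop()
```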

Customer knowledge

Want more insight into what your European customers use, or which interests are popular among your users? With the MLlib library integrated into Apache Spark, you can learn more about your customers’ journeys, habits, distribution, and much more.
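For instance, customer segmentation with MLlib’s KMeans might look like the following sketch; the dataset and feature columns are hypothetical:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("customer-segments").getOrCreate()

customers = spark.read.parquet("data/customers.parquet")

# Combine the numeric columns into the single vector column MLlib expects.
assembler = VectorAssembler(
    inputCols=["visits_per_month", "avg_session_minutes", "purchases"],
    outputCol="features")
features = assembler.transform(customers)

# Cluster customers into 5 segments based on their usage patterns.
model = KMeans(k=5, seed=42, featuresCol="features").fit(features)
segments = model.transform(features)  # adds a 'prediction' column
segments.groupBy("prediction").count().show()

spark.stop()
```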

Improved buyer experience

In the e-commerce sector, it is important to recommend products to your customers that may interest them. To do this, you need to analyse all the shopping baskets to detect complementary products, and suggest them when users visit the website.
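One common approach is frequent-itemset mining. The sketch below uses MLlib’s FPGrowth on a toy set of baskets; the product names and thresholds are made up for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.ml.fpm import FPGrowth

spark = SparkSession.builder.appName("basket-analysis").getOrCreate()

# One row per transaction, each with the list of items bought together.
baskets = spark.createDataFrame(
    [(0, ["laptop", "mouse"]),
     (1, ["laptop", "mouse", "dock"]),
     (2, ["phone", "case"]),
     (3, ["laptop", "dock"])],
    ["id", "items"])

# Mine frequent itemsets and association rules such as
# "customers who bought a laptop also bought a dock".
fp = FPGrowth(itemsCol="items", minSupport=0.3, minConfidence=0.5)
model = fp.fit(baskets)
model.freqItemsets.show()
model.associationRules.show()  # antecedent -> consequent with confidence

spark.stop()
```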

How does the Data Processing solution work?

OVHcloud carefully optimises its deployments, which makes it possible to create and destroy on-demand Apache Spark clusters for processing high volumes of data. Once a cluster is deployed, Spark reads through the data and loads it into memory, processes it in a single run, then delivers the result and frees up the resources.
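That lifecycle (read, process in memory, deliver the result, release the resources) can be sketched as follows; the paths and schema are assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("lifecycle-demo").getOrCreate()

# 1. Spark reads the source data and distributes it across the cluster.
logs = spark.read.json("data/logs.json")

# 2. The whole transformation runs in memory across the executors.
errors_per_day = (logs.filter(F.col("level") == "ERROR")
                      .groupBy("date")
                      .count())

# 3. The result is delivered (written out), then resources are freed.
errors_per_day.write.mode("overwrite").parquet("output/errors_per_day")
spark.stop()  # the cluster can now be released
```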

1. Startup

With your data stored in one place and your code stored in another, create a cluster that is sized to match your needs.

2. Submit your job

Apache Spark will distribute the load across the cluster that has just been deployed.

3. Retrieve the result

Once the processing is complete, you can easily retrieve the result of your analysis.
