Quick, simple data analysis with Apache Spark
When you want to process your business data, you typically have a volume of data stored in one place, and a query, written as a few lines of code, in another. With Data Processing, OVHcloud deploys an Apache Spark cluster in just a few minutes to respond to your query.
Apache Spark is one of the most popular frameworks on the market for massively parallel data processing. It coordinates work across multiple computing nodes and keeps intermediate data in RAM, so you can choose the level of parallelism your workload needs.
You write the code, we deploy it
Make your life easier. We manage the cluster deployment, so that you can focus on your business needs. Once you have written your Java or Python code, it is executed directly on your cluster.
Rather than maintaining a permanent Apache Spark cluster for occasional computing jobs, you can use Data Processing to create a dedicated cluster in just a few minutes, whenever you need one. Once the analysis is complete, the cluster is freed up.
Uses for our Data Processing solution
How does the Data Processing solution work?
OVHcloud has carefully optimised its deployment process, making it possible to create and destroy Apache Spark clusters on demand for processing high volumes of data. Once a cluster is deployed, Spark reads your data directly, loads it into memory, processes it in one pass, then delivers the result and frees up the resources.
Create your cluster
With your data stored in one place and your code stored in another, create a cluster that is sized to match your needs.
Submit your job
Apache Spark will distribute the load across the cluster that has just been deployed.
Retrieve the result
Once the processing is complete, you can easily retrieve the result of your analysis.