24,000 product categories


up to 2.5TB database containing 250,000 lines of training data and machine learning models

Executive summary

Created in October 2020, Customs Bridge is a “deep tech” company: a startup whose core technology relies on artificial intelligence algorithms to create an automatic product classification engine. The company is targeting European importers with this service, as each product imported into the European Union must be precisely categorised according to a nomenclature of over 24,000 entries. The difficulty for importers is choosing the right category from the description provided by the manufacturer, which can be very short or even incomplete. All products imported into the European Union must be declared with a code, which is used to calculate customs duties and also determines the regulations that apply to the product. Any misclassification may result in penalties, withdrawal of the product or tax adjustments.

“We were able to join the OVHcloud Startup Program, which meant that we could start using their AI Cloud services very quickly. With the OVHcloud AI Training service, we were able to train our machine learning models in a way that we could not have done on our own on-premise machines.”
Dr. Hamza Saouli, Innovation Director at Customs Bridge

This classification can be complicated because the code must be identical across all EU countries, but internationally the categories may vary from one country to another, depending on whether the manufacturer exports its products to Europe, the United States or China. Subtleties in the description of a product can also make it switch from one category to another - for example, a watch bracelet could be classified differently to a watch chain.

The challenge

The mission of Customs Bridge is to create the most reliable product classification engine possible in order to assign the correct customs code to a product whose description is not fully formalised. This may be a relatively precise description in the case of electronic products, for example, or a few keywords for a food product - with very different volumes of data depending on whether it is a product frequently imported into the European Union or not.

“To conduct the training process for our AI models, we started by using open data, including the European Binding Tariff Information (EBTI) database,” says Hamza Saouli, Innovation Director and AI expert at Customs Bridge. “This database has 250,000 lines but covers only 10% to 15% of the complete nomenclature. We were able to train several learning models on this data source, with positive initial results both for the full code and for the chapter. This training has been successful on electronics imported from China, which are generally well described, but for less frequently imported products we have not had any significant results due to a lack of large-scale, good quality data.” The models often do not have enough data on rarely imported products, since European data is not as accessible as, for example, US customs data.
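
The difference between a result "for a code" and a result "for a chapter" can be illustrated with a short sketch. It assumes that customs codes are digit strings whose first two digits identify the chapter, as in the Harmonized System on which the EU nomenclature is based. The function names and the sample codes are hypothetical: the codes are made-up strings and do not assert any real product classification.

```python
def chapter(code: str) -> str:
    """Return the chapter, taken here as the first two digits of a customs code."""
    return code[:2]


def accuracy(predicted, expected, key=lambda c: c):
    """Share of predictions whose key (full code by default) matches the expected one."""
    if not expected:
        raise ValueError("no examples to score")
    hits = sum(key(p) == key(e) for p, e in zip(predicted, expected))
    return hits / len(expected)


# Made-up codes, for illustration only.
expected = ["8517120000", "9113200000", "0406901300", "8528720000"]
predicted = ["8517620000", "9113200000", "0406909900", "7117190000"]

code_accuracy = accuracy(predicted, expected)                    # exact match on the full code
chapter_accuracy = accuracy(predicted, expected, key=chapter)    # match on the chapter only
```

In this toy data set, a model can be right about the chapter far more often than about the full code, which is one way to read early results that look good at chapter level only.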

In the initial phases of the project, the Customs Bridge Innovation Director mainly used AI algorithms best known for their efficiency and speed, such as SVM and decision trees, but as the training data set grew, the latter were no longer a good solution. This prompted the Customs Bridge AI team to adopt more advanced models such as neural networks (via the Keras deep learning API) and Transformers, algorithms that are now state-of-the-art in semantic classification. Saouli then improved the classification performance of his models with the help of scientific papers from AI researchers. However, the startup quickly ran into a major problem: the processing capacity available for training its AI models. While its 3 GPU-equipped PCs were enough to train the simplest models, this infrastructure quickly reached its limits, leading the Customs Bridge team to opt for a cloud solution, which is ideal for meeting its occasional need for high computing power and RAM. This is what led Customs Bridge to OVHcloud’s AI & Machine Learning solutions.

“Initially, we thought we could train our models on our own GPU machines. This approach quickly came to a standstill when we wanted to scale up. We were hindered by a lack of RAM and available storage space, which greatly limited the training process for our models. For us, the cloud was the best possible solution, both technically and financially.”
Dr. Hamza Saouli, Innovation Director at Customs Bridge

The solution

Customs Bridge implemented AI Training, OVHcloud’s model training solution and one of the functional building blocks in its AI offer. Meanwhile, the startup uses OVHcloud instances to deploy models into production and to support its data pipeline. “We’ve set up a pipeline that starts with a customer’s request, submits the request to the model, then processes the response received from the model,” explains Hamza Saouli. “This must be prepared before it is displayed to the customer. First, we have to process the text descriptions of the products being imported, knowing that they are short (only 3 to 5 words) and do not describe the product in enough detail. These descriptions are then uploaded to the cloud and submitted to the deployed model, which provides a set of customs codes for the importer.”
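
The request, model and response stages described here can be sketched in a few lines. This is only an illustration under stated assumptions, not the Customs Bridge implementation: `preprocess`, `postprocess`, `classify` and `stub_model` are hypothetical names, the cleaning rules are guesses at what processing a 3-to-5-word description might involve, and the stub stands in for the deployed model with placeholder codes.

```python
import re
from typing import Callable, List, Tuple

Prediction = Tuple[str, float]  # (customs code, score)


def preprocess(description: str) -> str:
    """Normalise a short description: lowercase, drop punctuation, collapse spaces."""
    text = re.sub(r"[^\w\s]", " ", description.lower())
    return " ".join(text.split())


def postprocess(predictions: List[Prediction], top_k: int = 3) -> List[str]:
    """Keep the top_k highest-scoring codes, best first, ready for display."""
    ranked = sorted(predictions, key=lambda p: p[1], reverse=True)
    return [code for code, _ in ranked[:top_k]]


def classify(description: str,
             model: Callable[[str], List[Prediction]],
             top_k: int = 3) -> List[str]:
    """Customer request -> cleaned text -> model -> prepared response."""
    return postprocess(model(preprocess(description)), top_k)


def stub_model(text: str) -> List[Prediction]:
    """Stand-in for the deployed model; returns fixed placeholder codes."""
    return [("CODE-A", 0.2), ("CODE-B", 0.7), ("CODE-C", 0.1)]


result = classify("Watch  bracelet, LEATHER!", stub_model, top_k=2)
```

Passing the model in as a callable keeps the sketch independent of where the model runs, so the stub could be swapped for a call to a remote endpoint without changing the other stages.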

In the near future, this pipeline will become more complex. The team is currently working on a text enhancer, an algorithm that starts from an existing data set and enriches it to optimise model training. The algorithm will expand the initial database from 200,000-300,000 lines to 3 to 4 million lines using automatic text generation techniques. Again, the cloud is indispensable for this task, as training models on such large volumes of data is simply no longer possible on a standard PC.
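
To give a feel for how a data set can grow by an order of magnitude, here is a deliberately simple sketch of data enrichment. It swaps in a much cruder technique than the automatic text generation the team describes: each description is expanded by substituting words from a small synonym table. The table, the function names and the placeholder code are all hypothetical.

```python
from itertools import product

# Hypothetical synonym table; a real enhancer would rely on text generation models.
SYNONYMS = {
    "bracelet": ["strap", "band"],
    "leather": ["hide"],
}


def variants(description: str, synonyms=SYNONYMS):
    """Return the description plus every variant obtained by swapping in synonyms."""
    options = [[word] + synonyms.get(word, []) for word in description.split()]
    return [" ".join(choice) for choice in product(*options)]


def enrich(rows, synonyms=SYNONYMS):
    """Expand (description, code) rows; every variant keeps the original code."""
    return [(text, code) for desc, code in rows for text in variants(desc, synonyms)]


rows = [("leather watch bracelet", "CODE-A")]
enriched = enrich(rows)  # one labelled line becomes six
```

Any such substitution has to preserve the category: as noted earlier, a watch bracelet and a watch chain may be classified differently, so a careless synonym would attach the wrong code to the generated line.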

“Moving AI model training from an on-premise approach to OVHcloud AI Training gave us a flexibility and power that we couldn’t have achieved internally. The solution is very simple to use: we can set the number of GPUs and the amount of RAM we need in advance. This is very useful when you know how many resources you will need ahead of time.”
Dr. Hamza Saouli, Innovation Director at Customs Bridge

As Dr. Saouli explains, he had no problems switching from on-premise training to the OVHcloud cloud. OVHcloud provides ready-to-use containers for the main AI frameworks, so all you need to do is launch the corresponding job to run them on a GPU in the cloud. What’s more, since June 2021 it has been possible to do the same for containers running on a CPU. With this option, you can get computing resources at an even lower price for training that doesn’t require the power of a dedicated GPU. This upgrade to OVHcloud’s AI solution came about thanks to a request from Customs Bridge.

Hamza Saouli relied on around 2.5TB of data to train his first Transformers models. The data volumes are lower for the machine learning models, in the range of 30GB to 40GB of training data. “With the NVIDIA V100 GPUs provided by OVHcloud, training Transformers on 250,000 lines only takes around 30 minutes of computing time. This is both very fast and really low-cost, since one hour of computing is billed at around €1.75. That’s exactly why we do not plan to buy machines to perform these calculations in-house,” he adds.
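
The figures quoted here translate into a quick estimate of what a single training run costs. The sketch below assumes that usage is billed pro rata to the time consumed and that the quoted rate covers the whole run; neither assumption is stated in the source, and `training_cost` is a hypothetical helper.

```python
def training_cost(minutes: float, hourly_rate_eur: float) -> float:
    """Cost in euros of a run of the given length, assuming pro-rata billing."""
    return minutes / 60 * hourly_rate_eur


# Figures from the quote: about 30 minutes at about EUR 1.75 per hour.
run_cost = training_cost(30, 1.75)
```

Under these assumptions, one 30-minute run comes to a little under one euro.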

Alongside this work on AI models, Hamza Saouli is now working on a chatbot that will interact with customers to get information about the product they are looking for. This has already led to a model built with Rasa, an open-source platform dedicated to chatbots, running on OVHcloud CPU instances. The initial results were considered very encouraging, and Saouli hopes that OVHcloud will quickly make a Rasa container available in its AI infrastructure, to further simplify its implementation.

CustomsBridge diagram


The result

“After several months of using OVHcloud AI Training and training multiple types of AI models, I’ve never experienced any installation or configuration issues,” says Saouli. “OVHcloud gives us the ability to choose the Docker image that we want to launch our training on. It’s an extremely simple and effective approach. I used the available images to run Transformers models and TensorFlow models for a chatbot, and it works perfectly.”

Beyond model training, which has traditionally been very costly in terms of memory space and computing power, Customs Bridge is now considering how its model will scale in production once the startup takes on its first customers. “For the moment, our most powerful model is a classic model that does not require a GPU to be deployed to production,” explains Hamza Saouli. “When we start to use larger data sets, we will increase data volumes by a factor of 100 to 1000 in the near future. The factor is not important - it will all depend on the relevance of the model. That’s the whole point of a cloud approach: OVHcloud will allow us to grow data volumes without being limited by our infrastructure. We don’t have to rein in our models; we can just experiment until we find the volume needed to achieve the precision we want. The cloud model gives us a lot of freedom.”

If required, Customs Bridge will use GPU instances in production, and the startup will then be able to run its AI models on the OVHcloud ML Serving service. “Similarly, OVHcloud’s Data Preparation service will potentially be of interest to us when we have larger volumes of data to process upstream from our models. The dynamic allocation of resources allows us to only pay for what we actually use, which is an asset for Customs Bridge,” Saouli concludes.