Benchmarking calling Oracle Machine Learning using REST
Over the past year I’ve been presenting, blogging and sharing my experiences of using REST to expose Oracle Machine Learning models to developers in other languages, for example Python.
One of the questions I’ve been asked is, Does it scale?
Although I’ve used it in several projects to great success, there are no figures I can report publicly on how many REST API calls can be serviced 😦
But this can be easily done, and the results below are based on using an Oracle Autonomous Data Warehouse (ADW) on the Oracle Always Free offering.
The machine learning model is built on a Wine reviews data set, using Oracle Machine Learning Notebook as my tool to write some SQL and PL/SQL to build out a model to predict Good or Bad wines, based on the price and other characteristics of the wine. A REST API was built using this model to allow a developer to pass in wine descriptors and get back two values: whether it would be a Good or Bad wine, and the probability of this prediction.
No data is stored in the database. I only use the machine learning model to make the prediction.
I built out the REST API using APEX, and here is a screenshot of the GET API setup.
Here is an example of some Python code to call the machine learning model to make a prediction.
import json
import requests

# Input values describing the wine to score
country = 'Portugal'
province = 'Douro'
variety = 'Portuguese Red'
price = '30'

# The four values are passed as path segments on the GET request
resp = requests.get('https://jggnlb6iptk8gum-adw2.adb.us-ashburn-1.oraclecloudapps.com/ords/oml_user/wine/wine_pred/'+country+'/'+province+'/'+variety+'/'+price)
json_data = resp.json()
print(json.dumps(json_data, indent=2))
—–
{
  "pred_wine": "LT_90_POINTS",
  "prob_wine": 0.6844716987704507
}
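Building the URL by joining strings works for the values in the example, but a value containing a slash or other reserved character would break the path. Here is a minimal sketch of one way to handle that, and to time a single call. The helper names `build_url` and `timed` are my own for illustration and are not part of the API or of the original test code.

```python
import time
from urllib.parse import quote

# Base URL copied from the example above.
BASE_URL = ("https://jggnlb6iptk8gum-adw2.adb.us-ashburn-1.oraclecloudapps.com"
            "/ords/oml_user/wine/wine_pred")


def build_url(country, province, variety, price, base_url=BASE_URL):
    """Append the four inputs as path segments, percent-encoding each one."""
    parts = [quote(str(p), safe="") for p in (country, province, variety, price)]
    return base_url.rstrip("/") + "/" + "/".join(parts)


def timed(func, *args, **kwargs):
    """Run func and return (its result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


url = build_url("Portugal", "Douro", "Portuguese Red", 30)
print(url)
```

With `requests` installed, the call from the example becomes `requests.get(build_url(country, province, variety, price), timeout=10)`, and wrapping that call in `timed` gives the latency of one request.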
But does this scale? As in, how many concurrent users and REST API calls can it handle at the same time?
To test this I created multiple parallel processes in Python, each calling a Python function that calls the API, while ensuring a range of values is used for the input parameters. Some additional information about my tests:
- Each function call included two REST API calls
- Test the effect of creating X processes at the same time
- Test the effect of creating X processes in batches of Y agents
- Then repeat the above with the function making one REST API call instead of two, to compare timings
- Test a range of parallel processes from 10 to 1,000 (generating up to 2,000 REST API calls at a time)
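The test setup described above can be sketched roughly as follows. This is not the code I ran. It uses threads from `concurrent.futures` rather than separate processes, the function name `run_in_batches` is made up for this sketch, and `fake_score` is a stand-in that sleeps instead of calling the REST API so the sketch runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_in_batches(call, inputs, batch_size):
    """Run call(item) for every item in the list, batch_size at a time.

    Returns (per-call durations in seconds, total elapsed seconds).
    """
    def timed_call(item):
        start = time.perf_counter()
        call(item)
        return time.perf_counter() - start

    durations = []
    total_start = time.perf_counter()
    for i in range(0, len(inputs), batch_size):
        batch = inputs[i:i + batch_size]
        # One worker per item, so the whole batch is in flight at once.
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            durations.extend(pool.map(timed_call, batch))
    return durations, time.perf_counter() - total_start


def fake_score(wine):
    """Hypothetical stand-in for the real REST call."""
    time.sleep(0.01)


# Vary the price so each call uses different input parameters.
wines = [("Portugal", "Douro", "Portuguese Red", price) for price in range(10, 30)]
durations, total = run_in_batches(fake_score, wines, batch_size=5)
print(len(durations), round(total, 2))
```

Setting `batch_size` equal to the number of inputs gives the "all at the same time" case, and replacing `fake_score` with a function that makes one or two real API calls gives the two variants being compared.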
Some of the results. The table shows the time(*) in seconds to complete the number of processes grouped into batches (agents). My laptop was the limiting factor in these tests. It wasn’t able to run the tests when the number of parallel processes went above 500. That is why I broke them into batches consisting of X agents.
* this is the total time to run all the Python code, including the time taken to create each process.
Some observations:
- Time taken to complete each function/process was between 0.45 seconds and 1.65 seconds, for two API calls.
- With only one API call, time taken to complete each function/process was between 0.32 seconds and 1.21 seconds
- Average time for each function/process was 0.64 seconds with one API call, and 0.86 seconds with two API calls
- Table above illustrates the overhead associated with setting up, calling, and managing these processes
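For anyone repeating the test, the figures in these observations (minimum, maximum, and average time per process, plus an overall calls-per-minute rate) can be computed from the recorded timings with something like the sketch below. The function name `summarize` is my own, and the sample numbers are made up purely to show the shape of the output. They are not measured results.

```python
from statistics import mean


def summarize(durations, total_elapsed, calls_per_process=1):
    """Summarise per-process timings and overall throughput."""
    calls = len(durations) * calls_per_process
    return {
        "min": min(durations),
        "max": max(durations),
        "mean": mean(durations),
        # Total elapsed time includes process set-up, as in the table note.
        "calls_per_minute": calls / total_elapsed * 60,
    }


# Made-up timings for illustration only; not results from my tests.
sample = [0.5, 0.75, 1.0, 1.25]
stats = summarize(sample, total_elapsed=2.0, calls_per_process=2)
print(stats)
```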
As you can see, even with the limitations of my laptop, an Oracle Database with in-database machine learning and REST can be used to create a micro-service type machine learning scoring engine. Based on these numbers, this machine learning micro-service would be able to handle a large number of machine learning scoring requests in real time, and these numbers would be well within the maximum number of such calls in most applications. I’m sure I could run more parallel processes if I deployed on a different machine to my laptop, and maybe used a number of different machines at the same time.
How many applications within your enterprise need to process more than 6,000 real-time machine learning scoring requests per minute? This shows us the Oracle Always Free offering is capable and suitable for most applications.
Now, if you are processing more than that number per minute, then perhaps you need to move onto the paid options.
What next? I’ll spin up two VMs on Oracle Always Free, install Python, copy the code into these VMs and have them run in parallel 🙂