VoiceAssure: Integration Documentation

General Considerations

To get the best performance from the service, we suggest that you introduce appropriate restrictions on the user experience.

See the section “Constraints for Custom Data Captures” for more information.


1. Integration Alternative #1: iFrame Integration

Use this option if the following applies to your constraints:

The simplest form of integration is to use iFrames, which brings Privately’s recommended user interface into your web flow. It is also possible to whitelabel this interface and obtain a customized URL.

The integration can be achieved by adding the following to your website’s HTML:

<iframe class="responsive-iframe" src="https://<customURL>/<customRoute>?session_id=api_key&session_password=api_secret&analysis_id=optional" allow="camera;microphone"></iframe>

Here, api_key and api_secret constitute your API key pair. If you intend to perform a reverification, an additional analysis_id parameter must be supplied; see the section “Handling communication”.

| Variable | Type | Requirement |
|---|---|---|
| session_id | String - GUID | Required |
| session_password | String | Required |
| analysis_id | String - GUID | Optional; required if reverification is requested |
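
The query string can also be assembled programmatically so that every value is URL-encoded. The following is a minimal sketch: the helper name buildIframeSrc and the base URL are hypothetical, and only the three parameter names come from the table above.

```javascript
// Hypothetical helper: builds the iFrame src from the documented parameters.
function buildIframeSrc(baseUrl, apiKey, apiSecret, analysisId) {
  const url = new URL(baseUrl);
  url.searchParams.set("session_id", apiKey);
  url.searchParams.set("session_password", apiSecret);
  // analysis_id is only needed when a reverification is requested.
  if (analysisId) {
    url.searchParams.set("analysis_id", analysisId);
  }
  return url.toString();
}

// Placeholder values for illustration only; use your custom URL and key pair.
const src = buildIframeSrc("https://example.com/route", "my-api-key", "my api/secret");
console.log(src);
```

The resulting string can be assigned to the src attribute of the iframe element.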

1.1 Handling communication

This is an example of cross-document messaging. The window.postMessage() method safely enables cross-origin communication between Window objects; e.g., between a page and a pop-up that it spawned, or between a page and an iframe embedded within it.

VoiceAssure’s iFrame sends results and intermediary messages as an event, with the following schema:

var pass_data = {
   iframe_message: String, // Estimation outcome or intermediate request type
   score: String, // The confidence level of the age
   liveness_result: Float, // A number between 0-1
   embedding: String, // A modified base64 string 
   verification_completed: Boolean, // Indicates whether this is a response or intermediate message
   analysis_id: String // An id for your records, to be reused in reverification
};
         
console.log(pass_data);
parent.postMessage(JSON.stringify(pass_data), "*");
         

As such it should be handled by your parent window as follows:

window.addEventListener('message', function(e) {
   try
   {
      var myobj = JSON.parse(e.data)
      
      if(authenticity_failed(e, myobj)) // involves your api key, secret, and our identifiers 
      {
         // do nothing.
      }
      else if(  myobj["iframe_message"] == "retrieve_embedding" &&
               myobj["analysis_id"] == analysis_id_to_reverify)
      {
         // iFrame is ready to receive embedding, send it.
         var embeddingMessage = {
            iframe_message: "ingest_embedding",
            session_id: your_api_key,
            session_password: your_api_secret,
            analysis_id: analysis_id_to_reverify,
            verification_completed: false,
            embedding: getYourBase64Embedding()
         };

         // Consider replacing "*" with the iFrame's origin so that the
         // credentials are not delivered to a window from another origin.
         your_iFrameWindowObject.postMessage(JSON.stringify(embeddingMessage), "*");
      } 
      else if ( myobj["verification_completed"])
      {
         if (myobj["iframe_message"] == '25+') {
            // Handle an adult estimation (above 25)
         } else if (myobj["iframe_message"] == 'spoof') {
            // Handle a failed estimation
         } else {
            // Handle an underage estimation
         }
      }
      else 
      {
         console.log("Irrelevant message")
      }
   }
   catch(exp)
   {
      console.log("Irrelevant message")
   }
        
});
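
The branching in the listener can be isolated in a small function, which makes it easier to test. The sketch below is one possible arrangement, not the reference implementation: the function name, the outcome labels, and the expectedOrigin parameter are hypothetical. The origin comparison complements the authenticity check mentioned above and does not replace it.

```javascript
// Hypothetical helper: maps a message event to an outcome label.
function interpretIframeEvent(event, expectedOrigin) {
  // Ignore messages that were not sent from the iFrame's origin.
  if (event.origin !== expectedOrigin) return "ignored";

  let message;
  try {
    message = JSON.parse(event.data);
  } catch (err) {
    return "ignored"; // not JSON, so not one of the iFrame's messages
  }
  if (message === null || typeof message !== "object") return "ignored";

  // Intermediate messages carry verification_completed = false.
  if (!message.verification_completed) return "intermediate";
  if (message.iframe_message === "25+") return "adult";
  if (message.iframe_message === "spoof") return "spoof";
  return "underage";
}
```

Inside the listener, this could be called as interpretIframeEvent(e, "https://<customURL>"), using the origin of the URL you load in the iFrame.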
         

We recommend loading our iFrame only after this listener has been defined, so that no message is missed. An example implementation may look like this:


   window.addEventListener('message', event => { ... });
   var iframe = document.querySelector("#iframe");
   iframe.src = "/url-to-load-in-iframe";
         

2. Integration Alternative #2: API integration with Privately’s Data Capture

Use this option if the following applies to your constraints:

In this alternative, your system will receive a custom URL for a given user, who will need to open it in their browser to complete the age estimation process. The result will be communicated back to you using the Callback URL that you provide to our system.

2.1 Generate a new session

As a first step, perform an HTTP POST request to our endpoint.

2.1.1 Sample request body:

{
   "request_type": "generate_new_session",
   "estimation_type": "voice",
   "api_key": "",
   "api_secret": "",
   "callback_url": "https://httpbin.org/post"
}

api_key and api_secret will be provided to you in advance. You should supply your own callback_url in order to get a proper response.

estimation_type can currently take the following values: "voice" or "multimodal". It defaults to "voice".
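
The request can be sketched in JavaScript as follows. This is an illustration under assumptions: the function names are hypothetical, the endpoint is passed in because it is not stated in this section, and the Content-Type header is an assumption rather than a documented requirement.

```javascript
// Values listed in this section for estimation_type.
const ESTIMATION_TYPES = ["voice", "multimodal"];

// Hypothetical helper: assembles the documented request body.
function buildNewSessionRequest(apiKey, apiSecret, callbackUrl, estimationType = "voice") {
  if (!ESTIMATION_TYPES.includes(estimationType)) {
    throw new Error("Unsupported estimation_type: " + estimationType);
  }
  return {
    request_type: "generate_new_session",
    estimation_type: estimationType,
    api_key: apiKey,
    api_secret: apiSecret,
    callback_url: callbackUrl,
  };
}

// Hypothetical helper: POSTs the body to the endpoint you were given.
async function generateNewSession(endpoint, requestBody) {
  const response = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" }, // assumed header
    body: JSON.stringify(requestBody),
  });
  return { status: response.status, body: await response.text() };
}
```

Since the body contains api_secret, this call belongs on your server rather than in browser code.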

2.1.2 Sample response body:

"{\"transaction_id\": \"1723a501-b2f2-40f0-add8-5c17044584f7\", \"client_url\": \"...\"}"

Please keep transaction_id for verification purposes.
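
The sample body above is a JSON string that itself contains JSON, so it may need to be decoded twice. The sketch below assumes the body arrives exactly as in the sample; the helper name parseResponseBody is hypothetical. If your HTTP client has already removed one layer, the second decoding step is skipped.

```javascript
// Hypothetical helper: decodes a response body that may be JSON inside a JSON string.
function parseResponseBody(rawBody) {
  let value = JSON.parse(rawBody);
  // A string result means the payload was encoded twice, so decode again.
  if (typeof value === "string") {
    value = JSON.parse(value);
  }
  return value;
}

// The sample response from this section, kept verbatim with String.raw.
const sample = String.raw`"{\"transaction_id\": \"1723a501-b2f2-40f0-add8-5c17044584f7\", \"client_url\": \"...\"}"`;
const session = parseResponseBody(sample);
console.log(session.transaction_id); // prints 1723a501-b2f2-40f0-add8-5c17044584f7
```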


2.2 Receiving the estimation result

You will receive the estimation outcome in a format resembling the following:

{
   "age": "<age_range>",
   "age_confidence": 0.74,
   "genuineness": 0.8,
   "transaction_id": "1723a501-b2f2-40f0-add8-5c17044584f7"
}
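
On the receiving side, the payload can be matched against the transaction_id values you stored when generating sessions. The following is a sketch only: the in-memory Map, the function name, and the choice of HTTP 400 for unknown or malformed payloads are assumptions. Replying with HTTP 200 on success follows from the error table in section 2.4, where a non-200 reply from your callback URL is reported as remote_server_error.

```javascript
// Hypothetical store of sessions awaiting a result, keyed by transaction_id.
const pendingSessions = new Map();

// Hypothetical helper: decides how to answer a callback request body.
function handleCallbackBody(pending, rawBody) {
  let payload;
  try {
    payload = JSON.parse(rawBody);
  } catch (err) {
    return { status: 400, result: null }; // body is not JSON
  }
  const known =
    payload !== null &&
    typeof payload === "object" &&
    pending.has(payload.transaction_id);
  if (!known) {
    return { status: 400, result: null }; // not a transaction we started
  }
  pending.delete(payload.transaction_id);
  return { status: 200, result: payload };
}

// Usage with the sample payload from this section.
pendingSessions.set("1723a501-b2f2-40f0-add8-5c17044584f7", { createdAt: Date.now() });
const outcome = handleCallbackBody(
  pendingSessions,
  '{"age": "<age_range>", "age_confidence": 0.74, "genuineness": 0.8, "transaction_id": "1723a501-b2f2-40f0-add8-5c17044584f7"}'
);
console.log(outcome.status);
```

The returned status would be sent as the HTTP reply from whichever web framework serves your callback URL.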

2.3 Query prior age estimation outcomes

If you want to retrieve the results explicitly, you may also query them from our endpoint.

2.3.1 Sample request body:

{
   "request_type": "query_transaction_result",
   "api_key": "",
   "api_secret": "",
   "transaction_id": "1723a501-b2f2-40f0-add8-5c17044584f7"
}

2.3.2 Sample response body

{
   "age": "<age_range>",
   "age_confidence": 0.74,
   "genuineness": 0.8,
   "transaction_id": "1723a501-b2f2-40f0-add8-5c17044584f7"
}

Note that if there were any issues in processing this transaction, the response may also contain additional error fields; see the examples below.


2.4 Error Handling

If an issue occurs in any part of the flow, an HTTP 400 response is generated with one of the following errors:

| Error Object | Interpretation |
|---|---|
| {"request_not_complete": <transactionID>} | The system has not completed processing the result. The result might become available later. Alternatively, the user may have prematurely terminated the age estimation and/or failed to do a genuine test. |
| {"missing_request": <transactionID>} | A request with transactionID was never received. |
| {"missing_parameter": "transaction_id"} | transaction_id was not supplied in an intermediate request. |
| {"request_not_understood": <requestType>} | Requests of type requestType are not yet usable in this service. |
| {"remote_server_error": "..."} | We tried to perform a POST request to your callback URL, but we received a response that is not HTTP 200. |
| {"technical_error": "..."} | Our servers have experienced an internal error; please contact us immediately. |
| {"missing_parameter": "callback_url"} | Our system did not receive a callback_url, so we could not send the result back to you. |
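
Client code can map these error objects to a next action. The sketch below assumes the error body has already been parsed into an object; the function name and the retryLater flag are hypothetical. Only request_not_complete is treated as worth retrying, because the table says the result may become available later.

```javascript
// Error keys listed in the table above.
const ERROR_KEYS = [
  "request_not_complete",
  "missing_request",
  "missing_parameter",
  "request_not_understood",
  "remote_server_error",
  "technical_error",
];

// Hypothetical helper: extracts the error key and suggests whether to retry.
function interpretError(errorBody) {
  const key = ERROR_KEYS.find((k) =>
    Object.prototype.hasOwnProperty.call(errorBody, k)
  );
  if (!key) {
    return { key: null, detail: null, retryLater: false };
  }
  return {
    key,
    detail: errorBody[key],
    retryLater: key === "request_not_complete",
  };
}

console.log(interpretError({ missing_parameter: "callback_url" }));
```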

3. Integration Alternative #3: API integration with Your Custom Data Capture

Use this option if the following applies to your constraints:

3.1 Usage Caveats

Endpoint: https://fwrxnwsu41.execute-api.eu-west-1.amazonaws.com/default/d-privately-audio-services


3.2 Request #1: Generate a random sentence

3.2.1 Sample request body:

{
   "request_type": "generate_phrase",
   "client_id": "",
   "client_password": "",
   "lang": "fr"
}

3.2.2 Sample response:

"{\"id\": \"1723a501-b2f2-40f0-add8-5c17044584f7\", \"phrase\": \"Selon que vous serez puissant ou miserable, Les jugements de cour vous rendront blanc ou noir\"}"

3.3 Request #2: Age estimation with Spoof check

3.3.1 Data Preparation

This request requires a base64-encoded voice clip, which should be in WAV format.

Sample recorder snippet in JavaScript/Vue:

 
    // Keep a reference to the component for use inside the callbacks.
    var ref = this;
    if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
      navigator.mediaDevices
        // this.constraints holds the media constraints defined on the component.
        .getUserMedia(this.constraints)
        .then(function (stream) {
          ref.audioRecorder = new MediaRecorder(stream);
          ref.audioRecorder.start();
          console.log(ref.audioRecorder.state);
          console.log("recorder started");

          // Collect each recorded chunk as it becomes available.
          ref.audioRecorder.ondataavailable = function (e) {
            console.log("data pushed");
            ref.audioChunks.push(e.data);
          };
        })
        .catch(function (err) {
          console.log("The following getUserMedia error occurred: " + err);
        });
    } else {
      console.log("getUserMedia not supported on your browser!");
    }
                

Once the recording stops, the collected data can be converted to a base64-encoded string. Sample snippet in JavaScript/Vue:

  
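    // Runs inside an async component method (required for the await below).
    if (this.audioRecorder.state != "inactive") {
      this.audioRecorder.stop();
    }
    this.isRecording = false;
    this.isProcessing = true;

    console.log("recording stopped");

    // Give the recorder time to deliver its last dataavailable event.
    await new Promise((resolve) => setTimeout(resolve, 1000));

    var superBuffer = new Blob(this.audioChunks, { type: "video/webm" });

    var reader = new window.FileReader();
    // Attach the handler before starting the read.
    reader.onloadend = function () {
      // reader.result is a data URL; keep only the base64 payload after the comma.
      var base64 = reader.result.split(",")[1];
      // base64 is now ready to be sent as voice_data.
    };
    reader.readAsDataURL(superBuffer);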

3.3.2 Sample request body

{ 
   "voice_data": "<base64string>",
   "requested_phrase": "Aujourd'hui je n'ai pas pu voir mes amis. Je suis triste. Combien d'heures est-ce qu'il me faudra pour l'oublier?",
   "transaction_id": "91ac660e-4426-49b7-9feb-afd5ff14267e",
   "client_id": "<your_id>",
   "client_password": "<your_secret>",
   "request_type": "voice_verification"
}

Note that you need to send the phrase generated in the previous request; place it in requested_phrase.
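
Assembling the body in code helps to catch a missing field before the request is sent. This is a sketch under assumptions: the function name and the camelCase argument names are hypothetical, and only the JSON field names come from the sample above.

```javascript
// Hypothetical helper: assembles the documented voice_verification body.
function buildVoiceVerificationRequest(fields) {
  const body = {
    voice_data: fields.voiceData, // base64 string of the recorded clip
    requested_phrase: fields.requestedPhrase, // phrase from Request #1
    transaction_id: fields.transactionId,
    client_id: fields.clientId,
    client_password: fields.clientPassword,
    request_type: "voice_verification",
  };
  // Fail early if any field is empty or undefined.
  for (const [name, value] of Object.entries(body)) {
    if (typeof value !== "string" || value.length === 0) {
      throw new Error("Missing field: " + name);
    }
  }
  return body;
}

// Placeholder values for illustration only.
const verificationBody = buildVoiceVerificationRequest({
  voiceData: "AAAA",
  requestedPhrase: "Selon que vous serez puissant ou miserable",
  transactionId: "91ac660e-4426-49b7-9feb-afd5ff14267e",
  clientId: "<your_id>",
  clientPassword: "<your_secret>",
});
console.log(JSON.stringify(verificationBody, null, 3));
```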

3.3.3 Sample response

There are two possible responses:

  1. Direct Response returns the following body with HTTP code 200:
"{\"text\": \"ALORS UN DERNIER TEST AVANT D AVOIR UN CAF\\u00c9 AVEC DES BACS JE SUIS TR\\u00c8S CONTENT JE VEUX FAIRE\", \"emotion\": \"Emotion detection not enabled\", \"hate\": 0, \"toxicity\": 0, \"profanity\": 0, \"age\": \"adulthood\", \"ageConfidence\": 1.0, \"gender\": \"Detection not enabled\", \"genuineControlScore\": 0.11111111111111116, \"transaction_id\": \"91ac660e-4426-49b7-9feb-afd5ff14267e\"}"


  2. Queued Response returns the following body with HTTP code 202:
{"transaction_id": "91ac660e-4426-49b7-9feb-afd5ff14267e"}

Note that the transaction_id is the same as the one in the request body. You may use this transaction_id in the polling query below.


3.4 Request #3: Poll estimation results

If you receive an HTTP 202 from Request #2, use the same URL with the following request body to poll the results. The transaction_id is the one returned in the previous response from the server.

3.4.1 Sample Request Body

{
   "transaction_id": "91ac660e-4426-49b7-9feb-afd5ff14267e",
   "request_type": "poll_transaction",
   "client_id": "<your_id>",
   "client_password": "<your_secret>"
}

3.4.2 Sample Response

If the estimation is not yet complete, you will receive HTTP 202 with a response body identical to your request body.

If the estimation is complete, you will receive HTTP 200 with a response body similar to the Direct Response of Request #2.
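
The polling behaviour described above can be sketched as a loop that repeats while the server answers HTTP 202. The names, the two-second interval, and the attempt limit are assumptions; sendRequest is a hypothetical function you supply that POSTs the body to the endpoint and resolves to an object with status and body.

```javascript
// Hypothetical helper: polls until HTTP 200 is received or attempts run out.
async function pollTransaction(sendRequest, credentials, transactionId, options = {}) {
  const { intervalMs = 2000, maxAttempts = 10 } = options; // assumed defaults
  const body = {
    transaction_id: transactionId,
    request_type: "poll_transaction",
    client_id: credentials.clientId,
    client_password: credentials.clientPassword,
  };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await sendRequest(body);
    if (response.status === 200) {
      return response.body; // estimation complete
    }
    if (response.status !== 202) {
      throw new Error("Unexpected HTTP status " + response.status);
    }
    // HTTP 202: not complete yet, wait before the next attempt.
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error("Estimation not complete after " + maxAttempts + " attempts");
}
```

Injecting sendRequest keeps the loop independent of the HTTP client and lets it be exercised with a stub.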


4. Constraints for Custom Data Captures

Applies to: Integrations with Custom Data Captures

The input to VoiceAssure must respect a number of constraints to produce the best estimation outcomes. If you decide to implement your own data capture, please treat the following constraints as part of your development guidelines:

| Reliability Metric | Constraints | Interpretation |
|---|---|---|
| Audio format | Use default browser settings: WEBM / MP4 / WAV, 2-channel, 44100 Hz | If you are using server-based APIs, the format conversion is done automatically for you. For on-browser builds, we recommend keeping the default settings of the client browser. |
| Speaker count | = 1 | We currently do not support age estimation of multiple speakers. |
| Audio Clip Length (seconds) | > 5 | Shorter audio clips do not guarantee reliable outcomes. |
| Root Mean Square (Loudness and Noise) | Mean(RMSE) < 0.09; Max(RMSE) > 0.17 | Loud music, whispering, being too distant from the microphone, a second person speaking, etc. can degrade the audio quality as measured by these metrics. |
| Zero-Crossing Rate (Noise) | Min(ZCRate) < 0.02 | Outdoor noise in particular can increase this metric, producing unreliable results. |
| Voice Print Match Rate | 0 - 0.5: Reject match; 0.5 - 0.7: Unreliable match; 0.7 - 1.0: Reliable match | Recommendations for reverification purposes. |
| Speech-to-text match rate | 0 - 0.5: Reject match; 0.5 - 0.6: Non-native speaker match; 0.6 - 1.0: Accept | Recommendations for liveness check purposes. |
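
A custom capture could run a rough pre-check of the length, loudness, and noise constraints before uploading a clip. The sketch below is illustrative only. This document does not state how VoiceAssure frames the audio or computes these metrics, so the frame length of 2048 samples, the non-overlapping frames, and the zero-crossing definition are all assumptions, and the results may differ from the service's own values. Samples are assumed to be mono floats in the range [-1, 1].

```javascript
// Root mean square of one frame of samples.
function frameRms(frame) {
  let sumSquares = 0;
  for (const s of frame) sumSquares += s * s;
  return Math.sqrt(sumSquares / frame.length);
}

// Fraction of adjacent sample pairs whose signs differ (assumed definition).
function frameZeroCrossingRate(frame) {
  let crossings = 0;
  for (let i = 1; i < frame.length; i++) {
    if ((frame[i - 1] >= 0) !== (frame[i] >= 0)) crossings++;
  }
  return crossings / (frame.length - 1);
}

// Hypothetical pre-check against the thresholds in the table above.
function checkClip(samples, sampleRate, frameLength = 2048) {
  const rms = [];
  const zcr = [];
  // Split the clip into non-overlapping frames; a trailing partial frame is dropped.
  for (let start = 0; start + frameLength <= samples.length; start += frameLength) {
    const frame = samples.slice(start, start + frameLength);
    rms.push(frameRms(frame));
    zcr.push(frameZeroCrossingRate(frame));
  }
  const longEnough = samples.length / sampleRate > 5;
  if (rms.length === 0) {
    return { longEnough, meanRmsOk: false, maxRmsOk: false, minZcrOk: false };
  }
  const meanRms = rms.reduce((a, b) => a + b, 0) / rms.length;
  const maxRms = rms.reduce((a, b) => Math.max(a, b), -Infinity);
  const minZcr = zcr.reduce((a, b) => Math.min(a, b), Infinity);
  return {
    longEnough,
    meanRmsOk: meanRms < 0.09,
    maxRmsOk: maxRms > 0.17,
    minZcrOk: minZcr < 0.02,
  };
}
```

In a browser, the samples could come from decoding the recording with the Web Audio API; a clip that fails one of the checks could prompt the user to record again.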