VoiceAssure: Integration Documentation

General Considerations

For the best performance from the service, we suggest introducing the following restrictions on the user experience:

Audio Clip Length: We obtain the highest-quality estimations when the clip is around 7-8 seconds long.
Distance from Microphone: The ideal distance to the microphone appears to be “selfie” distance.
Objects blocking the microphone or muffling effects: To be avoided.
Multiple people speaking: To be avoided.
Loud background music: To be avoided.

See the section “Constraints for Custom Data Captures” for more information.

1. Integration Alternative #1: iFrame Integration

Use this option if the following applies to your constraints:

You prefer a Browser-Based solution over a mobile SDK.
You can embed our iFrame in your webpage or a web-activity in your app.
You do not want to expose additional endpoints to us.
You might use our voiceprint / embedding for reverification.

The simplest form of integration is to use iFrames, which brings Privately’s recommended user interface into your web flow. It is also possible to whitelabel this interface and obtain a customized URL.

The integration can be achieved by adding the following to your website’s HTML:

<iframe class="responsive-iframe" src="https://<customURL>/<customRoute>?session_id=api_key&session_password=api_secret&analysis_id=optional" allow="camera;microphone"></iframe>

Here, api_key and api_secret constitute your API key pair. If you intend to perform a reverification, an additional analysis_id parameter must be supplied; see the section “Handling communication”.

Variable	Type	Requirement
`session_id`	String - GUID	Required
`session_password`	String	Required
`analysis_id`	String - GUID	Optional; required if reverification is requested
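
As a minimal sketch, the iFrame URL could be assembled programmatically so that the query parameters are always URL-encoded. The function name `buildIframeUrl` and the example base URL are illustrative assumptions; only the three parameter names come from the table above.

```javascript
// Hypothetical helper: builds the iFrame src from the parameters in the table above.
function buildIframeUrl(baseUrl, apiKey, apiSecret, analysisId) {
  const url = new URL(baseUrl);
  url.searchParams.set("session_id", apiKey);
  url.searchParams.set("session_password", apiSecret);
  // analysis_id is only needed when a reverification is requested.
  if (analysisId) {
    url.searchParams.set("analysis_id", analysisId);
  }
  return url.toString();
}

// Example with placeholder values (not a real deployment URL).
const src = buildIframeUrl(
  "https://custom.example.com/customRoute",
  "my-api-key",
  "my secret"
);
console.log(src);
```

The resulting string can then be assigned to the `src` attribute of the iframe element.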

1.1 Handling communication

The iFrame communicates with your page through cross-document messaging. The window.postMessage() method safely enables cross-origin communication between Window objects; e.g., between a page and a pop-up that it spawned, or between a page and an iframe embedded within it.

VoiceAssure’s iFrame sends results and intermediary messages as an event, with the following schema:

var pass_data = {
   iframe_message: String, // Estimation outcome or intermediate request type
   score: String, // The confidence level of the age
   liveness_result: Float, // A number between 0-1
   embedding: String, // A modified base64 string 
   verification_completed: Boolean, // Indicates whether this is a response or intermediate message
   analysis_id: String // An id for your records, to be reused in reverification
};
         
console.log(pass_data);
parent.postMessage(JSON.stringify(pass_data), "*");

As such it should be handled by your parent window as follows:

window.addEventListener('message', function(e) {
   try
   {
      var myobj = JSON.parse(e.data)
      
      if(authenticity_failed(e, myobj)) // involves your api key, secret, and our identifiers 
      {
         // do nothing.
      }
      else if(  myobj["iframe_message"] == "retrieve_embedding" &&
               myobj["analysis_id"] == analysis_id_to_reverify)
      {
         // iFrame is ready to receive embedding, send it.
         var embeddingMessage = {
            iframe_message: "ingest_embedding",
            session_id: your_api_key,
            session_password: your_api_secret,
            analysis_id: analysis_id_to_reverify,
            verification_completed: false,
            embedding: getYourBase64Embedding()
         };

         your_iFrameWindowObject.postMessage(JSON.stringify(embeddingMessage), "*");
      } 
      else if ( myobj["verification_completed"])
      {
         if (myobj["iframe_message"] == '25+') {
            // Handle an adult estimation (above 25)
         } else if (myobj["iframe_message"] == 'spoof') {
            // Handle a failed estimation
         } else {
            // Handle an underage estimation
         }
      }
      else 
      {
         console.log("Irrelevant message")
      }
   }
   catch(exp)
   {
      console.log("Irrelevant message")
   }
        
});
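
The branching on message content can also be isolated into a small pure function, which is easier to unit-test than an event listener. This is only a sketch: `interpretIframeMessage` and its return labels are hypothetical names, and it covers only the fields shown in the schema above.

```javascript
// Hypothetical helper: maps a raw postMessage payload to a coarse outcome label.
function interpretIframeMessage(rawData) {
  let msg;
  try {
    msg = JSON.parse(rawData);
  } catch (err) {
    return "irrelevant"; // not JSON, so not one of the iFrame's messages
  }
  if (msg === null || typeof msg !== "object") {
    return "irrelevant";
  }
  if (!msg.verification_completed) {
    // Intermediate messages (e.g. "retrieve_embedding") are not final results.
    return typeof msg.iframe_message === "string" ? "intermediate" : "irrelevant";
  }
  if (msg.iframe_message === "25+") return "adult";
  if (msg.iframe_message === "spoof") return "spoof";
  return "underage";
}

console.log(interpretIframeMessage('{"iframe_message":"25+","verification_completed":true}'));
```

Such a helper does not replace the authenticity check in the listener; it only classifies a message that has already been accepted.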

We recommend loading our iFrame after defining this listener. An example implementation may look like this:


   window.addEventListener('message', event => { ... });
   var iframe = document.querySelector("#iframe");
   iframe.src = "/url-to-load-in-iframe";

2. Integration Alternative #2: API integration with Privately’s Data Capture

Use this option if the following applies to your constraints:

You prefer a Browser-Based solution over a mobile SDK.
You cannot embed our iFrame in your webpage or a web-activity in your app.
You will be able to provide us with Callback URLs to which our systems can send POST requests.
You will not require voiceprint / embedding for reverification.

In this alternative, your system will receive a custom URL for a given user, who will need to open it in their browser to complete the age estimation process. The result will be communicated back to you using the Callback URL that you provide to our system.

2.1 Generate a new session

As a first step, perform an HTTP POST request to our endpoint.

2.1.1 Sample request body:

{
   "request_type": "generate_new_session",
   "estimation_type": "voice",
   "api_key": "",
   "api_secret": "",
   "callback_url": "https://httpbin.org/post"
}

api_key and api_secret will be provided to you in advance. You should supply your own callback_url in order to get a proper response.

estimation_type currently accepts the following values: "voice" and "multimodal". It defaults to "voice".
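
A minimal sketch of building and sending this request is shown here. The helper names, the `endpoint` argument (a placeholder for the URL provided to you), and the `Content-Type` header are assumptions rather than documented requirements.

```javascript
// Hypothetical helper: assembles the generate_new_session body shown above.
function buildNewSessionRequest(apiKey, apiSecret, callbackUrl, estimationType = "voice") {
  const allowed = ["voice", "multimodal"];
  if (!allowed.includes(estimationType)) {
    throw new Error("Unsupported estimation_type: " + estimationType);
  }
  return {
    request_type: "generate_new_session",
    estimation_type: estimationType,
    api_key: apiKey,
    api_secret: apiSecret,
    callback_url: callbackUrl,
  };
}

// Sends the request; `endpoint` is a placeholder for the URL you were given.
// The JSON Content-Type header is an assumption.
async function generateNewSession(endpoint, body) {
  const response = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  return response.text();
}
```

Validating estimation_type on the client side is a design choice of this sketch; it simply surfaces typos before a request is sent.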

2.1.2 Sample response body:

"{\"transaction_id\": \"1723a501-b2f2-40f0-add8-5c17044584f7\", \"client_url\": \"...\"}"

Please keep transaction_id for verification purposes.
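
The sample response above is a JSON string that itself contains escaped JSON, so a single parse yields a string rather than an object. The following sketch (with the hypothetical name `parseSessionResponse`) handles both that shape and a plain JSON object, in case the encoding differs in your environment.

```javascript
// Hypothetical helper: parses a body that is either JSON or a JSON-encoded string of JSON.
function parseSessionResponse(bodyText) {
  let parsed = JSON.parse(bodyText);
  if (typeof parsed === "string") {
    parsed = JSON.parse(parsed); // unwrap the inner, escaped JSON document
  }
  return parsed;
}

// The sample response body from above, kept verbatim with String.raw.
const sample = String.raw`"{\"transaction_id\": \"1723a501-b2f2-40f0-add8-5c17044584f7\", \"client_url\": \"...\"}"`;
const session = parseSessionResponse(sample);
console.log(session.transaction_id);
```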

2.2 Receiving the estimation result

You will receive the estimation outcome at your callback URL in a format resembling the following:

{
   "age": "<age_range>",
   "age_confidence": 0.74,
   "genuineness": 0.8,
   "transaction_id": "1723a501-b2f2-40f0-add8-5c17044584f7"
}

genuineness indicates the likelihood that the tester is authentic and the audio quality is sufficient. We strongly recommend that you disregard the estimation output when this score is below 0.5.
transaction_id should be the same as the one generated when the age estimation was initiated.
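
The two checks above could be applied in your callback handler roughly as follows. `evaluateCallback` and its return shape are hypothetical; the 0.5 threshold is the recommendation stated above.

```javascript
// Hypothetical helper: decides whether a callback payload should be accepted.
function evaluateCallback(payload, expectedTransactionId, minGenuineness = 0.5) {
  if (payload.transaction_id !== expectedTransactionId) {
    return { accepted: false, reason: "transaction_mismatch" };
  }
  if (typeof payload.genuineness !== "number" || payload.genuineness < minGenuineness) {
    return { accepted: false, reason: "low_genuineness" };
  }
  return { accepted: true, age: payload.age, ageConfidence: payload.age_confidence };
}

const verdict = evaluateCallback(
  { age: "<age_range>", age_confidence: 0.74, genuineness: 0.8,
    transaction_id: "1723a501-b2f2-40f0-add8-5c17044584f7" },
  "1723a501-b2f2-40f0-add8-5c17044584f7"
);
console.log(verdict.accepted);
```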

2.3 Query prior age estimation outcomes

If you want to explicitly retrieve the results, you may also query them from our endpoint.

2.3.1 Sample request body:

{
   "request_type": "query_transaction_result",
   "api_key": "",
   "api_secret": "",
   "transaction_id": "1723a501-b2f2-40f0-add8-5c17044584f7"
}

2.3.2 Sample response body

{
   "age": "<age_range>",
   "age_confidence": 0.74,
   "genuineness": 0.8,
   "transaction_id": "1723a501-b2f2-40f0-add8-5c17044584f7"
}

Notice that in case there were any issues in processing this transaction, you may also observe additional error fields - see the examples below.

2.4 Error Handling

If there is an issue in any part of the flow, an HTTP 400 response will be generated with one of the following errors:

Error Object	Interpretation
{"request_not_complete": <transactionID>}	The system has not finished processing the result, which might become available later. Alternatively, the user may have prematurely terminated the age estimation and/or failed to complete a genuine test.
{ "missing_request": }	A request with `transactionID` was never received
{"missing_parameter": "transaction_id"}	`transaction_id` was not supplied in an intermediate request
{"request_not_understood": }	Requests of type `requestType` are not yet usable in this service
{"remote_server_error": "..."}	We tried to perform a POST request to your callback URL, but we received a response that is not HTTP 200
{"technical_error": "..."}	Our servers have experienced an internal error, please contact us immediately
{"missing_parameter": "callback_url"}	Our system could not receive a callback_url, so we could not send the request back to you
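
One way to act on these errors is to map each error key to a client-side action. The mapping below is an illustrative assumption based on the interpretations in the table; the action labels and the `suggestAction` helper are hypothetical.

```javascript
// Hypothetical mapping from error keys (table above) to a suggested client action.
const ERROR_ACTIONS = {
  request_not_complete: "retry_later",
  missing_request: "check_transaction_id",
  missing_parameter: "fix_request",
  request_not_understood: "fix_request",
  remote_server_error: "check_callback_endpoint",
  technical_error: "contact_support",
};

function suggestAction(errorBody) {
  for (const key of Object.keys(errorBody)) {
    if (Object.prototype.hasOwnProperty.call(ERROR_ACTIONS, key)) {
      return { error: key, detail: errorBody[key], action: ERROR_ACTIONS[key] };
    }
  }
  // Unrecognized error shapes are escalated rather than silently ignored.
  return { error: "unknown", detail: errorBody, action: "contact_support" };
}

console.log(suggestAction({ missing_parameter: "callback_url" }).action);
```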

3. Integration Alternative #3: API integration with Your Custom Data Capture

Use this option if the following applies to your constraints:

You prefer to implement your own data capture.
You do not want to expose additional endpoints to us.
You want to do a Sandbox trial, with explicit consent obtained from your end-users to share biometric data.

3.1 Usage Caveats

Warning: This integration requires you to transmit biometric data to Privately’s infrastructure, and hence can only be used within Sandbox trials.
The input voice clip should contain at least 5 seconds of audible speech.
Each session must generate a new transaction id (through generate_phrase), even when the end-user is making a retry.

Endpoint: https://fwrxnwsu41.execute-api.eu-west-1.amazonaws.com/default/d-privately-audio-services

3.2 Request #1: Generate a random sentence

3.2.1 Sample request body:

{
   "request_type": "generate_phrase",
   "client_id": "",
   "client_password": "",
   "lang": "fr"
}

3.2.2 Sample response:

"{\"id\": \"1723a501-b2f2-40f0-add8-5c17044584f7\", \"phrase\": \"Selon que vous serez puissant ou miserable, Les jugements de cour vous rendront blanc ou noir\"}"

3.3 Request #2: Age estimation with Spoof check

3.3.1 Data Preparation

This request requires a base64-encoded voice clip, which should be in WAVE format.

Sample recorder snippet in JavaScript/Vue:

 
    var ref = this;
    if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
      navigator.mediaDevices
        .getUserMedia(this.constraints)
        .then(function (stream) {
          ref.audioRecorder = new MediaRecorder(stream);

          // Register the handler before starting so that no chunk is missed.
          ref.audioRecorder.ondataavailable = function (e) {
            console.log("data pushed");
            ref.audioChunks.push(e.data);
          };

          ref.audioRecorder.start();
          console.log(ref.audioRecorder.state);
          console.log("recorder started");
        })
        .catch(function (err) {
          console.log("The following getUserMedia error occurred: " + err);
        });
    } else {
      console.log("getUserMedia not supported on your browser!");
    }

Once the recording stops, the collected data can be converted to a base64 string. Sample snippet in JavaScript/Vue:

  
    var ref = this;
    if (this.audioRecorder.state != "inactive") {
      this.audioRecorder.stop();
    }
    this.isRecording = false;
    this.isProcessing = true;

    console.log("recording stopped");

    // Give the recorder time to deliver its final chunk.
    await new Promise((resolve) => setTimeout(resolve, 1000));

    var superBuffer = new Blob(this.audioChunks, { type: "video/webm" });

    var reader = new window.FileReader();
    // Attach the handler before the read starts.
    reader.onloadend = function () {
      // reader.result is a data URL; keep only the base64 part after the comma.
      var base64 = reader.result;
      base64 = base64.split(",")[1];
    };
    reader.readAsDataURL(superBuffer);

3.3.2 Sample request body

{
   "voice_data": "<base64string>",
   "requested_phrase": "Aujourd'hui je n'ai pas pu voir mes amis. Je suis triste. Combien d'heures est-ce qu'il me faudra pour l'oublier?",
   "transaction_id": "91ac660e-4426-49b7-9feb-afd5ff14267e",
   "client_id": "<your_id>",
   "client_password": "<your_secret>",
   "request_type": "voice_verification"
}

Note that you need to send the phrase generated in the previous request, placed in requested_phrase.
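
As a sketch, the request body can be derived directly from the parsed generate_phrase response so that the phrase and the identifier always stay paired. The helper name is hypothetical, and it assumes that the `id` returned by generate_phrase is the value expected in transaction_id.

```javascript
// Hypothetical helper: combines the generate_phrase response with the recorded clip.
// Assumes the "id" field of that response is the transaction id for this session.
function buildVoiceVerificationRequest(phraseResponse, base64Clip, clientId, clientPassword) {
  if (!phraseResponse.id || !phraseResponse.phrase) {
    throw new Error("generate_phrase response must contain id and phrase");
  }
  return {
    voice_data: base64Clip,
    requested_phrase: phraseResponse.phrase,
    transaction_id: phraseResponse.id,
    client_id: clientId,
    client_password: clientPassword,
    request_type: "voice_verification",
  };
}

const verificationBody = buildVoiceVerificationRequest(
  { id: "1723a501-b2f2-40f0-add8-5c17044584f7", phrase: "Selon que vous serez puissant ou miserable" },
  "<base64string>",
  "<your_id>",
  "<your_secret>"
);
console.log(JSON.stringify(verificationBody, null, 3));
```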

3.3.3 Sample response

There are two possible responses:

A Direct Response returns the following body with HTTP code 200:

{\"text\": \"ALORS UN DERNIER TEST AVANT D AVOIR UN CAF\\u00c9 AVEC DES BACS JE SUIS TR\\u00c8S CONTENT JE VEUX FAIRE\", \"emotion\": \"Emotion detection not enabled\", \"hate\": 0, \"toxicity\": 0, \"profanity\": 0, \"age\": \"adulthood\", \"ageConfidence\": 1.0, \"gender\": \"Detection not enabled\", \"genuineControlScore\": 0.11111111111111116, \"transaction_id\": \"91ac660e-4426-49b7-9feb-afd5ff14267e\"}

genuineControlScore: our level of confidence that the test is bona fide. We recommend treating the test as a spoof if this value is below 0.65.
age: “adulthood” if the person is deemed to be 18+; otherwise, the person is deemed a minor.
ageConfidence: the confidence level of the age prediction. The values will be 0.5, 0.6, 0.7, 0.8, 0.9, or 1.0.
text: the output of the French speech-to-text module, used in spoof detection.
Note that the transaction_id is the same as the one in the request body.
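
The recommended threshold could be applied to a parsed Direct Response as sketched here. `interpretDirectResponse` is a hypothetical helper; it reads `genuineControlScore` as in the sample response and falls back to `genuineConfidenceScore` in case the service uses that name. Treating a missing score as a spoof is a conservative assumption of this sketch.

```javascript
// Hypothetical helper: applies the recommended 0.65 threshold to a Direct Response.
function interpretDirectResponse(result, spoofThreshold = 0.65) {
  // The sample response uses genuineControlScore; the other name is a fallback.
  const score = result.genuineControlScore ?? result.genuineConfidenceScore;
  if (typeof score !== "number" || score < spoofThreshold) {
    return { outcome: "spoof", score };
  }
  const isAdult = result.age === "adulthood";
  return { outcome: isAdult ? "adult" : "minor", score, ageConfidence: result.ageConfidence };
}

// Values taken from the sample response above.
const sampleOutcome = interpretDirectResponse({
  age: "adulthood",
  ageConfidence: 1.0,
  genuineControlScore: 0.11111111111111116,
});
console.log(sampleOutcome.outcome);
```

With the values from the sample response, the low score means the age field is disregarded.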

A Queued Response returns the following body with HTTP code 202:

{"transaction_id": "91ac660e-4426-49b7-9feb-afd5ff14267e"}

Note that the transaction_id is the same as the one in the request body. You may use this transaction_id in the polling query described in section 3.4.

3.4 Request #3: Poll estimation results

If you receive an HTTP 202 from Request #2, you must use the same URL with the following request body to poll for the results.

3.4.1 Sample Request Body

{
   "transaction_id": "91ac660e-4426-49b7-9feb-afd5ff14267e", // from the previous response from server
   "request_type": "poll_transaction",
   "client_id": "<your_id>",
   "client_password": "<your_secret>"
}

3.4.2 Sample Response

If the estimation is not yet complete, you will receive HTTP 202 with a response body identical to your request body.

If the estimation is complete, you will receive HTTP 200 with a response body similar to the Direct Response of Request #2.
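
The 202/200 behaviour lends itself to a simple polling loop. The sketch below is transport-agnostic: `sendRequest` is a hypothetical function you supply that posts the body and resolves to `{ status, body }`. The attempt limit and delay are arbitrary defaults, not documented values.

```javascript
// Hypothetical polling loop. `sendRequest(body)` must resolve to { status, body }.
async function pollTransaction(sendRequest, pollBody, maxAttempts = 10, delayMs = 2000) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await sendRequest(pollBody);
    if (response.status === 200) {
      return response.body; // estimation complete
    }
    if (response.status !== 202) {
      throw new Error("Unexpected HTTP status: " + response.status);
    }
    // 202: still queued; wait before the next attempt unless this was the last one.
    if (attempt < maxAttempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error("Estimation not complete after " + maxAttempts + " attempts");
}
```

Bounding the number of attempts avoids polling forever when a user abandons the session.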

4. Constraints for Custom Data Captures

Applies to: Integrations with Custom Data Captures

The input to VoiceAssure must respect a number of constraints to produce the best estimation outcomes. If you decide to implement your own data capture, please treat the following constraints as part of your development guidelines:

Reliability Metric	Constraints	Interpretation
Audio format	Use default browser settings; WEBM / MP4 / WAV, 2-channel, 44100 Hz	If you’re using Server-Based APIs, the format conversion will be done automatically for you. For on-browser builds, we recommend keeping the default settings on the client browser.
Speaker count	= 1	We currently do not support age estimation of multiple speakers.
Audio Clip Length (seconds)	> 5	Shorter audio clips do not guarantee reliable outcomes.
Root Mean Square (Loudness and Noise)	Mean(RMSE) < 0.09; Max(RMSE) > 0.17	Loud music, whispering, being too distant from the microphone, a second person speaking, etc. can degrade the audio quality as measured by these metrics.
Zero-Crossing Rate (Noise)	Min(ZCRate) < 0.02	Outdoor noise in particular can increase this metric, producing unreliable results.
Voice Print Match Rate	0 - 0.5: Reject match; 0.5 - 0.7: Unreliable match; 0.7 - 1.0: Reliable match	Recommendations for reverification purposes.
Speech-to-text match rate	0 - 0.5: Reject match; 0.5 - 0.6: Non-native speaker match; 0.6 - 1.0: Accept	Recommendations for liveness check purposes.
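
For a custom capture, the loudness and noise thresholds could be pre-checked on the client before uploading. The sketch below computes frame-wise RMS and zero-crossing rate on mono samples in the range [-1, 1]. The frame length, hop length, and exact metric definitions are assumptions, since the table does not state how VoiceAssure computes them, so treat the result as a rough screening aid only.

```javascript
// Frame-wise RMS and zero-crossing rate for mono samples in [-1, 1].
// Frame and hop sizes are assumptions, not values confirmed by the documentation.
function frameMetrics(samples, frameLength = 2048, hopLength = 512) {
  const rms = [];
  const zcr = [];
  for (let start = 0; start + frameLength <= samples.length; start += hopLength) {
    let sumSquares = 0;
    let crossings = 0;
    for (let i = start; i < start + frameLength; i++) {
      sumSquares += samples[i] * samples[i];
      // Count a crossing whenever the sign differs from the previous sample.
      if (i > start && (samples[i] >= 0) !== (samples[i - 1] >= 0)) {
        crossings++;
      }
    }
    rms.push(Math.sqrt(sumSquares / frameLength));
    zcr.push(crossings / frameLength);
  }
  return { rms, zcr };
}

// Checks the three thresholds from the table above.
function checkAudioConstraints(samples) {
  const { rms, zcr } = frameMetrics(samples);
  if (rms.length === 0) {
    return { ok: false, reason: "clip_too_short" };
  }
  const meanRms = rms.reduce((a, b) => a + b, 0) / rms.length;
  const maxRms = rms.reduce((a, b) => Math.max(a, b), -Infinity);
  const minZcr = zcr.reduce((a, b) => Math.min(a, b), Infinity);
  return {
    ok: meanRms < 0.09 && maxRms > 0.17 && minZcr < 0.02,
    meanRms,
    maxRms,
    minZcr,
  };
}
```

In a browser, the samples could come from a decoded AudioBuffer channel; how you obtain them is outside the scope of this sketch.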