VoiceAssure: Integration Documentation
General Considerations
To get the best performance from the service, we suggest that you introduce the following restrictions on the user experience:
- Audio Clip Length: We obtain the highest-quality estimations when the clip is around 7-8 seconds long.
- Distance from Microphone: The ideal distance to the microphone appears to be the “selfie” distance.
- Objects blocking the microphone or muffling effects: To be avoided.
- Multiple people speaking: To be avoided.
- Loud background music: To be avoided.
See the section “Constraints for Custom Data Captures” for more information.
1. Integration Alternative #1: iFrame Integration
Use this option if the following applies to your constraints:
- You prefer a Browser-Based solution over a mobile SDK.
- You can embed our iFrame in your webpage or a web-activity in your app.
- You do not want to expose additional endpoints to us.
- You might use our voiceprint / embedding for reverification.
The simplest form of integration is to use iFrames, which brings Privately’s recommended user interface into your web flow. It is also possible to whitelabel this interface and obtain a customized URL.
The integration can be achieved by adding the following to your website’s HTML:
<iframe class="responsive-iframe" src="https://<customURL>/<customRoute>?session_id=api_key&session_password=api_secret&analysis_id=optional" allow="camera;microphone"></iframe>
Where api_key and api_secret constitute your API key pair. If you intend to perform a reverification, an additional analysis_id parameter must be supplied; see the section “Handling communication”.
Variable | Type | Requirement |
---|---|---|
session_id | String - GUID | Required |
session_password | String | Required |
analysis_id | String - GUID | Optional; required if reverification is requested |
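As a minimal sketch of the parameter table above, the iFrame URL can be assembled with the standard URL API so that each value is properly encoded. The function name buildIframeSrc and the placeholder base URL are assumptions made for illustration; your actual customURL and customRoute are supplied by Privately.

```javascript
// Hypothetical helper: assembles the iFrame src from the documented query parameters.
function buildIframeSrc(baseUrl, apiKey, apiSecret, analysisId) {
  const url = new URL(baseUrl);
  url.searchParams.set("session_id", apiKey);
  url.searchParams.set("session_password", apiSecret);
  // analysis_id is optional; it is only needed when a reverification is requested.
  if (analysisId) {
    url.searchParams.set("analysis_id", analysisId);
  }
  return url.toString();
}

// Example with placeholder values (not real credentials or a real host).
const exampleSrc = buildIframeSrc(
  "https://example.invalid/custom-route",
  "00000000-0000-0000-0000-000000000000",
  "my secret&value"
);
```

The resulting string can then be assigned to the src attribute of the iframe element.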
1.1 Handling communication
This is an example of cross-document messaging. The window.postMessage() method safely enables cross-origin communication between Window objects, e.g., between a page and a pop-up that it spawned, or between a page and an iframe embedded within it.
VoiceAssure’s iFrame sends results and intermediary messages as an event, with the following schema:
var pass_data = {
  iframe_message: String,          // Estimation outcome or intermediate request type
  score: String,                   // The confidence level of the age
  liveness_result: Float,          // A number between 0-1
  embedding: String,               // A modified base64 string
  verification_completed: Boolean, // Indicates whether this is a response or intermediate message
  analysis_id: String              // An id for your records, to be reused in reverification
};
console.log(pass_data);
parent.postMessage(JSON.stringify(pass_data), "*");
As such it should be handled by your parent window as follows:
window.addEventListener('message', function(e) {
  try
  {
    var myobj = JSON.parse(e.data)
    if(authenticity_failed(e, myobj)) // involves your api key, secret, and our identifiers
    {
      // do nothing.
    }
    else if( myobj["iframe_message"] == "retrieve_embedding" &&
             myobj["analysis_id"] == analysis_id_to_reverify)
    {
      // iFrame is ready to receive embedding, send it.
      var embeddingMessage = {
        iframe_message: "ingest_embedding",
        session_id: your_api_key,
        session_password: your_api_secret,
        analysis_id: analysis_id_to_reverify,
        verification_completed: false,
        embedding: getYourBase64Embedding()
      }
      your_iFrameWindowObject.postMessage(JSON.stringify(embeddingMessage), "*")
    }
    else if ( myobj["verification_completed"])
    {
      if (myobj["iframe_message"] == '25+') {
        // Handle an adult estimation (above 25)
      } else if (myobj["iframe_message"] == 'spoof') {
        // Handle a failed estimation
      } else {
        // Handle an underage estimation
      }
    }
    else
    {
      console.log("Irrelevant message")
    }
  }
  catch(exp)
  {
    console.log("Irrelevant message")
  }
});
We recommend loading our iFrame after defining this listener. An example implementation may look like this:
window.addEventListener('message', event => { ... });
var iframe = document.querySelector("#iframe");
iframe.src = "/url-to-load-in-iframe";
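The listener in this section delegates to an authenticity_failed check that is not shown. One possible building block, sketched here under assumptions, is to compare the standard MessageEvent origin property against the origin of your iFrame URL before parsing the payload, and then map the documented iframe_message values to outcomes. The names decodeIframeMessage, classifyOutcome, and trustedOrigin are hypothetical and not part of the VoiceAssure interface.

```javascript
// Hypothetical helper: returns the decoded payload, or null if the message
// should be ignored. trustedOrigin is an assumption: the origin of your iFrame URL.
function decodeIframeMessage(event, trustedOrigin) {
  if (event.origin !== trustedOrigin) return null;
  if (typeof event.data !== "string") return null;
  let payload;
  try {
    payload = JSON.parse(event.data);
  } catch (err) {
    return null; // not JSON, so not one of the iFrame's messages
  }
  if (payload === null || typeof payload !== "object") return null;
  if (typeof payload.iframe_message !== "string") return null;
  return payload;
}

// Maps a decoded payload to the outcomes handled by the listener in this section.
function classifyOutcome(payload) {
  if (!payload.verification_completed) return "intermediate";
  if (payload.iframe_message === "25+") return "adult";
  if (payload.iframe_message === "spoof") return "spoof";
  return "underage";
}
```

A check like this complements, but does not replace, whatever validation you perform with your API key pair and identifiers.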
2. Integration Alternative #2: API integration with Privately’s Data Capture
Use this option if the following applies to your constraints:
- You prefer a Browser-Based solution over a mobile SDK.
- You cannot embed our iFrame in your webpage or a web-activity in your app.
- You will be able to provide us with Callback URLs to which our systems can send POST requests.
- You will not require voiceprint / embedding for reverification.
In this alternative, your system will receive a custom URL for a given user, who will need to open it in their browser to complete the age estimation process. The result will be communicated back to you using the Callback URL that you provide to our system.
2.1 Generate a new session
As a first step, perform an HTTP POST request to our endpoint.
2.1.1 Sample request body:
{
"request_type": "generate_new_session",
"estimation_type": "voice",
"api_key": "",
"api_secret": "",
"callback_url": "https://httpbin.org/post"
}
api_key and api_secret will be provided to you in advance. You should supply your own callback_url in order to get a proper response.
estimation_type can currently take the following values: "voice", "multimodal". It defaults to "voice".
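The request described above could be assembled and sent as in the following sketch. The function names, the endpointUrl parameter, and the Content-Type header are assumptions, since this section does not name the endpoint or the expected headers; only the body fields come from the sample request.

```javascript
// Hypothetical helper: builds the generate_new_session body from the documented fields.
function buildNewSessionBody({ apiKey, apiSecret, callbackUrl, estimationType = "voice" }) {
  const allowedTypes = ["voice", "multimodal"];
  if (!allowedTypes.includes(estimationType)) {
    throw new Error("estimation_type must be one of: " + allowedTypes.join(", "));
  }
  if (!callbackUrl) {
    throw new Error("callback_url is required to receive the result");
  }
  return {
    request_type: "generate_new_session",
    estimation_type: estimationType,
    api_key: apiKey,
    api_secret: apiSecret,
    callback_url: callbackUrl,
  };
}

// Hypothetical sender: endpointUrl must be the endpoint communicated to you by Privately.
// The JSON Content-Type header is an assumption, not a documented requirement.
async function requestNewSession(endpointUrl, params) {
  const response = await fetch(endpointUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildNewSessionBody(params)),
  });
  return { status: response.status, text: await response.text() };
}
```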
2.1.2 Sample response body:
"{\"transaction_id\": \"1723a501-b2f2-40f0-add8-5c17044584f7\", \"client_url\": \"...\"}"
Please keep transaction_id for verification purposes.
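The sample response body above is a JSON string whose content is itself JSON, so a single JSON.parse call yields a string rather than an object. The sketch below, with the hypothetical name parseServiceResponse, decodes a second time when needed; whether live responses are always double-encoded is an assumption based only on the sample.

```javascript
// Hypothetical helper: decodes a response body that may be JSON wrapped in a JSON string.
function parseServiceResponse(text) {
  let value = JSON.parse(text);
  // If the first pass produced a string, it is the inner JSON document: decode it too.
  if (typeof value === "string") {
    value = JSON.parse(value);
  }
  return value;
}

// The sample response from this section, kept verbatim via String.raw.
const sampleResponse = String.raw`"{\"transaction_id\": \"1723a501-b2f2-40f0-add8-5c17044584f7\", \"client_url\": \"...\"}"`;
const session = parseServiceResponse(sampleResponse);
// session.transaction_id should be stored; session.client_url is what the user opens.
```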
2.2 Receiving the estimation result
You will receive estimation outcomes in a format resembling the following:
{
"age": "<age_range>",
"age_confidence": 0.74,
"genuineness": 0.8,
"transaction_id": "1723a501-b2f2-40f0-add8-5c17044584f7"
}
genuineness indicates the likelihood that the tester is authentic and the audio quality is sufficient. We strongly recommend that you disregard the estimation output when this score is below 0.5.
transaction_id should be the same as the one generated to initiate the age estimation.
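The two checks described above (matching transaction_id and the recommended genuineness threshold of 0.5) might be applied in your callback handler roughly as follows. The function name evaluateCallback and the shape of its return value are hypothetical.

```javascript
// Threshold recommended in this section for disregarding an estimation.
const GENUINENESS_THRESHOLD = 0.5;

// Hypothetical helper: decides whether a callback payload should be trusted.
function evaluateCallback(result, expectedTransactionId) {
  if (result.transaction_id !== expectedTransactionId) {
    return { accepted: false, reason: "transaction_mismatch" };
  }
  if (typeof result.genuineness !== "number" || result.genuineness < GENUINENESS_THRESHOLD) {
    return { accepted: false, reason: "low_genuineness" };
  }
  return { accepted: true, age: result.age, ageConfidence: result.age_confidence };
}
```

With the sample payload from this section (genuineness 0.8) and the matching transaction id, the estimation would be accepted.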
2.3 Query prior age estimation outcomes
If you want to retrieve the results explicitly, you may also query them from our endpoint.
2.3.1 Sample request body:
{
"request_type": "query_transaction_result",
"api_key": "",
"api_secret": "",
"transaction_id": "1723a501-b2f2-40f0-add8-5c17044584f7"
}
2.3.2 Sample response body
{
"age": "<age_range>",
"age_confidence": 0.74,
"genuineness": 0.8,
"transaction_id": "1723a501-b2f2-40f0-add8-5c17044584f7"
}
Notice that in case there were any issues in processing this transaction, you may also observe additional error fields - see the examples below.
2.4 Error Handling
If an issue occurs in any part of the flow, an HTTP 400 response will be generated with one of the following error objects:
Error Object | Interpretation |
---|---|
{"request_not_complete": <transactionID>} | The system has not completed processing the result. The result might be available later. Alternatively, the user may have prematurely terminated the age estimation and/or failed to do a genuine test. |
{"missing_request": <transactionID>} | A request with transactionID was never received |
{"missing_parameter": "transaction_id"} | transaction_id was not supplied in an intermediate request |
{"request_not_understood": <requestType>} | Requests of type requestType are not yet usable in this service |
{"remote_server_error": "..."} | We tried to perform a POST request to your callback URL, but we received a response that is not HTTP 200 |
{"technical_error": "..."} | Our servers have experienced an internal error, please contact us immediately |
{"missing_parameter": "callback_url"} | Our system did not receive a callback_url, so we could not send the result back to you |
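One way to act on the error objects in the table above is to look up the error key and map it to a follow-up action. The sketch below assumes each error body carries exactly one of the listed keys; the action labels and the name classifyError are hypothetical and only illustrate the idea.

```javascript
// Error keys come from the table above; the action labels are illustrative assumptions.
const ERROR_ACTIONS = {
  request_not_complete: "retry_later",
  missing_request: "check_transaction_id",
  missing_parameter: "fix_request",
  request_not_understood: "fix_request",
  remote_server_error: "check_callback_endpoint",
  technical_error: "contact_support",
};

// Hypothetical helper: returns the first known error found in a decoded HTTP 400 body,
// or null if the body contains none of the documented keys.
function classifyError(body) {
  for (const key of Object.keys(ERROR_ACTIONS)) {
    if (Object.prototype.hasOwnProperty.call(body, key)) {
      return { error: key, detail: body[key], action: ERROR_ACTIONS[key] };
    }
  }
  return null;
}
```

Only request_not_complete suggests that repeating the same query later may succeed; the other entries point to a request or configuration problem.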
3. Integration Alternative #3: API integration with Your Custom Data Capture
Use this option if the following applies to your constraints:
- You prefer to implement your own data capture.
- You do not want to expose additional endpoints to us.
- You want to do a Sandbox trial, with explicit consent obtained from your end-users to share biometric data.
3.1 Usage Caveats
- Warning: This integration requires you to transmit biometric data to Privately’s infrastructure, and hence can only be used within Sandbox trials.
- The input voice clip should contain at least 5 seconds of audible speech.
- Each session must generate a new transaction id (through generate_phrase), even when the end-user is making a retry.
Endpoint: https://fwrxnwsu41.execute-api.eu-west-1.amazonaws.com/default/d-privately-audio-services
3.2 Request #1: Generate a random sentence
3.2.1 Sample request body:
{
"request_type": "generate_phrase",
"client_id": "",
"client_password": "",
"lang": "fr"
}
3.2.2 Sample response:
"{\"id\": \"1723a501-b2f2-40f0-add8-5c17044584f7\", \"phrase\": \"Selon que vous serez puissant ou miserable, Les jugements de cour vous rendront blanc ou noir\"}"
3.3 Request #2: Age estimation with Spoof check
3.3.1 Data Preparation
This request requires a base64-encoded voice clip, which should be in WAV format.
Sample recorder snippet in JavaScript/Vue:
var ref = this;
if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
  navigator.mediaDevices
    .getUserMedia(this.constraints)
    .then(function (stream) {
      ref.audioRecorder = new MediaRecorder(stream);
      // Register the handler before starting so that no chunk is missed.
      ref.audioRecorder.ondataavailable = function (e) {
        console.log("data pushed");
        ref.audioChunks.push(e.data);
      };
      ref.audioRecorder.start();
      console.log(ref.audioRecorder.state);
      console.log("recorder started");
    })
    .catch(function (err) {
      console.log("The following getUserMedia error occurred: " + err);
    });
} else {
  console.log("getUserMedia not supported on your browser!");
}
Once the recording stops, the collected data can be converted to base64. Sample snippet in JavaScript/Vue:
var ref = this;
if (this.audioRecorder.state != "inactive") {
  this.audioRecorder.stop();
}
this.isRecording = false;
this.isProcessing = true;
console.log("recording stopped");
// Give the recorder time to deliver its final chunk (await requires an async function).
await new Promise((resolve) => setTimeout(resolve, 1000));
var superBuffer = new Blob(this.audioChunks, { type: "video/webm" });
var reader = new window.FileReader();
reader.onloadend = function () {
  // reader.result is a data URL; keep only the base64 payload after the comma.
  var base64 = reader.result;
  base64 = base64.split(",")[1];
};
reader.readAsDataURL(superBuffer);
3.3.2 Sample request body
{
"voice_data": "<base64string>",
"requested_phrase": "Aujourd'hui je n'ai pas pu voir mes amis. Je suis triste. Combien d'heures est-ce qu'il me faudra pour l'oublier?",
"transaction_id": "91ac660e-4426-49b7-9feb-afd5ff14267e",
"client_id": "<your_id>",
"client_password": "<your_secret>",
"request_type": "voice_verification"
}
Notice that you will need to send the phrase generated in the previous request. The phrase should be placed in requested_phrase.
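As a sketch of how Request #1 feeds Request #2, the body can be built from the decoded generate_phrase response and the base64 audio. It assumes that the id field returned by generate_phrase is the value expected in transaction_id, which is inferred from the usage caveats in section 3.1 rather than stated explicitly; the function name is hypothetical.

```javascript
// Hypothetical helper: combines the generate_phrase response with the recorded audio.
// Assumption: phraseResponse.id is the transaction id for this session.
function buildVerificationBody(phraseResponse, base64Audio, clientId, clientPassword) {
  if (!phraseResponse || !phraseResponse.id || !phraseResponse.phrase) {
    throw new Error("A decoded generate_phrase response with id and phrase is required");
  }
  return {
    voice_data: base64Audio,
    requested_phrase: phraseResponse.phrase, // must be the phrase shown to the user
    transaction_id: phraseResponse.id,
    client_id: clientId,
    client_password: clientPassword,
    request_type: "voice_verification",
  };
}
```

Because each session needs a fresh transaction id, a retry by the end-user would start again from generate_phrase rather than reuse an earlier body.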
3.3.3 Sample response
There are two possible responses:
- Direct Response will return the following response with the HTTP code 200:
{\"text\": \"ALORS UN DERNIER TEST AVANT D AVOIR UN CAF\\u00c9 AVEC DES BACS JE SUIS TR\\u00c8S CONTENT JE VEUX FAIRE\", \"emotion\": \"Emotion detection not enabled\", \"hate\": 0, \"toxicity\": 0, \"profanity\": 0, \"age\": \"adulthood\", \"ageConfidence\": 1.0, \"gender\": \"Detection not enabled\", \"genuineControlScore\": 0.11111111111111116, \"transaction_id\": \"91ac660e-4426-49b7-9feb-afd5ff14267e\"}
  - genuineConfidenceScore: indicates our level of confidence that the test is bona fide. We recommend treating the test as a spoof if this value is below 0.65.
  - age: “adulthood” if the person is deemed 18+; otherwise the person is deemed a minor.
  - ageConfidence: the confidence level at which we made the age prediction. The values will be 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0.
  - text: the output of the French speech-to-text module, used in spoof detection.
  - transaction_id: the same as the one in the request body.
- Queued Response will return the following response with the HTTP code 202:
{"transaction_id": "91ac660e-4426-49b7-9feb-afd5ff14267e"}
Notice that the transaction_id is the same as the one in the request body. You may use this transaction_id in the polling query below:
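The two response types above could be told apart by HTTP status and interpreted as in the following sketch. It reads the score from genuineControlScore, the field name that appears in the sample Direct Response, while the field description refers to genuineConfidenceScore; which name a live response uses should be confirmed, so the field name is passed as a parameter. The function name and return shape are hypothetical.

```javascript
// Threshold recommended in this section for treating a test as a spoof.
const SPOOF_THRESHOLD = 0.65;

// Hypothetical helper: interprets a decoded voice_verification response.
// scoreField is an assumption; it defaults to the name used in the sample response.
function interpretVerification(status, body, scoreField = "genuineControlScore") {
  if (status === 202) {
    return { state: "queued", transactionId: body.transaction_id };
  }
  if (status !== 200) {
    return { state: "error", status: status };
  }
  const score = body[scoreField];
  if (typeof score !== "number" || score < SPOOF_THRESHOLD) {
    return { state: "spoof", transactionId: body.transaction_id };
  }
  return {
    state: "estimated",
    isAdult: body.age === "adulthood",
    ageConfidence: body.ageConfidence,
    transactionId: body.transaction_id,
  };
}
```

Applied to the sample Direct Response, whose score is roughly 0.11, this sketch would report a spoof even though age is "adulthood".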
3.4 Request #3: Poll estimation results
If you receive an HTTP 202 from Request #2, you must use the same URL with the following request body to poll the results. The transaction_id is the one returned in the server's previous response.
3.4.1 Sample Request Body
{
"transaction_id": "91ac660e-4426-49b7-9feb-afd5ff14267e",
"request_type": "poll_transaction",
"client_id": "<your_id>",
"client_password": "<your_secret>"
}
3.4.2 Sample Response
If the estimation is not yet complete, you will receive HTTP 202 with a response body identical to your request body.
If the estimation is complete, you will receive HTTP 200 with a response body similar to the Direct Response of Request #2.
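The polling behaviour described above can be sketched as a bounded retry loop. The transport is injected as sendRequest, a hypothetical function that posts the body and resolves to an object with status and data, so the sketch makes no claim about headers or response encoding. The interval and attempt limit are arbitrary illustrative values, not recommendations from Privately.

```javascript
// Hypothetical helper: polls until HTTP 200 is returned or the attempts run out.
// sendRequest(body) must resolve to { status, data }.
async function pollTransaction(sendRequest, credentials, transactionId, options = {}) {
  const { intervalMs = 2000, maxAttempts = 10 } = options;
  const body = {
    transaction_id: transactionId,
    request_type: "poll_transaction",
    client_id: credentials.clientId,
    client_password: credentials.clientPassword,
  };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { status, data } = await sendRequest(body);
    if (status === 200) return data; // estimation complete
    if (status !== 202) throw new Error("Unexpected HTTP status " + status);
    if (attempt < maxAttempts) {
      // Still queued: wait before asking again.
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error("Estimation still pending after " + maxAttempts + " attempts");
}
```

Injecting sendRequest keeps the loop independent of fetch, which also makes it straightforward to exercise with a stub.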
4. Constraints for Custom Data Captures
Applies to: Integrations with Custom Data Captures
The input to VoiceAssure must respect a number of constraints in order to produce the best estimation outcomes. If you decide to implement your own data capture, please consider the following constraints as part of your development guidelines:
Reliability Metric | Constraints | Interpretation |
---|---|---|
Audio format | Use default browser settings: WEBM / MP4 / WAV, 2-channel, 44100 Hz | If you’re using Server-Based APIs, the format conversion will be done automatically for you. For on-browser builds, we recommend that you keep the default settings on the client browser. |
Speaker count | = 1 | We currently do not support age estimation of multiple speakers. |
Audio Clip Length (seconds) | > 5 | Shorter audio clips do not guarantee reliable outcomes. |
Root Mean Square (Loudness and Noise) | Mean(RMSE) < 0.09; Max(RMSE) > 0.17 | Loud music, whispering, being too distant from the microphone, a second person speaking, etc. can degrade the audio quality as measured by these metrics. |
Zero-Crossing Rate (Noise) | Min(ZCRate) < 0.02 | Outdoor noise in particular can increase this metric, producing unreliable results. |
Voice Print Match Rate | 0 - 0.5: Reject match; 0.5 - 0.7: Unreliable match; 0.7 - 1.0: Reliable match | Recommendations for reverification purposes. |
Speech-to-text match rate | 0 - 0.5: Reject match; 0.5 - 0.6: Non-native speaker match; 0.6 - 1.0: Accept | Recommendations for liveness check purposes. |
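For a custom capture, the loudness and noise constraints in the table above could be pre-checked on the client before uploading. The sketch below computes frame-wise root mean square and zero-crossing rate on mono samples normalized to the range -1 to 1. The frame length of 2048 and hop of 512 samples, the normalization, and the function names are assumptions; the document does not state how VoiceAssure computes these metrics, so the results are only indicative.

```javascript
// Computes per-frame RMS and zero-crossing rate (crossings per sample).
// frameLength and hopLength are assumed values, not documented by the service.
function frameMetrics(samples, frameLength = 2048, hopLength = 512) {
  const rms = [];
  const zcr = [];
  for (let start = 0; start + frameLength <= samples.length; start += hopLength) {
    let sumSquares = 0;
    let crossings = 0;
    for (let i = start; i < start + frameLength; i++) {
      sumSquares += samples[i] * samples[i];
      // A crossing is a sign change between consecutive samples within the frame.
      if (i > start && (samples[i] >= 0) !== (samples[i - 1] >= 0)) crossings++;
    }
    rms.push(Math.sqrt(sumSquares / frameLength));
    zcr.push(crossings / frameLength);
  }
  return { rms, zcr };
}

// Hypothetical pre-check against the thresholds listed in the table above.
function checkAudioConstraints(samples, sampleRate) {
  const { rms, zcr } = frameMetrics(samples);
  const durationOk = samples.length / sampleRate > 5;
  if (rms.length === 0) {
    return { durationOk, meanRmsOk: false, maxRmsOk: false, minZcrOk: false };
  }
  const meanRms = rms.reduce((sum, v) => sum + v, 0) / rms.length;
  const maxRms = rms.reduce((max, v) => Math.max(max, v), 0);
  const minZcr = zcr.reduce((min, v) => Math.min(min, v), Infinity);
  return {
    durationOk,
    meanRmsOk: meanRms < 0.09,
    maxRmsOk: maxRms > 0.17,
    minZcrOk: minZcr < 0.02,
  };
}
```

In a browser, the samples could come from decoding the recording with the Web Audio API and reading one channel as a Float32Array.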