Make Audio Recordings with Actions on Google

    handleSignedInUser(authUser) : handleSignedOutUser();});

Once the user has signed in, we display two buttons that allow the user to sign out or delete their account for the app.

Recording audio

To get access to the microphone from a web page, we need to use the MediaDevices getUserMedia() method, which prompts the user for permission to use a media input device to record audio:

    navigator.mediaDevices.getUserMedia({audio: true, video: false})
      .then(function(stream) {
        ...
      })
      .catch(function(err) {
        ...
      });

Most modern browsers support the Web Audio API, which provides advanced audio playback, recording, and visualization support.
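The catch() callback above is left empty. Here is a minimal sketch of one way to fill it in, assuming you want to turn the rejection into a message for the user. The helper name describeMicError and the message wording are hypothetical; NotAllowedError, NotFoundError, and NotReadableError are error names that getUserMedia() can reject with.

```javascript
// Hypothetical helper: map a getUserMedia() rejection to a user-facing message.
function describeMicError(err) {
  switch (err && err.name) {
    case 'NotAllowedError':
      // The user (or a browser policy) refused the permission prompt.
      return 'Microphone access was denied. Please allow it and try again.';
    case 'NotFoundError':
      // No audio input device matched the requested constraints.
      return 'No microphone was found on this device.';
    case 'NotReadableError':
      // The device exists but could not be opened, e.g. it is already in use.
      return 'The microphone could not be opened. Is another app using it?';
    default:
      return 'Could not start recording: ' + ((err && err.message) || 'unknown error');
  }
}

console.log(describeMicError({name: 'NotFoundError'}));
```

In the page, this could be called from the existing handler, for example by logging describeMicError(err) or showing it next to the record button.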

WebAudioRecorder.js is a JavaScript library that uses the Web Audio API to record audio input and supports encoding to several audio file formats, including OGG.

To use WebAudioRecorder.js, the following script needs to be loaded in the HTML:

    <script src="javascripts/WebAudioRecorder.min.js"></script>

In the getUserMedia callback, invoked after the user has given permission to access the microphone, an audio recorder object is created:

    var audioContext = new AudioContext();
    var audioSource = audioContext.createMediaStreamSource(stream);
    webAudioRecorder = new WebAudioRecorder(audioSource, {
      workerDir: 'javascript/',
      encoding: 'ogg',
      onEncoderLoading: (recorder, encoding) => {
        console.log('onEncoderLoading');
      },
      onEncoderLoaded: (recorder, encoding) => {
        console.log('onEncoderLoaded');
      },
      onEncodingProgress: (recorder, progress) => {
        console.log('onEncodingProgress: ' + progress);
      },
      onComplete: (recorder, blob) => {
        console.log('onComplete');
        persistFile(blob);
      }
    });

The audio recorder object uses a web worker to load the following WebAudioRecorder.js encoder files from the javascript/ directory:

    WebAudioRecorderOgg.min.js
    OggVorbisEncoder.min.js.mem

The audio recorder object is then configured to meet the OGG requirements for the Actions on Google media player:

    webAudioRecorder.setOptions({
      timeLimit: 180,           // max number of seconds for recording
      encodeAfterRecord: true,  // encode the audio data after recording
      ogg: {
        bitRate: 160            // 160 kbps bitrate
      }
    });

Cloud Storage

Once WebAudioRecorder.js has completed the encoding, its onComplete event handler is invoked with the blob data for the encoded file.
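As a rough check on the recorder settings (a 180-second limit and a bitRate of 160), the sketch below estimates how large the encoded blob can get. It assumes the bitrate is in kilobits per second and that the encoder holds it roughly constant. Vorbis encoding is usually variable-bitrate, so real files will differ. The function names are illustrative only.

```javascript
// Rough estimate of an encoded recording's size.
// Assumption: a roughly constant bitrate, given in kilobits per second.
function estimateSizeBytes(durationSeconds, bitRateKbps) {
  const bits = durationSeconds * bitRateKbps * 1000;
  return Math.ceil(bits / 8);
}

// Format a byte count as decimal megabytes with one decimal place.
function formatMegabytes(bytes) {
  return (bytes / 1e6).toFixed(1) + ' MB';
}

const maxBytes = estimateSizeBytes(180, 160);
console.log(formatMegabytes(maxBytes)); // 3.6 MB
```

An estimate like this is useful when deciding on upload progress feedback or a storage quota per user.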

We will use Cloud Storage for Firebase for persisting the file and making it publicly available via HTTP.

Firebase adds client SDKs for use in mobile apps, built on top of products like Google Cloud Storage.

The following scripts are required to use Cloud Storage:

    <script src="https://www.gstatic.com/firebasejs/5.8.1/firebase-app.js"></script>
    <script src="https://www.gstatic.com/firebasejs/5.8.1/firebase-storage.js"></script>

We then initialize Cloud Storage access using Firebase:

    var storageService = firebase.storage();
    var storageRef = storageService.ref();
    var metadata = {
      contentType: 'audio/ogg' // OGG MIME type
    };

We then use an uploadTask to track the progress of the file being uploaded to Cloud Storage:

    var uploadTask = storageRef
      .child('files/' + (new Date().toISOString()) + '.ogg')
      .put(blob, metadata);
    uploadTask.on(firebase.storage.TaskEvent.STATE_CHANGED, (snapshot) => {
      var progress = (snapshot.bytesTransferred / snapshot.totalBytes) * 100;
      console.log('Upload is ' + progress + '% done');
      switch (snapshot.state) {
        case firebase.storage.TaskState.PAUSED:
          console.log('Upload is paused');
          break;
        case firebase.storage.TaskState.RUNNING:
          console.log('Upload is running');
          break;
      }
    }, (error) => {
      ...
    }, () => {
      // Upload completed successfully
      ...
    });

Cloud Storage security rules are used to require Firebase Authentication in order to perform any read or write operations on all files.
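The path construction and the progress arithmetic in the upload code can be pulled out into small functions and checked without a Firebase project. This is only a sketch: recordingPath and uploadPercent are hypothetical names, and the snapshot argument is a plain object with the same two fields the upload snapshot exposes.

```javascript
// Build the object path used for a recording: files/<ISO timestamp>.ogg
function recordingPath(date) {
  return 'files/' + date.toISOString() + '.ogg';
}

// Convert a snapshot-like object into a whole-number percentage.
// Returns 0 when totalBytes is 0 to avoid dividing by zero.
function uploadPercent(snapshot) {
  if (!snapshot.totalBytes) return 0;
  return Math.round((snapshot.bytesTransferred / snapshot.totalBytes) * 100);
}

console.log(recordingPath(new Date(Date.UTC(2019, 1, 14, 12, 0, 0))));
// files/2019-02-14T12:00:00.000Z.ogg
console.log(uploadPercent({bytesTransferred: 512, totalBytes: 2048})); // 25
```

Because the file name is a timestamp, two recordings uploaded in the same millisecond would collide. Prefixing the path with the user's UID is one possible way to avoid that.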

Once the file has been uploaded, we need to retrieve the HTTP URL to access the file and then persist that in a database to track all the recordings.

We will use Cloud Firestore, which is a cloud-hosted, NoSQL, realtime database, for persisting the recording metadata.

We’ll add a document to a collection called “files” that contains the metadata for the audio uploaded to Cloud Storage. This requires loading the Cloud Firestore script:

    <script src="https://www.gstatic.com/firebasejs/5.8.1/firebase-firestore.js"></script>

For each recording we track the user ID, a durable HTTPS URL that anyone can use to download the contents of the file, and a timestamp:

    const db = firebase.firestore();
    uploadTask.snapshot.ref.getDownloadURL().then((downloadURL) => {
      console.log('File available at: ' + downloadURL);
      db.collection("files").add({
        user: user.uid,
        url: downloadURL,
        timestamp: firebase.firestore.FieldValue.serverTimestamp()
      })
      .then((docRef) => {
        console.log("Document written with ID: ", docRef.id);
      })
      .catch((error) => {
        console.error("Error adding document: ", error.message);
      });
    });

Notice that we’re adding the authenticated user’s UID as a field in the new document.

This informs the Firestore security rules for this project who is allowed to later modify and delete this document.
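The security rules themselves are not shown here. As an illustration of the kind of check they would need, the sketch below models an ownership test in plain JavaScript. It is not Firestore rules syntax, and canModifyRecording is a hypothetical name. It assumes the rule compares the document's user field with the UID of the signed-in caller.

```javascript
// Plain-JavaScript model of an ownership check (not Firestore rules syntax).
// `doc` is the stored document data; `authUid` is the caller's UID, or null
// when nobody is signed in.
function canModifyRecording(doc, authUid) {
  return typeof authUid === 'string' && authUid.length > 0 && doc.user === authUid;
}

const recording = {user: 'uid-alice', url: 'https://example.com/a.ogg'};
console.log(canModifyRecording(recording, 'uid-alice')); // true
console.log(canModifyRecording(recording, 'uid-bob'));   // false
console.log(canModifyRecording(recording, null));        // false
```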

So, that covers the main features of our web app.

Now we can move on to the design of our Action.

Assistant Action

Next, we’ll implement a conversational Action using Dialogflow.

When the user invokes the Action, the latest recording is played back to the user using the Actions on Google media player.

So, the Dialogflow agent for our Action is very simple in that it mostly just needs a main welcome intent.

The welcome intent handler responds with an SSML audio tag.

For fulfillment, we use the Dialogflow inline editor, which automatically provisions a Cloud Function for your agent.

The Node.js code for the function needs to use the Firebase Admin SDK, which provides access to Firebase and Google Cloud resources in server-side code.

We will use it to read the latest recording data from Cloud Firestore:

    const admin = require('firebase-admin');
    admin.initializeApp();
    const db = admin.firestore();

The intent handler then reads the data from Firestore and generates a response that uses the SSML audio tag to play the latest recording.

For this prototype, we will be using a simple Cloud Firestore query to determine the latest recording, but in a production-quality app you will need to track uploads and their status more granularly:

    app.intent('Default Welcome Intent', (conv) => {
      return db.collection('files').orderBy('timestamp', 'desc').limit(1).get()
        .then(snapshot => {
          if (snapshot.size > 0) {
            snapshot.forEach(doc => {
              conv.close(`<speak>
                <par>
                  <media xml:id="intro">
                    <speak>Welcome to the Audio Demo. Here's the latest recording:</speak>
                  </media>
                  <media xml:id="introSound" begin="intro.end+0.5s" soundLevel="5dB" fadeOutDur="1.0s">
                    <audio src="${INTRO_SOUND_URL}"/>
                  </media>
                  <media xml:id="recording" begin="introSound.end+0.5s">
                    <audio src="${doc.data().url.replace(/&/g, '&amp;')}"/>
                  </media>
                  <media xml:id="endSound" begin="recording.end+0.5s">
                    <audio src="${OUTRO_SOUND_URL}"/>
                  </media>
                  <media xml:id="bye" begin="endSound.end+1.0s">
                    <speak>Bye for now. Hope to see you soon.</speak>
                  </media>
                </par>
              </speak>`);
            });
          } else {
            conv.close('There are currently no recordings. Please try again later.');
          }
        })
        .catch(err => {
          console.log('Error getting documents', err);
          conv.close('Oops! Something went wrong. Please try again later.');
        });
    });

Note that the HTTP URL for the audio recording contains ‘&’ characters, which will clash with the SSML syntax and need to be encoded to ‘&amp;’.
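The escaping step can be checked in isolation. The sketch below is one way to do it, assuming the stored URL is raw (not already escaped). escapeForSsml and audioTag are hypothetical helper names, and the example URL is made up. Besides the ampersand, it also escapes the other characters that are special inside an XML attribute value.

```javascript
// Escape characters that are special in XML so a URL can be placed
// inside an SSML attribute value. The ampersand must be replaced first,
// otherwise the entities produced by the later steps would be escaped again.
function escapeForSsml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: wrap a recording URL in an SSML <audio> element.
function audioTag(url) {
  return `<audio src="${escapeForSsml(url)}"/>`;
}

console.log(audioTag('https://example.com/o/a.ogg?alt=media&token=abc'));
// <audio src="https://example.com/o/a.ogg?alt=media&amp;token=abc"/>
```

Passing an already-escaped URL through this function would escape it twice, so it should be applied exactly once, at the point where the SSML string is built.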

If you want to know more about the powerful capabilities of SSML, then read our previous post “Advanced SSML for Actions on Google”.

Next steps

Our web app and Action are quite simple, but could be expanded into various other use cases: maybe a CMS for podcasters, a voice social network, or even some kind of collaborative voice game?

We’ve shown technically how easy it is to let users record audio that can be used in Actions.

If you want to give users more control over the playback, you can update the fulfillment code to use the media player instead.

Now it’s up to you to take this code and turn it into something more interesting.

The code has been open sourced on GitHub for you to customize.

We can’t wait to see what you come up with!

Want more?

Head over to the Actions on Google community to discuss Actions with other developers.

Join the Actions on Google developer community program and you could earn a $200 monthly Google Cloud credit and an Assistant t-shirt when you publish your first app.
