Building a Chat Bot With Image Recognition and OCR

This flow is purely arbitrary and simplified for the purposes of this tutorial (i.e. don’t email me about how it can be improved).

Let’s try it on the following picture:

And the results (truncated):

description: "cuisine", score: 0.9247923493385315, confidence: 0.0, topicality: 0.9247923493385315, bounds: 0, locations: 0, properties: {}
description: "sushi", score: 0.9149415493011475, confidence: 0.0, topicality: 0.9149415493011475, bounds: 0, locations: 0, properties: {}
description: "food", score: 0.899940550327301, confidence: 0.0, topicality: 0.899940550327301, bounds: 0, locations: 0, properties: {}
description: "japanese cuisine", score: 0.8769422769546509, confidence: 0.0, topicality: 0.8769422769546509, bounds: 0, locations: 0, properties: {}

Since there were no landmarks or text, we received the labels the API was able to detect. In this case, we see that it has been identified as “sushi.” In my experience with the label detection results, the second label (the one with the second-highest topicality) tends to be how an average person would identify the picture.

Let’s give it another go on the following:

The output (again truncated):

description: "wildlife", score: 0.9749518036842346, confidence: 0.0, topicality: 0.9749518036842346, bounds: 0, locations: 0, properties: {}
description: "lion", score: 0.9627781510353088, confidence: 0.0, topicality: 0.9627781510353088, bounds: 0, locations: 0, properties: {}
description: "terrestrial animal", score: 0.9247941970825195, confidence: 0.0, topicality: 0.9247941970825195, bounds: 0, locations: 0, properties: {}

And there we see it, with “lion” being the second hit.

Ok, another for good measure. Let’s try some text extraction:

Just a screenshot of my text editor

And let's see what we get:

2.4.2 :022 > puts analyze_image("major_general.png")
I am the very model of a modern Major-General,
I've information vegetable, animal, and mineral,
I know the kings of England, and I quote the fights historical
From Marathon to Waterloo, in order categorical;
I'm very well acquainted, too, with matters mathematical,
I understand equations, both the simple and quadratical,
About binomial theorem I'm teeming with a lot o' news, (bothered for a rhyme)
With many cheerful facts about the square of the hypotenuse.
=> nil
2.4.2 :023 >

Not bad.

Ok, last one for completeness’ sake. Let’s try a landmark:

And our method gives us:

2.4.2 :030 > puts analyze_image("statue_of_liberty.jpg")
Statue of Liberty

Ok, so our method is working as intended. Now let’s actually use it with our chatbot.

When we send an image to our chatbot through our client (Line), the client returns the image data in the response body (along with other relevant information) to our callback. Because our image recognition method needs a file path, we will have to save that image data to our local machine.

Let’s modify our method to do that. Change the relevant parts of your callback method to the following:

There is a bit going on here. First, we are creating a new Tempfile and using the response body (the image data) as its content. We’re then passing the tempfile’s path to the analyze_image method we just tested in the console.

Let’s try it with our bot, just as a sanity check.

Such a nice bot…

And it was able to successfully identify a landmark for us.

Our bot is now just working as a glorified console print line, and that’s not very chatty at all. We want this thing to sound more natural, so let’s clean up our method a bit to make it sound more “human”.

In fact, it is.

Let’s make the necessary changes to our code. We’ll be modifying an existing method, analyze_image, and creating a new method, get_analyze_image_response. Here it is below:

Again, this is not a tutorial about Ruby, but rather about concepts; however, let’s go over what we’ve just done.
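To make the two ideas above a bit more concrete (saving the image data to a Tempfile, and turning the raw result into a friendlier reply), here is a minimal sketch. It is not the code from this tutorial: with_image_tempfile and build_reply are hypothetical helper names, the reply wording is made up, and checking landmarks before text before labels is only an assumption about the flow. It also doesn't call Line or the Vision API at all; it only shows the plumbing around them.

```ruby
require "tempfile"

# Hypothetical helper (not from the tutorial): writes raw image bytes to a
# temporary file and yields the file's path, so that a path-based method
# such as analyze_image could be called with it. The file is removed
# automatically when the block finishes.
def with_image_tempfile(image_bytes)
  Tempfile.create(["chatbot_image", ".jpg"]) do |file|
    file.binmode            # image data is binary, not text
    file.write(image_bytes)
    file.flush              # make sure the bytes are on disk before reading
    yield file.path
  end
end

# Hypothetical helper (not from the tutorial): turns detection results into
# a chattier sentence. The order landmark -> text -> labels is an assumption.
def build_reply(landmark: nil, text: nil, labels: [])
  return "That looks like #{landmark.strip}." unless landmark.to_s.strip.empty?
  return "I can read some text in that image:\n#{text.strip}" unless text.to_s.strip.empty?

  # Per the observation above, the second label is often the most "human"
  # description; fall back to the first label if there is only one.
  label = labels[1] || labels.first
  return "My best guess is: #{label}." if label

  "Sorry, I couldn't figure out what that is."
end

# Usage sketch with fake data; a real bot would pass the response body from
# the client and call analyze_image(path) inside the block.
reply = with_image_tempfile("fake image bytes") do |path|
  build_reply(labels: ["wildlife", "lion", "terrestrial animal"])
end
puts reply
```

Using the block form of Tempfile.create means we don't have to remember to delete the file ourselves, which is handy in a callback that may run many times.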
