Here's an example of the result of an analysis performed by Foyer Insight:

"classifications": [
"confidence": 0.9996496438980103,
"name": "kitchen",
"rank": 1
"confidence": 0.9999802112579346,
"name": "indoor",
"rank": 2
"detections": [
"class": "dishwasher",
"area": 0.055518,
"boundingBox": [
"confidence": 0.8126423358917236,
"attributes": []
"class": "floor",
"area": 0.66851,
"boundingBox": [
"confidence": 0.7137435674667358,
"attributes": [{
"name": "floor_type",
"value": "hardwood",
"confidence": 1

Let's break down the sections we're seeing.


Each classification has three fields: confidence, name, and rank.

  • Confidence is how certain Foyer Insight is of the classification. This is a number between 0 and 1, which alternatively can be thought of as a percentage. In this case, Insight is almost certain we've provided an image of an indoor kitchen.

  • Name is the full name of the image's classification.

  • Rank is a number greater than zero. Classifications are ranked in order of highest to lowest confidence, with an exception for indoor and outdoor classifications. Indoor and outdoor classifications will always be ranked last.
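Putting the three fields together, picking out the most specific scene label is a matter of sorting by rank. Here is a minimal sketch in Python; the `classifications` list is the hypothetical response fragment from the example above, and would normally come from parsing Insight's JSON output.

```python
# Hypothetical fragment of an Insight response; field names match the docs above.
classifications = [
    {"confidence": 0.9996496438980103, "name": "kitchen", "rank": 1},
    {"confidence": 0.9999802112579346, "name": "indoor", "rank": 2},
]

# The lowest rank is the most specific scene label; indoor/outdoor
# classifications always sort to the end of the list.
top = min(classifications, key=lambda c: c["rank"])
print(f"{top['name']} ({top['confidence']:.2%} confident)")  # -> kitchen (99.96% confident)
```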


Each detection has five fields: area, attributes, boundingBox, class, and confidence. A detection is an individual object or group of similar objects that Insight found on the image.

  • Class is the full name of the detection's class.

  • Area is the percentage of pixels that a segmentation covers out of the entire image. For example, a floor segmentation with an area of 0.53 means that Insight has detected the floor covers 53% of the pixels in the image.

  • Attributes are extra details about the detection. The array may be empty, or it may contain pertinent information about the segmentation. For example, every floor detection includes a floor_type attribute.

  • BoundingBox is the smallest box that contains all pixels of a segmentation. It is made up of four numbers, which are really two (x, y) coordinate pairs representing the upper-left and bottom-right corners, respectively.

    • The coordinates assume the upper-left hand corner of the image as the origin point.

    • As with all of Insight's size-based outputs, the points are percentage-based. Multiply the x positions by your image's final width and the y positions by your image's final height to get the coordinates of a segmentation's bounding box.

    • For example, the values [0.3, 0.156, 0.564, 0.356] represent a point at (0.3*w, 0.156*h) and (0.564*w, 0.356*h).

  • Confidence is how certain Insight is of the class for a detection. As with a classification's confidence, it is a number between 0 and 1. In our example, Insight is about 81% certain that its detection of a dishwasher is correct.
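The percentage-to-pixel conversion described above can be sketched as a small helper. The bounding box values come from the example earlier in this section; the 1920x1080 image size is an assumption for illustration only.

```python
def bbox_to_pixels(box, width, height):
    """Convert a percentage-based boundingBox to pixel coordinates.

    `box` is [x1, y1, x2, y2]: the upper-left and bottom-right corners,
    with the origin at the image's upper-left corner. Multiply x values
    by the image width and y values by the image height.
    """
    x1, y1, x2, y2 = box
    return (
        (round(x1 * width), round(y1 * height)),  # upper-left corner
        (round(x2 * width), round(y2 * height)),  # bottom-right corner
    )

# Example box from the docs; image dimensions are assumed.
corners = bbox_to_pixels([0.3, 0.156, 0.564, 0.356], width=1920, height=1080)
print(corners)  # -> ((576, 168), (1083, 384))
```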