...

  • sensors: Camera (stereo or mono)

  • actuators: None

  • services: X

  • parameters (note that the following parameters are hard-coded at the top of the file object_detection_service.py):

    • Threshold: float, sets the confidence level threshold. Default: 0.7

    • DPI: int, sets the number of Detections Per Image. Default: 100

    • MODEL: str, path to the model file (.pkl). Default: model_final_f10217.pkl

    • MODEL_PATH: str, path to the model configuration file (.yaml). Default: 'COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml'
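
As a rough illustration of how these settings interact, the sketch below mirrors the four hard-coded constants and applies the confidence threshold and the detections-per-image cap to a list of scored detections. The constant name THRESHOLD, the helper filter_detections, and the (label, score) tuple layout are assumptions made for this example; the actual object_detection_service.py may organise this differently.

```python
# Hypothetical mirror of the constants described above, set to the documented defaults.
THRESHOLD = 0.7   # minimum confidence for a detection to be kept
DPI = 100         # maximum number of detections kept per image
MODEL = "model_final_f10217.pkl"
MODEL_PATH = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"


def filter_detections(detections, threshold=THRESHOLD, dpi=DPI):
    """Keep detections scoring at least `threshold`, best first, at most `dpi` of them.

    `detections` is an illustrative list of (label, score) tuples.
    """
    kept = [d for d in detections if d[1] >= threshold]
    kept.sort(key=lambda d: d[1], reverse=True)
    return kept[:dpi]


if __name__ == "__main__":
    sample = [("cup", 0.91), ("chair", 0.42), ("person", 0.78)]
    print(filter_detections(sample))
```

Raising the threshold trades recall for precision, while lowering DPI bounds the number of masks that have to be serialized per image.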

...

Protobuf output

The Protobuf output carries the segmentation masks. The Protobuf is built as follows:

'source': str

...

{'intent': '[YOUR_INTENT]', 
'parameters': 
    {'[YOUR_PARAMETER]': '[PARAMETER_RESPONSE]'}, 
'confidence': [CONFIDENCE_VALUE], 
'text': '[RESPONSE_TEXT]', 
'source': 'audio'}

...

'intent': str

  • the intent recognised from the audio, corresponding to an intent defined on the agent

...

'parameters': dict

  • the parameters defined in the agent

  • each parameter is a str key paired with its response as a str value

...

'confidence': int

  • a number from 0 to 100 indicating how confident the API is in the intent and text detection

...

'text': str

  • speech-to-text response from the API
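
The field descriptions above can be turned into a small sanity check on a received result. This is only a sketch: validate_detection_result is a hypothetical helper, and the intent and parameter names in the usage example are placeholders, not values defined by the service.

```python
def validate_detection_result(result):
    """Return a list of problems found in a detection result dict (empty if none)."""
    expected_types = {
        "intent": str,
        "parameters": dict,
        "confidence": int,
        "text": str,
        "source": str,
    }
    problems = []
    for key, expected in expected_types.items():
        if key not in result:
            problems.append(f"missing key: {key}")
        elif not isinstance(result[key], expected):
            problems.append(f"{key} should be {expected.__name__}")

    # The confidence is documented as a number between 0 and 100.
    confidence = result.get("confidence")
    if isinstance(confidence, int) and not 0 <= confidence <= 100:
        problems.append("confidence should be between 0 and 100")

    # Each parameter is documented as a str key with a str value.
    parameters = result.get("parameters")
    if isinstance(parameters, dict):
        for name, value in parameters.items():
            if not (isinstance(name, str) and isinstance(value, str)):
                problems.append(f"parameter {name!r} should map str to str")
    return problems


if __name__ == "__main__":
    example = {
        "intent": "answer_name",            # placeholder intent
        "parameters": {"name": "Alice"},    # placeholder parameter
        "confidence": 87,
        "text": "my name is Alice",
        "source": "audio",
    }
    print(validate_detection_result(example))
```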

image_masks = ImageMasks()
image_masks.timestamp_ms  # timestamp of the image in milliseconds
image_masks.mask_width  # width of each mask in pixels
image_masks.mask_height  # height of each mask in pixels
image_masks.mask_count  # number of detected objects
image_masks.masks  # flat Python list of booleans holding all masks

Such a Protobuf object can be 'unpacked' to obtain the original masks again:

import numpy as np

original_masks = np.array(image_masks.masks).reshape((image_masks.mask_count, image_masks.mask_height, image_masks.mask_width))
original_masks = original_masks.astype(bool)

The shape of original_masks is (N, H, W), where N is the number of masks, H the height in pixels, and W the width in pixels.
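
To make the flat layout concrete, here is a dependency-free sketch of the same round trip. FlatMasks is a hypothetical stand-in for the generated ImageMasks message (only the fields listed above are modelled), and pack_masks and unpack_masks are illustrative helpers, not part of the service. It assumes the masks are flattened in row-major order, which is what the reshape call above implies.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FlatMasks:
    """Hypothetical stand-in for the ImageMasks Protobuf message."""
    timestamp_ms: int = 0
    mask_width: int = 0
    mask_height: int = 0
    mask_count: int = 0
    masks: List[bool] = field(default_factory=list)


def pack_masks(masks, timestamp_ms):
    """Flatten N masks of shape (H, W) into one row-major list of booleans."""
    count = len(masks)
    height = len(masks[0]) if count else 0
    width = len(masks[0][0]) if height else 0
    flat = [bool(value) for mask in masks for row in mask for value in row]
    return FlatMasks(timestamp_ms, width, height, count, flat)


def unpack_masks(msg):
    """Rebuild the nested (N, H, W) structure from the flat list."""
    restored = []
    for n in range(msg.mask_count):
        mask = []
        for y in range(msg.mask_height):
            # Pixel (n, y, x) lives at index (n * H + y) * W + x.
            start = (n * msg.mask_height + y) * msg.mask_width
            mask.append(msg.masks[start:start + msg.mask_width])
        restored.append(mask)
    return restored


if __name__ == "__main__":
    two_masks = [
        [[True, False, True], [False, False, True]],
        [[False, True, False], [True, True, False]],
    ]
    message = pack_masks(two_masks, timestamp_ms=1234)
    print(unpack_masks(message) == two_masks)
```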

This Protobuf output is added to the zrange of the segmentation_stream as a serialized Protobuf object. A zrange here refers to a Redis sorted set, which behaves loosely like a Python dictionary kept in key order: the timestamp_ms is used as the key (the score of the entry) and the serialized Protobuf is the value.
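
The timestamp-keyed lookup can be pictured with a small in-memory stand-in. This sketch does not talk to Redis: TimestampedStore and its methods are hypothetical names that only mimic the idea of entries ordered by timestamp_ms, with serialized messages represented as plain bytes.

```python
import bisect


class TimestampedStore:
    """In-memory stand-in for a stream ordered by timestamp (not a Redis client)."""

    def __init__(self):
        self._timestamps = []   # sorted list of timestamp_ms keys
        self._payloads = {}     # timestamp_ms -> serialized message bytes

    def add(self, timestamp_ms, payload):
        """Insert a payload, or overwrite the one already stored for this timestamp."""
        if timestamp_ms not in self._payloads:
            bisect.insort(self._timestamps, timestamp_ms)
        self._payloads[timestamp_ms] = payload

    def range_by_timestamp(self, start_ms, end_ms):
        """Return (timestamp_ms, payload) pairs with start_ms <= timestamp_ms <= end_ms."""
        low = bisect.bisect_left(self._timestamps, start_ms)
        high = bisect.bisect_right(self._timestamps, end_ms)
        return [(t, self._payloads[t]) for t in self._timestamps[low:high]]

    def latest(self):
        """Return the newest (timestamp_ms, payload) pair, or None if empty."""
        if not self._timestamps:
            return None
        newest = self._timestamps[-1]
        return newest, self._payloads[newest]


if __name__ == "__main__":
    store = TimestampedStore()
    store.add(200, b"second")
    store.add(100, b"first")
    print(store.range_by_timestamp(0, 150))
    print(store.latest())
```

Keying by timestamp makes it straightforward to match a mask message to the camera frame it was computed from.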

Initialisation

Using the service

To use the service, create an instance of the BasicSICConnector class. You can find the details of this class here. You may also want a class that manages speech_recognition attempts, and a callback function that retrieves a recognized entity object from the detection result.

...

  1. You have the relevant services and drivers running.

  2. You pass your local IP address, Dialogflow key file path, and Dialogflow agent ID when creating an instance of BasicSICConnector, and pass that BasicSICConnector instance when creating an instance of ActionRunner.

  3. A partial callback function is set up for retrieving a recognized entity object from the detection result.
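
The partial callback from the last step could look roughly like the sketch below. Everything here is illustrative: RecognitionManager, on_intent, the intent name "answer_name", and the parameter name "name" are hypothetical, and the detection result is assumed to be a dict shaped like the response example, or None when nothing was recognised. How the callback is actually registered with the framework is not shown.

```python
from functools import partial


class RecognitionManager:
    """Hypothetical helper that tracks speech recognition attempts."""

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.attempts = 0
        self.entity = None

    def can_retry(self):
        """True while nothing was recognised and attempts remain."""
        return self.entity is None and self.attempts < self.max_attempts


def on_intent(manager, intent_name, parameter_name, detection_result):
    """Store the requested parameter if the expected intent was recognised."""
    manager.attempts += 1
    if not detection_result or detection_result.get("intent") != intent_name:
        return
    value = detection_result.get("parameters", {}).get(parameter_name)
    if value:
        manager.entity = value


manager = RecognitionManager(max_attempts=2)
# Bind everything except the detection result, which the caller supplies later.
callback = partial(on_intent, manager, "answer_name", "name")
```

Using functools.partial keeps the callback signature down to the single detection-result argument while still giving it access to the manager and the expected intent.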

...

  • onAudioIntent

    • a new intent is detected

  • IntentDetectionDone

    • a new intent has finished being detected

  • onAudioLanguage

    • the audio language has been changed

  • LoadAudioDone

    • if an audio file is used, the event is raised when the file has finished being loaded
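
One way to react to these events is to route them by name. The dispatcher below is a generic sketch, not the framework's own mechanism: EventDispatcher, subscribe, and dispatch are hypothetical names, and events are assumed to arrive as plain strings matching the names listed above.

```python
KNOWN_EVENTS = (
    "onAudioIntent",
    "IntentDetectionDone",
    "onAudioLanguage",
    "LoadAudioDone",
)


class EventDispatcher:
    """Hypothetical router that calls the handlers registered for an event name."""

    def __init__(self):
        self._handlers = {name: [] for name in KNOWN_EVENTS}

    def subscribe(self, event_name, handler):
        """Register `handler` for one of the known event names."""
        if event_name not in self._handlers:
            raise ValueError(f"unknown event: {event_name}")
        self._handlers[event_name].append(handler)

    def dispatch(self, event_name):
        """Call every handler for `event_name`; return how many were called."""
        handlers = self._handlers.get(event_name, [])
        for handler in handlers:
            handler(event_name)
        return len(handlers)


if __name__ == "__main__":
    dispatcher = EventDispatcher()
    dispatcher.subscribe("IntentDetectionDone", lambda name: print("received", name))
    dispatcher.dispatch("IntentDetectionDone")
```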

Known Issues

  • There is a rare bug where Dialogflow suddenly responds only with ‘UNAUTHENTICATED’ errors. Restarting Docker and/or your entire machine seems to be the only way to resolve this.