Solved: tensorflow deeplab resnet - Error on evaluate.py (TensorFlow 1.3)

Hi,

I can train the model without problems, but I get the following errors when I run evaluate.py. I am using TensorFlow 1.3:

Restored model parameters from data/model.ckpt-900
2017-07-31 04:54:58.479363: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: assertion failed: [`labels` out of bound] [Condition x < y did not hold element-wise:x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_2:0) = ] [2]
	 [[Node: mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch/_1111, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_1/_1113, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_3, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_2/_1115)]]
2017-07-31 04:54:58.479431: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: assertion failed: [`labels` out of bound] [Condition x < y did not hold element-wise:x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_2:0) = ] [2]
	 [[Node: mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch/_1111, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_1/_1113, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_3, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_2/_1115)]]
Traceback (most recent call last):
  File "evaluate.py", line 127, in <module>
    main()
  File "evaluate.py", line 119, in main
    preds, _ = sess.run([pred, update_op])
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1118, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1315, in _do_run
    options, run_metadata)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [`labels` out of bound] [Condition x < y did not hold element-wise:x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_2:0) = ] [2]
	 [[Node: mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch/_1111, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_1/_1113, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_3, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_2/_1115)]]

Caused by op u'mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert', defined at:
  File "evaluate.py", line 127, in <module>
    main()
  File "evaluate.py", line 98, in main
    mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(pred, gt, num_classes=args.num_classes, weights=weights)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/contrib/metrics/python/ops/metric_ops.py", line 2245, in streaming_mean_iou
    updates_collections=updates_collections, name=name)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/metrics_impl.py", line 915, in mean_iou
    num_classes, weights)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/metrics_impl.py", line 285, in _streaming_confusion_matrix
    labels, predictions, num_classes, weights=weights, dtype=cm_dtype)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/confusion_matrix.py", line 176, in confusion_matrix
    labels, num_classes_int64, message='`labels` out of bound')],
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/check_ops.py", line 401, in assert_less
    return control_flow_ops.Assert(condition, data, summarize=summarize)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py", line 175, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 132, in Assert
    condition, no_op, true_assert, name="AssertGuard")
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 296, in new_func
    return func(*args, **kwargs)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1838, in cond
    orig_res_f, res_f = context_f.BuildCondBranch(false_fn)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1704, in BuildCondBranch
    original_result = fn()
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 130, in true_assert
    condition, data, summarize, name="Assert")
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_logging_ops.py", line 37, in _assert
    summarize=summarize, name=name)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2619, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/hesam/Downloads/TF/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1205, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): assertion failed: [`labels` out of bound] [Condition x < y did not hold element-wise:x (mean_iou/confusion_matrix/control_dependency:0) = ] [0 0 0...] [y (mean_iou/ToInt64_2:0) = ] [2]
	 [[Node: mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert = Assert[T=[DT_STRING, DT_STRING, DT_INT64, DT_STRING, DT_INT64], summarize=3, _device="/job:localhost/replica:0/task:0/cpu:0"](mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch/_1111, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_0, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_1, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_1/_1113, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/data_3, mean_iou/confusion_matrix/assert_less/Assert/AssertGuard/Assert/Switch_2/_1115)]]
15 Answers

✔️Accepted Answer

Thanks for the info, @DrSleep! I had the same problem; after applying your tip, evaluate.py works fine. These are the changes I made in evaluate.py:

I've replaced lines 97 and 98:

weights = tf.cast(tf.less_equal(gt, args.num_classes - 1), tf.int32) # Ignoring all labels greater than or equal to n_classes.
mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(pred, gt, num_classes=args.num_classes, weights=weights)

with:

indices = tf.squeeze(tf.where(tf.less_equal(gt, args.num_classes - 1)), 1)  # ignore all labels >= num_classes
gt = tf.cast(tf.gather(gt, indices), tf.int32)
pred = tf.gather(pred, indices)
mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(pred, gt, num_classes=args.num_classes)

I hope this comment will be useful!
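
For readers who want to see what the filtering step does to the metric, here is a minimal pure-Python sketch (no TensorFlow) of the same idea: drop every pixel whose label is >= num_classes, then compute mean IoU from the confusion matrix of what remains. The function name `mean_iou_ignoring` and the toy data (with 255 as a void label) are assumptions made for illustration; they are not part of evaluate.py.

```python
def mean_iou_ignoring(labels, preds, num_classes):
    """Mean IoU over pixels whose label is a valid class id (< num_classes)."""
    # Keep only positions with an in-range label, mirroring tf.where + tf.gather.
    kept = [(l, p) for l, p in zip(labels, preds) if l <= num_classes - 1]

    # confusion[l][p] counts pixels with true class l predicted as class p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for l, p in kept:
        confusion[l][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(float(tp) / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy data: 255 is a hypothetical "void" label that must not be scored.
labels = [0, 0, 1, 1, 255, 255]
preds = [0, 1, 1, 1, 0, 1]
result = mean_iou_ignoring(labels, preds, num_classes=2)
print(result)
```

The two pixels labelled 255 never reach the confusion matrix, so they neither trigger a bounds error nor affect the score.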

Other Answers:

The year is 2018 and yet the problem still exists, even in the latest DeepLab version in tensorflow/models. Thanks to @amlarraz, I managed to overcome this issue.

It looks like an explicit assertion was added in TF 1.3 when computing mIoU; it checks that every ground-truth label is less than the number of classes.
It can be worked around using the same strategy as in the training scripts:

pred = tf.reshape(pred, [-1,])
gt = tf.reshape(label_batch, [-1,])
indices = tf.squeeze(tf.where(tf.less_equal(gt, args.num_classes - 1)), 1)  # ignore all labels >= num_classes
gt = tf.cast(tf.gather(gt, indices), tf.int32)
pred = tf.gather(pred, indices)
mIoU, update_op = tf.contrib.metrics.streaming_mean_iou(pred, gt, num_classes=args.num_classes)
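
The traceback in the question points at the `assert_less(labels, num_classes_int64, ...)` call inside `confusion_matrix`, and the original evaluate.py code hit it even though it passed zero weights for the out-of-range pixels. The sketch below is a pure-Python stand-in (not TensorFlow's code) that mimics that behaviour under the assumption that the bounds check looks at every label before weights are applied. The helper `checked_confusion_matrix` and the sample label 255 are hypothetical.

```python
def checked_confusion_matrix(labels, preds, num_classes, weights=None):
    """Toy confusion matrix that validates labels before using weights."""
    if weights is None:
        weights = [1] * len(labels)
    # Bounds check on ALL labels, independent of their weights.
    if any(l >= num_classes for l in labels):
        raise ValueError("`labels` out of bound")
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for l, p, w in zip(labels, preds, weights):
        matrix[l][p] += w
    return matrix


num_classes = 2
labels = [0, 1, 255, 1]  # 255: hypothetical ignore label
preds = [0, 1, 1, 0]

# Approach 1: zero weight for out-of-range labels. The check still fails.
weights = [1 if l <= num_classes - 1 else 0 for l in labels]
try:
    checked_confusion_matrix(labels, preds, num_classes, weights)
    weighted_ok = True
except ValueError:
    weighted_ok = False

# Approach 2: drop those positions entirely. The check passes.
keep = [i for i, l in enumerate(labels) if l <= num_classes - 1]
filtered = checked_confusion_matrix(
    [labels[i] for i in keep], [preds[i] for i in keep], num_classes)

print(weighted_ok, filtered)
```

In this model, masking by weight leaves the offending label in the input, while gathering by index removes it before the check runs, which matches why the `tf.gather` variant gets past the assertion.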
