Loss Functions for Multi-label and Multi-class Classification

Choosing the right TensorFlow loss for multi-label vs. multi-class tasks: a guide

If you are using TensorFlow and confused by the dozens of loss functions available for multi-label and multi-class classification, here you go. In both cases, the classes should be one-hot encoded.
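As a quick illustration of what that encoding looks like (the label vectors below are made up for the example): multi-class targets have exactly one 1 per row, while multi-label targets may have several.

import numpy as np

# Multi-class: exactly one class per example (one-hot rows).
multi_class_targets = np.array([[0, 0, 1],    # example 1 -> class 2
                                [1, 0, 0]])   # example 2 -> class 0

# Multi-label: each example may belong to several classes at once (multi-hot rows).
multi_label_targets = np.array([[1, 0, 1],    # example 1 -> classes 0 and 2
                                [0, 1, 1]])   # example 2 -> classes 1 and 2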

For Multi-label classification

import tensorflow as tf

cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(
    logits=logits, labels=tf.cast(targets, tf.float32)
)
loss = tf.reduce_mean(tf.reduce_sum(cross_entropy, axis=1))
prediction = tf.sigmoid(logits)
output = tf.cast(prediction > threshold, tf.int32)
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)
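For context, here is a minimal, self-contained sketch of how the snippet above might be wired up and evaluated. The shapes, placeholder names, and sample values are assumptions made for illustration, not part of the original code, and the training op is omitted since there are no trainable variables in this toy graph.

import tensorflow as tf
import numpy as np

# Assumed setup: 3 possible labels, inputs fed through placeholders.
logits = tf.placeholder(tf.float32, shape=[None, 3], name="logits")
targets = tf.placeholder(tf.int32, shape=[None, 3], name="targets")
threshold = 0.5

cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(
    logits=logits, labels=tf.cast(targets, tf.float32)
)
loss = tf.reduce_mean(tf.reduce_sum(cross_entropy, axis=1))
prediction = tf.sigmoid(logits)
output = tf.cast(prediction > threshold, tf.int32)

with tf.Session() as sess:
    # Two examples; each row is a multi-hot target (several labels can be active).
    loss_val, preds = sess.run(
        [loss, output],
        feed_dict={
            logits: [[2.0, -1.0, 0.5], [-0.3, 1.2, -2.0]],
            targets: [[1, 0, 1], [0, 1, 0]],
        },
    )
    print(loss_val, preds)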

For Multi-class classification

cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=logits, labels=one_hot_y
)
loss = tf.reduce_mean(cross_entropy)

optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)


predictions = tf.argmax(logits, axis=1, name="predictions")
y_true = tf.argmax(true_labels, axis=1, name="tru_pre")
accuracy = tf.reduce_mean(tf.cast(tf.equal(predictions, y_true), tf.float32))


# or

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Y_)
cross_entropy = tf.reduce_mean(cross_entropy)

# softmax probabilities of the model, used for the accuracy check below
Y = tf.nn.softmax(Ylogits)

# accuracy of the trained model, between 0 (worst) and 1 (best)
correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


# more detailed

cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=logits, labels=tf.cast(targets, tf.float32)
)
loss = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)

logit_soft = tf.nn.softmax(logits, name="prob")
predictions = tf.argmax(logit_soft, axis=1, name="predictions")

y_true = tf.argmax(targets, axis=1, name="ground_truth")
accuracy = tf.reduce_mean(tf.cast(tf.equal(predictions, y_true), tf.float32))
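If your labels are plain integer class indices rather than one-hot vectors, TensorFlow also offers a sparse variant that skips the one-hot step. A small sketch, where int_labels is an assumed tensor of class ids with shape [batch_size]:

# labels are integer class ids (shape [batch_size]), not one-hot vectors
sparse_cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=int_labels, logits=logits
)
loss = tf.reduce_mean(sparse_cross_entropy)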

There is still some doubt between tf.nn.softmax_cross_entropy_with_logits and tf.nn.softmax_cross_entropy_with_logits_v2:

From Stack Exchange, here is a really clear explanation:

In supervised learning, one doesn’t need to backpropagate to labels. They are considered fixed ground truth and only the weights need to be adjusted to match them.

But in some cases, the labels themselves may come from a differentiable source, such as another network. One example might be adversarial learning. In this case, both networks might benefit from the error signal. That is the reason why tf.nn.softmax_cross_entropy_with_logits_v2 was introduced.

Note that when the labels are placeholders (which is also typical), there is no difference whether the gradient flows through them or not, because there are no variables to apply the gradient to.

So when you are dealing with simple multi-class classification, go with tf.nn.softmax_cross_entropy_with_logits_v2.
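And if your labels are produced by another network but you do not want gradients to flow into them, wrap the labels in tf.stop_gradient. A small sketch (the tensor name soft_labels is assumed for illustration):

# Block the gradient path into the labels; only the logits receive gradients.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=logits, labels=tf.stop_gradient(soft_labels)
)
loss = tf.reduce_mean(cross_entropy)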

Resources