Choosing the right TensorFlow loss for multi-label vs. multi-class tasks: a guide
If you are using TensorFlow and are confused by the dozens of loss functions on offer for multi-label and multi-class classification, here you go. In both cases, the targets should be encoded as a vector over the classes: a one-hot vector for multi-class (exactly one 1 per example) and a multi-hot vector for multi-label (any number of 1s per example).
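To make the distinction concrete, here is a tiny, made-up 4-class example of each target encoding:

# Hypothetical 4-class targets, purely for illustration
multi_class_targets = [[0, 1, 0, 0],   # exactly one class per example
                       [1, 0, 0, 0]]
multi_label_targets = [[1, 1, 0, 0],   # any number of classes per example
                       [0, 0, 0, 1]]

For multi-label classification, every class is an independent yes/no decision, so use sigmoid cross-entropy and sum over the classes: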
import tensorflow as tf

# logits and targets have shape [batch_size, num_classes]
cross_entropy = tf.nn.sigmoid_cross_entropy_with_logits(
    logits=logits, labels=tf.cast(targets, tf.float32)
)
loss = tf.reduce_mean(tf.reduce_sum(cross_entropy, axis=1))
prediction = tf.sigmoid(logits)
# threshold is a per-class decision threshold, e.g. 0.5
output = tf.cast(prediction > threshold, tf.int32)
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)
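The snippet above assumes that logits, targets, and threshold already exist. A minimal, hypothetical way to wire them up in a TF1 graph (the sizes are made up for illustration) would be:

num_features, num_classes = 20, 4                           # hypothetical sizes
inputs = tf.placeholder(tf.float32, [None, num_features])
targets = tf.placeholder(tf.int32, [None, num_classes])     # multi-hot rows
logits = tf.layers.dense(inputs, num_classes)                # raw scores, no activation
threshold = 0.5                                              # decision threshold per class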
For multi-class classification, where each example belongs to exactly one class, use softmax cross-entropy instead:

cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=logits, labels=one_hot_y
)
loss = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)
# argmax over the class axis gives the predicted / true class indices
predictions = tf.argmax(logits, axis=1, name="predictions")
y_true = tf.argmax(one_hot_y, axis=1, name="true_labels")
accuracy = tf.reduce_mean(tf.cast(tf.equal(predictions, y_true), tf.float32))
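As a quick, self-contained sanity check with made-up numbers, the op returns minus the log of the softmax probability assigned to the true class:

with tf.Session() as sess:
    example_logits = tf.constant([[2.0, 1.0, 0.1]])
    example_labels = tf.constant([[1.0, 0.0, 0.0]])
    ce = tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=example_logits, labels=example_labels
    )
    print(sess.run(ce))  # ~[0.417] for these made-up values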
# or, in the style of the classic MNIST tutorials (note that the non-_v2 op is
# deprecated in favor of softmax_cross_entropy_with_logits_v2):
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Y_)
cross_entropy = tf.reduce_mean(cross_entropy)
# accuracy of the trained model, between 0 (worst) and 1 (best)
Y = tf.nn.softmax(Ylogits)  # predicted class probabilities
correct_prediction = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# more detailed
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=logits, labels=tf.cast(targets, tf.float32)
)
loss = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss)
# softmax turns logits into class probabilities; argmax picks the most likely class
logit_soft = tf.nn.softmax(logits, name="prob")
predictions = tf.argmax(logit_soft, axis=1, name="predictions")
y_true = tf.argmax(targets, axis=1, name="ground_truth")
accuracy = tf.reduce_mean(tf.cast(tf.equal(predictions, y_true), tf.float32))
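For completeness, here is a hedged sketch of how the graph above might be driven in a session; the inputs placeholder, num_steps, and the next_batch data pipeline are hypothetical:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_steps):                # num_steps is hypothetical
        batch_x, batch_y = next_batch()          # hypothetical data pipeline
        _, batch_loss, batch_acc = sess.run(
            [optimizer, loss, accuracy],
            feed_dict={inputs: batch_x, targets: batch_y},
        )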
There may still be some doubt about the difference between tf.nn.softmax_cross_entropy_with_logits and tf.nn.softmax_cross_entropy_with_logits_v2. From Stack Exchange, here is a really clear explanation:
In supervised learning, one doesn't need to backpropagate to labels. They are considered fixed ground truth, and only the weights need to be adjusted to match them.
But in some cases, the labels themselves may come from a differentiable source, such as another network. One example might be adversarial learning. In this case, both networks might benefit from the error signal. That is why tf.nn.softmax_cross_entropy_with_logits_v2 was introduced.
Note that when the labels are placeholders (which is also typical), there is no difference whether the gradient flows through them or not, because there are no variables to apply the gradient to.
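As an illustration of that point, assume a hypothetical second network produces teacher_logits that serve as soft labels. With the _v2 op, the gradient would flow into those labels as well; wrapping them in tf.stop_gradient restores the old one-sided behaviour:

soft_labels = tf.nn.softmax(teacher_logits)      # teacher_logits is hypothetical
loss_v2 = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(
        logits=logits, labels=tf.stop_gradient(soft_labels)
    )
)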
So when you are dealing with simple multi-class classification, go with tf.nn.softmax_cross_entropy_with_logits_v2.