Suggested solution for mandatory assignment 2

You may look at the suggested solution here. Most of you had the implementation correct, or almost correct. Unfortunately, some of you missed the correction to the assignment about negating the policy loss. Without the negation, the optimizer minimizes the return instead of maximizing it, so not surprisingly you did not get the results you wanted! A few people used NumPy operations when calculating the losses. Unfortunately this does not work, because TensorFlow cannot track NumPy operations and is therefore unable to calculate the gradients of the loss.
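To illustrate why the sign of the policy loss matters, here is a minimal sketch. It is not taken from the suggested solution: it uses plain Python instead of TensorFlow, a made-up two-action problem with fixed rewards, and the exact expected return rather than sampled episodes. The function names (`softmax`, `return_gradient`, `train`) and all numbers are assumptions chosen only for this illustration.

```python
import math


def softmax(logits):
    """Turn a list of logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_return(logits, rewards):
    """Expected return J = sum_a pi(a) * r(a) under the softmax policy."""
    probs = softmax(logits)
    return sum(p * r for p, r in zip(probs, rewards))


def return_gradient(logits, rewards):
    """Gradient of J with respect to the logits: dJ/dtheta_k = pi_k * (r_k - J)."""
    probs = softmax(logits)
    j = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - j) for p, r in zip(probs, rewards)]


def train(negate_loss, steps=200, lr=0.5):
    """Run gradient descent on the policy loss and return the final expected return.

    With negate_loss=True the loss is -J (the corrected assignment);
    with negate_loss=False the loss is +J (the uncorrected version).
    """
    rewards = [0.0, 1.0]  # hypothetical rewards: action 1 is the better one
    logits = [0.0, 0.0]   # start from a uniform policy
    sign = -1.0 if negate_loss else 1.0
    for _ in range(steps):
        grad_j = return_gradient(logits, rewards)
        # d(loss)/d(theta) = sign * dJ/d(theta); take one gradient descent step.
        logits = [t - lr * sign * g for t, g in zip(logits, grad_j)]
    return expected_return(logits, rewards)


if __name__ == "__main__":
    print("loss = -J:", train(negate_loss=True))
    print("loss = +J:", train(negate_loss=False))
```

Both runs use the same gradient descent update. With the negated loss the policy moves toward the action with the higher reward. Without the negation it moves toward the worse action, which is the behaviour described above.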

Published Nov. 9, 2019 14:35 - Last modified Nov. 9, 2019 14:37