Abstract:
This paper analyses a simple epsilon-greedy exploration strategy for training agents with the Deep Q-Learning algorithm, introducing randomness that prevents the agent from converging prematurely on a single solution. This allows the agent to keep exploring alternative solutions even after one has been found, helping it reach the global optimum rather than getting stuck in a local optimum. A simple block environment is built and used to assess the agent's ability to reach the destination, moving block A to block B. The model is trained repeatedly by feeding it the game image and rewarding it based on the decisions made; the weights of the Reinforcement Learning model's neural network are adjusted after every iteration to improve the result. Furthermore, two different environments from the Gym library in Python are used to corroborate the results obtained. TensorFlow is used to build and run the model on the GPU for faster, accelerated computation.
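The epsilon-greedy rule described above can be sketched as follows. This is a minimal illustration rather than the paper's actual implementation; the function name, arguments, and Q-value array are hypothetical placeholders:

```python
import numpy as np

def epsilon_greedy_action(q_values, epsilon, rng=None):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest Q-value (exploit)."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: uniform random action
    return int(np.argmax(q_values))              # exploit: greedy action

# Example: with epsilon = 0 the choice is always greedy.
q = np.array([0.1, 0.5, 0.2])
print(epsilon_greedy_action(q, epsilon=0.0))  # -> 1
```

In practice epsilon is typically annealed from a high value toward a small floor over training, so the agent explores widely at first and exploits its learned policy later.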