deeplizard

Code for Reinforcement Learning course

Full access to code files and resources from the Reinforcement Learning course.
Find out more about this code at https://deeplizard.com/resources

Comments

Ok, I see. Yes, as others mention in the GitHub link, batchnorm is not appropriate with a batch size of 1 during training. However, note that the line you mentioned runs when we call agent.select_action():

    policy_net(state).argmax(dim=1).to(self.device) # exploit

This call is not occurring during training. It is occurring during inference, and it is fine to have a batch size of 1 during inference, even when batchnorm is present in the network. During training, we use a larger batch size by sampling from replay memory, and batchnorm would be appropriate to experiment with there.

The error occurs when we call agent.select_action() because the batchnorm layers are still in "training mode," so the call fails when we pass a batch of 1, even though we're only attempting to use the model for inference. To turn training mode off, we need to set the model to eval mode.

Code summary to implement this change: I added batchnorm layers to the existing model, as shown below.

    class DQN(nn.Module):
        def __init__(self, img_height, img_width):
            super().__init__()
            self.fc1 = nn.Linear(in_features=img_height*img_width*3, out_features=24)
            self.bn1 = nn.BatchNorm1d(24)
            self.fc2 = nn.Linear(in_features=24, out_features=32)
            self.bn2 = nn.BatchNorm1d(32)
            self.out = nn.Linear(in_features=32, out_features=2)

        def forward(self, t):
            t = t.flatten(start_dim=1)
            t = self.bn1(F.relu(self.fc1(t)))
            t = self.bn2(F.relu(self.fc2(t)))
            t = self.out(t)
            return t

Then, in the select_action() function of our Agent class, I modified the else block, also shown below, to put the model in eval mode before passing a single state to the network, and to set it back to training mode afterwards. That way, the batchnorm layers continue to be trained during the actual training process that occurs after the call to select_action() in our main program.

    else:
        with torch.no_grad():
            policy_net.eval()
            action = policy_net(state).argmax(dim=1).to(self.device) # exploit
            policy_net.train()
            return action
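To see the training-mode versus eval-mode behavior in isolation, here is a minimal sketch. It assumes a small stand-in network (`net`, with made-up layer sizes) rather than the course's DQN class, and it only illustrates the point made above: a batch of 1 fails while batchnorm is in training mode and works in eval mode.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the policy network (layer sizes are arbitrary).
net = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.Linear(8, 2))
state = torch.rand(1, 4)  # a single state, i.e. a batch of size 1

# Training mode: BatchNorm1d cannot compute batch statistics from one sample.
net.train()
try:
    net(state)
    failed_in_train_mode = False
except ValueError:
    failed_in_train_mode = True

# Eval mode: the layer uses its running statistics, so one sample is fine.
net.eval()
with torch.no_grad():
    action = net(state).argmax(dim=1)
net.train()  # restore training mode before the next optimization step
```

Switching back with `net.train()` matters because the optimization step that follows still needs the batchnorm layers to update their statistics.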

It was an error. I don't have the code right here, but I can generate it this evening. However, it was the same issue as this: https://github.com/pytorch/pytorch/issues/4534

Does it give you an error when you pass a sample in to this model, or are you just not seeing good performance with it? If it gives an error, what does it say?

I'm confused again. In the DQN example, this line

    return policy_net(state).argmax(dim=1).to(self.device) # exploit

passes one state into the policy_net. After watching https://www.youtube.com/watch?v=bCQ2cNhUWQ8&list=PLZbbT5o_s2xrfNyHZsM6ufI0iZENK9xgG&index=40&t=88s, I thought I'd change the architecture to

    # model
    class DQN(nn.Module):
        def __init__(self, n_inputs, n_action, hidden_dim=24):
            super().__init__()
            self.fc1 = nn.Linear(n_inputs, out_features=hidden_dim)
            self.bn1 = nn.BatchNorm1d(hidden_dim)
            self.fc2 = nn.Linear(in_features=24, out_features=32)
            self.bn2 = nn.BatchNorm1d(32)
            self.out = nn.Linear(in_features=32, out_features=n_action)

        def forward(self, t):
            t = self.bn1(F.relu(self.fc1(t)))
            t = self.bn2(F.relu(self.fc2(t)))
            t = self.out(t)
            return t

However, that does not work when passing just 1 sample in. Is BN just not a good candidate in this case?

Yes, you would just gauge the performance across your different models to see which model yields the best results. Mean reward over n episodes would be a good metric to use. There is no test set equivalent in this scenario.
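As a rough illustration of comparing models by mean reward over n episodes, here is a minimal sketch. The helper `mean_episode_reward` and the two `run_model_*` functions are hypothetical stand-ins: in practice each would play one full episode with a given model and return that episode's total reward.

```python
import random
from statistics import mean

def mean_episode_reward(run_episode, n_episodes=100):
    """Average the total reward returned by run_episode over n_episodes."""
    return mean(run_episode() for _ in range(n_episodes))

# Hypothetical stand-ins for "play one episode with model A / model B".
rng = random.Random(0)

def run_model_a():
    return rng.randint(20, 40)   # pretend total reward for one episode

def run_model_b():
    return rng.randint(60, 80)

score_a = mean_episode_reward(run_model_a, n_episodes=100)
score_b = mean_episode_reward(run_model_b, n_episodes=100)
best = "B" if score_b > score_a else "A"
```

Because single episodes are noisy, it helps to use the same number of episodes for every model being compared.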

In general, if I change the model and want to see if I'm improving, is there an equivalent of a test set, and what metric is generally used? The mean of the reward over n episodes?

Yes, this is the render problem I was previously referring to. There appear to be some "hacks" that people have attempted to get around this when using OpenAI's Gym in a Colab notebook, but unfortunately there doesn't appear to be any straightforward solution for this.

Here's the error. Is that the same gym error you're referring to?

    NoSuchDisplayException                    Traceback (most recent call last)
    in ()
          2 em = CartPoleEnvManager(device)
          3 em.reset()
    ----> 4 screen = em.render('rgb_array')
          5
          6 plt.figure()

    9 frames
    /usr/local/lib/python3.6/dist-packages/pyglet/canvas/xlib.py in __init__(self, name, x_screen)
        121         self._display = xlib.XOpenDisplay(name)
        122         if not self._display:
    --> 123             raise NoSuchDisplayException('Cannot connect to "%s"' % name)
        124
        125         screen_count = xlib.XScreenCount(self._display)

    NoSuchDisplayException: Cannot connect to "None"

Hey Matthew - What errors are you getting and on which line(s) of code? There are general issues (not specific to this code) with attempting to render a Gym environment in a Colab notebook. This is because Gym attempts to render the environment to a pop-up window on the local server where the code is executing, not the browser where you are running the notebook. Is this the issue you're running into?

I tried running the notebook for the deep q network in colab but it gave errors. Has it been run recently? If it does work, is it possible to make it a public colab notebook?

Hey Ten5ei - You're welcome! The in_features parameter in a nn.Linear layer requires the size of *each* input sample. Therefore, we can pass a batch of data to the network, and the network understands that within that batch, the shape of each sample will be em.height*em.width*3.
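The point about in_features describing each sample, not the whole batch, can be checked with a small sketch. The screen size used here (40 by 90) is an assumption for illustration only, not the size the environment manager actually produces.

```python
import torch
import torch.nn as nn

height, width = 40, 90              # hypothetical screen size
in_features = height * width * 3    # size of ONE flattened sample
fc1 = nn.Linear(in_features=in_features, out_features=24)

# flatten(start_dim=1) keeps the batch dimension and flattens the rest.
single = torch.rand(1, 3, height, width).flatten(start_dim=1)
batch = torch.rand(256, 3, height, width).flatten(start_dim=1)

out_single = fc1(single)   # one row of 24 outputs
out_batch = fc1(batch)     # 256 rows of 24 outputs, one per state
```

The same layer handles both inputs because the first dimension is treated as the batch dimension and the layer is applied to each row independently.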

@deeplizard thank you, that helped me a lot! One more question, though, about calculating Q-values. When using the "get_current" method, where a specified batch of states is passed into the network, does the network take in the whole batch of states (256 states) at once and give a prediction? If so, how does the network adjust its inputs? The initial input size was set to (em.height * em.width * 3), but since "get_current" passes in a batch of states, the input seems to be (256 * em.height * em.width * 3), which is more than the expected input of (em.height * em.width * 3).

Hey Ten5ei - In an earlier episode of the RL course, we discuss passing a stack of multiple frames from an environment (the game Breakout) as an example, as opposed to one single image. This is to allow the agent to take into account the movement that is going on in the environment, which cannot be seen from one single frame. When we develop the code for the cart and pole environment, we use two frames to represent a single state. We take the difference between these two frames and compose one single image, which gets passed to the network. This single image will show information about the velocity of the pole and in which direction it is moving. This is all explained in this episode: https://deeplizard.com/learn/video/jkdXDinWfo8 In the blog, you can read about it starting in the section called Getting The State Of The Environment, and reading further on you will see examples of what a single state (the difference between the last two frames) looks like before being passed to the network.
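The frame-difference idea can be sketched as follows. The helper name `get_state` and the tiny 4 by 6 "screens" are made up for illustration and are not the course's environment manager code.

```python
import torch

def get_state(previous_screen, current_screen):
    """Represent a state as the difference between two consecutive frames."""
    return current_screen - previous_screen

# Tiny made-up frames: a bright vertical "pole" that moves one column right.
prev = torch.zeros(1, 3, 4, 6)
prev[..., 2] = 1.0
curr = torch.zeros(1, 3, 4, 6)
curr[..., 3] = 1.0

state = get_state(prev, curr)   # -1 where the pole was, +1 where it is now
still = get_state(curr, curr)   # all zeros: nothing moved between frames
```

The sign pattern in `state` encodes the direction of motion, which is information a single frame cannot carry.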

Hi, I have a question regarding the deep reinforcement learning tutorial. In the video, the network takes 3 frames from the current environment, which are then used to predict the output. However, I was looking through the code trying to figure out where exactly the 3 frames are passed, and I was quite confused, as the network seems to take only 1 frame.

Hey Yichen - In the video, the OpenAI Gym cart and pole environment is rendered in a separate pop-up window, not actually within the Jupyter notebook. I have not tried to render a Gym environment within a notebook before, but I just googled for it, and it appears there are some answers at the link below that may be helpful for you. https://stackoverflow.com/questions/50107530/how-to-render-openai-gym-in-google-colab

Hi, I'm trying to run the notebook in Google Colab, but the display is not working. Inside the plot function, plot.figure(2) sets up two figures to plot, but only the moving average was plotted; the CartPole image was not. How do you plot the figures side by side in the video?

Hi Deeplizard! Thanks for the great work! Can't wait for your next videos on RL & DQN!

Can I please get them? And also, are there any plans for more tutorials? It would be really good if you could focus on robotics.

Hey guys, it looks like DQN Cart Pole is missing some parts. The code stops at the Agent class, and there is no code for the last 2 videos of the series (17 & 18).

Hey Wassim - I myself haven't spent enough time on model-free RL implementation to have any solid resources I can vouch for. If you end up implementing this for Frozen Lake, I would love to hear how it goes.

If I want to implement Frozen Lake *without* a model, what would you recommend the states and the actions to be? Can you point me to some RL examples that are model-free?

If you look at our Frozen Lake example, we can see that our agent is learning how to perform a specific task (reaching the frisbee) without using any explicit instructions. This task was accomplished using value iteration. Therefore, I'd say that this is indeed a machine learning problem.
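For readers who want to see what iteratively updating values looks like in code, here is a generic sketch of value iteration on a hypothetical 4-state corridor. It is not the course's Frozen Lake code; the environment, the reward of 1 for reaching the goal, and the discount of 0.9 are all assumptions chosen to keep the example small.

```python
GAMMA = 0.9
N_STATES = 4          # states 0..3; state 3 is the terminal goal
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    """Deterministic move; reward 1.0 only when the goal is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def action_value(state, action, values):
    """One-step lookahead: immediate reward plus discounted next value."""
    next_state, reward = step(state, action)
    return reward + GAMMA * values[next_state]

values = [0.0] * N_STATES
for _ in range(50):                      # repeated Bellman updates
    new_values = values[:]
    for s in range(N_STATES - 1):        # the terminal state keeps value 0
        new_values[s] = max(action_value(s, a, values) for a in ACTIONS)
    values = new_values

# Greedy policy: the best action in each non-terminal state.
policy = [max(ACTIONS, key=lambda a, s=s: action_value(s, a, values))
          for s in range(N_STATES - 1)]
```

Nothing in the loop tells the agent to "go right"; that behavior emerges from the value updates alone, which is the sense in which the task is learned rather than programmed.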

Is the method of using value iteration an example of machine learning?

Hey David - I believe that the full 10,000 episodes only took a few minutes to complete on my side. What are the specs of the machine you're running? Perhaps too little memory?

So how much time should training for the Frozen Lake game take? I've copied the code, but it only does about 10 episodes per 6 seconds, so 10,000 episodes is going to take about 6,000 seconds (< 2 hrs). Does this sound reasonable?

Hey Wassim - The next videos coming to the RL series will be on DQN code implementation. At that point, the RL code here will be updated with the DQN code.

Is there code for the DQN mechanisms? If not, is there a plan to share these?

Yes, it's completely down 😭 Twitter is going crazy!

Oops! Youtube is down!

I get "500 Internal Server Error" trying to watch this video. I tried two dif. browsers, any suggestion?

