model.learn(total_timesteps=500000) not causing model improvement in a custom OpenAI Gym environment
I am trying to follow along with a tutorial by a popular YouTuber about custom OpenAI Gym environments, but I am unable to replicate his results.
I initially set up my model as:
model = PPO("MlpPolicy", env, verbose=1, tensorboard_log=log_path)
and trained it for 500K steps:
model.learn(total_timesteps=500000)
but it doesn't seem to improve at all: the mean reward stays at 0 and the std ranges between 58 and 60. I checked this with:
episode_result = evaluate_policy(model, env, n_eval_episodes=10)
print("reward: {} std: {} ".format(episode_result[0], episode_result[1]))
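To make those two numbers easier to read: as far as I understand, evaluate_policy returns the mean and the population standard deviation of the per-episode returns. The minimal sketch below uses made-up episode returns (an assumption for illustration, not output from my runs) to show that a mean near 0 with a std near 60 is what you would see if each 60-step episode scored either close to +60 or close to -60.

```python
from statistics import mean, pstdev

# Hypothetical per-episode returns (illustration only, not measured data):
# five episodes where every one of the 60 steps earned +1,
# and five episodes where every step earned -1.
episode_returns = [60.0] * 5 + [-60.0] * 5

# Mean and population standard deviation over the episodes,
# mirroring the two values printed in the question.
mean_reward = mean(episode_returns)
std_reward = pstdev(episode_returns)

print("reward: {} std: {} ".format(mean_reward, std_reward))
```

If that reading is right, the large std says the episodes are split between all-good and all-bad outcomes rather than the agent hovering around a mediocre score.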
The custom env is:
class ShowerEnv(Env):
    def __init__(self):
        # Actions we can take, down, stay, up
        self.action_space = Discrete(3)
        # Temperature array
        self.observation_space = Box(low=np.array([0]), high=np.array([100]))
        # Set start temp
        self.state = 38 + random.randint(-3, 3)
        # Set shower length
        self.shower_length = 60

    def step(self, action):
        # Apply action
        self.state += action - 1
        # Reduce shower length by 1 second
        self.shower_length -= 1
        # Calculate reward
        if self.state >= 37 and self.state <= 39:
            reward = 1
        else:
            reward = -1
        # Check if shower is done
        if self.shower_length <= 0:
            done = True
        else:
            done = False
        # Set placeholder for info
        info = {}
        # Return step information
        return self.state, reward, done, info

    def render(self, mode):
        pass

    def reset(self):
        # Reset shower temperature
        self.state = np.array([38 + random.randint(-3, 3)]).astype(float)
        # Reset shower time
        self.shower_length = 60
        return self.state
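As a sanity check on what a good score should look like, here is a minimal sketch that replays the step logic above in plain Python, without Gym or the trained model. The helper names run_episode, always_stay and thermostat are hypothetical and exist only for this illustration. It compares a policy that never changes the temperature with a hand-written rule that always moves toward 38, over every possible start temperature.

```python
def run_episode(start_temp, policy, shower_length=60):
    """Replay the ShowerEnv step logic and return the total episode reward."""
    state = start_temp
    total = 0
    for _ in range(shower_length):
        action = policy(state)
        # Same update as in step(): action 0 lowers, 1 keeps, 2 raises the temperature
        state += action - 1
        # Same reward rule as in step(): +1 inside [37, 39], otherwise -1
        total += 1 if 37 <= state <= 39 else -1
    return total


def always_stay(state):
    # Hypothetical baseline: never touch the temperature
    return 1


def thermostat(state):
    # Hypothetical hand-written rule: move one degree toward 38
    if state < 38:
        return 2
    if state > 38:
        return 0
    return 1


# Every start temperature that 38 + random.randint(-3, 3) can produce
start_temps = [38 + offset for offset in range(-3, 4)]

for name, policy in [("always_stay", always_stay), ("thermostat", thermostat)]:
    returns = [run_episode(temp, policy) for temp in start_temps]
    print(name, returns, "mean:", sum(returns) / len(returns))
```

Under these assumptions the simple rule scores close to the maximum of 60 from every start temperature, while never acting gives either +60 or -60 depending on where the episode starts. That gives a rough target to compare the trained model's evaluation against.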
Any help would be greatly appreciated!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow