Brian Tanner :: brian@tannerpages.com
This document describes how to use the Python RL-Glue Codec, a software library that provides socket-compatibility with the RL-Glue Reinforcement Learning software library.
For general information and motivation about the RL-Glue project, please refer to the documentation provided with that project.
This codec will allow you to create agents, environments, and experiment programs in Python.
This software project is licensed under the Apache-2.0 license. We're not lawyers, but our intention is that this code should be used however it is useful. We'd appreciate hearing what you're using it for, and getting credit if appropriate.
This project has a home here:
http://glue.rl-community.org/Home/Extensions/python-codec
Compiling and running components with this codec requires Python.
The code has some optimizations that detect if you have NumPy installed, and will speed up some parts of the code automatically in that case. We are looking at more ways to transparently make this codec faster.
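If you want to check whether your setup will get the fast path, the detection boils down to the usual try-import pattern; a minimal sketch of the idea (not the codec's exact code):

try:
    import numpy
    HAS_NUMPY = True   # the codec can use faster array handling
except ImportError:
    HAS_NUMPY = False  # fall back to the pure-Python code paths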
Possible Contribution: Someone with Python experience could help us find out what version of Python is required to use this codec, and could help us update the codec to be as robust as possible to older versions. Also, if someone wants to speed up this codec, we have some ideas but we are not really Python programmers.
The .tar.gz distribution can be found here:
http://code.google.com/p/rl-glue-ext/wiki/Python
To check the code out of subversion:
svn checkout http://rl-glue-ext.googlecode.com/svn/trunk/projects/codecs/Python Python-Codec
There are many options for exactly how to build and where to install the codec. If you are interested in a custom install or installing to a custom location, we recommend reading the Python installation docs.
To see a list of the available options:
>$ cd /path/to/codecs/Python
>$ python setup.py --help
To install the Python codec into your system (you may need sudo or root access to do this):
>$ cd /path/to/codecs/Python
>$ python setup.py install
If you are not able to install the codec because it requires root access on your system and you do not have these privileges, you can do a local install of the codec in your user directory. Please see the Python installation docs for more details.
After installing the Python codec, you will be able to access all of the RL-Glue Python classes and functions without needing to add the codec's source directory to your PYTHONPATH environment variable.
Removing the Codec
Distutils does not provide an automated way to remove the Python modules it installs, because in general there may be dependencies between modules and it does not want to break your system. We recommend manually removing the installed Python codec only if you know what you are doing. Future versions of the codec will install over the existing one.
For the rest of this section, we'll assume you've put this project in a subdirectory of your home directory called PythonCodec, so the src directory is at: ~/PythonCodec/src.
Python will want to know where the codec source files are, so we'll frequently use code like:
>$ PYTHONPATH=~/PythonCodec/src python do_something_with_codec.py
You can make your life easier by adding the path to the codec source to your PYTHONPATH environment variable, something like (in Bash):
>$ export PYTHONPATH=~/PythonCodec/src:$PYTHONPATH
Now your commands can be less cluttered, the same as if you had installed the codec:
>$ python do_something_with_codec.py
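As a quick sanity check that Python can find the codec (whether installed or via PYTHONPATH), you can try importing one of its modules, for example:

>$ python -c "import rlglue.RLGlue; print('codec found')"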
The skeleton contains all of the bare-bones plumbing that is required to create an agent/environment/experiment with this codec and might be a good starting point for creating your own components.
The mines-sarsa-sample contains a fully functional tabular Sarsa learning algorithm, a discrete-observation grid world problem, and an experiment program that can run these together and gather results. More details below in Section 2.5.
In the following sections, we will describe the skeleton project. Running and using the mines-sarsa-sample is analogous.
The pertinent file is:
examples/skeleton/skeleton_agent.py
This agent does not learn anything and randomly chooses integer action 0 or 1.
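For orientation, here is a condensed sketch of what such an agent looks like; this is illustrative, not the exact contents of skeleton_agent.py:

# A condensed random agent in the spirit of skeleton_agent.py.
import random
from rlglue.agent.Agent import Agent
from rlglue.agent import AgentLoader as AgentLoader
from rlglue.types import Action

class skeleton_agent(Agent):
    def agent_init(self, taskSpec):
        self.randGenerator = random.Random()

    def agent_start(self, observation):
        return self.random_action()

    def agent_step(self, reward, observation):
        return self.random_action()

    def agent_end(self, reward):
        pass

    def agent_cleanup(self):
        pass

    def agent_message(self, inMessage):
        if inMessage == "what is your name?":
            return "my name is skeleton_agent, Python edition!"
        return "I don't know how to respond to your message"

    def random_action(self):
        # Integer action: 0 or 1, chosen uniformly at random
        action = Action(numInts=1)
        action.intArray[0] = self.randGenerator.randint(0, 1)
        return action

if __name__ == "__main__":
    AgentLoader.loadAgent(skeleton_agent())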
You can run the agent like this:
>$ cd examples/skeleton
>$ python skeleton_agent.py
You will see something like:
RL-Glue Python Agent Codec Version: 2.0 (Build 250)
Connecting to 127.0.0.1 on port 4096...
This means that the skeleton_agent is running, and trying to connect to the rl_glue executable server on the local machine through port 4096!
You can kill the process by pressing CTRL-C on your keyboard.
The Skeleton agent is very simple and well documented, so we won't spend any more time talking about it in these instructions. Please open it up and take a look.
The pertinent file is:
examples/skeleton/skeleton_environment.py
This environment is episodic, with 21 states labeled 0 through 20. States 0 and 20 are terminal and return rewards of -1 and +1 respectively; all other states return a reward of 0.
There are two actions, {0, 1}. Action 0 decrements the state number, and action 1 increments it. The environment starts in state 10.
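For orientation, those dynamics translate to roughly the following; this is an illustrative sketch, not the exact contents of skeleton_environment.py:

# A condensed sketch of the 21-state episodic environment described above.
from rlglue.environment.Environment import Environment
from rlglue.environment import EnvironmentLoader as EnvironmentLoader
from rlglue.types import Observation, Reward_observation_terminal

class skeleton_environment(Environment):
    def env_init(self):
        # Task spec string matching the sample output shown later
        return "VERSION RL-Glue-3.0 PROBLEMTYPE episodic DISCOUNTFACTOR 1.0 " \
               "OBSERVATIONS INTS (0 20) ACTIONS INTS (0 1) REWARDS (-1.0 1.0) " \
               "EXTRA skeleton_environment(Python) by Brian Tanner."

    def env_start(self):
        self.state = 10  # episodes always start in the middle state
        return self.make_observation()

    def env_step(self, action):
        # Action 0 decrements the state number, action 1 increments it
        if action.intArray[0] == 1:
            self.state += 1
        else:
            self.state -= 1
        terminal = 1 if self.state in (0, 20) else 0
        if self.state == 0:
            reward = -1.0
        elif self.state == 20:
            reward = 1.0
        else:
            reward = 0.0
        return Reward_observation_terminal(reward, self.make_observation(), terminal)

    def env_cleanup(self):
        pass

    def env_message(self, inMessage):
        return "I don't know how to respond to your message"

    def make_observation(self):
        obs = Observation(numInts=1)
        obs.intArray[0] = self.state
        return obs

if __name__ == "__main__":
    EnvironmentLoader.loadEnvironment(skeleton_environment())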
You can run the environment like this:
>$ cd examples/skeleton
>$ python skeleton_environment.py
You will see something like:
RL-Glue Python Environment Codec Version: 2.0 (Build 250)
Connecting to 127.0.0.1 on port 4096...
This means that the skeleton_environment is running, and trying to connect to the rl_glue executable server on the local machine through port 4096!
You can kill the process by pressing CTRL-C on your keyboard.
The Skeleton environment is very simple and well documented, so we won't spend any more time talking about it in these instructions. Please open it up and take a look.
The pertinent file is:
examples/skeleton/skeleton_experiment.py
This experiment runs RL_Episode a few times, sends some messages to the agent and environment, and then steps through one episode using RL_step.
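In codec terms, the core of such an experiment is a handful of RL_* calls; a minimal sketch (the real skeleton_experiment.py prints considerably more detail):

# A minimal experiment sketch using the RL-Glue experiment interface.
import rlglue.RLGlue as RLGlue

task_spec = RLGlue.RL_init()
print("RL_init called, the environment sent task spec: " + task_spec)

# Run a few episodes, each capped at 100 steps (0 would mean no limit)
for episode in range(5):
    RLGlue.RL_episode(100)
    print("Episode %d\t%d steps\t%.1f total reward" %
          (episode, RLGlue.RL_num_steps(), RLGlue.RL_return()))

# Exchange some free-form messages
print(RLGlue.RL_agent_message("what is your name?"))
print(RLGlue.RL_env_message("what is your name?"))

# Step through a single episode manually
RLGlue.RL_start()
roat = RLGlue.RL_step()    # a Reward_observation_action_terminal
while not roat.terminal:
    roat = RLGlue.RL_step()

RLGlue.RL_cleanup()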
>$ cd examples/skeleton
>$ python skeleton_experiment.py
You will see something like:
Experiment starting up!
RL-Glue Python Experiment Codec Version: 2.0 (Build 250)
Connecting to 127.0.0.1 on port 4096...
This means that the skeleton_experiment is running, and trying to connect to the rl_glue executable server on the local machine through port 4096!
You can kill the process by pressing CTRL-C on your keyboard.
The Skeleton experiment is very simple and well documented, so we won't spend any more time talking about it in these instructions. Please open it up and take a look.
>$ cd examples/skeleton
>$ rl_glue &
>$ python skeleton_agent.py &
>$ python skeleton_environment.py &
>$ python skeleton_experiment.py &
If RL-Glue is not installed in the default location, you'll have to start the rl_glue executable server using its full path (unless it's in your PATH environment variable):
>$ /path/to/rl-glue/bin/rl_glue &
You should see output like the following if it worked:
>$ rl_glue &
RL-Glue Version 3.0-beta-1, Build 848:856
RL-Glue is listening for connections on port=4096
>$ PYTHONPATH=~/PythonCodec/src python skeleton_agent.py &
RL-Glue Python Agent Codec Version: 2.0 (Build 250)
Connecting to 127.0.0.1 on port 4096...
Agent Codec Connected
RL-Glue :: Agent connected.
>$ PYTHONPATH=~/PythonCodec/src python skeleton_environment.py &
RL-Glue Python Environment Codec Version: 2.0 (Build 250)
Connecting to 127.0.0.1 on port 4096...
Environment Codec Connected
RL-Glue :: Environment connected.
>$ PYTHONPATH=~/PythonCodec/src python skeleton_experiment.py &
Experiment starting up!
RL-Glue Python Experiment Codec Version: 2.0 (Build 250)
Connecting to 127.0.0.1 on port 4096...
RL-Glue :: Experiment connected.
RL_init called, the environment sent task spec: VERSION RL-Glue-3.0 PROBLEMTYPE episodic DISCOUNTFACTOR 1.0 OBSERVATIONS INTS (0 20) ACTIONS INTS (0 1) REWARDS (-1.0 1.0) EXTRA skeleton_environment(Python) by Brian Tanner.
----------Sending some sample messages----------
Agent responded to "what is your name?" with: my name is skeleton_agent, Python edition!
Agent responded to "If at first you don't succeed; call it version 1.0" with: I don't know how to respond to your message
Environment responded to "what is your name?" with: my name is skeleton_environment, Python edition!
Environment responded to "If at first you don't succeed; call it version 1.0" with: I don't know how to respond to your message
----------Running a few episodes----------
Episode 0    42 steps     1.0 total reward    1 natural end
Episode 1    28 steps     1.0 total reward    1 natural end
Episode 2    96 steps    -1.0 total reward    1 natural end
Episode 3    52 steps     1.0 total reward    1 natural end
Episode 4    100 steps    0.0 total reward    0 natural end
Episode 5    1 steps      0.0 total reward    0 natural end
Episode 6    82 steps     1.0 total reward    1 natural end
----------Stepping through an episode----------
First observation and action were: 10 and: 1
----------Summary----------
It ran for 66 steps, total reward was: -1.0
More details about the mines-sarsa sample project can be found at its RL-Library home:
http://library.rl-community.org/packages/mines-sarsa-sample
The task specification string is constructed manually because there is not yet a task spec builder for Python.
The SARSA agent parses the task specification string using the Python task spec parser. This agent can receive special messages from the experiment program to pause/unpause learning, pause/unpause exploring, save the current value function to a file, and load the value function from a file.
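To give a flavor of that message protocol, here is a sketch of an agent_message handler; the message strings and helper names (save_value_function, load_value_function) are illustrative, and the authoritative versions are in the sample agent itself:

def agent_message(self, inMessage):
    # Illustrative message strings; see sample_sarsa_agent.py for
    # the exact protocol it implements.
    if inMessage == "freeze learning":
        self.policyFrozen = True
        return "message understood, policy frozen"
    if inMessage == "unfreeze learning":
        self.policyFrozen = False
        return "message understood, policy unfrozen"
    if inMessage.startswith("save_policy"):
        self.save_value_function(inMessage.split(" ")[1])
        return "message understood, saving policy"
    if inMessage.startswith("load_policy"):
        self.load_value_function(inMessage.split(" ")[1])
        return "message understood, loading policy"
    return "I don't know how to respond to your message"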
The sample experiment then tells the agent to save the value function to a file, and then resets the experiment (and agent) to initial conditions. After verifying that the agent's initial policy is bad, the experiment tells the agent to load the value function from the file. The agent is evaluated again using this previously-learned value function, and performance is dramatically better.
Finally, the experiment sends a message to specify that the environment should use a fixed (instead of random) starting state, and runs the agent from that fixed start state for a while.
The new Python task spec parser can also be used by agents to decode the task spec string for agent_init. The sample sarsa agent in Section 2.5.2 demonstrates how to do this. The Python parser (unlike Java) is not a task spec builder, so task specs must be constructed manually (for now) when using Python to implement environments.
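To illustrate, an agent_init might use the parser roughly like this; this sketch assumes the accessor names used by the sample Sarsa agent, so consult sample_sarsa_agent.py for the authoritative version:

from rlglue.utils import TaskSpecVRLGLUE3

def agent_init(self, taskSpecString):
    TaskSpec = TaskSpecVRLGLUE3.TaskSpecParser(taskSpecString)
    if TaskSpec.valid:
        # Discrete problem: one integer observation dimension and one
        # integer action dimension, each described by a (min, max) range.
        self.numStates = TaskSpec.getIntObservations()[0][1] + 1
        self.numActions = TaskSpec.getIntActions()[0][1] + 1
    else:
        print("Task spec could not be parsed: " + taskSpecString)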
If the rl_glue executable server is not running on the default host and port, you can tell your Python agent, environment, or experiment program to connect on a custom port and/or to a custom host using the environment variables RLGLUE_PORT and RLGLUE_HOST.
For example, try the following command:
>$ RLGLUE_PORT=1025 RLGLUE_HOST=yahoo.ca python skeleton_agent.py
That command could give output like:
RL-Glue Python Agent Codec Version: 2.0-RC1 (Build 446)
Connecting to yahoo.ca on port 1025...
This works for agents, environments, and experiments. In practice, yahoo.ca probably isn't running an RL-Glue server.
You can specify the port, the host, neither, or both. Ports must be numbers; hosts can be hostnames or IP addresses. The default port is 4096, and the default host is 127.0.0.1.
If you don't like typing these variables every time, you can export them so that the value will be set for future calls in the same session:
>$ export RLGLUE_PORT=1025
>$ export RLGLUE_HOST=mydomain.com
Remember, on most *nix systems, you need superuser privileges to listen on ports lower than 1024, so you probably want to pick one higher than that.
Instead of re-creating information that is readily available in the PythonDocs, we will give pointers where appropriate.
The class is defined as:
class RL_Abstract_Type:
    def __init__(self, numInts=None, numDoubles=None, numChars=None):
        self.intArray = []
        self.doubleArray = []
        self.charArray = []
        if numInts != None:
            self.intArray = [0] * numInts
        if numDoubles != None:
            self.doubleArray = [0.0] * numDoubles
        if numChars != None:
            self.charArray = [''] * numChars

    def sameAs(self, otherAbstractType):
        return self.intArray == otherAbstractType.intArray and \
               self.doubleArray == otherAbstractType.doubleArray and \
               self.charArray == otherAbstractType.charArray
The other types that inherit from RL_Abstract_Type but add no specialization are:
class Action(RL_Abstract_Type):
    pass

class Observation(RL_Abstract_Type):
    pass
The structure of the composite types is listed below. Note that this code is not accurate in terms of the available constructors; it is just meant to illustrate the member names.
class Observation_action:
    def __init__(self, theObservation, theAction):
        self.o = theObservation
        self.a = theAction

class Reward_observation_terminal:
    def __init__(self, reward, theObservation, terminal):
        self.r = reward
        self.o = theObservation
        self.terminal = terminal

class Reward_observation_action_terminal:
    def __init__(self, reward, theObservation, theAction, terminal):
        self.r = reward
        self.o = theObservation
        self.a = theAction
        self.terminal = terminal
The full definitions are available in types.py.
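A quick illustration of constructing and comparing these types:

from rlglue.types import Action, Observation

action = Action(numInts=1)      # one integer slot, initialized to 0
action.intArray[0] = 1
obs = Observation(numInts=1, numDoubles=2)
obs.intArray[0] = 10
obs.doubleArray = [0.5, -0.5]
print(action.sameAs(Action(numInts=1)))  # False: intArray contents differ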
# () -> string
def RL_init():

# () -> Observation_action
def RL_start():

# () -> Reward_observation_action_terminal
def RL_step():

# () -> void
def RL_cleanup():

# (string) -> string
def RL_agent_message(message):

# (string) -> string
def RL_env_message(message):

# () -> double
def RL_return():

# () -> int
def RL_num_steps():

# () -> int
def RL_num_episodes():

# (int) -> int
def RL_episode(num_steps):
The old strategy was:
>$ python -c "import rlglue.agent.AgentLoader" agentName
That didn't seem as easy as it should be, so we changed things for this release, much like we did in the Java codec. Now, Python agents and environments can become self loading by adding a bit of code at the bottom of their source files, like:
# skeleton_agent.py
# top of file
from rlglue.agent import AgentLoader as AgentLoader
...
# bottom of file
if __name__ == "__main__":
    AgentLoader.loadAgent(skeleton_agent())
Now, as shown earlier, we can load the agent like:
>$ python agentfile.py
See skeleton_environment for instructions about how to do a similar thing for environments.
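The environment version follows the same pattern with the EnvironmentLoader; a sketch mirroring the agent example:

# skeleton_environment.py
# top of file
from rlglue.environment import EnvironmentLoader as EnvironmentLoader
...
# bottom of file
if __name__ == "__main__":
    EnvironmentLoader.loadEnvironment(skeleton_environment())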
We feel that this is a useful step forward, and will be encouraging this approach.
The online FAQ may be more current than this document, which may have been distributed some time ago.
We're happy to answer any questions about RL-Glue. Of course, try to search through previous messages first in case your question has been answered before.
Brian Tanner has since grabbed the torch and continued to develop the codec.
Jose Antonio Martin H. was kind enough to create the current task spec parser. Thanks!
James Bergstra added a bit of NumPy magic to get the Python codec speed comparable to the other languages. Thanks James.
Revision Number: $Rev: 613 $
Last Updated By: $Author: brian@tannerpages.com $
Last Updated: $Date: 2009-02-05 00:28:23 -0700 (Thu, 05 Feb 2009) $
$URL: https://rl-glue-ext.googlecode.com/svn/trunk/projects/codecs/Python/docs/PythonCodec.tex $