Class InfiniteQLearning
Q-Learning algorithm supporting an infinite number of states.
Inheritance
System.Object
InfiniteQLearning
Namespace: Mars.Components.Services.Learning
Assembly: Mars.Components.dll
Syntax
public class InfiniteQLearning : object
Remarks
The class provides an implementation of the Q-Learning algorithm, known as
off-policy Temporal Difference control.
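The class itself is C#; the following Python sketch (names and defaults here are illustrative assumptions, not the library's code) shows the off-policy Temporal Difference update that Q-Learning performs:

```python
# Illustrative sketch of the tabular Q-Learning (off-policy TD control) update.
# q maps state -> list of per-action value estimates; names are hypothetical.
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Move Q(s, a) toward reward + gamma * max_a' Q(s', a')."""
    best_next = max(q[next_state])  # off-policy: use the greedy estimate for s'
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
    return q

q = {0: [0.0, 0.0], 1: [1.0, 0.0]}
q_update(q, state=0, action=1, reward=2.0, next_state=1)
```

The max over next-state actions is what makes the method off-policy: the update uses the greedy estimate for the next state regardless of which action the exploration policy actually selects there.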
Constructors
InfiniteQLearning(Int32, Int32, IExplorationPolicy)
Initializes a new instance of the InfiniteQLearning class.
Declaration
public InfiniteQLearning(int states, int actions, IExplorationPolicy explorationPolicy)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | states | Amount of possible states. |
System.Int32 | actions | Amount of possible actions. |
IExplorationPolicy | explorationPolicy | Exploration policy. |
Remarks
Initial action estimates may be randomized with small values rather than left at zero.
Randomization of action values can be useful when greedy exploration
policies are used, since it ensures that actions with equal estimates are not always chosen in the same order.
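As an illustration of why such randomization matters under a purely greedy policy (a Python sketch with hypothetical names, not the library's code):

```python
import random

# Sketch: with all-zero estimates, argmax always resolves ties to the first
# action, so a greedy policy repeatedly picks action 0; tiny random initial
# values vary which action is tried first.
random.seed(1)
zero_init = [0.0, 0.0, 0.0]
random_init = [random.uniform(0.0, 0.01) for _ in range(3)]

def greedy(estimates):
    return max(range(len(estimates)), key=estimates.__getitem__)

print(greedy(zero_init))    # always 0: ties resolve to the first action
print(greedy(random_init))  # depends on the random draw
```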
Properties
ActionsCount
Amount of possible actions.
Declaration
public int ActionsCount { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
DiscountFactor
Discount factor, [0, 1].
Declaration
public double DiscountFactor { get; set; }
Property Value
Type | Description |
---|---|
System.Double |
Remarks
Discount factor for the expected cumulative reward. The value serves as a
multiplier for the expected future reward. If the value is set to 1,
the expected cumulative reward is not discounted at all; the smaller the
value, the less the expected future reward contributes to the action-value
updates.
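A small Python sketch (illustrative only, not the library's code) of how the discount factor scales rewards further in the future:

```python
# Sketch: the discount factor gamma weights a reward t steps ahead by gamma**t.
def discounted_return(rewards, gamma):
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

rewards = [1.0, 1.0, 1.0]
print(discounted_return(rewards, 1.0))  # 3.0  -- no discounting
print(discounted_return(rewards, 0.5))  # 1.75 -- later rewards count less
```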
ExplorationPolicy
Exploration policy.
Declaration
public IExplorationPolicy ExplorationPolicy { get; set; }
Property Value
Type | Description |
---|---|
IExplorationPolicy |
Remarks
Policy, which is used to select actions.
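The IExplorationPolicy contract is not reproduced on this page; as an assumption, a typical exploration policy such as epsilon-greedy can be sketched in Python like this (all names here are hypothetical):

```python
import random

# Sketch of a typical exploration policy (epsilon-greedy): with probability
# epsilon pick a random action, otherwise pick the highest-valued one.
def epsilon_greedy(estimates, epsilon, rng=random):
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore
    return max(range(len(estimates)), key=estimates.__getitem__)  # exploit

print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))  # 1: pure exploitation
```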
LearningRate
Learning rate, [0, 1].
Declaration
public double LearningRate { get; set; }
Property Value
Type | Description |
---|---|
System.Double |
Remarks
The value determines how strongly each new observation updates the
Q-function during learning. The greater the value, the larger each update
step; the lower the value, the more the previous estimates are retained.
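In other words, the learning rate is the weight given to each individual update, which can be sketched as (illustrative Python, not the library's code):

```python
# Sketch: the learning rate blends the old estimate with the new TD target.
def blend(old_estimate, td_target, learning_rate):
    """New estimate = (1 - alpha) * old + alpha * target."""
    return old_estimate + learning_rate * (td_target - old_estimate)

print(blend(0.0, 10.0, 1.0))  # 10.0 -- replace the old estimate entirely
print(blend(0.0, 10.0, 0.1))  # 1.0  -- move only 10% toward the target
```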
StatesCount
Amount of possible states.
Declaration
public BigInteger StatesCount { get; }
Property Value
Type | Description |
---|---|
BigInteger |
TriedStatesCount
Gets the number of states that have already been explored by the algorithm.
Declaration
public int TriedStatesCount { get; }
Property Value
Type | Description |
---|---|
System.Int32 |
Methods
GetAction(Int32)
Gets the next action for the specified state.
Declaration
public int GetAction(int state)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | state | Current state to get an action for. |
Returns
Type | Description |
---|---|
System.Int32 | The action selected for the specified state. |
Remarks
The method returns an action according to the current
ExplorationPolicy.
UpdateState(Int32, Int32, Double, Int32)
Updates the Q-function value for the previous state-action pair.
Declaration
public void UpdateState(int previousState, int action, double reward, int nextState)
Parameters
Type | Name | Description |
---|---|---|
System.Int32 | previousState | Previous state. |
System.Int32 | action | Action that led from the previous state to the next state. |
System.Double | reward | Reward received for taking the specified action from the previous state. |
System.Int32 | nextState | Next state. |
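GetAction and UpdateState together form the usual Q-Learning control loop. The following self-contained Python sketch (the environment and all names here are hypothetical, not part of Mars.Components) illustrates that interaction pattern:

```python
# A deterministic two-state chain, used only for illustration: taking
# action 1 from state 0 reaches state 1 and pays a reward of 1.
def step(state, action):
    next_state = min(state + action, 1)
    reward = 1.0 if next_state == 1 else 0.0
    return reward, next_state

q = [[0.0, 0.0], [0.0, 0.0]]  # q[state][action]
alpha, gamma = 0.5, 0.9
for _ in range(20):
    for action in (0, 1):  # try both actions from state 0 each sweep
        reward, next_state = step(0, action)
        # the "UpdateState" step: TD update toward reward + gamma * max_a Q(next, a)
        q[0][action] += alpha * (reward + gamma * max(q[next_state]) - q[0][action])

print(q[0][1] > q[0][0])  # True: the rewarding action ends up preferred
```

In an agent using the real class, the per-sweep action choice would come from GetAction (i.e., from the configured exploration policy) and the update line would be a call to UpdateState.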