Class BoltzmannExploration
Boltzmann distribution exploration policy.
Inheritance
System.Object
BoltzmannExploration
Implements
IExplorationPolicy
Namespace: Mars.Components.Services.Explorations
Assembly: Mars.Components.dll
Syntax
public class BoltzmannExploration : object, IExplorationPolicy
Remarks
The class implements an exploration policy based on the Boltzmann distribution. According to the policy, action a at state s is selected with the following probability:
$$
p(s, a) = \frac{\exp\left( Q(s, a) / t \right)}{\sum_{b} \exp\left( Q(s, b) / t \right)}
$$
where Q(s, a) is the estimate (usefulness) of action a at state s, and t is the Temperature.
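The formula above can be sketched in C# as follows. This is a minimal illustration, not the library's implementation: `BoltzmannSketch` and `Probabilities` are hypothetical names, and subtracting the largest estimate is an added numerical-stability step that leaves the resulting probabilities unchanged.

```csharp
using System;
using System.Linq;

public static class BoltzmannSketch
{
    // Hypothetical helper, not part of Mars.Components:
    // returns p(s, a) for every action a, given the estimates Q(s, a).
    public static double[] Probabilities(double[] actionEstimates, double temperature)
    {
        if (actionEstimates == null || actionEstimates.Length == 0)
            throw new ArgumentException("At least one action estimate is required.", nameof(actionEstimates));
        if (temperature <= 0)
            throw new ArgumentOutOfRangeException(nameof(temperature), "Temperature must be greater than 0.");

        // Subtracting the largest estimate leaves the ratios unchanged
        // but keeps Math.Exp from overflowing when Q / t is large.
        double max = actionEstimates.Max();
        double[] weights = actionEstimates
            .Select(q => Math.Exp((q - max) / temperature))
            .ToArray();

        // Normalize so that the probabilities sum to 1.
        double sum = weights.Sum();
        return weights.Select(w => w / sum).ToArray();
    }
}
```

Calling `BoltzmannSketch.Probabilities(new[] { 1.0, 2.0 }, 1.0)` yields one probability per action, with the larger estimate receiving the larger share.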
Constructors
BoltzmannExploration(Double)
Initializes a new instance of the BoltzmannExploration class.
Declaration
public BoltzmannExploration(double temperature)
Parameters
Type | Name | Description |
---|---|---|
System.Double | temperature | Temperature parameter of Boltzmann distribution. |
Properties
Temperature
Temperature parameter of Boltzmann distribution. Should be greater than 0.
Declaration
public double Temperature { get; set; }
Property Value
Type | Description |
---|---|
System.Double |
Remarks
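The property sets the balance between exploration and greedy actions. A low temperature makes the policy more greedy, so the action with the highest estimate is chosen almost always; a high temperature makes the choice closer to uniform across all actions.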
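As a rough worked example of this balance, assume two actions with made-up estimates Q(s, a_1) = 1 and Q(s, a_2) = 2. The probability of selecting the better action a_2 then depends on the temperature t as follows:

$$
p(s, a_2) = \frac{\exp(2/t)}{\exp(1/t) + \exp(2/t)} = \frac{1}{1 + \exp(-1/t)} \approx
\begin{cases}
0.99995 & t = 0.1 \\
0.731 & t = 1 \\
0.525 & t = 10
\end{cases}
$$

At t = 0.1 the policy is almost purely greedy, while at t = 10 the two actions are selected nearly equally often.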
Methods
ChooseAction(Double[])
Chooses an action.
Declaration
public int ChooseAction(double[] actionEstimates)
Parameters
Type | Name | Description |
---|---|---|
System.Double[] | actionEstimates | Action estimates. |
Returns
Type | Description |
---|---|
System.Int32 | Returns the selected action. |
Remarks
The method chooses an action depending on the provided estimates. The
estimates can be any measure of an action's usefulness
(expected total reward, discounted reward, etc.).
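One way such a selection could work is roulette-wheel sampling over the Boltzmann weights. The sketch below is an assumption about the general technique, not the library's actual code: `BoltzmannSamplingSketch`, its `ChooseAction` signature, and the explicit `Random` parameter are all hypothetical, and it assumes the returned integer is an index into the estimates array.

```csharp
using System;

public static class BoltzmannSamplingSketch
{
    // Hypothetical stand-in for ChooseAction; the library's implementation may differ.
    // Returns the index of the sampled action in actionEstimates.
    public static int ChooseAction(double[] actionEstimates, double temperature, Random random)
    {
        if (actionEstimates == null || actionEstimates.Length == 0)
            throw new ArgumentException("At least one action estimate is required.", nameof(actionEstimates));
        if (temperature <= 0)
            throw new ArgumentOutOfRangeException(nameof(temperature), "Temperature must be greater than 0.");

        int count = actionEstimates.Length;
        double[] weights = new double[count];
        double sum = 0;

        // Unnormalized Boltzmann weights exp(Q / t).
        // Note: no overflow protection for very large Q / t in this sketch.
        for (int i = 0; i < count; i++)
        {
            weights[i] = Math.Exp(actionEstimates[i] / temperature);
            sum += weights[i];
        }

        // Roulette-wheel selection: draw a point in [0, sum) and return
        // the first action whose cumulative weight exceeds it.
        double threshold = random.NextDouble() * sum;
        double cumulative = 0;
        for (int i = 0; i < count; i++)
        {
            cumulative += weights[i];
            if (threshold < cumulative)
                return i;
        }

        // Guard against floating-point rounding in the cumulative sum.
        return count - 1;
    }
}
```

Passing the random number generator in explicitly keeps the sketch reproducible: a seeded `Random` gives the same sequence of actions on every run.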