Jolt.Automata : Finite State Machines

A finite state machine (FSM) is a representation of behavior. It is denoted by a set of states, transitions between such states, and actions that occur as part of such transitions. In computer science, one of the more common applications of an FSM is to validate a string of input characters and translate them into a token or single symbol (in fact, this is the very purpose of the tokenizer subroutine of a compiler). Another use of an FSM may be to determine which state a program is in after processing an unbounded sequence of symbols, or to decompose a regular expression into its "states" (and vice versa (1)).

While the applications of an FSM are many and different, each FSM can be represented generically, which is the primary goal of the Jolt.NET FSM implementation.

Table of Contents

  1. Generic FSM Representation
  2. Design Scope
    1. Serialization Limitations
  3. Usage Examples
    1. Using an FSM to accept a sequence of symbols, using state enumeration
    2. Using an FSM to accept a sequence of symbols
    3. Using an FSM to tokenize an sequence of symbols
    4. Creating and consuming a non-deterministic FSM
    5. Obtaining contextual information when an FSM reaches the error state
    6. Converting to/from QuickGraph data structure
    7. Serializing an FSM to GraphML
    8. Deserializing GraphML to an FSM
    9. Serializing an FSM to a binary stream
    10. Deserializing a binary stream to an FSM
    11. Serializing an FSM to GraphViz
    12. Converting an FSM to GLEE

Generic FSM Representation

The mathematical representation of an FSM is the quintuple (Σ,Γ,S,s0,δ,F) where:
  • Σ is a finite, non-empty set of symbols representing the input alphabet
  • S is a finite, non-empty set of states
  • s0 is the initial state, an element of S
  • δ is the state-transition function δ : S x Σ -> S
  • F, a subset of S, represents a set of final states

Additionally, an FSM may contain the following parameters to support an output language:
  • Γ is a finite, non-empty set of symbols representing the output alphabet
  • ω is the output function ω : S x Σ -> Γ

In order to acheive a generic representation of an FSM, Σ must be parameterized so that an FSM class may operate on any language. If Σ consists of ASCII characters, then a type of FSM<char> suffices. Furthermore, the type FSM<int> may be used to consume a set of numbers (note: more than one character per symbol).

States contain no useful information other than a string that uniquely identifies and describes them. Transitions connect states by a set of predicates that depend on Σ, where each predicate returns true if the state transition is valid for a given input symbol. Thus Transition<char> connects states from FSM<char> and similarly Transition<int> connects states from FSM<int>.

Since ω is an optional implementation of an FSM, and the definition of Γ is open-ended, the FSM provides an event for which subscribers can define ω and Γ. The event lets the user subscribe to a state transition and perform some additional processing based on the conditions of the transition (source state and input symbol).

Design Scope

The FSM implementation uses QuickGraph as its backing store. A graph is a natural data structure to use when representing an FSM as it avoids the sparse-array problem that is inherent in a state-table implementation.

The FSM implementation of the Jolt library is designed to support the following features.
  • Generically represent the FSM and its transitions.
  • Programmatic construction of an FSM, for a user-defined alphabet
  • Subscribing to events that fire on a valid transition, allowing for user-defined processing
  • Using an FSM to determine if it accepts a sequence of input symbols
  • Using a state enumerator to consume a sequence of input symbols, one symbol at a time
  • Export the FSM to a QuickGraph data structure, allowing analysis and visualization using QuickGraph algorithms
  • Non-deterministic FSMs and their detection

Furthermore, the following aspects of persistence are supported.
  • Declarative construction of an FSM, via an external data file
  • Serialization, deserialization, and persistence of an FSM

Serialization Limitations

Please consider the following limitations when designing your FSM for serialization and deserialization.
  • GraphML serialization and deserialization of the OnTransition event is not supported
  • GraphML serialization and deserialization of the TransitionPredicate delegate is limited to static methods
    • Instance methods will not be serialized
    • Detection of an invalid method during deserialization results in the use of the default predicate that returns false for all input values

Usage Examples


The following code snippet demonstrates how to navigate an FSM using a state enumerator. The context of the example is to validate a sequence of input symbols, searching for the string int followed by any amount of whitespace. The figure preceding the code block depicts the FSM used in the example.

int-validator.png
An FSM accepting the string int followed by any amount of whitespace.


using Jolt.Automata;
 
bool IsValid(string inputSymbols)
{
  FiniteStateMachine<char> fsm = new FiniteStateMachine<char>();
 
  fsm.AddStates( new string[] { "start", "i", "n", "t", "whitespace" });
  fsm.StartState = "start";
  fsm.SetFinalState("whitespace");
 
  fsm.AddTransition(new Transition<char>("start", "i", ch => ch == 'i'));
  fsm.AddTransition(new Transition<char>("i", "in", ch => ch == 'n'));
  fsm.AddTransition(new Transition<char>("in", "int", ch => ch == 't'));
  fsm.AddTransition(new Transition<char>("int", "whitespace", Char.IsWhiteSpace));
  fsm.AddTransition(new Transition<char>("whitespace", "whitespace", Char.IsWhiteSpace));
 
  IFsmEnumerator<char> enumerator = fsm.CreateStateEnumerator(EnumerationType.Deterministic, fsm.StartState);
  foreach (char symbol in inputSymbols)
  {
    if (!enumerator.NextState(symbol)) { return false; }  // no possible transition
  }
 
  return fsm.FinalStates.Contains(enumerator.CurrentState);
}


The following code block modifies the preceding example to use the built-in FSM support for validating a sequence of input symbols.

using Jolt.Automata;
 
bool IsValid(string inputSymbols)
{
  FiniteStateMachine<char> fsm = new FiniteStateMachine<char>();
 
  fsm.AddStates( new string[] { "start", "i", "n", "t", "whitespace" });
  fsm.StartState = "start";
  fsm.SetFinalState("whitespace");
 
  fsm.AddTransition(new Transition<char>("start", "i", ch => ch == 'i'));
  fsm.AddTransition(new Transition<char>("i", "n", ch => ch == 'n'));
  fsm.AddTransition(new Transition<char>("n", "t", ch => ch == 't'));
  fsm.AddTransition(new Transition<char>("t", "whitespace", Char.IsWhiteSpace));
  fsm.AddTransition(new Transition<char>("whitespace", "whitespace", Char.IsWhiteSpace));
 
  return fsm.Consume(EnumerationType.Deterministic, inputSymbols).IsAccepted;  
}


The following code snippet demonstrates how to tokenize a sequence of input symbols representing a variable declaration into enumeration values. The figure preceding the code block depicts the FSM used in the example.

declaration-tokenizer.png
An FSM accepting one or more integer declarations in the C language.


using Jolt.Automata;
 
interface ILanguageParser
{
  void NextToken(LanguageToken token);
  void NextToken<T>(LanguageToken token, T tokenValue);
  void Reset();
}
 
enum LanguageToken
{
  IntegerType,
  Identifier,
  Semicolon
}
 
void Tokenize(string sourceCode, ILanguageParser syntaxParser)
{
  FiniteStateMachine<char> fsm = new FiniteStateMachine<char>();
 
  // initialize states
  fsm.AddStates(new string[] { "start", "i", "n", "t", "int", "identifier", "int id", "semicolon" });
  fsm.StartState = "start";
  fsm.SetFinalState("semicolon");
 
  // initialize transitions
  fsm.AddTransition(new Transition<char>("start", "start", Char.IsWhiteSpace));
  fsm.AddTransition(new Transition<char>("start", "i", ch => ch == 'i'));
  fsm.AddTransition(new Transition<char>("i", "n", ch => ch == 'n'));
  fsm.AddTransition(new Transition<char>("n", "t", ch => ch == 't'));
 
  Transition tr = new Transition<char>("t", "int", Char.IsWhiteSpace);
  tr.OnTransition += (s, a) => syntaxParser.NextToken(LanguageToken.IntegerType);
  fsm.AddTransition(tr);
 
  fsm.AddTransition(new Transition<char>("int", "int", Char.IsWhiteSpace));
  fsm.AddTransition(new Transition<char>("int", "identifier", Char.IsLetter));
  fsm.AddTransition(new Transition<char>("identifier", "identifier", Char.IsLetterOrDigit));
 
  tr = new Transition<char>("identifier", "int id", Char.IsWhiteSpace);
  tr.OnTransition += (s, a) => syntaxParser.NextToken(LanguageToken.Identifier);
  fsm.AddTransition(tr);
 
  fsm.AddTransition(new Transition<char>("int id", "int id", Char.IsWhiteSpace));
 
  tr = new Transition<char>("int id", "semicolon", ch => ch == ';');
  tr.OnTransition += (s, a) => syntaxParser.NextToken(LanguageToken.Semicolon);
  fsm.AddTransition(tr);
 
  fsm.AddTransition(new Transition<char>("semicolon", "start", Char.IsWhiteSpace));
 
  // begin tokenization and parsing
  IFsmEnumerator enumerator = fsm.CreateStateEnumerator(EnumerationType.Deterministic, fsm.StartState);
  foreach (char inputSymbol in sourceCode)
  {
    if (!enumerator.NextSymbol(inputSymbol)) { throw new SyntaxErrorException(); }
  }
}


Construction of a nondeterministic FSM is accomplished by adding an ambiguous transition. That is, one or more transitions in the FSM accept the same input symbol. Creation and consumption of a trivial nondeterministic FSM are demonstrated in the following example.

using System.Diagnostics;
using System.Linq;
using Jolt.Automata

void CreateAndConsumeFSM()
{
  FiniteStateMachine<char> fsm = new FiniteStateMachine<char>();
  
  fsm.AddStates(new[] { "start", "first", "second", "final" });
  fsm.StartState = "start";
  fsm.SetFinalState("final");
  
  fsm.AddTransition(new Transition<char>("start", "first", 'a'.Equals));
  fsm.AddTransition(new Transition<char>("start", "second", 'a'.Equals));
  fsm.AddTransition(new Transition<char>("first", "final", 'b'.Equals));
  fsm.AddTransition(new Transition<char>("second", "second", 'b'.Equals));
  
  IFsmEnumerator<char> enumerator = fsm.CreateStateEnumerator(EnumerationType.Nondeterministic, fsm.StartState);
  foreach (char inputSymbol in "ab")
  {
    if(enumerator.MoveNext(inputSymbol)
    {
      Console.Write("Number of current states: ");
      Console.WriteLine(enumerator.CurrentStates.Count());	  
    }
  }
  
  Debug.Assert(fsm.Consume(EnumeratorType.NonDeterministic, "ab").IsAccepted);
}



When an enumerator detects an invalid input symbol (i.e., no valid transition from the current state for the given input symbol), it moves to the implicit error state. This state has no outbound transitions and thus any further input symbol consumption will not move the enumerator to another state. The implicit error state is named Jolt.FSM.ImplicitErrorState, and information about the symbol-sequence rejection is obtained as depicted in the following example.

using System;
using Jolt.Automata;

void WriteFinalState(IEnumerable<int> inputSymbols)
{
  FiniteStateMachine<int> fsm = new FiniteStateMachine<int>();
 
  fsm.AddStates(new string[] { "start", "final" });
  fsm.AddState("Jolt.FSM.ImplicitErrorState");  // Throws -- state name is reserved.
  fsm.StartState = "start";
  fsm.SetFinalState("final");
 
  fsm.AddTransition(new Transition<int>("start", "final", n => n == 100));
 
  ConsumptionResult result = fsm.Consume(EnumerationType.Nondeterministic, new int[] { 200 });
  if (!result.IsAccepted)
  {
    foreach(string state in result.LastStates)
    {
        Console.Error.WriteLine(state);
    }

    Console.Error.WriteLine(result.LastSymbol);
    Console.Error.WriteLine(result.NumberOfConsumedSymbols);
  }
}


The FiniteStateMachine class uses a QuickGraph graph for data storage, allowing natural conversion between the two data types. The following example demonstrates how to construct an FSM from a corresponding graph. Note that this operation creates a copy of the graph, allowing for changes between the two objects to progress independently.

using System;
using System.Linq;
using QuickGraph;
using Jolt.Automata;
 
void CreateFsm<TAlphabet>()
{
  // Create a graph.
  BidirectionalGraph<string, Transition<TAlphabet>> graph = new BidirectionalGraph<string, Transition<TAlphabet>>();
  graph.AddVerticesAndEdge(new Transition<TAlphabet>("start", "finish", inputSymbol => true));
  graph.AddEdge(new Transition<TAlphabet>("finish", "start", inputSymbol => true));
 
  // Convert to FSM.
  FiniteStateMachine<TAlphabet> fsm = new FiniteStateMachine<TAlphabet>(graph);
  fsm.StartState = "start";
  fsm.SetFinalState("finish");
 
  // Manipulate graph and test FSM.
  graph.ClearEdges();
  Console.WriteLine(fsm.Consume(Enumerable.Repeat(default(TAlphabet), 4)).IsAccepted);
}

The following example demonstrates how to convert an FSM to a graph. Note that this operation does not create a copy of FSM prior to conversion. Hence, a change made to the resulting graph object is reflected in the FSM.

using System;
using System.Linq;
using QuickGraph;
using Jolt.Automata;
 
void CreateGraph<TAlphabet>()
{
  // Create an FSM.
  FiniteStateMachine<TAlphabet> fsm = new FiniteStateMachine<TAlphabet>();
  fsm.AddStates(new string[] {"start", "finish"})
  fsm.AddTransition(new Transition<TAlphabet>("start", "finish", inputSymbol => true));
  fsm.AddTransition(new Transition<TAlphabet>("finish", "start", inputSymbol => true));
  fsm.StartState = "start";
  fsm.SetFinalState("finish");
 
  // Convert to graph.
  IBidirectionalGraph<string, Transition<TAlphabet>> graph = fsm.AsGraph;
 
  // Manipulate graph and test FSM.
  graph.ClearEdges();
  graph.ClearVertices();
  Console.WriteLine(fsm.StartState == "start");
  Console.WriteLine(fsm.RemoveState("finish"));
}


GraphML serialization is accomplished through the FsmConverter class, as depicted in the following example.

using System.IO;
using Jolt.Automata;
 
void Serialize<TAlphabet>(FiniteStateMachine<TAlphabet> fsm)
{
  using (XmlWriter writer = XmlWriter.Create(Path.GetRandomFileName()))
  {
    FsmConverter.ToGraphML(fsm, writer);
  }
}

Serialization of the FSM from the error state example (less the erroneous operation) yields the following GraphML.

<?xml version="1.0" encoding="utf-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="stateName" for="node" attr.name="stateName" attr.type="string" />
  <key id="isStartState" for="node" attr.name="isStartState" attr.type="boolean">
    <default>false</default>
  </key>
  <key id="isFinalState" for="node" attr.name="isFinalState" attr.type="boolean">
    <default>false</default>
  </key>
  <key id="description" for="edge" attr.name="description" attr.type="string" />
  <key id="transitionPredicate" for="edge" attr.name="transitionPredicate" attr.type="string" />
  <graph id="G" edgedefault="directed" parse.nodes="2" parse.edges="1" parse.order="nodesfirst" parse.nodeids="free" parse.edgeids="free">
    <node id="-1587697316">
      <data key="stateName">start</data>
      <data key="isStartState">true</data>
    </node>
    <node id="-1915270904">
      <data key="stateName">final</data>
      <data key="isFinalState">true</data>
    </node>
    <edge id="0" source="-1587697316" target="-1915270904">
      <data key="description">&lt;WriteFinalState&gt;b__0</data>
      <data key="transitionPredicate">&lt;WriteFinalState&gt;b__0;ClassName, AssemblyName, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null</data>
    </edge>
  </graph>
</graphml>

The transitionPredicate attribute is serialized in the following format: methodName;assemblyQualifiedDeclaringTypeName. In the preceding example, the method ClassName.<WriteFinalState>b__0 from the AssemblyName assembly is the predicate for the only transition in the FSM. Take note that this predicate originates from a lambda expression, and that the resulting text representation of the method name is indeed the name of the compiler-generated method for the expression.

The Description attribute of the transition is controlled by the corresponding Description property on the Transition class. If a value is not specified at construction time (as demonstrated in the corresponding code example), then the class will default to using the name of the predicate.


Deserialization of GraphML is accomplished through the FsmConverter class, as depicted in the following example. As noted in the serialization limitations section, a transition predicate will be deserialized if it can be located, and thus deserializing a method created by a lambda expression requires knowledge of its compiler-generated name.

using System.IO;
using Jolt.Automata;
 
void Deserialize(Stream graphMLstream)
{
  using (TextReader reader = new StreamReader(graphMLstream))
  {
    FiniteStateMachine<char> fsm = FsmConverter.FromGraphML<char>(reader);
  }
}


Binary serialization is accomplished through the FsmConverter class, as depicted in the following example.

using System.IO;
using Jolt.Automata;
 
void Serialize<TAlphabet>(FiniteStateMachine<TAlphabet> fsm)
{
  using (Stream targetStream = File.Open(@"C:\fsm.bin", FileMode.Create))
  {
    FsmConverter.ToBinary(fsm, stream);
  }
}

When serializing to binary, take note that the .NET binary serializer operates on the entire object graph that is reachable from the given object (in this case, an FSM). So, if your FSM makes reference to non-FSM objects via instance-bound delegates in a Transition or OnTransition event handler, then those non-FSM objects must be serializable and their state will also be serialilzed as part of the serialization operation. The following example demonstrates the concepts of this behavior.

using System;
using System.IO;
using Jolt.Automata;
 
class NonSerializableType
{
  public bool IsValid(char ch) { return true; }
  private ComplexObject m_object;
}
 
[Serializable]
class SerializableType
{
  public bool IsValid(char ch) { return false; }
  private ComplexObject m_object;  // field will be serialized as part of serialization of SerializableType.
}
 
void Main()
{
  FiniteStateMachine<char> fsm = new FiniteStateMachine<char>();
  fsm.AddStates(new string[] { "start", "finish" });
  fsm.AddTransition(new Transition<char>("start", "finish", new SerializableType().IsValid));
  fsm.AddTransition(new Transition<char>("start", "finish", new NonSerializableType().IsValid));
 
  Serialize(fsm);  // Throws SerializationException because NonSerializableType is not serializable.
}
 
void Serialize<TAlphabet>(FiniteStateMachine<TAlphabet> fsm)
{
  using (Stream targetStream = File.Open(@"C:\fsm.bin", FileMode.Create))
  {
    FsmConverter.ToBinary(fsm, stream);
  }
}


Binary deserialization is accomplished through the FsmConverter class, as depicted in the following example.

using System.IO;
using Jolt.Automata;
 
FiniteStateMachine<TAlphabet> Deserialize<TAlphabet>(string filename)
{
  using (Stream binaryStream = File.Open(filename, FileMode.Open))
  {
    return FsmConverter.FromBinary<TAlphabet>(binaryStream);
  }
}


GraphViz serialization is accomplished through the FsmConverter class, as depicted in the following example.

using System.IO;
using Jolt.Automata;
 
void Serialize<TAlphabet>(FiniteStateMachine<TAlphabet> fsm)
{
  using (TextWriter writer = new StreamWriter(@"C:\fsm.dot"))
  {
    FsmConverter.ToGraphViz(fsm, writer);
  }
}

Here is an example of the output produced by the serializer, when applied to an FSM that verifies if the length of an input sequence is divisible by three. Note the use of default edge labels, denoted by the name of an anonymous method that was used in the creation of a transition. You may visualize this graph by pasting the GraphViz text in an online visualizer.

digraph G {
0 [label="mod3(len) = 2", shape=circle];
1 [label="mod3(len) = 1", shape=circle];
2 [label="mod3(len) = 0", style=bold, shape=doublecircle];
0 -> 1 [ label="<.cctor>b__0"];
1 -> 2 [ label="<.cctor>b__0"];
2 -> 0 [ label="<.cctor>b__0"];
}



Converting a FiniteStateMachine to Microsoft's GLEE representation is accomplished through the FsmConverter class, as depicted in the following example. Please note that since the GLEE binaries are not distributed with Jolt.NET, you must obtain them and place them in the same directory as Jolt.Automata.dll or the global assembly cache.

using Jolt.Automata;
using GleeConverter=Jolt.Automata.Glee.FsmConverter;
using Microsoft.Glee.Drawing;
 
void ConvertToGlee<TAlphabet>(FiniteStateMachine<TAlphabet> fsm)
{
  // Obtain a GLEE graph for drawing/visualization.
  Graph drawingGraph = GleeConverter.ToGleeGraph(fsm);
 
  // Obtain a GLEE graph for analysis
  Microsoft.Glee.GleeGraph graph = drawingGraph.GleeGraph;
}


[1] See "Regular language" for more information.

Last edited Aug 7, 2010 at 6:25 PM by SteveGuidi, version 23

Comments

No comments yet.