How it works

Prompt generator

The prompt generator is more of a utility than an essential component of SaxaMLL. Power users can skip it entirely, since they likely have custom prompts that work better for their use cases.

However, for simple applications, the prompt generator will get you up and running.

Events

Broadly, there are two classes of events:

  1. 🔵 events that change "scope"

  2. 🔴 events that don't change "scope"

By "scope", we mean how deep you are in the AST. Some examples to illustrate the two classes of events:

  • 🔵 A non-self-closing tag is opened e.g. <tweet>

  • 🔵 A non-self-closing tag is closed e.g. </tweet>

  • 🔴 A self-closing tag is placed e.g. <tweet />

  • 🔴 Any text is added e.g. hello my name is alex
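To make "scope" concrete, the following is a minimal sketch that tracks depth over a sequence of pieces that have already been tokenized. The Piece type and the depthAfterEach function are hypothetical names used for illustration only; they are not part of SaxaMLL.

```typescript
// Hypothetical token shapes for illustration only; not SaxaMLL's API.
type Piece =
  | { kind: "open"; tag: string }        // e.g. <tweet>
  | { kind: "close"; tag: string }       // e.g. </tweet>
  | { kind: "selfClosing"; tag: string } // e.g. <tweet />
  | { kind: "text"; content: string };   // e.g. hello my name is alex

// Returns the depth in the tree after each piece has been processed.
// Only open and close pieces change the depth (the scope-changing class).
function depthAfterEach(pieces: Piece[]): number[] {
  let depth = 0;
  return pieces.map((piece) => {
    if (piece.kind === "open") depth += 1;
    else if (piece.kind === "close") depth -= 1;
    // selfClosing and text pieces leave the depth unchanged.
    return depth;
  });
}

const depths = depthAfterEach([
  { kind: "text", content: "hi " },
  { kind: "open", tag: "tweet" },
  { kind: "text", content: "hello my name is alex" },
  { kind: "selfClosing", tag: "image" },
  { kind: "close", tag: "tweet" },
]);
// depths is [0, 1, 1, 1, 0]
```

Only the second and fifth pieces move the depth, which is exactly the split between the two classes.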

There are three types of events, each of which falls into one of the two classes above:

  1. 🔵 tagOpen = when a tag is opened.

  2. 🔵 tagClose = when a tag is closed.

  3. 🔴 update = when a child has been added to the current node.

The update event is really only useful when you need results immediately. Otherwise, the tagOpen and tagClose events are sufficient.

🔵 Events of the "scope-changing" type accept callbacks of the form (node: XMLNode) => {}. In the tagOpen case, the node parameter is the node whose scope we have just entered. In the tagClose case, it is the node whose scope we have just left.

🔴 Events of the "non-scope-changing" type accept callbacks of the form ([parent: XMLNode, child: XMLNode, isCommitted: boolean]) => {}. The parent parameter is the node that we're currently adding children to. The child parameter is the node that we're currently constructing.

The isCommitted flag is true when the update event actually adds the child to the parent. For example, text nodes will always have isCommitted as true.

On the other hand, the update event might fire when the child node hasn't been fully constructed yet (see Example with self-closing tag). In this case, isCommitted will be false, but you can still inspect the partial child node as it was when the update was requested.
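As a rough sketch of how the isCommitted flag can be used, the wrapper below turns a plain callback into an update callback that ignores partially constructed children. SketchNode is an assumed, simplified node shape (the real XMLNode may differ), and whenCommitted is a hypothetical helper, not something SaxaMLL provides.

```typescript
// Assumed node shape for illustration; the real XMLNode may differ.
interface SketchNode {
  type: "text" | "element";
  tag?: string;
  content?: string;
  attributes?: Record<string, string>;
}

// Same tuple layout as the update callbacks described above.
type UpdateArgs = [parent: SketchNode, child: SketchNode, isCommitted: boolean];

// Wraps a callback so that it only runs for committed children.
function whenCommitted(
  callback: (parent: SketchNode, child: SketchNode) => void
): (args: UpdateArgs) => void {
  return ([parent, child, isCommitted]) => {
    if (isCommitted) callback(parent, child);
  };
}

const seen: string[] = [];
const handler = whenCommitted((_parent, child) => {
  seen.push(child.type === "text" ? (child.content ?? "") : (child.tag ?? ""));
});

const root: SketchNode = { type: "element", tag: "root" };
// A partially built element: isCommitted is false, so the handler ignores it.
handler([root, { type: "element", tag: "imageSearch" }, false]);
// A committed text child: the handler records its content.
handler([root, { type: "text", content: "Nothing going on." }, true]);
// seen is ["Nothing going on."]
```

A wrapper of this shape could presumably be passed to do(...), provided its parameter types line up with the library's own.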

Example with text updates at the root level

Suppose our parser is chomping through this text:

I really need lobsters.

Let's define a simple callback:

executor.upon('update').do(([parent, child, isCommitted]) => {
    console.log(getText(parent));
});

If you want to be explicit about which level you want to fire on, you can also define:

executor.upon('update').for('root').do(([parent, child, isCommitted]) => {
    console.log(getText(parent));
});

Remember, for update events, the for(...) refers to whichever parent you're adding children to.

We parse the incoming stream, token-by-token:

while (true) {
    parser.parse(textDelta);
    // Without this `update` call, the update callback 
    // will not trigger FOR TEXT specifically.
    parser.update();
}
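The loop above is schematic: textDelta is left undefined and the loop never ends. A minimal sketch of the same pattern over a finite list of deltas might look like the following. DeltaParser, feed, and fakeParser are hypothetical names; the interface only mirrors the two calls used above, and the recording stand-in exists so that the sketch runs without the real parser.

```typescript
// Minimal interface covering only the two calls used in this section.
interface DeltaParser {
  parse(textDelta: string): void;
  update(): void;
}

// Feeds each delta to the parser, then requests an update,
// and stops when the deltas run out.
function feed(parser: DeltaParser, deltas: readonly string[]): void {
  for (const textDelta of deltas) {
    parser.parse(textDelta);
    parser.update();
  }
}

// A recording stand-in (not the real parser) that logs each call.
const calls: string[] = [];
const fakeParser: DeltaParser = {
  parse: (textDelta) => {
    calls.push(`parse:${textDelta}`);
  },
  update: () => {
    calls.push("update");
  },
};

feed(fakeParser, ["I really ", "need ", "lobsters."]);
// calls alternates: parse, update, parse, update, parse, update
```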

The events are fired at these points:

Example with text updates at an inner level

Suppose our parser is chomping through this text:

I really need <keyword>Maine lobsters</keyword>.

Let's define a simple callback:

executor.upon('update').for('keyword').do(([parent, child, isCommitted]) => {
    console.log(getText(parent));
});

Remember, for update events, the for(...) refers to whichever parent you're adding children to.

We parse the incoming stream, token-by-token:

while (true) {
    parser.parse(textDelta);
    // Without this `update` call, the update callback 
    // will not trigger FOR TEXT specifically.
    parser.update();
}

The events are fired at these points:

Example with text updates at nested levels

Suppose our parser is chomping through this text:

I really need <keyword>Maine lobsters</keyword>.

Let's define a simple callback:

executor.upon('update').for('root').do(([parent, child, isCommitted]) => {
    if (child.type === "text") console.log(getText(child));
});

executor.upon('update').for('keyword').do(([parent, child, isCommitted]) => {
    if (child.type === "text") console.log(getText(child));
});

Remember, for update events, the for(...) refers to whichever parent you're adding children to.

We parse the incoming stream, token-by-token:

while (true) {
    parser.parse(textDelta);
    parser.update();
}

The events are fired at the following points: 🟡 for keyword updates, and 🟣 for root updates.

Example with scope changes

Suppose our parser is chomping through this text:

I really need <imageSearch query="lobster">lobsters right now.</imageSearch>

Let's define these callbacks:

// tagOpen
executor.upon('tagOpen').for('imageSearch').do((node) => {
    console.log(node.tag);
});

// tagClose
executor.upon('tagClose').for('imageSearch').do((node) => {
    console.log(node.tag);
});

At the time of the tagOpen callback, you will have access to all of the attributes.

At the time of the tagClose callback, you will have access to all of the attributes, plus all the children of the imageSearch node.
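To illustrate the difference between the two moments, here is a small sketch using an assumed node shape. TreeNode and collectText are hypothetical stand-ins (the real XMLNode and getText may behave differently), and the two objects only model what each callback could see.

```typescript
// Assumed node shape for illustration; the real XMLNode may differ.
interface TreeNode {
  type: "text" | "element";
  tag?: string;
  content?: string;
  attributes?: Record<string, string>;
  children?: TreeNode[];
}

// Concatenates the text of a node and all of its descendants.
function collectText(node: TreeNode): string {
  if (node.type === "text") return node.content ?? "";
  return (node.children ?? []).map((child) => collectText(child)).join("");
}

// What a tagOpen callback could see: attributes, but no children yet.
const atOpen: TreeNode = {
  type: "element",
  tag: "imageSearch",
  attributes: { query: "lobster" },
  children: [],
};

// What a tagClose callback could see: the same attributes plus the children.
const atClose: TreeNode = {
  ...atOpen,
  children: [{ type: "text", content: "lobsters right now." }],
};

const openText = collectText(atOpen);   // ""
const closeText = collectText(atClose); // "lobsters right now."
```

In other words, work that depends only on attributes can start at tagOpen, while anything that needs the tag's contents has to wait for tagClose.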

We parse the incoming stream, token-by-token:

while (true) {
    parser.parse(textDelta);
}

Then the events are fired at these points:

Example with self-closing tag

Suppose our parser is chomping through this text:

Nothing going on.<imageSearch query="lobster" />

Let's define a callback on the self-closing tag:

// update
executor.upon('update').for('root').do(([parent, child, isCommitted]) => {
    if (isCommitted && child.type === "element") console.log(child.attributes);
});

Remember, for update events, the for(...) refers to whichever parent you're adding children to.

We parse the incoming stream, token-by-token:

while (true) {
    parser.parse(textDelta);
}

Then the events are fired here:

However, suppose we parsed the stream by calling the .update() method at every update:

while (true) {
    parser.parse(textDelta);
    parser.update();
}

Then all of these events are fired, but since our callback only acts on element children that have been committed to the parent, it only does anything on the last update (the blue one):

Errors

There are two types of errors:

  1. UNEXPECTED_TOKEN

  2. BAD_CLOSE_TAG

UNEXPECTED_TOKEN errors occur on malformed XML e.g. <twee<t>. On UNEXPECTED_TOKEN, an error node is created, and the rest of the input is collected into the content field of the error node.

BAD_CLOSE_TAG errors occur on mismatched opening and closing tags, e.g. <tweet></question>. In this case, an error node is collected, but the rest of the input is parsed as if the </question> had never been encountered. In other words, the opening tag always takes precedence over the closing tag.

When either error is encountered, an error event is emitted.
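The BAD_CLOSE_TAG rule can be sketched with a simple tag stack. This is an illustration of the recovery behavior described above, not SaxaMLL's implementation: TagToken, CloseResult, and applyTags are hypothetical names, and the input is assumed to be tokenized into open and close tags already.

```typescript
// Hypothetical tag tokens; parsing raw text is out of scope for this sketch.
type TagToken = { kind: "open" | "close"; tag: string };

interface CloseResult {
  closed: string[];    // tags closed by a matching close tag, in order
  errors: string[];    // one entry per mismatched close tag
  stillOpen: string[]; // tags left open at the end of the input
}

function applyTags(tokens: TagToken[]): CloseResult {
  const stack: string[] = [];
  const closed: string[] = [];
  const errors: string[] = [];
  for (const token of tokens) {
    if (token.kind === "open") {
      stack.push(token.tag);
    } else if (stack.length > 0 && stack[stack.length - 1] === token.tag) {
      stack.pop();
      closed.push(token.tag);
    } else {
      // Mismatch: record the error and otherwise ignore the close tag,
      // so the currently open tag stays in scope.
      errors.push(`BAD_CLOSE_TAG: </${token.tag}>`);
    }
  }
  return { closed, errors, stillOpen: stack };
}

// Models <tweet></question></tweet>
const result = applyTags([
  { kind: "open", tag: "tweet" },
  { kind: "close", tag: "question" },
  { kind: "close", tag: "tweet" },
]);
// result.errors is ["BAD_CLOSE_TAG: </question>"]
// result.closed is ["tweet"] and result.stillOpen is []
```

The stray </question> produces an error entry but does not close tweet, which is then closed normally by its own closing tag.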
