Jawk - Extension Facility

AWK, while an excellent text processing language, is limited as a general purpose language. For example, it is impossible to create a socket or display a simple GUI without external assistance, either from the shell or via extensions to AWK itself (e.g., gawk). To overcome this limitation, Jawk provides an extension facility.

The Jawk extension facility allows arbitrary Java code to be called as AWK functions in a Jawk script. These extensions can come from the user (developer) or from 3rd party providers (e.g., the Jawk project team). Jawk extensions are opt-in: the -ext flag is required to use them, and each extension must be explicitly registered with the Jawk instance via the -Djawk.extensions property (except for the core extensions bundled with Jawk).

Jawk extensions support blocking. You can think of blocking as a tool for extension event management. A Jawk script can block on a collection of blockable services, such as socket input availability, database triggers, user input, GUI dialog input response, or a simple fixed timeout. Together with the -ni option, action rules can then act on block events instead of input text. This leverages a powerful AWK construct, originally intended for text processing, to process blockable events. A sample enhanced echo server script is included in this article. It uses blocking to handle socket events, standard input from the user, and timeout events, all within a 47-line script (including comments).
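
To make the idea of blocking on events or a fixed timeout concrete, here is a minimal standalone sketch in plain Java. It is not Jawk code: the class BlockSketch and its post() and block() methods are hypothetical names chosen for illustration. It only shows the general shape of a call that returns an event string when a service becomes ready, or "Timeout" when nothing happens in time.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names exist in Jawk.
final class BlockSketch {
    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();

    // Called by a service thread when something is ready, e.g. "Stdin".
    void post(String event) {
        events.add(event);
    }

    // Waits for the next event, or returns "Timeout" once timeoutMillis elapse.
    String block(long timeoutMillis) {
        try {
            String event = events.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            return event != null ? event : "Timeout";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while blocking", e);
        }
    }

    public static void main(String[] args) {
        BlockSketch sketch = new BlockSketch();
        sketch.post("Stdin");
        System.out.println(sketch.block(100)); // an event is already queued
        System.out.println(sketch.block(100)); // nothing queued, so the timeout fires
    }
}
```

In a Jawk script, the string returned by such a call would be assigned to $0 so that ordinary pattern rules can match on it.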

Extensions must operate within Jawk's memory model. Therefore, extensions must use strings, numbers, and associative arrays to interface with Jawk scripts. For example, the socket creation extension (bundled with Jawk) passes a string handle back to the caller, which the script then uses to refer to the newly created socket.
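
One way an extension might hand Java objects to a script as strings is a handle table. The sketch below is a guess at the general technique, not the bundled socket extension's actual implementation; HandleRegistry and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Jawk: maps opaque string handles to Java objects.
final class HandleRegistry<T> {
    private final Map<String, T> objects = new HashMap<>();
    private final String prefix;
    private int nextId = 0;

    HandleRegistry(String prefix) {
        this.prefix = prefix;
    }

    // Stores the object and returns a string the script can keep in a variable.
    synchronized String register(T object) {
        String handle = prefix + "@" + (nextId++);
        objects.put(handle, object);
        return handle;
    }

    // Resolves a handle back to its object; unknown handles are rejected.
    synchronized T lookup(String handle) {
        T object = objects.get(handle);
        if (object == null)
            throw new IllegalArgumentException("Invalid handle: " + handle);
        return object;
    }

    // Forgets the handle; returns false if it was not registered.
    synchronized boolean release(String handle) {
        return objects.remove(handle) != null;
    }
}
```

The script never sees the Java object itself. It only passes the handle string back to later extension calls, which keeps everything within strings, numbers, and associative arrays.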

We will first go over an example that uses the extensions bundled with Jawk. Then, we'll cover creating a new extension from scratch.

Example - Bundled Extensions

As of V1.02, Jawk comes bundled with the following extensions: CoreExtension, SocketExtension, and StdinExtension. Please refer to their individual JavaDocs for a description of their APIs.

Instead of constructing a separate example for each of the three extensions, we will use a single example script that showcases every extension module.

Sample Script

The example script implements a simple echo server which also allows broadcast messaging via stdin input from the server process:
## to run: java ... -jar jawk.jar -ext -ni -f {filename}
BEGIN {
	css = CServerSocket(7777);
	print "(echo server socket created)"
}

## note: default input processing disabled by -ni
$0 = SocketAcceptBlock(css,
	SocketInputBlock(sockets,
		SocketCloseBlock(css, sockets,
			StdinBlock(
				Timeout(1000)))));
				## note: default action { print } disabled by -ni

# $1 = "SocketAccept", $2 = socket handle
$1 == "SocketAccept" {
	socket = SocketAccept($2)
	sockets[socket] = 1
}

# $1 = "SocketInput", $2 = socket handle
$1 == "SocketInput" {
	## echo server action:
	socket = $2
	line = SocketRead(socket)
	SocketWrite(socket, line)
}

# $1 = "SocketClose", $2 = socket handle
$1 == "SocketClose" {
	socket = $2
	delete sockets[socket]
}

## display a . for every second the server is running
$0 == "Timeout" {
	printf "."
}

## stdin block is last because StdinGetline writes directly to $0
## $0 == "Stdin"
$0 == "Stdin" {
	## broadcast message to all sockets
	retcode = StdinGetline()
	if (retcode != 1)
		exit
	for (socket in sockets)
		SocketWrite(socket, "From server : " $0)
	print "(message sent)"
}

Each extension function used in the script above is covered in some detail below:

As stated by the comments, -ni disables stdin processing (as provided by Jawk itself, not the StdinExtension) and the default blank rule of { print }. Disabling stdin processing is paramount to extension processing because, otherwise, it would be confusing, if not completely impossible, to multiplex extension blocking with Jawk's default stdin processing. Disabling the default blank rule allows for easy-to-read blocking statements (like the one in the sample script) without the weird side effect of printing the result.

Example Extension

Here, we build "FileExtension.java". The extension module consists of the following extensions: FileCreationBlock and FileInfo. The extension's FileCreationBlock polls for the existence of a file on a separate thread, but appears to simply block from the Jawk script's perspective. A FileExists function would naturally fit into this extension; however, it is already implemented in the CoreExtension module.

The code for FileExtension.java is as follows:

package org.jawk.ext;

import org.jawk.ext.JawkExtension;
import org.jawk.jrt.*;
import org.jawk.NotImplementedError;

import java.io.*;
import java.util.*;

// to run:
// java -Djawk.extensions="org.jawk.ext.FileExtension"
// 		... -jar jawk.jar -ext -ni -f {script.awk}

public class FileExtension extends AbstractExtension implements JawkExtension {

  // milliseconds between existence checks
  private static final int POLLING_DELAY = 300;

  public String getExtensionName() { return "File Support"; }
  public String[] extensionKeywords() {
	return new String[] {
		"FileCreationBlock",	// i.e., $0 = FileCreationBlock(aa, str, etc)
		"FileInfo",		// i.e., FileInfo(map, "filename")
	};
  }

  // one watcher per monitored filename
  private final Map<String,FileWatcher> file_watchers = new HashMap<String,FileWatcher>();
  private BulkBlockObject file_creation_blocker;

  public final void init(VariableManager vm, JRT jrt) {
	super.init(vm, jrt);
	file_creation_blocker = new BulkBlockObject("FileCreation", file_watchers, vm);
  }

  private final class FileWatcher extends Thread implements Blockable {
	private final String filename;
	private final File file;
	private boolean found = false;
	private FileWatcher(String filename) {
		this.filename = filename;
		this.file = new File(filename);
		start();	// assumed: the watcher begins polling as soon as it is created
	}
	public final void run() {
		while(true) {
			if (file.exists())
				synchronized(file_creation_blocker) {
					found = true;
					// assumed: wake the blocker, then end this thread
					file_creation_blocker.notify();
					break;
				}
			try{Thread.sleep(POLLING_DELAY);}catch(InterruptedException ie){}
		}
	}
	public final boolean willBlock(BlockObject bo) {
		return ! found;
	}
  }

  public final int[] getAssocArrayParameterPositions(String extension_keyword, int arg_count) {
	if (extension_keyword.equals("FileInfo"))
		return new int[] {0};
	else
		return super.getAssocArrayParameterPositions(extension_keyword, arg_count);
  }

  public Object invoke(String keyword, Object[] args) {
	if (false)
		;
	else if (keyword.equals("FileCreationBlock")) {
		for (Object arg : args)
			populateFileWatchers(arg);
		return file_creation_blocker.populateHandleSet(args, vm);
	}
	else if (keyword.equals("FileInfo")) {
		checkNumArgs(args, 2);
		return fileinfo((AssocArray) args[0], toAwkString(args[1]));
	}
	else
		throw new NotImplementedError(keyword);
	// never reached
	return null;
  }

  // accepts either a single filename or an associative array keyed by filename
  private final void populateFileWatchers(Object arg) {
	if (arg instanceof AssocArray) {
		AssocArray aa = (AssocArray) arg;
		for (Object o : aa.keySet())
			populateFileWatchers(o);
	} else {
		String str = arg.toString();
		if (! str.equals(""))
			if (file_watchers.get(str) == null || ! file_watchers.get(str).isAlive())
				file_watchers.put(str, new FileWatcher(str));
	}
  }

  private final int fileinfo(AssocArray aa, String filename) {
	File file = new File(filename);
	long last_modified = file.lastModified();
	if (last_modified > 0L) {
		Date date = new Date(last_modified);
		aa.put("last modified", date.toString());
		return 1;
	} else {
		aa.put("last modified", "");
		return 0;
	}
  }
}

Most of the code that registers the extension with Jawk via the JawkExtension interface is fairly easy to follow. extensionKeywords() returns the set of extension functions to be accepted by the Jawk parser, and invoke() maps extension keywords to the Java methods that do the work. init() and getAssocArrayParameterPositions() require some context with regard to FileCreationBlock and are discussed below.

Likewise, the code handling "FileInfo" is easy to follow. "FileInfo" maps to the fileinfo() method via invoke(). fileinfo() then reads File.lastModified() for the specified file and stores the result in the associative array under the "last modified" key. It returns 1 if successful and 0 otherwise (i.e., when lastModified() returns 0L).

How did invoke() know the first parameter is an associative array? getAssocArrayParameterPositions() enforces this by returning int[] {0} for the "FileInfo" extension, telling the Jawk parser to treat the first (0th) argument as an associative array. As a result, passing in a string as the first parameter causes a SemanticException. Note that for all other extension keywords, getAssocArrayParameterPositions() defers to the superclass implementation, which simply returns an empty array (meaning that no assumptions are made about which parameters are associative arrays).
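
The role of the returned position array can be illustrated with a small standalone sketch. This is an assumption-laden illustration and not how the Jawk parser is implemented: Jawk performs the check at parse time and reports a SemanticException, whereas the hypothetical AssocArrayPositionSketch below checks plain java.util.Map arguments at call time and throws IllegalArgumentException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; Jawk performs the equivalent check in its parser.
final class AssocArrayPositionSketch {

    // Stand-in for an extension's getAssocArrayParameterPositions().
    static int[] assocArrayPositions(String keyword) {
        if (keyword.equals("FileInfo"))
            return new int[] {0};
        return new int[0]; // no assumptions for any other keyword
    }

    // Rejects a call whose declared positions do not hold an associative array.
    static void validate(String keyword, Object[] args) {
        for (int pos : assocArrayPositions(keyword)) {
            if (pos >= args.length || !(args[pos] instanceof Map))
                throw new IllegalArgumentException(
                    keyword + ": argument " + pos + " must be an associative array");
        }
    }

    public static void main(String[] args) {
        validate("FileInfo", new Object[] { new HashMap<String, Object>(), "a.txt" });
        try {
            validate("FileInfo", new Object[] { "not an array", "a.txt" });
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```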

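Walking through the code for "FileCreationBlock", on the other hand, is more involved because "FileCreationBlock" requires the following supporting containers and classes: the file_watchers map, which holds one FileWatcher per monitored filename; the file_creation_blocker BulkBlockObject, which init() constructs over file_watchers; and the FileWatcher class, a Blockable thread that polls for its file every POLLING_DELAY milliseconds.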

With all of these support classes in place, not much more is required of the "FileCreationBlock" branch of invoke() than to populate file_watchers with FileWatcher objects for the files the user wishes to monitor, and to trigger file_creation_blocker to block on the set of FileWatchers in file_watchers.

Using BulkBlockObject makes developing blocking services easier than if it were done using BlockObjects and providing your own block() method. Extensions requiring multiplexing of potentially many blockables, like FileCreationBlock and Socket*Block, use BulkBlockObject, while blockable services which don't (like StdinBlock and Timeout) use BlockObject.
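
The polling-plus-notification pattern behind FileCreationBlock can be sketched without any Jawk classes. The code below is a standalone approximation under stated assumptions: FileCreationSketch, watch(), block(), and demo() are hypothetical names, a plain lock object stands in for BulkBlockObject, and the "FileCreation <filename>" event string is only a guess at the shape a script would see.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the FileWatcher / BulkBlockObject pairing; not Jawk code.
final class FileCreationSketch {
    private static final int POLLING_DELAY = 50; // milliseconds between checks
    private final Object lock = new Object();
    private final List<String> created = new ArrayList<>();

    // Starts a daemon thread that polls until the file exists, then wakes block().
    void watch(String filename) {
        Thread watcher = new Thread(() -> {
            File file = new File(filename);
            while (!file.exists()) {
                try {
                    Thread.sleep(POLLING_DELAY);
                } catch (InterruptedException e) {
                    return;
                }
            }
            synchronized (lock) {
                created.add(filename);
                lock.notifyAll();
            }
        });
        watcher.setDaemon(true);
        watcher.start();
    }

    // Blocks until any watched file appears, then reports it as an event string.
    String block() {
        synchronized (lock) {
            try {
                while (created.isEmpty())
                    lock.wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while blocking", e);
            }
            return "FileCreation " + created.remove(0);
        }
    }

    // Watches a temp file that does not exist yet, creates it, and waits for the event.
    static String demo() {
        try {
            File file = File.createTempFile("watched", ".tmp");
            if (!file.delete())
                throw new IOException("could not delete " + file);
            FileCreationSketch sketch = new FileCreationSketch();
            sketch.watch(file.getPath());
            if (!file.createNewFile())
                throw new IOException("could not create " + file);
            file.deleteOnExit();
            return sketch.block();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

From the caller's point of view block() simply waits, even though the actual work is periodic polling on other threads. This is the same illusion FileCreationBlock presents to a Jawk script.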

This was just a high-level description of the FileExtension, giving a brief introduction to some of the extension constructs and services required to build a useful blocking extension. If you wish to develop Jawk extensions, we recommend reading some of the extension source code bundled with Jawk, such as StdinExtension and SocketExtension, to get a better feel for writing extensions.

Jawk 1.0x vs. Jawk 0.1x

The initial release of Jawk (0.1x) allowed arbitrary Java code to execute within a Jawk script. For reasons outlined in the Jawk rewrite page, inline Java execution was abandoned to provide a Jawk that adheres to nearly 100% of the AWK specification. It may therefore seem like we're reverting to what we had originally. However, this is not the case. Jawk 1.0x differs from 0.1x in that all extension activity is encapsulated within the _EXTENSION_ intermediate code instruction. Little of the other code within Jawk deals with the execution of extensions. This keeps Jawk 1.0x from suffering the scope creep that Jawk 0.1x endured during its development.

Extension Naming

Please refrain from using the following extension names / $1-prefix values:

It is our intent to release implementations of these extensions at a later date, and using these names/prefixes will prevent your application from operating as designed once they are released.