Jawk - Overview

Jawk is a combination of Awk and Java such that Awk scripts can utilize Java services. That is, you can now construct Awk scripts that create Java objects and call their methods. This is all achieved with a natural extension of Awk syntax and semantics, along with some additional keywords and operators. However, the overall flavor of the scripts is still Awk.

Awk

Jawk implements as much of Awk as it can. I tried to keep it in line with the Awk standard set forth by The Open Group (http://www.opengroup.org/onlinepubs/007908799/xcu/awk.html). However:

Java Support

As mentioned above, Jawk exposes Java services such that they can be used in Jawk scripts. For example, to create a socket connection to www.google.com port 80:
BEGIN {
  s = new java.net.Socket("www.google.com", 80);
}
And, to obtain a reference to the current execution thread:
BEGIN {
  t = Thread.currentThread()
  ## similarly:
  t2 = THREAD
  ## THREAD is a special variable which is equivalent to Thread.currentThread()
}
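For comparison, here is a minimal plain-Java sketch of the two snippets above. The class and method names are hypothetical, and, so that it runs without network access, it connects to a local ServerSocket on an ephemeral port instead of www.google.com. Since THREAD is Jawk-specific, the sketch uses Thread.currentThread() for both forms.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicReference;

class JavaSupportExamples {
    // Same constructor the Jawk line `new java.net.Socket("www.google.com", 80)` calls.
    static boolean connect(String host, int port) throws IOException {
        try (Socket s = new Socket(host, port)) {
            return s.isConnected();
        }
    }

    // Thread.currentThread() (THREAD in Jawk) refers to whichever thread evaluates it.
    static String nameSeenBy(String threadName) throws InterruptedException {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread t = new Thread(() -> seen.set(Thread.currentThread().getName()), threadName);
        t.start();
        t.join(); // join() makes the write to `seen` visible to the caller
        return seen.get();
    }

    public static void main(String[] args) throws Exception {
        // A loopback server on an ephemeral port stands in for www.google.com:80.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            System.out.println(connect(server.getInetAddress().getHostAddress(), server.getLocalPort()));
        }
        System.out.println(Thread.currentThread().getName());
        System.out.println(nameSeenBy("worker-1"));
    }
}
```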
Other supported constructs are:

Exceptions

Unlike Java, Jawk does not require checked (non-runtime) exceptions to be caught or declared. Therefore, the following code segment:
BEGIN {
  Class.forName("DBDriver")
}
will execute successfully if the DBDriver class is found. Otherwise, the exception (a ClassNotFoundException) propagates up the call stack, and the Jawk runtime environment dumps the stack trace and terminates the thread.
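The Java sketch below shows what Jawk spares the script author: because Class.forName throws the checked ClassNotFoundException, Java code must catch it or declare it. The method name is hypothetical, and returning false on failure is just one possible choice; under Jawk, as described above, the exception would instead propagate and terminate the thread.

```java
class CheckedExceptionExample {
    // Class.forName declares the checked ClassNotFoundException, so Java's compiler
    // forces this try/catch (or a `throws` clause). Jawk does not.
    static boolean driverAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(driverAvailable("java.lang.String"));
        System.out.println(driverAvailable("DBDriver")); // false unless such a class is on the classpath
    }
}
```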

Java Threads

In a previous section, we obtained a reference to the current execution thread via Thread.currentThread() and THREAD. However, making a Thread do something useful requires subclassing Thread or implementing Runnable, and Awk scripts do not naturally lend themselves to either. Jawk overcomes this by allowing a script to implement interfaces and to refer to itself (as an interface implementation) with the this reference. The following code demonstrates these language enhancements:

implements Runnable

BEGIN {
  t = new Thread(this, "A new thread.")
  t.start();
}
function run(	i) {
  ## i is a local variable to run()
  for (i=0;i<5;i++)
	print "i = " i
}
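Roughly speaking, the script above corresponds to the following Java class, which implements Runnable and passes this to the Thread constructor. This is a comparison sketch, not a description of how Jawk executes scripts; the class name is hypothetical, and output is collected in a StringBuilder (rather than printed directly) so the result can be checked after join().

```java
class RunnableScript implements Runnable {
    private final StringBuilder out = new StringBuilder();

    // Mirrors the Jawk run() function; i is local to run().
    @Override
    public void run() {
        for (int i = 0; i < 5; i++) {
            out.append("i = ").append(i).append('\n');
        }
    }

    // Mirrors the BEGIN block: hand `this` to a new Thread and start it.
    String startAndWait() throws InterruptedException {
        Thread t = new Thread(this, "A new thread.");
        t.start();
        t.join(); // wait so the output written by t is complete and visible
        return out.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.print(new RunnableScript().startAndWait());
    }
}
```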
It would be nice, however, if other threads could use Awk input rules to process their input. Jawk answers this by providing a special READER variable, along with the new begin keyword (note: it is the lowercase version of BEGIN). As threads are spawned, the READER variable can be assigned an instance of a java.io.Reader subclass. A run() function then issues the begin statement to process the input available on READER. Note that BEGIN and END blocks are executed only by the main thread, never by other threads. Here is an example of a broadcast server:

import java.io.*;
import java.net.*;

implements Runnable;

BEGIN {
  ss = new ServerSocket(3456);
  ss_t = new Thread(this, "server socket accept thread");
  ss_t.start();
  print ss_t;
  pss[THREAD] = ps = System.out
}
function run() {
  if (THREAD == ss_t)
        runAcceptThread()
  else
        runReaderThread()
}
function runAcceptThread(        s, t) {
  while((s = ss.accept()) != null) {
        sockets[t = new Thread(this, s.toString())] = s;
        t.start();
  }
}
function runReaderThread(        s, ps) {
  s = sockets[THREAD];
  READER = new InputStreamReader(s.getInputStream())
  pss[THREAD] = ps = new PrintStream(s.getOutputStream())
  begin
  delete pss[THREAD]
  delete sockets[THREAD]
}

/^quit$/ {
  if (s) s.close(); else exit
  next
}
/^dump$/ {
  dump ps ; next
}
{
  broadcast($0)
}
function broadcast(line,        t) {
  synchronized(this) {
          for (t in pss)
                pss[t].println("-- " line);
  }
}
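The core of the broadcast server (a shared table of output streams guarded by a lock, plus a per-thread read loop that applies the rules) can be sketched in Java as follows. The Broadcaster class and its methods are hypothetical. The accept loop and sockets are replaced by a StringReader and an in-memory PrintStream so the sketch runs standalone, the quit rule simply ends the loop, and the dump rule is omitted.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

class Broadcaster {
    // Shared by all threads, like the global pss[] array in the script.
    private final Map<Thread, PrintStream> streams = new LinkedHashMap<>();

    synchronized void register(PrintStream ps) {
        streams.put(Thread.currentThread(), ps);
    }

    synchronized void unregister() {
        streams.remove(Thread.currentThread());
    }

    // Like broadcast(): the lock keeps iteration safe while other threads register or leave.
    synchronized void broadcast(String line) {
        for (PrintStream ps : streams.values()) {
            ps.println("-- " + line);
        }
    }

    // Rough analogue of runReaderThread() followed by `begin`: apply the rules line by line.
    void serve(Reader reader, PrintStream ps) throws IOException {
        register(ps);                          // pss[THREAD] = ps
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.equals("quit")) {     // /^quit$/ rule: stop reading
                    break;
                }
                broadcast(line);               // default rule
            }
        } finally {
            unregister();                      // delete pss[THREAD]
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new Broadcaster().serve(new StringReader("hello\nworld\nquit\nignored\n"), new PrintStream(buf, true));
        System.out.print(buf);
    }
}
```

Holding the lock while writing to every client keeps iteration safe against concurrent register and unregister calls, as synchronized(this) does in the script, but a slow client then delays everyone else.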

Threads and Variables

All variables are global except for the following variables, which are thread-local:

Threads and getline

The getline keyword allows reading from files or from the output of commands. But how should getline behave in a multi-threaded environment? For example, consider the following code fragment:
implements Runnable;
BEGIN {
  new Thread(this, "test 0").start();
  new Thread(this, "test 1").start();
  new Thread(this, "test 2").start();
}
function run(	i, a) {
  ## i and a are local to run(); as globals they would be shared by all threads
  for (i=0;i<2;i++) {
	getline a < "input.txt" ; print a
  }
}
Does Jawk print the first two lines of "input.txt" three times, or the first six lines of "input.txt"?

The answer is that Jawk treats this as a file opened locally to the thread. Therefore, Jawk prints the first two lines of "input.txt" three times.

To obtain the other behavior (printing the first six lines of "input.txt"), replace < with the new << operator, which opens a file reference shared by all threads.

Note: files opened locally by a thread are not automatically closed when the thread terminates. It is the responsibility of the script to close them. Also note that in a single-threaded environment, both operators (< and <<) behave the same.
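The difference between < and << can be modeled in Java as thread-local readers versus one shared reader. The sketch below is an analogy under that assumption, not Jawk's implementation, and readTwoLinesPerThread is a hypothetical helper. With thread-local readers, each of three threads sees lines 1 and 2; with a shared reader, the threads together consume lines 1 through 6, in an order that depends on scheduling.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class GetlineModes {
    // shared == false: like `getline a < file`, each thread opens its own reader.
    // shared == true:  like `getline a << file`, all threads read from one reader.
    // Each thread reads two lines, mirroring the loop in run().
    static List<String> readTwoLinesPerThread(Path file, boolean shared, int threads)
            throws IOException, InterruptedException {
        List<String> printed = Collections.synchronizedList(new ArrayList<>());
        BufferedReader sharedReader = shared ? Files.newBufferedReader(file) : null;
        List<Thread> workers = new ArrayList<>();
        for (int n = 0; n < threads; n++) {
            Thread t = new Thread(() -> {
                try {
                    BufferedReader in = shared ? sharedReader : Files.newBufferedReader(file);
                    try {
                        for (int i = 0; i < 2; i++) {
                            String line;
                            synchronized (in) { // one readLine() at a time per reader
                                line = in.readLine();
                            }
                            printed.add(line);
                        }
                    } finally {
                        // Thread-local readers are closed explicitly; nothing does it on thread exit.
                        if (!shared) in.close();
                    }
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }, "test " + n);
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) t.join();
        if (sharedReader != null) sharedReader.close();
        return printed;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path f = Files.createTempFile("input", ".txt");
        Files.write(f, Arrays.asList("1", "2", "3", "4", "5", "6"));
        System.out.println(readTwoLinesPerThread(f, false, 3)); // lines 1 and 2, three times
        System.out.println(readTwoLinesPerThread(f, true, 3));  // lines 1 to 6, order varies
        Files.delete(f);
    }
}
```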

getline and commands

The same dilemma holds true for getline when reading input from a command. For example:
implements Runnable;
BEGIN {
  new Thread(this, "test 0").start();
  new Thread(this, "test 1").start();
  new Thread(this, "test 2").start();
}
function run(	i, a) {
  ## i and a are local to run(); as globals they would be shared by all threads
  for (i=0;i<2;i++) {
	"cat input.txt" | getline a ; print a
  }
}
As with files, commands are executed locally to each thread by default. And, as with files, a new operator, |>, indicates that the command is opened globally, shared by all executing threads.

Command input streams, like file streams, are not closed when threads terminate; it is the responsibility of the script to close them.
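Under the thread-local default, each thread effectively runs its own copy of the command. A Java analogue, assuming a POSIX cat on the PATH, starts one process per thread; the class and method names are hypothetical, and the explicit close, destroy(), and waitFor() calls stand in for the cleanup the script is responsible for.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CommandPerThread {
    // Like `"cat input.txt" | getline a` with thread-local semantics: the calling
    // thread starts its own `cat` process and reads its first `count` lines.
    static List<String> firstLines(String path, int count) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("cat", path).start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while (lines.size() < count && (line = in.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            p.destroy();  // stop `cat` if it still has unread output
            p.waitFor();  // reap the process
        }
        return lines;
    }

    // Runs firstLines(path, 2) on several threads, like the three threads started in BEGIN.
    static List<List<String>> runInThreads(String path, int threads) throws InterruptedException {
        List<List<String>> results = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int n = 0; n < threads; n++) {
            Thread t = new Thread(() -> {
                try {
                    results.add(firstLines(path, 2));
                } catch (IOException | InterruptedException e) {
                    throw new RuntimeException(e);
                }
            }, "test " + n);
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) t.join();
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        String path = args.length > 0 ? args[0] : "input.txt";
        for (List<String> lines : runInThreads(path, 3)) {
            System.out.println(lines);
        }
    }
}
```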

Threads and Exceptions

Jawk handles an unhandled exception by dumping its stack trace to stdout and terminating the thread. This mirrors how the JVM handles uncaught exceptions, except that the JVM's default handler prints the stack trace to standard error.
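In Java, what happens to an uncaught exception is decided by the failing thread's uncaught-exception handler, and only that thread terminates. The sketch below, with a hypothetical class name, installs a Thread.UncaughtExceptionHandler so the exception can be inspected after join() while the calling thread keeps running.

```java
import java.util.concurrent.atomic.AtomicReference;

class UncaughtExample {
    // Starts a thread whose body throws, and returns the exception it died with.
    // This per-thread handler replaces the default behavior, which would print the
    // stack trace to System.err; either way only the failing thread terminates.
    static Throwable runFailingThread() throws InterruptedException {
        AtomicReference<Throwable> caught = new AtomicReference<>();
        Thread t = new Thread(() -> {
            throw new IllegalStateException("boom");
        }, "failing thread");
        t.setUncaughtExceptionHandler((thread, e) -> caught.set(e));
        t.start();
        t.join(); // the handler runs on `t` before it terminates
        return caught.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("caught: " + runFailingThread());
        System.out.println("main thread still running");
    }
}
```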


http://jawk.sourceforge.net