Jawk - Overview
Jawk is a combination of Awk and Java such that Awk scripts can utilize
Java services. That is, you may now construct Awk scripts
- that have GUI front ends (via AWT or Swing).
- that can connect to servers via Sockets.
- that can utilize Java Security libraries.
- that can access databases via JDBC.
- that are multithreaded.
This is all achieved with a natural expansion of Awk syntax and semantics,
along with some additional keywords and operators. However, the overall
flavor of the scripts is still Awk.
Awk
Jawk implements as much of Awk as it can. I tried to keep it in line
with the Awk standard set forth by The Open Group
(http://www.opengroup.org/onlinepubs/007908799/xcu/awk.html).
However:
- Jawk uses a recursive descent parser. The above Awk reference
provides an Awk grammar. However, I only used it to verify certain
constructs of the Jawk parser.
- The reference defines how Awk should treat its variables
with respect to whether the variable is a string, integer, double, etc.
Jawk does its best to mimic this behavior. However, it is not guaranteed
to conform 100% to this specification.
- Regular expressions are 100% Java regular expressions. In most common
usages, they behave the same as Awk regular expressions. In some cases,
however (for example, POSIX bracket classes such as [[:alpha:]]), they can
behave very differently.
- The ENVIRON variable is not supported (java.lang.System.getenv() was
deprecated in Java releases before 1.5). Call java.lang.System.getenv()
directly from the script to circumvent this deficiency.
- The spec refers to various POSIX standards that are simply not
exposed or available in the Java environment. This could cause
some incompatibility in certain behaviors.
- Operator precedence has not been verified against the spec and may not
match it exactly. If operator precedence is an issue, please use
parentheses to force the desired order.
- printf and sprintf in Awk do not report errors with respect to
whether parameter types match the conversions in the format string
(e.g., whether an Integer is provided for a %d conversion).
Jawk, on the other hand, uses the java.util.Formatter class for
formatting, which is not forgiving and throws
exceptions when format parameter types are mismatched. The script
can guard against this by wrapping calls with try/catch if necessary.
- Jawk makes heavy use of the Java 1.5 autoboxing feature. It is important
to keep this in mind when dealing with Java function parameter types.
For example, the following two ways of creating a server socket:
ss = new java.net.ServerSocket(3456)
and
ss = new java.net.ServerSocket(new Integer(3456))
behave the same.
- Jawk treats references of Strings, Integers, or Doubles as Awk variables,
and converts between these types regularly in order to make using them
as parameters to Java constructors/methods as easy as possible (without
requiring lots of type conversions). However, these conversion attempts
do not always produce the expected types, and exceptions will be
thrown if formal and actual parameter types are mismatched. Therefore,
pay particular attention to variable types when utilizing Java services.
(Please see the section below on Java Support for more information.)
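Two of the differences listed above can be reproduced in plain Java,
independently of Jawk. The following is a minimal sketch and not Jawk code;
the class and method names (JawkCaveats, firstMatch, formatFails) are made up
for illustration. It shows that Java's regex engine keeps the first
alternative that matches, where the POSIX regular expressions specified for
Awk prefer the longest match, and that java.util.Formatter rejects a String
argument for %d.

```java
import java.util.IllegalFormatException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JawkCaveats {

    // Returns the first substring of input matched by regex, or null.
    // Java tries alternatives left to right and keeps the first one that
    // matches; POSIX matching would prefer the longest alternative.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Returns true when java.util.Formatter rejects arg for the given format.
    public static boolean formatFails(String format, Object arg) {
        try {
            String.format(format, arg);
            return false;
        } catch (IllegalFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("foo|foobar", "foobar")); // foo
        System.out.println(formatFails("%d", "42"));            // true
        System.out.println(formatFails("%d", 42));              // false
    }
}
```

A script that passes a string-valued variable to a %d conversion would hit
the second case, which is why the try/catch guard mentioned above can be
useful.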
Java Support
As mentioned above, Jawk exposes Java services such that they
can be used in Jawk scripts. For example, to create a socket
connection to www.google.com port 80:
BEGIN {
    s = new java.net.Socket("www.google.com", 80);
}
And, to obtain a reference to the current execution thread:
BEGIN {
    t = Thread.currentThread()
    ## similarly:
    t2 = THREAD
    ## THREAD is a special variable which is equivalent to Thread.currentThread()
}
Other supported constructs are:
- import
- try/catch/finally
- this - to be used with the implements keyword, explained later
- implements - explained later
- new
- throw
- true/false/null
- synchronized
- default array values
Exceptions
Unlike Java, Jawk does not require checked (non-runtime) exceptions to be
handled. Therefore, the following code segment:
BEGIN {
    Class.forName("DBDriver")
}
will execute successfully if the DBDriver class is found. Otherwise,
the exception will propagate up the call stack, and the Jawk runtime
environment will dump the stack trace and terminate the thread.
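For comparison, here is roughly what the same call looks like in plain Java,
where the checked ClassNotFoundException must be either caught or declared.
This is only an illustrative sketch; the class and method names
(CheckedExceptionDemo, driverAvailable) are hypothetical, and "DBDriver" is
the placeholder class name used in the text above.

```java
public class CheckedExceptionDemo {

    // Java will not compile this method unless ClassNotFoundException is
    // caught here or declared with "throws". Jawk imposes no such rule.
    public static boolean driverAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(driverAvailable("java.lang.String")); // true
        System.out.println(driverAvailable("DBDriver"));
    }
}
```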
Java Threads
In a previous section, we obtained a reference to the current
execution thread via Thread.currentThread() and THREAD.
However, to create a Thread and make it do something
useful, Awk does not easily lend itself to having its scripts subclass Thread
or implement Runnable. Jawk overcomes this by allowing the script
to implement interfaces and to refer to itself (as an
interface implementation) with the this reference.
The following code demonstrates these language enhancements:
implements Runnable
BEGIN {
    t = new Thread(this, "A new thread.")
    t.start();
}
function run( i) {
    ## i is a local variable to run()
    for (i=0;i<5;i++)
        print "i = " i
}
It would be nice, however, if Awk input rules could be used to process
input by other threads. Jawk answers this by providing a special
READER variable, along with the new begin keyword
(note, it is the lowercase version of BEGIN).
As threads are spawned, the READER variable can be assigned an instance of
a java.io.Reader subclass. Then, a run() function issues
the begin statement to process the input available on the READER.
Note, BEGIN and END blocks are not executed by threads other than the
main thread. Here is an example of a broadcast server:
import java.io.*;
import java.net.*;
implements Runnable;
BEGIN {
    ss = new ServerSocket(3456);
    ss_t = new Thread(this, "server socket accept thread");
    ss_t.start();
    print ss_t;
    pss[THREAD] = ps = System.out
}
function run() {
    if (THREAD == ss_t)
        runAcceptThread()
    else
        runReaderThread()
}
function runAcceptThread( s, t) {
    while((s = ss.accept()) != null) {
        sockets[t = new Thread(this, s.toString())] = s;
        t.start();
    }
}
function runReaderThread( s, ps) {
    s = sockets[THREAD];
    READER = new InputStreamReader(s.getInputStream())
    pss[THREAD] = ps = new PrintStream(s.getOutputStream())
    begin
    delete pss[THREAD]
    delete sockets[THREAD]
}
/^quit$/ {
    if (s) s.close(); else exit
    next
}
/^dump$/ {
    dump ps ; next
}
{
    broadcast($0)
}
function broadcast(line, t) {
    synchronized(this) {
        for (t in pss)
            pss[t].println("-- " line);
    }
}
Threads and Variables
All variables are global except for the following variables, which
are thread-local:
- NR
- NF
- RS - default = "\r?\n"
- FS - default = "[ ]+"
- OFS - default = " "
- ORS - default = System.getProperty("line.separator")
- OFMT - default = "%.6g"
- CONVFMT - default = "%.6g"
- EOF
- READER
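The split between global and thread-local variables resembles the difference
between a shared object and a java.lang.ThreadLocal in Java. The following is
a rough sketch of that analogy only, not of how Jawk implements its
variables; the class name ThreadLocalVars and the names nr and total are
assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalVars {

    // Runs `threads` workers that each process `records` records. Returns
    // each worker's final thread-local count, followed by the shared total.
    public static int[] run(int threads, int records) {
        // One copy shared by all threads, like an ordinary global variable.
        AtomicInteger total = new AtomicInteger();
        // One independent copy per thread, like NR.
        ThreadLocal<Integer> nr = ThreadLocal.withInitial(() -> 0);

        int[] result = new int[threads + 1];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                for (int r = 0; r < records; r++) {
                    nr.set(nr.get() + 1);
                    total.incrementAndGet();
                }
                result[id] = nr.get();
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        result[threads] = total.get();
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(3, 4))); // [4, 4, 4, 12]
    }
}
```

Each worker sees only its own count of 4, while the shared total reflects
the work of all three threads.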
Threads and getline
The getline keyword allows reading from files or from the output of commands.
But how should getline behave in a multi-threaded environment? For example,
consider the following code fragment:
implements Runnable;
BEGIN {
    new Thread(this, "test 0").start();
    new Thread(this, "test 1").start();
    new Thread(this, "test 2").start();
}
function run() {
    for (i=0;i<2;i++) {
        getline a < "input.txt" ; print a
    }
}
Does Jawk print the first two lines of "input.txt" three times, or does
Jawk print the first six lines of "input.txt"?
The answer is that Jawk treats this as a file open local to the thread.
Therefore, Jawk will print the first two lines of "input.txt" three times.
To obtain the other behavior (print the first six lines
of "input.txt"), replace < with the new << operator. This is
a file reference available to all threads.
Note: upon thread termination, local file reads are not
automatically closed. It is the responsibility of the script to close
these files. Also note that in a single-threaded environment,
both operators (< and <<) behave the same.
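The difference between the two operators can be pictured in plain Java as
one reader per thread versus one reader shared by all threads. The sketch
below is an analogy under stated assumptions, not Jawk's implementation: the
class GetlineScopes and its methods are hypothetical, a temporary file stands
in for input.txt, and the collected lines are sorted so that the result does
not depend on thread scheduling.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GetlineScopes {

    // Writes "line 1" .. "line N" to a temporary file standing in for input.txt.
    public static Path sampleFile(int lines) {
        try {
            List<String> content = new ArrayList<>();
            for (int i = 1; i <= lines; i++) content.add("line " + i);
            Path file = Files.createTempFile("input", ".txt");
            Files.write(file, content);
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static BufferedReader open(Path file) {
        try {
            return Files.newBufferedReader(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Every worker reads linesEach lines. With shared == false each worker
    // opens its own reader (like <); with shared == true all workers read
    // from a single reader (like <<). Returns all lines read, sorted.
    public static List<String> read(Path file, int threads, int linesEach, boolean shared) {
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        BufferedReader common = shared ? open(file) : null;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                BufferedReader in = shared ? common : open(file);
                try {
                    for (int n = 0; n < linesEach; n++) {
                        String line = in.readLine();
                        if (line != null) seen.add(line);
                    }
                    if (!shared) in.close(); // per-thread readers are closed explicitly
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) w.join();
            if (common != null) common.close();
        } catch (InterruptedException | IOException e) {
            throw new IllegalStateException(e);
        }
        List<String> sorted = new ArrayList<>(seen);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        Path file = sampleFile(6);
        System.out.println(read(file, 3, 2, false)); // lines 1 and 2, three times each
        System.out.println(read(file, 3, 2, true));  // lines 1 through 6, once each
    }
}
```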
getline and commands
The same dilemma holds true for getline when reading input from
a command. For example:
implements Runnable;
BEGIN {
    new Thread(this, "test 0").start();
    new Thread(this, "test 1").start();
    new Thread(this, "test 2").start();
}
function run() {
    for (i=0;i<2;i++) {
        "cat input.txt" | getline a ; print a
    }
}
Like the file variety, the command variety behaves the same way
(commands are executed local to the thread). And, as with files,
a new operator |> is used to indicate that the command is
opened global to all executing threads.
And, as with files, the data input stream is not closed when threads
terminate. Therefore, it is the responsibility of the script to
close them.
Threads and Exceptions
Jawk handles an unhandled exception by dumping its stack trace
to stdout and terminating the thread. This mirrors how the JVM treats
an uncaught exception in a thread, except that the JVM's default handler
writes the trace to stderr.
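The JVM behavior referred to above can be observed with a few lines of plain
Java. This is a minimal sketch with hypothetical names (UncaughtDemo,
failingThreadTerminates): a worker thread throws an exception that nothing
catches, the JVM prints the stack trace, and only that thread ends while the
calling thread carries on.

```java
public class UncaughtDemo {

    // Starts a thread that throws an uncaught exception and waits for it.
    // Returns true if the worker has terminated and the caller is still running.
    public static boolean failingThreadTerminates() {
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("unhandled");
        }, "worker");
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        // The stack trace of the IllegalStateException is printed by the JVM.
        System.out.println(failingThreadTerminates()); // true
    }
}
```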
http://jawk.sourceforge.net