Jawk - AWK for Java


Jawk is an implementation of AWK in Java. Jawk parses, analyzes, and interprets and/or compiles AWK scripts. Compilation is targeted at the JVM.

Jawk runs on any platform which supports, at minimum, J2SE 5.


To use Jawk, download the application, copy the release jar to the jawk.jar file, and execute the following command:
java -jar jawk.jar {command-line-arguments}
If executing from an environment which does not support the -jar argument, you may use one of the following commands instead of the one above:

For Mac/Unix:
java -cp "$CLASSPATH:jawk.jar" org.jawk.Awk {command-line-arguments}

For Windows:
java -cp "%classpath%;jawk.jar" org.jawk.Awk {command-line-arguments}

For Windows with AWK script compilation:
java -cp "%classpath%;jawk.jar;bcel.jar" org.jawk.Awk {command-line-arguments}

Or, if you already have jawk.jar in your classpath:
java org.jawk.Awk {command-line-arguments}

For brevity, the document will continue to use the -jar argument version.

To view the command line argument usage summary, execute

java -jar jawk.jar -h
The output of this command is shown below:
java ... org.jawk.Awk [-F fs_val] [-f script-filename] [-o output-filename] [-c] [-z] [-Z] [-d dest-directory] [-S] [-s] [-x] [-y] [-r] [-ext] [-ni] [-t] [-v name=val]... [script] [name=val | input_filename]...

 -F fs_val = Use fs_val for FS.
 -f filename = Use contents of filename for script.
 -v name=val = Initial awk variable assignments.

 -t = (extension) Maintain array keys in sorted order.
 -c = (extension) Compile to intermediate file. (default: a.ai)
 -o = (extension) Specify output file.
 -z = (extension) | Compile for JVM. (default: AwkScript.class)
 -Z = (extension) | Compile for JVM and execute it. (default: AwkScript.class)
 -d = (extension) | Compile to destination directory.  (default: pwd)
 -S = (extension) Write the syntax tree to file. (default: syntax_tree.lst)
 -s = (extension) Write the intermediate code to file. (default: avm.lst)
 -x = (extension) Enable _sleep, _dump as keywords, and exec as a builtin func.
                  (Note: exec enabled only in interpreted mode.)
 -y = (extension) Enable _INTEGER, _DOUBLE, and _STRING casting keywords.
 -r = (extension) Do NOT hide IllegalFormatExceptions for [s]printf.
-ext= (extension) Enable user-defined extensions. (default: not enabled)
-ni = (extension) Do NOT process stdin or ARGC/V through input rules.
                  (Useful for blocking extensions.)
                  (Note: -ext & -ni available only in interpreted mode.)

 -h or -? = (extension) This help screen.

Jawk supports all of the standard AWK command-line parameters (-F, -f, and -v in the summary above). To enhance development and script execution over traditional AWK, Jawk also supports the command-line parameter extensions marked "(extension)" in the summary above. If -f is not provided, a script argument is expected in the [script] position.

Finally, any trailing name=val or input_filename parameters are consumed by Jawk and provided to the script via the ARGV/ARGC variables. The script can add entries to, or remove entries from, this array to modify the behavior of the interpreter or the compiled result.

If the parameter contains an =, Jawk treats it as a variable assignment. Otherwise, it is treated as a filename.
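The rule above can be sketched in a few lines of Java. This is only an illustration of the stated rule, not the logic in org.jawk.util.AwkParameters: the class name OperandSketch and its members are hypothetical, and the sketch assumes that the name is everything before the first = and the value is everything after it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not the code in org.jawk.util.AwkParameters.
final class OperandSketch {
    final Map<String, String> assignments = new LinkedHashMap<>();
    final List<String> filenames = new ArrayList<>();

    // Split trailing operands into name=val assignments and input filenames.
    static OperandSketch classify(String... operands) {
        OperandSketch result = new OperandSketch();
        for (String operand : operands) {
            int eq = operand.indexOf('=');
            if (eq >= 0) {
                // Assumption: the name ends at the first '='; the rest is the value.
                result.assignments.put(operand.substring(0, eq), operand.substring(eq + 1));
            } else {
                // No '=' present, so the operand is taken to be a filename.
                result.filenames.add(operand);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        OperandSketch r = classify("n=3", "data.txt", "sep=a=b");
        System.out.println("assignments: " + r.assignments);
        System.out.println("filenames:   " + r.filenames);
    }
}
```

In this sketch, "n=3" and "sep=a=b" are collected as assignments and "data.txt" as a filename.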

Note: Command-line parameters which result in non-execution of the script (i.e., -S, -s, -h, -?, and -z) cause Jawk to ignore filename and name=value parameters.

Jawk employs the org.jawk.util.AwkParameters class for command-line parameter management. Please refer to the Javadocs for more details.

If an invalid command-line parameter is provided, Jawk will throw an IllegalArgumentException and terminate execution.

Java Scripting API (JSR 223)

Jawk can be invoked via the JSR 223 scripting API (J2SE 6). The script API access mechanism was provided by Sun for previous versions of Jawk (0.14). To continue this support, Jawk implements a constructor similar to that used by previous versions.
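A minimal sketch of a JSR 223 lookup follows, using only the standard javax.script classes. The engine short name "jawk" is an assumption (the name actually registered by the Jawk script engine is not stated here), and what eval returns for an AWK script is likewise not specified, so the sketch simply converts the result to a string. The class and method names are hypothetical.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Hypothetical illustration of looking up a script engine through JSR 223.
final class JawkScriptingSketch {

    // Evaluate the script with the named engine, or return null when no such
    // engine is registered on the classpath.
    static String runIfAvailable(String engineName, String script) throws ScriptException {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        if (engine == null) {
            return null; // engine not found; nothing is evaluated
        }
        Object result = engine.eval(script);
        return String.valueOf(result);
    }

    public static void main(String[] args) throws ScriptException {
        // "jawk" is an assumed engine name, not one confirmed by this document.
        String out = runIfAvailable("jawk", "BEGIN { print \"hello\" }");
        System.out.println(out == null ? "No engine named \"jawk\" was found." : out);
    }
}
```

The null check matters because getEngineByName returns null rather than throwing when no matching engine is registered.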

Compilation to JVM Byte Code

Jawk provides compilation of AWK scripts to Java bytecode. In short, you'll need to download the Byte Code Engineering Library (BCEL) from Apache and add the bcel.jar file to your classpath. Also, to run the compiled result (by default, the class is named "AwkScript" and is located in the AwkScript.class file), you'll need to add the jrt.jar file to the classpath. For example:

For Mac/Unix:
java -cp "jrt.jar:$CLASSPATH" AwkScript

For Windows:
java -cp "jrt.jar;%CLASSPATH%" AwkScript

Note that you do not need the BCEL to execute the compiled result. The BCEL is necessary only to compile the script.

Please refer to Jawk Compiler Module for more detailed information on the compiler implementation. You may download the BCEL from http://jakarta.apache.org/bcel/.


As stated earlier, Jawk interprets AWK scripts in Java. This is a full implementation of AWK. Jawk also offers features which the original AWK does not provide, such as the command-line extensions listed in the usage summary. Because we're using Java, some differences exist in order to blend easily within the J2SE environment.

Code Quality Assurance

Jawk employs various methods to ensure software quality, several of which are listed below:

Regression Testing

All builds are executed against a suite of regression tests developed by the author. The original goal for developing these scripts was to cover as many of the intermediate opcodes as possible. However, several opcodes are not covered, for the reasons described below. _SLEEP_ and _DUMP_ are extensions which cannot be tested via the regression test script mechanism that is utilized. _CHECK_CLASS_ exists only when assertions are turned on (to verify that a KeyList exists on the operand stack during a for(x in y) statement). The rest involve executing commands on the host operating system, which, again, cannot be tested via the existing regression test script environment.

As for _EXEC_, we have not decided if the exec() extension is in its final form, or if we'll change it to, perhaps, include the current script context (variable space and runtime stack). Until major design decisions are made and implemented, it is premature to implement a test case for this opcode.

In the near future, we plan to construct a Jawk Extension Facility regression test suite, to avoid coupling the existing regression test framework with extension semantics. Until then, this opcode will remain out of the existing regression test framework.

As of this writing, there are 127 opcodes. Therefore, even with 7 opcodes not covered by the test suite, the regression process still covers 94% of the opcodes used by Jawk.
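The coverage figure follows directly from the two counts given above. As a quick check, here is a small Java sketch of the arithmetic; the class and method names are hypothetical.

```java
// Hypothetical helper that reproduces the coverage arithmetic from the text.
final class OpcodeCoverageSketch {

    // Percentage of opcodes exercised, given the total and the number left uncovered.
    static double coveragePercent(int totalOpcodes, int uncoveredOpcodes) {
        return 100.0 * (totalOpcodes - uncoveredOpcodes) / totalOpcodes;
    }

    public static void main(String[] args) {
        // 127 opcodes in total, 7 of them not covered by the regression suite.
        double pct = coveragePercent(127, 7);
        System.out.println("covered opcodes: " + (127 - 7) + " of 127");
        System.out.println("coverage percent: " + pct);
    }
}
```

With 120 of 127 opcodes covered, the ratio is roughly 94.5 percent, which the text reports as 94%.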

A future goal of the regression test suite is to exercise all of the abstract syntax tree classes. Currently, this is not considered in the regression test suite.

Semantic Analysis

Other versions of AWK will run through a script and issue a "runtime error" if a user-defined function is not found. Jawk does not. It attempts to resolve all function calls to defined functions at compile-time (after parsing the script and prior to assembling the intermediate code from the abstract syntax tree). This is necessary in order to produce intermediate code with branch statements fully resolved.
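The idea of resolving calls before code generation can be sketched as follows. This is not Jawk's implementation: the class name, the method names, and the choice of IllegalStateException are all assumptions made for illustration. The sketch only shows the general technique of comparing the set of defined function names against the names that are called.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of compile-time function name resolution.
final class FunctionResolutionSketch {

    // Return the called names that have no matching definition, in first-seen order.
    static Set<String> unresolved(Set<String> defined, List<String> called) {
        Set<String> missing = new LinkedHashSet<>();
        for (String name : called) {
            if (!defined.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    // Fail before any intermediate code is produced, rather than at run time.
    static void check(Set<String> defined, List<String> called) {
        Set<String> missing = unresolved(defined, called);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("undefined function(s): " + missing);
        }
    }

    public static void main(String[] args) {
        Set<String> defined = new HashSet<>(Arrays.asList("max", "trim"));
        List<String> called = Arrays.asList("max", "min", "trim", "min");
        System.out.println("unresolved: " + unresolved(defined, called));
    }
}
```

Because every call is matched to a definition up front, a later stage can assume each call site has a known branch target.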

Other versions of AWK provide command-line parameters to choose compile-time or run-time checks for function name resolution. Jawk does not, mainly to ensure semantic analysis is done for the reasons stated above. Also, removing these semantic checks would leave unresolved references, most likely causing NullPointerExceptions.

Other semantic checks include formal/actual parameter analysis and array/scalar operation verification. Again, these are necessary to produce coherent intermediate code.
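As one possible reading of formal/actual parameter analysis, the sketch below compares the number of arguments at a call site with the number of parameters a function declares. The exact rule Jawk applies is not described here, so this assumes the common AWK convention that a call may pass fewer arguments than declared (the extra formal parameters act as local variables) but not more. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a formal/actual parameter count check.
final class ArityCheckSketch {
    private final Map<String, Integer> formalCounts = new HashMap<>();

    // Record how many formal parameters a function declares.
    void define(String function, int formalCount) {
        formalCounts.put(function, formalCount);
    }

    // Assumed rule: a call is valid when the function is defined and the call
    // passes no more actual parameters than the function declares.
    boolean callIsValid(String function, int actualCount) {
        Integer formals = formalCounts.get(function);
        return formals != null && actualCount <= formals;
    }

    public static void main(String[] args) {
        ArityCheckSketch checker = new ArityCheckSketch();
        checker.define("join", 3); // e.g. function join(arr, sep, i), with i used as a local
        System.out.println("join with 2 args valid: " + checker.callIsValid("join", 2));
        System.out.println("join with 4 args valid: " + checker.callIsValid("join", 4));
    }
}
```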