Jawk accepts the -z argument to compile the script and the -Z argument to compile and execute the script. You may provide the -o argument to specify the output class name. By default, the name used is "AwkScript". Also by default, Jawk assumes no package name and writes the class file into the present working directory. To override this behavior, use the -d argument. This results in the assignment of a package name to the script, along with creation of the script class in the package-named directory.
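The relationship between a package name and the directory holding the class file follows the usual Java convention. The following is a minimal sketch of that convention only; the helper class, its method, and the package name `com.example` are hypothetical and not part of Jawk.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative helper (not part of Jawk): maps a package name and a class
// name to the relative path where a Java class file conventionally lives.
class ClassFileLocation {
    static Path classFilePath(String packageName, String className) {
        String fileName = className + ".class";
        if (packageName == null || packageName.isEmpty()) {
            // No package: the class file sits directly in the working directory.
            return Paths.get(fileName);
        }
        // Each dot-separated package component becomes one directory level.
        return Paths.get(packageName.replace('.', '/'), fileName);
    }

    public static void main(String[] args) {
        // Default case described in the text: no package name.
        System.out.println(classFilePath("", "AwkScript"));
        // Hypothetical package name, used only for illustration.
        System.out.println(classFilePath("com.example", "AwkScript"));
    }
}
```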
For the compiler to work, you must have the Apache Byte Code Engineering Library (BCEL) JAR file in your classpath. Download the JAR file from the Apache BCEL project site. Remember that the JAR file is embedded within the ZIP file you download from the site. Then, augment your classpath to contain the JAR file. Without the BCEL JAR file in your classpath, you will get a NoClassDefFoundError because the Awk compiler implementation looks for the BCEL support classes and services, which it cannot find.
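Before compiling, it can help to confirm that BCEL is actually visible on the classpath. The sketch below is a generic check, not something Jawk provides; `ClasspathCheck` is a hypothetical name, and `org.apache.bcel.generic.ClassGen` is used only as a representative BCEL class (which BCEL classes Jawk itself loads is not stated here).

```java
// Illustrative check (not part of Jawk): reports whether a class can be
// loaded from the current classpath, without initializing it.
class ClasspathCheck {
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class, or something it depends on, is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // A representative BCEL class, chosen for illustration.
        String bcelClass = "org.apache.bcel.generic.ClassGen";
        if (isAvailable(bcelClass)) {
            System.out.println("BCEL is available on the classpath.");
        } else {
            System.err.println("BCEL not found; add the BCEL JAR to the classpath before compiling.");
        }
    }
}
```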
The script class relies on the Jawk runtime environment for execution. As a result, you must include jrt.jar in the classpath when executing the script class. (Note: jawk.jar includes the classes contained in jrt.jar. Therefore, either jar file will do. However, jrt.jar is smaller, and thus a more favorable choice.)
To compile and execute script.awk, type:

For Mac/Unix:
    java -cp "jawk.jar:bcel.jar:$CLASSPATH" org.jawk.Awk -Z -f script.awk
For Windows:
    java -cp "jawk.jar;bcel.jar;%classpath%" org.jawk.Awk -Z -f script.awk

This results in the compilation of script.awk into AwkScript.class, and if there were no errors, the subsequent execution of AwkScript.class. A side effect of the -Z argument is the resultant AwkScript.class in the present working directory. Thus, to execute the script without reparsing and compiling, type:

For Mac/Unix:
    java -cp "jrt.jar:$CLASSPATH" AwkScript
For Windows:
    java -cp "jrt.jar;%classpath%" AwkScript
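The Mac/Unix and Windows commands differ only in the classpath separator (":" versus ";"). When launching the compiled script from Java code, `File.pathSeparator` supplies the right separator for the current platform. This is a sketch under assumptions: `RunCompiledScript` is a hypothetical helper, and the "." entry assumes AwkScript.class sits in the working directory and is not already covered by the existing classpath.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Illustrative helper (not part of Jawk): builds a platform-neutral
// command line for running the compiled AwkScript class.
class RunCompiledScript {
    static List<String> buildCommand(String... classpathEntries) {
        // File.pathSeparator is ":" on Mac/Unix and ";" on Windows.
        String classpath = String.join(File.pathSeparator, classpathEntries);
        return Arrays.asList("java", "-cp", classpath, "AwkScript");
    }

    public static void main(String[] args) {
        // "." is an assumption: it makes AwkScript.class in the working
        // directory visible alongside the Jawk runtime in jrt.jar.
        List<String> command = buildCommand("jrt.jar", ".");
        System.out.println(String.join(" ", command));
        // To launch it for real:
        // new ProcessBuilder(command).inheritIO().start().waitFor();
    }
}
```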
Extensions, such as _sleep, _dump, and _INTEGER, are supported. Keep in mind, however, that the arguments which enable most extensions must be provided at compile time. For example, to compile scripts which contain the _dump and _sleep keywords, use:

    java -cp "jawk.jar;bcel.jar;%classpath%" org.jawk.Awk -z -x script.awk

Executing

    java -cp "jrt.jar;%classpath%" AwkScript -h

outputs (within a usage statement) the extensions which are enabled.
Note: the -t extension is the only runtime, not compile-time, extension. Therefore, to use sorted key maps for associative arrays, the -t argument is needed along with the -Z argument, or when executing AwkScript itself.
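To see what a sorted key map changes in practice, the sketch below compares key iteration over a hash-based map and a sorted map from the Java collections. It assumes the sorted key map behaves like `java.util.TreeMap`; whether Jawk uses these exact classes internally is an assumption, and `KeyOrderDemo` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative comparison (not Jawk code): key iteration order of an
// unsorted map versus a sorted map holding the same entries.
class KeyOrderDemo {
    static void fill(Map<String, Integer> map) {
        map.put("pear", 3);
        map.put("apple", 1);
        map.put("fig", 2);
    }

    static List<String> keysOf(Map<String, Integer> map) {
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> unsorted = new HashMap<>();
        Map<String, Integer> sorted = new TreeMap<>();
        fill(unsorted);
        fill(sorted);
        // HashMap makes no ordering promise; TreeMap iterates in key order.
        System.out.println("unsorted keys: " + keysOf(unsorted));
        System.out.println("sorted keys:   " + keysOf(sorted));
    }
}
```

In AWK terms, this is the difference a script would observe in the order of keys visited by a `for (key in array)` loop.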
All italicized variable/function names are placeholders. The actual variable/function names are described in comments which are generally above the placeholders. Please refer to these comments to understand exactly what Jawk uses as the variable/function names.

    import org.jawk.jrt.*;
    import org.jawk.util.AwkParameters;
    import java.util.*;
    import java.util.regex.*;
    import java.io.*;

    public class AwkScript implements VariableManager {

        // use this field as the third argument to the AwkParameters
        // constructor when executing ScriptMain directly
        public static final String EXTENSION_DESCRIPTION = extension-description-string;

        private static final Integer ZERO = new Integer(0);
        private static final Integer ONE = new Integer(1);
        private static final Integer MINUS_ONE = new Integer(-1);

        public static void main(String args[]) {
            AwkScript as = new AwkScript();
            // this is why org.jawk.util.AwkParameters is in jrt.jar ...
            AwkParameters ap = new AwkParameters(AwkScript.class, args, EXTENSION_DESCRIPTION);
            // to send the error code back to the calling process
            System.exit(as.ScriptMain(ap));
        }

        // to satisfy the VariableManager interface
        // Note: field names here correspond to a global_N field
        // which are assigned upon compilation of the script.
        public final Object getARGC() {
            if (argc_field == null) return ""; else return argc_field;
        }
        public final Object getCONVFMT() {
            if (convfmt_field == null) return ""; else return convfmt_field;
        }
        public final Object getFS() {
            if (fs_field == null) return ""; else return fs_field;
        }
        public final Object getARGV() {
            return argv_field;
        }
        public final Object getOFS() {
            if (ofs_field == null) return ""; else return ofs_field;
        }
        public final Object getRS() {
            if (rs_field == null) return ""; else return rs_field;
        }
        public final void setFILENAME(String arg) {
            filename_field = arg;
        }
        public final void setNF(String arg) {
            nf_field = arg;
        }
        private final Object getNR() {
            if (nr_field == null) return ""; else return nr_field;
        }
        private final Object getFNR() {
            if (fnr_field == null) return ""; else return fnr_field;
        }
        public final void incNR() {
            nr_field = (int) JRT.toDouble(JRT.inc(getNR()));
        }
        public final void incFNR() {
            fnr_field = (int) JRT.toDouble(JRT.inc(getFNR()));
        }
        public final void resetFNR() {
            fnr_field = ZERO;
        }
        public final void assignField(String name, Object value) {
            if (name.equals("scalar1")) scalar1_field = value;
            else if (name.equals("scalar2")) scalar2_field = value;
            else if (name.equals("scalar3")) scalar3_field = value;
            ...
            else if (name.equals("scalarN")) scalarN_field = value;
            else if (name.equals("funcName1")) throw an exception;
            else if (name.equals("funcName2")) throw an exception;
            ...
            else if (name.equals("funcNameX")) throw an exception;
            else if (name.equals("assocArrayName1")) throw an exception;
            else if (name.equals("assocArrayName2")) throw an exception;
            ...
            else if (name.equals("assocArrayNameM")) throw an exception;
        }

        private JRT input_runtime;
        private HashMap regexps;
        private HashMap pattern_pairs;
        private int exit_code;
        private int oldseed;
        private Random random_number_generator;

        // global_N refers to all the global variables,
        // those which are defined by default
        // (i.e., ARGC, ARGV, ENVIRON, NF, etc.)
        // and vars declared by the script.
        // The _SET_NUM_GLOBALS_ opcode allocates
        // these fields.
        private Object global_0;
        private Object global_1;
        private Object global_2;
        ...
        private Object global_N;

        // Call this method to invoke the Jawk script.
        // Refer to the static main method implementation
        // and Javadocs on how to build the AwkParameters.
        // Use the public static String EXTENSION_DESCRIPTION
        // field (within AwkScript) as the third parameter
        // to the AwkParameters constructor to ensure proper
        // extension description in the usage statement.
        public final int ScriptMain(AwkParameters awk_parameters) {
            // local variables
            double dregister;
            StringBuffer sb = new StringBuffer();

            // Field Allocation
            // ----------------
            // Could have been done in the class constructor,
            // but placed here to ensure proper repeat initialization
            // if repeat execution is required
            // within the same JVM instance. Because
            // if these were within the class constructor, each of these
            // data structures / int values would have to be
            // reinitialized in some way anyway.
            input_runtime = new JRT(this);  // this = VariableManager
            regexps = new HashMap();
            pattern_pairs = new HashMap();
            oldseed = 0;
            random_number_generator = new Random(null);
            exit_code = 0;

            // script execution
            try {
                ///
                /// Compiled BEGIN and input rule blocks code here.
                /// (EndException is thrown when exit() is encountered.)
                ///
            } catch (EndException ee) {
                // do nothing
            }
            try {
                runEndBlocks();
            } catch (EndException ee) {
                // do nothing
            }
            return exit_code;
        }

        public void runEndBlocks() {
            double dregister;
            StringBuffer sb = new StringBuffer();
            ///
            /// Compiled END blocks code here.
            /// (EndException is thrown when exit() is encountered.)
            ///
        }

        // One of these exists for every function definition.
        // Arguments are reversed from its Jawk source.
        public Object FUNC_function_name(Object oN, Object oN-1, ... Object o2, Object o1) {
            Object _return_value_ = null;
            StringBuffer sb = new StringBuffer();
            double dregister = 0.0;
            ///
            /// Compiled function function_name code here.
            /// (A return() sets the _return_value_ and falls out of this
            /// function code block.)
            /// (EndException is thrown when exit() is encountered.)
            ///
            return _return_value_;
        }

        // The following is created for every optarg version of the function
        // call that exists within the script for this function_name.
        // X > 0 && N > X
        public final Object FUNC_function_name(Object oX, Object oX-1, ... Object o2, Object o1) {
            return FUNC_function_name(null, null, ..., null, oX, oX-1, ..., o2, o1);
        }
    }
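The control flow of ScriptMain in the template above can be exercised in isolation. The following is a self-contained sketch of the same pattern: an exception unwinds the BEGIN/input-rule code when exit is called, the END blocks still run, and the exit code is returned. `ExitFlowDemo`, `EndSignal`, and the log list are hypothetical stand-ins; this is not the code Jawk generates.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the ScriptMain flow (not generated Jawk code).
class ExitFlowDemo {
    // Stand-in for Jawk's EndException.
    static class EndSignal extends RuntimeException {}

    private int exitCode;
    final List<String> log = new ArrayList<>();

    // Models AWK's exit: record the code, then unwind the current block.
    private void exit(int code) {
        exitCode = code;
        throw new EndSignal();
    }

    int scriptMain() {
        // State is reset here rather than in the constructor so the same
        // instance can be run more than once.
        exitCode = 0;
        log.clear();
        try {
            log.add("BEGIN");
            exit(3);
            log.add("not reached");
        } catch (EndSignal e) {
            // do nothing: fall through to the END blocks
        }
        try {
            runEndBlocks();
        } catch (EndSignal e) {
            // do nothing: an exit inside END simply stops END processing
        }
        return exitCode;
    }

    void runEndBlocks() {
        log.add("END");
    }

    public static void main(String[] args) {
        ExitFlowDemo demo = new ExitFlowDemo();
        int code = demo.scriptMain();
        System.out.println(demo.log + " exit code " + code);
    }
}
```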
Here are several points that should be considered:
The org.jawk.jrt.JRT class contains helper functions which deal with AWK file / command IO management and most of the built-in function (e.g., rand()) functionality. Most of this (if not all) could have been inlined into the compiled result. However, this would have crippled maintainability, and thus it is kept within a separate Java service class. jrt.jar contains this and other service classes referred to by AwkScript.
org.jawk.util.AwkParameters
.
Arguments to FUNC_ function calls are reversed. This is because Jawk uses recursive descent parsing.
AwkScript is not declared final. Therefore, it can be subclassed, and all FUNC_ (full-arg version) definitions can be overridden. Note, however, that all FUNC_ parameters are of type Object. Therefore, the overriding class must police itself (at runtime) to ensure correct typecasting.
org.jawk.jrt.AssocArray. It does not subclass a Collections class, nor does it implement a Collections interface. Refer to the Javadocs for its API. (It does, however, implement the java.util.Comparator interface.)
AwkScript is not thread-safe. Therefore, it is not recommended that multiple threads access the same AwkScript instance.
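Several of the points above (reversed FUNC_ arguments, the optional-argument overloads, and overriding in a subclass) can be illustrated with a small stand-alone model. Everything in this sketch is hypothetical: `ScriptLike` stands in for a compiled script class, and `FUNC_join` for an AWK function declared as `function join(a, b) { return a "-" b }`.

```java
// Illustrative model (not generated Jawk code) of FUNC_ method shapes.
class ScriptLike {
    // Full-arg version. Parameters are in reverse order of the AWK
    // declaration join(a, b): the last AWK argument comes first.
    public Object FUNC_join(Object b, Object a) {
        return text(a) + "-" + text(b);
    }

    // Optional-argument version for a call that passes only "a";
    // the omitted argument is padded with null.
    public final Object FUNC_join(Object a) {
        return FUNC_join(null, a);
    }

    // Uninitialized (null) values are treated as the empty string here.
    static String text(Object o) {
        return o == null ? "" : o.toString();
    }
}

// A subclass may override the full-arg version, but because every
// parameter is an Object it has to check types itself at runtime.
class CheckedScript extends ScriptLike {
    @Override
    public Object FUNC_join(Object b, Object a) {
        if (a != null && !(a instanceof String)) {
            throw new IllegalArgumentException("a must be a String, got " + a.getClass().getName());
        }
        return super.FUNC_join(b, a);
    }

    public static void main(String[] args) {
        ScriptLike script = new CheckedScript();
        // AWK call join("x", "y") becomes FUNC_join("y", "x").
        System.out.println(script.FUNC_join("y", "x"));
        // AWK call join("x") goes through the optional-argument overload.
        System.out.println(script.FUNC_join("x"));
    }
}
```

Because the optional-argument overload delegates to the full-arg method, overriding the full-arg version in a subclass also affects calls made with fewer arguments.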
For the following AWK script:

    BEGIN { print fib(30) }
    function fib(i) {
        if (i<2) return i
        return fib(i-1) + fib(i-2)
    }

it takes 30 seconds to interpret the script while it takes a little over a second to execute the compiled script. This dramatic result is possible because this AWK script compiles into a collection of repeated (recursive) function calls and simple arithmetic operations, actions the JVM performs very quickly. However, if we run the grep.awk script (supplied in the examples) to count the number of "public" keywords in the Jawk source, the interpreted version takes roughly 17 seconds while the compiled version takes 14 seconds. We see only a slight efficiency increase because grep.awk spends the bulk of its time in IO operations, which are almost identical to what happens in the interpreted version.
The bottom line is that, on average, you'll see a noticeable increase in speed in the compiled version. If the script spends most of its time doing computations and calling AWK functions, then you'll see large gains in execution efficiency. If the script spends most of its time doing IO, then you won't see much gain.
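For a sense of the kind of work the fib script asks of the JVM, here is a hand-written Java equivalent with a simple timer. It is only a sketch: the compiled Jawk class operates on Object values through the runtime classes, so its timings will differ from this primitive-typed version, and `FibTiming` is a hypothetical name.

```java
// Hand-written Java equivalent of the AWK fib script, for illustration.
// This is not the code Jawk generates.
class FibTiming {
    static long fib(int i) {
        if (i < 2) return i;
        return fib(i - 1) + fib(i - 2);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = fib(30);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Timing varies by machine and JVM, so no figure is quoted here.
        System.out.println("fib(30) = " + result + " in " + elapsedMs + " ms");
    }
}
```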
org.jawk.backend to org.jawk.jrt. The exception to this is AwkParameters. It is the only class in jrt.jar that is not part of org.jawk.jrt (it is part of the org.jawk.util package).
_THIS_ is added to support function calls from within the compiled script. It is necessary because the JVM operand stack cannot be randomly accessed, and a local FUNC_ call requires the "this" pointer on the operand stack prior to its arguments.
_GOTO_
and _IFFALSE_
.
_REGEXP_PAIR_, _SUB_FOR_DOLLAR_REFERENCE_, and _SUBSTR_ did not pop all opcodes off the operand stack. To guard against these issues, assertions in the AVM check upon termination whether the stack is empty. If not, the contents of the stack are dumped to stdout and an AssertionError is thrown.
-Djawk.forceGreedyRS=true, or tailor the RS regex to not accept input in an ambiguous manner. Greedy RS consumption is turned off by default.
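The operand stack check on termination, described in the note on _REGEXP_PAIR_, _SUB_FOR_DOLLAR_REFERENCE_, and _SUBSTR_ above, can be sketched as a tiny stack holder that verifies it is empty when execution ends. This is a simplified model under assumptions, not the AVM's implementation; `StackCheckDemo` and its methods are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative model (not the AVM): an operand stack with a
// termination-time emptiness check.
class StackCheckDemo {
    private final Deque<Object> operands = new ArrayDeque<>();

    void push(Object value) {
        operands.push(value);
    }

    Object pop() {
        return operands.pop();
    }

    // Called when execution ends. A leftover operand means some
    // instruction pushed more than it popped.
    void checkEmptyOnTermination() {
        if (!operands.isEmpty()) {
            System.out.println("Leftover operand stack contents: " + operands);
            throw new AssertionError("operand stack not empty at termination");
        }
    }

    public static void main(String[] args) {
        StackCheckDemo vm = new StackCheckDemo();
        vm.push("operand");
        vm.pop();
        vm.checkEmptyOnTermination();
        System.out.println("stack was empty at termination");
    }
}
```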