Yorick FAQ

Introduction.
File I/O
Graphics
Yorick programming
Yorick development environment
Non-Unix Yorick
- Mac Gist doesn't "see" my CGM files!

Introduction.

This is a start on a "FAQ" file for Yorick. It has been edited together from Dave Munro's e-mail answers to puzzled users like myself. I've skipped install problems, since Dave fixes the install scripts as the problems come in. If there are remaining problems, you should email him directly. I'm open to suggestions, or even better, contributions, to this file. You can reach me, Ed Williams, at williams@icf.llnl.gov. Dave Munro can be reached at munro@icf.llnl.gov. Steve Langer, who is responsible for the Mac and Windows ports, is at shl@icf.llnl.gov.

Since the FAQ is quite large, you might consider copying it to your own file space if you intend to refer to it frequently, and periodically updating it from the version on ftp-icf.

Q How do I get the real part of a complex number?

Yorick data types like char, short, int, long, etc., all act as type converters when you use them as functions. Hence, if

   z= array(complex, some_dimensions);

then

   zre= double(z);

will be the real part of z. This programming construct has the great advantage that if z is *not* complex, it will also do the right thing.

If you know that z is complex, you can use the fact that complex numbers are also data structures with two members named "re" and "im". Thus,

   zre= z.re;
   zim= z.im;

extracts the real and imaginary parts of z. Even when you know z is complex, z.re is slower than double(z), so I encourage you to use the latter. I've never timed it, but double(-1i*z) might conceivably be faster than z.im, if you care about that.

Q Do index ranges HAVE to start at 1?

A I'm going to show you my reasoning on this one, and the tirade is going to last a while, so let me say that there *is* a Yorick programming trick that might work for you. I assume that the subtext of your complaint is that you do a lot of ffts, and you want to index from -n to n. Or that occasionally you want an index origin of 0. Yorick allows zero and negative indices in many (but not all) situations. Index zero means the last element of an array, and negative indices count backwards from that last element, so -1 is the next to last, -2 is before that, and so on. So if you are willing to store your array in this order:

[x(1), x(2), ..., x(N), x(-N), ..., x(-1), x(0)]

It will work exactly as you expect in many situations. So you can ask yourself whether it's more important for x(-N) to be the *first* element of your array, or for it to be *called* "-N". The roll function can be used to move around among these various choices. (You need it for plotting ffts anyway, since the canned compiled routines never use a nice order for plotting.)
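
The rule for non-positive indices can be written down in a few lines. The following is only a sketch in C of the mapping described above, not Yorick's actual implementation; the function name yorick_index_to_offset is made up for illustration, and the sketch simply asserts on out-of-range indices rather than modeling the situations where Yorick does not accept them.

```c
#include <assert.h>

/* Map a Yorick-style index i into an array of length n onto a 0-based
   C offset.  Indices 1..n count from the front; index 0 is the last
   element, -1 the next to last, and so on.
   (Hypothetical helper, for illustration only.) */
long yorick_index_to_offset(long i, long n)
{
  long offset = (i >= 1) ? i - 1 : n - 1 + i;
  assert(offset >= 0 && offset < n);   /* out of range in this model */
  return offset;
}
```

With the storage order shown above and N=3 (so n=7), index 3 maps to offset 2 and index -3 maps to offset 3, which is why x(-N) sits immediately after x(N).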

I tried and tried and tried to "fix" this. You can, in fact, look at the use_origins and orgsof functions, which are my "last best offer" technique for offering variable index origins. They do exactly what they are supposed to do (close to what I think you want), but I advise you strongly not to use them. If you do, you risk failure of any Yorick library function which indexes an array. And at some unspecified future time I intend to remove that "functionality" entirely.

The problem is that with unknown index origins, you are forced to check the origin of every array you ever index in every function you ever write. For example, if you write an integration routine that needs to reference the first, second, third, etc. elements of the array you are integrating, that function must check to see what number it needs to use as an index to get the first element of the array. The use_origins function attempts to simplify this by allowing you to say, "No, no, in *this* routine, I'll say 1 to get the first index, so ignore how the array is marked." It fails in practice because I just don't have the discipline to use it in nearly every routine I write. And since you need it so frequently, and it looks so strange as a sort of preamble to 90% of all library functions, I had to ask myself what it was giving me. It gets down to haggling over a name -- do I really want to spend the rest of my life checking to see whether the *name* of the first element of the second index of the Ylm array is -l or 1 or 0? My answer is no. I really don't care. I don't want to think about it. I want the *first* element of every array to be called "1".

There are actually two issues here. The first is whether every array dimension is actually a pair [length,origin], or just a length. I'm sure that's what you want. That's what the use_origins mess is all about. The second is, if you ignore the stated index origin or have no origins at all (as I claim you must to write any library routine that uses array indices), what do you choose as the origin, 0 or 1? I also experimented heavily with that choice (in fact, my old Plan code was 0-origin). At one point, I had a Yorick function which allowed you to choose whether the default index origin was 0 or 1. It turns out that one was a complete disaster, making it nearly impossible to write any code at all -- a typical problem is that the list of index origins is itself an array, and you can't index into that one without knowing how to get its first element! So you need *another* function returning a scalar value, which you *also* need to call in nearly every library function...

Another major design problem arises from the use of array index origins: What rule would you like to apply for the index origin of the result of a binary operation between two arrays of the same length, but different index origins (or even the same non-default index origin, for that matter)? When you broadcast a dimension of an array to match an existing array, does the broadcast dimension inherit index origin as well as length? How about for a subset of the indices of an array? How many times have you been burned in Basis by passing x(4:9) into a function as its pp parameter and having the function code blow up because you referred to pp(1), when pp(4) was the first element of the parameter on this call (I've gotten nailed on that one a lot)? How about for the result of a finite difference index range function like psum -- or worse, like dif or zcen or cum, where the length of the index changes by one? And if even x(4:9) is a problem, what are the indices of x(4:9:2)? All of these need ad hoc rules which are difficult to remember -- but which you *have* to remember and use all the time. Again, if the rule is "the first element of an array is called 1", there is never any question -- in fact, I claim that's the *only* rule which doesn't automatically require a dozen other arbitrary rules that every user needs to keep track of.

And what counterbalances all that complexity? I get to type Ylm(3,-3) to see l=3, m=-3? Or when I do an fft, I get to type a negative frequency index instead of a post-Nyquist frequency index? Sorry, it just doesn't weigh enough to increase the size and complexity of the interpreter, slow it down noticeably, and make it dramatically harder to comprehend and use into the bargain.

I agonized long over the choice between 0 and 1 as the index origin. Zero is far and away the best choice for programming, since multidimensional index calculations are reduced to their most obvious form with that choice. Yorick originally used 0-origin indexing. On the other hand, calling the first element of an array "0", the second element "1", and so on, seems to me to require slightly more explanation to a novice. Furthermore, computing multidimensional index offsets is exactly the sort of exercise I *don't* want to do much in Yorick (I have to admit I'm forced to do it sometimes anyway). The clincher was that 1 is the default choice in Fortran, and I consciously decided to make Yorick indexing resemble Fortran indexing in several other ways (more agony, different story, so I'll spare you).

Now that I've given you a (small) sample of the problems you buy into when you introduce index origins into Yorick, let me explain why what is so easy and problem free in Fortran 77 is such a serious -- I believe unsolvable -- problem for Yorick:

Yorick has no declarative statements.

The reason Fortran succeeds is that each subroutine gets to have its own view of the data arrays which are passed in as arguments. Each is in fact *required* to state how every parameter is to be indexed. As a side effect, a library routine that needs to integrate your array never knows how *you* index it -- it picks the index origin it wants to use, and even decides how many elements it thinks your array has. It is common Fortran programming practice to lie about both (you often call library functions to operate on fragments of larger arrays).

A profound design goal in Yorick was to let every array carry the knowledge about its shape with it, rather than forcing every routine to calculate it, and forcing every interface to provide parameters for that shape calculation. Ultimately, that design goal conflicts with the notion of index origins, and I decided to sacrifice index origins rather than eroding the "every array knows its own type and dimensions" principle. The organization of a multidimensional array is indispensable (3 groups of 4 items is completely different from 4 groups of 3 items, and you need to know which it is), but the name of the first item in each row or column is not (everybody can find the first item, whether they call it "1" or "-5"). When every variable carries around its own description, that description must be the absolute bare minimum -- else you risk the possibility that it will be rejected as a candidate for an operation because its *description* didn't match the model required by the operation, and you need to provide lots of baggage for altering descriptions. I want to concentrate on data, not descriptions of data.

Another profound consequence is that in a Yorick program, I want you to avoid using any array indices at all. This is actually possible a large fraction of the time. You should be thinking about the array as a shape, not as a function that takes a sequence of integers into a corresponding sequence of values. A lot of the fiddling around with indices you would do in a Fortran program is better done with Yorick's finite difference range functions like dif and zcen. I admit that this ultimate goal of index-free programming will never, should never, actually be reached -- but I strive in that direction. Given that view of an array, I felt it was extremely important to have a uniform way to refer to the first and last elements of an array. I settled on interpreting non-positive indices as relative to the last element. I've tried to be as symmetric as I can -- there's almost always a way in Yorick to refer to the *last* element of an array, or the *final* dimension of a multidimensional array, that is about as easy as referring to the first, and which does not require calling any additional functions to check on the length or dimensionality.

I've thought hard about a compromise position, as well. Namely, I could provide Yorick with optional declarative statements, so that you could locally tell Yorick about the shape and index origin, like in Fortran. In the future, I can imagine this would also pave the way for a higher performance Yorick, since if you knew more about data types and dimensionality, you could potentially precompute which virtual functions had to be called to perform the various operations. I doubt the optional declaratives are worth the additional complexity. The way to improve Yorick is to make it smaller and simpler, not bigger and more complicated. As far as higher performance, I want to spend my time improving and simplifying the process of linking compiled Fortran and C routines into Yorick. I *know* I can do that, whereas I don't see exactly where declarative statements would lead, and I have a strong suspicion that I wouldn't like it much when I got there.

I hope I've convinced you that I'm not just being obstinate about adding a simple feature you want. If you have any thoughts on index origins you think I haven't considered, I'd be glad to listen. I'm sorry I can't provide you with a feature you think is crucial, and I'd be glad to further clarify any of the preceding problems I've had doing what you think you want. My advice is to learn to think about your arrays a little differently in Yorick than you do in Fortran. My position is that I'm trying to save the deepest features of your Fortran arrays in the simplest, easiest to use, most efficient interpreted environment I can provide. After years of effort studying and experimenting with this question, my conclusion is that you are better off letting go of index origins, and concentrating on what you can have instead. I hope that Yorick offers you a few other features that will compensate you for the loss.

Q Writing structs and pointers to binary files.


>1) I want to save a C struct consisting of several members to the file
>
>struct block {
>  int is, ie, js, je, ks, ke;
>  double *x;    /* dynamical storage for three dimensional array
>                   acsessed like fortran */
>  double *y;
>  double *z;
>}
>
>Can you give me just a short outline of how to do it using yoricks
>binary IO?


  struct block {
    int is, ie, js, je, ks, ke;
    pointer x, y, z;  /* could say double *x, etc, but this makes
                         it clear that Yorick pointers are all the
                         same type -- the object itself is the only
                         thing that knows its own data type */
  }

  x= y= z= array(something, somedims);
  one_block= block(is=15, ie=16, js=17, je=18, ks=19, ke=20,
                   x=&x, y=&y, z=&z);
  f= createb("myfile.pdb");
  save, f, one_block;

should do the trick. With the current implementation of pointers, you need to let Yorick allocate the x, y, and z arrays, not your compiled code. You could get around this limitation using the reshape function, but letting the interpreter do all your memory management is closer to my own model of how the world should work. You can find further examples of writing pointers and structs in Yorick/include/testb.i; if your struct will only appear in records, I believe you need to "write" the struct itself to your file before you add the first record (i.e., before the first add_record call -- the problem is that once you start writing records, you can't add any new data structure types to the file). There should be an example of this in testb.i.

Two caveats about writing pointers to files: (1) Yorick's pointers can be read only by Yorick; they do not conform to the "PDB standard" (although I don't think the official PDB software is in very wide use). (2) My own thoughts about pointers and their implementation are in a state of flux.

The combination of pointers and structs that is so powerful in C just does not work quite right in the Yorick language. The problem is mostly that there is no obvious way to "vectorize" the address-of and dereference operators & and * -- you wind up needing explicit loops all the time, which flushes Yorick's performance down the toilet. Furthermore, Yorick's extract member indexing operation (the . or -> operator) is inherently much slower than its other indexing operators, since the parser has no idea how to make the correspondence between the member name and its offset in the struct -- that expensive association has to be deferred to runtime. So I recommend you not get too carried away with structs and pointers in your Yorick code. The Yorick extern/local scoping rule and its high performance with arrays of numbers drive the design of my own Yorick programs much more than any superficial resemblance to C (which is still my favorite compiled programming language by far).

The point is that although you *can* use identical data structures in the interpreted Yorick and compiled C parts of your program, you rarely *should*. What's convenient in C is cumbersome in Yorick and vice-versa, and the amount of work to convert between the two representations is trivial.

Why am I blithering to you about all this theory?

I hope you will avoid writing pointers to your binary files. Yorick will do it, and I'm even working to improve that capability, but pointers should be written to files only in cases of dire emergency. The only emergency I know of is the need to write history records containing arrays whose shape varies from one record to the next. Instead of writing your block struct to the file, I urge you to consider writing the x, y, and z arrays directly. Assuming that the int members of the block struct describe the dimensions of x, y, and z, you may not need to write them to the file at all -- you are better off using the dimsof operator to retrieve array dimensions, which are (correctly) stored in the file's metadata.

(As long as the struct doesn't contain pointers, go ahead and write it to the file -- but of course, essentially all useful C structs contain pointers!)

If you start writing pointers to your files, it will be painful to convert them to another self-descriptive format, should the need arise; if you avoid pointers, you are keeping your options open. So if you are permanently archiving results, don't use pointers, while if you are just making temporary files for use by Yorick in the next few months, you can feel a little freer.

Q Can I read from a binary file using strides?


>2) How can i access the saved data using a stride?
>For example read of x only (2:8, 4:5, 3)


   f= openb("myfile.pdb")
   ...
   xpart= f.x(2:8, 4:5, 3);

reads just the selected elements of x; index lists are fine as well, and you can put f.x(2:8,4:5,3) on the LHS of an assignment to write just those elements of x. (You will need to use the add_variable function instead of save if you want to declare a variable without writing any of it. You can then write it in pieces using assignment statements.)

Yorick is currently not perfect about reading the minimum possible number of disk blocks when it retrieves parts of variables. In particular, if you are trying to read a very sparse subset of a very large array, you might be better off using an explicit loop to be sure Yorick doesn't access too much more of your file than necessary. If you do that, remember that the final index has the longest stride -- loop over that, don't mess around looping over the first couple of indices.
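
To see why the final index is the one to loop over, it helps to write out where an element lives in storage. The sketch below assumes the first-index-fastest storage order implied by "the final index has the longest stride"; the helper names offset3 and stride_of_index are invented for this illustration and are not part of Yorick.

```c
#include <assert.h>

/* 0-based storage offset of element (i, j, k), each index starting at 1,
   in an n1 x n2 x n3 array stored with the first index varying fastest.
   (Hypothetical helper, for illustration only.) */
long offset3(long i, long j, long k, long n1, long n2, long n3)
{
  assert(i >= 1 && i <= n1 && j >= 1 && j <= n2 && k >= 1 && k <= n3);
  return (i - 1) + n1 * ((j - 1) + n2 * (k - 1));
}

/* Distance in elements between neighbors along index 1, 2, or 3. */
long stride_of_index(int which, long n1, long n2)
{
  assert(which >= 1 && which <= 3);
  return which == 1 ? 1 : (which == 2 ? n1 : n1 * n2);
}
```

For a 10 x 6 x 4 array, stepping the final index by one moves 60 elements through the file, while everything belonging to one value of the final index sits in a single contiguous run of 60 elements. Looping over the final index therefore turns a sparse read into a few compact ones.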

In general, if you work with extremely large arrays (compared to the amount of memory on your machine), you may want to consider putting explicit loops into your Yorick functions in order to break the calculations into reasonably sized chunks -- Yorick expressions need lots of memory for temporaries, and you need to be careful not to utter a simple Yorick statement that exhausts your memory. (Typical complaint: "I have 64 Mbyte of memory on my machine. Why does Yorick run out of memory -- all my arrays total only 49 Mbyte, and I turned off all my other programs...")

Q Can I get Yorick's portable binary file capability into my own code?

I just reread this paragraph, and it sounds like you want to have your code write out a binary file for Yorick to read. I have to confess that Yorick's binary file software is tangled up in the interpreter pretty thoroughly (I plan to fix that, but it won't happen for quite a while -- it's a big project). You can check out netCDF (unidata.ucar.edu: pub/netcdf/netcdf.tar.Z) or PDB (part of coral.llnl.gov: pub/pact/pact*) for subroutine packages that you could load into your code to write a portable self-descriptive binary file Yorick can read. Both these packages have problems (though fewer than the third, more popular, HDF package from NCSA). Your other alternatives are to continue teaching Yorick to read your own format (in which case you need to abandon any hope of writing pointers), or to embed your whole code into Yorick. The latter solution is a drastic step, but you can then forget about I/O entirely and just let the interpreter do it.

Q Tell me more about Yorick's internals

What is a Yorick variable? That is, when you say foo=27, then later refer to foo, what's going on in the C world? The answer is that it is a "Symbol", a data structure defined in ydata.h. There is a single array of symbols defined as:


extern Symbol *globTab;       /* global symbol table, contains any
                                 variable referenced by a Function */

which is actually Symbol globTab[number_of_variable_names]. Actually, I allocate globTab in multiples of a reasonably large number of variables, so globTab is usually a little longer than this. There is a unique element of this table for every identifier the Yorick parser has ever encountered; if you type foo and the parser has never seen the name "foo" before, globTab grows by one. If necessary, it allocates a larger globTab array, copies the entire old one into the first part of the new one, and discards the old one.

Because the entire globTab might move whenever you define a new variable name, it does you no good to remember a Symbol* -- instead you remember the index of your variable in globTab. That index will never change because Yorick never forgets a variable name. Thus, a variable name is equivalent to an index into globTab:


   variable name <--> index into globTab
       foo       <-->       1492

Like Atoms in X, it pays to remember the globTab index (1492) instead of the name string ("foo"); comparisons of integers (longs, actually) are much easier than strings. There is a hash table (called globalTable) that stores the name-index correspondence. You use the Globalize function to find the index if you know the name:


extern long Globalize(const char *name, long n);

The n parameter to Globalize should be 0 if name is an ordinary 0-terminated string. However, if name is a part of a larger text buffer, you can specify n>0 to consider exactly n characters as part of the name, ignoring any that follow. Thus,

   Globalize("foo",0L)  -->  1492

The Globalize function does more than just look up foo -- if foo is not found, it defines a new variable named "foo" (with value [] or nil initially). For compiled code that is looking for something it needs, this is always what you want. (If you wanted a "look but don't touch" lookup, you'd have to use hash lookup routines in hash.h on the globalTable -- I don't have an encapsulated version of that operation.) After this call to Globalize, you know that globTab[1492] is always the current Symbol for the variable foo (but the base globTab pointer may change whenever new variables are encountered).
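
The "remember the index, not the pointer" rule is easy to demonstrate with a stand-alone model. The following sketch is not Yorick source: the Sym type and the lookup_or_create, set_value, and get_value functions are invented here, a linear search stands in for the globalTable hash lookup, and a long value of 0 stands in for nil. It only shows that an index handed out once stays valid while the table itself is reallocated and moved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Model of one symbol table entry (hypothetical, not Yorick's Symbol). */
typedef struct { char *name; long value; } Sym;

static Sym *table = NULL;            /* plays the role of globTab */
static long table_len = 0, table_cap = 0;

/* Return the index of name, creating a new entry (value 0, standing in
   for nil) if the name has never been seen.  Linear search is used here
   only to keep the sketch short. */
long lookup_or_create(const char *name)
{
  long i;
  for (i = 0; i < table_len; i++)
    if (strcmp(table[i].name, name) == 0) return i;

  if (table_len == table_cap) {
    /* Grow: allocate a larger table, copy the old one, discard it. */
    long new_cap = table_cap ? 2 * table_cap : 4;
    Sym *bigger = malloc((size_t)new_cap * sizeof(Sym));
    assert(bigger != NULL);
    if (table_len > 0) memcpy(bigger, table, (size_t)table_len * sizeof(Sym));
    free(table);
    table = bigger;
    table_cap = new_cap;
  }
  table[table_len].name = malloc(strlen(name) + 1);
  assert(table[table_len].name != NULL);
  strcpy(table[table_len].name, name);
  table[table_len].value = 0;
  return table_len++;
}

/* Always go through the index; never keep a Sym* across a call that
   might add a name. */
void set_value(long index, long v) { table[index].value = v; }
long get_value(long index) { return table[index].value; }
```

A caller that saved the index for "foo" before twenty more names were added still gets the same index, and the same value, afterwards, even though the table's base address has changed several times in between.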

Besides globTab, Yorick maintains one other open-ended array of Symbol instances:

extern Symbol *sp;            /* virtual machine stack pointer */
extern Symbol *spBottom;      /* current bottom of stack */

Here, spBottom is just like globTab; it points to a multiple of some fairly large number of Symbols in an array. When the stack overflows, Yorick allocates a longer array, copies the old stack into the new, moves the spBottom and sp pointers, then discards the old stack -- just like it did for globTab (except two pointers need to be changed when the stack moves). Of course there is no hash table associated with the stack, just as there is no "top" of the globTab list (or rather the top of globTab can only move up, while the top of the stack bounces up and down all the time).

Once again, if you need to remember a stack element for a long time -- so long that it might overflow and move -- you need to remember the index sp-spBottom. To just peek at the stack, you want to use sp directly. The routine:


extern int CheckStack(int n);

checks to make sure there are at least n elements available above the current sp -- if there aren't, it reallocates a larger stack. If the stack moves, CheckStack returns 1; if it didn't move, CheckStack returns 0. Every Yorick interpreted function knows the maximum stack depth it needs to evaluate all its expressions; each interpreted function call causes a single call to CheckStack to grab all the space needed. In this calculation, the parser assumes that any compiled function will use at most one stack element; if your function needs more than that, therefore, you need to call CheckStack explicitly.
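
The contract of CheckStack (report whether the stack moved, so that callers know raw pointers into it are stale) can be modeled in isolation. This sketch is an illustration under assumptions, not the real routine: the stack holds plain longs instead of Symbols, the chunk size of 16 is arbitrary, and check_stack_model, push_model, and item_at_model are invented names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 16                     /* arbitrary growth unit for the model */

static long *model_bottom = NULL;    /* plays the role of spBottom */
static long model_depth = 0;         /* items in use; top item is depth-1 */
static long model_cap = 0;

/* Make sure n more items fit.  Returns 1 if the stack had to be
   reallocated (and therefore moved), 0 if it stayed where it was. */
int check_stack_model(long n)
{
  long need = model_depth + n;
  long new_cap;
  long *bigger;
  if (need <= model_cap) return 0;
  new_cap = ((need + CHUNK - 1) / CHUNK) * CHUNK;
  bigger = malloc((size_t)new_cap * sizeof(long));
  assert(bigger != NULL);
  if (model_depth > 0)
    memcpy(bigger, model_bottom, (size_t)model_depth * sizeof(long));
  free(model_bottom);
  model_bottom = bigger;
  model_cap = new_cap;
  return 1;
}

/* Push a value; returns its index from the bottom, which stays valid
   even if the stack later moves. */
long push_model(long value)
{
  assert(model_depth < model_cap);
  model_bottom[model_depth] = value;
  return model_depth++;
}

long item_at_model(long index)
{
  assert(index >= 0 && index < model_depth);
  return model_bottom[index];
}
```

The index returned by push_model corresponds to remembering sp-spBottom: it survives a later call that returns 1, whereas a saved pointer to the item would not.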

What happens when you invoke a function which has a local variable called foo? While that function is running there are two foo's -- the external foo=27, and the local foo of the function. But I promised that variable name <--> globTab index, and there obviously isn't a one-to-one correspondence any more.

The answer is that Yorick interpreted functions push all the external values of their local variables onto the stack as they start -- that's part of the stack space they need -- and set all the globTab Symbols to nil. At any given time, therefore, there really is only one variable named "foo" -- namely the one which is "most local" or "of innermost scope". The other foo's literally do not exist -- they are somewhere on the stack. When the interpreted function returns, it destroys all its local variables, and pops the stack elements holding their external values back into place.
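
The save-on-entry, restore-on-return mechanism can be sketched as a small stand-alone model. Everything here is an assumption made for illustration: values are plain longs, NIL is just 0, the table and stack have fixed sizes, and enter_scope_model and leave_scope_model are invented names rather than anything in Yorick.

```c
#include <assert.h>

#define NIL 0                        /* stand-in for Yorick's nil */
#define STACK_MAX 64

static long model_value[8];          /* current value of each variable, by index */
static long saved[STACK_MAX];        /* plays the role of the stack */
static int saved_count = 0;

/* On entry: push the external value of every local onto the stack and
   reset the table entry to nil. */
void enter_scope_model(const int *locals, int n)
{
  int i;
  assert(saved_count + n <= STACK_MAX);
  for (i = 0; i < n; i++) {
    saved[saved_count++] = model_value[locals[i]];
    model_value[locals[i]] = NIL;
  }
}

/* On return: discard the local values and pop the external values back
   into place, in reverse order. */
void leave_scope_model(const int *locals, int n)
{
  int i;
  assert(saved_count >= n);
  for (i = n - 1; i >= 0; i--)
    model_value[locals[i]] = saved[--saved_count];
}
```

While a scope is active, the table entry holds only the innermost value; the outer value is not reachable through the table at all, which is the masking behavior described above.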

I don't have any way for you to "see" the "global" value of foo; only the current most local value exists. In principle, you could figure it out, but it would be a very expensive operation (you have to look through every element of the stack trying to find it). More importantly, I take Yorick's scoping rule seriously, and having the external value of a variable truly masked by the local value in a function is a critical feature of that rule. I don't want compiled routines "going behind my back" to get around it.

Now, suppose you have a compiled routine that wants to use the current value of the variable foo, rather than accepting that value as one of its parameters. The best way to do this is to avoid the cost of the name<-->index lookup on every pass, and to have an initialization routine that does the lookup once and for all. Often it's a good idea to have an explicit initialization routine invoked from one of your startup .i files (like _pl_init at the end of graph.i), but you can also do it like this:


static long foo_index= -1;

void my_initialization(void)
{
  ...
  foo_index= Globalize("foo",0);
  ...
}

void Y_my_function(int nArgs)
{
  if (foo_index<0) my_initialization();
  ...
}

(Of course, if you don't care about a hash lookup, you could just call Globalize every time. Your choice.)

Now that you have foo_index, how do you get the value of foo? The answer depends on what data type you expect the result to have. You have many choices:


extern long YGetInteger(Symbol *s);
extern double YGetReal(Symbol *s);
extern char *YGetString(Symbol *s);

return scalar values. YGetReal will convert any non-complex scalar number to double; YGetInteger will convert any integer type to long; YGetString will fail unless the Yorick value had type string. Thus, if you expected that foo were an integer, you could do this in Y_my_function:


  long foo= YGetInteger(&globTab[foo_index]);

If you don't have codger build a wrapper function for you (no PROTOTYPE comment in your startup .i file), then you can use these functions to retrieve your arguments as well. The Y_my_function above is an example (declared extern my_function in the startup .i file, with no PROTOTYPE comment). On entry, nArgs is the number of arguments you are being called with (although keywords count double here); your first argument is sp[1-nArgs], second sp[2-nArgs], and last argument sp[0].
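
The sp[1-nArgs] ... sp[0] layout is just pointer arithmetic relative to the top of the stack. The sketch below models it with an array of longs in place of Symbols; nth_argument is an invented helper, and the sketch ignores keywords (which, as noted, count double in nArgs).

```c
#include <assert.h>

/* Given sp pointing at the top of the stack (the last argument), return
   a pointer to argument number k, counting from 1, of nArgs arguments.
   (Hypothetical helper operating on longs, for illustration only.) */
long *nth_argument(long *sp, int nArgs, int k)
{
  assert(k >= 1 && k <= nArgs);
  return &sp[k - nArgs];
}
```

For three arguments, k=1 gives sp[-2], k=2 gives sp[-1], and k=3 gives sp[0].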

YGetInteger and YGetReal have an important side effect: the actual data type of the variable foo is promoted to long or double, respectively, if it was not of that type. I haven't needed anything less obtrusive yet, so no "high level" non-destructive routines exist. (I can't think of any functions where I use a YGet on anything except stack items -- YGetInteger(sp) instead of YGetInteger(&globTab[i]).)

You can also do an explicit test to see whether a variable is nil using:


extern int YNotNil(Symbol *s);

(The other "YGet" routines would blow up if foo were nil; maybe that's too drastic.)

If you expect foo to be an array, you call another YGet routine:


extern char *YGet_C(Symbol *s, int nilOK, Dimension **dims);
extern short *YGet_S(Symbol *s, int nilOK, Dimension **dims);
extern int *YGet_I(Symbol *s, int nilOK, Dimension **dims);
extern long *YGet_L(Symbol *s, int nilOK, Dimension **dims);
extern float *YGet_F(Symbol *s, int nilOK, Dimension **dims);
extern double *YGet_D(Symbol *s, int nilOK, Dimension **dims);
extern double *YGet_Z(Symbol *s, int nilOK, Dimension **dims);
extern char **YGet_Q(Symbol *s, int nilOK, Dimension **dims);
extern void **YGet_P(Symbol *s, int nilOK, Dimension **dims);

The mnemonic suffix letters are C, S, I, L, F, D, Z, Q, and P for char, short, int, long, float, double, complex, string, and pointer. Like the scalar YGets, the array versions are intrusive -- if the object didn't have the type you requested, but it can be converted, then the conversion is silently performed. If s was in globTab, the implicit type change actually changes the type of your variable.

Obviously, if you intend to set the value of foo from within your compiled routine, you need to use the "array" version of YGet to get the pointer rather than the value, even if the object is a scalar.

If nilOK is non-zero, and the Symbol is [], then a 0 pointer is returned; otherwise a nil variable causes the YGet to blow up. If the dims parameter is non-zero, *dims is set to the Dimension* for the array. Here is the Dimension struct; only the next and number members are useful:


struct Dimension {
  Dimension *next;   /* list goes from slowest to fastest varying */
  long number;       /* length of this dimension */
  long origin;       /* index of first element along this dimension */
  int references;    /* reference counter */
};

You can get the product of all the number members and the rank of the array with these:


extern long TotalNumber(const Dimension *dims);
extern int CountDims(const Dimension *dims);

You can also copy the list of number members into your own local array to avoid having to mess with the linked list (declare long dlist[maxDims] first):


extern int YGet_dims(const Dimension *dims, long *dlist, int maxDims);
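
For readers who want to see what walking such a list involves, here is a stand-alone model. It is a sketch, not the Yorick routines: the Dim struct keeps only the next and number members, the *_model function names are invented, and the copy function fills dlist in list order (slowest varying first), which is an assumption of this sketch and not a statement about the order the real YGet_dims uses.

```c
#include <assert.h>
#include <stddef.h>

/* Model of a dimension list node (hypothetical; only the two members
   the text calls useful). */
typedef struct Dim Dim;
struct Dim {
  Dim *next;     /* next faster-varying dimension, or NULL at the end */
  long number;   /* length of this dimension */
};

/* Product of all lengths; 1 for an empty list in this model. */
long total_number_model(const Dim *dims)
{
  long n = 1;
  for (; dims != NULL; dims = dims->next) n *= dims->number;
  return n;
}

/* Rank of the array: how many nodes the list has. */
int count_dims_model(const Dim *dims)
{
  int rank = 0;
  for (; dims != NULL; dims = dims->next) rank++;
  return rank;
}

/* Copy the lengths into dlist in list order and return the rank. */
int copy_dims_model(const Dim *dims, long *dlist, int maxDims)
{
  int rank = 0;
  for (; dims != NULL; dims = dims->next) {
    assert(rank < maxDims);
    dlist[rank++] = dims->number;
  }
  return rank;
}
```

A 10 x 6 x 4 array would be described by three nodes, with the node of length 4 at the head of the list and the node of length 10 at its tail.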

If you want to create your own array, you need to constantly guard against the possibility that the user types ^C and interrupts what you are doing -- you don't want to keep "unprotected" pointers to arrays lying around. The proper thing to do is to build your arrays on the stack -- that way, if you are interrupted Yorick will know how to clean up after you. (Another strategy is to keep static copies of "under construction" pointers, which you can check the next time you are called in order to detect whether you were interrupted last time.) Here is the idiom for creating an array and pushing it onto the stack that gives minimum "exposure" against unexpected interruption:


  /* tmpDims is a global temporary for Dimension lists under
     construction -- you should always use it, then just leave
     your garbage there when you are done for the next guy to
     clean up -- your part of the perpetual cleanup comes first */
  {
    Dimension *tmp= tmpDims;
    tmpDims= 0;        /* better to never free than free twice! */
    FreeDimension(tmp);
  }


  /* make dimension list for a 3D array */
  tmpDims= NewDimension(fastest_dimension, 1L, (Dimension *)0);
  tmpDims= NewDimension(second_dimension, 1L, tmpDims);
  tmpDims= NewDimension(slowest_dimension, 1L, tmpDims);

  {
    Array *result= PushDataBlock(NewArray(&doubleStruct, tmpDims));
    double *values= result->value.d;  /* d=double, l=long, etc */
    ......
  }

Remember to call CheckStack if you push more than one item onto the stack without popping off any of your own arguments. The function Drop(n) drops the top n elements of the stack -- obviously, you shouldn't drop more than nArgs+1 elements (the extra 1 always points to you -- the compiled function Yorick just called).

If you want to just push a scalar value onto the stack, you can use one of the simpler Push routines:


extern void PushIntValue(int i);
extern void PushLongValue(long l);
extern void PushDoubleValue(double d);

You can pop the top of the stack into a variable using the PopTo function:


extern void PopTo(Symbol *s);

Thus, after you have filled in the values of your 3D array, you can pop it off the stack into the foo variable like this:


   PopTo(&globTab[foo_index]);

Whatever you leave on top of the stack when you return from your compiled routine is the return value of your function.

So how can I simplify all this? Obviously, the business with tmpDims, PushDataBlock, and NewArray could be given a simpler wrapper, but I think most of the rest of it is pretty much as simple as it's going to get...

Q Yorick pointers don't seem to behave like C pointers. What's going on?


>5. We have some problem with the "&" operator, even in the original 
>   yorick 1.2:
>
>   > a=array(char,10,100)
>   > &a
>   113840
>   > &a(1,1)
>   10d248 
>   > &a(2,1)
>   10d270
>   > &a(3,1)
>   10d248
>
>   Why are the addresses (as returned by &) of a(1,1) and a(3,1) equal ?
>   The elements being clearly different, (bug in & ?)

Again, this is a feature, not a bug. As I try to warn people, Yorick pointers are very different from (and not nearly as useful as) C pointers. Your surprise would have been even greater had you typed this:


   > &(cos(a(1,1)*pi/2)+3.1)

This is perfectly legal and meaningful in Yorick, but a C compiler wouldn't have any idea what you were talking about!

The deepest and most sacred design principle in Yorick is that every object "knows" its own data type and dimensions. That's why you can say "a=expression" and have 'a' just *become* the expression, no matter what it is. (If I were a computer scientist, I would wax eloquent about object oriented design at this point. I'll spare you my philosophy.)

Every Yorick expression or part of an expression has a data type and dimensions. A Yorick variable name simply points to an expression; the variable name has no data type or dimensions attached to it. This is very very very different than in compiled languages like C or Fortran, where all knowledge of data type or dimensions is attached to the name of the variable -- the same address will function differently depending on what variable has that address. That is impossible (okay, not quite) in Yorick -- if I point you to an address, you can figure out the type and dimensions of what is there. (You saw how that worked if you looked at the binobj.c:Pointee function.) Therefore, the address of any expression is a perfectly reasonable construct in Yorick. The pointer itself holds a "use" of the expression, so its memory won't be reused until the pointer is discarded or reassigned. (If you had redefined a variable to the expression, the variable would hold a "use" in the same way. The memory for an expression is freed only when its use count drops to zero.)

The down side of this is that not every address is a meaningful pointer to Yorick -- only the addresses it "knows" about make any sense. Therefore, there is no such thing as address arithmetic in Yorick. When you type &a(index list), what you are getting is the address of the temporary created as the result of the gather operation a(index list) -- it has nothing to do with the address of the array named 'a'. Yorick is pretty prompt about reusing memory, so new temporaries tend to have the same addresses as recently released temporaries, which explains why &a(3,1) and &a(1,1) came up with the same address. If you had typed &a(1,1) a second time, you would probably have gotten a different result from the first time.

Pointers are not too useful in Yorick interpreted code. Mostly they are a cheap way to make heterogeneous lists of arrays. They are, however, useful in interfacing to C code, for which pointers really are useful. I'm not sure why you wanted things like &a(3,1) -- an address in the middle of an array can't possibly be useful in a Yorick program. (If you're thinking to get improved performance by handing around pointers like in C, forget it. That kind of performance tweak is totally meaningless in interpreted code -- very harmful, in fact.)
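
A minimal sketch of the "heterogeneous list" use of pointers mentioned above. The variable names are only illustrative, and the point is that each pointee carries its own type and shape:

```yorick
a= array(char, 10, 100);
p= &a;                        /* p holds a "use" of the whole array */
dims= dimsof(*p);             /* [2,10,100]: the pointee knows its shape */

/* an array of pointers can mix data types and dimensions */
list= [&a, &[1.5, 2.5, 3.5], &["x", "y"]];
n2= numberof(*list(2));       /* 3 */
t3= structof(*list(3));       /* string */
```

Note that every element is reached by dereferencing a whole pointee, never by arithmetic on an address.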


>6. building an array of struct {long a; float b;} seems to store all
>   elements of the array at different places in memory (one malloc per
>   struct element) even if the element is of fixed size. The array is 
>   then only an array of pointers. Why not store all elements in only one
>   memory location ? This would allow to pass a complete table to a 
>   function.

You are wrong. The Yorick struct {long a; float b;} is laid out absolutely identically to the same struct in C. I guarantee it. You've fooled yourself by misinterpreting Yorick expressions like &x.a, which are nothing at all like the same C expression.

Yorick is an interpreter, not a compiler. Things happen *immediately* in Yorick -- it's not messing around waiting to see the entire subroutine module. When you tell it to do something, it happens *now*. Try "disassemble;&x.a". You'll see that when Yorick saw x.a in your expression, it *happened*. Yorick went into x and yanked "a" out, forming a new array with the value x.a, at a new location in memory. Just like any other indexing operation. Much later, it notices that the next operation you want is & -- so it extracts the address of the temporary expression x.a. The variable x is long forgotten. Forgetting quickly is a feature, not a bug. Yorick is not interpreted C. Interpreted C would be hundreds of times slower than Yorick, and it's not what I want to type interactively anyway.

By the way, the two C statements:


   x.a= y;

and


   y= x.a;

are totally different meanings for the same expression fragment "x.a". In the assignment, the lvalue x.a is interpreted as the address &x.a; the contents at that address are irrelevant. In the expression, you get the contents, not the address. (Think about the difference in C between *x=y and y=*x -- it's the same thing.)

Yorick's use of the same fragment "x.a" for the two different meanings is actually consistent with this. But in Yorick, unlike C, "x" is just a name; the parser has no idea even whether it is a struct which has a member "a". All of that is deferred to runtime, and the "feel" of the language is different from C as a result. I'm sorry I couldn't make your experience with this nuance of C carry over to Yorick.

I should also tell you that I experimented extensively in Yorick and its predecessors with schemes that made indexing expressions like x.a or a(1,1) hold on to a memory of the original array from which they were extracted. This turns out to do all sorts of strange things -- for example, after you say y=x.a, then later you change the value of y, and find that x.a has changed because y remembered where x.a originally came from. The weirdness you found with too quickly forgetting doesn't begin to compare with the weirdnesses of too slowly forgetting. If you want to toy with an interpreter that has a much longer memory (and associated wild side effects), try out numerical Python.
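
A small sketch of the "quick forgetting" behavior described above, using a made-up struct named pair. Once x.a has been extracted, the result has no memory of x:

```yorick
struct pair { long a; float b; }
x= array(pair, 3);
x.a= [1, 2, 3];
y= x.a;          /* y is a new array holding the values of x.a */
y(1)= 100;       /* changes y only */
xa= x.a;         /* still [1,2,3] */
```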

Q Can I start emacs in shell-mode and start yorick with a script?

Insert the following lines into your .emacs file:


(defun yorick ()
 (shell) 
 (comint-simple-send (get-buffer-process (current-buffer)) "yorick"))

and then start emacs with


   emacs -f yorick

This will start up emacs and yorick. If you have a custom version of yorick, say myyor, then just replace "yorick" above with "myyor" .

Q How do I use the functions in yorick.el?

I actually put this in my .emacs (I use 19.25):


(setq after-load-alist
      (append after-load-alist
	      '(("shell"
		 (define-key shell-mode-map "\C-c\C-y" 'yorick-pop)))))

(autoload 'yorick-pop "yorick"
  "*Pop up the file for the most recent Yorick error message." t)

Then I start shell-mode in Emacs with M-x shell (I've actually modified this quite a bit, but I don't think it will affect yorick.el). Then I start yorick in the shell by typing "yorick" as usual. Here is a session:



icf.llnl.gov[1] yorick
 Copyright (c) 1994.  The Regents of the University of California.
 All rights reserved.  Yorick ready.  For help type 'help'
> help,exp
 /* DOCUMENT exp(x)
      returns the exponential function of its argument (inverse of log).
   SEE ALSO: log, log10, sinh, cosh, tanh, sech, csch
  */
 defined at:  LINE: 547  FILE: /env/Yorick/1.0/Yorick/startup/std.i
>

With the cursor after the final prompt, I type C-c C-y and std.i pops up in the other window positioned to line 547. Note that the search is a backward search, looking for the line


 defined at:  LINE: 547  FILE: /env/Yorick/1.0/Yorick/startup/std.i

If I put the cursor before this line, I get the "Search failed" error you mentioned. Even if you don't bind the key, typing M-x yorick-pop after you load yorick.el should work, as long as the cursor is after an error, help output, or print,bookmark. (I just tried it in this RMAIL buffer to be sure -- M-x yorick-pop pops up the file, until I go above the line after SEE ALSO, at which point I get your Search failed error.) The same function can be adapted to work with any code which prints error messages including a line number and file name; all you need to do is diddle the REGEXP.

Q How do I manipulate the order of plotting in cell arrays?

A plf colors in each cell of a mesh (x,y) according to a color function z. Let x and y be MxN arrays, so that z is (M-1)x(N-1). Then the quadrilateral (1,1),(2,1),(2,2),(1,2) -- indices of the points in the x and y arrays -- gets the color z(1,1), the quad (2,1),(3,1),(3,2),(2,2) gets z(2,1), ..., the quad (M-1,1),(M,1),(M,2),(M-1,2) gets z(M-1,1), the quad (1,2),(2,2),(2,3),(1,3) gets z(1,2), the quad (2,2),(3,2),(3,3),(2,3) gets z(2,2), ..., the quad (M-1,2),(M,2),(M,3),(M-1,3) gets z(M-1,2), ..., until finally the quad (M-1,N-1),(M,N-1),(M,N),(M-1,N) gets z(M-1,N-1).

The order of the coloring is the order I have listed here. To say it more briefly, the zones are colored in the same order the colors are stored in the z array.

In ordinary 2D meshes, the order doesn't matter. However, if you got your 2D mesh as a projection of a surface in 3D (as in the plwf function), you can use the drawing order to hide the surfaces which are behind others. To do this you need to make sure that z(1,1) is the rearmost corner of your mesh, and z(M,1) is the next furthest back. The plwf function flips one or both indices of x, y, and z to accomplish this. The resulting plot is the same thing as the so-called "painter's algorithm", which is how most of the popular "3D" graphics packages render surfaces. It doesn't always draw the correct picture; Foley and van Dam and other graphics textbooks have examples of how it can fail (basically for extreme fisheye perspectives like the view through an f/0.5 lens).
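
As a rough illustration of what "flipping an index" means (this is only a sketch on a flat mesh, not the actual plwf code), reversing the first index of x, y, and z together leaves the picture unchanged but reverses the order in which the zones are painted along that index:

```yorick
x= span(-1, 1, 20)(,-:1:20);      /* 20x20 mesh points */
y= transpose(x);
z= x(zcen,zcen) + y(zcen,zcen);   /* 19x19 zone-centered colors */
plf, z, y, x;                     /* painted starting from zone (1,1) */
fma;
/* same zones and colors, but painted in the opposite order along the
   first index */
plf, z(::-1,), y(::-1,), x(::-1,);
```

For a projected 3D surface, the choice of which indices to reverse depends on the viewing direction.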

The slice3 functions use an alternative, which is to draw with plfp instead of plf, and sort the polygons back to front. This algorithm doesn't always draw the right picture either. Drawing the right picture turns out to be prohibitively expensive -- it's better to use a fast algorithm, and increase the resolution of your surface until it works.

The problem with both 3D and animated graphics is that the time and thought required to create good pictures is many times what it is for good 2D pictures. Users tend to be discouraged by this, since they don't understand how it can be 100 times harder to use a sculpture or a movie as a visual aid in a scientific presentation than to use a photo. But it is.

Q What does the IREG array do, exactly?

The ireg parameter is an existence map for quadrilateral meshes -- what happens if your (x,y) mesh is MxN points, but there are "logical holes" in your mesh -- zones which don't exist. To get an idea of what this means, try this:


x= double(indgen(100))(,-:1:100);
y= transpose(x);
plm, y, x;
ireg= array(0, 100, 100);
ireg(2:,2:)= 1;        // now all 99x99 zones exist
ireg(38:52,61:91)= 0;  // whack a hole in the mesh
fma;
plm, y, x, ireg;

Obviously, you can knock as many holes as you like in the mesh. Values in ireg other than 0 and 1 are also significant; see the region keyword for plm, plf, etc.

This ireg capability probably isn't much use except to people doing 2D hydro simulations with codes that handle this kind of mesh. It's a wasteful way to store your data even for that case. I should extend the plm, plf, etc. functions to handle lists of logically rectangular mesh patches instead. No time soon though.

Q How do I get to make plots in batch mode without opening an X-window?

Sorry about the lack of documentation. A window can be associated with either an X display or an hcp file or both, but not neither. Try this:


window,7,display="",style="nobox.gs",hcp="myfile.ps";
pli,surfcalc;
hcp;

Note that you need (1) an hcp= keyword to the window command in addition to display="" and (2) an explicit hcp command. (Alternatively, use hcpon; then the fma for each page will imply an automatic hcp. In this case, don't forget a final fma after the last plot before quitting.) See "help,window" for more details.
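
A sketch of the hcpon variant for a two-page file; the window number, file name, and curves are just placeholders:

```yorick
window, 7, display="", hcp="myfile.ps";
hcpon;                 /* from now on each fma also writes the page */
x= span(0, 2*pi, 200);
plg, sin(x), x;
fma;                   /* page 1 goes to myfile.ps */
plg, cos(x), x;
fma;                   /* final fma, so page 2 is written as well */
```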

Q Can I get data from a window using the mouse?

Try help,mouse

The file Yorick/include/rezone.i has copious examples of the kind of thing that is possible. A simple example is the moush function (help,moush then go to that file and read the source). Note that what is impossible is an event driven system in which Yorick can accept either keyboard or mouse input. Someday.

Q Can you give me an example using plfp?

The basic idea of plfp is to draw a filled polygon -- you need to specify (x,y) at each of the n corners, plus a single value z representing the color to fill it. This looks a lot like the basic parameter list to plc (contours) or plf (filled) mesh, namely (z,y,x). However, you usually want to plot lots of them with a single call, so I decided to simply stick the x, y, and z values together into single large, unstructured 1D arrays: [z1, z2, z3, ...], [x1a,x1b,x1c,..., x2a,x2b,x2c,..., x3a,x3b,x3c,...], [y1a,y1b,y1c,..., y2a,y2b,y2c,..., y3a,y3b,y3c,...]. Since the number of (x,y) values for each z may differ, I can't make them multidimensional arrays -- instead, I need to add a fourth argument, which is an array of the number of sides [n1, n2, n3, ...]. The basic plfp interface therefore winds up like this: plfp, z,y,x,n

The hitch is that the unstructured 1D arrays x and y are more difficult to handle than a multidimensional array. The psum and cum index functions and the histogram function are the primary tools you need to manipulate such objects. I'll probably be writing better functions as I identify them. Some things are straightforward -- n(psum) is a list of the indices into x or y of the last points in each polygon, n(psum)-n+1 are the first point indices, and so on.
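
The index bookkeeping just described can be checked by hand on a tiny case. This sketch assumes three polygons and uses a dummy x array in place of real coordinates:

```yorick
n= [3, 4, 5];                 /* a triangle, a quad, and a pentagon */
last= n(psum);                /* [3,7,12]  last point of each polygon */
first= n(psum) - n + 1;       /* [1,4,8]   first point of each polygon */
offs= n(cum);                 /* [0,3,7,12] */
x= double(indgen(sum(n)));    /* stand-in for the packed x coordinates */
x2= x(first(2):last(2));      /* the four points of polygon 2: [4,5,6,7] */
```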

Here's an example that's both simple and challenging -- simple, because everything is small enough that you can just print the variables in order to understand what they are; challenging because the vectorized formulas to compute them are not so obvious.

/* we're going to draw 9 polygons, one each of a triangle,
   quadrilateral, pentagon, and so on up to an 11-sided figure */
n= indgen(3:11);

ns= histogram(n(psum)+1)(psum:1:-1)+1;  /* 111222233333 etc */
is= indgen(sum(n))-n(cum:1:-1)(ns)-1;   /* 012012301234 etc */

theta=2.*pi*is/n(ns);      /* angles from centers to points */

/* spread them all out -- put them on a 3x3 grid */
xc= span(-2,2,3)(,-:1:3);
yc= transpose(xc);

x= cos(theta) + xc(ns);
y= sin(theta) + yc(ns);
z= random(numberof(n));  /* make each one a different color */

plfp, z,y,x,n, cmin=-.1,cmax=1.1;

Ten lines. Not bad, eh? The ns and is lines are the ones which are ripe for replacement by some sort of utility functions.

Q Does Yorick already possess any features to cope with missing or otherwise flagged data in its display capabilities? Direct display of -999s usually doesn't work well.

Yes and no. Obviously, things get a little more complicated and you need to do special things to deal with the problem. I claim the Yorick primitives are enough to give you the tools to handle missing data however you decide to do it -- but I can't do it for you because there are too many possibilities. You need to work out your own user interface, but that's exactly what Yorick is good at.

If you represent your "bad data" by huge or small values in a 2D grid, then you can clip those values rather easily and effectively in plf or pli pictures by using the cmin= and/or cmax= keywords (that will "plot black or white pixels on a 2D plot"). For contour plots, you might be able to use the ireg parameter to set up an existence map for your mesh, but that will take a bit of tinkering. For simple curves of y vs x, you will need to write a modified plg function that splits your data into pieces however you decide you want to do that. Usually, these interface functions are just a few lines of code -- the hard part is figuring out exactly where you want the lines to go on your plot, because you need to precisely define notions like "leave a gap in an x, y line plot".

So yes and no -- Yorick has lots of features which you can use to implement the strategies you mentioned very easily, but you will need to figure out exactly what you want and do it yourself. The plf, pli, and plg functions don't really have any "innards" -- what you see is what you get.
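
Here is one possible shape for such a modified plg. It is a sketch only: good_runs and plg_gaps are made-up names, the data are assumed to be 1D, and a single flag value (such as -999) marks the missing points. Each run of consecutive good points is drawn with its own plg call, so gaps appear wherever the flag occurs; isolated good points are simply skipped.

```yorick
func good_runs(y, bad, &first, &last)
/* hypothetical helper: find runs of consecutive values != bad in 1D y */
{
  n= numberof(y);
  g= array(0, n+2);        /* pad with a "bad" marker at both ends */
  g(2:n+1)= (y != bad);
  d= g(dif);               /* +1 where a run starts, -1 just past its end */
  first= where(d > 0);
  if (!numberof(first)) { last= []; return 0; }
  last= where(d < 0) - 1;
  return numberof(first);
}

func plg_gaps(y, x, bad)
/* hypothetical wrapper: one plg call per run of good data */
{
  local first, last;
  nruns= good_runs(y, bad, first, last);
  for (k=1 ; k<=nruns ; k++) {
    if (last(k) > first(k)) plg, y(first(k):last(k)), x(first(k):last(k));
  }
}
```

To use, e.g. plg_gaps, y, x, -999. Other policies (interpolating across the gap, marking isolated points) need a different helper, which is exactly the decision Yorick leaves to you.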

Q Is there a way to add a reference colorbar to a pli or plf plot?

I may get around to writing a color bar generator as part of the next yorick distribution. I don't think color bars are very useful personally, which is why yorick doesn't have a documented way to create them. It makes no sense to use a color bar and try to trace contours of a constant color by eye. Instead, you should pick a few (<4?) "significant" z values and draw contours at those values using the plc function (called after your pli or plf of course).

If you insist on a color bar, it is not very hard to write a yorick function to draw one. Here's a simple example to get you started -- you may need to make a bunch of hard decisions about how you might want to label it, where to put it, etc., or maybe this is good enough already.


func colorbar(cmin, cmax)
/* DOCUMENT colorbar
            colorbar, cmin, cmax
     draw a color bar to the right of the plot.  If CMIN and CMAX
     are specified, label the top and bottom of the bar with those
     numbers.
 */
{
  plsys, 0;
  pli, span(0,1,200)(-,), .625,.46,.67,.84, legend="";
  plg, [.46,.84,.84,.46],[.67,.67,.625,.625], closed=1,
    marks=0,color="fg",width=1,type=1,legend="";
  plsys, 1;  /* assumes there is only one coordinate system */
  if (!is_void(cmin)) {
    plt, pr1(cmin), .6475,.46, justify="CT";
    plt, pr1(cmax), .6475,.84, justify="CB";
  }
}

To use, e.g.-

> pli,[span(.25,.75,200)]
> colorbar, .25,.75

Q Does Yorick pass by value, or what?

I think the manual does explain what happened to you. Yorick (and all similar interpreters) are actually "pass by five layers of indirection" languages (five is the actual number in Yorick -- I counted once; less efficient interpreters have more, and I don't see how to do with fewer), so it is a little difficult to characterize them as "pass by reference" or "pass by value".

Yorick is indeed "pass by value" in the sense that if I do this:


func f(x)
{
  x= expression;
}

the external value of x is unchanged, in the sense that f(a) or f(x) in a calling routine does not change the value of a or x in the caller.

However, the syntax


  x(indices)= expression

is totally unrelated (as I thought I stressed in the manual) to the syntax


  x= expression

The former directs that the expression be converted to the data type of an existing array x, broadcast to the shape specified by the indices, then stored into that array. This is a relatively rare (and expensive) operation in Yorick programs -- I call it "assignment". The latter just chucks whatever x was, and attaches the name "x" to the value of the expression -- I call that fast operation "redefinition".

The redefinition operation has one additional wrinkle: If the expression is just another variable, then x becomes a copy of that variable -- it turned out that an "easy equivalence" was just too confusing, and the usual intent of a statement like x=y is to make a copy of y.
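
A short sketch contrasting the two operations, with arbitrary values:

```yorick
x= [1, 2, 3];
x(2)= 7.9;       /* assignment: 7.9 is converted to long; x stays a long array */
y= x;            /* redefinition from a variable: y is a copy, not an alias */
y(1)= 100;       /* x(1) is still 1 */
xsave= x;        /* keep the array for inspection */
x= 7.9;          /* redefinition: x is now a scalar double */
```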

Yorick is designed to be used with very large arrays of data. Mostly when you pass these around as arguments to various functions, you are just going to look at the values, not change them. So of course Yorick does not make a copy of a megabyte array every time you pass it into a new function -- you'd need to restrict yourself to a shallow calling tree indeed if it did! As a side effect, if you use an assignment statement in a function on an array which was passed to you as an argument, you do change the values in the external array -- it was never copied. If you didn't want that side effect, you should write your function like this:


  func f(x) {
    x_internal= x;
    x_internal(indices)= expression;
  }

As I said, you should generally avoid assignment statements in Yorick programs -- they are a lot less necessary than in Fortran or C.

If you *did* want the side effect of changing the external value of an argument, then you should write what you did:


  func f(x) {
    x(indices)= expression;
  }

I have to admit now that there is an important case for which this tactic will fail. Because I went for maximum speed in Yorick in preference to perfect linguistic purity, scalars of type double, long, and int are special high performance objects (internally different than any arrays or scalars of the other numeric types). Therefore, the following program intended to produce an external side effect on its input parameter:


  func f(x) {
    x(1)= expression;
  }

will fail in the case that the actual x passed to f is a scalar of type double, long, or int. For array parameters, the function has its side effect, but for those scalars it does not. I may be able to fix that inconsistency someday, but it involves a major change in the way the interpreter works, so it won't happen any time soon. Incidentally, the problem is not in the function linkage; the statement


   x(1)=expression

will fail for scalar x in or out of a function.

In analyzing this problem, I decided that this style of programming for side effects should receive direct support from the syntax. Therefore, if your intent is to change the value of an input parameter, and either you're not sure that input parameter will always be an array, or you don't want to use assignment operations to change its values, you should mark it with an ampersand when you define the function:


  func f(&x)
  {
    x= expression;
  }

Now if you call f(a), the value of a in the caller *will* change. Things like f(a+1) are still legal, but the changed value is just discarded. This mechanism works by performing an implied redefinition operation on each &-marked parameter when the function returns -- you *don't* want to use it if x is usually a megabyte array, because you'll wind up making a copy of it!
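
The three cases discussed above, side by side. The function names are invented for the illustration:

```yorick
func set_first(x) { x(1)= -1; }    /* assignment into the caller's array */
func redefine(x) { x= -1; }        /* redefinition of the local name only */
func redefine_ref(&x) { x= -1; }   /* &-marked: caller's variable redefined */

a= [1, 2, 3];
set_first, a;
a1= a;             /* [-1,2,3]: the external array was changed */
redefine, a;
a2= a;             /* still [-1,2,3] */
redefine_ref, a;   /* a is now the scalar -1 */
```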

Q How do I run yorick in batch mode?

Check out "help,batch". If you start yorick like this:


     yorick -batch script.i

it should do what you want. Unlike -i, you can specify only a single -batch option. You need to process any other command line arguments yourself (follow the SEE ALSOs in the help,batch). Note that your custom.i is *not* included in this mode: if it were, somebody could inadvertently break script.i with their own peculiar custom.i. Note that if your OS supports it, you can start your script with:


     #!yorick -batch

Yorick interprets this first line as a comment, and if you set execute permission on this script, you can just invoke it as a program in its own right. (You have my permission to remove the .i from the script name if you use it like this.) Generally, a script like this should end with a quit command (-batch doesn't give it permission to quit when it runs out of things to do, only if it stops on an error). Note that attempting to read from the keyboard in batch mode is an error.

Q How can I get multiple frames in my X-window?

Personally, I think you're better off using Yorick's eps command to create an encapsulated PostScript file for a single picture, which you can then import into a variety of page layout programs to get multiple pictures on a single page, add annotations and captions, and the like. The free xfig program (it'll come with your Linux distributions) works pretty well for this purpose. That's how I turn Yorick plots into viewgraphs and figures for publication.

If you insist on having Yorick itself display several coordinate systems, I enclose a sample style sheet four.gs which should get you started. Put it in ~/Gist/four.gs, then do this in yorick:


> window,style="four.gs"
> x=span(0,10,200);y=cos(x)
> plg,y,x
> plsys,2;plg,x,y
> plsys,3;plg,x+y,x-y
> plsys,4;plg,y-x,x+y

You'll notice that it uses lots of your precious screen real estate (you'll need to use your window manager to resize the window). I would much rather put the four pictures in separate windows which can overlap, and use a page layout program to combine them later, but the choice is yours. If you are doing production graphics and you don't want any pictures to appear on your screen, do this:


    window,0,display="",hcp="your-file.ps",style="four.gs"

I just stuck this together in five minutes, so you should expect to spend a couple of days attempting to position your four viewports to guarantee that the text labels never overlap, that they are positioned where you want them on the page, and that they meet whatever other aesthetic criteria you need.


# Gist four.gs drawing style

# Four (2x2) small coordinate systems on a single page
# Legends: none

# See work.gs for description of meanings

landscape= 0

# The default coordinate system template is similar to work.gs
default = {
  legend= 0,

  viewport= { 0.19, 0.60, 0.44, 0.85 },

  ticks= {

    horiz= {
      nMajor= 4.5,  nMinor= 40.0,  logAdjMajor= 1.2,  logAdjMinor= 1.2,
      nDigits= 3,  gridLevel= 1,  flags= 0x033,
      tickOff= 0.0007,  labelOff= 0.0182,
      tickLen= { 0.0143, 0.0091, 0.0052, 0.0026, 0.0013 },
      tickStyle= { color= -2,  type= 1,  width= 1.0 },
      gridStyle= { color= -2,  type= 3,  width= 1.0 },
      textStyle= { color= -2,  font= 0x08,  prec= 2,  height= 0.0182,
        expand= 1.0,  spacing= 0.0,  upX= 0.0,  upY= 1.0,
        path= 0,  alignH= 0,  alignV= 0,  opaque= 0 },
      xOver= 0.395,  yOver= 0.370 },

    vert= {
      nMajor= 4.5,  nMinor= 40.0,  logAdjMajor= 1.2,  logAdjMinor= 1.2,
      nDigits= 4,  gridLevel= 1,  flags= 0x033,
      tickOff= 0.0007,  labelOff= 0.0182,
      tickLen= { 0.0143, 0.0091, 0.0052, 0.0026, 0.0013 },
      tickStyle= { color= -2,  type= 1,  width= 1.0 },
      gridStyle= { color= -2,  type= 3,  width= 1.0 },
      textStyle= { color= -2,  font= 0x08,  prec= 2,  height= 0.0182,
        expand= 1.0,  spacing= 0.0,  upX= 0.0,  upY= 1.0,
        path= 0,  alignH= 0,  alignV= 0,  opaque= 0 },
      xOver= 0.150,  yOver= 0.370 },

    frame= 0,
    frameStyle= { color= -2,  type= 1,  width= 1.0 }}}

system= {
  legend= "Upper Left (1)",
  viewport= { 0.10, 0.35, 0.60, 0.85 }}

system= {
  legend= "Upper Right (2)",
  viewport= { 0.45, 0.70, 0.60, 0.85 }}

system= {
  legend= "Lower Left (3)",
  viewport= { 0.10, 0.35, 0.25, 0.50 }}

system= {
  legend= "Lower Right (4)",
  viewport= { 0.45, 0.70, 0.25, 0.50 }}

legends= { nlines= 0 }

clegends= { nlines= 0 }
# end of four.gs

Q To conserve memory worry about array temporaries.

A much more serious problem is array temporaries. If you are working with very large arrays in Yorick, you should pay very careful attention to exactly how you write expressions. For example, if x is a 1 Mbyte array, then:


   y= (x+1)*exp(x)*sin(x);

requires 3 Mbytes, while


   y= (x+1)*(exp(x)*sin(x));

requires 4 Mbytes -- in the first case the deepest stack contains


   exp(x)
   (x+1)

or later


   sin(x)
   (x+1)*exp(x)

while in the second case, the deepest stack (because the parentheses delay the first * operation) contains:


   sin(x)
   exp(x)
   (x+1)

Each stack element is 1 Mbyte, and x is 1 Mbyte. So if you use big arrays, be very careful to think about how you write Yorick expressions!

For really large arrays, you may need to explicitly chunk your Yorick code, so that its array temporaries don't overwhelm you. This makes your interpreted code a lot uglier, and I wish there were some way to do it automatically, but there isn't. I think you can always find a way to fit within reasonable memory bounds with Yorick, but if you are within a factor of 4 or 5 of the largest arrays you could handle in a C program, you will need to be exceedingly careful, and possibly have rather ugly interpreted code.
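
A sketch of what explicit chunking might look like for the expression above. The function name chunked_eval and the chunk size are assumptions, and x is assumed to be a 1D array. Only the result y and a few chunk-sized temporaries exist at any one time:

```yorick
func chunked_eval(x, chunk)
/* hypothetical: evaluate (x+1)*exp(x)*sin(x) for 1D x, chunk elements
   at a time */
{
  n= numberof(x);
  y= array(double, n);
  for (i=1 ; i<=n ; i+=chunk) {
    j= min(i+chunk-1, n);
    xi= x(i:j);                       /* chunk-sized copy */
    y(i:j)= (xi+1)*exp(xi)*sin(xi);   /* temporaries are chunk-sized too */
  }
  return y;
}
```

To use, e.g. y= chunked_eval(x, 10000). The price is the explicit loop and the assignment into y, which is the ugliness referred to above.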

Q "ERROR (main) data type of pointee not installed" writing binary file. What's going on?

I never told you how to "install a data type", did I? You do it like this:


> struct ugh { long a;double b;}
> struct bugh { double c; long d; }
> x=[&[21,32],&ugh(a=1,b=-13.5),&bugh(c=2.7,d=-8)]
> f=createb("junk.pdb")
> save,f,ugh,bugh
> save,f,x
>

You can even use a single save as long as ugh and bugh come first:


> save,f,ugh,bugh,x

That is, you can save the struct *definitions* ugh and bugh, just like you save struct *instances*. When there are no pointers involved, writing an instance of a structure writes the definition first if necessary. However, when you are writing a pointer tree, I'm too deeply recursed to be able to get back to a point where I can write the definition of a structure I am encountering for the first time. Sorry about that. If you had written a non-pointee instance of ugh and bugh before you attempted to write x, you wouldn't have gotten the error -- it's only when the first time f has seen an ugh or bugh is while it's writing a pointee that there's a problem.

As a rule of thumb, if you *ever* think you need symbol_def, please write to me and explain why; I definitely don't intend for that to be a user level function, and I think I have cleaner ways to work around every situation where you might be tempted to use it. (Using an array of pointers or the _lst function are the usual means to avoid it, as you've apparently discovered on your own.)

I should also warn you that the format Yorick uses to write pointers to a binary file is not "standard", even if you regard the PDB format as a standard. Only Yorick can read them back; PDBLib-based programs can't. The entire binary file universe is in a state of upheaval with the US DOE's ASCI program, which is poised to force me to use the much uglier HDF file format instead of PDB. I have no intention of "orphaning" your data files, but I want to be clear that only Yorick is ever likely to be able to read files with pointers. The only way to avoid them is to generate "fake names" using the add_variable function instead of save, but that interface is the typical ugly interface you get with every other file package on the planet, so I don't blame you for using the nicer save interface and pointers. One fairly serious drawback to pointers in files is that Yorick doesn't give you a way to do a partial read of a pointee; (*f.x(3))(4:12) reads *all* of f.x(3) into memory -- even though it might be a megabyte array and you only asked for elements 4:12. In contrast, f.x3(4:12) only reads 4:12 into memory, even if f.x3 is a megabyte array. I'm hoping to be able to fix that problem by about yorick-2.0.

Q How can I read/write a struct in ASCII?

Yorick ASCII I/O is heavily weighted towards reading (very big) columns of numbers. It's difficult to figure out how to read giant arrays of structs -- what is the "obvious" default format? Of course I agree that it would be nice to "do what you meant" someday... For now, if you want to read or write a struct in ASCII, you need to write the I/O routines yourself:


struct tst { int a; float x; }
func tst_write(f, t)
{
  write, f, t.a, t.x;
}
func tst_read(f, t)
{
  a= t.a;  /* get the type and dimensions of the members for read */
  x= t.x;
  read, f, a, x;
  t.a= a;  /* scatter member values back into the original struct array */
  t.x= x;
}

With more effort, you could probably even cook up a pair of routines that wrote arbitrary structs, using the fact that print(structof(x)) returns a list of struct member names and data types. I don't really see the point, however, since the result wouldn't really be human readable.


>> struct tst { int a; float x; }
>> A=array(tst)
>> f=open("file")
>> read,f,A
>ERROR (*main*) read cannot handle non-array, complex, pointer, or structure

The ERROR message says it all.


>> read,f,A.a,A.x
>
>No error message, but:
>
>> A
>tst(a=0,x=0)
>
>the data is not read in either.  The file has been advanced to the
>next line, however.

As far as Yorick is concerned, this is exactly as if you had said this:


a= A.a;
x= A.x;
read, f, 3*a, x+7;

The point is that A.a and A.x are expressions whose values are array (or scalar) temporaries. The read function reads into those temporaries (so the data really was "read in"), which are then discarded (you just didn't give yourself a way to see where it had been written). In order to "do what you mean", Yorick's parser would have to treat the read function differently than all other functions, deferring evaluation of its arguments, and somehow using those expressions in the same way as it uses the LHS of an assignment statement. The fact that C and Fortran and Yorick use the same index notation on the LHS of an assignment as in an expression causes a lot of confusion -- they're really very different objects -- including the fact that "what you mean" by an argument to the read function is the LHS-type of expression, rather than the usual type of expression.
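Applied to the session above, the working pattern is the one tst_read uses: read into named variables, then assign them to the members. This is only a minimal sketch; the file name tst_demo.txt and the values in it are made up for the example.

```yorick
struct tst { int a; float x; }

/* write a one-line test file (hypothetical name and contents) */
f= create("tst_demo.txt");
write, f, 7, 2.5;
close, f;

A= array(tst);
f= open("tst_demo.txt");
a= A.a;  x= A.x;    /* named variables of the right types, not temporaries */
read, f, a, x;      /* read fills a and x */
close, f;
A.a= a;  A.x= x;    /* a member on the LHS of = really is assigned */
```

After this, A should print as tst(a=7,x=2.5).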

However, I think it would be possible to recognize that what you had passed as read arguments was impossible and issue some sort of error message. I'm sure that would have helped you a lot, and I'm sorry about the lack of an error. I'll add it to my list...

As far as handling ASCII reads and writes of structs automatically, I think I'll pass. Not only is there a serious problem with choosing a format (the ANSI % specifiers obviously aren't enough), but also I'm not convinced that structs really work all that well in Yorick in the first place. Certainly they aren't nearly as important in Yorick programs as in C -- mostly because Yorick pointers and array syntax don't mix.

Q Do source code tree tools, pretty printers work with Yorick?

That was in fact one of the reasons for choosing the C-like syntax.

I use Emacs's C-mode heavily with Yorick code, and it works extremely well. Yorick actually allows constructs (such as omitting the ; at the end of a statement or //-style comments) which break C-mode, but if you simply avoid those everything works fine. All the levels of font-lock-mode work with Yorick include files as well, if you like comments in one color, keywords in another, and so on.

Here's the line for ~/.emacs to get C-mode for Yorick .i files automatically:


;; Recognize Yorick .i files and use C mode.
(setq auto-mode-alist (append '(("\\.i$" . c-mode)) auto-mode-alist))

To get the font lock stuff at maximum fanciness, put these in .emacs:


(setq font-lock-maximum-decoration '((c-mode . 3) (c++-mode . 3)))
(add-hook 'c-mode-hook 'turn-on-font-lock)

Also, if you run yorick under Emacs's shell mode, you can use the distribution yorick.el file to pop open source files positioned at the proper line when there's an error or at a help message, as well as getting very good command line recall and editing features.

I suspect that other C-aware editors would work too, but I haven't tried. I also haven't tried pretty printers, but they'd have a good chance as well (if you use a C-aware editor, it tends to keep your programs looking good enough without a separate pretty printer).

Source code tree tools will almost certainly not work, since they will not recognize Yorick's func keyword. Even if they did, many Yorick programs would resist that sort of analysis, since it is unlikely to be able to cope with hook functions and other dynamic scoping constructs.

Q Can I read files written with Fortran unformatted writes? They've got junk bytes in them.

The bottom line is that it's no harder to read a Fortran "unformatted" file with Yorick than with a Fortran program on a machine other than the one on which it was written. In each case, you must figure out how many garbage bytes Fortran inserts at the beginning of each record. Usually it's 4, as you said, but since no Fortran standard addresses the issue, there's no guarantee (it can even vary from one Fortran compiler to another on the same platform -- the Fortran standard does not mandate that a binary file written by one compiler be legible by another).

Anyway, the real problem is that the Fortran model is sequential, while Yorick's model is random-access; that is, in Yorick, you need to figure out the absolute byte-address of anything you are going to read. As long as you know the number of garbage bytes, this is not a problem -- you just add up the size of the data you read as you go along. For example, suppose you wrote a file with a first record consisting of two integers NX and NY, followed by three more records consisting of x(NX), y(NY), z(NX,NY). Furthermore, suppose your Fortran program was running on a Sun SPARC. The Yorick program to read this binary file would look like this:


/* the following must be adjusted to be correct for the platform
 * and compiler where your Fortran program was built and ran */
fortran_record_bytes= 4;
primitive_function= sun_primitives;   /* see Yorick/std.i for choices */
integer_type= long;   /* need int for 64 bit machines (Fortran weird) */
integer_size= 4;      /* look up sizes in Yorick/include/prmtyp.i */
double_size= 8;       /* 12 on Macs with some compiler switches */

f= open("fortran_file","rb");
primitive_function, f;
address= fortran_record_bytes;

nxny= array(integer_type, 2);
_read,f,address, nxny;
address+= numberof(nxny)*integer_size + fortran_record_bytes;
nx= nxny(1);
ny= nxny(2);

x= array(0.0, nx);
y= array(0.0, ny);
z= array(0.0, nx,ny);
_read,f,address, x;
address+= numberof(x)*double_size + fortran_record_bytes;
_read,f,address, y;
address+= numberof(y)*double_size + fortran_record_bytes;
_read,f,address, z;
address+= numberof(z)*double_size + fortran_record_bytes;

You can hopefully see what to do if a record contains several variables. In that case, I warn you to be careful about alignment; the default primitive_function alignments are for the C compiler on the platform, and you may need to modify them for Fortran which is basically unaligned (all alignments 1).
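The address bookkeeping in the example above repeats the same two lines for every record, so it can be folded into a small helper. The function name fortran_read_record and its argument list are invented here, not part of Yorick. The gap argument is an assumption you must check against your own files: with the single leading marker assumed above it equals fortran_record_bytes, but many compilers write a length marker both before and after each record, in which case gap is the sum of the two.

```yorick
/* Hypothetical helper, not part of Yorick.  Reads one record holding
 * a single array into x, then advances address to the data of the
 * next record.
 *   elem_size  size in bytes of one element of x in the FILE
 *   gap        number of non-data bytes between the end of this
 *              record's data and the start of the next record's data */
func fortran_read_record(f, &address, &x, elem_size, gap)
{
  _read, f, address, x;
  address+= numberof(x)*elem_size + gap;
}
```

With the variables of the example above, the first two reads would become

     fortran_read_record, f, address, nxny, integer_size, fortran_record_bytes;
     fortran_read_record, f, address, x, double_size, fortran_record_bytes;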

Beware that many Mac and PC compilers have switches that allow you to change the sizes of their data types, so none of the predefined primitive type setting functions in prmtyp.i may be correct -- you need to know precisely how the program was compiled in order to know how to read the file it produced. Write your own primitive_function in that case, of course. This is not really Yorick's problem, since the users of this software presumably enjoy the extra challenge posed by the lack of a standard.

The most subtle gotcha is that the Fortran standard mandates that an integer *not* be large enough to use as an array index on 64-bit machines like the DEC alpha, hence you need to leave integer_type variable. Of course, this is true of all Fortran programs you try to port to such a machine, so Fortran users there presumably enjoy this little thrill as well. (C's long data type *is* long enough to index all possible arrays, which is why that's Yorick's default integer type.)

It is a misfeature of Yorick that there is no way to query a file to find out the sizes of the data types in that file, hence you need to just look up the proper values in the prmtyp.i file.

A similar question...



> The only problem I have encountered with Yorick, and this is why I took the
> liberty to send you this E-mail, is that I can't figure out how to read
> data generated by a fortran unformatted statement on the same or a 
> different platform. Am I missing something very trivial?

No, this is not trivial, but the problem is not with Yorick. It is with your Fortran compiler. And with the ANSI Fortran standard which does not mandate the precise meaning of the "fortran unformatted statement" in terms of what bytes are written to what disk address. In contrast, the ANSI C standard mandates precisely what bytes result from the fwrite statement. If you know the offset in bytes from the beginning of your file where Fortran put your data, then Yorick can read it easily. If you use software (such as a Fortran compiler) which does not document the layout of the files it writes, then you've implicitly agreed with that software supplier that you *want* to only be able to read your files by means of his software. Personally, I refuse to use such software. (You can find a byte-for-byte description of the PDB and netCDF files Yorick writes in the distribution file Yorick/doc/FILE_FORMATS.)

That said, the situation is probably not as bleak as it sounds. You have at least two choices: (1) To figure out the layout of the files you are writing with Fortran, and (2) to add a set of Fortran I/O routines to Yorick. Option (1) is probably easier, but (2) might not be too hard if you actually use a simple subset of all possible Fortran I/O statements.

The simplest guess is that Fortran simply lays each new object you write on the heels of the last one you wrote. In that model, all you need to do is to keep track of the order and data types of the objects you wrote -- the byte address of the next object is simply the sum of the total number of bytes written so far. The only problem is that Fortran I/O is often organized into "records" (although the "unformatted statement" you mention might explicitly *not* write records -- I don't remember). If your files have records, Fortran will insert a few bytes of its internal garbage before or after each record. You don't need to figure out what this means (it's not standard anyway, since one Fortran compiler need not be able to read a file written by another Fortran compiler -- what a standard!), but you do need to figure out how many bytes to expect, so you can skip over them. A second possible problem is alignment. Sometimes, an I/O package will always begin a write at an address which is even or some higher power of two -- this probably won't happen, but you should be on your toes.

OK, so what you need to do is to write a very simple file -- generated by a few of your "fortran unformatted statement" calls with a small amount of data that you can easily recognize. On the machine where you wrote the file, you may be able to use the UNIX od utility (the GNU version will show floating point data) to figure out exactly where your Fortran program put your data. If you can't do that, you can use Yorick on any machine where you have it to explore the file. Suppose you wrote your file on a Cray YMP, and you move it to your Pentium to look at it. You would open the file with the following commands:


     f= open("your_data_file", "rb")
     cray_primitives, f

You can find all of the known machine types in the distribution file Yorick/std.i, with complete descriptions in Yorick/include/prmtyp.i. The comments at the top of the latter file tell you how to interpret the install_struct call arguments -- the only one you will need to worry about is the "size" argument, which tells you how many bytes are in each primitive object you write (if you didn't know already). (For example, cray_primitives calls raw_cray_p, whose source you can look up in prmtyp.i.)

The Yorick function _read is how you look at your file. It will translate formats for you automatically, but you have to tell it what address to read from. If you write out, say, 10 real*4 numbers first in your file, you would read them into Yorick with:


     my_numbers= array(float, 10)
     _read,f, 0, my_numbers      // 0 is the byte address
     my_numbers

If the numbers don't look correct, try a few other addresses near the beginning of your file:


     _read,f, 2, my_numbers; my_numbers
     _read,f, 4, my_numbers; my_numbers
     _read,f, 8, my_numbers; my_numbers
     _read,f, 12, my_numbers; my_numbers

and so on, until you find them. The offset of the first thing you wrote probably represents the number of bytes in your Fortran record headers, although the first record could be special. Repeat the procedure to find the second set of numbers you wrote, which should begin at an address near 10*(size in bytes of a real*4 on the machine where you wrote the file, as you can find in prmtyp.i if you're not sure) beyond the address where you found the first set. It usually doesn't take too long to decipher the file format this way. Then, you can construct a Yorick program which mirrors your Fortran program to read back the things you wrote out.
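If you have many candidate addresses to try, the hunt can be put in a loop. The helper below is a sketch only; its name and arguments are invented for the illustration. It collects what _read finds at each address so you can scan the columns for the numbers you recognize.

```yorick
/* Hypothetical helper, not part of Yorick.  Reads n values of the
 * given type at each candidate byte address and returns them as the
 * columns of an n-by-numberof(addresses) array. */
func probe_addresses(f, type, n, addresses)
{
  x= array(type, n);
  result= array(type, n, numberof(addresses));
  for (i=1 ; i<=numberof(addresses) ; i++) {
    _read, f, addresses(i), x;   /* fills x from that byte address */
    result(,i)= x;
  }
  return result;
}
```

For the ten real*4 numbers above, probe_addresses(f, float, 10, [0,2,4,8,12]) returns a 10x5 array; the column that shows your data tells you the address.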

After all this, you'll appreciate the PDB files Yorick produces with its save/restore commands.

Q How can I take the reciprocal of all the elements in a matrix, leaving the zeroes alone?

My own favorite way to avoid zero divisors is this:


    y_over_x= y/(x + (x==0));

This does not "leave alone" the zero divisors, but often you don't care about those values. A more sophisticated alternative is something like this:


    mask= (x==0);
    y_over_x= (!mask)*(y/(x+mask)) + mask*(correct_limit_for_0_x);

If you find yourself doing this a lot, you should probably rethink your algorithms or inputs a bit. You might also want to investigate the "where" and "merge" functions (many examples in series.i and bessel.i), although these are a little too fancy (and slow) for simple fixes.
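For the literal question (reciprocals, with the zeros left at zero), an index list from where is enough. A minimal sketch with made-up data:

```yorick
x= [2.0, 0.0, -4.0, 0.5];
recip= x;                  /* a copy, so the zeros are already in place */
list= where(x != 0);       /* indices of the nonzero elements */
if (numberof(list)) recip(list)= 1.0/x(list);
/* recip is now [0.5, 0, -0.25, 2] */
```

The numberof test guards against the case in which every element is zero and the index list is empty.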

Q Tell me more about reading arbitrary binary files.

Ouch! I'm afraid there is not a lot of documentation on that. The only thing I can recommend is that you study the Yorick/include/netcdf.i file, which does what you want for netCDF files. The file format itself is described in Yorick/doc/FILE_FORMATS. I have several other examples, but they are for Livermore proprietary file formats and I can't distribute them. The functions you need (in Yorick/doc/std.doc) are:


     open(filename, "rb")   open a file in binary mode (also "wb", "r+b")
     _read                  raw binary read
     _write                 raw binary write
     install_struct         to set binary data formats if foreign
       probably you will only need sun_primitives, dec_primitives, etc.
       which call install_struct appropriately -- see use of xdr_primitives
       in netcdf.i.  After this call, _read and _write will take care of
       any format conversions automatically

These commands will suffice for simple files (e.g.- one or two images with some header information) that you are content to simply read into Yorick all at once with a single function call.

If you have a more complicated file which you want to be able to address by "variable names", like the PDB files Yorick's save and restore commands write, or like netCDF files, you will also need the following:


     add_variable
        perhaps using structof and dimsof to construct arguments
     add_record
     add_next_file

These are used heavily in netcdf.i. The basic idea is that you need to use _read to figure out the name, data type, dimensions, and byte address of a variable in the file. A single call to add_variable then adds that variable to Yorick's data structure describing the layout of the file. The add_record and add_next_file functions are for yet more complicated files which record time history information. If you use Yorick's C-like data structures, you might also need to use add_struct.
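As a rough illustration of add_variable, here is a sketch for an invented file layout: a file written on a Sun with four long integers of header at byte 0, followed by a 512x512 char image. The layout, the function name and the variable names are all assumptions made for the example; in a real reader you would obtain them with _read, as netcdf.i does. The argument order (file, address, name, type, dimension list) is as I understand it from the add_variable documentation in std.i, so check it with help, add_variable.

```yorick
/* Hypothetical reader for an invented raw image layout. */
func open_raw_image(filename)
{
  f= open(filename, "rb");
  sun_primitives, f;                             /* long is 4 bytes in this format */
  add_variable, f, 0, "header", long, 4;         /* bytes 0 through 15 */
  add_variable, f, 16, "image", char, 512, 512;  /* starts right after the header */
  return f;
}
```

After f= open_raw_image("picture.raw") the file can be addressed by name, for example f.header or f.image(1:64,1:64).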

PS- The "clog" business is not really related -- it is just a human-readable dump of Yorick's internal data structure describing the file. Try the dump_clog function on Yorick save files if you want to fiddle with it. If you have a lot of *identical* files, you could write a clog file describing their layout, then use the 2nd argument to openb to use your hand-written clog file to avoid having to call add_variable, etc. yourself. I've never had such a situation myself.

Q How does Yorick handle the different byte orders and floating point formats?

You asked how Yorick handles the problem: Partly by a compilation of the characteristics of known systems (such as yours), and partly by a small, portable C program which computes the required information.

First, I'll describe the program. It's called fmcalc, and is in the Yorick subdirectory of the Yorick distribution (yorick-1.0.tar.gz at wuarchive.wustl.edu:languages/yorick or ftp-icf.llnl.gov:pub/Yorick). fmcalc is unrelated to the rest of Yorick; feel free to grab any part of it which interests you. The output of fmcalc is in the form of a header file which Yorick includes (obviously not directly useful to you). The section (4F.) in Yorick/doc/FILE_FORMATS summarizes my notation for byte order (order_value there), which does not cover an arbitrary permutation of bytes like your notation or Stewart Brown's, but which covers all of the machines in your list.

The byte order calculation in fmcalc is quite simple (most of the difficulty is in extracting the floating point encodings) with one exception you might be interested in: On a Cray, sizeof(short) is 8 bytes, but only three bytes are actually used (the Crays have 24 bit address registers). Thus, when you are testing for byte permutations you need to be able to recognize the byte ordering from only a fragment of the full sizeof() bytes... Hopefully the Crays are unique in this respect, but the ANSI C standard certainly allows this, and I hadn't considered the possibility until it bit me!

The second thing is the tables of "known" machines. These are in Yorick/binstd.c, with the structs defined in Yorick/binio.h. Again, the byte ordering information is only a small part of the characterization I need. Cray 1, XMP, and YMP machines are listed. They are garden variety big-endians, and *every* data type (short, int, long, float, and double) is 8 bytes. A short is truncated to 3 bytes when it is actually referenced, and the other integers are restricted to 6 bytes for most arithmetic operations (but not logical operations).

Q Can I label contours with more than a single character- i.e levels?

This turns out to be nearly impossible. The codes I've seen which do it (e.g. some things like Basis based on NCAR graphics) require the ability to plot text at an angle, which in turn all but requires stroked text, neither of which the Gist graphics package can handle.

Besides, the result is nearly always ugly. It turns out that the graphics package is already unreasonably dominated by the contour routines, and making the addition you want would require worsening what is already a bad situation. If you try to write the code you want, you will also discover it has a large artificial intelligence component in trying to decide where to put the labels -- always a sign that you're heading in the wrong direction!

One more quick word about my graphics philosophy: I tried to make Yorick operate in two modes: (1) Quick and dirty working plots. The main idea behind legends and curve markers is to provide day-to-day diagnostic plots that get the information you want on your screen or on paper, *without* any pretense of being the most artistic possible picture, but trying very hard not to obscure any information. (2) To allow you to spend a much longer time massaging a particular picture in order to get out publication quality graphics. In my opinion, *neither* Yorick's occasional markers *nor* NCAR's snaking numerical labels are publication quality. To make high quality pictures, I usually turn off all markers and use different line types and widths to distinguish curves; I use Yorick's eps command to output these stripped pictures, then FrameMaker to add annotations.

So here are my suggestions for your problem with contour ugliness:

(1) If you haven't used the plf command, give it a try. I don't use contours very much anymore, because I can almost always see what is going on much better with a filled mesh. The combination of a filled mesh plus a few *unmarked* contours at significant values only is very hard to beat for conveying a feeling for a function on a mesh. The demos have two examples:


       #include "demo2.i"
       demo2, 1

       #include "demo4.i"
       demo4

(2) If you really need a line drawing (so you don't want any filled meshes), I recommend removing the markers entirely and using line types and line widths to distinguish contours. Then you can write a figure caption which lets the reader figure out which contour is which. For example, you might put solid contours every 0.5 with four interspersed dashed contours:


       levels= 0.1*(indgen(13)-1);  // or whatever
       mask= abs((levels+1.e-7)%0.5) < 1.e-6;
       levs5= levels(where(mask));
       levs1= levels(where(!mask));
       plc, z, y, x, levs=levs5, marks=0, type="solid";
       plc, z, y, x, levs=levs1, marks=0, type="dash";

Then say in the caption, "The first solid contour is at 0, with dashed contours every 0.1 and solid contours every 0.5 thereafter." Particularly significant level values can be marked with a third or even a fourth style. Line widths are also effective -- a wide contour will really grab the reader's attention.
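A wide contour for one significant level might be drawn as in the following sketch. The mesh and the function z are made up so that the fragment stands on its own, and the width value is only a guess at something that looks bold enough.

```yorick
/* made-up test function on a 41x41 mesh */
x= span(-2, 2, 41)(,-:1:41);
y= transpose(x);
z= exp(-0.5*(x*x + y*y));

fma;
plc, z, y, x, levs=[0.2, 0.4, 0.6, 0.8], marks=0, type="dash";
plc, z, y, x, levs=[0.5], marks=0, type="solid", width=4;  /* the one to notice */
```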

[The above predates yorick 1.3. Some people won't take no for an answer, so if you really want plot labels you can now include "plclab.i" from the distribution.]

Q How does Yorick deallocate memory when objects are no longer referenced?



> Some further questions about objects : how they are deallocated ?
> if I build a struct containing a pointer an write
> struc.ptr = &span(1,100,100)
> struc = []

Yorick objects all have a use counter, so the object created by span(1,100,100) has one use which is "owned" by the pointer. The data structure "struc" itself also has one use, which is owned by the variable called "struc". When you set struc=[], the use counter for the old value of the variable struc is decremented, becoming zero in this case and triggering the deallocation of that object. Non-pointer members require no special deallocation, but pointer members do; when struc.ptr is destroyed, the use counter for its pointee decrements, triggering its destruction, and so on.

Yorick's use counters are much more efficient and less buggy than the current computer science solution -- garbage collection. The only drawback is that you can't have a ring of Yorick pointers (e.g.- a linked list in which the last link points back to the first link). I consider this a negligible limitation; even if there were a problem that couldn't be solved as elegantly any other way (I doubt the existence of such a problem), I would still choose use counters over garbage collection (for example, I just had to wait out a five second pause while Emacs collected its garbage).




> aptr=&span(1,100,100)
> struc.ptr = aptr
> aptr = []
>
> would struc.ptr still point on a span(1,100,100) ?

Yes, when you assign a pointer, the use counter of the pointee increments, hence after struc.ptr=aptr, span(1,100,100) now has *two* uses, so aptr=[] reduces it to *one* use, and it survives.
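The sequence can be tried directly. In this sketch the struct name holder is invented; the use counts in the comments are the ones described above, not something Yorick prints.

```yorick
struct holder { pointer ptr; }   /* hypothetical struct for the demo */
struc= holder();
aptr= &span(1, 100, 100);        /* pointee has one use */
struc.ptr= aptr;                 /* now two uses */
aptr= [];                        /* back to one use, owned by struc.ptr */
n= numberof(*struc.ptr);         /* 100: the pointee survived */
struc= [];                       /* last use gone, the pointee is freed */
```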

I originally thought I would use Yorick structures and pointers as much as I do C structures and pointers. This turns out not to be the case. The reasons you use structs in C programs rarely apply to Yorick programs, but I'm not sure I can easily explain why -- it's more a matter of programming experience. I can tell you that I enthusiastically use C structs and pointers (sometimes I even regret an overuse later, like making Dimension in Yorick a linked list instead of an array), but they just don't often seem to be what I need in a Yorick program.

I should also point out that neither structs nor pointers really "work" in Yorick programs. A Yorick struct by itself is fine, but it is not suitable for really heavy duty number-crunching use because Yorick spends a lot more CPU time extracting a member of a struct than it does extracting an element of an array. The basic problem is that Yorick must look up the offset of a struct member *at each reference*, very much unlike C where the compiler computes and inserts the proper offset each time. For "light duty" (thousands of member references instead of millions), Yorick structs are fine. But it turns out that Yorick's dynamic scoping rules -- which are totally foreign to C or Fortran programmers -- actually provide a better solution to the problems I solve using structs in C (or common blocks in Fortran). Yorick pointers also have a subtle problem, which is that the address and dereference operators (unary & and *) don't vectorize. As a result, pointers are again only useful for "light duty" in a Yorick program. In fact, their main use has turned out to be a side effect of the fact that each element of an array of Yorick pointers can point to an object of a different data type or dimensionality.

So it's OK for you to keep thinking in terms of structs and pointers in Yorick codes, but I predict you will use them a lot less than you think you will.

Q How does the performance of Yorick degrade as the number of objects increases?

Yorick variables are looked up in hash tables *at parse time only*; thereafter they are referenced by index. The only performance penalty is an extra level of indirection, which you pay independent of how many variables there are. Since Yorick creates and destroys temporary arrays very frequently, it may be possible to write a Yorick program which spends most of its time in malloc and free operations. I have seen malloc and free take up to 10% or 20% of a Yorick program. It turns out that memory manager performance on most UNIX workstations is rather poor... Anyway, the answer to your question is that the real performance penalty to worry about is malloc and free in creating temporaries in Yorick expressions, and not the number of Yorick variables. In practice, I would be very surprised if the kind of problems I think you are talking about overwhelmed Yorick by sheer complexity of number of objects.

Q Can compiled routines call the interpreter?

The short answer to the question in your subject line is, "No, compiled routines cannot call interpreted routines."

I actually agonized over this one for a long time. Yorick's model for its "virtual machine" has a lot of pieces that are aimed in this direction, but it turns out that, deep down, Yorick is just not as introspective as a Lisp engine (something like Scheme could easily do what you are asking). It turns out there are many very subtle problems with doing this which would have impacted both the performance and the simplicity of the code. (I know it may not look simple, but believe me, it could have been a lot worse!) When tough choices arose, my rule was always to choose performance over elegance, simplicity, or anything else.


From a pure performance perspective, calling an interpreted routine from a compiled one makes no sense at all -- it is equivalent to having a high performance outer loop with a very, very slow inner loop: crazy!

Okay, I appreciate that you have a compiled routine you really like that does just what you want and takes a function pointer -- that's why I agonized over this one. Maybe I'll be smarter in a future version of Yorick, and see a way to call an interpreted function from a compiled one, but it's not going to happen real soon.

Three basic possibilities occur to me:

(1) My favorite is that you bite the bullet and create a Yorick version of your C or Fortran nonlinear least-squares fitter. If you do, I would love it if you would share the result with me -- a general purpose routine of that sort could make a nice addition to the next release of Yorick. In fact, if you want to just give me the algorithm (or your C or Fortran code), I would have a go at it myself. In any event, it could be a learning experience for me and get you the code you want fairly quickly. *WARNING* Before you go this route, I strongly recommend that you code up your fit_function_interpreted and do some timing tests on it to ensure that interpreted speed will be fast enough for your eventual needs. If not, you might run it by me to be sure you haven't missed any vectorization opportunities.

(2) Consider the possibility of writing a parameterized version of your fit_function_compiled, in which you call the interpreter to set the parameters for the particular case you want to do today. Then a separate interpreted call can run your compiled least squares fitter, which need never call an interpreted routine. You can set a *lot* of "parameters" with a Yorick program, so the fit_function_compiled could perform a *very* general task. Furthermore, adding a new fit_function_compiled in a new custom version of Yorick would be very easy once you have a working model.

(3) Scheme (see prep.ai.mit.edu:/pub/gnu or the various comp.lang newsgroups) interpreters may do what you want. Also, my arch-rival Basis (ftp-icf.llnl.gov:/pub/basis) is a (big and slow) interpreter which can do what you want.

Good luck, and let me know if you do write (or want help to write) a Yorick nonlinear least squares fitter. You might check romberg.i or rkutta.i in the Yorick distribution (in Yorick/include) for vaguely similar Yorick code.

Q The ternary (condition? true:false) doesn't work for vectors, but loops are slow? How do I handle this?

The ternary operator (condition? true:false) is problematic when true and/or false are array valued expressions. The problem is that the C operator guarantees that only one branch (either true or false) will be executed; when true and false are arrays, *both* expressions must always be executed, which means that constructs like:


     (x!=0? sin(x)/x : 1.0)

are bound to fail with a floating point error if x is an array which contains a 0.0 value, even though that appears to be what the statement is trying to avoid!

Therefore, I restricted the ternary operator in Yorick to scalar values. There is a *function* called merge2 which does what your supermongo interpreter does with vector ternary operations:


     l= (k>50) ? i+1:k+1         in supermongo
     l= merge2(i+1, k+1, k>50)   in Yorick

I usually do this instead of merge2:


     mask= (k>50);
     l= (i+1)*mask + (k+1)*!mask

because this will (I believe) execute faster than the merge2 function. You are correct, *never* write a loop in Yorick! Or rather, be very suspicious of *every* loop! There are very few trivial things like this which require loops.
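Here are the two forms side by side on made-up data, as a quick check that they agree:

```yorick
i= indgen(5);                   /* [1,2,3,4,5] */
k= [10, 60, 30, 80, 50];

l1= merge2(i+1, k+1, k>50);     /* [11,3,31,5,51] */

mask= (k>50);                   /* [0,1,0,1,0] */
l2= (i+1)*mask + (k+1)*!mask;   /* same result */
```

Note that both forms evaluate i+1 and k+1 for every element, which is exactly why neither is safe for something like sin(x)/x.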



>a loop to obtain the same result which is very inefficient. 
>One remark: the way supermongo
>does the job can be made better: by calculating the expr1 and expr2 
>only when necessary 
>rather than to start by calculating both. This can save a lot of time 
>(but is not very easy to implement in an interpreter).

Actually, it *is* pretty easy in Yorick! If you want to do something like (x!=0? sin(x)/x : 1.0) (which I need more often than the example you gave), then you want to use Yorick's merge function. This requires a unique programming style as well; see "help, merge" for a usage description and the file Yorick/include/bessel.i for a typical usage example.
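In outline, the style looks like the sketch below, which follows the pattern I understand bessel.i to use. The function name my_sinc is invented for the example. Each branch is evaluated only on the elements that need it, and merge puts the pieces back in order.

```yorick
/* Hypothetical example: sin(x)/x with the limit 1.0 at x==0. */
func my_sinc(x)
{
  mask= (x != 0);
  list= where(mask);
  if (numberof(list)) {          /* "true" branch, nonzero x only */
    xx= x(list);
    y= sin(xx)/xx;
  }
  list= where(!mask);
  if (numberof(list)) {          /* "false" branch, zero x only */
    z= array(1.0, numberof(list));
  }
  return merge(y, z, mask);      /* y or z stays nil if its branch is empty */
}
```

For example, my_sinc([0.0, pi/2]) gives 1 and 2/pi without ever dividing by zero.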


     l=k*2 if (k>20) && (k<30);          in supermongo
     l= (k*2)(where((k>20) & (k<30)));   in Yorick (note single &)

This is one of several uses of index lists in Yorick; I suspect that index lists in Yorick are actually considerably more powerful and natural than this special syntax in supermongo, which strikes me as somewhat clumsy. For example, Yorick's index lists *do* apply quite naturally to matrices:


     list= where(k>20 & k<50);
     matrix= [i, sin(pi*i/20), cos(pi*i/20), tan(pi*i/20)];
     part= matrix(list,);

Here, matrix is a table of values of a variable and its sin, cos, and tangent; part is a table of only some of the values of the original variable (or actually of the related variable k). The rule in Yorick is that the result of the indexing operation has the values of the array being indexed, with the dimensions of the list used as the index (in place of the slot being indexed).

Unlike the supermongo notation, the list may repeat values, so that x([2,2,1,1,1]) is the same as [x(2),x(2),x(1),x(1),x(1)]. Thus, the list indexing is also useful for "colormaps" -- you can have a 2D *integer* (or even char) array "image" representing pixel values, and a 1D array "map" whose *index* is the pixel value, and whose value is the corresponding physical quantity (say Watts/cm^2); then map(image) is the 2D array of Watts/cm^2.

The third use of index lists is with the sort function (see help,sort). A slick usage example is the extension to multiple key sorting in Yorick/include/msort.i, if you want to see that in action.
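The three uses of index lists can be sketched together. All of the data values below are made up for illustration:

```yorick
// Repeated indices are allowed in an index list.
x= [10.0, 20.0, 30.0];
r= x([2,2,1,1,1]);            // same as [x(2),x(2),x(1),x(1),x(1)]

// "Colormap" lookup: image holds pixel values 1..3, and map converts
// a pixel value to a physical quantity.
image= [[1,2,3],[3,2,1]];     // 3-by-2 array of longs
map= [0.5, 1.5, 4.0];
phys= map(image);             // 3-by-2 array of doubles

// sort returns an index list, which can reorder several arrays alike.
key= [3.0, 1.0, 2.0];
val= ["c", "a", "b"];
order= sort(key);
key_sorted= key(order);
val_sorted= val(order);
```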

Q Is there a simple way to read columnar data with Yorick's read?

The read function reads columnar data. For example, if the file looks like this:


1.2   2.4  4.8
1.5   3.0  6.0
1.8   3.6  7.2

Then the following Yorick statements will read the three columns:


f= open("whatever_you_called_the_file");
col1= col2= col3= array(0.0, 3 /*rows per column*/);
read, f, col1, col2, col3;

The same sequence with create instead of open and write instead of read would write three columns (of zeroes here). If you don't know how many rows there are, dimension the colI arrays larger than you will need, and use the return value of the read function to truncate them afterwards (the read will stop when something other than a number is encountered).

You might want to read the help documentation for the read, write, rdline, and strtok functions. The files prefix.i and readn.i (in the Yorick/include directory of the distribution) have some grubby examples of reading ASCII, but simple columnar reads and writes were supposed to be too simple to need exceptional routines...

Also check out ascii.i, which expands on this functionality...
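Here is a minimal sketch of reading a file whose length is not known in advance. The file name and the bound nmax are assumptions made for this illustration; the sketch writes a small file first so that it is self-contained.

```yorick
// Write a small three-column file (hypothetical file name).
f= create("columns_demo.txt");
write, f, [1.2,1.5,1.8], [2.4,3.0,3.6], [4.8,6.0,7.2];
close, f;

// Read it back without assuming how many rows it has.
f= open("columns_demo.txt");
nmax= 1000;                        // assumed upper bound on the rows
col1= col2= col3= array(0.0, nmax);
n= read(f, col1, col2, col3);      // count of numbers actually read
close, f;
nrows= n/3;                        // three numbers per complete row
if (nrows > 0) {
  col1= col1(1:nrows);
  col2= col2(1:nrows);
  col3= col3(1:nrows);
}
```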

Q What is the proper technique for creating and referencing arrays (including arrays of structures) within other structures?

First let me apologize again for the current lack of Yorick documentation. Since it will be a while before I release a more complete manual, I feel I owe those of you on the mailing list a few words about user-defined data types -- structs -- in Yorick:

First, I want to caution you that Yorick is *not* C. You should use structs far more sparingly in a Yorick program than you would in a similar C program. In fact, I am not positive there is *ever* a good reason for using a struct in a Yorick program. If a function needs to return several arrays, or needs a large number of input parameters, I strongly recommend you use Yorick's dynamic scoping (extern statements) instead of attempting to bundle them into a struct (as you would certainly do in a C program). It is true that this programming style will give your Yorick programs a "look and feel" more like E-Lisp than C, but I think it is also a more comfortable style for an informal interpreted language.
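To illustrate the extern style, here is a minimal sketch of a function that hands back two results through external variables rather than a struct. The function fit_line and the variable names fit_slope and fit_offset are made up for this example.

```yorick
func fit_line(y, x)
/* hypothetical example: least squares line through the points (x,y);
   the results are left in the external variables fit_slope and
   fit_offset */
{
  extern fit_slope, fit_offset;
  xbar= avg(x);
  ybar= avg(y);
  fit_slope= avg((x-xbar)*(y-ybar)) / avg((x-xbar)^2);
  fit_offset= ybar - fit_slope*xbar;
}

x= [0.0, 1.0, 2.0, 3.0];
y= 2.0*x + 1.0;
fit_line, y, x;
write, fit_slope, fit_offset;
```

The caller simply reads fit_slope and fit_offset after the call; no member extraction is involved.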

There is also a technical reason to avoid structs in Yorick -- Yorick's member extraction operation (x.member) is slower than other indexing operations. This is because the Yorick parser (unlike a C compiler) has no idea of the type of "x" (in fact, the type might be different from one call of the containing function to the next); therefore, the string "member" must be looked up in a hash table at run time to determine its offset. In C, a struct is powerful precisely because this lookup can be done once and for all at compile time.

So why does Yorick offer structs at all? Two reasons: (1) My model for a binary file (save/restore or openb functions) is that the file handle be usable as if it were an instance of a struct (a file handle is both a struct instance and a struct definition). (2) You might need to call a compiled C routine which accepts (or returns) a C-struct. Because of (2) Yorick structs are guaranteed to be packed in precisely the same way as the equivalent C-struct (when compiled by the C compiler used to build Yorick).

Now that I've told you not to use structs, I'll tell you how to make a struct:

First, you need to define the struct, by means of a struct statement. Since a struct statement is executed once to build the corresponding struct definition, it may not appear inside a Yorick function. Logically, the syntax should be:


typename = struct { long n; double x,y; }  // WRONG!!

But I decided to make it look more like C instead:


struct typename {
  long n;
  double x, y;
}

Unlike C, this statement is executed in order to create the variable "typename" (silently overwriting any previous definition -- even if the variable previously contained data or a function definition). The variable "typename" contains a "structure definition" object, which is not at all the same as a "structure instance" object. The basic data types (char, short, int, long, float, double, complex, string, and pointer) are also "structure definition" objects. Such an object can be used in only a few ways; the most important are as the first argument in a call to the array function, and as a function:

As a function, a "structure definition" object is the type converter for that data type. Hence long(3.14) returns the long integer 3 and sqrt(complex(x)) will not fail for x<0. Unfortunately, I was unable to figure out what "type conversion" meant for compound types, so this construction can only be used with a nil argument to make a scalar instance of the struct with all its members 0:


instance= typename();

After this, instance.n, instance.x, and instance.y are all zero. However, the typename "function" also accepts keyword arguments to allow you to set individual members to non-zero values, e.g.:


instance= typename(n=4, y=-31.4);

After this, instance.n is 4, instance.y is -31.4, and instance.x is 0.0.

To get an array of structure instances, use the array function:


instance= array(typename, 3, 4);

After this, instance is a 3-by-4 array of typename instances; instance.x is a 3-by-4 array of doubles (all zero). Expressions like instance(2:3,1).x or instance.x(2:3,1) or instance(2:3,).x(,1) all work "as you expect", either in an expression or as the left hand side of an assignment statement. (If a member of a struct is itself an array -- a rarity, since the dimensions of such an array would be fixed unlike typical Yorick arrays -- the member dimensions will vary faster than the dimensions of an array of instances.)
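A short sketch of these indexing forms on both sides of an assignment. The member values are arbitrary, and the struct definition is repeated only so that the example stands alone:

```yorick
struct typename {
  long n;
  double x, y;
}

instance= array(typename, 3, 4);
instance.x= 1.5;                // scalar broadcast to every x member
instance(2:3,1).n= [7, 8];      // assign to a slice of instances
xs= instance.x(2:3,1);          // extract x from the same slice
ns= instance(2:3,).n(,1);       // third spelling of the same slice
```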

I'll mention three other functions which can meaningfully accept typename as their argument -- structof, typeof, and print (including an implicit print statement).

So, as I say, you are probably making a mistake if you are trying to use a struct in a pure Yorick program. But feel free to experiment, and if you think you have a really slick use of a struct -- where no other Yorick construction seems to work quite as well -- by all means tell me about it! I think I can show you a better way, but if you can prove me wrong, you will teach me something important.

Q Can you pass a pointer to a compiled function to another compiled function?

The answer is (unfortunately), there is no way to pass a pointer to a compiled function to another compiled function. I don't mean you can't do what you want, but you can't do it with function pointers.

-----here is my excuse-----

What you suggest would violate the ANSI C standard, since a Yorick "pointer" is a C "void *", and a C pointer to a function may not be cast to a void* -- void* is reserved for pointers to data, which conceivably would not fit into the same space as pointers to functions (although on all machines Yorick runs on now they do).

My real problem was that Yorick can transparently move data between memory and PDB files, so that a pointer needs to point to an object that I can write to disk and later read back on another architecture. Since writing a portable compiled function to disk is clearly out of the question, I can't ever make function pointers work. I also punted on interpreted functions -- although these could in principle be written to portable disk files, in practice the information required to link the external variables is very dependent on the exact state of Yorick at the time the interpreted function was defined.

Nevertheless, the Yorick language really does need pointers to functions, so that is one of the few things I am thinking of adding to the language. (Yorick's pointers in general don't work quite as well as I'd hoped.) That still won't help you, since I certainly won't be implementing that feature using C function pointers.

-----end of my excuse-----

So now, what can you do?

Your easiest option is to write a wrapper for dneqnf, so that the interpreter never sees the function pointer argument:


extern wneqnf;
/* PROTOTYPE FORTRAN
   void wneqnf(long, double array, long array, long array, double array, 
   	       double array, double array)
*/

You write wneqnf to be a big case statement, switched on its first argument, which calls the actual dneqnf with a different objective function for each "function index". Since you need to declare each possible objective function to Yorick anyway, this wrapper is not much more programming effort than you are already expending. Okay, it's ugly, but I would chalk it up in the "crude, but quick and simple" category, which is honorable enough. In fact, if you don't need to invoke each of the objective functions using the interpreter separately, you don't need to declare those to Yorick at all...

A second option is to maintain a fiction that there is only a single objective function, but to provide separate interpreted calls to set and query which actual function is currently "installed". This approach has the advantage of minimizing the interpreted interface, since you would always have only four functions callable from the interpreter, no matter how many objective functions you might define:


   find_optimum              (calls dneqnf)
   objective_function        (calls generic objfcn)
   set_objective_function
   get_objective_function    (or you could keep track in the interpreter)

The virtual "objective_function" call might be a little tricky to program in Fortran (I don't think you're allowed to put a function pointer in a Fortran saved common block, so it might have to be a computed goto like wneqnf above), but it would obviously be easy in C. I guess I like this second approach a little better than the first, but I'm not quite sure where you need to get to...

By the way, the Fortran integer data type is equivalent to the C/Yorick data type "long", *not* "int". The "int" data type in C *might* not be large enough to hold an array index. More importantly (and because of this size issue), int arrays are very rare in Yorick; unless you've worked hard to make them, you will find your integer arrays are of type "long" in Yorick. The problem is that when you call a compiled function which has an argument declared "int array", and you pass it a standard Yorick long array, you will trigger a type conversion that will copy the data. Since you are probably using some of these as output arrays this can actually break your code, in addition to slowing it down. With Fortran, there is no way you can ever have a variable which corresponds to Yorick's (or C's) "int" data type -- you should *always* use type "long" for integer or logical arguments in a Yorick "PROTOTYPE FORTRAN" comment.

Q How do I save to a PDB file a variable whose name is in a string variable?

You did find the appropriate level below the save/restore interface. Someday I'll document how to use it. The relevant functions are called add_variable, add_record, add_member, and install_struct (you need the latter two only if you are working with data structures). To write a variable x to a PDB file with a name computed_string, you could do this:


     add_variable, f, -1, computed_string, structof(x), dimsof(x);
     get_member(f, computed_string)= x;

The first line only affects the *description* of the file (as you discovered); the second line actually writes the data. Note that the get_member function is precisely equivalent to the syntax f.text_of_computed_string, except that you don't need to be able to type the text itself if you use get_member. I admit this looks a little funny, but the parser actually works by translating f.x into get_member(f,"x") (so x is parsed not as a variable name, but as a string literal when it is the right operand of "."). This underlines the fact that member extraction -- which is a mere indexing operation in C -- is grossly less efficient in Yorick (not that it matters when it is going to trigger a read from a disk file). The problem, of course, is that Yorick discovers the data type of f only at runtime.
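The two lines can be wrapped in a small helper, sketched here under the assumption that f is a file handle returned by createb. The function name save_named, the file name, and the variable names are made up for this illustration.

```yorick
func save_named(f, name, x)
/* hypothetical helper: write array X to binary file F under the
   variable name held in the string NAME */
{
  add_variable, f, -1, name, structof(x), dimsof(x);  // describe it
  get_member(f, name)= x;                             // write the data
}

f= createb("named.pdb");
for (i=1 ; i<=3 ; i++) {
  name= swrite(format="var%ld", i);    // computed name: var1, var2, var3
  save_named, f, name, double(i)*[1,2,3];
}
close, f;
```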

Q Why should I use "#include filename", not "include, filename"?

Your problem is in writing code that actually executes when you include it. This is strongly discouraged in the Yorick manual precisely because you have to think about what you are doing so carefully. Here is the simplest example of such a failure:


if (0) write, "this wont print";
else write, "this will print";

This is correct Yorick code, but it doesn't work if you type it straight at the interpreter, because when you hit the carriage return at the end of the first line, it *executes immediately* -- that's what it means that Yorick is an interpreter! So, the first Yorick program:


if (0) write, "this wont print";

has already been executed and forgotten when you begin typing the second line:


else write, "this will print";

By itself, this line is *not* a legal Yorick program, since there is no "if" clause associated with the "else".
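If code like this really must execute as it is read, one way to keep the pieces together is to make sure the parser sees the whole construct before anything runs. A minimal sketch, assuming the curly brace trick described elsewhere in this FAQ for delaying execution of multi-line input; the function name which_branch is made up:

```yorick
// Enclosed in braces, the if and the else are parsed as one statement.
{
  if (0) write, "this wont print";
  else write, "this will print";
}

// Inside a function body the problem never arises.
func which_branch(flag)
{
  if (flag) return "if branch";
  else return "else branch";
}
```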

Q "WARNING bad member descriptor in PDB chart" when saving a struct.

There is a workaround for the bug you found. Just be sure you write out the data structures in the proper order *before* you write any data (or write data in order "up the hierarchy"):


struct A { long i,j; }
struct B { A x; long i;}
s= B()
save, createb("oops.pdb"), A,B, s

works, but just


save, createb("oops.pdb"), s

fails. Also, if you did t= A(), then this would work:


save, createb("oops.pdb"), t, s

but this would fail:


save, createb("oops.pdb"), s, t

Q What makes Yorick so fast?

Array syntax. Yorick operates by translating your input into a list of "virtual machine instructions" for a simple stack pointer plus program counter machine. Each instruction is a pointer to a function which executes the corresponding instruction; the main loop (in its entirety) reads (see Yorick/task.c):


     while ((Action= (pc++)->Action)) Action();

The actions manipulate objects on the stack -- push a variable onto the stack, add the top two stack elements and replace them by the single result, and so on. I designed Yorick to get to the actual operation as quickly as possible, but an add operation, for example, must still fiddle around to determine precisely what type of objects are to be added, whether type conversion is required, the length of array operands, and so on.

All of that fiddling around wastes a lot of time, which is why Yorick is about 100 times slower than compiled code for simple scalar operations, and several hundred times slower for code involving explicit scalar array indexing. I believe that by hand coding the instruction fan-out (from generic Action() to Add_double_to_double()), it might be possible to get within a factor of about 50 of compiled code for simple loops. The commercial IDL interpreter is the only product I have seen which does any operation faster than Yorick -- its array indexing operations are about 40% faster than Yorick's (see ./Yorick/include/test1.i). I don't know how Stern does it.

So without array syntax, the picture is pretty grim. With it, calculations that "vectorize" can run within a factor of 5 of compiled speed. I don't believe a non-trivial calculation can get much closer than this to compiled speed (./Yorick/include/test2.i is a non-trivial example of a highly vectorizing calculation). The additional factor of 5 is caused by two residual problems: The first is memory management -- Yorick needs array temporaries to hold intermediate array results. The more important unavoidable inefficiency is that an interpreter cannot make use of modern machine architectures which have many registers; a loop like this:


     for (i=0 ; i<1000000 ; i++)
       result[i]= a + b*x[i] + x[i]*y[i];

will be eaten alive by a compiler, which will use at least a half dozen registers to minimize the number of slow fetch and store operations. The Yorick code:


     result= a + b*x + x*y

will be done by a sequence of very simple loops something like this (actually not quite this efficiently -- it doesn't have a separate case for multiplication by a scalar):


     ......
     for (i=0 ; i<1000000 ; i++) temp1[i]= x[i]*y[i];
     ......
     for (i=0 ; i<1000000 ; i++) temp2[i]= b*x[i];
     ......
     for (i=0 ; i<1000000 ; i++) temp1[i]= temp2[i]+temp1[i];
     ......
     for (i=0 ; i<1000000 ; i++) result[i]= a+temp1[i];

In practice, with modern RISC workstations, this piece by piece approach costs at least a factor of 5 in speed for a moderately complicated calculation. That's a lot better than a factor of several hundred, but I certainly don't claim you will be able to do all your programming in the Yorick language!
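To see the difference between array syntax and an explicit interpreted loop on your own machine, a rough timing sketch along these lines may help. The problem size and the coefficients are arbitrary, and the times you get will depend entirely on your hardware.

```yorick
n= 200000;                    // arbitrary problem size
x= span(0.0, 1.0, n);
y= span(1.0, 2.0, n);
a= 2.0;
b= 3.0;

t0= array(0.0, 3);            // timer fills in [cpu, system, wall]
t1= array(0.0, 3);
t2= array(0.0, 3);

timer, t0;
r1= a + b*x + x*y;            // array syntax
timer, t1;
r2= array(0.0, n);
for (i=1 ; i<=n ; i++) r2(i)= a + b*x(i) + x(i)*y(i);   // explicit loop
timer, t2;

tarray= t1(1) - t0(1);
tloop= t2(1) - t1(1);
write, format="array syntax %g cpu sec, loop %g cpu sec\n", tarray, tloop;
```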

There are many interesting small problems that you can do with 20% or even 1% of the speed your computer will deliver (20% of my 75 MHz Pentium is still 1.5 million floating point operations per second). A more realistic attitude is that you can write several compiled modules to do the real number crunching for your problem (e.g.- tracing millions of rays through a scene, or solving huge transport matrices, or evolving wave equations), then use the Yorick interpreted language to set up input for the crunchers, pass intermediate results from one cruncher to another, and help you to analyze and understand the final results.

Q Sometimes when I plot a function, the x axis label(s) has question marks in the fields? What does this mean?

Yes, this is deliberate. What has happened is that you have told Yorick to make a plot where the limits run from, say, 1.234567890123449 to 1.234567890123451. Especially on the x-axis, there is no way to print labels of this length without the labels overlapping and being illegible. (For y-axis labels, they eventually run off the left edge of the paper, rather than overlapping, but the problem is the same.) Gist handles this conundrum by listing only the final two digits of such a monstrous number on the axis itself, marked with a question mark to let you know something is weird, with either an X0= or a Y0= field (or both) to tell you what the rest of the digits were. In this example, the labels on the axis might read:


        ?90   ?95   ?00   ?05   ?10

with an additional X0= field below:


        X0= 1.23456789012345??

The two question marks show you the location of the two digits that appear in the axis labels themselves (note my convention for wrapping through 00 - sigh).

There are two answers to your question about how to change the number of digits an axis label is allowed to have before the ??s start:

(1) This is controlled by a data structure in the Gist graphics package. Currently, the only way to affect this data structure from Yorick is to switch to a different graphics style sheet using the style= keyword in the window function (try help, window). None of the style sheets I supply in the distribution goes in the direction of allowing *more* digits (vg.gs is a more attractive style for viewgraphs and journal figures which uses a huge font and correspondingly *fewer* digits).

The file $(prefix)/Yorick/gist/work.gs (or Gist/work.gs in the distribution) can be used as a template for creating your own style sheets - it contains brief descriptions of the meanings of the various parameters you must specify. The ticks.horiz.nDigits and ticks.vert.nDigits fields, in particular, set the number of digits Gist will allow in an axis label before it overflows into the ?? mode.

I caution you that if you increase nDigits, Gist will sometimes create overlapping axis labels; you will also need to reduce the tick density (nMajor), and/or reduce the font size in order to avoid that. In general, you will find it very difficult to design a new graphics style sheet, and to certify that it will never produce overlapping labels or other ugliness. If anyone takes the trouble to do so, I would appreciate such contributions; I've only made the half dozen in the distribution. The only ones I use personally are work.gs - the default - and vg.gs or nobox.gs for publication quality graphics. Many people like boxed.gs better than work.gs. (Not me -- I read Tufte's "The Visual Display of Quantitative Information" -- "minimize non-data ink" is a very good rule.)

The difficulty of creating a graphics style sheet leads to the second and better answer to your question:

(2) Don't do that!

When you see ?? all over your plot, Yorick is trying very hard to tell you that there's just no way to polish a turd!

The obvious thing to do is to replot your data after subtracting the mean value; that is, if you made the ugly plot with:


     plg, y, x

(and the ?? were in the x-axis labels), then try again using this:


     plg, y, x-avg(x)

Occasionally, there is a "meaningful" large offset that you may wish to subtract instead of just avg(x). Whatever. The point is that a set of very long axis labels is impossibly redundant and doesn't look good, even if you could find some way to cram them all in.

Q When exactly does the x-window screen redraw?

I tried hard to minimize the number of redraws, but I have noticed that some operations seem to redraw more than necessary. However, the problems I've noticed have to do with resizing the X window, so hopefully they aren't related to what you're experiencing...

So, here goes:

None of the primitive plotting functions plf, plc, plt, etc., or the mapping functions such as limits, gridxy, etc., draws anything at all. They merely add items or set properties in the display list. So much of the question boils down to, "When does Yorick walk the current display list?"

However, there is an additional complication. Some changes do not damage the existing picture, while others do. An example of a non-damaging operation is adding a curve to a plot, when either the limits are set to fixed values, or the curve does not happen to extend beyond the limits of the other objects which have been plotted. If Yorick (Gist actually) detects a non-damaging change to the picture, it will not redraw any existing objects. However, any change which affects the plot limits will force the screen to be cleared and every object to be redrawn at the new scale. Damage in an X window is not limited to things you do inside Yorick; you can also damage a picture by covering and then uncovering it with other windows, iconifying it, resizing it, and the like. Whether this forces Yorick to walk the display list depends on whether your X server generates any "expose events" that Yorick feels may require it to redraw.

There are other subtleties. One is the palette command. If you are using the default shared color X window setup, rather than the private palette option (private=1 keyword to the window command), changing the palette forces a redraw, since the server changes the pixel values in an unpredictable way. With private palettes, Yorick controls the pixel values and no redraw is necessary (you must use private mode to do "color table animation" in Yorick). A more insidious subtlety is selecting one of the boxed styles (style="boxed.gs"). These feature tick marks which extend into the plotting viewport, and which the users of that style want to lie on top of anything plotted in the window. Therefore, anything you add to such a plot requires a complete redraw, since it may have covered a tick mark (that's one reason the outward pointing ticks are the default). I could partially repair that behavior by just redrawing the ticks, but there is still a real logic problem with inward pointing ticks that you should be aware of, and right now I just erase the picture and start over.

Incidentally, the animate command completely changes Yorick's redraw behavior, but I recommend you stay away from that for the sort of thing you're talking about. Use the movie.i interface if you need it.

When does Yorick draw in the first place? Just before it delivers the next > prompt, Yorick checks to see whether the display list has changed, or whether some external damage has occurred, which requires a redraw. If a redraw is necessary, it next checks to see whether the changes require every object to be redrawn, or if it can start walking the display list at the point it left off last time. You can also force Yorick to walk the current display list in two ways: (1) Issue an fma command. This causes Yorick to check whether the entire display list has been drawn, just as it would do before the > prompt. Only after making sure the picture you see is current does Yorick delete the display list (the nominal part of the fma command). (2) Issue a redraw command. Unlike the other cases, a redraw always forces the display list to be walked from the beginning, the whole idea being that you see some damage that Yorick hasn't detected and you want to be sure it erases the screen and starts over.

So, what can you do to minimize the number of redraws?

(1) Use the default outward ticks (style="work.gs").

(2) If possible, set the plot limits to fixed values (as opposed to extreme values of the data) before you begin -- just after an fma, before anything is actually drawn. Yorick remembers limits past an fma, so if they aren't changing, you don't need to do this.

(3) Don't let Yorick return to the > prompt before you have completed your display list. There are two easy ways to do this:

(3a) Put your plot commands inside a function. This is especially effective if you need to do several similar plots:


> func plotit {
cont> plf,z,y,x; plc,z,y,x,levs=[za,zb,zc];
cont> add_labels,z,y,x,[za,zb,zc]; }
> plotit

(3b) Put your plot commands inside curly braces to prevent Yorick's virtual machine from immediately executing the first line of a series:


> {plf,z,y,x; plc,z,y,x,levs=[za,zb,zc];
cont> add_labels,z,y,x,[za,zb,zc]; }
>
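Suggestions (2) and (3a) can be combined in one function. This is only a sketch: the function name show_curves, the data, and the fixed y limits are assumptions made for the illustration.

```yorick
func show_curves(x, ylist)
/* hypothetical helper: plot each column of YLIST against X as a
   single frame, with limits fixed before anything is drawn */
{
  fma;                                 // finish the previous picture
  limits, min(x), max(x), -1.0, 1.0;   // fixed limits, assumed known
  n= dimsof(ylist)(3);                 // number of curves
  for (j=1 ; j<=n ; j++) plg, ylist(,j), x;
}

x= span(0.0, 2*pi, 200);
show_curves, x, [sin(x), cos(x), sin(2*x)];
```

Because nothing is drawn until Yorick returns to the prompt, and the limits never change, adding each curve should not force the earlier ones to be redrawn.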

Q How do I use the "machine primitives" commands to read binary data from other machines?


>   f1 = open("testdata.br","rb"); // the byte swapped file
>   byte_swapped_data = array(float, 1000);
>   _read, f1, 0L, byte_swapped_data;
>   // now some spiffy code to unswap the byte into another array

You're working much too hard.

You are correct that Yorick does not have an easy way to "equivalence" variables of different data types (you can do it with the reshape function, but I strongly advise you against that), and that bit mask and shift operations won't work on floating point data.

However, you really didn't want to swap the bytes, since if you then took both testdata.br and your Yorick interpreted code to a machine which didn't need the bytes swapped, it wouldn't run right. The portable way to do it - so that your Yorick program behaves the same no matter where it's running - is to recognize that it's the *data* file that is big-or-little endian:


   f1 = open("testdata.br","rb"); // binary file written on a Sun
   sun_primitives, f1;        // tell Yorick where f1 was written
   correct_data = array(float, 1000);
   _read, f1, 0L, correct_data;

This bit of Yorick code will run correctly on either a little endian (e.g.- 486, Pentium, DEC alpha), a big endian (e.g.- Sun, HP, SGI, RS6000), or a weirdo (e.g.- Cray YMP or J90) -- wherever you happen to move your testdata.br file. I'm assuming here that your file was written on a big-endian (like a Sun), and that you're running on a little-endian (like a 486). If it's the other way round, use dec_primitives instead of sun_primitives.

With Yorick's save/restore files, you don't need to do this, since the information about where the file was written is stored in the file, and Yorick automatically supplies the correct sun/dec/cray/etc_primitives call when you do an openb.

Besides sun_primitives (which is also appropriate for HP 700 series, SGI, and IBM RS6000 machines), you can also specify pc_primitives, cray_primitives, alpha_primitives, and several others. The binary data format descriptions are parameterized, so you can handle machines I don't know about, too (I just realized that there isn't an exactly correct version for a Linux box - dec_primitives is closest, missing only the fact that doubles are aligned to 4-byte boundaries, though that wouldn't matter for what you're doing). See the file Y_SITE/include/prmtyp.i for details (you don't need to explicitly include this file, as Y_SITE/startup/std.i contains the wrappers I intend people to actually call). For address calculations (to call _read), you may need to know the sizes of the various data types on various platforms, which you have to look up in prmtyp.i if you don't already know them. If you have the reverse problem - to write a simple binary file for an existing utility on some platform - the ???_primitives functions are also exactly what you need.
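For the reverse problem, a minimal sketch of writing two float arrays back to back in Sun format and reading the second one back by byte address. The file name and data are made up; the sketch assumes 4-byte floats, which is the size in the Sun format.

```yorick
data= float(span(0.0, 1.0, 1000));     // made-up data
f= open("for_sun.bin", "wb");          // hypothetical file name
sun_primitives, f;                     // write in Sun (big-endian) format
_write, f, 0, data;                    // first array at byte address 0
_write, f, 4*numberof(data), 2*data;   // second array follows the first
close, f;

g= open("for_sun.bin", "rb");
sun_primitives, g;                     // tell Yorick how g was written
second= array(float, 1000);
_read, g, 4*1000, second;              // skip 1000 floats of 4 bytes each
close, g;
```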

Q What is eq_nocopy for? How do I use it to save memory?

If you do something like this in Yorick:


   a= y= array(0., 10000, 10000);
   ...
   y= y+a;  /* or some other expression involving y */

you will in fact get three copies of the giant array, since Yorick does not notice that you are about to clobber the original value of y at the time you need to add a to it. (It is actually impossible to be certain you have found the final reference to the y array in a general expression, although it makes the interpreter look stupid not to notice in this example.)

There is a new (with 1.2) function, eq_nocopy, which allows you to explicitly work around this problem by writing the following function:


func last_ref(&x)
/* DOCUMENT last_ref(x)
     returns X, destroying X in the process.
   SEE ALSO: eq_nocopy
 */
{
  local y;
  eq_nocopy, y, x;
  x= [];
  return y;
}

Then, instead of y=y+a, you write:


   y= last_ref(y)+a;

Now you only get two copies of the monster.

Notice that something like:


   y= y+1;

always gets two copies, even if you write:


   y= last_ref(y)+1;

The reason is that Yorick first broadcasts the 1 out to the size of the y array before performing the addition. In this case (the more frequent case, I think), the last_ref function gains you nothing. In Yorick's defense, it does notice that the broadcast 1's array is a temporary, and reuses it to hold the result of y+1. You need to "think like a stack machine" when you write expressions -- the order of operations you specify can have a major impact on the amount of temporary space required. One general rule is to prefer (((( over )))), by which I mean that:


   y= ((((x+1)*x+2)*x+3)*x+4);

keeps at most two items on Yorick's stack, while


   y= (4+x*(3+x*(2+x*(1+x))));

pushes up to eight items onto the stack before the operations begin to pop them off. In this case, it doesn't matter because the objects on the stack are just scalars or pointers to the array x, but in very similar looking cases, the objects you leave dangling on the stack can be temporary arrays, in which case a factor of four savings in space might be important.

Although you don't normally try to manage memory in Yorick, if you are working with very large arrays, you might be able to save some memory by putting statements like:


   y= [];

at the exact point in your code that you no longer need the y array. This frees y immediately. Unfortunately, whether this actually helps depends on how your operating system's memory manager works in detail. The only sure way to avoid disastrous fragmentation is not to work with such huge blocks of memory.

I have to say again, therefore, that the correct solution to the problem of huge arrays is the same in Yorick as in any language: Chunk your calculation! You won't sacrifice any performance with such gigantic calculations if you rewrite them with explicit loops that break them up into a dozen or two passes, but you can often fit them into a tenth or less the memory by doing that. Obviously, this makes your interpreted program uglier, but you are explicitly trying to push your hardware right to its limit and you are going to pay for that by a certain amount of ugliness. I have to say, though, that for all but the most trivial programs, breaking the problem into chunks does not have much of an impact on its readability. You will also find that you can go to considerably larger problems no matter how much memory you are willing to buy.
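
As a minimal sketch of what chunking looks like, the hypothetical function below (chunked_sum_sq is not part of Yorick) sums i^2 for i=1..n while never holding more than chunk elements at once. Each pass is still a full vector operation, so the interpreted loop overhead stays negligible when chunk is large:

```yorick
func chunked_sum_sq(n, chunk)
/* hypothetical helper: sum of i^2 for i=1..n, building at most
   CHUNK elements of the working array at any one time */
{
  total= 0.0;
  for (i0=1 ; i0<=n ; i0+=chunk) {
    i1= min(i0+chunk-1, n);       /* last element of this chunk */
    x= double(indgen(i0:i1));     /* only this chunk exists in memory */
    total+= sum(x*x);             /* vectorized work on the chunk */
  }
  return total;
}
```

For example, chunked_sum_sq(10, 3) makes four passes (chunks of 3, 3, 3, and 1 elements) and returns 385.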

Q How are Yorick pointers implemented? What are they useful for, especially wrt binary files?

A Yorick array has a header containing its data type, dimensions, and a use counter and optable, immediately followed by the data. Yorick pointers point to the beginning of the data, so they are numerically identical to a C pointer to the same data and can be passed directly to C routines accepting pointer or array arguments. However, Yorick expects that by backing up by the size of its header, it will find the header and be able to determine the data type and dimensions associated with the pointer. This fairly innocent-looking (and unavoidable) restriction makes Yorick pointers amazingly less useful than C pointers, because it eliminates all pointer arithmetic. (I may change the details of this scheme -- I'm experimenting with looking up pointer addresses in a hash table instead of the current back-up-to-the-header scheme. That will make it easy for Yorick to "adopt" a pointer to C data as one of its own arrays. But Yorick will still not support pointer arithmetic.)
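
A short illustration of the consequences at the interpreter level (the variable names are just for this sketch): the pointee carries its own type and dimensions, and it stays alive as long as some pointer still references it.

```yorick
x= [[1.,2.,3.],[4.,5.,6.]];   /* a 3x2 array of double */
p= &x;                        /* p points at the data of x */
kind= typeof(p);              /* "pointer" */
x= [];                        /* pointee survives: p still holds a reference */
y= *p;                        /* dereference recovers type and shape */
dims= dimsof(y);              /* [2,3,2] */
```

Note what is missing: there is no way to write something like p+1 to step to the next element; you dereference first and then index the resulting array.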

Part 2 of the question has to do with binary files. One of Stewart Brown's main technical goals in his PDB format was to be able to read and write arbitrary C data structures, including the typical struct containing members which are pointers to other structs or data arrays. My original goal was to make Yorick's data types rich enough to handle Stewart's model. Unfortunately, I didn't like Stewart's pointer implementation (originally it couldn't support partial writes, and partial reads were very inefficient) in the PDB file format, so I designed my own and use that as the default when Yorick writes pointers to a file.

Unlike the numeric data types, there is no single obvious way to write pointer data to a file. Actually, as far as the data itself goes, I believe there is an obvious technique -- on disk, I write a pointer as the disk address of the data it points to (that's where I broke with Stewart). The only problem is, where to put the metadata describing the type and dimensions of the pointees? Currently, I use a back-up-to-the-metadata scheme that parallels what I do in memory. The disadvantage is that this intermixes metadata and data in the file; again I'm thinking about moving pointee metadata to reside with the rest of the metadata, using the data address as the "name" of the variable in a hash lookup.

I recommend that you work hard to avoid using pointers in your binary files. It is better to store your data as ordinary arrays of numbers, and provide your own (usually short) Yorick read and write functions to reconstruct and deconstruct whatever pointers you use in memory. Still, when you are writing data to be read back within a few months and not archived for years, or when the reconstruction routines are prohibitively complex, go ahead and write pointered data to your PDB (Yorick save/restore/createb) files -- it works and it's there to be used.
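
Here is a rough sketch of such a pair of deconstruct/reconstruct functions, under the assumption that every pointee is a 1-D array of double and at least one is non-empty. The names ptrs_flatten and ptrs_rebuild are hypothetical, not part of Yorick:

```yorick
func ptrs_flatten(p, &len)
/* hypothetical helper: pack the pointees of the pointer array P
   into one array of double; their lengths are returned in LEN */
{
  n= numberof(p);
  len= array(0, n);
  for (i=1 ; i<=n ; i++) len(i)= numberof(*p(i));
  off= len(cum);                /* off(i) = elements before pointee i */
  data= array(0.0, sum(len));
  for (i=1 ; i<=n ; i++)
    if (len(i)) data(off(i)+1:off(i+1))= *p(i);
  return data;
}

func ptrs_rebuild(data, len)
/* hypothetical inverse of ptrs_flatten */
{
  n= numberof(len);
  off= len(cum);
  p= array(pointer, n);
  for (i=1 ; i<=n ; i++)
    if (len(i)) p(i)= &data(off(i)+1:off(i+1));
  return p;
}

p= [&[1.,2.], &[3.,4.,5.]];     /* ragged data held by pointers */
data= ptrs_flatten(p, len);     /* data= [1,2,3,4,5], len= [2,3] */
q= ptrs_rebuild(data, len);     /* *q(2) is [3,4,5] again */
```

Only data and len, both ordinary arrays, would then be written to the file with save and read back with restore before calling ptrs_rebuild.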

One important gotcha: There is currently no way in Yorick to read back only a portion of a pointee; Yorick's partial reads only work on top-level objects, not on dereferenced pointees. That is, a.b->c(5,4).d reads in the *entire* c array (even if it is 100x100) before it extracts element (5,4). If you had managed to get c written as a simple array, then c(5,4).d will just read element (5,4) -- possibly 10,000 times faster access to the data you need. Hopefully I'll get around to fixing this, but as I said, I discourage the use of pointers in binary files generally.

The real problem with pointers in binary files is that *other* programs have no chance of reading such a file, or more importantly, other file formats (such as netCDF or HDF) don't have any such concept in their data models, so you can't even easily translate such a file to something a program based on those popular data models will be able to handle. To make matters worse, things are rapidly changing. HDF is in the middle of a radical redesign to version 5.0, and netCDF version 4.0 will support things not easily readable by Yorick. I've been waiting for the smoke to clear before I attempt a redesign of Yorick's binary file package.

I know I should avoid explicit loops, but how, for instance, could I get the norms of an array of 3-vectors, r, without writing r2=array(0.,n); for(i=1;i<=n;i++) {r2(i)=r(i,+)*r(i,+);}?

r2= (r*r)(,sum);

Also, don't forget the abs operator -- if you'd wanted sqrt(r2) I'd've said abs(r(,1),r(,2),r(,3)).
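
A small worked case, with made-up numbers, where r holds three 3-vectors and the first index selects the vector:

```yorick
/* r(i,) is the i-th vector: (3,4,0), (0,0,5), (1,2,2) */
r= [[3.,0.,1.],[4.,0.,2.],[0.,5.,2.]];
r2= (r*r)(,sum);                  /* squared norms: [25,25,9] */
rn= abs(r(,1), r(,2), r(,3));     /* norms: [5,5,3] */
```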

I agree that vectorizing things sometimes obscures your intent. Unfortunately, it really is harder to think in parallel. For us at the labs, the real break was 15 or 16 years ago learning to program Crays. There are at least two steps in this process:

First, you need to learn when a loop represents operations that can be performed in parallel, versus operations which are actually sequential. The question to ask is whether the result depends on the order you iterate through the loop index; things which fail this obvious test usually can't be vectorized in Yorick, and you really will need a loop. In that case, you should hunt around for another dimension that *does* vectorize, so each operation within your loop is itself as big a vector operation as you can manage -- if you can make those inner vectors big enough, the outer loop performance becomes irrelevant. (In fact, for very large arrays you *need* to chunk up your loops even when Yorick syntax allows a fully vectorized expression.)

The second step is to realize that there are many ways to do a given task in Yorick. You can't let yourself get fixated on some slick syntax like a(,+)*b(+,) for matrix multiplication -- as you see, that syntax has some serious limitations. When you have a strong feeling that something can be vectorized, go ahead and spend the time to study it, or at least come back to it when you get a chance. I welcome examples of challenges you have faced -- if you overcome them, your answer can become part of the FAQ -- if not, perhaps your example will suggest some generic useful function I can add to Yorick. There are several candidate functions related to index lists, the histogram function, and the plfp function.

Currently, there is only one good idea for a new syntax (as opposed to merely a new function) to handle a vectorization issue (suggested by Ed Williams) which I'm pretty sure will make it into yorick-2.0:

    x(i,=,=,j,=,...)

will set the indices marked "=" equal to one another. This will make it much easier to do traces and other operations on diagonal elements, and turns out to be very easy to implement. At first it seems like it would solve your r2 problem as well: r(=,+)*r(=,+), but alas that is very very much harder to implement, since you are trying to relate indices in two different arrays. Note that (r(,+)*r(,+))(=,=) gets the correct answer, but at a huge cost in both temporary storage and CPU time. There are many examples of this, even with Yorick's current syntax.
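
With the current syntax, one workable way to reach diagonal elements (a sketch, not the proposed "=" syntax) is an index list that strides through the array as if it were one-dimensional:

```yorick
n= 3;
x= [[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]];
/* x(i,i) is element i + (i-1)*n, i.e. elements 1, n+2, 2n+3, ... */
diag= x(indgen(1:n*n:n+1));       /* [1,5,9] */
trace= sum(diag);                 /* 15 */
```

This touches only n elements, so it avoids the temporary storage problem of forming a full matrix and then discarding its off-diagonal part.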

A third step is to design your Yorick functions so that they accept and produce array arguments in a natural way. Often this is easier than either of the other steps, but sometimes seeing how the interface should work is the hardest part of program design.

How do I open a Basis-written .pdb file in Yorick, and vice-versa?

Try

#include "basfix.i"
help, obasis

basfix.i defines functions obasis, baset and baget allowing you to open and manipulate Basis-written .pdb files with their @decorated variable names. The stripping is controlled by the flag "at_pdb_open".

Nothing special is required to write Basis-readable .pdb files with Yorick.

How do I define a function that requires no arguments?

If it doesn't return a value, you can use:

> func hello {print, "Hello World!";}
> hello
"Hello World!"
>

To be able to return a value, you need to provide a dummy argument in the function definition, which can be omitted in the function invocation:

> func hello(dum) {return "Hi";}
> hello
> hello(0)
"Hi"
> hello()
"Hi"
>

> func hello(dum) {print, "Hello World!"; return "Hi!";}
> hello
"Hello World!"
> hello()
"Hello World!"
"Hi!"
>

Note the difference between invoking the function with "hello()" and "hello", which do and do not return a value, respectively. These are the "degenerate" forms of the standard syntax "hello(arg1,arg2,..)" and "hello, arg1,arg2,.." respectively.

Is there an easy way to figure out what library functions are available in Yorick's include directory?

You can just prowl around of course. However, the command

> library

at a Yorick prompt, prints the README in the include directory, which briefly describes the available packages.

Could you give some hints on "vectorizing" loops?

The basic rule in Yorick code is that if you loop over an array index, you are probably making a mistake. The usual tricks for vectorizing loops that people miss involve nested loops. The tricks involve index lists, referencing multidimensional data with a single index, and the histogram (!) function. Here are some examples:

The following examples are taken from a code that reads rows of a ragged matrix, where each row contains slightly different numbers of points. The file format was


row1 x y value value value ... value
row1 x y value value value ... value
row1 x y value value value ... value
row2 x y value value value ... value
row2 x y value value value ... value
row2 x y value value value ... value
row2 x y value value value ... value
row3 x y value value value ... value
row3 x y value value value ... value
...

where each line contains the same number of values -- all at one element of the matrix, and the first value is constant as long as you are in the same row of the matrix, incrementing as you move to the next row. The points can be read in a single Yorick read statement:

  tmp= array(0., 3+nTS, number_of_lines_in_file);
  read, file, tmp;
  rowNR= tmp(1,); /* [row1,row1,row1,row2,row2,row2,row2,row3,row3,...] */

Here is a very inefficient way (even though it uses one Yorick vector construct inside the where function):

    i=1;
    coastI   = array( 0, 1);
    coastI(i) = 1;
    do {
	i += numberof(rowNR(where(rowNR==rowNR(i))));
	grow, coastI, i;
    } while ( rowNR(coastI(0)) != rowNR(0) );
    nRow = numberof(coastI);
    npy = coastI()(dif);
    grow, npy, lTS - coastI(nRow) + 1;   //last row;

Here coastI is [i1,i2,i3,...] where tmp(,i1) is the first element of the first row (i1=1), tmp(,i2) is the first element of the second row, and so on, and npy is [n1,n2,n3,...] where n1 is the number of lines of the file for the first row, n2 the number of lines for the second row, and so on. In the example, npy= [3,4,2,...] and coastI= [1,4,8,10,...].

The loopless way to compute the above important index lists in Yorick is to use the histogram function:

  ndxlst= (rowNR(dif)!=0)(cum) + 1;
  npy= histogram(ndxlst);
  coastI= (npy(cum)+1)(1:-1);

In the first line, (rowNR(dif)!=0) is 0 until we change rows, then has a single 1, then more zeros; (cum) sums these back, producing a list that looks like [0,0,0,1,1,1,1,2,2,...] in the example, and adding 1 produces [1,1,1,2,2,2,2,3,3,...]. This ndxlst is an extremely useful array -- it has one element for each line of the file (or second index of tmp), whose value is the row index for that line. The histogram function counts these to yield the npy array immediately, and coastI is related to the partial sums of npy.
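
To make the intermediate values concrete, here are the same three statements applied to the nine lines shown in the sample file above (the rowNR values are typed in by hand for this sketch):

```yorick
rowNR= [1.,1.,1.,2.,2.,2.,2.,3.,3.];
ndxlst= (rowNR(dif)!=0)(cum) + 1;   /* [1,1,1,2,2,2,2,3,3] */
npy= histogram(ndxlst);             /* [3,4,2] lines per row */
coastI= (npy(cum)+1)(1:-1);         /* [1,4,8] first line of each row */
```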

Besides scalar array references like rowNR(0) or rowNR(i) -- which are always suspicious -- you can recognize two other problems with this loop. Both involve incorrect scaling; as written, the time for the loop increases with N^2 where N is the number of lines in the file, though the task should only take order N time. First, the do loop requires nRow passes (presumably proportional to N), and the where function must examine N values in each pass. Just because the latter is vectorized and runs relatively faster doesn't save you from the insidious N^2 scaling!

Second, and an even more common mistake, is the use of the grow function in a loop. Each time you call grow, the entire array you are building must be copied, so growing an array like this always involves a subtle N^2 scaling. I speak from hard experience -- I didn't understand this myself until I had done it several times and analyzed the disaster when someone tried it with large N. (In the case of grow, you can mitigate the problem by allocating large blocks, so that you call grow in only a small fraction of the passes through the loop.)
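
When a loop really is unavoidable, the block-allocation idea can be sketched as follows. The function name and the i*i payload are hypothetical; the point is only that grow is called once per block rather than once per pass:

```yorick
func squares_blocked(n, block)
/* hypothetical example: build [1,4,9,...,n^2] element by element,
   extending the result BLOCK elements at a time */
{
  result= array(0, block);
  count= 0;
  for (i=1 ; i<=n ; i++) {
    /* copy the array only when the current block is full */
    if (count==numberof(result)) grow, result, array(0, block);
    count+= 1;
    result(count)= i*i;
  }
  if (!count) return [];
  return result(1:count);       /* trim the unused tail */
}
```

With block=1000, a loop of a million passes copies the array about a thousand times instead of a million.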

Next, consider the following inefficient method for collecting the data in the ragged matrix into matrices with equal length columns; transposing the data at the same time:

    nCol= max(npy);
    x = y = array( .0,          nRow, nCol );
    dat   = array( -99.99, nTS, nRow, nCol );

    for(r=1; r<=nRow; ++r)
    {
      x(r,1:npy(r)) = 
        tmp( 2 , coastI(r) : coastI(r)+npy(r)-1 );
      y(r,1:npy(r)) = 
        tmp( 3 , coastI(r) : coastI(r)+npy(r)-1 );
    }

    for(t=1; t<=nTS; ++t)
      for(r=1; r<=nRow; ++r)
        dat(t,r,1:npy(r)) = 
          tmp( t+3 , coastI(r) : coastI(r)+npy(r)-1 );

To eliminate all these loops, do this:

  nCol= max(npy);
  x = y = array( .0,          nRow, nCol );
  dat   = array( -99.99, nTS, nRow, nCol );

  c= indgen(numberof(ndxlst)) - npy(cum)(ndxlst);
  list= ndxlst + (c-1)*nRow;
  x(list)= tmp(2,);
  y(list)= tmp(3,);
  dat(,list)= tmp(4:0,);

There are two tricks here: first, to notice that x(r,c) is the same as x(r + (c-1)*nRow), and second, to use the ndxlst -- which is the list of row indices (r) for second index of tmp. The arcane looking expression for c takes [1,1,1,2,2,2,2,3,3,...] and produces [1,2,3,1,2,3,4,1,2,...], which are the column indices corresponding to each line of the file. The point is, you can use array syntax to compute all of the array indices in the loop simultaneously.
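
The same statements on the nine-line sample, with made-up values 1 through 9 standing in for tmp(2,), show where each line of the file lands:

```yorick
ndxlst= [1,1,1,2,2,2,2,3,3];    /* row index of each file line */
npy= [3,4,2];                   /* lines per row */
nRow= 3;
nCol= max(npy);
c= indgen(numberof(ndxlst)) - npy(cum)(ndxlst);  /* [1,2,3,1,2,3,4,1,2] */
list= ndxlst + (c-1)*nRow;      /* [1,4,7,2,5,8,11,3,6] */
x= array(0., nRow, nCol);
x(list)= double(indgen(9));     /* stand-in for tmp(2,) */
/* x(1,)= [1,2,3,0]  x(2,)= [4,5,6,7]  x(3,)= [8,9,0,0] */
```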

Next, let's look at another common situation; suppose we want to average the values of dat over its first two indices, but not include any of the places (marked with -99.99) where no data existed in the original file (the ragged edges of the matrix). Here is a way to do that in C that doesn't work well in Yorick:

    dean = array(0.,max(npy));
    for(j=1; j<=max(npy); ++j) {
	k=0;
	for(i=1; i<=nRow; ++i) {
	    for(t=1; t<=nTS; ++t) {
		if(dat(t,i,j)!=-99.99) {
		    dean(j) += dat(t,i,j);
		    ++k;
		}
	    }
	}
	if(k>0) dean(j) /= k;
    }

Here is a reasonable Yorick idiom for the same task, using a mask array instead of index lists this time:

  dean= dat(*,);  /* combine t and i indices */
  mask= (dean>-99.9);
  count= mask(sum,);
  dean= (dean*mask)(sum,)/(count+!count);

One previous example also used a mask: Remember (rowNR(dif)!=0)? Then, we formed partial sums with cum; this time mask(sum,) forms sums over a single index -- previously combined from two indices using the * pseudo-index.
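
A tiny instance of the mask idiom, with invented data (nTS=2, nRow=2, nCol=3) in which the first column is full, the second has two valid values, and the third has none:

```yorick
dat= array(-99.99, 2, 2, 3);
dat(,,1)= [[1.,2.],[3.,4.]];
dat(,1,2)= [10.,20.];
flat= dat(*,);                  /* 4x3: t and i indices combined */
mask= (flat>-99.9);
count= mask(sum,);              /* [4,2,0] valid values per column */
avg= (flat*mask)(sum,)/(count+!count);   /* [2.5,15,0] */
```

The count+!count denominator is what keeps the empty third column from dividing by zero.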

Next, consider the operation of rescaling each row up to the maximum real element of that row. Mere translation of a reasonable C routine looks like this:

    xx= x;
    yy= y;
    for(r=1; r<=nRow; ++r) {
	
	dx = x(r,npy(r)) - x(r,1);
	dy = y(r,npy(r)) - y(r,1);
	length = sqrt(dx^2+dy^2);
	dx=dx/length;
	dy=dy/length;

	for(j=2; j<=npy(r); ++j) {

	    dx_loc = x(r,j) - x(r,1);
	    dy_loc = y(r,j) - y(r,1);
	    length_loc = sqrt(dx_loc^2+dy_loc^2);
	    xx(r,j) += scale*length_loc*dx;
	    yy(r,j) += scale*length_loc*dy;
	}
    }

In full Yorick idiom, here is the same operation:

  imax= dimsof(x)(2);
  list= indgen(imax) + (npy-1)*imax;
  dx= x(list) - x(,1);
  dy= y(list) - y(,1);
  length= abs(dx, dy);
  dx/= length;
  dy/= length;
  length_loc= scale*abs(x-x(,1), y-y(,1));
  xx= x + length_loc*dx;   /* the loop version starts from xx= x */
  yy= y + length_loc*dy;   /* and from yy= y */

Here is one old trick and one new trick: The old one is construction of an index list to pick out the ragged edge of the original array, which allows dx and dy to be computed very quickly. The new trick is to recognize that the restriction of the inner loop index j to between 2 and npy(r) is really unnecessary; the values outside that range are garbage and won't be used anyway. In C it makes sense to avoid computing them, in Yorick it does not. One other small point is the use of the idiom abs(x,y) for sqrt(x^2+y^2), which is 3-4 times faster and works on the whole range of double precision numbers (x^2+y^2 overflows at the square root of the full range). Note that abs(x,y,z,...) is sqrt(x^2+y^2+z^2+...).
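
The ragged-edge index list is easy to check by hand on a small case. The x values below are invented, with zeros filling the unused tail of each row:

```yorick
/* x(1,)= [1,2,3,0]  x(2,)= [4,5,6,7]  x(3,)= [8,9,0,0] */
x= [[1.,4.,8.],[2.,5.,9.],[3.,6.,0.],[0.,7.,0.]];
npy= [3,4,2];                        /* valid points in each row */
imax= dimsof(x)(2);                  /* 3, the number of rows */
list= indgen(imax) + (npy-1)*imax;   /* [7,11,6], i.e. x(r,npy(r)) */
last= x(list);                       /* [3,7,9] */
dx= last - x(,1);                    /* [2,3,1] */
```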

Here's a peccadillo:

    for(i=1; i<=nr; ++i) 
	for(j=1; j<=max(npy); ++j)
	    d(i,j) -= dean(j);

Which is identical to:

  d-= dean(-,);
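
The - pseudo-index adds a dimension of length one, which is then broadcast across the first index. On a small made-up array:

```yorick
d= [[1.,2.],[3.,4.],[5.,6.]];   /* 2x3: d(,1)= [1,2], d(,2)= [3,4], ... */
m= [1.,3.,5.];                  /* one value per second index */
d-= m(-,);                      /* d(i,j) -= m(j) for all i, j at once */
/* every d(,j) is now [0,1] */
```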

Finally, let's look at the construction of a region array, to enable the ragged dat(t,,) to be correctly plotted with plf, plfc, plc, etc.:

    ireg = array( 1, nr , max(npy) ); 
    ireg(1,) = 0;
    ireg(,1) = 0;
    for(i=1; i<=nRow ; ++i)
	for(j=2; j<=max(npy); ++j)
	    if ( dat(t,i-1,j-1) == -99.99 ||
		 dat(t,i-1,j  ) == -99.99 ||
		 dat(t,i  ,j-1) == -99.99 ||
		 dat(t,i  ,j  ) == -99.99 )  ireg(i-row1st+1,j)=0;

Here is the same common programming task in Yorick idiom:

  ireg = array( 1, nr , max(npy) ); 
  ireg(1,) = 0;
  ireg(,1) = 0;
  missing= (dat(t,,) < -99.9);  /* == test isn't very safe */
  if (anyof(missing)) {   /* nothing to do when no data is missing */
    ireg(2:0,2:0)= !(missing(2:0,2:0) | missing(2:0,1:-1) |
                     missing(1:-1,2:0) | missing(1:-1,1:-1));
  }

I hope these examples help you to see that many loops that can't be obviously expressed using Yorick's vector syntax, can be expressed after a little thought. Here are the key points to remember: (1) Be suspicious of any loop when the loop counter is used as an array index. (2) Check to be sure your loop scales properly -- do N passes require order N^2 work? Often this signals a misapplication of array syntax inside a loop. (3) Use index lists and directly computed offsets into multidimensional arrays. (4) Use mask arrays. (5) The index functions like cum and dif often help to construct index lists -- don't assume their only duty is finite difference operations. (6) Can the histogram function help you?

With practice, all these idioms become second nature. I find they also help me to analyze what I really need to do in compiled code where I can't use them directly, so I will make the case that learning them is useful even beyond the confines of Yorick language programming.

How do I look at the current palette and relate the color index to the color?

Type

fma; pli, [span(0,1,200)]

to see the current palette; the x-axis is labeled with the char index you need to get that color in plfp, plmk, plf, pli, or the color= keyword for any of the line primitives.

Is there a function available that will return a directory listing as an array of strings?

There's a non-portable kludge: $ls >tmpfile and read the tmpfile. I've worked around the problem in the past by simply testing for the existence of files using open; as long as you can generate the name of a file which might be present (e.g.- a member of a family of files), that works pretty well, and is portable. The only thing you can't do that way is to generate some sort of a menu of possible choices -- but Yorick can't do that very well anyway, so I haven't been dying to fix the problem.
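
Both approaches can be sketched in a few lines. The function names and the scratch file name yorick_ls.tmp are hypothetical, and ls_kludge assumes a Unix shell:

```yorick
func ls_kludge(dir)
/* hypothetical helper, Unix only: list DIR through a scratch file */
{
  system, "ls " + dir + " > yorick_ls.tmp";
  f= open("yorick_ls.tmp");
  names= rdline(f, 10000);      /* read up to 10000 lines */
  close, f;
  remove, "yorick_ls.tmp";
  list= where(names);           /* drop the nulls after EOF */
  if (!numberof(list)) return [];
  return names(list);
}

func file_exists(name)
/* hypothetical helper, portable: can NAME be opened for reading? */
{
  f= open(name, "r", 1);        /* errmode 1: return nil on failure */
  return !is_void(f);
}
```

To probe a family of files you would generate candidate names, for example with swrite, and keep those for which file_exists returns 1.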

Could you say more about the *_primitives and machine representations?

alpha_primitives: appropriate for DEC alpha machines

sun_primitives: appropriate for Sun (except >10 years old), HP, and SGI workstations

cray_primitives: appropriate for Cray XMP, YMP, and J90 (but will go out since SGI merger)

dec_primitives: appropriate for DEC MIPS/Ultrix workstations; closest approximation for Linux/i86 machines

mac_primitives and macl_primitives: another case where compiler switches change the machine floating point formats

pc_primitives: may be reasonable for a modern Windows machine, I'm not sure

vax_primitives and vaxg_primitives: appropriate for DEC VAX machines (VMS), depends on what compiler switches you set when you compiled the program that wrote the file in question

The VAX and Cray are the only machines that are *not* IEEE format. There are two ways to vary within "IEEE standard" *besides* big-endianness or little-endianness:

(A) The lengths of the data types may vary. The most common deviations occur for int (2 or 4 bytes), long (4 or 8 bytes), and double (8 or 12 bytes).

(B) The alignments of the data types within structs. The various processors require addresses that are multiples of different powers of two for the various data types. This doesn't affect how they are written into files, except when different data types are members of a single data structure which is written to the file "as is".

So you can't describe a machine simply as "IEEE big-endian" or "IEEE little-endian"; that still leaves room for deviation both in number of bytes and alignments of several data types.

The only pre-defined little-endian primitive type functions are dec_primitives, alpha_primitives, and pc_primitives. Embarrassingly, none of these is correct for a Linux/i86 system: dec_primitives is the only usable one, differing only in the alignment for double (4 on a Linux/i86 machine, 8 in dec_primitives). So on a Linux box, you need to use dec_primitives or write your own linux_primitives.
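
As a rough sketch of how a primitive type function is applied, the lines below read 100 floats from the start of a raw file written on a little-endian machine. The file name data.raw and its layout are assumptions made for the illustration:

```yorick
f= open("data.raw", "rb");    /* hypothetical raw binary file */
dec_primitives, f;            /* declare the file's number formats */
x= array(float, 100);         /* assumed: 100 floats at address 0 */
_read, f, 0, x;               /* converted to native format on the way in */
close, f;
```

Since only float data is read here and no structs are involved, the double alignment difference between dec_primitives and a real Linux/i86 layout does not come into play.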

You can easily write your own primitive type functions, just follow the models in Yorick/include/prmtyp.i (YSITE/include/prmtyp.i installed). The correct parameters for your machine and compiler are computed by the fmcalc program (Yorick/fmcalc.c in the distribution); when you build Yorick, this program generates the file Yorick/prmtyp.h, which you can examine to write your own, say, linux_primitives function. I should try to keep a complete library of primitive type functions for all known architecture/compiler combinations, but needless to say, this is pretty tedious.

See section 3F of ftp://ftp-icf.llnl.gov/pub/Yorick/comp-formats.txt for a description of the floating point parameterization I use in Yorick. A briefer (but still complete?) description is in Yorick/include/prmtyp.i in the Yorick distribution.

How do I manipulate strings in Yorick- e.g. indexing characters?

Yorick is not very good with strings. The function you want to extract substrings is called strpart. The others are strlen, strmatch, and strtok. The only string operations built directly into Yorick syntax are catenation (+) and alphabetic comparison (>, <, etc); everything else must be a function call. The strpart function is necessary because Yorick encourages arrays of strings, so that indexing operations are always assumed to be on the indices of the string array, not on the characters in the string. I don't really consider that a weakness, as Yorick is intended for numerical work -- you can program in Emacs Lisp if you want a good string-based interpreter.

That consideration aside, the only real hole in Yorick's string handling capability is the absence of regular expression search and replacement. I've considered adding that, but it's a very large amount of code (check out the size of the GNU regexp package for example) for something that is again very peripheral to Yorick's mission. Perhaps I should implement it as an optional package -- or someone else could...

The Yorick idiom for converting from a (scalar) string to an array of char is:

  ctext= *pointer(text);

Yorick's array syntax can be used naturally on the char array. The idiom for converting a 1D array of char back to a string is:

  text= string(&ctext);

Obviously, you should resort to this only when the compiled string functions I provide just won't work.

People often overlook the pr1, swrite, and sread functions as well, which are useful for generating strings from numbers and vice versa.
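
A few one-liners show these conversions in action (the sample strings are arbitrary):

```yorick
text= "a/b/c";
ctext= *pointer(text);          /* char array, including the trailing \0 */
nslash= numberof(where(ctext=='/'));   /* array syntax on chars: 2 */
back= string(&ctext);           /* "a/b/c" again */

s= swrite(format="%03ld", 7);   /* number to string: "007" */
n= 0;
sread, "42", n;                 /* string to number: n is now 42 */
t= pr1(7);                      /* "7", the way print would show it */
```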

To show you some typical Yorick string manipulation, here is an example:


  > b= "abc";
  > strpart(b,2:2)
  "b"

The distribution file Yorick/include/string.i contains more examples. And here are a couple of "useful" tasks, typical of what Yorick does pretty well with strings:


  func number_lines(filename)
  {
    f= open(filename);
    lines= rdline(f, 25000);    /* read up to 25000 lines of f */
    lines= lines(where(lines));  /* throw away nulls after EOF */
    numbers= swrite(format="%5ld ", indgen(numberof(lines)));
    f= create(filename+".numbered");
    write,f, numbers+lines;  /* rewrite file with line numbers */
  }

  func pixmap_read(filename)
  {
    if (strpart(filename,-3:0)==".xwd") return xwd_read(filename);
    if (strpart(filename,-3:0)==".ppm") return pnm_read(filename);
    error, "unrecognized file extension";
  }

  func split_name(filename)
  /* DOCUMENT split_name(filename)
       returns [directory, basename, extension1, extension2, ...]
       where the "extensions" are separated from the "basename"
       in FILENAME by dots (".").
   */
  {
    cname= *pointer(filename);  /* convert filename to char array */
    list= where(cname=='/');    /* find any slashes */
    if (numberof(list)) {
      result= strpart(filename,1:list(0));  /* directory name */
      filename= strpart(filename,list(0)+1:0);
    } else {
      result= string();
    }
    result= [result];
    do {
      name= strtok(filename,".");
      filename= name(2);
      grow, result, [name(1)];
    } while (filename);
    return result;
  }

  func next_name(current_name)
  {
    len= strlen(current_name);
    for (i=0 ; i>-len ; i--) {
      c= strpart(current_name,i:i);
      if (c<"0" || c>"9") break;
    }
    if (i) {    /* current_name ends in digits */
      i= -(i+1);  /* index (from the end) of the first trailing digit */
      n= 0;     /* read the number off the end */
      sread, strpart(current_name,-i:0), n;
      /* construct the next name in the numbered sequence;
         there are i+1 trailing digits, so pad to that width */
      return strpart(current_name,1:-i-1) +
               swrite(format="%0"+pr1(i+1)+"ld", n+1);
    } else {
      /* just increment the final character */
      i= (*pointer(c))(1);  /* be sure to clip trailing \0 */
      return strpart(current_name,1:-1) + string(&char(i+1));
    }
  }

Only the first example touches on Yorick's real strength in dealing with strings, which is its ability to convert very large columns, especially columns of numbers. The thing I really want to be able to do is to read megabytes worth of numeric tables, which Yorick does pretty well.

As I said, the only tasks I've found that are very ugly in Yorick (I don't find any of the above examples over the top in terms of clumsiness) have to do with regular expression search and replace operations. The last two examples above are obviously straining the limits of easy string manipulation in Yorick.

I've got a file with NaNs in it. How do I read it in and filter them out?

A pox on people who do that. The only way I can think of to deal with the situation is to read the array twice -- once as a char variable, then a second time as a float or a double. Roughly like this:

x= array(char, 4 /*or 8 for double*/, dims);
_read, f,addr, x;
list= where(!(x!=nan_value)(sum,..));
x= array(float, dims);
_read, f,addr, x;
x(list)= bad_value;

You'll need to find nan_value (an array of 4 char here) by reading one out of your file; I don't know what the value is, and its byte order of course will be different depending on the endianness of the file. The bad_value can be any number you like -- pick a number the bozos should have written into the file in the first place.

Mac Gist doesn't "see" my CGM files!

MacGist has a dual file filter: it will "see" files that are of type CGMF or whose filename is "xxx.cgm". It won't see files with two dots, like "a.b.cgm". CGM files created by MacYorick are OK. Problems can arise when you bring other CGM files down to a Mac to create .PICT files for page-layout programs etc.

How can I read/write FORTRAN sequential/unformatted/record files with Yorick?

David Syer has contributed a package that does this. It is seqio.i in ftp-icf.llnl.gov/pub/Yorick/contrib. Because of the lack of a firm standard for these files, it might not work for some compilers/machines. See the earlier discussion

Last revised on 9/30/98
