Batch tricks: Arrays

Arrays are used all the time in programming, and nearly every programming language has something that can serve as one, whatever it happens to be called. Arrays, tables, lists and the like are all just ways of organizing larger amounts of similar data.

In batch files we need a little trickery to get something one might call an array. At least two approaches spring to mind immediately:

Numerous variables

The easiest and most flexible solution I've come up with so far is simply to create one variable per entry, all of them sharing a common prefix (which may be empty). So, to create an array with, say, 100 members, you can quickly write

for /l %%i in (1, 1, 100) do set ARRAY%%i=0

This gives us an array called ARRAY (I usually treat the prefix as the array's name) with 100 elements, ARRAY1 through ARRAY100, each initialized to 0.
Accessing an element is simply a matter of reading the corresponding variable, though there are a few caveats:

  • Accessing an element whose index is known when writing the script works as expected:

    echo %ARRAY2%
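  • Accessing an element whose index is only known at run time, e.g. stored in a variable or a loop variable, requires delayed expansion (setlocal enabledelayedexpansion, or cmd /v:on):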

    echo !ARRAY%NUM%!

    for /l %%i in (3, 1, 5) do echo !ARRAY%%i!

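    Using %ARRAY%NUM%% won't work: cmd pairs up the percent signs from left to right, so it sees the variable %ARRAY% followed by the literal text NUM rather than a nested reference. In addition, %-variables are expanded when a line (or an entire parenthesized block) is read, not when it's executed, so they can't pick up a loop variable or a value that changes inside a loop.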
  • Using array values in calculations works without delayed expansion, since set /a looks up variable names on its own and doesn't need % around them:

    set /a SUM=ARRAY%X%+ARRAY%Y%
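The three access patterns above can be tried out in one small script. This is a minimal sketch: the values 10 to 50 and the helper variables NUM, X, Y, SUM and TOTAL are made up for illustration, and delayed expansion is switched on at the top because the second pattern needs it.

```batch
@echo off
setlocal enabledelayedexpansion

rem Build ARRAY1..ARRAY5 holding 10, 20, ..., 50
for /l %%i in (1, 1, 5) do set /a "ARRAY%%i=%%i * 10"

rem 1) Index known while writing the script: plain % expansion is enough
echo %ARRAY2%

rem 2) Index held in a variable: %NUM% is expanded first (to 3),
rem    then !ARRAY3! is expanded when the line executes
set NUM=3
echo !ARRAY%NUM%!

rem    Index from a loop variable: %%i is substituted before delayed expansion
for /l %%i in (3, 1, 5) do echo !ARRAY%%i!

rem 3) Arithmetic: set /a resolves the names ARRAY1 and ARRAY4 itself
set X=1
set Y=4
set /a SUM=ARRAY%X%+ARRAY%Y%
echo %SUM%

rem    The same holds inside a loop, still without delayed expansion
set TOTAL=0
for /l %%i in (1, 1, 5) do set /a TOTAL+=ARRAY%%i
echo %TOTAL%
```

The last loop shows the third point from another angle: %%i is substituted before set /a runs, so summing the whole array needs no ! at all.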

If you insist, you can also use a notation familiar from other programming languages: [ and ] are perfectly legal characters in environment variable names (if I remember correctly, the only illegal characters are = and the null byte), so names like ARRAY[1] and ARRAY[2] work just as well.
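A short sketch of the bracket notation, assuming delayed expansion is enabled. The array name FRUIT, its contents and the counter N are only examples; keeping the element count in a separate variable is a convention of this sketch, not something cmd requires. Apart from the brackets, it is exactly the same technique as before.

```batch
@echo off
setlocal enabledelayedexpansion

rem [ and ] are simply part of the names FRUIT[1], FRUIT[2], FRUIT[3]
set N=0
for %%v in (apple banana cherry) do (
    set /a N+=1
    set "FRUIT[!N!]=%%v"
)

rem Read back with the same delayed-expansion trick as before
set NUM=2
echo !FRUIT[%NUM%]!

rem Iterate over all elements using the count kept in N
for /l %%i in (1, 1, %N%) do echo %%i: !FRUIT[%%i]!
```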

One long string

If you know which characters your array values can contain, and can therefore pick a separator that never occurs in them, you can also store the whole array in one long string:

set ARRAY=1,2,3,4,5,6,7,8,9,10

This requires for /f whenever you need to read a value from the array. Write access is very inconvenient: you either have to search the string for the right position and do substring magic, or rebuild the entire array every time you change or delete a value.
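The following sketch shows both sides of the long-string approach using the ARRAY from above: reading one element with for /f (tokens picks the NUM-th comma-separated field), iterating with a plain for (which already treats commas as separators), and changing one element by rebuilding the whole string. NUM, ELEM, K, VALUE, NEW and I are illustrative names. The for /f read only works for small indices, since for /f supports a limited number of tokens (31, as far as I know).

```batch
@echo off
setlocal enabledelayedexpansion
set ARRAY=1,2,3,4,5,6,7,8,9,10

rem Read element NUM: for /f splits on commas and keeps the NUM-th token
set NUM=4
for /f "tokens=%NUM% delims=," %%a in ("%ARRAY%") do set "ELEM=%%a"
echo Element %NUM% is %ELEM%

rem Iterate: a plain for treats commas as separators already
for %%v in (%ARRAY%) do echo %%v

rem Write: rebuild the whole string, replacing element K with VALUE
set K=4
set VALUE=40
set "NEW="
set I=0
for %%v in (%ARRAY%) do (
    set /a I+=1
    if !I!==%K% (set "NEW=!NEW!,%VALUE%") else set "NEW=!NEW!,%%v"
)
rem Drop the leading comma added by the first iteration
set "ARRAY=!NEW:~1!"
echo %ARRAY%
```

Note that both for variants treat consecutive separators as one, so empty elements (as in 1,,3) silently disappear.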

To sum it up: I have always used the first variant, since it's pretty easy, and depending on your application, deleting values (and the tedious copying needed to close the gap) might not be necessary or even wanted. My Sieve of Eratosthenes, for example, simply kept only the values it was interested in and used if defined to check for them.
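A sieve in that spirit might look like the following. This is a sketch, not the original program: every candidate from 2 to LIMIT starts out as a defined variable P2, P3, and so on, the multiples of each number that is still defined get deleted with set "P<n>=", and whatever is still defined at the end is prime. LIMIT, START, P and PRIMES are names chosen for this example.

```batch
@echo off
setlocal enabledelayedexpansion
set LIMIT=50

rem Every candidate 2..LIMIT starts out as a defined variable P2, P3, ...
for /l %%i in (2, 1, %LIMIT%) do set P%%i=1

rem For each number still defined, delete its multiples from i*i upwards
for /l %%i in (2, 1, %LIMIT%) do (
    if defined P%%i (
        set /a "START=%%i * %%i"
        for /l %%j in (!START!, %%i, %LIMIT%) do set "P%%j="
    )
)

rem Whatever is still defined is prime; "if defined" is the membership test
set "PRIMES="
for /l %%i in (2, 1, %LIMIT%) do if defined P%%i set "PRIMES=!PRIMES! %%i"
echo Primes up to %LIMIT%:%PRIMES%
```

Deleting an element here is just set "P<n>=", with no copying, because nothing ever needs the surviving elements to be contiguous.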