The final tab offers a few more options. The first line allows you to set the name and the short and long descriptions of the parameter the data is being imported into. These can also be changed after the data is imported, so this line is largely a convenience.
The second line has options for the import process.
In a hypercube, data is indexed by a list of indices, collectively known as a key. The indices may be strings, integers or date/time values. If more than one value exists in the CSV file for a given key, Ravel throws a ``Duplicate key'' exception. This exception gives you the option of writing a report: a sorted version of the original CSV file, with the offending records listed at the beginning. You can open this report in a spreadsheet to see whether data needs to be corrected or removed.
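To make the idea concrete, here is a minimal Python sketch of this kind of duplicate-key check and report. It is an illustration only, not Ravel's implementation, and the file and column names (sales.csv, Date, Country) are hypothetical:

\begin{verbatim}
import pandas as pd

df = pd.read_csv("sales.csv")        # hypothetical input file
key = ["Date", "Country"]            # hypothetical key (axis) columns

# Rows sharing the same key are what trigger the "Duplicate key" exception.
dups = df[df.duplicated(subset=key, keep=False)]

# A report in the same spirit as Ravel's: offending records first,
# then the rest, each part sorted by key.
report = pd.concat([dups.sort_values(key),
                    df.drop(dups.index).sort_values(key)])
report.to_csv("report.csv", index=False)
\end{verbatim}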
Duplicates will arise if, for example, you load data from a transactional database that records the exact time of every transaction, but import it at a coarser resolution such as daily, so that several transactions map to the same key. Here the correct choice for ``Duplicate Key Action'' is sum, so that all sales for a particular day (or month, quarter or year) are aggregated, which is what you want when analysing trends in the data.
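In dataframe terms, the sum action behaves like a group-by aggregation. The following sketch, again purely illustrative and using hypothetical names, shows the equivalent operation in pandas:

\begin{verbatim}
import pandas as pd

# Transactions time-stamped to the second (hypothetical file and columns).
tx = pd.read_csv("transactions.csv", parse_dates=["Timestamp"])
tx["Date"] = tx["Timestamp"].dt.date   # coarsen the key to whole days

# Summing over duplicate keys leaves one value per (Date, Product) cell.
daily = tx.groupby(["Date", "Product"], as_index=False)["Sales"].sum()
\end{verbatim}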
As mentioned previously, Ravel is designed to work with numerical data. It is possible, however, to work with purely symbolic data by using counter mode. In this mode the value for each key is the number of times that key appears in the data file: zero or one, or more than one where duplicate keys occur. You can feed this into a Ravel and perform rollups to analyse the symbols statistically, for example as histograms.
To enable counter mode, select the ``counter'' check box in the input dialog form. If no data columns are present at all, Ravel selects counter mode automatically.
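Counter mode is likewise analogous to a group-by count. As a rough sketch, with hypothetical file and column names:

\begin{verbatim}
import pandas as pd

df = pd.read_csv("events.csv")       # hypothetical file with no data column

# One per record; greater than one wherever duplicate keys occur.
counts = df.groupby(["Country", "Category"]).size()

# 'counts' is now numerical data, so it can be rolled up, e.g. as histograms.
\end{verbatim}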
Even with the correct settings, you may still get a message such as ``exhausted memory — try reducing the rank'', or a similar message about hitting a threshold of 20% of physical memory. In some cases, columns such as ``titles'' and ``addresses'' are nearly unique for each record, leading to a large but very sparse hypercube. If you remove those columns, you may then encounter the ``Duplicate key'' message. In that case you can aggregate over the duplicated records, which is desirable anyway for analysing trends in aggregate data, by setting ``Duplicate Key Action'' to sum (or perhaps average, in this example).
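A quick way to see why near-unique columns exhaust memory is to compare the number of records in the file with the number of cells the dense hypercube would need, which is the product of the distinct values along each axis. A hypothetical sketch:

\begin{verbatim}
import math
import pandas as pd

df = pd.read_csv("sales.csv")                      # hypothetical file
axes = ["Date", "Country", "Title", "Address"]     # hypothetical axis columns

cells = math.prod(df[c].nunique() for c in axes)
print(len(df), "records vs", cells, "hypercube cells")
# Dropping a near-unique axis such as "Title" shrinks 'cells' by that
# column's cardinality, at the cost of duplicate keys to aggregate.
\end{verbatim}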