CSI Millennium (tm) (CSIM) format with CSI Y2K extensions
11/17/1998

This document describes the abandoned CompuTrac data format, which until recently was actively used by Equis' MetaStock(R) charting software. CSI has decided to rename the format because the Y2K extensions made it unique to CSI's proprietary use. CSI will continue updating to this format, with backward compatible extensions to allow for updates past January 1, 2000 and through the end of the 21st century. This format description and the CSIM format itself are the proprietary and copyrighted property of Commodity Systems, Inc., All Rights Reserved.

To access the data files within a given directory accurately, the programmer must read that directory's master file list, which uniquely identifies the specific market data files (time series) stored in that directory. This master file list is named MASTER and consists of up to 256 records, each 53 bytes in length. The fields are formatted as follows:

MASTER FILE RECORD LAYOUT (MASTER)

Record 1:

DESCRIPTION          Position  Length  Format
Number of Entries    1-2       2       CVI
Last Entry Used      3-4       2       CVI
UNUSED               5-53      49

The "Last Entry Used" field is accessed in order to assign the next file number to a .DAT/.DOP file combination. At file creation, this field is initialized to zero, which indicates that the first file to create will be F1.DAT. Programs that only need to read the data files can ignore this field.

Special NOTE: Even though the "Number of Entries" field is two bytes in length, the stored file number is only one byte; therefore the maximum file number cannot exceed 255. If the last entry used has the value 255, and the number of entries is less than 255, then you must scan the master file list for an unused number.
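
The header fields and the file number rule just described can be sketched in Python as follows. This is a minimal illustration, not CSI code. It assumes that CVI denotes a 2-byte little-endian integer (as produced by the Microsoft BASIC MKI$/CVI functions); the function names parse_master_header and next_file_number are hypothetical and do not appear in the format definition.

```python
import struct

MASTER_RECORD_LEN = 53  # every MASTER record is 53 bytes long


def parse_master_header(record: bytes) -> tuple[int, int]:
    """Return (number_of_entries, last_entry_used) from record 1 of MASTER."""
    if len(record) < 4:
        raise ValueError("MASTER header record is too short")
    # Assumption: both CVI fields are 2-byte little-endian integers.
    return struct.unpack_from("<HH", record, 0)


def next_file_number(number_of_entries, last_entry_used, file_numbers):
    """Pick the number n for a new Fn.DAT/Fn.DOP pair, or 0 if there is no room."""
    if number_of_entries >= 255:
        return 0                      # directory is full
    if last_entry_used < 255:
        return last_entry_used + 1    # normal case: next sequential number
    # Last Entry Used is 255 but there are gaps: scan for an unused number.
    in_use = set(file_numbers)
    for candidate in range(1, 256):
        if candidate not in in_use:
            return candidate
    return 0


# Usage: a header describing 3 entries, the last one created as F3.DAT.
header = struct.pack("<HH", 3, 3) + bytes(MASTER_RECORD_LEN - 4)
entries, last_used = parse_master_header(header)
print(next_file_number(entries, last_used, [1, 2, 3]))  # prints 4
```

A return value of 0 corresponds to the "no space to create in this directory" condition.
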
The pseudocode for selecting the next file number is shown below:

    FileNumbers()   - Array of integers holding the file numbers of the master file list
    NumberOfEntries - Number of master file list entries

    If NumberOfEntries=255 then FileNumber=0 else FileNumber=LastEntryUsed+1
    If NumberOfEntries<255 and LastEntryUsed>254 then
        FileNumber=0
        For x=1 to 255
            Found=False
            For y=1 to NumberOfEntries
                If FileNumbers(y)=x then
                    Found=True
                    Exit For
                End If
            Next y
            If NOT Found then
                FileNumber=x
                Exit For
            End If
        Next x
    End If
    If FileNumber=0 then No space to create in this directory

Records 2 through Number of Entries+1:

DESCRIPTION                       Position  Length  Format
File Number (1)                   1         1       Byte
Reserved                          2-3       2
Record Length                     4         1       Byte       Record length, in bytes, of the data file
Number of Fields (2)              5         1       Byte
Reserved                          6         1       Byte
Century Indicator (3)             7         1       Byte
Item Name                         8-18      11      Character
Delivery Month                    19-20     2       Character
Slash                             21        1       Character
Delivery Year (last two digits)   22-23     2       Character
Reserved                          24-25     2       ??
Minimum Date                      26-29     4       MBF
Maximum Date                      30-33     4       MBF
File Type (D,W,M)                 34        1       Char
Reserved                          35-36     2       Integer
Symbol Area                       37-53     17      Character

The 17 byte symbol area is further divided as follows for usage by QuickTrieve (all ASCII characters):

Description                                Position  Length  Format
Type Flag Indicator (4)                    1         1       Character

For Non Options:
Symbol (5)                                 2-7       6       Character
Conversion Factor Code (6)                 8         1       Character
Third character of symbol if a commodity   9         1       Character
Commodity Number if a commodity (7)        10-12     3       Character

For Options:
Symbol                                     2-7       6       Character
Conversion Factor Code (6)                 8         1       Character
Delivery Month Code (8)                    9         1       Character
Delivery Year (last two digits)            10-11     2       Character
Strike Price (modulo 1000)                 12-14     3       Character

Unlike QuickTrieve, which uses commodity numbers for identification, CSI's Unfair Advantage system exclusively uses the 17 character symbol area to uniquely identify stocks, futures and options.

NOTES:

(1) The File Number represents the physical file number on disk for the corresponding data file.
For example, if the byte is a 5, then this record corresponds to data file F5.DAT and its companion file F5.DOP. See the section entitled DESCRIPTOR FILE LAYOUT for a discussion of how the DOP files relate to the DAT files.

(2) Number of data fields in the data file. This will always be the record length divided by 4, since all data fields are 4 byte single precision floating point numbers.

(3) The century indicator byte is used to signify the century of the delivery year for commodities and stock options. The following values may be found in this byte:

    18: Delivery century is 1800's
    19: Delivery century is 1900's
    20: Delivery century is 2000's
    21: Delivery century is 2100's

Any other value is considered invalid, and the delivery year will be assumed to fall within the 1921-2020 year period. If the delivery year is greater than 20, the century is assumed to be 1900's, and if the delivery year is less than or equal to 20, the century is assumed to be 2000's. Examples: delivery year of 15=2015, delivery year of 21=1921.

(4) Type Flag: @=Non-option stock or commodity, 1=Commodity Option, 2=Stock Option. If there is no number at position 10-12 (or if the number is zero), the item is a stock; otherwise it is a commodity.

(5) For stocks, the symbol field is the CSI symbol. For commodities, the symbol field is the first two characters of the CSI symbol (the third character of the CSI commodity symbol is stored at position 9), followed by the two digit delivery month, followed by the last two characters of the delivery year. The delivery month and year were placed here as well as at position 19-23 because some software displays an error if identical symbols are found (which would be the case for multiple delivery months of the same commodity if the dm/dy were not included in the symbol field), yet does not display the symbol field when presenting a list of names to the user.

(6) Conversion Factor codes:

    -4=Q  -3=P  -2=O  -1=N  0=K  1=J  2=I  3=H  4=G  5=F

(7) Should the CSI commodity inventory ever exceed 999, please consult the CSI website for updated information.

(8) Delivery Month Code for Options:

    A-L = Delivery month 1-12 for CALLS
    M-X = Delivery month 1-12 for PUTS

(9) Users of this format should regularly consult the CSI website and this document for changes and announcements concerning the CSIM format.

************ End of CSIM format master file definition ***********

DATA FILE RECORD LAYOUT

Data is formatted on disk in a variable length record with all information in binary format. The filename is determined by the File Number field of the master file entry, e.g. if the file number field contains a binary five, the physical data file name on disk is F5.DAT and the descriptor file is F5.DOP. The record length is set by the Record Length field of the master file entry.

NOTE FOR MetaStock(R) compatibility: MetaStock(R) restricts the flexibility inherent in the format by forcing a special case data file of length 28 bytes (7 fields). If you want to create files readable by older (pre version 6.5) MetaStock(R) software, you must force this exception as well.

The structure provides for one header record and many data records as follows:

Header Record (record 1 of the data file):

Description          Position  Length  Format
Reserved             1-2       2       Integer    Always set to binary 0
Last Posted Record   3-4       2       Integer

Data Records (Records 2-Last posted record)

The data records are variable according to the descriptor file described below. The only two constants are 1) the first field is date, and 2) each field is a 4 byte single precision float in Microsoft Binary Format.

DESCRIPTOR FILE LAYOUT

The descriptor (.DOP) is a sequential (carriage return/linefeed delimited) file holding the names of all data fields present for a particular data file. The number of records in this file is determined by the Number of Fields entry of the master file record.
Each record of the sequential file is of the format:

    "FieldName",InputConversionFactor,DisplayConversionFactor

An example descriptor file is shown below:

    "DATE",0,0
    "OPEN",-3,-3
    "HIGH",-3,-3
    "LOW",-3,-3
    "CLOSE",-3,-3
    "VOL",0,0
    "OI",0,0

The above example is typical of most data files. The DATE, VOL and OI fields always have a conversion factor of 0, while the OPEN, HIGH, LOW and CLOSE price fields have the conversion factor of the commodity represented.

IMPORTANT NOTES ABOUT CONVERSION FACTORS:

1) When reading CSIM files you generally do not have to worry about the input conversion factor, because the stored numbers are all in adjusted decimal format and ready for internal calculation. This differs from the CSI QuickTrieve format, which stores all values as whole numbers that must be converted to decimal before arithmetic is performed. The display conversion factor is used to display the scale on the chart for viewing by the end user.

2) The original CompuTrac system interprets negative conversion factors for raw market information differently from the system of conversion factors CSI uses in its QuickTrieve and Unfair Advantage applications. Specifically, a conversion factor of -1 in the CompuTrac(R) format means halves, and a conversion factor of -2 means quarters; the QuickTrieve format has no equivalent for either. The CompuTrac(R) conversion factor of -3 means eighths, which is equivalent to a QuickTrieve conversion factor of -1. To summarize the CompuTrac(R) scale: -1=halves, -2=quarters, -3=eighths, -4=sixteenths, -5=thirty-seconds, -6=sixty-fourths.

NOTE 1: MBF stands for Microsoft Binary Format. It is a method of storing binary numbers that has since been replaced by the IEEE standard format in most computer languages. Most compilers have some type of conversion function that will convert from MBF to IEEE and back.
If not, ask your CSI marketing representative for our functions, available for C, Delphi and Turbo Pascal applications, that perform this numeric conversion.

NOTE 2: The first physical date and last physical date fields stored in the master file (the Minimum Date and Maximum Date fields), as well as the date field in each data record, are stored in the following manner: Dates in the 1900's are stored as they always have been, without the leading century. Dates after December 31, 1999 are stored with a leading one to make a seven digit number. Examples: January 1, 2000=1000101, February 20, 2004=1040220. To get the true date with century included, take the number in the date field and add 19000000. Be sure to use a 4 byte integer or a double precision real to store this result. Single precision reals do not accurately store numbers this large.

************ End of CSIM format data file definition ***************

Unfair Advantage, QuickTrieve, QuickManager, QuickPlot and QuickStudy are registered trademarks of Commodity Systems, Inc. Boca Raton, FL USA
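
The decoding steps in NOTE 1 and NOTE 2 can be sketched in Python as follows. This is a minimal illustration, not one of CSI's conversion functions. It assumes the commonly documented MBF single precision layout: three little-endian mantissa bytes, with the sign in the top bit of the third byte, followed by one exponent byte, where an exponent byte of 0 means the value is zero. The names mbf_to_float and csim_date_to_date are hypothetical.

```python
from datetime import date


def mbf_to_float(raw: bytes) -> float:
    """Convert a 4 byte Microsoft Binary Format single to a Python float."""
    if len(raw) != 4:
        raise ValueError("an MBF single is exactly 4 bytes")
    exponent = raw[3]
    if exponent == 0:
        return 0.0  # assumed MBF convention: exponent byte 0 encodes zero
    sign = -1.0 if raw[2] & 0x80 else 1.0
    # 23 stored mantissa bits; the leading 1 bit is implied.
    mantissa = ((raw[2] & 0x7F) << 16) | (raw[1] << 8) | raw[0]
    # Exponent bias is 129 when the mantissa is read as 1.m.
    return sign * (1.0 + mantissa / 2**23) * 2.0 ** (exponent - 129)


def csim_date_to_date(stored: float) -> date:
    """Turn a stored CSIM date number into a calendar date (NOTE 2)."""
    # Python integers are unbounded, so adding 19000000 loses no precision.
    full = int(round(stored)) + 19000000   # e.g. 1000101 -> 20000101
    year, month_day = divmod(full, 10000)
    month, day = divmod(month_day, 100)
    return date(year, month, day)


# Usage: the four bytes below encode 1000101 in the assumed MBF layout.
raw_field = bytes([0x50, 0x2A, 0x74, 0x94])
print(csim_date_to_date(mbf_to_float(raw_field)))  # prints 2000-01-01
```

Because the stored date values are below 2**24, they are exactly representable in a single precision field; only the sum with 19000000 requires the wider type mentioned in NOTE 2.
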