I have several applications where I store data in Excel spreadsheets, then process them by exporting the data into a a flat text file as quoted, comma separated fields. These were easy to process using compiled QuickBASIC programs, but not so much for COBOL. I searched for a solution on the Internet written in COBOL and found nothing, so I wrote this subprogram to parse the records into discrete fields. Each field may still need additional formatting (right justification, padding with spaces/zeros, etc.), but the majority of the work is done with one call.
As delivered here, the program will process a passed buffer (single record read from a CSV text file) of 5,000 characters and will return up to 25 fields of 200 characters each. This will accommodate most needs, but the subprogram can be expanded fairly easily.
I also include a very simple test program that will just display the fields returned from the subprogram and a small file of test data exported from an Excel spreadsheet of test data.
To install the source for the subroutine and the test program, as well as the sample data, and compile them, execute the bash script: ParseCSVsetup [md5: 6ef9130a3aeb8db7785dbcdec7f6ec18]. Right click and save the script in the location where you want the source programs to reside, then execute it. It will create two files containing the source programs, plus the sample data file, then execute the GnuCOBOL compiler to compile the subprogram and test program. The GnuCOBOL compiler must be installed prior to executing the bash script.
If you execute the test program it will display output to standard output:
jay@Phoenix $ ./testParse Record: 0001 Field +000000001 Length: +0006 Data: Row ID Field +000000002 Length: +0008 Data: Order ID Field +000000003 Length: +0010 Data: Order Date Field +000000004 Length: +0014 Data: Order Priority Field +000000005 Length: +0014 Data: Order Quantity Field +000000006 Length: +0005 Data: Sales Field +000000007 Length: +0008 Data: Discount Field +000000008 Length: +0009 Data: Ship Mode Field +000000009 Length: +0006 Data: Profit Field +000000010 Length: +0010 Data: Unit Price Field +000000011 Length: +0013 Data: Shipping Cost Field +000000012 Length: +0013 Data: Customer Name Field +000000013 Length: +0008 Data: Province Field +000000014 Length: +0006 Data: Region Field +000000015 Length: +0016 Data: Customer Segment Field +000000016 Length: +0016 Data: Product Category Field +000000017 Length: +0020 Data: Product Sub-Category Field +000000018 Length: +0012 Data: Product Name Field +000000019 Length: +0017 Data: Product Container Field +000000020 Length: +0019 Data: Product Base Margin Field +000000021 Length: +0009 Data: Ship Date Record: 0002 Field +000000001 Length: +0001 Data: 1 Field +000000002 Length: +0001 Data: 3 Field +000000003 Length: +0010 Data: 10/13/2010 Field +000000004 Length: +0003 Data: Low Field +000000005 Length: +0001 Data: 6 Field +000000006 Length: +0006 Data: 261.54 Field +000000007 Length: +0004 Data: 0.04 Field +000000008 Length: +0011 Data: Regular Air Field +000000009 Length: +0007 Data: -213.25 Field +000000010 Length: +0005 Data: 38.94 Field +000000011 Length: +0002 Data: 35 Field +000000012 Length: +0018 Data: Muhammed MacIntyre Field +000000013 Length: +0007 Data: Nunavut Field +000000014 Length: +0007 Data: Nunavut Field +000000015 Length: +0014 Data: Small Business Field +000000016 Length: +0015 Data: Office Supplies Field +000000017 Length: +0022 Data: Storage & Organization Field +000000018 Length: +0048 Data: Eldon Base for stackable storage shelf, platinum Field +000000019 Length: +0009 Data: Large Box Field +000000020 Length: +0003 Data: 0.8 Field +000000021 Length: +0010 Data: 10/20/2010 ... continues for contents of 20 records
In order to dynamically call the subroutine from other programs, you should copy the object module (ParseCSV.so) to a location available through the COB_LIBRARY_PATH.
* Updated September 20, 2019 * I was tired of looking for and copying the Working-Storage fields for this, and a few other, routines, so I updated the installation script to provide a copybook that may be used in the calling program for this routine. For easiest use, you should copy the copybook file into the directory that GnuCOBOL searches during compile time for copy books (on my Linux system that is /usr/local/share/gnucobol/copy). At the same time I modified the original code to utilize a variable length field for the input field, where the length is determined at execution time.
* Updated May 31, 2022 * I received feedback from Alfred Cleve, who had done some performance benchmarking of this routine; he found that the INSPECT of the WORK-BUFFER replacing spaces with x'00' was a serious hit on performance, especially when processing a large number of records. I had not done any performance testing, as my uses of the routine are usually for a small number of records, mostly below 10,000. I have improved the performance by initializing the WORK-BUFFER to low-values, adding the parameter INITIAL to the PROGRAM-ID to ensure that each call of the subprogram gets a fresh copy of the local storage, and removed the INSPECT statement, as it is no longer necessary when the buffer is pre-initialized. For comparison, here are the time values for a test execution of the routine using a sample file of 113,906 records:
Original code Modified code real 0m59.197s user 0m58.649s sys 0m0.516s real 0m9.278s user 0m8.762s sys 0m0.512s
This page was last updated on May 31, 2022.