Sunday, December 11, 2011

File Checking - Perl to the rescue

One of my coworkers posted about an issue with one of the data files we receive from the outside world. We have a batch program that processes these files and posts transactions into our system. The issue was that one record in a file had an extra byte, and no amount of checking revealed the "bad" row. After trying different checks, he decided to download the file to a PC, open it in TextPad, and turn on visible spaces to find which row had the extra space. Very tedious, but it works.

As you may know, Perl is a very good scripting tool for such purposes. I've created a sample Perl script that does basic checks, from file size to record size. If it finds any discrepancy, it prints the line numbers and the records (see the sample output below). For those interested, I'm attaching a copy (a text version of the script) here for reference.

(If you upload the script to a Unix machine, you need to run chmod +x on it to make it executable.)

Please let me know if you want more information. Feel free to change the script as needed (but please send me a copy so I can keep mine updated).

Sample Usage:

/tmp/chkfile.pl FIN_09022011_131353.txt 259

The first parameter is the file name and the second parameter is the expected record size.
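For readers who just want the idea, here is a minimal sketch of how a record-length check like this could be written. It is not the attached chkfile.pl; the helper name find_bad_records is made up for illustration, and I am assuming that only the trailing newline is stripped before measuring, so a carriage return in a DOS file counts toward the record length.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not taken from chkfile.pl: returns one
# [line number, length, record] triple for every line whose length
# differs from $expected. Only the trailing "\n" is removed, so a
# carriage return in a DOS file counts toward the length (assumption).
sub find_bad_records {
    my ($path, $expected) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);    # read raw bytes, keep any \r in place
    my @bad;
    while (my $line = <$fh>) {
        chomp $line;                 # strips "\n" only
        my $len = length $line;
        push @bad, [$., $len, $line] if $len != $expected;
    }
    close($fh);
    return @bad;
}

# Run the check only when invoked as a script with both arguments.
if (!caller && @ARGV == 2) {
    my ($file, $size) = @ARGV;
    my @bad = find_bad_records($file, $size);
    if (@bad) {
        print "Following rows were unmatched:\n";
        printf "+%d -  Size: %d - << %s>>\n", @$_ for @bad;
    } else {
        print "All rows match!\n";
    }
}
```

Reading line by line keeps memory use flat even for large files, and $. gives the current line number for free.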

Sample Output:
$/home/svaradar/dev/perl
$ /tmp/chkfile.pl FIN_09022011_131353.txt 259
Name of the file              : FIN_09022011_131353.txt
File Size                     : 1554 bytes
Record Size expected          : 259
Total # of lines in file      : 6
File appears to be a DOS file. (contains carriage returns)
All rows match!

After creating a bad record (I just "fixed" one of the records to change it to 260 chars):
$ /tmp/chkfile.pl FIN_09022011_131353.txt 259
Name of the file              : FIN_09022011_131353.txt
File Size                     : 1555 bytes
Record Size expected          : 259
Total # of lines in file      : 6
File appears to be a DOS file. (contains carriage returns)

Following rows were unmatched:
+4 -  Size: 260 - << P0001CHK0000000000000374.4600099999990001400001111              011000015                                                                                     Sample Record                                                                                     <CR>>>


<CR> - The script translates CTRL-M (carriage return) to a printable <CR>; otherwise the carriage return would have shown up as just a blank line in the output!
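A translation like this can be done with a single substitution. The snippet below is only a sketch of the idea, not the code from chkfile.pl, and show_cr is a made-up name:

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the record in which every
# carriage return (CTRL-M, \r) is replaced by the literal text "<CR>".
sub show_cr {
    my ($record) = @_;
    (my $visible = $record) =~ s/\r/<CR>/g;
    return $visible;
}

print show_cr("Sample Record\r"), "\n";    # prints: Sample Record<CR>
```

The original record is left untouched, so its length can still be measured with the carriage return in place.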



The file below contains the Perl script in PDF format:

chkfile.pl
