Friday, October 5, 2007

Defining crucial changes for text files

Sometimes I wish I could define what I consider a crucial change in a text file, instead of just seeing every difference from one version to the next, which we already have plenty of tools for.

I need more than an average diff utility to see what has actually changed from one report output to the next, without getting false positive line matches.

Standard diff utilities cannot handle cases like these:

  • The order of rows has changed, but the content has not (moved-line detection in file compare).
  • The offset of columns has changed, but the content has not.
  • Whitespace (tabs and/or spaces) has changed, but could be ignored.
  • Data in a line has changed, but the change is OK to ignore, such as a date.
  • A certain section contains more or less data than before, but the difference can be ignored.
  • The order of tags has changed, but the content within each tag has not, e.g. HTML tags.
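Several of the cases above (moved rows, shifted columns, whitespace and dates) can be approximated by normalizing each line before comparing, and then comparing the two files as multisets of lines rather than as sequences. Below is a minimal Python sketch under those assumptions; `normalize`, `crucial_diff` and the ISO-style date pattern are hypothetical choices for illustration, not features of any existing diff tool.

```python
import re
from collections import Counter

# Assumed date format for this sketch: ISO-style dates such as 2007-10-05.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def normalize(line):
    """Reduce a line to the parts we consider crucial."""
    line = DATE_RE.sub("<DATE>", line)  # date changes are OK to ignore
    # split() with no argument splits on any run of whitespace, so tabs vs
    # spaces and shifted column offsets both collapse to single spaces.
    return " ".join(line.split())


def crucial_diff(old_lines, new_lines):
    """Return (removed, added) lines, ignoring order, whitespace and dates."""
    # Counter treats each file as a multiset, so moved lines cancel out.
    old = Counter(normalize(line) for line in old_lines if line.strip())
    new = Counter(normalize(line) for line in new_lines if line.strip())
    removed = sorted((old - new).elements())
    added = sorted((new - old).elements())
    return removed, added


if __name__ == "__main__":
    old = ["host a   open 22", "Scan date: 2007-10-04", "host b\topen 80"]
    new = ["host b open 80", "Scan date: 2007-10-05",
           "host a open 22", "host c open 443"]
    print(crucial_diff(old, new))  # → ([], ['host c open 443'])
```

Treating the files as multisets is what makes moved lines disappear from the report, but it also throws away positional context, so a line that moved into a different section will not be flagged.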

Some examples of when a more advanced diff utility could come in handy are:

  • Nessus .nsr scan result files, looking for interesting changes.
  • WYSIWYG HTML editors, which save tags differently from how they appeared when the file was loaded, even if nothing was changed.
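For the WYSIWYG case, one option is to compare a canonical token stream instead of the raw text. This sketch uses Python's standard `html.parser` module and assumes that attribute order within a tag, tag-name case, `<br>` vs `<br/>`, and whitespace between tokens are never crucial; `Canonicalizer`, `canonical` and `same_html` are hypothetical names.

```python
from html.parser import HTMLParser


class Canonicalizer(HTMLParser):
    """Collects a token stream in which attribute order, tag-name case and
    surrounding whitespace have been normalized away."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names; sorting the
        # (name, value) pairs makes attribute order irrelevant. Boolean
        # attributes have value None, so sort on "" in that case.
        ordered = tuple(sorted(attrs, key=lambda a: (a[0], a[1] or "")))
        self.tokens.append(("start", tag, ordered))

    def handle_startendtag(self, tag, attrs):
        # Treat <br/> the same as <br>.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace, drop blank runs
        if text:
            self.tokens.append(("text", text))


def canonical(html):
    parser = Canonicalizer()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.tokens


def same_html(a, b):
    """True if the two documents differ only in non-crucial formatting."""
    return canonical(a) == canonical(b)


if __name__ == "__main__":
    before = '<P CLASS="x" ID="main">Hello   <b>world</b></P>'
    after = '<p id="main" class="x">\n  Hello <b>world</b>\n</p>'
    print(same_html(before, after))  # → True
```

Collapsing whitespace is lossy: `Hello <b>` and `Hello<b>` compare equal even though a browser may render them slightly differently, so this is only a reasonable default for editor round-trips.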

One approach is to make a configuration file for the diff utility, so the same tool can be reused in as many places as possible. Is this what is referred to as custom file filters?
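A configuration file could hold the filter rules, so the same tool can be pointed at scan results one day and HTML the next. The sketch below assumes a made-up JSON format with three rule types: lines to drop, substrings to mask (such as dates), and start/end markers for sections whose size may vary. The keys, the `FileFilter` class and the `BEGIN/END PLUGIN OUTPUT` markers are hypothetical and not taken from any real tool or from the .nsr format.

```python
import json
import re

# Hypothetical filter file; the keys and section markers are invented here.
EXAMPLE_CONFIG = r"""
{
  "ignore_lines":  ["^#", "^Generated by"],
  "mask":          ["\\d{4}-\\d{2}-\\d{2}"],
  "skip_sections": [["^BEGIN PLUGIN OUTPUT", "^END PLUGIN OUTPUT"]]
}
"""


class FileFilter:
    """Applies the rules from a (hypothetical) filter config to lines."""

    def __init__(self, config):
        self.ignore = [re.compile(p) for p in config.get("ignore_lines", [])]
        self.mask = [re.compile(p) for p in config.get("mask", [])]
        self.sections = [(re.compile(start), re.compile(end))
                         for start, end in config.get("skip_sections", [])]

    def apply(self, lines):
        out = []
        section_end = None  # end-marker regex while inside a skipped section
        for line in lines:
            if section_end is not None:
                if section_end.search(line):
                    section_end = None  # the end marker is skipped as well
                continue
            for start, end in self.sections:
                if start.search(line):
                    section_end = end
                    break
            if section_end is not None:
                continue  # skip the start marker itself
            if any(p.search(line) for p in self.ignore):
                continue
            for p in self.mask:
                line = p.sub("<MASKED>", line)
            out.append(line)
        return out


def load_filter(text):
    return FileFilter(json.loads(text))
```

Both files would be passed through `apply()` before comparing, so only lines that survive the filters can show up as changes.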

A command-line interface is required for scripted comparisons.
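Since the tool must be scriptable, it should behave like diff on the command line. Here is a minimal sketch of such a wrapper using Python's standard `argparse`; the `--ignore` option, the `- `/`+ ` output format and the `compare` helper are assumptions for illustration. It borrows diff's exit status convention (0 when nothing crucial changed, 1 otherwise) so shell scripts can test the result, and since it only needs a Python interpreter it can run on Unix boxes too.

```python
import argparse
import re
import sys
from collections import Counter


def normalize(line):
    """Collapse whitespace so tabs vs spaces and column offsets don't matter."""
    return " ".join(line.split())


def compare(old_lines, new_lines, ignore_patterns=()):
    """Return (removed, added) normalized lines, ignoring order and any
    line matching one of ignore_patterns."""
    ignore = [re.compile(p) for p in ignore_patterns]

    def keep(lines):
        return Counter(
            normalize(line) for line in lines
            if line.strip() and not any(p.search(line) for p in ignore)
        )

    old, new = keep(old_lines), keep(new_lines)
    return sorted((old - new).elements()), sorted((new - old).elements())


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Report crucial changes between two text files.")
    parser.add_argument("old", help="previous report output")
    parser.add_argument("new", help="current report output")
    parser.add_argument("--ignore", action="append", default=[],
                        metavar="REGEX",
                        help="drop lines matching REGEX (may be repeated)")
    args = parser.parse_args(argv)

    with open(args.old, encoding="utf-8", errors="replace") as f:
        old_lines = f.read().splitlines()
    with open(args.new, encoding="utf-8", errors="replace") as f:
        new_lines = f.read().splitlines()

    removed, added = compare(old_lines, new_lines, args.ignore)
    for line in removed:
        print("- " + line)
    for line in added:
        print("+ " + line)
    # Same convention as diff: 0 = no crucial change, 1 = changes found.
    return 1 if removed or added else 0


if __name__ == "__main__":
    sys.exit(main())
```

Saved under a hypothetical name such as `crucialdiff.py`, it could be called from a script as `python crucialdiff.py --ignore '^Scan date' old.txt new.txt || echo "crucial change found"`.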

A Unix version is almost a must, because the output being compared often comes from Unix boxes!

All this, instead of writing a custom parser for diff every time a new use case comes up :-)
