Diff tool with an html front end
Download tool - 20 Kb
A simple yet useful diff tool with html frontend
This article provides a simple C++ WIN32 tool to perform diffs on arbitrary files. It also features a nice and workable html output.
1. Diff tools
So you ask me why the hell I should need a diff tool when I already have windiff in the devstudio package? windiff is fine when you have nothing else to do the job, but this tool leaves a lot to be desired, especially the unproductive way the diffs are presented. After all, if the diffs are so badly presented that you spend time just figuring them out, why shouldn't windiff be replaced with something better?
That said, if you are interested in merging as well as diffing, you can buy a third party tool such as Araxis, or use a free tool such as Winmerge. If you are focused on Xml content, MS lets you play with the Xml diff patch, a C#-based diff tool specialized for Xml.
I was urged to produce this tool because I deal with configuration files that change over time. I wanted not only something to show me the diffs over time, but also something nicely integrated in the automation chain. This requirement excluded third party tools, because none provided both the API I needed and the appropriate rendering format. I also loved the idea that diff algorithms and the associated techniques were something new to me. So I took the keyboard and wrote this simple tool. The engine itself took me a couple of hours, which suggests that a diff tool can't be that hard to build. Ok, what do we have then :
[Screenshot: html report of a diff between two Xml files]
2. Using it
It's important to note that, although the picture above shows a diff between Xml files, this tool can be used for any text file format you might think of. It's an agnostic diff tool.
2.1 interactive mode
Simply double-click on the executable, then choose two files to compare in the multi-selection File dialog. The Html rendering is automatically displayed by your default browser as soon as the diff engine has finished the job.
2.2 command line mode
In batch mode, the syntax is :
For those of you expecting to pipe the output somewhere else, I have provided another project file,
2.3 using options
By default, the diff tool is case sensitive and also watches indent. Depending on the need, it can be useful to disable either. In interactive mode, just uncheck the boxes. In command line mode, add -c or -i to the command line. In the code itself, you can play around with the CFileOptions class, which is instantiated at the top level and whose settings are consulted by all rows while doing the diff.
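To make the two options concrete, here is a minimal sketch in portable C++ of how a line could be filtered before it is compared. It is not the tool's actual code: `DiffOptions` and `NormalizeLine` are hypothetical names standing in for CFileOptions and its filtering step, and the mapping of -c to case and -i to indent is assumed from the option letters.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-in for the article's CFileOptions class.
struct DiffOptions {
    bool caseSensitive = true;  // presumably cleared by -c (assumption)
    bool watchIndent   = true;  // presumably cleared by -i (assumption)
};

// Returns the form of the line that is actually compared.
// The original line is left untouched so it can still be displayed as written.
std::string NormalizeLine(const std::string& line, const DiffOptions& o) {
    std::string out = line;
    if (!o.watchIndent) {
        // Drop leading spaces and tabs.
        std::string::size_type first = out.find_first_not_of(" \t");
        out = (first == std::string::npos) ? std::string() : out.substr(first);
    }
    if (!o.caseSensitive) {
        for (char& c : out)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return out;
}

// Usage example: with both checks disabled, these two lines compare equal.
void DemoNormalize() {
    DiffOptions o;
    o.caseSensitive = false;
    o.watchIndent = false;
    assert(NormalizeLine("\t<Node A>", o) == NormalizeLine("<node a>", o));
}
```

Filtering a copy rather than the line itself means the report can still show the original text while the comparison ignores case or indentation.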
3. Compiling it
Although the main
Tested on 9X/2K. Both VC++6 and VC++7 workspaces are provided.
4. Developing it
4.1 the diff engine
I had always thought that producing a diff was a difficult engineering problem. I was wrong. Against all odds, the first design I came up with kept working as the tool evolved. Basically, what I have is a structure which, for each line of both source files, attaches a signature and a status. The signature is a precalculated token that lets me compare strings from the two source files very fast, without actually going through the strings themselves.

    // preprocessing the file, build precalculated tables
    BOOL CFilePartition::PreProcess(/*in*/CString &szFilename, /*in*/CFileOptions &options)
    {
        ASSERT( !szFilename.IsEmpty() );
        if (szFilename.IsEmpty())
        {
            OutputDebugString("error : empty input filename\r\n");
            return FALSE;
        }

        SetName(szFilename);
        SetOptions(options);

        // read the file first,
        // and build the table of tokens
        CStdioFile f;
        if ( !f.Open(szFilename, CFile::modeRead) )
        {
            // CString::Format sizes its own buffer, so a long path
            // cannot overflow a fixed MAX_PATH array
            CString szError;
            szError.Format("error : cannot open %s\r\n", (LPCTSTR)szFilename);
            OutputDebugString(szError);
            return FALSE;
        }

        CString s;
        while ( f.ReadString(s) ) // (reads both Unix and Windows files)
            AddString(s);

        f.Close();
        return TRUE;
    }

    // store it
    void CFilePartition::AddString(/*in*/CString &s, /*in*/long i)
    {
        CFileLine *p = new CFileLine();
        ASSERT(p);
        if (p)
        {
            m_arrTokens.Add( p->SetLine(s, m_options) );
            m_arrLines.Add( p );
        }
    }

    // shows how the token is calculated
    long CFileLine::SetLine(/*in*/CString &s, /*in*/CFileOptions &o)
    {
        m_s = s;
        CString so = GetLineWithOptions(s,o); // filters the input line
                                              // according to options (case, indent, ...)
        long nToken = 0;
        long nLength = so.GetLength();
        TCHAR *lpString = so.GetBuffer(0);
        for (long i=0; i<nLength; i++)
            nToken += 2*nToken + *(lpString++); // (George V. Reilly hint)
        return nToken;
    }

The status is an enum which records what was found when comparing the two source files : I want to know what was changed, what was added, and what was deleted.
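The signature idea can be sketched in portable C++ as follows. This is a minimal illustration under stated assumptions, not the tool's code: `LineToken` and `SameLine` are hypothetical names, the multiplier 3 is an assumption about the accumulation step, and unsigned arithmetic is used so that overflow on long lines wraps in a well-defined way.

```cpp
#include <cassert>
#include <string>

// Folds every character of a line into one integer.
// Each step computes token = 3*token + c (the multiplier is an assumption).
unsigned long LineToken(const std::string& line) {
    unsigned long token = 0;
    for (unsigned char c : line)
        token = 3 * token + c;
    return token;
}

// Different tokens prove the lines differ, which is the fast path.
// Equal tokens do not prove equality (collisions are possible),
// so the text is compared only in that case.
bool SameLine(const std::string& a, unsigned long tokenA,
              const std::string& b, unsigned long tokenB) {
    return tokenA == tokenB && a == b;
}

// Usage example: tokens are computed once per line, then reused for every comparison.
void DemoToken() {
    const std::string x = "<item id='1'/>";
    const std::string y = "<item id='2'/>";
    assert(!SameLine(x, LineToken(x), y, LineToken(y)));
    assert(SameLine(x, LineToken(x), x, LineToken(x)));
}
```

Because each token is computed once during preprocessing, the matching loop mostly compares integers. Whether the text check on equal tokens is worth its cost depends on how much a rare false match would matter in the report.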
Once the tokens are all ready, I go through all content lines of the first source file, which I call the reference file. Each line is matched against the other source file's content. Anytime a line is matched, it is straightforward to know whether the dual line in the other source file is at the same "height" or not. If it is further down, a block has been added. Hence one of the things I am interested in : the added lines.

    // performs a diff between the reference file (f1) and the other file (f2)
    // CFilePartition instances are actually virtual file objects
    // results : two new virtual file objects with a status for each content line
    //
    BOOL CDiffEngine::Diff( /*in*/CFilePartition &f1, /*in*/CFilePartition &f2,
                            /*out*/CFilePartition &f1_bis, /*out*/CFilePartition &f2_bis)
    {
        long nbf1Lines = f1.GetNBLines();
        long i = 0;
        long nf2CurrentLine = 0;

        while ( i<nbf1Lines )
        {
            // process this line
            long nLinef2 = nf2CurrentLine;
            if ( f1.MatchLine(i,f2,nLinef2) )
            {
                // matched, either the lines were identical, or f2 has added something
                if (nLinef2 > nf2CurrentLine)
                {
                    // add blank lines to f1_bis
                    long j = nLinef2 - nf2CurrentLine;
                    while ( j>0 )
                    {
                        f1_bis.AddBlankLine();
                        f2_bis.AddString( f2.GetRawLine(nLinef2-j), Added );
                        j--;
                    }
                }

                // exactly matched
                f1_bis.AddString( f1.GetRawLine(i), Normal);
                f2_bis.AddString( f2.GetRawLine(nLinef2), Normal);
                nf2CurrentLine = nLinef2 + 1; // next line in f2
            }
            else
            {
                ... // checking out "change" or "deletion"
            }

            i++; // next line in f1
        }

        return TRUE;
    }

Then funny things begin to happen. Matching the other source file against the reference file gives only the first half of the cake. Since both files play a dual role, it is worth taking advantage of the relations found when the other file becomes the reference file, especially as the resulting algorithm cross-references the two files alternately, much like a DNA shape (or whatever you might think of at the moment). That's how the ... dots above get their implementation :

    long nLinef1 = i;
    if ( f2.MatchLine(nLinef2, f1, nLinef1) )
    {
        // the dual line in f2 can be found in f1, that's because
        // the current line in f1 has been deleted
        f1_bis.AddString( f1.GetRawLine(i), Deleted);
        f2_bis.AddBlankLine();

        // this whole block is flagged as deleted
        if (nLinef1>i+1)
        {
            long j = nLinef1 - (i+1);
            while ( j>0 )
            {
                i++;
                f1_bis.AddString( f1.GetRawLine(i), Deleted);
                f2_bis.AddBlankLine();
                j--;
            }
        }
        // note : nf2CurrentLine is not incremented
    }
    else
    {
        // neither added, nor deleted, so it's flagged as changed
        f1_bis.AddString( f1.GetRawLine(i), Changed);
        f2_bis.AddString( f2.GetRawLine(nLinef2), Changed);
        nf2CurrentLine = nLinef2 + 1; // next line in f2
    }

Please note that within the process we are adding blank lines in either the reference or the other file anytime a line is flagged as added or deleted.
That's pretty much all about it. This code is below the 500-line threshold! Be sure to note that the algorithms presented here may have flaws, or may be needlessly lengthy, especially in the eyes of someone who has been working on such algorithms for a while. That's a 1.0 release. Please feel free to contribute.
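The two-way matching described in this section can be sketched end to end in portable C++. This is a simplified illustration, not the engine itself: it compares strings directly instead of tokens, `Row`, `FindFrom` and `Emit` are hypothetical names, padding lines are given the Normal status by assumption, and the handling of lines left over at the end of either file is an addition the listings above do not show.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum Status { Normal, Changed, Added, Deleted };

struct Row {
    std::string text;  // empty for a padding (blank) line
    Status status;
};

// Index of the first line equal to `needle` at or after `from`, or lines.size() if none.
static std::size_t FindFrom(const std::vector<std::string>& lines,
                            std::size_t from, const std::string& needle) {
    for (std::size_t k = from; k < lines.size(); ++k)
        if (lines[k] == needle) return k;
    return lines.size();
}

// Appends one row to each side, so both outputs always have the same height.
static void Emit(std::vector<Row>& out1, std::vector<Row>& out2,
                 const std::string& t1, Status s1,
                 const std::string& t2, Status s2) {
    out1.push_back(Row{t1, s1});
    out2.push_back(Row{t2, s2});
}

void Diff(const std::vector<std::string>& f1, const std::vector<std::string>& f2,
          std::vector<Row>& out1, std::vector<Row>& out2) {
    std::size_t i = 0, cur2 = 0;
    while (i < f1.size()) {
        std::size_t m2 = FindFrom(f2, cur2, f1[i]);
        if (m2 < f2.size()) {
            // f1[i] found in f2: the f2 lines skipped over were added.
            for (; cur2 < m2; ++cur2)
                Emit(out1, out2, "", Normal, f2[cur2], Added);
            Emit(out1, out2, f1[i], Normal, f2[m2], Normal);
            cur2 = m2 + 1;
            ++i;
        } else if (cur2 >= f2.size()) {
            // f2 is exhausted: what remains of f1 was deleted (my addition).
            Emit(out1, out2, f1[i], Deleted, "", Normal);
            ++i;
        } else {
            // Swap roles: look for f2's current line further down in f1.
            std::size_t m1 = FindFrom(f1, i + 1, f2[cur2]);
            if (m1 < f1.size()) {
                // Found: the f1 lines before it were deleted; cur2 stays put.
                for (; i < m1; ++i)
                    Emit(out1, out2, f1[i], Deleted, "", Normal);
            } else {
                // Neither added nor deleted, so flag the pair as changed.
                Emit(out1, out2, f1[i], Changed, f2[cur2], Changed);
                ++i;
                ++cur2;
            }
        }
    }
    // Lines left in f2 after f1 is exhausted were added (my addition).
    for (; cur2 < f2.size(); ++cur2)
        Emit(out1, out2, "", Normal, f2[cur2], Added);
}

// Usage example: "b" was removed, so it faces a blank padding line.
void DemoDiff() {
    std::vector<std::string> a = {"a", "b", "c"};
    std::vector<std::string> b = {"a", "c"};
    std::vector<Row> r1, r2;
    Diff(a, b, r1, r2);
    assert(r1.size() == r2.size());
    assert(r1[1].status == Deleted && r2[1].text.empty());
}
```

Note that the matching is greedy: a line that reappears much further down in the other file makes everything in between count as added or deleted. That keeps the code short, at the price of occasionally larger diffs than a longest-common-subsequence approach would report.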
4.2 the html renderer
I wanted something nice to show, fast to produce, and easy to work with. This simple renderer is simply the result of these requirements.
Being nice means that I wanted windiff to be purged out of my mind for the rest of my life. I have had enough of that horizontal view with overlapped files, especially when, adding to the frustration, it's obvious that horizontal views are counter-intuitive when it comes to comparing files. Having a vertical, non-overlapped view was the numero uno requirement, and was easy to come up with by using html table cell tags.
Next, I wanted it to be produced fast. There is actually not much to say about it. The output of the diff engine is two virtual file instances where the status is known for each line of content of the actual files. To produce the diff, I only have to choose a color for each status and use CSS html styles to override the row formatting. Using styles is a de facto factorization : each color is defined once and referenced by every row. Think about it next time you create ASP code!
In addition, I didn't want to miss the opportunity to let the report be customized. Here is a simple API :

    // adds a header and a footer to the resulting html report
    void SetTitles(CString &szHeader, CString &szFooter);

    // defines sequentially :
    // - the color of the source text (of the form #FF4444)
    // - the color of the background
    // - the color of lines that have changed
    // - the color of lines that have been added
    // - the color of lines that have been deleted
    void SetColorStyles(CString &szText, CString &szBackground, CString &szChanged,
                        CString &szAdded, CString &szDeleted);

Finally, being easy to work with was a result of the blank lines added to dual files when lines are flagged as added or deleted. Doing so, we ensure that code blocks perfectly match after small or big changes. The resulting diff is easy to browse.

    CString CDiffEngine::Serialize(/*in*/CFilePartition &f1, /*in*/CFilePartition &f2)
    {
        // html header
        CString s = "<!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.0 Transitional//EN'>\r\n"
            "<!-- diff html gen, (c) Stephane Rodriguez - feb 2003 -->\r\n"
            "<HTML>\r\n"
            "<HEAD>\r\n"
            "<TITLE> File Diff </TITLE>\r\n"
            "<style type='text/css'>\r\n"
            "<!--\r\n"
            ".N { background-color:white; }\r\n"
            ".C { background-color:" + m_szColorChanged + "; }\r\n"
            ".A { background-color:" + m_szColorAdded + "; }\r\n"
            ".D { background-color:" + m_szColorDeleted + "; }\r\n"
            "-->\r\n"
            "</style>\r\n"
            "</HEAD>\r\n"
            "\r\n"
            "<BODY BGCOLOR='#FFFFFF'>\r\n"
            "\r\n" + m_szHeader +
            "<table border=0 bgcolor=0 cellpadding=1 cellspacing=1 width=100%>"
            "<tr><td>\r\n"
            "<table width=100% bgcolor=white border=0 cellpadding=0 cellspacing=0>"
            "\r\n<tr bgColor='#EEEEEE' style='color:0'><td width=50%>"
            "old version</td><td width=50%>new version"
            " (<b style='background-color:" + m_szColorChanged +
            ";width:20'> </b>changed "
            "<b style='background-color:" + m_szColorAdded + ";width:20'> </b>"
            "added <b style='background-color:" +
            m_szColorDeleted + ";width:20'> </b>deleted) "
            "<FORM ACTION='' style='display:inline'><SELECT id='fontoptions' "
            "onchange='maintable.style.fontSize=this.options[this.selectedIndex].value'>"
            "<option value='6pt'>6pt<option value='7pt'>7pt<option value='8pt'>8pt"
            "<option value='9pt' selected>9pt</SELECT>"
            "</FORM></td></tr>\r\n"
            "<tr bgColor='#EEEEEE' style='color:0'><td width=50%><code>"
            + f1.GetName() + "</code></td><td width=50%><code>" +
            f2.GetName() + "</code></td></tr>"
            "</table>\r\n"
            "</td></tr>\r\n"
            "</table>\r\n"
            "\r\n"
            "<br>\r\n"
            "\r\n";

        long nbLines = f1.GetNBLines();
        if (nbLines==0)
        {
            s += "<br>empty files";
        }
        else
        {
            s += "<table border=0 bgcolor=0 cellpadding=1 cellspacing=1 width=100%><tr><td>"
                 "<table id='maintable' width=100% bgcolor='" + m_szColorBackground +
                 "' border=0 style='color:" + m_szColorText +
                 ";font-family: Arial, Helvetica, sans-serif; font-size: 9pt'>\r\n";
        }

        // indexed by line status : normal, changed, added, deleted
        const char *arrStatus[4] = { "", " class='C'", " class='A'", " class='D'" };

        CString sc;

        // write content
        for (long i=0; i<nbLines; i++)
        {
            sc += "<tr><td width=50%" + CString(arrStatus[ f1.GetStatusLine(i) ]) + ">" +
                  Escape(f1.GetRawLine(i)) + "</td>";
            sc += "<td width=50%" + CString(arrStatus[ f2.GetStatusLine(i) ]) + ">" +
                  Escape(f2.GetRawLine(i)) + "</td></tr>";
        } // for i

        s += sc;

        if (nbLines>0)
            s += "</table>"
                 "</td></tr></table>\r\n";

        // write html footer
        s += m_szFooter + "</BODY>\r\n"
             "</HTML>\r\n";

        return s;
    }

    // a helper aimed to make sure tag symbols are passed as content
    CString CDiffEngine::Escape(CString &s)
    {
        CString o;
        long nSize = s.GetLength();
        if (nSize==0) return CString("&nbsp;"); // keeps an empty cell from collapsing

        TCHAR c;
        BOOL bIndentation = TRUE;

        for (long i=0; i<nSize; i++)
        {
            c = s.GetAt(i);
            if (bIndentation && (c==' ' || c=='\t'))
            {
                // leading whitespace becomes non-breaking spaces
                // (a tab width of 4 is an assumption)
                if (c==' ')
                    o += "&nbsp;";
                else
                    o += "&nbsp;&nbsp;&nbsp;&nbsp;";
                continue;
            }
            bIndentation = FALSE;

            if (c=='<')
                o += "&lt;";
            else if (c=='>')
                o += "&gt;";
            else if (c=='&')
                o += "&amp;";
            else
                o += c;
        }

        return o;
    }
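For readers without MFC at hand, the escaping step can be sketched with the standard library alone. It is a minimal illustration of the same idea, not a replacement for the renderer: `EscapeHtml` is a hypothetical name, and rendering a tab as four non-breaking spaces is an assumption.

```cpp
#include <cassert>
#include <string>

// Makes a source line safe to place inside a table cell:
// markup characters become entities, and leading whitespace becomes
// non-breaking spaces so that indentation survives html rendering.
std::string EscapeHtml(const std::string& line) {
    if (line.empty())
        return "&nbsp;";  // keeps an empty cell from collapsing

    std::string out;
    bool inIndent = true;
    for (char c : line) {
        if (inIndent && (c == ' ' || c == '\t')) {
            // Tab width of 4 is an assumption.
            out += (c == ' ') ? "&nbsp;" : "&nbsp;&nbsp;&nbsp;&nbsp;";
            continue;
        }
        inIndent = false;
        switch (c) {
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            case '&': out += "&amp;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Usage example: an indented Xml tag is shown as text instead of being parsed.
void DemoEscape() {
    assert(EscapeHtml(" <a>") == "&nbsp;&lt;a&gt;");
}
```

Only leading whitespace is converted. Spaces inside the line are left alone so that long lines can still wrap inside their half of the table.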
4.3 wrap up
Finally, here is how to use the API :

    CString szFile1 = "...";
    CString szFile2 = "...";
    CString szOutfile = "..."; // .html file

    CFileOptions o;
    if (!bCaseOption)
        o.SetOption( CString("case"), CString("no") );
    if (!bIndentOption)
        o.SetOption( CString("indent"), CString("no") );

    CFilePartition f1;
    f1.PreProcess( szFile1, o ); // precalculate tokens

    CFilePartition f2;
    f2.PreProcess( szFile2, o ); // precalculate tokens

    CFilePartition f1_bis, f2_bis;
    CDiffEngine d;
    d.Diff(f1,f2,f1_bis,f2_bis); // actual diff

    d.ExportAsHtml(szOutfile, d.Serialize(f1_bis, f2_bis)); // wrap up
5. Update history
16 Feb - initial release
23 Feb - added :
Stéphane Rodriguez - Aug 2005.