Friday, December 09, 2005

.NET deep clone - IsDirty check using IsClone<T>

Those of you who read my colleague Anders Norås' blog may have tried the serialization-based deep clone technique he describes there (see also the article in MSDN Magazine). It is really useful, e.g. when implementing undo, transactions and other mechanisms that need to make copies of objects.
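For readers who haven't seen the technique, here is a minimal sketch of a serialization-based deep clone. It is not Anders' code. BinaryFormatter is obsolete in current .NET, so this sketch assumes DataContractSerializer instead, and the names `CloneUtil`, `DeepClone` and `Order` are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical helper: deep clone by serializing an object graph to memory and reading it back.
// Assumption: DataContractSerializer stands in for the BinaryFormatter used in the original technique.
public static class CloneUtil
{
    public static T DeepClone<T>(T source)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, source);
            stream.Position = 0; // rewind before reading the copy back
            return (T)serializer.ReadObject(stream);
        }
    }
}

// Hypothetical entity used to show that nested objects are copied, not shared.
[DataContract]
public class Order
{
    [DataMember] public string Customer { get; set; }
    [DataMember] public List<string> Lines { get; set; } = new List<string>();
}
```

Every type in the graph must be serializable by the chosen serializer: with BinaryFormatter that meant `[Serializable]`, here it means `[DataContract]`/`[DataMember]`. Note that, unlike BinaryFormatter, DataContractSerializer does not preserve shared references or cycles by default.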

[UPDATE] See this post for a faster IsDirty check and a more reliable IsClone method: IsClone using custom formatter.

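Another useful application of the clone method is implementing an IsDirty property in your business entity objects, or in other areas of your application: keep a clone of the object as it was loaded and compare the current object against it. I have implemented an IsClone method that checks whether two objects are identical, i.e. whether their serialized forms are the same: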

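using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;
using System.Security.Cryptography;

// Note: BinaryFormatter serialization is obsolete as of .NET 5 and no longer supported in .NET 9.
public static bool IsClone<T>(T sourceA, T sourceB)
{
    IFormatter formatter = new BinaryFormatter();

    // Dispose the streams and the hash algorithm when done.
    using (MemoryStream streamA = new MemoryStream())
    using (MemoryStream streamB = new MemoryStream())
    using (MD5 md5 = MD5.Create())
    {
        formatter.Serialize(streamA, sourceA);
        formatter.Serialize(streamB, sourceB);

        // Different serialized lengths: the objects cannot be equal.
        if (streamA.Length != streamB.Length) return false;

        // Rewind both streams so the hash covers everything that was written.
        streamA.Seek(0, SeekOrigin.Begin);
        streamB.Seek(0, SeekOrigin.Begin);

        // ComputeHash resets the algorithm after each call, so one instance can hash both streams.
        byte[] hashA = md5.ComputeHash(streamA);
        byte[] hashB = md5.ComputeHash(streamB);

        // An MD5 hash is always 16 bytes.
        for (int i = 0; i < hashA.Length; i++)
        {
            if (hashA[i] != hashB[i]) return false;
        }
    }

    // If here, the objects have the same hash, i.e. they are equal (barring an MD5 collision).
    return true;
}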


The "source" objects must, of course, be deep serializable: every type in the object graph has to be marked [Serializable]. The method uses hashing to achieve good performance; the MD5 comparison is based on code by Corrado Cavalli for fast comparison of byte arrays.
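To show how such a comparison can drive an IsDirty property, here is a minimal sketch under a few assumptions: the base class `EntityBase`, its `AcceptChanges` method and the `Customer` entity are hypothetical, and DataContractSerializer stands in for BinaryFormatter. Rather than keeping a full clone and serializing both objects on every check, it keeps the serialized bytes of the last accepted state and compares the current serialization against them byte by byte.

```csharp
using System.IO;
using System.Runtime.Serialization;

// Hypothetical base class: IsDirty compares the current state with a stored snapshot.
// Assumption: DataContractSerializer replaces BinaryFormatter.
[DataContract]
public abstract class EntityBase
{
    // Serialized form of the last accepted state; not a [DataMember], so never serialized itself.
    private byte[] snapshot;

    private byte[] Serialize()
    {
        var serializer = new DataContractSerializer(GetType());
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, this);
            return stream.ToArray();
        }
    }

    // Call after loading or saving: the current state becomes the clean baseline.
    public void AcceptChanges()
    {
        snapshot = Serialize();
    }

    public bool IsDirty
    {
        get
        {
            if (snapshot == null) return true; // never accepted: treat as new, hence dirty
            byte[] current = Serialize();
            if (current.Length != snapshot.Length) return true;
            for (int i = 0; i < current.Length; i++)
            {
                if (current[i] != snapshot[i]) return true;
            }
            return false;
        }
    }
}

// Hypothetical business entity.
[DataContract]
public class Customer : EntityBase
{
    [DataMember] public string Name { get; set; }
    [DataMember] public decimal CreditLimit { get; set; }
}
```

A side effect of comparing state rather than tracking setter calls: changing a property and then changing it back leaves the entity clean.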

2 comments:

Kjell-Sverre Jerijærvi said...

Note that there is a small quirk with respect to the data type 'decimal', as the binary representation of the same value, e.g. 1234567.89, can vary. Use the decimal.GetBits method to see the actual bytes.
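A minimal sketch of how to inspect this, using hypothetical helper names. decimal.GetBits returns four 32-bit words; a decimal stores its scale (the number of decimal places, trailing zeros included) in the last word, which is why two values that compare equal can still differ in their bytes.

```csharp
using System.Globalization;

// Illustrative helpers (hypothetical names) for inspecting a decimal's binary form.
public static class DecimalBits
{
    // decimal.GetBits returns four ints: words 0-2 hold the 96-bit integer value,
    // word 3 holds the scale in bits 16-23 and the sign in bit 31.
    public static int Scale(decimal value)
    {
        return (decimal.GetBits(value)[3] >> 16) & 0xFF;
    }

    // Formats the value (invariant culture) followed by its four raw words.
    public static string Describe(decimal value)
    {
        int[] bits = decimal.GetBits(value);
        return string.Format(CultureInfo.InvariantCulture, "{0}: [{1}, {2}, {3}, {4}]",
            value, bits[0], bits[1], bits[2], bits[3]);
    }
}
```

For example, `1234567.89m == 1234567.890m` is true, yet `Describe(1234567.89m)` gives `1234567.89: [123456789, 0, 0, 131072]` (scale 2) while `Describe(1234567.890m)` gives `1234567.890: [1234567890, 0, 0, 196608]` (scale 3).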

WinForms data binding of decimal values can modify the bytes (perhaps it goes through TryParse?). This causes the IsClone method to report an unchanged object as logically modified.
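Whether WinForms binding really goes through TryParse is only a guess; this sketch merely shows that a format/parse round trip, of the kind a bound text box could plausibly perform, is enough to change the bytes. `RoundTrip`, `ThroughText` and `SameBits` are hypothetical names, and this is a simulation, not WinForms code.

```csharp
using System;
using System.Globalization;

// Hypothetical simulation of a display/edit round trip; not actual WinForms code.
public static class RoundTrip
{
    // Format the value for display, then parse the text back, as a bound text box might.
    public static decimal ThroughText(decimal value, string format)
    {
        string text = value.ToString(format, CultureInfo.InvariantCulture);
        decimal parsed;
        if (!decimal.TryParse(text, NumberStyles.Number, CultureInfo.InvariantCulture, out parsed))
            throw new FormatException("Could not parse " + text);
        return parsed;
    }

    // True when two decimals have identical binary representations, scale included.
    public static bool SameBits(decimal a, decimal b)
    {
        int[] x = decimal.GetBits(a);
        int[] y = decimal.GetBits(b);
        for (int i = 0; i < 4; i++)
            if (x[i] != y[i]) return false;
        return true;
    }
}
```

Formatting 1234567.89m with "F3" yields the text "1234567.890"; parsing that back gives a value that is `==` to the original but has scale 3, so any comparison of serialized bytes sees a change.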

Anonymous said...

It could be better to just scan the streams and compare the bytes. MD5 has to read every byte anyway and adds a lot of computation on top, so a plain byte-by-byte comparison is probably faster.

using System;
using System.IO;

MemoryStream ms1 = new MemoryStream();
MemoryStream ms2 = new MemoryStream();

// Disposing each StreamWriter flushes the text into its MemoryStream (and closes it).
using (StreamWriter sw1 = new StreamWriter(ms1))
{
    sw1.Write("The quick brown fox jumps over lazy dog");
}
using (StreamWriter sw2 = new StreamWriter(ms2))
{
    sw2.Write("The quick brown fox jumps over lazy dog");
}

// ms1.Equals(ms2) would only compare references, so compare the contents instead.
// ToArray() returns just the written bytes and works on a closed MemoryStream;
// GetBuffer() would return the whole internal buffer, unused capacity included.
byte[] ms1Bytes = ms1.ToArray();
byte[] ms2Bytes = ms2.ToArray();

bool equals = (ms1Bytes.Length == ms2Bytes.Length);
for (int i = 0, length = ms1Bytes.Length; i < length && equals; i++)
    equals = (ms1Bytes[i] == ms2Bytes[i]);

Console.WriteLine("scan bytes " + equals);


//Nuri
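Folding the byte-scan suggestion back into IsClone<T> gives a variant like the following. It is a minimal sketch, not the original method: `CloneCompare` and `SameContent` are hypothetical names, and DataContractSerializer stands in for BinaryFormatter, which is obsolete in current .NET.

```csharp
using System.IO;
using System.Runtime.Serialization;

// Hypothetical variant of IsClone<T> that compares serialized bytes directly instead of MD5 hashes.
// Assumption: DataContractSerializer replaces BinaryFormatter.
public static class CloneCompare
{
    public static bool IsClone<T>(T sourceA, T sourceB)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var streamA = new MemoryStream())
        using (var streamB = new MemoryStream())
        {
            serializer.WriteObject(streamA, sourceA);
            serializer.WriteObject(streamB, sourceB);
            return SameContent(streamA, streamB);
        }
    }

    // Compares only the bytes actually written (Length), not the whole GetBuffer() capacity,
    // and stops at the first difference.
    public static bool SameContent(MemoryStream a, MemoryStream b)
    {
        if (a.Length != b.Length) return false;
        byte[] bufferA = a.GetBuffer();
        byte[] bufferB = b.GetBuffer();
        int length = (int)a.Length;
        for (int i = 0; i < length; i++)
        {
            if (bufferA[i] != bufferB[i]) return false;
        }
        return true;
    }
}
```

GetBuffer avoids the copy that ToArray makes, but it only works on streams created with the parameterless constructor (or with publiclyVisible set); otherwise it throws UnauthorizedAccessException.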