Hello,
TT wrote: No, you did good, the only thing is that there is a transform between image1 and image2 that depends on the camera, so it's a key point to be taken into account.
@TT. I have no knowledge of this domain, but intuitively I see a problem: very small variations in the transformation will produce slight geometric distortions and prevent exact matching. It would therefore be advisable to use standardized geographic signatures as keys rather than the images themselves, so as not to lose precision or degrade the original information. Contours, overlaps, positioning and scaling would then have to be handled as well.
It therefore seems to me that duplicate processing on this type of data cannot, or should not, ignore the specific nature of the data. The precision of the signatures could tend, in particular, towards a "derivative" of the contours.
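To make the idea of a signature used as a key more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the function name footprint_key, the choice of a bounding box (west, south, east, north) as the signature, and the rounding precision. It is not a method taken from this thread.

```python
def footprint_key(west, south, east, north, decimals=4):
    """Hypothetical signature: a bounding box rounded to a fixed precision.

    Two footprints that differ only by a tiny distortion give the same key,
    so the key can be hashed or compared instead of the image itself.
    """
    return tuple(round(value, decimals) for value in (west, south, east, north))


# Two slightly different measurements of the same area
first = footprint_key(2.35221, 48.85661, 2.36001, 48.86002)
second = footprint_key(2.35219, 48.85663, 2.35999, 48.86004)
same_area = first == second
```

Plain rounding has a known weakness: two values that sit on either side of a rounding boundary get different keys even when they are very close, so a real signature would need a tolerance-based comparison on top of it.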
On the other aspect (data duplicates), I revised my lzle pseudo-code (just a backbone) to use only one list. This code would work on split files, which are therefore much smaller and share no duplicates with each other. These files can be obtained by distributing the source data among several output files according to a scale and a calculation on the ASCII codes.
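As an illustration of the distribution step, here is a minimal Python sketch. The rule used (sum of the byte values of the line, modulo the number of output files), the file naming and the names bucket_index and distribute are all assumptions; the post does not say which calculation on the ASCII codes is meant. The point is only that identical lines always land in the same file, so no duplicate can be split across two files.

```python
import os


def bucket_index(line: str, bucket_count: int) -> int:
    """Assumed rule: sum of the byte values of the line, modulo bucket_count."""
    return sum(line.encode("utf-8")) % bucket_count


def distribute(source_path: str, out_dir: str, bucket_count: int = 16) -> list:
    """Split source_path into bucket_count smaller files inside out_dir.

    Identical lines get the same bucket index, so every duplicate of a line
    ends up in the same output file.
    """
    paths = [os.path.join(out_dir, f"bucket_{i:03d}.txt") for i in range(bucket_count)]
    outputs = [open(path, "w", encoding="utf-8") for path in paths]
    try:
        with open(source_path, encoding="utf-8") as source:
            for raw in source:
                line = raw.rstrip("\n")
                outputs[bucket_index(line, bucket_count)].write(line + "\n")
    finally:
        for handle in outputs:
            handle.close()
    return paths
```

A plain sum of character codes spreads lines unevenly (anagrams, for instance, share a bucket), which is harmless for correctness but can make some files larger than others.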
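For each distribution file:
' Pass 1: tag every line by its short hash; mark tags that were already present
While not eof(smallbigfile)
    key=ComputeShortHash(FileLine)
    If MyList.HashTag(key)=1 Then
        MyList.Check(3)
    End If
Wend
MyList.NFRecursive(1)
MyList.NFMethod(-1)
' Drop every node that was not marked as a short hash duplicate
MyList.Root
While MyList.HashStep
    If MyList.Check<>3 Then
        MyList.NodeFlat
    End If
Wend
' Pass 2 (file read again from the start): full hash only for lines whose short hash is still in the list
While not eof(smallbigfile)
    If MyList.HasHashTag(ComputeShortHash(FileLine)) Then
        key=ComputeFullHash(FileLine)
        If MyList.HashTag(key)=1 Then
            MyList.Check(4)
        End If
    End If
Wend
' Drop every node that was not marked as a full hash duplicate
MyList.Root
While MyList.HashStep
    If MyList.Check<>4 Then
        MyList.NodeFlat
    End If
Wend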
' At this point the list contains only full hash duplicates, plus some memory slots left to the garbage collector.
' The list is then switched to multi-value mode and the file is parsed a third time; exact matches on the full hash are stored as values for comparison.
PS: this uses some more advanced features that are not fully tested in this use case. (Edited, forgot some stuff.)
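For readers who do not use lzle, the same three-pass idea can be sketched in Python. This is my reading of the backbone above, not a translation of the lzle calls: plain dictionaries and sets stand in for the list, and the hash choices (a CRC32 truncated to 16 bits as the short hash, SHA-256 as the full hash) and all function names are assumptions.

```python
import hashlib
import zlib
from collections import defaultdict


def short_hash(line: str) -> int:
    """Cheap, deliberately small hash (assumed: CRC32 truncated to 16 bits)."""
    return zlib.crc32(line.encode("utf-8")) & 0xFFFF


def full_hash(line: str) -> str:
    """Strong hash used only on candidate lines (assumed: SHA-256)."""
    return hashlib.sha256(line.encode("utf-8")).hexdigest()


def read_lines(path):
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            yield raw.rstrip("\n")


def find_duplicates(path):
    """Return {line: [line numbers]} for lines that occur more than once."""
    # Pass 1: count short hashes and keep only those seen at least twice.
    short_counts = defaultdict(int)
    for line in read_lines(path):
        short_counts[short_hash(line)] += 1
    candidates = {h for h, count in short_counts.items() if count > 1}

    # Pass 2: compute the full hash only for lines with a candidate short hash.
    full_counts = defaultdict(int)
    for line in read_lines(path):
        if short_hash(line) in candidates:
            full_counts[full_hash(line)] += 1
    repeated = {h for h, count in full_counts.items() if count > 1}

    # Pass 3: store the lines behind each repeated full hash and group them
    # by exact content, so a hash collision cannot produce a false duplicate.
    groups = defaultdict(list)
    for number, line in enumerate(read_lines(path), start=1):
        if short_hash(line) in candidates:
            digest = full_hash(line)
            if digest in repeated:
                groups[(digest, line)].append(number)
    return {line: numbers for (_, line), numbers in groups.items() if len(numbers) > 1}
```

Calling find_duplicates on each distribution file in turn keeps memory use bounded by the size of one small file, which is the purpose of splitting the source first.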