In malware research, threat hunting and security intelligence exchanging, hashes, such as MD5 or SHA256, take a dominant position. Malware researchers search malware on VirusTotal with hashes, exchange security intelligence with IoC (incident of compromise) that include hashes. However, hashes have some characteristics, such as one-to-one relationship between file and its hash, this limit researchers to do files correlation. Of course that isn’t what hashes was made for. Because of that, some other related “enhanced” hashes have been proposed, such as ssdeep, sdhash, TLSH, and imphash, and they help to learn the similarity of binary files.
All of them is calculated from binary point of view, and there are the other methodologies to learn executable files similarity which are from graph point of view. For example, Zynamics bindiff takes a bigger picture of view of executable to learn the similarity/difference of two executable files. It give researchers very detail information about what similarity in which parts of two executable files, however, it could process two files in the same time.
This research, graph hash, tries to combine the advantages of these two types of methodologies, to calculate the hash of executable files from graph view, and it helps to classify malware with consistent and efficient way.