Tuesday, July 27, 2021

Chop, crop and search with a custom cutoff criterion

We have been investigating the presence of putative image duplicates in the paper by Sharma et al., 2020 using AI-based methods that we evaluated on the established examples from the Bik et al., 2016 paper. Despite an exhaustive search of all the supplementary figures, we have not been able to find any exact duplicates in this paper. However, it is possible that parts of the figures were chopped, cropped and pasted in different combinations. This would be similar to cutting out the lanes of a gel image and pasting them together into a new image. As we saw in the previous post, the imagededup package does not perform well when faced with duplication with repositioning (Category II).

Today, we try to find a simple solution to this problem by chopping up each image into many small pieces and searching them for the presence of putative duplicates. The Linux utility "convert" (part of ImageMagick) is a very powerful tool with many image manipulation abilities. We use the code snippet below to chop each image into five almost equally sized parts along vertical lines.

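 # create the output folder if it does not exist yet
 mkdir -p crop1
 for img in *
 do
   # skip anything that is not a regular file (e.g. the crop1 folder itself)
   [ -f "$img" ] || continue
   echo "$img"
   # 5x1@ splits the image into a grid of 5 columns and 1 row
   convert "$img" -crop 5x1@ +repage +adjoin "${img}_%d.png"
   mv "${img}"_*.png crop1
 done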

Each image now has five parts, with filenames that carry the old image id and the part number. For instance, image98.png is cut into five parts named image98.png_0.png, image98.png_1.png, image98.png_2.png, image98.png_3.png and image98.png_4.png. The for loop in the above code does this chopping for each image and moves the chopped files into the crop1 folder. After the images have been chopped using the crop option of the convert utility, we can use the imagededup package to look for putative duplicates.
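To make the chopping and naming scheme concrete, here is a minimal Python sketch. The helper names strip_bounds and part_names are hypothetical and are not part of convert or imagededup. The way leftover pixel columns are distributed here is an assumption for illustration; ImageMagick may place its cut lines slightly differently.

```python
def strip_bounds(width, parts=5):
    """Return (left, right) pixel columns for `parts` nearly equal vertical strips.

    Assumption: leftover columns are handed to the leftmost strips.
    ImageMagick's own rounding for "5x1@" may differ by a pixel.
    """
    base, extra = divmod(width, parts)
    bounds, left = [], 0
    for index in range(parts):
        right = left + base + (1 if index < extra else 0)
        bounds.append((left, right))
        left = right
    return bounds


def part_names(filename, parts=5):
    """Mimic the "<original name>_<part number>.png" naming used in the post."""
    return [f"{filename}_{index}.png" for index in range(parts)]


if __name__ == "__main__":
    print(strip_bounds(1003))
    print(part_names("image98.png"))
```

The strips always cover the full width without gaps or overlaps, so every pixel column of the original image ends up in exactly one part.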

Approximately 300 images are present in the original dataset obtained from Sharma et al., 2020. After chopping each image into five parts, we have about 1500 images to deal with. Manually parsing the output of imagededup for high similarity scores is laborious and best avoided. The code given below detects putative image duplicates among the files located in the image_dir folder and stores the results in the duplicates_cnn dictionary. The first for loop iterates through the keys of this dictionary. The second for loop iterates through the list of hits stored for each key. Each hit is a tuple with the image file name as the first element and the score as the second. We print the key and the hit whenever the score is greater than the cutoff value defined before the for loop.

 # Find duplicates using CNN along with scores
 from imagededup.methods import CNN
 image_dir = 'crop1'  # assumed here: the folder that holds the chopped images
 cnn_encoder = CNN()
 duplicates_cnn = cnn_encoder.find_duplicates(image_dir=image_dir, scores=True)
 # arbitrary cutoff score
 cutoff = 0.97
 for key in duplicates_cnn:
     for keyvals in duplicates_cnn[key]:
         # keyvals is a (file name, score) tuple
         if keyvals[1] > cutoff:
             print(key, keyvals)

The code above gives us a shorter version of the output, listing only the images that are detected as putative duplicates with very high scores. Even with a high cutoff score of 0.97, we find more than 200 putative duplicates.

 image100.png_4.png ('image93.png_4.png', 0.9714619)  
 image102.png_0.png ('image103.png_0.png', 0.98250186)  
 image103.png_0.png ('image102.png_0.png', 0.98250186)  
 image11.png_0.png ('image12.png_0.png', 0.9850818)  
 image11.png_0.png ('image14.png_0.png', 0.97351915)  
 image11.png_0.png ('image16.png_0.png', 0.9819039)  
 image11.png_0.png ('image20.png_0.png', 0.9708375)  
 image11.png_0.png ('image7.png_0.png', 0.98577213)  
 image114.png_0.png ('image115.png_0.png', 0.9871844)  
 image114.png_0.png ('image116.png_0.png', 0.977131)  
 image115.png_0.png ('image114.png_0.png', 0.9871844)  
 image115.png_0.png ('image116.png_0.png', 0.982214)  
 image115.png_2.png ('image115.png_3.png', 0.97350514)  
 image115.png_3.png ('image115.png_2.png', 0.97350514)  
 image116.png_0.png ('image114.png_0.png', 0.977131)  
 image116.png_0.png ('image115.png_0.png', 0.982214)  
 image118.png_0.png ('image120.png_0.png', 0.988398)  
 image118.png_0.png ('image121.png_0.png', 0.9891088)  
 image12.png_0.png ('image11.png_0.png', 0.9850818)  
 image12.png_0.png ('image14.png_0.png', 0.97201216)  
 image12.png_0.png ('image16.png_0.png', 0.98974717)  
 image12.png_0.png ('image20.png_0.png', 0.9705702)  
 image12.png_0.png ('image7.png_0.png', 0.9938463)  
 image120.png_0.png ('image118.png_0.png', 0.988398)  
 image120.png_0.png ('image121.png_0.png', 0.9912125)  
 image121.png_0.png ('image118.png_0.png', 0.9891088)  
 image121.png_0.png ('image120.png_0.png', 0.9912125)  
 image122.png_0.png ('image123.png_0.png', 0.98453903)  
 image122.png_0.png ('image124.png_0.png', 0.99413013)  
 image123.png_0.png ('image122.png_0.png', 0.98453903)  
 image123.png_0.png ('image124.png_0.png', 0.98252416)  
 image124.png_0.png ('image122.png_0.png', 0.99413013)  
 image124.png_0.png ('image123.png_0.png', 0.98252416)  
 image124.png_2.png ('image124.png_3.png', 0.9753659)  
 image124.png_3.png ('image124.png_2.png', 0.9753659)  
 image13.png_3.png ('image13.png_4.png', 0.99999994)  
 image13.png_4.png ('image13.png_3.png', 0.99999994)  
 image138.png_0.png ('image139.png_0.png', 0.9742407)  
 image139.png_0.png ('image138.png_0.png', 0.9742407)  
 image139.png_0.png ('image141.png_0.png', 0.9892728)  
 image139.png_0.png ('image174.png_0.png', 0.97267145)  
 image14.png_0.png ('image11.png_0.png', 0.97351915)  
 image14.png_0.png ('image12.png_0.png', 0.97201216)  
 image14.png_0.png ('image16.png_0.png', 0.98304456)  
 image14.png_0.png ('image20.png_0.png', 0.9994699)  
 image14.png_0.png ('image7.png_0.png', 0.9742934)  
 image141.png_0.png ('image139.png_0.png', 0.9892728)  
 image141.png_0.png ('image174.png_0.png', 0.9725107)  
 image141.png_1.png ('image141.png_2.png', 0.97022605)  
 image141.png_2.png ('image141.png_1.png', 0.97022605)  
 image142.png_0.png ('image144.png_0.png', 0.986072)  
 image143.png_0.png ('image145.png_0.png', 0.9743204)  
 image144.png_0.png ('image142.png_0.png', 0.986072)  
 image145.png_0.png ('image143.png_0.png', 0.9743204)  
 image15.png_4.png ('image18.png_4.png', 0.9827007)  
 image151.png_0.png ('image154.png_0.png', 0.97797024)  
 image152.png_0.png ('image153.png_0.png', 0.9902647)  
 image153.png_0.png ('image152.png_0.png', 0.9902647)  
 image153.png_0.png ('image154.png_0.png', 0.97013867)  
 image154.png_0.png ('image151.png_0.png', 0.97797024)  
 image154.png_0.png ('image153.png_0.png', 0.97013867)  
 image156.png_0.png ('image158.png_0.png', 0.97740877)  
 image158.png_0.png ('image156.png_0.png', 0.97740877)  
 image159.png_0.png ('image160.png_0.png', 0.9743622)  
 image159.png_0.png ('image161.png_0.png', 0.9805405)  
 image159.png_1.png ('image159.png_3.png', 0.9810934)  
 image159.png_3.png ('image159.png_1.png', 0.9810934)  
 image16.png_0.png ('image11.png_0.png', 0.9819039)  
 image16.png_0.png ('image12.png_0.png', 0.98974717)  
 image16.png_0.png ('image14.png_0.png', 0.98304456)  
 image16.png_0.png ('image20.png_0.png', 0.98187333)  
 image16.png_0.png ('image7.png_0.png', 0.9959326)  
 image160.png_0.png ('image159.png_0.png', 0.9743622)  
 image161.png_0.png ('image159.png_0.png', 0.9805405)  
 image164.jpg_0.png ('image287.jpg_0.png', 0.9828615)  
 image168.png_0.png ('image169.png_0.png', 0.9929953)  
 image169.png_0.png ('image168.png_0.png', 0.9929953)  
 image174.png_0.png ('image139.png_0.png', 0.97267145)  
 image174.png_0.png ('image141.png_0.png', 0.9725107)  
 image178.png_0.png ('image179.png_0.png', 0.9757037)  
 image179.png_0.png ('image178.png_0.png', 0.9757037)  
 image18.png_4.png ('image15.png_4.png', 0.9827007)  
 image182.png_0.png ('image183.png_0.png', 0.98172903)  
 image183.png_0.png ('image182.png_0.png', 0.98172903)  
 image188.png_0.png ('image189.png_0.png', 0.98023623)  
 image189.png_0.png ('image188.png_0.png', 0.98023623)  
 image192.png_0.png ('image194.png_0.png', 0.9978157)  
 image194.png_0.png ('image192.png_0.png', 0.9978157)  
 image20.png_0.png ('image11.png_0.png', 0.9708375)  
 image20.png_0.png ('image12.png_0.png', 0.9705702)  
 image20.png_0.png ('image14.png_0.png', 0.9994699)  
 image20.png_0.png ('image16.png_0.png', 0.98187333)  
 image20.png_0.png ('image7.png_0.png', 0.9730121)  
 image202.png_0.png ('image203.png_0.png', 0.9940014)  
 image203.png_0.png ('image202.png_0.png', 0.9940014)  
 image211.png_0.png ('image212.png_0.png', 0.98358434)  
 image212.png_0.png ('image211.png_0.png', 0.98358434)  
 image212.png_0.png ('image214.png_0.png', 0.9713317)  
 image214.png_0.png ('image212.png_0.png', 0.9713317)  
 image215.jpeg_0.png ('image215.jpeg_1.png', 0.9919382)  
 image215.jpeg_1.png ('image215.jpeg_0.png', 0.9919382)  
 image219.png_2.png ('image219.png_3.png', 0.97034454)  
 image219.png_3.png ('image219.png_2.png', 0.97034454)  
 image220.png_0.png ('image221.png_0.png', 0.9760274)  
 image221.png_0.png ('image220.png_0.png', 0.9760274)  
 image221.png_2.png ('image221.png_3.png', 0.97269195)  
 image221.png_3.png ('image221.png_2.png', 0.97269195)  
 image23.png_0.png ('image25.png_0.png', 0.97705656)  
 image23.png_2.png ('image26.png_2.png', 0.97086954)  
 image234.png_0.png ('image235.png_0.png', 0.99267024)  
 image234.png_1.png ('image234.png_2.png', 0.97507715)  
 image234.png_2.png ('image234.png_1.png', 0.97507715)  
 image235.png_0.png ('image234.png_0.png', 0.99267024)  
 image237.png_0.png ('image238.png_0.png', 0.9760754)  
 image238.png_0.png ('image237.png_0.png', 0.9760754)  
 image238.png_0.png ('image239.png_0.png', 0.9754666)  
 image239.png_0.png ('image238.png_0.png', 0.9754666)  
 image242.png_0.png ('image243.png_0.png', 0.9714698)  
 image243.png_0.png ('image242.png_0.png', 0.9714698)  
 image245.png_0.png ('image246.png_0.png', 0.97148263)  
 image245.png_0.png ('image247.png_0.png', 0.97165793)  
 image246.png_0.png ('image245.png_0.png', 0.97148263)  
 image246.png_0.png ('image247.png_0.png', 0.9809675)  
 image247.png_0.png ('image245.png_0.png', 0.97165793)  
 image247.png_0.png ('image246.png_0.png', 0.9809675)  
 image25.png_0.png ('image23.png_0.png', 0.97705656)  
 image257.png_0.png ('image258.png_0.png', 0.98425305)  
 image257.png_0.png ('image259.png_0.png', 0.9732449)  
 image258.png_0.png ('image257.png_0.png', 0.98425305)  
 image258.png_0.png ('image259.png_0.png', 0.98271394)  
 image259.png_0.png ('image257.png_0.png', 0.9732449)  
 image259.png_0.png ('image258.png_0.png', 0.98271394)  
 image26.png_2.png ('image23.png_2.png', 0.97086954)  
 image262.png_0.png ('image263.png_0.png', 0.9708605)  
 image263.png_0.png ('image262.png_0.png', 0.9708605)  
 image265.png_0.png ('image266.png_0.png', 0.985058)  
 image265.png_0.png ('image267.png_0.png', 0.97712684)  
 image266.png_0.png ('image265.png_0.png', 0.985058)  
 image266.png_0.png ('image267.png_0.png', 0.9758965)  
 image267.png_0.png ('image265.png_0.png', 0.97712684)  
 image267.png_0.png ('image266.png_0.png', 0.9758965)  
 image270.png_1.png ('image270.png_2.png', 0.97145426)  
 image270.png_2.png ('image270.png_1.png', 0.97145426)  
 image271.png_1.png ('image271.png_2.png', 0.97522026)  
 image271.png_2.png ('image271.png_1.png', 0.97522026)  
 image273.png_0.png ('image274.png_0.png', 0.9760238)  
 image274.png_0.png ('image273.png_0.png', 0.9760238)  
 image274.png_0.png ('image275.png_0.png', 0.97969353)  
 image274.png_1.png ('image274.png_3.png', 0.9715067)  
 image274.png_3.png ('image274.png_1.png', 0.9715067)  
 image275.png_0.png ('image274.png_0.png', 0.97969353)  
 image277.png_0.png ('image278.png_0.png', 0.9858266)  
 image277.png_0.png ('image279.png_0.png', 0.98131776)  
 image278.png_0.png ('image277.png_0.png', 0.9858266)  
 image278.png_0.png ('image279.png_0.png', 0.9779868)  
 image279.png_0.png ('image277.png_0.png', 0.98131776)  
 image279.png_0.png ('image278.png_0.png', 0.9779868)  
 image287.jpg_0.png ('image164.jpg_0.png', 0.9828615)  
 image63.png_0.png ('image71.png_0.png', 0.99036366)  
 image7.png_0.png ('image11.png_0.png', 0.98577213)  
 image7.png_0.png ('image12.png_0.png', 0.9938463)  
 image7.png_0.png ('image14.png_0.png', 0.9742934)  
 image7.png_0.png ('image16.png_0.png', 0.9959326)  
 image7.png_0.png ('image20.png_0.png', 0.9730121)  
 image71.png_0.png ('image63.png_0.png', 0.99036366)  
 image83.png_0.png ('image86.png_0.png', 0.9774096)  
 image83.png_0.png ('image92.png_0.png', 0.9701129)  
 image83.png_0.png ('image98.png_0.png', 0.9754302)  
 image83.png_2.png ('image86.png_2.png', 0.97005486)  
 image83.png_3.png ('image86.png_3.png', 0.9776021)  
 image83.png_4.png ('image86.png_4.png', 0.97311985)  
 image85.png_0.png ('image87.png_0.png', 0.9724677)  
 image85.png_0.png ('image93.png_0.png', 0.9737506)  
 image85.png_0.png ('image96.png_0.png', 0.9903543)  
 image85.png_0.png ('image99.png_0.png', 0.98590815)  
 image85.png_2.png ('image96.png_2.png', 0.98394686)  
 image85.png_3.png ('image96.png_3.png', 0.9750851)  
 image86.png_0.png ('image83.png_0.png', 0.9774096)  
 image86.png_0.png ('image92.png_0.png', 0.9725857)  
 image86.png_0.png ('image98.png_0.png', 0.97047156)  
 image86.png_2.png ('image83.png_2.png', 0.97005486)  
 image86.png_3.png ('image83.png_3.png', 0.9776021)  
 image86.png_4.png ('image83.png_4.png', 0.97311985)  
 image87.png_0.png ('image85.png_0.png', 0.9724677)  
 image87.png_0.png ('image89.png_0.png', 0.97378576)  
 image87.png_0.png ('image99.png_0.png', 0.9758906)  
 image89.png_0.png ('image87.png_0.png', 0.97378576)  
 image90.png_3.png ('image98.png_3.png', 0.9723841)  
 image92.png_0.png ('image83.png_0.png', 0.9701129)  
 image92.png_0.png ('image86.png_0.png', 0.9725857)  
 image93.png_0.png ('image85.png_0.png', 0.9737506)  
 image93.png_0.png ('image96.png_0.png', 0.9713357)  
 image93.png_4.png ('image100.png_4.png', 0.9714619)  
 image96.png_0.png ('image85.png_0.png', 0.9903543)  
 image96.png_0.png ('image93.png_0.png', 0.9713357)  
 image96.png_0.png ('image99.png_0.png', 0.9820196)  
 image96.png_2.png ('image85.png_2.png', 0.98394686)  
 image96.png_3.png ('image85.png_3.png', 0.9750851)  
 image98.png_0.png ('image83.png_0.png', 0.9754302)  
 image98.png_0.png ('image86.png_0.png', 0.97047156)  
 image98.png_3.png ('image90.png_3.png', 0.9723841)  
 image99.png_0.png ('image85.png_0.png', 0.98590815)  
 image99.png_0.png ('image87.png_0.png', 0.9758906)  
 image99.png_0.png ('image96.png_0.png', 0.9820196)  
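
In the listing above, a pair tends to show up twice, once in each direction (A against B, then B against A). A minimal sketch of how the list could be collapsed to one line per pair, sorted by score, is given below. The function name unique_pairs is hypothetical, and the sample dictionary contains only a few entries copied from the output above.

```python
def unique_pairs(duplicates, cutoff):
    """Collapse A->B and B->A hits into one entry per pair, best score first."""
    best = {}
    for key, hits in duplicates.items():
        for other, score in hits:
            if score > cutoff:
                # sort the two names so that both directions map to the same pair
                pair = tuple(sorted((key, other)))
                best[pair] = max(score, best.get(pair, score))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# A few entries copied from the output above, in the shape imagededup returns
sample = {
    "image13.png_3.png": [("image13.png_4.png", 0.99999994)],
    "image13.png_4.png": [("image13.png_3.png", 0.99999994)],
    "image14.png_0.png": [("image20.png_0.png", 0.9994699),
                          ("image11.png_0.png", 0.97351915)],
    "image20.png_0.png": [("image14.png_0.png", 0.9994699)],
    "image11.png_0.png": [("image14.png_0.png", 0.97351915)],
}

if __name__ == "__main__":
    for pair, score in unique_pairs(sample, cutoff=0.97):
        print(pair, score)
```

Passing duplicates_cnn instead of sample should roughly halve the number of lines that need manual inspection.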

Using a more stringent cutoff of 0.99 results in a more manageable list:

 image12.png_0.png ('image7.png_0.png', 0.9938463)  
 image120.png_0.png ('image121.png_0.png', 0.9912125)  
 image121.png_0.png ('image120.png_0.png', 0.9912125)  
 image122.png_0.png ('image124.png_0.png', 0.99413013)  
 image124.png_0.png ('image122.png_0.png', 0.99413013)  
 image13.png_3.png ('image13.png_4.png', 0.99999994)  
 image13.png_4.png ('image13.png_3.png', 0.99999994)  
 image14.png_0.png ('image20.png_0.png', 0.9994699)  
 image152.png_0.png ('image153.png_0.png', 0.9902647)  
 image153.png_0.png ('image152.png_0.png', 0.9902647)  
 image16.png_0.png ('image7.png_0.png', 0.9959326)  
 image168.png_0.png ('image169.png_0.png', 0.9929953)  
 image169.png_0.png ('image168.png_0.png', 0.9929953)  
 image192.png_0.png ('image194.png_0.png', 0.9978157)  
 image194.png_0.png ('image192.png_0.png', 0.9978157)  
 image20.png_0.png ('image14.png_0.png', 0.9994699)  
 image202.png_0.png ('image203.png_0.png', 0.9940014)  
 image203.png_0.png ('image202.png_0.png', 0.9940014)  
 image215.jpeg_0.png ('image215.jpeg_1.png', 0.9919382)  
 image215.jpeg_1.png ('image215.jpeg_0.png', 0.9919382)  
 image234.png_0.png ('image235.png_0.png', 0.99267024)  
 image235.png_0.png ('image234.png_0.png', 0.99267024)  
 image63.png_0.png ('image71.png_0.png', 0.99036366)  
 image7.png_0.png ('image12.png_0.png', 0.9938463)  
 image7.png_0.png ('image16.png_0.png', 0.9959326)  
 image71.png_0.png ('image63.png_0.png', 0.99036366)  
 image85.png_0.png ('image96.png_0.png', 0.9903543)  
 image96.png_0.png ('image85.png_0.png', 0.9903543)  

Even this shorter list has 28 hits. Closer manual inspection of these hits highlights the difficulties involved in this chop and search strategy. The extremely promising hit between parts 3 and 4 of image 13 is actually due to both parts being just blank canvas. Similarly, the hit between part 0 of image 14 and part 0 of image 20 is due to the presence of the same human karyotype in both images. However, the original images are distinct enough, as they are screenshots showing the synteny relationship of the same human chromosome with very different species.
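
One way to triage such hits before opening the images is to split each chopped filename back into its source image and part number. Hits between two parts of the same image (like image 13) may point to blank canvas or a repeated layout, while hits at the same part number across images (like part 0 of images 14 and 20) may point to a shared element such as a common axis or karyotype. The sketch below assumes the "<original name>_<part number>.png" naming produced by the chopping step; the function names and category labels are hypothetical.

```python
import re

# "<source file name>_<part number>.png", e.g. "image215.jpeg_1.png"
PART_PATTERN = re.compile(r"^(?P<source>.+)_(?P<part>\d+)\.png$")


def split_name(chopped_name):
    """Return (source image name, part number) for a chopped file name."""
    match = PART_PATTERN.match(chopped_name)
    if match is None:
        raise ValueError(f"unexpected file name: {chopped_name}")
    return match.group("source"), int(match.group("part"))


def classify_hit(first, second):
    """Label a hit by how the two chopped files relate to each other."""
    source_a, part_a = split_name(first)
    source_b, part_b = split_name(second)
    if source_a == source_b:
        return "same image, different parts"
    if part_a == part_b:
        return "different images, same part"
    return "different images, different parts"


if __name__ == "__main__":
    print(classify_hit("image13.png_3.png", "image13.png_4.png"))
    print(classify_hit("image14.png_0.png", "image20.png_0.png"))
```

The labels only sort the hits into groups for inspection; they say nothing about whether a hit is a genuine duplication.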

Self-criticism seems to be trending this month. Many thanks to the brave/crazy Nicholas P. Holmes for sharing his thoughts on this. Even seemingly disastrous events such as retracting a paper have achieved glory. The cost of doing science continues to increase. Whether we will see a long-term change, and how the reward system will evaluate honest science vs exceptional science, is unclear.



Saturday, July 24, 2021

Beautiful, More Beautiful and Most Beautiful. The maxim of Nicolaus Steno

 If you don't know who Nicolaus Steno is, you can look at the video below.


Now that you know who Nicolaus Steno is, you will appreciate his maxim about the importance of understanding and comprehending. In the previous post, we looked at how the paper by Sharma et al., 2020 has many supplementary figures that are detected as putative duplicates of each other. While the imagededup package we used has been benchmarked by its authors, we still need to verify its abilities on a set of images that are known to have been duplicated as part of fakery. Thankfully, the Bik et al., 2016 paper does a very thorough job of grouping the types of fakery into the following classes:

  1. Category I: simple duplications (https://journals.asm.org/doi/10.1128/mBio.00809-16#fig2)
  2. Category II: duplication with repositioning (https://journals.asm.org/doi/10.1128/mBio.00809-16#fig3)
  3. Category III: duplication with alteration (https://journals.asm.org/doi/10.1128/mBio.00809-16#fig4)

It is possible to test the imagededup package on these verified fakeries. However, to do that we need to cut out each of the image parts identified by Bik et al., 2016 and create a dataset on which the code can be executed. This dataset is uploaded on GitHub (https://github.com/Corvus7/Fakery.git) and may serve as a training dataset for future efforts at developing AI-based solutions. The full code used and the results are provided below:
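 # Shell: fetch the dataset into the home directory
 cd
 git clone https://github.com/Corvus7/Fakery.git
 # Python console: Category I (Fig2). Python does not expand "~" by itself.
 import os
 image_dir = os.path.expanduser('~/Fakery/bik2016/Fig2/')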
 from imagededup.methods import CNN   
 cnn_encoder = CNN()   
 duplicates_cnn = cnn_encoder.find_duplicates(image_dir=image_dir, scores=True)   
 duplicates_cnn   
 {'Blue_1.png': [('Blue_2.png', 0.9211361)], 'Blue_2.png': [('Blue_1.png', 0.9211361)], 'Figure_2_cut_out.jpeg': [], 'Green_1.png': [('Green_2.png', 0.95247614)], 'Green_2.png': [('Green_1.png', 0.95247614)], 'Red_1.png': [('Red_2.png', 0.90385926)], 'Red_2.png': [('Red_1.png', 0.90385926)]}  
 import os  # Category II (Fig3)
 image_dir = os.path.expanduser('~/Fakery/bik2016/Fig3/')
 from imagededup.methods import CNN   
 cnn_encoder = CNN()   
 duplicates_cnn = cnn_encoder.find_duplicates(image_dir=image_dir, scores=True)   
 duplicates_cnn   
 {'Blue_1.png': [], 'Blue_2.png': [], 'Figure_3_cut_out.jpeg': [], 'Green_1.png': [], 'Green_2.png': [], 'Red_1.png': [], 'Red_2.png': []}  
 import os  # Category III (Fig4)
 image_dir = os.path.expanduser('~/Fakery/bik2016/Fig4/')
 from imagededup.methods import CNN   
 cnn_encoder = CNN()   
 duplicates_cnn = cnn_encoder.find_duplicates(image_dir=image_dir, scores=True)   
 duplicates_cnn   
 {'Blue_first.png': [('Blue_second.png', 0.97069657)], 'Blue_second.png': [('Blue_first.png', 0.97069657)], 'Full_screenshot.png': [], 'Green_first.png': [('Green_second.png', 0.98090243)], 'Green_second.png': [('Green_first.png', 0.98090243)], 'Lane_10.png': [('Lane_9.png', 0.90004313)], 'Lane_9.png': [('Lane_10.png', 0.90004313)], 'Orange_first.png': [('Orange_second.png', 0.9222448)], 'Orange_second.png': [('Orange_first.png', 0.9222448)], 'Pink_first.png': [('Pink_second.png', 0.9387925)], 'Pink_second.png': [('Pink_first.png', 0.9387925)], 'Purple_first.png': [('Purple_second.png', 0.95602536)], 'Purple_second.png': [('Purple_first.png', 0.95602536)], 'Red_first.png': [('Red_second.png', 0.9667898)], 'Red_second.png': [('Red_first.png', 0.9667898)], 'orange_first.png': [('orange_second.png', 0.91980916)], 'orange_second.png': [('orange_first.png', 0.91980916)]}  
The results are a bit surprising. Category II duplicates are not picked up at all by the CNN method. The correct figures are tagged as duplicates in Category I and Category III. However, the scores range from 0.9 all the way up to 0.98. This suggests that the scores by themselves are not reliable indicators and that manual inspection is definitely required. AI-based methods like imagededup need a lot more sophistication before they can be widely used in anti-fakery approaches. As far as the paper by Sharma et al., 2020 is concerned, the raw data in the form of SAM files is provided in the supplementary materials. Running the samtools split command will separate out each of the sub-components by read group.
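
The spread of scores can be read directly off the result dictionaries. The sketch below uses a hypothetical helper, score_range, on the Category I and Category II results copied from the output above. As far as I recall, find_duplicates in the CNN method also takes a min_similarity_threshold argument with a default of 0.9; if that is right, it would explain why no score below 0.9 is ever reported, but please check this against the imagededup documentation.

```python
def score_range(duplicates):
    """Return (lowest, highest) score in an imagededup result, or None if no hits."""
    scores = [score for hits in duplicates.values() for _, score in hits]
    if not scores:
        return None
    return min(scores), max(scores)


# Results copied from the output above
fig2 = {
    'Blue_1.png': [('Blue_2.png', 0.9211361)],
    'Blue_2.png': [('Blue_1.png', 0.9211361)],
    'Figure_2_cut_out.jpeg': [],
    'Green_1.png': [('Green_2.png', 0.95247614)],
    'Green_2.png': [('Green_1.png', 0.95247614)],
    'Red_1.png': [('Red_2.png', 0.90385926)],
    'Red_2.png': [('Red_1.png', 0.90385926)],
}
fig3 = {
    'Blue_1.png': [], 'Blue_2.png': [], 'Figure_3_cut_out.jpeg': [],
    'Green_1.png': [], 'Green_2.png': [], 'Red_1.png': [], 'Red_2.png': [],
}

if __name__ == "__main__":
    print("Category I:", score_range(fig2))
    print("Category II:", score_range(fig3))
```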


Friday, July 23, 2021

Fakery, some more fakery and even more fakery. The fakir is not the most fake

The words "Faker" and "Fakir" are thought to be semantically related in the English language. While a faker is "One who makes false claims", the term fakir/faqir/faqeer is used to denote a holy man in the Indian subcontinent. Why these two words are semantically related is something one could speculate about. This post is not about this. The lack of evidence to support the claims that the great British statesman Sir Winston Leonard Spencer Churchill ever referred to Mahatma Gandhi as a "seditious fakir" should give us enough insight.

Now coming to the main topic of this post, fakery in science is one of the most perplexing things I have encountered. Yet, claiming to have invented or discovered something which is not true is surprisingly common. Thanks to the selfless efforts of intelligent people like Jennifer Byrne and Elisabeth Bik, many instances of such fakery have begun to come to light. The paper titled "The Prevalence of Inappropriate Image Duplication in Biomedical Research Publications" is probably the most influential among these efforts. By analyzing a large enough dataset of published papers, the authors try to decipher the major patterns in fakery.

  1. The clearest pattern they demonstrate is the effect of the impact factor (https://journals.asm.org/doi/10.1128/mBio.00809-16#fig6). Journals with lower impact factors seem to have more fakery than those with higher impact factors. The most fakery is seen in journals with impact factors between 2 and 3.
  2. The country of origin also seems to have a strong enough effect (https://journals.asm.org/doi/10.1128/mBio.00809-16#fig7). India and China (including Taiwan) have a greater fraction of papers with problematic images than expected. In the words of Bik et al., "Countries plotted above the blue line had a higher-than-expected proportion of problematic papers; countries plotted below the line had a lower-than-expected ratio." 
  3. Another interesting trend identified by Jennifer Byrne is that such fakery involves "targeting less-well-known human genes to produce low-value and possibly fraudulent papers".

After having read this, I realized that the paper by Sharma et al., 2020 meets all three criteria. It is published in a journal with an impact factor between 2 and 3, all authors have affiliations in India and the paper is about a less well-known human gene. Hence, it is very likely that this paper may have employed large-scale image duplication and other forms of fakery to get past the review process. This would make it an ideal candidate to investigate some more fakery.

Not everybody is as gifted as Bik, who has done most of her sleuthing just using her eyes. Therefore, the use of AI-based methods is expected to help relieve the pressure on human volunteers and allow the large-scale search of the scientific literature. Unfortunately, AI is yet to deliver on this promise. The Python package imagededup (https://github.com/idealo/imagededup) makes major strides in this direction. It provides an interface to search for image duplication in a large set of files using algorithms like perceptual hashing and CNN. A score is also provided to help prioritize the putative duplicates identified. Further progress in tools such as these would help prevent fraudulent use of image duplication.
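
To give a feel for how hashing-based duplicate detection works, here is a toy sketch of an average hash compared by Hamming distance. This is not imagededup's implementation: the package works on real image files and its perceptual hash is more elaborate. The tiny 2x2 grayscale grids and the function names here are made up purely for illustration.

```python
def average_hash(pixels):
    """Turn a grid of grayscale values into bits: 1 if brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equally long hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Made-up 2x2 grayscale "images"
original = [[10, 200], [220, 30]]
brighter = [[20, 210], [230, 40]]   # same pattern, uniformly brighter
inverted = [[200, 10], [30, 220]]   # opposite pattern

if __name__ == "__main__":
    print(hamming_distance(average_hash(original), average_hash(brighter)))
    print(hamming_distance(average_hash(original), average_hash(inverted)))
```

A small distance suggests a near duplicate even after a brightness change, which is why hashes of this kind tolerate mild edits but can miss repositioned or heavily altered regions.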

Returning to the paper published by Sharma et al., 2020, an astounding 287 supplementary figures are reported in it. The code given below is used to look for fakery in this paper's supplementary figures.

 wget https://static-content.springer.com/esm/art%3A10.1007%2Fs00251-020-01186-2/MediaObjects/251_2020_1186_MOESM9_ESM.pptx  
 mv 251_2020_1186_MOESM9_ESM.pptx 251_2020_1186_MOESM9_ESM.zip  
 unzip 251_2020_1186_MOESM9_ESM.zip  

The above code downloads the PowerPoint file containing all the supplementary figures, renames it as a zip file and extracts its contents. After these three steps, all the images present in the PowerPoint are available in a single folder under ppt/media/. The next step is installing the imagededup package with all the required prerequisites.
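
The same extraction can be sketched in Python with the standard zipfile module, which opens a .pptx directly because it is a zip archive, so no renaming is needed. The function name extract_media and the output folder name are hypothetical; the sketch assumes the images sit under ppt/media/ as described above.

```python
import zipfile
from pathlib import Path


def extract_media(pptx_path, out_dir):
    """Copy every file under ppt/media/ in the archive into out_dir.

    Returns the sorted list of extracted file names.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(pptx_path) as archive:
        for name in archive.namelist():
            # keep only real files inside ppt/media/, not folder entries
            if name.startswith("ppt/media/") and not name.endswith("/"):
                target = out_dir / Path(name).name
                target.write_bytes(archive.read(name))
                extracted.append(target.name)
    return sorted(extracted)


if __name__ == "__main__":
    # File name taken from the wget command above; "media" is an arbitrary folder
    names = extract_media("251_2020_1186_MOESM9_ESM.pptx", "media")
    print(len(names), "images extracted")
```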

After everything is installed and working properly, the following code should work. Make sure the correct version of NumPy is installed. Fire up the Python console and set image_dir to the correct path of the image directory (the folder into which all the images were extracted by the unzip command above).

 #Find duplicates using CNN along with scores  
 from imagededup.methods import CNN  
 cnn_encoder = CNN()  
 duplicates_cnn = cnn_encoder.find_duplicates(image_dir=image_dir, scores=True)  
 duplicates_cnn  
After the above code executes (you don't need a GPU; it just complains that you lack one), the set of putative duplicate images is identified and printed on the screen with the scores.
 {'image1.png': [], 'image10.png': [], 'image100.png': [('image101.png', 0.9183681), ('image83.png', 0.9336556), ('image84.png', 0.94029236), ('image85.png', 0.96624136), ('image86.png', 0.95270556), ('image87.png', 0.91503084), ('image88.png', 0.9674068), ('image89.png', 0.9193216), ('image91.png', 0.95968944), ('image92.png', 0.9491019), ('image93.png', 0.9689957), ('image94.png', 0.9400719), ('image96.png', 0.947449), ('image99.png', 0.9547266)], 'image101.png': [('image100.png', 0.9183681), ('image84.png', 0.92484504), ('image85.png', 0.921789), ('image86.png', 0.90466857), ('image87.png', 0.92616916), ('image88.png', 0.91239095), ('image91.png', 0.92148745), ('image96.png', 0.92620283), ('image97.png', 0.9051003), ('image98.png', 0.91163087), ('image99.png', 0.91708857)], 'image102.png': [('image103.png', 0.9484605)], 'image103.png': [('image102.png', 0.9484605)], 'image104.png': [], 'image105.png': [('image112.png', 0.9010905)], 'image106.png': [], 'image107.png': [('image110.png', 0.90515953)], 'image108.png': [('image110.png', 0.90387726)], 'image109.png': [('image110.png', 0.9003114)], 'image11.png': [('image14.png', 0.9102166), ('image16.png', 0.92004216), ('image20.png', 0.93255997), ('image7.png', 0.9118265), ('image8.png', 0.9021534)], 'image110.png': [('image107.png', 0.90515953), ('image108.png', 0.90387726), ('image109.png', 0.9003114)], 'image111.png': [], 'image112.png': [('image105.png', 0.9010905)], 'image113.jpeg': [('image150.jpeg', 0.9157622)], 'image114.png': [('image171.png', 0.91326845)], 'image115.png': [('image121.png', 0.92301464)], 'image116.png': [], 'image117.png': [('image120.png', 0.90136623)], 'image118.png': [], 'image119.png': [('image120.png', 0.92832935)], 'image12.png': [('image16.png', 0.91794515), ('image7.png', 0.9085101), ('image8.png', 0.9045775)], 'image120.png': [('image117.png', 0.90136623), ('image119.png', 0.92832935), ('image139.png', 0.9278433), ('image140.png', 0.91120446), ('image152.png', 0.9259067), 
('image153.png', 0.90862787)], 'image121.png': [('image115.png', 0.92301464), ('image141.png', 0.9063702)], 'image122.png': [('image125.png', 0.9004039)], 'image123.png': [('image201.png', 0.9129802)], 'image124.png': [], 'image125.png': [('image122.png', 0.9004039), ('image202.png', 0.9160283), ('image203.png', 0.9110071), ('image212.png', 0.91093236)], 'image126.png': [], 'image127.png': [], 'image128.png': [], 'image129.png': [], 'image13.png': [], 'image130.png': [], 'image131.png': [], 'image132.png': [], 'image133.jpeg': [], 'image134.png': [], 'image135.png': [], 'image136.png': [], 'image137.png': [], 'image138.png': [], 'image139.png': [('image120.png', 0.9278433)], 'image14.png': [('image11.png', 0.9102166), ('image16.png', 0.94823205), ('image20.png', 0.9343158), ('image7.png', 0.94182235)], 'image140.png': [('image120.png', 0.91120446)], 'image141.png': [('image121.png', 0.9063702), ('image74.png', 0.91188), ('image76.png', 0.92340136), ('image78.png', 0.90045446)], 'image142.png': [], 'image143.png': [('image144.png', 0.9097591)], 'image144.png': [('image143.png', 0.9097591)], 'image145.png': [], 'image146.png': [], 'image147.png': [('image148.png', 0.91800094)], 'image148.png': [('image147.png', 0.91800094)], 'image149.png': [], 'image15.png': [('image18.png', 0.9242378), ('image26.png', 0.91410726)], 'image150.jpeg': [('image113.jpeg', 0.9157622)], 'image151.png': [], 'image152.png': [('image120.png', 0.9259067), ('image153.png', 0.9593252)], 'image153.png': [('image120.png', 0.90862787), ('image152.png', 0.9593252), ('image157.png', 0.9117504)], 'image154.png': [('image158.png', 0.9086372)], 'image155.png': [], 'image156.png': [('image157.png', 0.9200652)], 'image157.png': [('image153.png', 0.9117504), ('image156.png', 0.9200652)], 'image158.png': [('image154.png', 0.9086372)], 'image159.png': [], 'image16.png': [('image11.png', 0.92004216), ('image12.png', 0.91794515), ('image14.png', 0.94823205), ('image20.png', 0.9272426), ('image7.png', 
0.959764)], 'image160.png': [], 'image161.png': [], 'image162.png': [], 'image163.png': [], 'image164.jpg': [], 'image165.png': [('image199.png', 0.90960884), ('image209.png', 0.9267041)], 'image166.png': [('image194.png', 0.9086143)], 'image167.png': [], 'image168.png': [], 'image169.png': [], 'image17.png': [('image19.png', 0.90715945)], 'image170.png': [], 'image171.png': [('image114.png', 0.91326845)], 'image172.png': [('image173.png', 0.9054203)], 'image173.png': [('image172.png', 0.9054203)], 'image174.png': [], 'image175.png': [('image211.png', 0.9042617)], 'image176.png': [], 'image177.png': [], 'image178.png': [], 'image179.png': [], 'image18.png': [('image15.png', 0.9242378), ('image26.png', 0.9157932), ('image7.png', 0.9010002)], 'image180.png': [], 'image181.png': [], 'image182.png': [], 'image183.png': [], 'image184.png': [], 'image185.png': [('image191.png', 0.92406636), ('image209.png', 0.90291137)], 'image186.png': [('image194.png', 0.90917015)], 'image187.png': [], 'image188.png': [], 'image189.png': [], 'image19.png': [('image17.png', 0.90715945)], 'image190.png': [], 'image191.png': [('image185.png', 0.92406636)], 'image192.png': [], 'image193.png': [], 'image194.png': [('image166.png', 0.9086143), ('image186.png', 0.90917015)], 'image195.png': [], 'image196.png': [], 'image197.png': [], 'image198.png': [], 'image199.png': [('image165.png', 0.90960884)], 'image2.png': [], 'image20.png': [('image11.png', 0.93255997), ('image14.png', 0.9343158), ('image16.png', 0.9272426), ('image7.png', 0.9131277), ('image8.png', 0.9061092)], 'image200.png': [], 'image201.png': [('image123.png', 0.9129802)], 'image202.png': [('image125.png', 0.9160283), ('image203.png', 0.94643486)], 'image203.png': [('image125.png', 0.9110071), ('image202.png', 0.94643486)], 'image204.png': [], 'image205.png': [], 'image206.png': [], 'image207.png': [], 'image208.png': [], 'image209.png': [('image165.png', 0.9267041), ('image185.png', 0.90291137)], 'image21.png': [('image36.png', 
0.9179164)], 'image210.png': [], 'image211.png': [('image175.png', 0.9042617)], 'image212.png': [('image125.png', 0.91093236)], 'image213.png': [], 'image214.png': [], 'image215.jpeg': [], 'image216.png': [], 'image217.png': [('image218.png', 0.91770554)], 'image218.png': [('image217.png', 0.91770554), ('image224.png', 0.9038681)], 'image219.png': [], 'image22.png': [('image35.png', 0.90540993)], 'image220.png': [], 'image221.png': [], 'image222.png': [], 'image223.png': [], 'image224.png': [('image218.png', 0.9038681)], 'image225.png': [], 'image226.png': [], 'image227.png': [], 'image228.png': [], 'image229.png': [], 'image23.png': [('image25.png', 0.9128949), ('image26.png', 0.9407302)], 'image230.jpg': [], 'image231.jpg': [], 'image232.png': [], 'image233.png': [], 'image234.png': [('image235.png', 0.95958734)], 'image235.png': [('image234.png', 0.95958734)], 'image236.png': [], 'image237.png': [('image238.png', 0.9559316), ('image239.png', 0.95943606), ('image257.png', 0.9001226), ('image259.png', 0.91905767)], 'image238.png': [('image237.png', 0.9559316), ('image239.png', 0.93660516), ('image257.png', 0.9084082), ('image259.png', 0.93839127)], 'image239.png': [('image237.png', 0.95943606), ('image238.png', 0.93660516), ('image250.png', 0.9003399), ('image259.png', 0.9138143)], 'image24.png': [], 'image240.png': [], 'image241.png': [('image242.png', 0.9320455), ('image243.png', 0.92742676), ('image250.png', 0.90400875), ('image251.png', 0.90990597)], 'image242.png': [('image241.png', 0.9320455), ('image243.png', 0.9628831)], 'image243.png': [('image241.png', 0.92742676), ('image242.png', 0.9628831)], 'image244.png': [], 'image245.png': [('image246.png', 0.9437679), ('image247.png', 0.93426013), ('image250.png', 0.9031271), ('image253.png', 0.91433334)], 'image246.png': [('image245.png', 0.9437679), ('image247.png', 0.9361171), ('image250.png', 0.9113657)], 'image247.png': [('image245.png', 0.93426013), ('image246.png', 0.9361171), ('image250.png', 
0.90679574)], 'image248.png': [], 'image249.png': [], 'image25.png': [('image23.png', 0.9128949), ('image26.png', 0.92126876)], 'image250.png': [('image239.png', 0.9003399), ('image241.png', 0.90400875), ('image245.png', 0.9031271), ('image246.png', 0.9113657), ('image247.png', 0.90679574), ('image251.png', 0.9163149)], 'image251.png': [('image241.png', 0.90990597), ('image250.png', 0.9163149)], 'image252.png': [], 'image253.png': [('image245.png', 0.91433334)], 'image254.png': [], 'image255.png': [], 'image256.png': [], 'image257.png': [('image237.png', 0.9001226), ('image238.png', 0.9084082), ('image258.png', 0.91473335), ('image259.png', 0.9384123)], 'image258.png': [('image257.png', 0.91473335), ('image259.png', 0.92645323)], 'image259.png': [('image237.png', 0.91905767), ('image238.png', 0.93839127), ('image239.png', 0.9138143), ('image257.png', 0.9384123), ('image258.png', 0.92645323)], 'image26.png': [('image15.png', 0.91410726), ('image18.png', 0.9157932), ('image23.png', 0.9407302), ('image25.png', 0.92126876)], 'image260.png': [], 'image261.png': [], 'image262.png': [('image263.png', 0.92793417)], 'image263.png': [('image262.png', 0.92793417)], 'image264.png': [], 'image265.png': [('image266.png', 0.93153894), ('image267.png', 0.94791853)], 'image266.png': [('image265.png', 0.93153894), ('image267.png', 0.9459948)], 'image267.png': [('image265.png', 0.94791853), ('image266.png', 0.9459948)], 'image268.png': [], 'image269.png': [], 'image27.png': [('image28.png', 0.9027702), ('image40.png', 0.91454256)], 'image270.png': [], 'image271.png': [], 'image272.png': [], 'image273.png': [('image274.png', 0.9425941), ('image275.png', 0.9263323)], 'image274.png': [('image273.png', 0.9425941), ('image275.png', 0.9559473)], 'image275.png': [('image273.png', 0.9263323), ('image274.png', 0.9559473)], 'image276.png': [], 'image277.png': [('image278.png', 0.94348145), ('image279.png', 0.93136877)], 'image278.png': [('image277.png', 0.94348145), ('image279.png', 
0.9180116)], 'image279.png': [('image277.png', 0.93136877), ('image278.png', 0.9180116)], 'image28.png': [('image27.png', 0.9027702), ('image30.png', 0.9038577)], 'image280.png': [], 'image281.png': [], 'image282.png': [], 'image283.png': [], 'image284.png': [], 'image285.png': [], 'image286.png': [], 'image287.jpg': [], 'image29.png': [('image34.png', 0.90683985)], 'image3.png': [], 'image30.png': [('image28.png', 0.9038577), ('image33.png', 0.9480987), ('image39.png', 0.94268227), ('image40.png', 0.93139154)], 'image31.png': [('image40.png', 0.919406)], 'image32.png': [], 'image33.png': [('image30.png', 0.9480987), ('image34.png', 0.9020339), ('image39.png', 0.9246298), ('image40.png', 0.9213065)], 'image34.png': [('image29.png', 0.90683985), ('image33.png', 0.9020339), ('image40.png', 0.92046106)], 'image35.png': [('image22.png', 0.90540993)], 'image36.png': [('image21.png', 0.9179164)], 'image37.png': [], 'image38.png': [], 'image39.png': [('image30.png', 0.94268227), ('image33.png', 0.9246298), ('image40.png', 0.9188498)], 'image4.png': [], 'image40.png': [('image27.png', 0.91454256), ('image30.png', 0.93139154), ('image31.png', 0.919406), ('image33.png', 0.9213065), ('image34.png', 0.92046106), ('image39.png', 0.9188498)], 'image41.png': [], 'image42.png': [], 'image43.png': [], 'image44.png': [], 'image45.png': [('image47.png', 0.90898323)], 'image46.png': [], 'image47.png': [('image45.png', 0.90898323)], 'image48.png': [], 'image49.png': [], 'image5.png': [('image6.png', 0.93074346)], 'image50.png': [], 'image51.png': [], 'image52.jpg': [], 'image53.png': [('image54.png', 0.9346998), ('image55.png', 0.9301629), ('image56.png', 0.90333104), ('image58.png', 0.9169454), ('image59.png', 0.9067757)], 'image54.png': [('image53.png', 0.9346998), ('image55.png', 0.9548632), ('image56.png', 0.9364393), ('image58.png', 0.95981777), ('image59.png', 0.9343269)], 'image55.png': [('image53.png', 0.9301629), ('image54.png', 0.9548632), ('image56.png', 0.9410248), 
('image58.png', 0.94894063), ('image59.png', 0.9302525)], 'image56.png': [('image53.png', 0.90333104), ('image54.png', 0.9364393), ('image55.png', 0.9410248), ('image58.png', 0.918398), ('image59.png', 0.94681996)], 'image57.png': [], 'image58.png': [('image53.png', 0.9169454), ('image54.png', 0.95981777), ('image55.png', 0.94894063), ('image56.png', 0.918398), ('image59.png', 0.92951506)], 'image59.png': [('image53.png', 0.9067757), ('image54.png', 0.9343269), ('image55.png', 0.9302525), ('image56.png', 0.94681996), ('image58.png', 0.92951506)], 'image6.png': [('image5.png', 0.93074346)], 'image60.png': [], 'image61.png': [], 'image62.png': [('image65.png', 0.92089856), ('image69.png', 0.90719336), ('image71.png', 0.91296417)], 'image63.png': [('image66.png', 0.9318356), ('image67.png', 0.9502576), ('image69.png', 0.93360347), ('image70.png', 0.9029176), ('image71.png', 0.9614921)], 'image64.png': [('image65.png', 0.9162164), ('image67.png', 0.9098237), ('image73.png', 0.9010482)], 'image65.png': [('image62.png', 0.92089856), ('image64.png', 0.9162164), ('image67.png', 0.9024062), ('image71.png', 0.92213106), ('image72.png', 0.92419463)], 'image66.png': [('image63.png', 0.9318356), ('image67.png', 0.9185381), ('image69.png', 0.9501172), ('image70.png', 0.90259856), ('image71.png', 0.9121348), ('image73.png', 0.9006342)], 'image67.png': [('image63.png', 0.9502576), ('image64.png', 0.9098237), ('image65.png', 0.9024062), ('image66.png', 0.9185381), ('image71.png', 0.92976534), ('image73.png', 0.9187012)], 'image68.png': [], 'image69.png': [('image62.png', 0.90719336), ('image63.png', 0.93360347), ('image66.png', 0.9501172), ('image70.png', 0.9008339), ('image71.png', 0.9222581)], 'image7.png': [('image11.png', 0.9118265), ('image12.png', 0.9085101), ('image14.png', 0.94182235), ('image16.png', 0.959764), ('image18.png', 0.9010002), ('image20.png', 0.9131277)], 'image70.png': [('image63.png', 0.9029176), ('image66.png', 0.90259856), ('image69.png', 0.9008339), 
('image71.png', 0.9152819)], 'image71.png': [('image62.png', 0.91296417), ('image63.png', 0.9614921), ('image65.png', 0.92213106), ('image66.png', 0.9121348), ('image67.png', 0.92976534), ('image69.png', 0.9222581), ('image70.png', 0.9152819), ('image72.png', 0.9081404)], 'image72.png': [('image65.png', 0.92419463), ('image71.png', 0.9081404)], 'image73.png': [('image64.png', 0.9010482), ('image66.png', 0.9006342), ('image67.png', 0.9187012)], 'image74.png': [('image141.png', 0.91188), ('image76.png', 0.9384106), ('image77.png', 0.90596235), ('image78.png', 0.95519), ('image97.png', 0.91846454)], 'image75.png': [('image79.png', 0.9054158)], 'image76.png': [('image141.png', 0.92340136), ('image74.png', 0.9384106), ('image77.png', 0.93369615), ('image78.png', 0.9442819)], 'image77.png': [('image74.png', 0.90596235), ('image76.png', 0.93369615), ('image78.png', 0.9032801), ('image87.png', 0.90282905), ('image99.png', 0.90046567)], 'image78.png': [('image141.png', 0.90045446), ('image74.png', 0.95519), ('image76.png', 0.9442819), ('image77.png', 0.9032801), ('image97.png', 0.9011791)], 'image79.png': [('image75.png', 0.9054158)], 'image8.png': [('image11.png', 0.9021534), ('image12.png', 0.9045775), ('image20.png', 0.9061092)], 'image80.png': [], 'image81.png': [], 'image82.png': [], 'image83.png': [('image100.png', 0.9336556), ('image84.png', 0.9320098), ('image85.png', 0.9206197), ('image86.png', 0.9754471), ('image87.png', 0.9008194), ('image88.png', 0.9269708), ('image89.png', 0.94526726), ('image90.png', 0.94901687), ('image92.png', 0.948261), ('image93.png', 0.9211924), ('image94.png', 0.9406297), ('image95.png', 0.95429873), ('image97.png', 0.90625954), ('image98.png', 0.9336565), ('image99.png', 0.90938056)], 'image84.png': [('image100.png', 0.94029236), ('image101.png', 0.92484504), ('image83.png', 0.9320098), ('image85.png', 0.9357996), ('image86.png', 0.94560987), ('image87.png', 0.9238595), ('image88.png', 0.9233467), ('image89.png', 0.93499565), 
('image90.png', 0.9193237), ('image91.png', 0.9257249), ('image92.png', 0.92898804), ('image93.png', 0.9164334), ('image94.png', 0.9218082), ('image95.png', 0.9160108), ('image96.png', 0.91751677), ('image97.png', 0.90513825), ('image98.png', 0.92819315), ('image99.png', 0.9185119)], 'image85.png': [('image100.png', 0.96624136), ('image101.png', 0.921789), ('image83.png', 0.9206197), ('image84.png', 0.9357996), ('image86.png', 0.94672084), ('image87.png', 0.93584263), ('image88.png', 0.95474243), ('image89.png', 0.9393886), ('image91.png', 0.9575047), ('image92.png', 0.9566721), ('image93.png', 0.94614804), ('image94.png', 0.94766164), ('image96.png', 0.9661529), ('image98.png', 0.9093032), ('image99.png', 0.9608704)], 'image86.png': [('image100.png', 0.95270556), ('image101.png', 0.90466857), ('image83.png', 0.9754471), ('image84.png', 0.94560987), ('image85.png', 0.94672084), ('image87.png', 0.9302621), ('image88.png', 0.93556523), ('image89.png', 0.9451473), ('image90.png', 0.93358594), ('image91.png', 0.92195714), ('image92.png', 0.9587965), ('image93.png', 0.9311488), ('image94.png', 0.94426256), ('image95.png', 0.9393131), ('image96.png', 0.909492), ('image97.png', 0.9209131), ('image98.png', 0.9426492), ('image99.png', 0.9371631)], 'image87.png': [('image100.png', 0.91503084), ('image101.png', 0.92616916), ('image77.png', 0.90282905), ('image83.png', 0.9008194), ('image84.png', 0.9238595), ('image85.png', 0.93584263), ('image86.png', 0.9302621), ('image88.png', 0.9144976), ('image89.png', 0.9421051), ('image90.png', 0.9068817), ('image91.png', 0.91225964), ('image92.png', 0.92346567), ('image94.png', 0.9416356), ('image95.png', 0.90827584), ('image96.png', 0.9235568), ('image97.png', 0.94440794), ('image98.png', 0.94083), ('image99.png', 0.94340634)], 'image88.png': [('image100.png', 0.9674068), ('image101.png', 0.91239095), ('image83.png', 0.9269708), ('image84.png', 0.9233467), ('image85.png', 0.95474243), ('image86.png', 0.93556523), ('image87.png', 
0.9144976), ('image89.png', 0.93033636), ('image90.png', 0.91487217), ('image91.png', 0.9425154), ('image92.png', 0.9378558), ('image93.png', 0.9692812), ('image94.png', 0.94958806), ('image95.png', 0.90385324), ('image96.png', 0.94041604), ('image98.png', 0.911821), ('image99.png', 0.94753677)], 'image89.png': [('image100.png', 0.9193216), ('image83.png', 0.94526726), ('image84.png', 0.93499565), ('image85.png', 0.9393886), ('image86.png', 0.9451473), ('image87.png', 0.9421051), ('image88.png', 0.93033636), ('image90.png', 0.9433856), ('image92.png', 0.9473567), ('image93.png', 0.9181183), ('image94.png', 0.96665895), ('image95.png', 0.9607937), ('image96.png', 0.90655136), ('image97.png', 0.9302972), ('image98.png', 0.936584), ('image99.png', 0.9141985)], 'image9.png': [], 'image90.png': [('image83.png', 0.94901687), ('image84.png', 0.9193237), ('image86.png', 0.93358594), ('image87.png', 0.9068817), ('image88.png', 0.91487217), ('image89.png', 0.9433856), ('image92.png', 0.9033547), ('image94.png', 0.9209361), ('image95.png', 0.9587287), ('image97.png', 0.932611), ('image98.png', 0.94770867)], 'image91.png': [('image100.png', 0.95968944), ('image101.png', 0.92148745), ('image84.png', 0.9257249), ('image85.png', 0.9575047), ('image86.png', 0.92195714), ('image87.png', 0.91225964), ('image88.png', 0.9425154), ('image92.png', 0.92670405), ('image93.png', 0.94980544), ('image94.png', 0.91421396), ('image96.png', 0.97193605), ('image99.png', 0.9469298)], 'image92.png': [('image100.png', 0.9491019), ('image83.png', 0.948261), ('image84.png', 0.92898804), ('image85.png', 0.9566721), ('image86.png', 0.9587965), ('image87.png', 0.92346567), ('image88.png', 0.9378558), ('image89.png', 0.9473567), ('image90.png', 0.9033547), ('image91.png', 0.92670405), ('image93.png', 0.9380702), ('image94.png', 0.96137464), ('image95.png', 0.91555953), ('image96.png', 0.92193484), ('image98.png', 0.9120532), ('image99.png', 0.9409566)], 'image93.png': [('image100.png', 0.9689957), 
('image83.png', 0.9211924), ('image84.png', 0.9164334), ('image85.png', 0.94614804), ('image86.png', 0.9311488), ('image88.png', 0.9692812), ('image89.png', 0.9181183), ('image91.png', 0.94980544), ('image92.png', 0.9380702), ('image94.png', 0.9362874), ('image96.png', 0.9362149), ('image99.png', 0.930895)], 'image94.png': [('image100.png', 0.9400719), ('image83.png', 0.9406297), ('image84.png', 0.9218082), ('image85.png', 0.94766164), ('image86.png', 0.94426256), ('image87.png', 0.9416356), ('image88.png', 0.94958806), ('image89.png', 0.96665895), ('image90.png', 0.9209361), ('image91.png', 0.91421396), ('image92.png', 0.96137464), ('image93.png', 0.9362874), ('image95.png', 0.9345614), ('image96.png', 0.91518164), ('image97.png', 0.9218683), ('image98.png', 0.926751), ('image99.png', 0.94931686)], 'image95.png': [('image83.png', 0.95429873), ('image84.png', 0.9160108), ('image86.png', 0.9393131), ('image87.png', 0.90827584), ('image88.png', 0.90385324), ('image89.png', 0.9607937), ('image90.png', 0.9587287), ('image92.png', 0.91555953), ('image94.png', 0.9345614), ('image97.png', 0.92592674), ('image98.png', 0.94172406)], 'image96.png': [('image100.png', 0.947449), ('image101.png', 0.92620283), ('image84.png', 0.91751677), ('image85.png', 0.9661529), ('image86.png', 0.909492), ('image87.png', 0.9235568), ('image88.png', 0.94041604), ('image89.png', 0.90655136), ('image91.png', 0.97193605), ('image92.png', 0.92193484), ('image93.png', 0.9362149), ('image94.png', 0.91518164), ('image99.png', 0.93704534)], 'image97.png': [('image101.png', 0.9051003), ('image74.png', 0.91846454), ('image78.png', 0.9011791), ('image83.png', 0.90625954), ('image84.png', 0.90513825), ('image86.png', 0.9209131), ('image87.png', 0.94440794), ('image89.png', 0.9302972), ('image90.png', 0.932611), ('image94.png', 0.9218683), ('image95.png', 0.92592674), ('image98.png', 0.9679376)], 'image98.png': [('image101.png', 0.91163087), ('image83.png', 0.9336565), ('image84.png', 0.92819315), 
('image85.png', 0.9093032), ('image86.png', 0.9426492), ('image87.png', 0.94083), ('image88.png', 0.911821), ('image89.png', 0.936584), ('image90.png', 0.94770867), ('image92.png', 0.9120532), ('image94.png', 0.926751), ('image95.png', 0.94172406), ('image97.png', 0.9679376), ('image99.png', 0.91562355)], 'image99.png': [('image100.png', 0.9547266), ('image101.png', 0.91708857), ('image77.png', 0.90046567), ('image83.png', 0.90938056), ('image84.png', 0.9185119), ('image85.png', 0.9608704), ('image86.png', 0.9371631), ('image87.png', 0.94340634), ('image88.png', 0.94753677), ('image89.png', 0.9141985), ('image91.png', 0.9469298), ('image92.png', 0.9409566), ('image93.png', 0.930895), ('image94.png', 0.94931686), ('image96.png', 0.93704534), ('image98.png', 0.91562355)]}  

This is a surprisingly long list of putative duplicate images identified by the CNN method, which has been benchmarked as very good at finding near duplicates. The highest score for duplication is 0.9754471, and it occurs twice because hits are reported in both directions: image86.png is detected as a putative duplicate of image83.png and vice versa. Now it looks like we may have a hit for image duplication in this paper!
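
Picking the top hit out of such a long dictionary by eye is error-prone, so it may help to rank the pairs programmatically. The snippet below is only a minimal sketch: it assumes the dictionary has the same shape as the output above (a file name mapped to a list of (file name, score) tuples), the small duplicates_cnn literal reproduces just a few entries from that output, and top_pairs is a hypothetical helper name that is not part of imagededup.

```python
# Excerpt of the output above; in practice, use the full duplicates_cnn dictionary.
duplicates_cnn = {
    'image83.png': [('image86.png', 0.9754471), ('image95.png', 0.95429873)],
    'image86.png': [('image83.png', 0.9754471), ('image92.png', 0.9587965)],
    'image91.png': [('image96.png', 0.97193605)],
    'image96.png': [('image91.png', 0.97193605)],
    'image9.png': [],
}

def top_pairs(duplicate_map, n=3):
    """Return the n highest-scoring pairs, counting A-B and B-A only once."""
    best = {}
    for key, hits in duplicate_map.items():
        for name, score in hits:
            pair = tuple(sorted((key, name)))  # same pair regardless of direction
            best[pair] = max(score, best.get(pair, 0.0))
    # Sort the unique pairs by score, highest first, and keep the first n
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:n]

for (first, second), score in top_pairs(duplicates_cnn):
    print(first, second, score)
```

Sorting the two file names before using them as a key is what collapses the reciprocal entries, so each pair is listed once instead of twice.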

Manual inspection of this close hit will help settle the question. The imagededup package provides a function that allows such inspection.

 # plot duplicates obtained for a given file using the duplicates dictionary  
 from imagededup.utils import plot_duplicates  
 image_to_check='image83.png'  
 plot_duplicates(image_dir=image_dir,duplicate_map=duplicates_cnn,filename=image_to_check)  

The above code opens a matplotlib window that shows the original image at the top center. All of the putative duplicate images are shown below this image, with the scores in brackets. Luckily for these faqeers, the images are not exact duplicates. However, the images are pretty darn close, as they are screenshots of the IGV software depicting the RNA-seq data for the same PLGRKT gene in very different tissues and species. Taken together with the earlier search, this suggests that none of the supplementary figures are exact duplicates.
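
Since many of the hits are interlinked, it may be easier to inspect them as groups of related images than as individual pairs. The following is a rough sketch under the same assumption about the dictionary layout; group_hits and the excerpt variable are hypothetical names used only for illustration, and the entries in excerpt are copied from the output above.

```python
# A few entries copied from the output above, including one image with no hits.
excerpt = {
    'image5.png': [('image6.png', 0.93074346)],
    'image6.png': [('image5.png', 0.93074346)],
    'image45.png': [('image47.png', 0.90898323)],
    'image46.png': [],
    'image47.png': [('image45.png', 0.90898323)],
    'image75.png': [('image79.png', 0.9054158)],
    'image79.png': [('image75.png', 0.9054158)],
}

def group_hits(duplicate_map):
    """Group images that are linked, directly or indirectly, by a hit."""
    seen = set()
    groups = []
    for start in duplicate_map:
        # Skip images already grouped and images without any hit
        if start in seen or not duplicate_map[start]:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over linked images
            name = stack.pop()
            if name in group:
                continue
            group.add(name)
            stack.extend(hit for hit, _ in duplicate_map.get(name, []))
        seen |= group
        groups.append(sorted(group))
    return groups

for group in group_hits(excerpt):
    print(group)
```

Each group can then be checked with plot_duplicates by passing one of its file names, rather than going through every key of the dictionary.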