Friday, July 23, 2021

Fakery, some more fakery and even more fakery. The fakir is not the most fake

The words "Faker" and "Fakir" are thought to be semantically related in the English language. While a faker is "one who makes false claims", the term fakir/faqir/faqeer denotes a holy man in the Indian subcontinent. Why these two words are semantically related is something one could speculate about, but this post is not about that. The lack of evidence to support the claim that the great British statesman Sir Winston Leonard Spencer Churchill ever referred to Mahatma Gandhi as a "seditious fakir" should give us enough insight.

Now coming to the main topic of this post: fakery in science is one of the most perplexing things I have encountered. Yet claiming to have invented or discovered something that is not true is surprisingly common. Thanks to the selfless efforts of intelligent people like Jennifer Byrne and Elisabeth Bik, many instances of such fakery have begun to come to light. The paper titled "The Prevalence of Inappropriate Image Duplication in Biomedical Research Publications" is probably the most influential among these efforts. By analyzing a large enough dataset of published papers, the authors try to decipher the major patterns in fakery.

  1. The clearest pattern they demonstrate is the effect of the impact factor (https://journals.asm.org/doi/10.1128/mBio.00809-16#fig6). Journals with lower impact factors seem to have more fakery than those with higher impact factors. The most fakery is seen in journals with impact factors between 2 and 3.
  2. The country of origin also seems to have a strong enough effect (https://journals.asm.org/doi/10.1128/mBio.00809-16#fig7). India and China (including Taiwan) have a greater fraction of papers with problematic images than expected. In the words of Bik et al., "Countries plotted above the blue line had a higher-than-expected proportion of problematic papers; countries plotted below the line had a lower-than-expected ratio." 
  3. Another interesting trend identified by Jennifer Byrne is that such fakery involves "targeting less-well-known human genes to produce low-value and possibly fraudulent papers".

After reading this, I realized that the paper by Sharma et al., 2020 meets all three criteria. It is published in a journal with an impact factor between 2 and 3, all of its authors have affiliations in India, and the paper is about a less-well-known human gene. Hence, this paper may well have employed large-scale image duplication and other forms of fakery to get past the review process. This would make it an ideal candidate to investigate some more fakery.

Not everybody is as gifted as Bik, who has done most of her sleuthing using just her eyes. The use of AI-based methods is therefore expected to relieve the pressure on human volunteers and allow large-scale searches of the scientific literature. Unfortunately, AI is yet to deliver on this promise. The Python package imagededup (https://github.com/idealo/imagededup) makes major strides in this direction. It provides an interface to search for image duplication in a large set of files using algorithms such as perceptual hashing and convolutional neural networks (CNNs). A score is also provided to help prioritize the putative duplicates identified. Further progress in tools such as these could help prevent fraudulent use of image duplication.
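To give a feel for what hashing-based duplicate detection does, here is a toy sketch in plain Python. It is not the algorithm imagededup uses: it is a simplified "average hash" that skips the usual resizing step and works on a tiny grid of grayscale values, and the function names `average_hash` and `hamming` are made up for this illustration. The idea is that each image is reduced to a short bit string, and two images are flagged as similar when only a few bits differ.

```python
def average_hash(pixels):
    """Toy average hash: one bit per pixel, 1 if the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def hamming(hash_a, hash_b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(a != b for a, b in zip(hash_a, hash_b))


# Three hypothetical 2x2 grayscale "images" (values 0-255).
original = [[10, 200], [220, 30]]
brighter = [[20, 210], [230, 40]]   # same pattern, uniformly brighter
inverted = [[200, 10], [30, 220]]   # opposite pattern

print(hamming(average_hash(original), average_hash(brighter)))
print(hamming(average_hash(original), average_hash(inverted)))
```

A small distance means the two images share the same coarse light/dark layout even if the pixel values are not identical, which is why such hashes tolerate brightness changes or recompression.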

Returning to the paper published by Sharma et al., 2020: it reports an astounding 287 supplementary figures. The code given below is used to look for fakery in this paper's supplementary figures.

 wget https://static-content.springer.com/esm/art%3A10.1007%2Fs00251-020-01186-2/MediaObjects/251_2020_1186_MOESM9_ESM.pptx  
 mv 251_2020_1186_MOESM9_ESM.pptx 251_2020_1186_MOESM9_ESM.zip  
 unzip 251_2020_1186_MOESM9_ESM.zip  

The above code downloads the PowerPoint file containing all the supplementary figures, renames it as a zip file, and extracts its contents. After these three steps, all the images present in the PowerPoint file are available in a single folder, ppt/media/. The next step is to install the imagededup package along with all of its prerequisites.
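The same extraction can be sketched in Python with only the standard library, since a .pptx file is an ordinary zip archive. This is a minimal sketch rather than part of the original workflow; the helper name `extract_media` and the output folder name are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def extract_media(pptx_path, out_dir):
    """Copy only the ppt/media/ entries of a .pptx into out_dir; return the sorted file names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    names = []
    with zipfile.ZipFile(pptx_path) as archive:
        for info in archive.infolist():
            # Skip directories and everything outside the media folder.
            if info.is_dir() or not info.filename.startswith("ppt/media/"):
                continue
            name = Path(info.filename).name
            (out / name).write_bytes(archive.read(info))
            names.append(name)
    return sorted(names)


# Example call (assumes the file from the wget step is in the current directory):
# extract_media("251_2020_1186_MOESM9_ESM.pptx", "media_only")
```

Copying only the media entries keeps the slide XML out of the folder that is later scanned for duplicates.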

After everything is installed and working properly, the following code should work. Make sure the correct version of NumPy is installed. Fire up the Python console and set the correct path to the image directory (the folder into which all the images were extracted by the unzip command above).

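 # Find duplicates using the CNN method, along with similarity scores
 from imagededup.methods import CNN
 image_dir = 'ppt/media/'  # folder created by the unzip step above
 cnn_encoder = CNN()
 # scores=True returns (filename, similarity) tuples; pairs scoring below the default threshold of 0.9 are not reported
 duplicates_cnn = cnn_encoder.find_duplicates(image_dir=image_dir, scores=True)
 duplicates_cnn

After the above code executes (you don't need a GPU; it just complains that you lack one), the set of putative duplicate images is identified and printed on the screen along with the scores.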
 {'image1.png': [], 'image10.png': [], 'image100.png': [('image101.png', 0.9183681), ('image83.png', 0.9336556), ('image84.png', 0.94029236), ('image85.png', 0.96624136), ('image86.png', 0.95270556), ('image87.png', 0.91503084), ('image88.png', 0.9674068), ('image89.png', 0.9193216), ('image91.png', 0.95968944), ('image92.png', 0.9491019), ('image93.png', 0.9689957), ('image94.png', 0.9400719), ('image96.png', 0.947449), ('image99.png', 0.9547266)], 'image101.png': [('image100.png', 0.9183681), ('image84.png', 0.92484504), ('image85.png', 0.921789), ('image86.png', 0.90466857), ('image87.png', 0.92616916), ('image88.png', 0.91239095), ('image91.png', 0.92148745), ('image96.png', 0.92620283), ('image97.png', 0.9051003), ('image98.png', 0.91163087), ('image99.png', 0.91708857)], 'image102.png': [('image103.png', 0.9484605)], 'image103.png': [('image102.png', 0.9484605)], 'image104.png': [], 'image105.png': [('image112.png', 0.9010905)], 'image106.png': [], 'image107.png': [('image110.png', 0.90515953)], 'image108.png': [('image110.png', 0.90387726)], 'image109.png': [('image110.png', 0.9003114)], 'image11.png': [('image14.png', 0.9102166), ('image16.png', 0.92004216), ('image20.png', 0.93255997), ('image7.png', 0.9118265), ('image8.png', 0.9021534)], 'image110.png': [('image107.png', 0.90515953), ('image108.png', 0.90387726), ('image109.png', 0.9003114)], 'image111.png': [], 'image112.png': [('image105.png', 0.9010905)], 'image113.jpeg': [('image150.jpeg', 0.9157622)], 'image114.png': [('image171.png', 0.91326845)], 'image115.png': [('image121.png', 0.92301464)], 'image116.png': [], 'image117.png': [('image120.png', 0.90136623)], 'image118.png': [], 'image119.png': [('image120.png', 0.92832935)], 'image12.png': [('image16.png', 0.91794515), ('image7.png', 0.9085101), ('image8.png', 0.9045775)], 'image120.png': [('image117.png', 0.90136623), ('image119.png', 0.92832935), ('image139.png', 0.9278433), ('image140.png', 0.91120446), ('image152.png', 0.9259067), 
('image153.png', 0.90862787)], 'image121.png': [('image115.png', 0.92301464), ('image141.png', 0.9063702)], 'image122.png': [('image125.png', 0.9004039)], 'image123.png': [('image201.png', 0.9129802)], 'image124.png': [], 'image125.png': [('image122.png', 0.9004039), ('image202.png', 0.9160283), ('image203.png', 0.9110071), ('image212.png', 0.91093236)], 'image126.png': [], 'image127.png': [], 'image128.png': [], 'image129.png': [], 'image13.png': [], 'image130.png': [], 'image131.png': [], 'image132.png': [], 'image133.jpeg': [], 'image134.png': [], 'image135.png': [], 'image136.png': [], 'image137.png': [], 'image138.png': [], 'image139.png': [('image120.png', 0.9278433)], 'image14.png': [('image11.png', 0.9102166), ('image16.png', 0.94823205), ('image20.png', 0.9343158), ('image7.png', 0.94182235)], 'image140.png': [('image120.png', 0.91120446)], 'image141.png': [('image121.png', 0.9063702), ('image74.png', 0.91188), ('image76.png', 0.92340136), ('image78.png', 0.90045446)], 'image142.png': [], 'image143.png': [('image144.png', 0.9097591)], 'image144.png': [('image143.png', 0.9097591)], 'image145.png': [], 'image146.png': [], 'image147.png': [('image148.png', 0.91800094)], 'image148.png': [('image147.png', 0.91800094)], 'image149.png': [], 'image15.png': [('image18.png', 0.9242378), ('image26.png', 0.91410726)], 'image150.jpeg': [('image113.jpeg', 0.9157622)], 'image151.png': [], 'image152.png': [('image120.png', 0.9259067), ('image153.png', 0.9593252)], 'image153.png': [('image120.png', 0.90862787), ('image152.png', 0.9593252), ('image157.png', 0.9117504)], 'image154.png': [('image158.png', 0.9086372)], 'image155.png': [], 'image156.png': [('image157.png', 0.9200652)], 'image157.png': [('image153.png', 0.9117504), ('image156.png', 0.9200652)], 'image158.png': [('image154.png', 0.9086372)], 'image159.png': [], 'image16.png': [('image11.png', 0.92004216), ('image12.png', 0.91794515), ('image14.png', 0.94823205), ('image20.png', 0.9272426), ('image7.png', 
0.959764)], 'image160.png': [], 'image161.png': [], 'image162.png': [], 'image163.png': [], 'image164.jpg': [], 'image165.png': [('image199.png', 0.90960884), ('image209.png', 0.9267041)], 'image166.png': [('image194.png', 0.9086143)], 'image167.png': [], 'image168.png': [], 'image169.png': [], 'image17.png': [('image19.png', 0.90715945)], 'image170.png': [], 'image171.png': [('image114.png', 0.91326845)], 'image172.png': [('image173.png', 0.9054203)], 'image173.png': [('image172.png', 0.9054203)], 'image174.png': [], 'image175.png': [('image211.png', 0.9042617)], 'image176.png': [], 'image177.png': [], 'image178.png': [], 'image179.png': [], 'image18.png': [('image15.png', 0.9242378), ('image26.png', 0.9157932), ('image7.png', 0.9010002)], 'image180.png': [], 'image181.png': [], 'image182.png': [], 'image183.png': [], 'image184.png': [], 'image185.png': [('image191.png', 0.92406636), ('image209.png', 0.90291137)], 'image186.png': [('image194.png', 0.90917015)], 'image187.png': [], 'image188.png': [], 'image189.png': [], 'image19.png': [('image17.png', 0.90715945)], 'image190.png': [], 'image191.png': [('image185.png', 0.92406636)], 'image192.png': [], 'image193.png': [], 'image194.png': [('image166.png', 0.9086143), ('image186.png', 0.90917015)], 'image195.png': [], 'image196.png': [], 'image197.png': [], 'image198.png': [], 'image199.png': [('image165.png', 0.90960884)], 'image2.png': [], 'image20.png': [('image11.png', 0.93255997), ('image14.png', 0.9343158), ('image16.png', 0.9272426), ('image7.png', 0.9131277), ('image8.png', 0.9061092)], 'image200.png': [], 'image201.png': [('image123.png', 0.9129802)], 'image202.png': [('image125.png', 0.9160283), ('image203.png', 0.94643486)], 'image203.png': [('image125.png', 0.9110071), ('image202.png', 0.94643486)], 'image204.png': [], 'image205.png': [], 'image206.png': [], 'image207.png': [], 'image208.png': [], 'image209.png': [('image165.png', 0.9267041), ('image185.png', 0.90291137)], 'image21.png': [('image36.png', 
0.9179164)], 'image210.png': [], 'image211.png': [('image175.png', 0.9042617)], 'image212.png': [('image125.png', 0.91093236)], 'image213.png': [], 'image214.png': [], 'image215.jpeg': [], 'image216.png': [], 'image217.png': [('image218.png', 0.91770554)], 'image218.png': [('image217.png', 0.91770554), ('image224.png', 0.9038681)], 'image219.png': [], 'image22.png': [('image35.png', 0.90540993)], 'image220.png': [], 'image221.png': [], 'image222.png': [], 'image223.png': [], 'image224.png': [('image218.png', 0.9038681)], 'image225.png': [], 'image226.png': [], 'image227.png': [], 'image228.png': [], 'image229.png': [], 'image23.png': [('image25.png', 0.9128949), ('image26.png', 0.9407302)], 'image230.jpg': [], 'image231.jpg': [], 'image232.png': [], 'image233.png': [], 'image234.png': [('image235.png', 0.95958734)], 'image235.png': [('image234.png', 0.95958734)], 'image236.png': [], 'image237.png': [('image238.png', 0.9559316), ('image239.png', 0.95943606), ('image257.png', 0.9001226), ('image259.png', 0.91905767)], 'image238.png': [('image237.png', 0.9559316), ('image239.png', 0.93660516), ('image257.png', 0.9084082), ('image259.png', 0.93839127)], 'image239.png': [('image237.png', 0.95943606), ('image238.png', 0.93660516), ('image250.png', 0.9003399), ('image259.png', 0.9138143)], 'image24.png': [], 'image240.png': [], 'image241.png': [('image242.png', 0.9320455), ('image243.png', 0.92742676), ('image250.png', 0.90400875), ('image251.png', 0.90990597)], 'image242.png': [('image241.png', 0.9320455), ('image243.png', 0.9628831)], 'image243.png': [('image241.png', 0.92742676), ('image242.png', 0.9628831)], 'image244.png': [], 'image245.png': [('image246.png', 0.9437679), ('image247.png', 0.93426013), ('image250.png', 0.9031271), ('image253.png', 0.91433334)], 'image246.png': [('image245.png', 0.9437679), ('image247.png', 0.9361171), ('image250.png', 0.9113657)], 'image247.png': [('image245.png', 0.93426013), ('image246.png', 0.9361171), ('image250.png', 
0.90679574)], 'image248.png': [], 'image249.png': [], 'image25.png': [('image23.png', 0.9128949), ('image26.png', 0.92126876)], 'image250.png': [('image239.png', 0.9003399), ('image241.png', 0.90400875), ('image245.png', 0.9031271), ('image246.png', 0.9113657), ('image247.png', 0.90679574), ('image251.png', 0.9163149)], 'image251.png': [('image241.png', 0.90990597), ('image250.png', 0.9163149)], 'image252.png': [], 'image253.png': [('image245.png', 0.91433334)], 'image254.png': [], 'image255.png': [], 'image256.png': [], 'image257.png': [('image237.png', 0.9001226), ('image238.png', 0.9084082), ('image258.png', 0.91473335), ('image259.png', 0.9384123)], 'image258.png': [('image257.png', 0.91473335), ('image259.png', 0.92645323)], 'image259.png': [('image237.png', 0.91905767), ('image238.png', 0.93839127), ('image239.png', 0.9138143), ('image257.png', 0.9384123), ('image258.png', 0.92645323)], 'image26.png': [('image15.png', 0.91410726), ('image18.png', 0.9157932), ('image23.png', 0.9407302), ('image25.png', 0.92126876)], 'image260.png': [], 'image261.png': [], 'image262.png': [('image263.png', 0.92793417)], 'image263.png': [('image262.png', 0.92793417)], 'image264.png': [], 'image265.png': [('image266.png', 0.93153894), ('image267.png', 0.94791853)], 'image266.png': [('image265.png', 0.93153894), ('image267.png', 0.9459948)], 'image267.png': [('image265.png', 0.94791853), ('image266.png', 0.9459948)], 'image268.png': [], 'image269.png': [], 'image27.png': [('image28.png', 0.9027702), ('image40.png', 0.91454256)], 'image270.png': [], 'image271.png': [], 'image272.png': [], 'image273.png': [('image274.png', 0.9425941), ('image275.png', 0.9263323)], 'image274.png': [('image273.png', 0.9425941), ('image275.png', 0.9559473)], 'image275.png': [('image273.png', 0.9263323), ('image274.png', 0.9559473)], 'image276.png': [], 'image277.png': [('image278.png', 0.94348145), ('image279.png', 0.93136877)], 'image278.png': [('image277.png', 0.94348145), ('image279.png', 
0.9180116)], 'image279.png': [('image277.png', 0.93136877), ('image278.png', 0.9180116)], 'image28.png': [('image27.png', 0.9027702), ('image30.png', 0.9038577)], 'image280.png': [], 'image281.png': [], 'image282.png': [], 'image283.png': [], 'image284.png': [], 'image285.png': [], 'image286.png': [], 'image287.jpg': [], 'image29.png': [('image34.png', 0.90683985)], 'image3.png': [], 'image30.png': [('image28.png', 0.9038577), ('image33.png', 0.9480987), ('image39.png', 0.94268227), ('image40.png', 0.93139154)], 'image31.png': [('image40.png', 0.919406)], 'image32.png': [], 'image33.png': [('image30.png', 0.9480987), ('image34.png', 0.9020339), ('image39.png', 0.9246298), ('image40.png', 0.9213065)], 'image34.png': [('image29.png', 0.90683985), ('image33.png', 0.9020339), ('image40.png', 0.92046106)], 'image35.png': [('image22.png', 0.90540993)], 'image36.png': [('image21.png', 0.9179164)], 'image37.png': [], 'image38.png': [], 'image39.png': [('image30.png', 0.94268227), ('image33.png', 0.9246298), ('image40.png', 0.9188498)], 'image4.png': [], 'image40.png': [('image27.png', 0.91454256), ('image30.png', 0.93139154), ('image31.png', 0.919406), ('image33.png', 0.9213065), ('image34.png', 0.92046106), ('image39.png', 0.9188498)], 'image41.png': [], 'image42.png': [], 'image43.png': [], 'image44.png': [], 'image45.png': [('image47.png', 0.90898323)], 'image46.png': [], 'image47.png': [('image45.png', 0.90898323)], 'image48.png': [], 'image49.png': [], 'image5.png': [('image6.png', 0.93074346)], 'image50.png': [], 'image51.png': [], 'image52.jpg': [], 'image53.png': [('image54.png', 0.9346998), ('image55.png', 0.9301629), ('image56.png', 0.90333104), ('image58.png', 0.9169454), ('image59.png', 0.9067757)], 'image54.png': [('image53.png', 0.9346998), ('image55.png', 0.9548632), ('image56.png', 0.9364393), ('image58.png', 0.95981777), ('image59.png', 0.9343269)], 'image55.png': [('image53.png', 0.9301629), ('image54.png', 0.9548632), ('image56.png', 0.9410248), 
('image58.png', 0.94894063), ('image59.png', 0.9302525)], 'image56.png': [('image53.png', 0.90333104), ('image54.png', 0.9364393), ('image55.png', 0.9410248), ('image58.png', 0.918398), ('image59.png', 0.94681996)], 'image57.png': [], 'image58.png': [('image53.png', 0.9169454), ('image54.png', 0.95981777), ('image55.png', 0.94894063), ('image56.png', 0.918398), ('image59.png', 0.92951506)], 'image59.png': [('image53.png', 0.9067757), ('image54.png', 0.9343269), ('image55.png', 0.9302525), ('image56.png', 0.94681996), ('image58.png', 0.92951506)], 'image6.png': [('image5.png', 0.93074346)], 'image60.png': [], 'image61.png': [], 'image62.png': [('image65.png', 0.92089856), ('image69.png', 0.90719336), ('image71.png', 0.91296417)], 'image63.png': [('image66.png', 0.9318356), ('image67.png', 0.9502576), ('image69.png', 0.93360347), ('image70.png', 0.9029176), ('image71.png', 0.9614921)], 'image64.png': [('image65.png', 0.9162164), ('image67.png', 0.9098237), ('image73.png', 0.9010482)], 'image65.png': [('image62.png', 0.92089856), ('image64.png', 0.9162164), ('image67.png', 0.9024062), ('image71.png', 0.92213106), ('image72.png', 0.92419463)], 'image66.png': [('image63.png', 0.9318356), ('image67.png', 0.9185381), ('image69.png', 0.9501172), ('image70.png', 0.90259856), ('image71.png', 0.9121348), ('image73.png', 0.9006342)], 'image67.png': [('image63.png', 0.9502576), ('image64.png', 0.9098237), ('image65.png', 0.9024062), ('image66.png', 0.9185381), ('image71.png', 0.92976534), ('image73.png', 0.9187012)], 'image68.png': [], 'image69.png': [('image62.png', 0.90719336), ('image63.png', 0.93360347), ('image66.png', 0.9501172), ('image70.png', 0.9008339), ('image71.png', 0.9222581)], 'image7.png': [('image11.png', 0.9118265), ('image12.png', 0.9085101), ('image14.png', 0.94182235), ('image16.png', 0.959764), ('image18.png', 0.9010002), ('image20.png', 0.9131277)], 'image70.png': [('image63.png', 0.9029176), ('image66.png', 0.90259856), ('image69.png', 0.9008339), 
('image71.png', 0.9152819)], 'image71.png': [('image62.png', 0.91296417), ('image63.png', 0.9614921), ('image65.png', 0.92213106), ('image66.png', 0.9121348), ('image67.png', 0.92976534), ('image69.png', 0.9222581), ('image70.png', 0.9152819), ('image72.png', 0.9081404)], 'image72.png': [('image65.png', 0.92419463), ('image71.png', 0.9081404)], 'image73.png': [('image64.png', 0.9010482), ('image66.png', 0.9006342), ('image67.png', 0.9187012)], 'image74.png': [('image141.png', 0.91188), ('image76.png', 0.9384106), ('image77.png', 0.90596235), ('image78.png', 0.95519), ('image97.png', 0.91846454)], 'image75.png': [('image79.png', 0.9054158)], 'image76.png': [('image141.png', 0.92340136), ('image74.png', 0.9384106), ('image77.png', 0.93369615), ('image78.png', 0.9442819)], 'image77.png': [('image74.png', 0.90596235), ('image76.png', 0.93369615), ('image78.png', 0.9032801), ('image87.png', 0.90282905), ('image99.png', 0.90046567)], 'image78.png': [('image141.png', 0.90045446), ('image74.png', 0.95519), ('image76.png', 0.9442819), ('image77.png', 0.9032801), ('image97.png', 0.9011791)], 'image79.png': [('image75.png', 0.9054158)], 'image8.png': [('image11.png', 0.9021534), ('image12.png', 0.9045775), ('image20.png', 0.9061092)], 'image80.png': [], 'image81.png': [], 'image82.png': [], 'image83.png': [('image100.png', 0.9336556), ('image84.png', 0.9320098), ('image85.png', 0.9206197), ('image86.png', 0.9754471), ('image87.png', 0.9008194), ('image88.png', 0.9269708), ('image89.png', 0.94526726), ('image90.png', 0.94901687), ('image92.png', 0.948261), ('image93.png', 0.9211924), ('image94.png', 0.9406297), ('image95.png', 0.95429873), ('image97.png', 0.90625954), ('image98.png', 0.9336565), ('image99.png', 0.90938056)], 'image84.png': [('image100.png', 0.94029236), ('image101.png', 0.92484504), ('image83.png', 0.9320098), ('image85.png', 0.9357996), ('image86.png', 0.94560987), ('image87.png', 0.9238595), ('image88.png', 0.9233467), ('image89.png', 0.93499565), 
('image90.png', 0.9193237), ('image91.png', 0.9257249), ('image92.png', 0.92898804), ('image93.png', 0.9164334), ('image94.png', 0.9218082), ('image95.png', 0.9160108), ('image96.png', 0.91751677), ('image97.png', 0.90513825), ('image98.png', 0.92819315), ('image99.png', 0.9185119)], 'image85.png': [('image100.png', 0.96624136), ('image101.png', 0.921789), ('image83.png', 0.9206197), ('image84.png', 0.9357996), ('image86.png', 0.94672084), ('image87.png', 0.93584263), ('image88.png', 0.95474243), ('image89.png', 0.9393886), ('image91.png', 0.9575047), ('image92.png', 0.9566721), ('image93.png', 0.94614804), ('image94.png', 0.94766164), ('image96.png', 0.9661529), ('image98.png', 0.9093032), ('image99.png', 0.9608704)], 'image86.png': [('image100.png', 0.95270556), ('image101.png', 0.90466857), ('image83.png', 0.9754471), ('image84.png', 0.94560987), ('image85.png', 0.94672084), ('image87.png', 0.9302621), ('image88.png', 0.93556523), ('image89.png', 0.9451473), ('image90.png', 0.93358594), ('image91.png', 0.92195714), ('image92.png', 0.9587965), ('image93.png', 0.9311488), ('image94.png', 0.94426256), ('image95.png', 0.9393131), ('image96.png', 0.909492), ('image97.png', 0.9209131), ('image98.png', 0.9426492), ('image99.png', 0.9371631)], 'image87.png': [('image100.png', 0.91503084), ('image101.png', 0.92616916), ('image77.png', 0.90282905), ('image83.png', 0.9008194), ('image84.png', 0.9238595), ('image85.png', 0.93584263), ('image86.png', 0.9302621), ('image88.png', 0.9144976), ('image89.png', 0.9421051), ('image90.png', 0.9068817), ('image91.png', 0.91225964), ('image92.png', 0.92346567), ('image94.png', 0.9416356), ('image95.png', 0.90827584), ('image96.png', 0.9235568), ('image97.png', 0.94440794), ('image98.png', 0.94083), ('image99.png', 0.94340634)], 'image88.png': [('image100.png', 0.9674068), ('image101.png', 0.91239095), ('image83.png', 0.9269708), ('image84.png', 0.9233467), ('image85.png', 0.95474243), ('image86.png', 0.93556523), ('image87.png', 
0.9144976), ('image89.png', 0.93033636), ('image90.png', 0.91487217), ('image91.png', 0.9425154), ('image92.png', 0.9378558), ('image93.png', 0.9692812), ('image94.png', 0.94958806), ('image95.png', 0.90385324), ('image96.png', 0.94041604), ('image98.png', 0.911821), ('image99.png', 0.94753677)], 'image89.png': [('image100.png', 0.9193216), ('image83.png', 0.94526726), ('image84.png', 0.93499565), ('image85.png', 0.9393886), ('image86.png', 0.9451473), ('image87.png', 0.9421051), ('image88.png', 0.93033636), ('image90.png', 0.9433856), ('image92.png', 0.9473567), ('image93.png', 0.9181183), ('image94.png', 0.96665895), ('image95.png', 0.9607937), ('image96.png', 0.90655136), ('image97.png', 0.9302972), ('image98.png', 0.936584), ('image99.png', 0.9141985)], 'image9.png': [], 'image90.png': [('image83.png', 0.94901687), ('image84.png', 0.9193237), ('image86.png', 0.93358594), ('image87.png', 0.9068817), ('image88.png', 0.91487217), ('image89.png', 0.9433856), ('image92.png', 0.9033547), ('image94.png', 0.9209361), ('image95.png', 0.9587287), ('image97.png', 0.932611), ('image98.png', 0.94770867)], 'image91.png': [('image100.png', 0.95968944), ('image101.png', 0.92148745), ('image84.png', 0.9257249), ('image85.png', 0.9575047), ('image86.png', 0.92195714), ('image87.png', 0.91225964), ('image88.png', 0.9425154), ('image92.png', 0.92670405), ('image93.png', 0.94980544), ('image94.png', 0.91421396), ('image96.png', 0.97193605), ('image99.png', 0.9469298)], 'image92.png': [('image100.png', 0.9491019), ('image83.png', 0.948261), ('image84.png', 0.92898804), ('image85.png', 0.9566721), ('image86.png', 0.9587965), ('image87.png', 0.92346567), ('image88.png', 0.9378558), ('image89.png', 0.9473567), ('image90.png', 0.9033547), ('image91.png', 0.92670405), ('image93.png', 0.9380702), ('image94.png', 0.96137464), ('image95.png', 0.91555953), ('image96.png', 0.92193484), ('image98.png', 0.9120532), ('image99.png', 0.9409566)], 'image93.png': [('image100.png', 0.9689957), 
('image83.png', 0.9211924), ('image84.png', 0.9164334), ('image85.png', 0.94614804), ('image86.png', 0.9311488), ('image88.png', 0.9692812), ('image89.png', 0.9181183), ('image91.png', 0.94980544), ('image92.png', 0.9380702), ('image94.png', 0.9362874), ('image96.png', 0.9362149), ('image99.png', 0.930895)], 'image94.png': [('image100.png', 0.9400719), ('image83.png', 0.9406297), ('image84.png', 0.9218082), ('image85.png', 0.94766164), ('image86.png', 0.94426256), ('image87.png', 0.9416356), ('image88.png', 0.94958806), ('image89.png', 0.96665895), ('image90.png', 0.9209361), ('image91.png', 0.91421396), ('image92.png', 0.96137464), ('image93.png', 0.9362874), ('image95.png', 0.9345614), ('image96.png', 0.91518164), ('image97.png', 0.9218683), ('image98.png', 0.926751), ('image99.png', 0.94931686)], 'image95.png': [('image83.png', 0.95429873), ('image84.png', 0.9160108), ('image86.png', 0.9393131), ('image87.png', 0.90827584), ('image88.png', 0.90385324), ('image89.png', 0.9607937), ('image90.png', 0.9587287), ('image92.png', 0.91555953), ('image94.png', 0.9345614), ('image97.png', 0.92592674), ('image98.png', 0.94172406)], 'image96.png': [('image100.png', 0.947449), ('image101.png', 0.92620283), ('image84.png', 0.91751677), ('image85.png', 0.9661529), ('image86.png', 0.909492), ('image87.png', 0.9235568), ('image88.png', 0.94041604), ('image89.png', 0.90655136), ('image91.png', 0.97193605), ('image92.png', 0.92193484), ('image93.png', 0.9362149), ('image94.png', 0.91518164), ('image99.png', 0.93704534)], 'image97.png': [('image101.png', 0.9051003), ('image74.png', 0.91846454), ('image78.png', 0.9011791), ('image83.png', 0.90625954), ('image84.png', 0.90513825), ('image86.png', 0.9209131), ('image87.png', 0.94440794), ('image89.png', 0.9302972), ('image90.png', 0.932611), ('image94.png', 0.9218683), ('image95.png', 0.92592674), ('image98.png', 0.9679376)], 'image98.png': [('image101.png', 0.91163087), ('image83.png', 0.9336565), ('image84.png', 0.92819315), 
('image85.png', 0.9093032), ('image86.png', 0.9426492), ('image87.png', 0.94083), ('image88.png', 0.911821), ('image89.png', 0.936584), ('image90.png', 0.94770867), ('image92.png', 0.9120532), ('image94.png', 0.926751), ('image95.png', 0.94172406), ('image97.png', 0.9679376), ('image99.png', 0.91562355)], 'image99.png': [('image100.png', 0.9547266), ('image101.png', 0.91708857), ('image77.png', 0.90046567), ('image83.png', 0.90938056), ('image84.png', 0.9185119), ('image85.png', 0.9608704), ('image86.png', 0.9371631), ('image87.png', 0.94340634), ('image88.png', 0.94753677), ('image89.png', 0.9141985), ('image91.png', 0.9469298), ('image92.png', 0.9409566), ('image93.png', 0.930895), ('image94.png', 0.94931686), ('image96.png', 0.93704534), ('image98.png', 0.91562355)]}  

This is a surprisingly long list of putative duplicate images identified by the CNN implementation, which has been benchmarked as very good at finding near duplicates. The highest score for duplication is 0.9754471, and it occurs twice because the result is symmetric: image86.png is detected as a putative duplicate of image83.png and vice versa. Now, it looks like we may have a hit for image duplication in this paper!
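Rather than scanning the printed dictionary by eye, the top hits can be pulled out programmatically. The sketch below is one possible way to do it; the helper name `top_pairs` is made up for this post, and `sample` is just a small excerpt of the output above. It collapses each symmetric pair into a single entry and sorts by score.

```python
def top_pairs(duplicates, n=10):
    """Return the n highest-scoring unique pairs as ((file_a, file_b), score) tuples."""
    best = {}
    for name, matches in duplicates.items():
        for other, score in matches:
            # (a, b) and (b, a) describe the same pair, so use a sorted key.
            key = tuple(sorted((name, other)))
            best[key] = max(float(score), best.get(key, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:n]


# Small excerpt of the dictionary printed above.
sample = {
    'image83.png': [('image86.png', 0.9754471), ('image90.png', 0.94901687)],
    'image86.png': [('image83.png', 0.9754471)],
    'image90.png': [('image83.png', 0.94901687)],
    'image1.png': [],
}

for pair, score in top_pairs(sample, n=2):
    print(pair, score)
```

On the full result, calling `top_pairs(duplicates_cnn)` would give a shortlist of pairs to inspect manually, highest score first.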

Manual inspection of this close hit will help resolve the issue. The Python package provides functionality for such inspection.

 # plot duplicates obtained for a given file using the duplicates dictionary  
 from imagededup.utils import plot_duplicates  
 image_to_check='image83.png'  
 plot_duplicates(image_dir=image_dir,duplicate_map=duplicates_cnn,filename=image_to_check)  

The above code opens a matplotlib window that shows the original image at the top center. All of the putative duplicate images are shown below this image, with their scores in brackets. Luckily for these faqeers, the images are not exact duplicates. However, the images are pretty darn close, as they are screenshots of the IGV software depicting the RNAseq data for the same PLGRKT gene in very different tissues and species. These analyses suggest that none of the supplementary figures are exact duplicates.
