Muhammad Umair Tariq Chohan
2025-10-02
2025-10-02
2020
https://escholar.umt.edu.pk/handle/123456789/7763
Image captioning models usually follow an encoder-decoder design in which image feature vectors serve as input to the encoder. Some algorithms use feature vectors extracted from the region proposals produced by an object detector. This study uses the Object Relation Transformer, which extends this approach by explicitly incorporating information about the spatial relationships between the detected input objects through geometric attention. Qualitative and quantitative results show the importance of such geometric attention for image captioning, leading to improvements on all common captioning metrics on the MS-COCO dataset.
en
Changing of objects into words using image captioning
Thesis