ISSN: 2755-6190 | Open Access

Open Access Journal of Artificial Intelligence and Technology

Volume 2, Issue 1

Deep Learning for Amharic Image Captioning: Enhancing Ethiopian Cultural Heritage Accessibility with Optimized Models

Simachew Alamneh* and Mohammed Abebe

ABSTRACT
Visual comprehension in Artificial Intelligence (AI) enables machines to describe images in natural language. However, image captioning research has largely overlooked low-resource languages such as Amharic because of limited domain-specific datasets and linguistic complexity. To address these gaps, this study developed a deep learning-based Amharic image captioning model focused on Ethiopian cultural heritage to promote cultural preservation. A dataset of 4,258 cultural heritage images, each annotated with five expert-verified Amharic captions (21,290 captions in total), was compiled from reputable sources, including UNESCO’s World Heritage List, Awaze Tours, and Visit Ethiopia. Text preprocessing handled Amharic’s morphological complexity through tokenization, stop-word removal, character normalization, and abbreviation expansion. The dataset was divided into training, validation, and test sets using a 70:15:15 split for balanced model evaluation. The proposed model employs a pre-trained ResNet50 encoder with a GRU decoder as the baseline architecture. Its performance was compared with attention-based and Transformer-based variants and evaluated with BLEU, ROUGE, METEOR, and CIDEr metrics. Among the evaluated models, the ResNet50–GRU baseline with beam search (beam width = 3) achieved the best overall balance between accuracy and efficiency (BLEU-1: 0.5096, BLEU-4: 0.1196, METEOR: 0.2093, CIDEr: 0.2692). Although the Transformer decoder generated richer captions, its higher computational cost makes the baseline model more suitable for mobile and resource-limited applications. This research demonstrates the potential of deep learning for Amharic image captioning and emphasizes the importance of high-quality datasets and efficient architectures for low-resource languages and cultural heritage preservation.
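
To illustrate the decoding step mentioned in the abstract, the following is a minimal sketch of beam search with a beam width of 3. It is not the authors' implementation. The names `beam_search` and `toy_next_log_probs`, the `<start>` and `<end>` tokens, the English placeholder vocabulary, and all probabilities are hypothetical. The toy lookup table stands in for the ResNet50–GRU decoder, which would instead return next-word probabilities conditioned on the image features. Length normalization, which practical systems often add, is omitted here.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens so far, cumulative log-probability)


def beam_search(
    next_log_probs: Callable[[List[str]], Dict[str, float]],
    start_token: str = "<start>",
    end_token: str = "<end>",
    beam_width: int = 3,
    max_len: int = 20,
) -> List[str]:
    """Return the highest-scoring token sequence found by beam search."""
    beams: List[Hypothesis] = [([start_token], 0.0)]
    finished: List[Hypothesis] = []

    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for token, logp in next_log_probs(tokens).items():
                candidates.append((tokens + [token], score + logp))
        if not candidates:
            break

        # Keep only the beam_width best candidates at this step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end_token:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break

    # Prefer completed captions; fall back to unfinished ones if none ended.
    best_tokens, _ = max(finished or beams, key=lambda c: c[1])
    return [t for t in best_tokens if t not in (start_token, end_token)]


# Hypothetical stand-in for a trained decoder: the next-token distribution
# depends only on the previous token (a real decoder also uses the image).
_TOY_TABLE: Dict[str, Dict[str, float]] = {
    "<start>": {"a": 0.6, "the": 0.4},
    "a": {"church": 0.5, "castle": 0.5},
    "the": {"obelisk": 0.9, "church": 0.1},
    "church": {"<end>": 1.0},
    "castle": {"<end>": 1.0},
    "obelisk": {"<end>": 1.0},
}


def toy_next_log_probs(tokens: List[str]) -> Dict[str, float]:
    """Look up log-probabilities for the token following tokens[-1]."""
    return {tok: math.log(p) for tok, p in _TOY_TABLE[tokens[-1]].items()}


if __name__ == "__main__":
    print(beam_search(toy_next_log_probs, beam_width=3))
```

In this toy table, a greedy decoder (beam width 1) commits to "a" because it is the most probable first word, and ends with a sequence of probability 0.30. With a beam width of 3, the search keeps "the" alive long enough to find "the obelisk", whose overall probability is 0.36. This shows the general trade-off behind the choice of beam width: wider beams explore more candidate captions at a higher computational cost per decoding step.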
