dc.date.accessioned |
2024-01-07T22:38:24Z |
|
dc.date.available |
2024-01-07T22:38:24Z |
|
dc.date.issued |
2024-01-08 |
|
dc.identifier.uri |
https://journal.uob.edu.bh:443/handle/123456789/5309 |
|
dc.description.abstract |
Multilingual image-based text recognition is
a challenging problem with many practical applications.
This work proposes a ViT-YOLO model that combines the
strengths of the Vision Transformer (ViT) and You Only
Look Once (YOLO) techniques to address this challenge.
The model aims to accurately recognize text in images
containing multiple languages. The ViT-YOLO model uses
YOLO to locate text regions in images via patch
extraction. Leveraging its robust image-understanding
capabilities, the ViT model then processes the extracted
patches for text recognition. To enhance the model's
performance and robustness, a Generative Adversarial
Network (GAN) is integrated for data augmentation.
Experimental results demonstrate the superiority of the
ViT-YOLO model over traditional methods and other deep
learning models, achieving an accuracy of 93.49%. These
findings show that the proposed ViT-YOLO model holds
significant promise for multilingual text recognition
and paves the way for future advancements in
multilingual image-based text recognition. |
en_US |
dc.language.iso |
en |
en_US |
dc.publisher |
University of Bahrain
en_US |
dc.subject |
Text recognition, Vision Transformer (ViT), You Only Look Once (YOLO), Generative Adversarial Network (GAN), multilingual text recognition
en_US |
dc.title |
A vision transformer model for multilingual image-based text recognition |
en_US |
dc.identifier.doi |
10.12785/ijcds/xxxxxx |
|
dc.volume |
15 |
en_US |
dc.issue |
1 |
en_US |
dc.pagestart |
1 |
en_US |
dc.pageend |
13 |
en_US |
dc.source.title |
International Journal of Computing and Digital Systems |
en_US |
dc.abbreviatedsourcetitle |
IJCDS |
en_US |