A case study for contextualised image captioning using foundation models: journalism enhancement with AI

Large language models (LLMs) and large multimodal models (LMMs) have significantly impacted the AI community, industry, and various economic sectors. In journalism, integrating AI poses unique challenges and opportunities, particularly in enhancing the quality and efficiency of news reporting. This study explores how LLMs and LMMs can assist journalistic practice …

Towards self-improving scene understanding with vision-language knowledge integration

Image captioning has seen immense progress in the last few years. However, general-purpose systems often fail to provide personalised, context-aware captions tailored to individual users or domains. In this work, we investigate the task of personalised and contextualised image captioning by leveraging foundation models, including large language models (LLMs) and …