Amir Hossein Kargaran

Profile

I’m a Computer Science Ph.D. student at Munich University, advised by Prof. Hinrich Schütze. I’m also affiliated as a junior member with the Munich Center for Machine Learning.

My current research focuses on Multilingual NLP, specifically on scaling NLP technologies to include more languages:

  • GlotCC: CommonCrawl corpus for more than 1,000 languages.

  • MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment for many languages.

Before starting my Ph.D., I completed my MSc degree in Computer Engineering at Sharif University of Technology in 2022. I earned two BSc degrees in Electrical Engineering and Computer Engineering from Isfahan University of Technology in 2020. Additionally, I was a research intern with the User Interfaces group at Aalto University during the summers of 2021 and 2022. I contributed to the development of the Aalto Interface Metrics (AIM) project, a service and codebase for computational GUI evaluation.

I am currently looking for a summer internship. If you know of any opportunities or would like to discuss potential collaborations, feel free to reach out!

Publications

GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages

Amir Hossein Kargaran, François Yvon, Hinrich Schütze

Neural Information Processing Systems 2024

MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment

Amir Hossein Kargaran, Ali Modarressi, Nafiseh Nikeghbal, Jana Diesner, François Yvon, Hinrich Schütze

arXiv.org 2024

How Transliterations Improve Crosslingual Alignment

Yihong Liu, Mingyang Wang, Amir Hossein Kargaran, Ayyoob Imani, Orgest Xhelili, Haotian Ye, Chunlan Ma, François Yvon, Hinrich Schütze

International Conference on Computational Linguistics 2024

MaskLID: Code-Switching Language Identification through Iterative Masking

Amir Hossein Kargaran, François Yvon, Hinrich Schütze

Annual Meeting of the Association for Computational Linguistics 2024

GIRT-Model: Automated Generation of Issue Report Templates

Nafiseh Nikeghbal, Amir Hossein Kargaran, Abbas Heydarnoori

IEEE Working Conference on Mining Software Repositories 2024

GlotLID: Language Identification for Low-Resource Languages

Amir Hossein Kargaran, Ayyoob Imani, François Yvon, Hinrich Schütze

Conference on Empirical Methods in Natural Language Processing 2023

GlotScript: A Resource and Tool for Low Resource Writing System Identification

Amir Hossein Kargaran, François Yvon, Hinrich Schütze

International Conference on Language Resources and Evaluation 2023

Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages

Ayyoob Imani, Peiqin Lin, Amir Hossein Kargaran, Silvia Severini, Masoud Jalili Sabet, Nora Kassner, Chunlan Ma, Helmut Schmid, André F. T. Martins, François Yvon, Hinrich Schütze

Annual Meeting of the Association for Computational Linguistics 2023

GIRT-Data: Sampling GitHub Issue Report Templates

Nafiseh Nikeghbal, Amir Hossein Kargaran, Abbas Heydarnoori, Hinrich Schütze

IEEE Working Conference on Mining Software Repositories 2023

MenuCraft: Interactive Menu System Design with Large Language Models

Amir Hossein Kargaran, Nafiseh Nikeghbal, Abbas Heydarnoori, Hinrich Schütze

arXiv.org 2023

On Detecting Hidden Third-Party Web Trackers with a Wide Dependency Chain Graph: A Representation Learning Approach

Amir Hossein Kargaran, Mohammad Sadegh Akhondzadeh, M. R. Heidarpour, M. Manshaei, Kave Salamatian, Masoud Nejad Sattary

arXiv.org 2020

Wide-AdGraph: Detecting Ad Trackers with a Wide Dependency Chain Graph

Amir Hossein Kargaran, Mohammad Sadegh Akhondzadeh, M. R. Heidarpour, M. Manshaei, Kave Salamatian, Masoud Nejad Sattary

Web Science Conference 2020

Analytical Derivation and Comparison of Alarm Similarity Measures

Amir Hossein Kargaran, Amir Neshastegaran, I. Izadi, Ehsan Yazdian

IFAC-PapersOnLine 2020

Analytical Derivation and Comparison of Alarm Similarity Analysis Methods

Amir Hossein Kargaran, Amir Neshastegaran, I. Izadi, Ehsan Yazdian

arXiv.org 2020

Hengam: An Adversarially Trained Transformer for Persian Temporal Tagging

Sajad Mirzababaei, Amir Hossein Kargaran, Hinrich Schütze, Ehsaneddin Asgari

Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics 2022