EXTRACTING PARALLEL PHRASES FROM ENGLISH-PUNJABI CORPORA

ISBN-13: 9786208225414 / English / Paperback / 2024 / 204 pp.

Manpreet Singh Lehal

Price: 363.27
(net: 345.97, VAT: 5%)

Lowest price in the last 30 days: 356.56

Order fulfilment time:
approx. 10-14 business days.

Free delivery!

This study presents a novel approach to extracting parallel data from a comparable English-Punjabi corpus, addressing the scarcity of parallel corpora for this language pair. Unlike previous research, the approach focuses on producing high-precision parallel data with minimal resources. The data is drawn from diverse domains, including Wikipedia articles, TDIL's noisy parallel sentences, and Gyan Nidhi reports. The methodology consists of three phases: extracting and aligning documents; translating the Punjabi texts into English with OpenNMT-py; and calculating content similarity with three measures (Euclidean distance, cosine similarity, and Jaccard similarity). Each measure is first run individually, and the results are then integrated to improve accuracy. By combining the scores of all three measures, the system achieves a precision of 93% and an accuracy of 86%. This integrated approach substantially improves parallel data extraction for English-Punjabi corpora and has the potential to improve Statistical Machine Translation (SMT) models.
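To make the similarity phase concrete, here is a minimal sketch of how the three measures could be computed and combined for one candidate pair: an English sentence and the English machine translation of a Punjabi sentence (the translation step itself is assumed to have already run). This is not the author's implementation. The bag-of-words term-frequency vectors, the 1 / (1 + d) mapping from Euclidean distance to similarity, the equal-weight average, the 0.5 threshold, and all function names are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens (assumed preprocessing; the study's is not described).
    return re.findall(r"\w+", text.lower())


def euclidean_similarity(a: Counter, b: Counter) -> float:
    # Euclidean distance between term-frequency vectors, mapped to (0, 1]
    # with 1 / (1 + d) so that higher means more similar (assumed mapping).
    vocab = set(a) | set(b)
    dist = math.sqrt(sum((a[w] - b[w]) ** 2 for w in vocab))
    return 1.0 / (1.0 + dist)


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Dot product of term-frequency vectors divided by the product of their norms.
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a: Counter, b: Counter) -> float:
    # Size of the shared vocabulary over the size of the combined vocabulary.
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def combined_score(english: str, translated_punjabi: str) -> float:
    # translated_punjabi: English MT output for the Punjabi side of the pair.
    a = Counter(tokenize(english))
    b = Counter(tokenize(translated_punjabi))
    scores = (
        euclidean_similarity(a, b),
        cosine_similarity(a, b),
        jaccard_similarity(a, b),
    )
    # Hypothetical combination rule: unweighted mean of the three scores.
    return sum(scores) / len(scores)


def is_parallel(english: str, translated_punjabi: str, threshold: float = 0.5) -> bool:
    # The threshold is illustrative, not a value reported by the study.
    return combined_score(english, translated_punjabi) >= threshold


if __name__ == "__main__":
    src = "The river flows through the city."
    print(is_parallel(src, "The river flows through the city."))
    print(is_parallel(src, "Prices rose sharply last year."))
```

Averaging is only one way to integrate the scores; a weighted sum or a classifier trained on the three scores would fit the same description, and the weights and threshold would need tuning against held-out aligned pairs.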

1997-2026 DolnySlask.com Agencja Internetowa