Cross-Lingual Machine Learning for Patent Search, Phase 2: Weakly Supervised Learning of Cross-Lingual Systems

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
General and Comparative Linguistics, Experimental Linguistics, Typology, Non-European Languages
Term Funded from 2012 to 2019
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 211613886
 
Year of creation 2018

Summary of project results

Effective information search across languages is a key problem in today’s information society. For example, cross-lingual patent prior art search is an important tool for determining a patent’s novelty and avoiding patent infringement. To reach high accuracy, machine learning approaches to cross-lingual retrieval usually require costly manual annotation of supervision signals, such as relevance links across languages. We showed that cross-lingual rankings can instead be learned directly from weakly supervised data that are not strictly parallel. Such weak supervision signals can be relevance indicators such as citations in patents or hyperlinks in Wikipedia pages. The project further showed that similar techniques can be successfully applied to optimize cross-lingual retrieval and to train machine translation systems on massive non-parallel data.
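
As an illustration of how a cross-lingual ranking might be learned from weak relevance signals, the following minimal Python sketch trains word-pair association weights with a pairwise hinge loss. It is not the project's actual system: the toy vocabulary, the function names `score` and `train_pairwise`, the bilinear bag-of-words scoring, and the hyperparameters are all assumptions chosen for illustration. Each training triple pairs a German query with an English document it is assumed to cite (relevant) and one it does not cite (irrelevant).

```python
from collections import Counter, defaultdict


def score(W, query, doc):
    """Bilinear bag-of-words score: weighted sum of W over all (query word, doc word) pairs."""
    q_tf, d_tf = Counter(query), Counter(doc)
    return sum(W[(qw, dw)] * qc * dc
               for qw, qc in q_tf.items()
               for dw, dc in d_tf.items())


def train_pairwise(triples, epochs=10, lr=0.1, margin=1.0):
    """Learn word-pair weights from (query, relevant_doc, irrelevant_doc) triples.

    Uses a hinge loss on the score difference; weights are updated only
    when the relevant document does not beat the irrelevant one by `margin`.
    """
    W = defaultdict(float)
    for _ in range(epochs):
        for query, pos, neg in triples:
            if score(W, query, pos) - score(W, query, neg) >= margin:
                continue  # margin already satisfied, no update needed
            q_tf, pos_tf, neg_tf = Counter(query), Counter(pos), Counter(neg)
            for qw, qc in q_tf.items():
                for dw, dc in pos_tf.items():
                    W[(qw, dw)] += lr * qc * dc  # strengthen pairs seen in the cited doc
                for dw, dc in neg_tf.items():
                    W[(qw, dw)] -= lr * qc * dc  # weaken pairs seen in the uncited doc
    return W


# Hypothetical toy data: German query terms, English document terms.
# The "relevant" document stands in for one linked by a patent citation.
triples = [
    (["motor", "welle"], ["engine", "shaft"], ["polymer", "coating"]),
    (["polymer", "beschichtung"], ["polymer", "coating"], ["engine", "shaft"]),
]
W = train_pairwise(triples)

docs = {"D1": ["engine", "shaft"], "D2": ["polymer", "coating"]}
query = ["motor"]
ranking = sorted(docs, key=lambda name: score(W, query, docs[name]), reverse=True)
print(ranking)
```

No parallel sentences are used here: the only supervision is which document is linked to which query, which is the sense in which the signal is weak.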

Project-related publications (selection)

  • (2012). Joint feature selection in distributed stochastic learning for large-scale discriminative training in SMT. In Proc. of the Meeting of the Association for Computational Linguistics (ACL), Jeju Island, South Korea
    Simianer, P., Riezler, S., and Dyer, C.
  • (2012). Structural and topical dimensions in multi-task patent translation. In Proc. of the Conference of the European Chapter of the Association for Computational Linguistics (EACL), Avignon, France
    Wäschle, K. and Riezler, S.
  • (2013). Boosting cross-language retrieval by learning bilingual phrase associations from relevance rankings. In Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Seattle, Washington, USA
    Sokolov, A., Jehl, L., Hieber, F., and Riezler, S.
  • (2013). Generative and discriminative methods for online adaptation in SMT. In Proc. of the Machine Translation Summit (MTSummit), Nice, France
    Wäschle, K., Simianer, P., Bertoldi, N., Riezler, S., and Federico, M.
  • (2014). Learning translational and knowledge-based similarities from relevance rankings for cross-language retrieval. In Proc. of the Meeting of the Association for Computational Linguistics (ACL), Baltimore, USA
    Schamoni, H., Hieber, F., Sokolov, A., and Riezler, S.
    (See online at https://dx.doi.org/10.3115/v1/P14-2080)
  • (2014). Online adaptation to post-edits for phrase-based statistical machine translation. Machine Translation, 28:309–339
    Bertoldi, N., Simianer, P., Cettolo, M., Wäschle, K., Federico, M., and Riezler, S.
    (See online at https://doi.org/10.1007/s10590-014-9159-7)
 
 
