publications

Publications by category in reverse chronological order. Generated by jekyll-scholar.

2024

  1. Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models
    Orion Weller, Benjamin Van Durme, Dawn Lawrie, Ashwin Paranjape, Yuhao Zhang, and Jack Hessel
    In 2024
  2. A Waterlog for Detecting and Tracing Synthetic Text from Large Language Models
    Brennon Brimhall, Orion Weller, Matthew Green, and Ian Miers
    In 2024
  3. FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
    Orion Weller, Benjamin Chang, Sean MacAvaney, Kyle Lo, Arman Cohan, Benjamin Van Durme, Dawn Lawrie, and Luca Soldaini
    In arXiv 2024
  4. Dated Data: Tracing Knowledge Cutoffs in Large Language Models
    Jeffrey Cheng, Marc Marone, Orion Weller, Dawn Lawrie, Daniel Khashabi, and Benjamin Van Durme
    In Conference on Language Modeling (COLM) 2024
  5. CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation
    Abe Bohan Hou, Orion Weller, Guanghui Qin, Eugene Yang, Dawn Lawrie, Nils Holzenberger, Andrew Blair-Stanek, and Benjamin Van Durme
    In 2024
  6. On the Evaluation of Machine-Generated Reports
    James Mayfield, Eugene Yang, Dawn Lawrie, Sean MacAvaney, Paul McNamee, Douglas W Oard, Luca Soldaini, Ian Soboroff, Orion Weller, Efsun Kayi, and others
    In SIGIR 2024
  7. Learning to Reason via Program Generation, Emulation, and Search
    Nathaniel Weir, Muhammad Khalifa, Linlu Qiu, Orion Weller, and Peter Clark
    In arXiv 2024
  8. Self-[In]correct: LLMs Struggle with Refining Self-Generated Responses
    Dongwei Jiang, Jingyu Zhang, Orion Weller, Nathaniel Weir, Benjamin Van Durme, and Daniel Khashabi
    In arXiv 2024
  9. Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic
    Nathaniel Weir, Kate Sanders, Orion Weller, Shreya Sharma, Dongwei Jiang, Zhengping Zhang, Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Jansen, Peter Clark, and others
    In Empirical Methods in Natural Language Processing (EMNLP) 2024
  10. NevIR: Negation in Neural Information Retrieval
    Orion Weller, Dawn Lawrie, and Benjamin Van Durme
    In European Chapter of the Association for Computational Linguistics (EACL) 2024
  11. When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets
    Orion Weller, Kyle Lo, David Wadden, Dawn Lawrie, Benjamin Van Durme, Arman Cohan, and Luca Soldaini
    In Findings of the European Chapter of the Association for Computational Linguistics (EACL) 2024
  12. “According to...” Prompting Language Models Improves Quoting from Pre-Training Data
    In European Chapter of the Association for Computational Linguistics (EACL) 2024
  13. Defending Against Misinformation Attacks in Open-Domain Question Answering
    In European Chapter of the Association for Computational Linguistics (EACL) 2024

2023

  1. When Do Decompositions Help for Machine Reading?
    In Empirical Methods in Natural Language Processing (EMNLP) 2023
  2. Synthetic Cross-language Information Retrieval Training Data
    James Mayfield, Eugene Yang, Dawn Lawrie, Samuel Barham, Orion Weller, Marc Mason, Suraj Nair, and Scott Miller
    Preprint 2023
  3. MegaWika: Millions of reports and their sources across 50 diverse languages
    Samuel Barham, Orion Weller, Michelle Yuan, Kenton Murray, Mahsa Yarmohammadi, Zhengping Jiang, Siddharth Vashishtha, Alexander Martin, Anqi Liu, Aaron Steven White, and others
    Preprint 2023

2022

  1. Pretrained Models for Multilingual Federated Learning
    In North American Chapter of the Association for Computational Linguistics (NAACL) 2022
  2. When to Use Multi-Task Learning vs Intermediate Fine-Tuning for Pre-Trained Encoder Transfer Learning
    Orion Weller, Kevin Seppi, and Matt Gardner
    In Association for Computational Linguistics (ACL) 2022
  3. End-to-End Speech Translation for Code Switched Speech
    In Findings of the Association for Computational Linguistics (ACL) 2022

2021

  1. Exploring the Relationship Between Algorithm Performance, Vocabulary, and Run-Time in Text Classification
    Wilson Fearn, Orion Weller, and Kevin Seppi
    In North American Chapter of the Association for Computational Linguistics (NAACL) 2021
  2. Streaming Joint Speech Translation and Transcription
    In European Chapter of the Association for Computational Linguistics (EACL) 2021
  3. Predicting Mental Health and Suicidal Ideation Among Adolescents Using the Risk and Protective Factor Framework: A Large Scale Machine Learning Approach
    Orion Weller, Luke Sagers, Carl Hanson, Quinn Snell, Michael Barnes, and Shannon Tass
    PLoS One 2021

2020

  1. Learning from Task Descriptions
    Empirical Methods in Natural Language Processing (EMNLP) 2020
  2. You Don’t Have Time to Read This: an Exploration of Document Level Reading Time Prediction
    Orion Weller, Jordan Hildebrandt, Ilya Reznik, Christopher Challis, E. Shannon Tass, Quinn Snell, and Kevin Seppi
    Association for Computational Linguistics (ACL) 2020
  3. The rJokes Dataset: a Large Scale Humor Collection
    Orion Weller and Kevin Seppi
    Language Resources and Evaluation (LREC) 2020
  4. Can Humor Prediction Datasets be used for Humor Generation? Humorous Headline Generation via Style Transfer
    Orion Weller, Nancy Fulda, and Kevin Seppi
    Second Workshop on Figurative Language Processing @ ACL 2020

2019

  1. Humor Detection: A Transformer gets the Last Laugh
    Orion Weller and Kevin Seppi
    Empirical Methods in Natural Language Processing (EMNLP) 2019