Publications
* denotes equal contribution
Preprints
2024
- [arXiv] Median Optimal Treatment Regimes. Liu Leqi and Edward H. Kennedy. arXiv preprint arXiv:2103.01802, 2024.
Optimal treatment regimes are personalized policies for making a treatment decision based on subject characteristics, with the policy chosen to maximize some value. It is common to aim to maximize the mean outcome in the population, via a regime assigning treatment only to those whose mean outcome is higher under treatment versus control. However, the mean can be an unstable measure of centrality, resulting in imprecise statistical procedures, as well as non-robust decisions that can be overly influenced by a small fraction of subjects. In this work, we propose a new median optimal treatment regime that instead treats individuals whose conditional median is higher under treatment. This ensures that optimal decisions for individuals from the same group are not overly influenced either by (i) a small fraction of the group (unlike the mean criterion), or (ii) unrelated subjects from different groups (unlike marginal median/quantile criteria). We introduce a new measure of value, the Average Conditional Median Effect (ACME), which summarizes across-group median treatment outcomes of a policy, and which the median optimal treatment regime maximizes. After developing key motivating examples that distinguish median optimal treatment regimes from mean and marginal median optimal treatment regimes, we give a nonparametric efficiency bound for estimating the ACME of a policy, and propose a new doubly robust-style estimator that achieves the efficiency bound under weak conditions. To construct the median optimal treatment regime, we introduce a new doubly robust-style estimator for the conditional median treatment effect. Finite-sample properties are explored via numerical simulations and the proposed algorithm is illustrated using data from a randomized clinical trial in patients with HIV.
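To make the decision rule in the abstract above concrete, here is a hedged illustration in notation of my own (not necessarily the paper's): writing m_a(x) for the conditional median outcome under treatment a, the median optimal regime assigns treatment exactly when the conditional median under treatment exceeds that under control,

$$ d^{*}(x) \;=\; \mathbf{1}\left\{\, m_{1}(x) > m_{0}(x) \,\right\}, \qquad m_{a}(x) \;=\; \operatorname{median}\!\left(Y^{a} \mid X = x\right), \quad a \in \{0, 1\}. $$

The mean-optimal regime replaces the conditional medians with conditional means, which is where the sensitivity to a small fraction of the group, highlighted in the abstract, comes from.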
- [arXiv] Personalized Language Modeling from Personalized Human Feedback. Xinyu Li*, Zachary Lipton, and Liu Leqi*. arXiv preprint arXiv:2402.05133, 2024.
Reinforcement Learning from Human Feedback (RLHF) is currently the dominant framework for fine-tuning large language models to better align with human preferences. However, the underlying premise of algorithms developed under this framework can be problematic when user preferences encoded in human feedback are diverse. In this work, we aim to address this problem by developing methods for building personalized language models. We first formally introduce the task of learning from personalized human feedback and explain why vanilla RLHF can be problematic in this context. We then propose a general Personalized-RLHF (P-RLHF) framework, which requires one to jointly learn a user model and a language (or reward) model. The user model takes in user information and outputs user representations. Its structure encodes our assumptions about user preferences underlying the feedback data. We develop new learning objectives for personalized reward modeling and personalized Direct Preference Optimization. To demonstrate the efficacy of our method, we test it on real-world text summarization data with annotated preferences and annotator information. We fine-tune GPT-J 6B to obtain personalized language (and reward) models, which outperform non-personalized models in terms of aligning with individual preferences.
- [arXiv] A Unified Causal Framework for Auditing Recommender Systems. Vibhhu Sharma*, Shantanu Gupta, Nil-Jana Akpinar, Zachary Lipton, and Liu Leqi*. Coming soon, 2024.
As recommender systems become widely deployed in different domains, they increasingly influence their users’ beliefs and preferences. Auditing recommender systems is crucial as it not only ensures the continuous improvement of recommendation algorithms but also safeguards against potential pitfalls like biases and ethical concerns. In this paper, we view recommender system auditing from a causal lens and provide a general recipe for defining auditing metrics. Under this general causal auditing framework, we categorize existing auditing metrics, and identify gaps in them—the lack of metrics in auditing user agency while accounting for the dynamics of the recommendation process. We leverage our framework and propose two classes of such metrics: future- and past-reachability and stability. We provide both a gradient-based and a black-box approach for computing these metrics, allowing the auditor to compute them under different levels of access to the recommender system. In our experiments, we demonstrate the efficacy of methods for computing the proposed metrics and inspect the design of recommender systems through these proposed metrics.
- [arXiv] Using Deep Reinforcement Learning to Promote Sustainable Human Behaviour on a Common Pool Resource Problem. Raphael Koster*, Miruna Pislar*, Andrea Tacchetti, Jan Balaguer, Liu Leqi, Romuald Elie, Oliver P. Hauser, Karl Tuyls, Matt Botvinick, and Christopher Summerfield. arXiv preprint arXiv:2404.15059, 2024.
A canonical social dilemma arises when finite resources are allocated to a group of people, who can choose to either reciprocate with interest, or keep the proceeds for themselves. What resource allocation mechanisms will encourage levels of reciprocation that sustain the commons? Here, in an iterated multiplayer trust game, we use deep reinforcement learning (RL) to design an allocation mechanism that endogenously promotes sustainable contributions from human participants to a common pool resource. We first trained neural networks to behave like human players, creating a simulated economy that allowed us to study how different mechanisms influenced the dynamics of receipt and reciprocation. We then used RL to train a social planner to maximise aggregate return to players. The social planner discovered a redistributive policy that led to a large surplus and an inclusive economy, in which players made roughly equal gains. The RL agent increased human surplus over baseline mechanisms based on unrestricted welfare or conditional cooperation, by conditioning its generosity on available resources and temporarily sanctioning defectors by allocating fewer resources to them. Examining the AI policy allowed us to develop an explainable mechanism that performed similarly and was more popular among players. Deep reinforcement learning can be used to discover mechanisms that promote sustainable human behaviour.
- [arXiv] Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework. Jingling Li, Zeyu Tang, Xiaoyu Liu, Peter Spirtes, Kun Zhang, Liu Leqi, and Yang Liu. arXiv preprint arXiv:2403.08743, 2024.
Large language models (LLMs) can easily generate biased and discriminative responses. As LLMs tap into consequential decision-making (e.g., hiring and healthcare), it is of crucial importance to develop strategies to mitigate these biases. This paper focuses on social bias, tackling the association between demographic information and LLM outputs. We propose a causality-guided debiasing framework that utilizes causal understandings of (1) the data-generating process of the training corpus fed to LLMs, and (2) the internal reasoning process of LLM inference, to guide the design of prompts for debiasing LLM outputs through selection mechanisms. Our framework unifies existing debiasing prompting approaches such as inhibitive instructions and in-context contrastive examples, and sheds light on new ways of debiasing by encouraging bias-free reasoning. Our strong empirical performance on real-world datasets demonstrates that our framework provides principled guidelines on debiasing LLM outputs even with only black-box access.
- [arXiv] Accounting for AI and Users Shaping One Another: The Role of Mathematical Models. Sarah Dean*, Evan Dong*, Meena Jagadeesan*, and Liu Leqi*. arXiv preprint arXiv:2404.12366, 2024.
As AI systems enter into a growing number of societal domains, these systems increasingly shape and are shaped by user preferences, opinions, and behaviors. However, the design of AI systems rarely accounts for how AI and users shape one another. In this position paper, we argue for the development of formal interaction models which mathematically specify how AI and users shape one another. Formal interaction models can be leveraged to (1) specify interactions for implementation, (2) monitor interactions through empirical analysis, (3) anticipate societal impacts via counterfactual analysis, and (4) control societal impacts via interventions. The design space of formal interaction models is vast, and model design requires careful consideration of factors such as style, granularity, mathematical complexity, and measurability. Using content recommender systems as a case study, we critically examine the nascent literature of formal interaction models with respect to these use-cases and design axes. More broadly, we call for the community to leverage formal interaction models when designing, evaluating, or auditing any AI system which interacts with users.
Publications
2024
- [PNAS] Deep Mechanism Design: Learning Social and Economic Policies for Human Benefit. Andrea Tacchetti, Raphael Koster, Jan Balaguer, Liu Leqi, Miruna Pislar, Matthew M. Botvinick, Karl Tuyls, David C. Parkes, and Christopher Summerfield. Proceedings of the National Academy of Sciences (PNAS), 2024.
Human society is coordinated by mechanisms that control how prices are agreed, taxes are set, and electoral votes are tallied. The design of robust and effective mechanisms for human benefit is a core problem in the social, economic and political sciences. Here, we discuss the recent application of modern tools from AI research, including deep neural networks trained with reinforcement learning (RL), to create more desirable mechanisms for people. We review the application of machine learning to design effective auctions, learn optimal tax policies, and discover redistribution policies that win the popular vote among human users. We discuss the challenge of accurately modelling human preferences, and the problem of aligning a mechanism to the wishes of a potentially diverse group. We highlight the importance of ensuring that research into ‘deep mechanism design’ is conducted safely and ethically.
2023
- [HCOMP] A Taxonomy of Human and ML Strengths in Decision-Making to Investigate Human-ML Complementarity. Liu Leqi*, Charvi Rastogi*, Kenneth Holstein, and Hoda Heidari. AAAI Conference on Human Computation and Crowdsourcing (HCOMP), 2023.
Hybrid human-ML systems increasingly make consequential decisions in a wide range of domains. These systems are often introduced with the expectation that the combined human-ML system will achieve complementary performance, that is, the combined decision-making system will be an improvement compared with either decision-making agent in isolation. However, empirical results have been mixed, and existing research rarely articulates the sources and mechanisms by which complementary performance is expected to arise. Our goal in this work is to provide conceptual tools to advance the way researchers reason and communicate about human-ML complementarity. Drawing upon prior literature in human psychology, machine learning, and human-computer interaction, we propose a taxonomy characterizing distinct ways in which human and ML-based decision-making can differ. In doing so, we conceptually map potential mechanisms by which combining human and ML decision-making may yield complementary performance, developing a language for the research community to reason about design of hybrid systems in any decision-making domain. To illustrate how our taxonomy can be used to investigate complementarity, we provide a mathematical aggregation framework to examine enabling conditions for complementarity. Through synthetic simulations, we demonstrate how this framework can be used to explore specific aspects of our taxonomy and shed light on the optimal mechanisms for combining human-ML judgments.
- [CHI] A Field Test of Bandit Algorithms for Recommendations: Understanding the Validity of Assumptions on Human Preferences in Multi-armed Bandits. Liu Leqi*, Giulio Zhou*, Fatma Kilinc Karzan, Zachary Lipton, and Alan Montgomery. ACM CHI Conference on Human Factors in Computing Systems (CHI), 2023.
Personalized recommender systems suffuse modern life, shaping what media we read and what products we consume. Algorithms powering such systems tend to consist of supervised-learning-based heuristics, such as latent factor models with a variety of heuristically chosen prediction targets. Meanwhile, theoretical treatments of recommendation frequently address the decision-theoretic nature of the problem, including the need to balance exploration and exploitation, via the multi-armed bandits (MABs) framework. However, MAB-based approaches rely heavily on assumptions on human preferences. These preference assumptions are seldom tested using human subject studies, partly due to the lack of publicly available toolkits to conduct such studies. In this work, we conduct a study with crowdworkers in a comics recommendation MABs setting. Each arm represents a comic category, and users provide feedback after each recommendation. We check the validity of core MABs assumptions—that human preferences (reward distributions) are fixed over time—and find that they do not hold. This finding suggests that any MAB algorithm used for recommender systems should account for human preference dynamics. While answering these questions, we provide a flexible experimental framework for understanding human preference dynamics and testing MABs algorithms with human users.
- [Ph.D. Thesis] Human-Centered Machine Learning: A Statistical and Algorithmic Perspective. Liu Leqi. Carnegie Mellon University, 2023.
Building artificial intelligence systems from a human-centered perspective is increasingly urgent, as large-scale machine learning systems ranging from personalized recommender systems to language and image generative models are deployed to interact with people daily. In this thesis, we propose a guideline for building these systems from a human-centered perspective. Our guideline contains three steps: (i) identifying the role of the people of interest and their core characteristics concerned in the learning task; (ii) modeling these characteristics in a useful and reliable manner; and (iii) incorporating these models into the design of learning algorithms in a principled way. We ground this guideline in two applications: personalized recommender systems and decision-support systems. For recommender systems, we follow the guideline by (i) focusing on users’ evolving preferences, (ii) modeling them as dynamical systems, and (iii) developing efficient online learning algorithms with provable guarantees to interact with users sharing different preference dynamics. For decision-support systems, we (i) choose decision-makers’ risk preferences to be the core characteristics of concern, (ii) model them in the objective function of the system, and (iii) provide a general procedure with statistical guarantees for learning models under diverse risk preferences. We conclude by discussing the future of human-centered machine learning and the role of interdisciplinary research in this field.
2022
- [ICML] Supervised Learning with General Risk Functionals. Liu Leqi, Audrey Huang, Zachary Lipton, and Kamyar Azizzadenesheli. International Conference on Machine Learning (ICML), 2022.
Standard uniform convergence results bound the generalization gap of the expected loss over a hypothesis class. The emergence of risk-sensitive learning requires generalization guarantees for functionals of the loss distribution beyond the expectation. While prior works specialize in uniform convergence of particular functionals, our work provides uniform convergence for a general class of Hölder risk functionals for which the closeness in the Cumulative Distribution Function (CDF) entails closeness in risk. We establish the first uniform convergence results for estimating the CDF of the loss distribution, yielding guarantees that hold simultaneously both over all Hölder risk functionals and over all hypotheses. Thus licensed to perform empirical risk minimization, we develop practical gradient-based methods for minimizing distortion risks (a widely studied subset of Hölder risks that subsumes the spectral risks, including the mean, conditional value-at-risk, cumulative prospect theory risks, and others) and provide convergence guarantees. In experiments, we demonstrate the efficacy of our learning procedure, both in settings where uniform convergence results hold and in high-dimensional settings with deep networks.
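As an illustration of the kind of plugin computation this licenses, here is a minimal sketch (my own, not the paper's library) of the standard empirical estimate of conditional value-at-risk, one of the distortion risks mentioned above, computed directly from a sample of losses:

```python
import numpy as np

def empirical_cvar(losses: np.ndarray, alpha: float = 0.95) -> float:
    """Plugin estimate of conditional value-at-risk (CVaR) at level alpha:
    the average of the worst (1 - alpha) fraction of the observed losses."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, alpha)   # value-at-risk: the alpha-quantile of the losses
    tail = losses[losses >= var]       # losses in the upper tail
    return float(tail.mean())

# Toy usage: CVaR of heavy-tailed per-example losses.
rng = np.random.default_rng(0)
print(empirical_cvar(rng.exponential(size=1000), alpha=0.9))
```

Uniform convergence of the empirical loss CDF is what makes such plugin estimates trustworthy simultaneously across a whole family of risk functionals.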
- [ICML] Action-Sufficient State Representation Learning for Control with Structural Constraints. Biwei Huang, Chaochao Lu, Liu Leqi, José Miguel Hernández-Lobato, Clark Glymour, Bernhard Schölkopf, and Kun Zhang. International Conference on Machine Learning (ICML), 2022.
Perceived signals in real-world scenarios are usually high-dimensional and noisy, and finding and using their representation that contains essential and sufficient information required by downstream decision-making tasks will help improve computational efficiency and generalization ability in the tasks. In this paper, we focus on partially observable environments and propose to learn a minimal set of state representations that capture sufficient information for decision-making, termed Action-Sufficient state Representations (ASRs). We build a generative environment model for the structural relationships among variables in the system and present a principled way to characterize ASRs based on structural constraints and the goal of maximizing cumulative reward in policy learning. We then develop a structured sequential Variational Auto-Encoder to estimate the environment model and extract ASRs. Our empirical results on CarRacing and VizDoom demonstrate a clear advantage of learning and using ASRs for policy learning. Moreover, the estimated environment model and ASRs allow learning behaviors from imagined outcomes in the compact latent space to improve sample efficiency.
- [AISTATS] Off-Policy Risk Assessment for Markov Decision Processes. Audrey Huang, Liu Leqi, Zachary C. Lipton, and Kamyar Azizzadenesheli. International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
Addressing such diverse ends as mitigating safety risks, aligning agent behavior with human preferences, and improving the efficiency of learning, an emerging line of reinforcement learning research addresses the entire distribution of returns and various risk functionals that depend upon it. In the contextual bandit setting, recent work on off-policy risk assessment (OPRA) estimates the target policy’s CDF of returns, providing finite sample guarantees that extend to (and hold simultaneously over) plugin estimates of an arbitrarily large set of risk functionals. In this paper, we lift OPRA to Markov decision processes (MDPs), where importance sampling (IS) CDF estimators suffer high variance on longer trajectories due to vanishing (and exploding) importance weights. To mitigate these problems, we incorporate model-based estimation to develop the first doubly robust (DR) estimator for the CDF of returns in MDPs. The DR estimator enjoys significantly less variance and, when the model is well specified, achieves the Cramér-Rao variance lower bound. Moreover, for many risk functionals, the downstream estimates enjoy both lower bias and lower variance. Additionally, we derive the first minimax lower bounds for off-policy CDF and risk estimation, which match our error bounds up to a constant. Finally, we demonstrate the efficacy of our DR CDF estimates experimentally on several different environments.
- [ICWSM] Many Ways to be Lonely: Fine-grained Characterization of Loneliness and its Potential Changes in COVID-19. Yueyi Jiang, Yunfan Jiang, Liu Leqi, and Piotr Winkielman. International AAAI Conference on Web and Social Media (ICWSM), 2022.
Loneliness has been associated with negative outcomes for physical and mental health. Understanding how people express and cope with various forms of loneliness is critical for early screening and targeted interventions to reduce loneliness, particularly among vulnerable groups such as young adults. To examine how different forms of loneliness and coping strategies manifest in loneliness self-disclosure, we built a dataset, FIG-Loneliness (FIne-Grained Loneliness), by using Reddit posts in two young adult-focused forums and two loneliness-related forums consisting of a diverse age group. We provide annotations by trained human annotators for binary and fine-grained loneliness classifications of the posts. Trained on FIG-Loneliness, two BERT-based models were used to understand loneliness forms and authors’ coping strategies in these forums. Our binary loneliness classification achieved an accuracy above 97%, and fine-grained loneliness category classification reached an average accuracy of 77% across all labeled categories. With FIG-Loneliness and model predictions, we found that loneliness expressions in the young adult-related forums are distinct from other forums. Those in young adult-focused forums are more likely to express concerns pertaining to peer relationships, and are potentially more sensitive to geographical isolation impacted by the COVID-19 pandemic lockdown. Also, we show that different forms of loneliness have differential use in coping strategies. For example, those who experienced transient and social loneliness had stronger desires to reach out and connect with others, whereas individuals who experienced physical and romantic loneliness were more likely to seek advice from others.
- [AAAI] Modeling Attrition in Recommender Systems with Departing Bandits. Omer Ben-Porat*, Lee Cohen*, Liu Leqi*, Zachary C. Lipton, and Yishay Mansour. AAAI Conference on Artificial Intelligence (AAAI), 2022.
Traditionally, when recommender systems are formalized as multi-armed bandits, the policy of the recommender system influences the rewards accrued, but not the length of interaction. However, in real-world systems, dissatisfied users may depart (and never come back). In this work, we propose a novel multi-armed bandit setup that captures such policy-dependent horizons. Our setup consists of a finite set of user types, and multiple arms with Bernoulli payoffs. Each (user type, arm) tuple corresponds to an (unknown) reward probability. Each user’s type is initially unknown and can only be inferred through their response to recommendations. Moreover, if a user is dissatisfied with their recommendation, they might depart the system. We first address the case where all users share the same type, demonstrating that a recent UCB-based algorithm is optimal. We then move forward to the more challenging case, where users are divided among two types. While naive approaches cannot handle this setting, we provide an efficient learning algorithm that achieves Õ(√T) regret, where T is the number of users.
- [Workshop] RiskyZoo: A Library for Risk-Sensitive Supervised Learning. William Wong, Audrey Huang, Liu Leqi, Kamyar Azizzadenesheli, and Zachary C. Lipton. ICML Workshop on Responsible Decision Making in Dynamic Environments, 2022.
Supervised learning models are increasingly used in algorithmic decision-making. The traditional assumption that training and testing data are independently and identically distributed is often violated in practical learning settings due to distribution shifts. To mitigate the effects of such nonstationarities, risk-sensitive learning is proposed to train models under different (risk) functionals beyond the expected loss. For example, learning under the conditional value-at-risk of the losses is equivalent to training a model under a particular type of worst-case distribution shift. While many risk functionals and learning procedures have been proposed, their implementations are either nonexistent or in individualized repositories. With no common implementations and baseline test beds, it is difficult to decide which risk functionals and learning procedures to use. To address this, we introduce a library (RiskyZoo) for risk-sensitive supervised learning. The library contains implementations of risk-sensitive learning objectives and optimization procedures that can be used as add-ons to the PyTorch library. We also provide datasets to compare these learning methods. We demonstrate usage of our library through comparing models learned under different risk objectives, optimization performances of different methods for a single objective, and risk assessments of pretrained ImageNet models.
2021
- [NeurIPS] Off-Policy Risk Assessment in Contextual Bandits. Audrey Huang, Liu Leqi, Zachary Lipton, and Kamyar Azizzadenesheli. Advances in Neural Information Processing Systems (NeurIPS), 2021.
Even when unable to run experiments, practitioners can evaluate prospective policies using previously logged data. However, while the bandits literature has adopted a diverse set of objectives, most research on off-policy evaluation to date focuses on the expected reward. In this paper, we introduce Lipschitz risk functionals, a broad class of objectives that subsumes conditional value-at-risk (CVaR), variance, mean-variance, many distorted risks, and CPT risks, among others. We propose Off-Policy Risk Assessment (OPRA), a framework that first estimates a target policy’s CDF and then generates plugin estimates for any collection of Lipschitz risks, providing finite sample guarantees that hold simultaneously over the entire class. We instantiate OPRA with both importance sampling and doubly robust estimators. Our primary theoretical contributions are (i) the first uniform concentration inequalities for both CDF estimators in contextual bandits and (ii) error bounds on our Lipschitz risk estimates, which all converge at a rate of O(1/√n).
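For intuition, here is a minimal sketch of the importance-sampling CDF estimate described above, in hypothetical notation of my own (not the paper's code): given logged rewards together with the logging policy's propensities and the target policy's propensities for the logged actions, the target policy's reward CDF is estimated by reweighting indicator functions.

```python
import numpy as np

def is_cdf_estimate(rewards, logging_propensities, target_propensities, thresholds):
    """Importance-sampling estimate of the target policy's CDF of rewards.

    rewards[i]              : reward observed for the i-th logged interaction
    logging_propensities[i] : probability the logging policy chose the logged action
    target_propensities[i]  : probability the target policy would choose that action
    thresholds              : grid of points t at which to estimate F(t)
    """
    w = np.asarray(target_propensities, dtype=float) / np.asarray(logging_propensities, dtype=float)
    r = np.asarray(rewards, dtype=float)
    # F_hat(t) = (1/n) * sum_i w_i * 1{r_i <= t}
    return np.array([float(np.mean(w * (r <= t))) for t in thresholds])
```

Any Lipschitz risk functional can then be estimated by plugging this CDF estimate into its definition, which is how guarantees can hold over the whole class at once.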
- [NeurIPS] Rebounding Bandits for Modeling Satiation Effects. Liu Leqi, Fatma Kilinc Karzan, Zachary Lipton, and Alan Montgomery. Advances in Neural Information Processing Systems (NeurIPS), 2021.
Psychological research shows that enjoyment of many goods is subject to satiation, with short-term satisfaction declining after repeated exposures to the same item. Nevertheless, proposed algorithms for powering recommender systems seldom model these dynamics, instead proceeding as though user preferences were fixed in time. In this work, we introduce rebounding bandits, a multi-armed bandit setup, where satiation dynamics are modeled as time-invariant linear dynamical systems. Expected rewards for each arm decline monotonically with consecutive exposures to it and rebound towards the initial reward whenever that arm is not pulled. Unlike classical bandit settings, methods for tackling rebounding bandits must plan ahead and model-based methods rely on estimating the parameters of the satiation dynamics. We characterize the planning problem, showing that the greedy policy is optimal when the arms exhibit identical deterministic dynamics. To address stochastic satiation dynamics with unknown parameters, we propose Explore-Estimate-Plan (EEP), an algorithm that pulls arms methodically, estimates the system dynamics, and then plans accordingly.
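A hedged toy rendering of the satiation dynamics described above, with a one-dimensional parameterization of my own choosing (the paper's model is a general time-invariant linear dynamical system): pulling an arm raises its satiation level and lowers its reward, and resting the arm lets the reward rebound.

```python
def simulate_arm(base_reward, decay, boost, pulls, s0=0.0):
    """Toy single-arm satiation dynamics (illustrative, not the paper's exact model).

    base_reward : reward of the arm when fully rested
    decay       : geometric decay factor in (0, 1), applied every round
    boost       : increase in satiation each time the arm is pulled
    pulls       : sequence of booleans, True if the arm is pulled that round
    """
    s, rewards = s0, []
    for pulled in pulls:
        if pulled:
            rewards.append(base_reward - s)  # reward declines as satiation builds
            s = decay * s + boost            # pulling the arm raises satiation
        else:
            rewards.append(None)             # arm not observed this round
            s = decay * s                    # satiation decays, reward rebounds
    return rewards

# Consecutive pulls yield declining rewards; after a rest, the reward rebounds.
print(simulate_arm(1.0, decay=0.8, boost=0.5, pulls=[True] * 4 + [False] * 3 + [True]))
```

Because rewards depend on the history of pulls, a learner must plan ahead rather than greedily pulling the currently best arm, which is the planning problem the paper characterizes.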
- [CACM] When Curation Becomes Creation: Algorithms, microcontent, and the vanishing distinction between platforms and creators. Liu Leqi, Dylan Hadfield-Menell, and Zachary C. Lipton. Communications of the ACM (CACM), 2021.
Ever since social activity on the Internet began migrating from the wilds of the open web to the walled gardens erected by so-called platforms, debates have raged about the responsibilities that these platforms ought to bear. And yet, despite intense scrutiny from the news media and grassroots movements of outraged users, platforms continue to operate, from a legal standpoint, on the friendliest terms. Under the current regulatory framework, platforms simultaneously benefit from: (1) broad discretion to organize (and censor) content however they choose; (2) powerful algorithms for curating a practically limitless supply of user-posted microcontent according to whatever ends they wish; and (3) absolution from the sorts of liability borne by creators of the underlying content. In this paper, we contest the very validity of the platform-creator distinction, arguing that it is ill-adapted to the modern social media landscape where, in a real sense, platforms are creating derivative media products. We argue that any coherent regulatory framework must adapt to this reality, recognizing the subtle continuum of activities that span the curation-creation spectrum, providing a finer system of categorization and clearer guidance for precisely when platforms assume the responsibilities associated with content creation.
2020
- [UAI] Automated Dependence Plots. David Inouye, Liu Leqi, Joon Sik Kim, Bryon Aragam, and Pradeep Ravikumar. Conference on Uncertainty in Artificial Intelligence (UAI), 2020.
In practical applications of machine learning, it is necessary to look beyond standard metrics such as test accuracy in order to validate various qualitative properties of a model. Partial dependence plots (PDP), including instance-specific PDPs (i.e., ICE plots), have been widely used as a visual tool to understand or validate a model. Yet, current PDPs suffer from two main drawbacks: (1) a user must manually sort or select interesting plots, and (2) PDPs are usually limited to plots along a single feature. To address these drawbacks, we formalize a method for automating the selection of interesting PDPs and extend PDPs beyond showing single features to show the model response along arbitrary directions, for example in raw feature space or a latent space arising from some generative model. We demonstrate the usefulness of our automated dependence plots (ADP) across multiple use-cases and datasets including model selection, bias detection, understanding out-of-sample behavior, and exploring the latent space of a generative model.
- [ICML] Uniform Convergence of Rank-Weighted Learning. Justin Khim, Liu Leqi, Adarsh Prasad, and Pradeep Ravikumar. International Conference on Machine Learning (ICML), 2020.
The decision-theoretic foundations of classical machine learning models have largely focused on estimating model parameters that minimize the expectation of a given loss function. However, as machine learning models are deployed in varied contexts, such as in high-stakes decision-making and societal settings, it is clear that these models are not just evaluated by their average performances. In this work, we study a novel notion of L-Risk based on the classical idea of rank-weighted learning. These L-Risks, induced by rank-dependent weighting functions with bounded variation, are a unification of popular risk measures such as conditional value-at-risk and those defined by cumulative prospect theory. We give uniform convergence bounds for this broad class of risk measures and study their consequences on a logistic regression example.
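For concreteness, the empirical version of such a rank-weighted risk can be written as a weighted sum of the sorted losses (notation mine, illustrative rather than the paper's exact formulation):

$$ \widehat{R}_{L}(h) \;=\; \sum_{i=1}^{n} w_{i}\, \ell_{(i)}(h), \qquad \ell_{(1)}(h) \le \ell_{(2)}(h) \le \cdots \le \ell_{(n)}(h). $$

Uniform weights w_i = 1/n recover the usual empirical risk, while concentrating the weight on the largest losses yields a conditional value-at-risk style objective, which is the sense in which L-Risks unify the measures listed above.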
- [Workshop] On the Convergence and Optimality of Policy Gradient for Markov Coherent Risk. Audrey Huang, Liu Leqi, Zachary C. Lipton, and Kamyar Azizzadenesheli. Workshop on Challenges of Real World Reinforcement Learning (RWRL) at NeurIPS, 2020.
In order to model risk aversion in reinforcement learning, an emerging line of research adapts familiar algorithms to optimize coherent risk functionals, a class that includes conditional value-at-risk (CVaR). Because optimizing the coherent risk is difficult in Markov decision processes, recent work tends to focus on the Markov coherent risk (MCR), a time-consistent surrogate. While policy gradient (PG) updates have been derived for this objective, it remains unclear (i) whether PG finds a global optimum for MCR, and (ii) how to estimate the gradient in a tractable manner. In this paper, we demonstrate that, in general, MCR objectives (unlike the expected return) are not gradient dominated and that stationary points are not, in general, guaranteed to be globally optimal. Moreover, we present a tight upper bound on the suboptimality of the learned policy, characterizing its dependence on the nonlinearity of the objective and the degree of risk aversion. Addressing (ii), we propose a practical implementation of PG that uses state distribution reweighting to overcome previous limitations. Through experiments, we demonstrate that when the optimality gap is small, PG can learn risk-sensitive policies. However, we find that instances with large suboptimality gaps are abundant and easy to construct, outlining an important challenge for future research.
2019
- [NeurIPS] Game Design for Eliciting Distinguishable Behavior. Fan Yang, Liu Leqi, Yifan Wu, Zachary Lipton, Pradeep K. Ravikumar, Tom M. Mitchell, and William W. Cohen. Advances in Neural Information Processing Systems (NeurIPS), 2019.
The ability to infer latent psychological traits from human behavior is key to developing personalized human-interacting machine learning systems. Approaches to infer such traits range from surveys to manually-constructed experiments and games. However, these traditional games are limited because they are typically designed based on heuristics. In this paper, we formulate the task of designing behavior diagnostic games that elicit distinguishable behavior as a mutual information maximization problem, which can be solved by optimizing a variational lower bound. Our framework is instantiated by using prospect theory to model varying player traits, and Markov Decision Processes to parameterize the games. We validate our approach empirically, showing that our designed games can successfully distinguish among players with different traits, outperforming manually-designed ones by a large margin.
- [NeurIPS] On Human-Aligned Risk Minimization. Liu Leqi, Adarsh Prasad, and Pradeep K. Ravikumar. Advances in Neural Information Processing Systems (NeurIPS), 2019.
The statistical decision theoretic foundations of modern machine learning have largely focused on the minimization of the expectation of some loss function for a given task. However, seminal results in behavioral economics have shown that human decision-making is based on different risk measures than the expectation of any given loss function. In this paper, we pose the following simple question: in contrast to minimizing expected loss, could we minimize a better human-aligned risk measure? While this might not seem natural at first glance, we analyze the properties of such a revised risk measure, and surprisingly show that it might also better align with additional desiderata like fairness that have attracted considerable recent attention. We focus in particular on a class of human-aligned risk measures inspired by cumulative prospect theory. We empirically study these risk measures, and demonstrate their improved performance on desiderata such as fairness, in contrast to the traditional workhorse of expected loss minimization.
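As a rough illustration of the rank-dependent idea behind such measures (an assumption-laden simplification on my part, not necessarily the paper's exact definition), for a non-negative loss ℓ(h; Z) and a probability weighting function w with w(0) = 0 and w(1) = 1, a distorted risk reweights tail probabilities:

$$ R_{w}(h) \;=\; \int_{0}^{\infty} w\big(\Pr[\ell(h; Z) > t]\big)\, dt. $$

Taking w(p) = p recovers the ordinary expected loss, while an inverse-S-shaped w of the kind used in cumulative prospect theory overweights rare, large losses.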
2018
- [NeurIPS] The Sample Complexity of Semi-Supervised Learning with Nonparametric Mixture Models. Chen Dan, Liu Leqi, Bryon Aragam, Pradeep K. Ravikumar, and Eric P. Xing. Advances in Neural Information Processing Systems (NeurIPS), 2018.
We study the sample complexity of semi-supervised learning (SSL) and introduce new assumptions based on the mismatch between a mixture model learned from unlabeled data and the true mixture model induced by the (unknown) class conditional distributions. Under these assumptions, we establish an Ω(K log K) labeled sample complexity bound without imposing parametric assumptions, where K is the number of classes. Our results suggest that even in nonparametric settings it is possible to learn a near-optimal classifier using only a few labeled samples. Unlike previous theoretical work which focuses on binary classification, we consider general multiclass classification (K > 2), which requires solving a difficult permutation learning problem. This permutation defines a classifier whose classification error is controlled by the Wasserstein distance between mixing measures, and we provide finite-sample results characterizing the behaviour of the excess risk of this classifier. Finally, we describe three algorithms for computing these estimators based on a connection to bipartite graph matching, and perform experiments to illustrate the superiority of the MLE over the majority vote estimator.
2016
- [ICWSM] Analyzing Personality through Social Media Profile Picture Choice. Liu Leqi*, Daniel Preotiuc-Pietro*, Zahra Riahi Samani, Mohsen E. Moghaddam, and Lyle Ungar. International AAAI Conference on Web and Social Media (ICWSM), 2016.
The content of images users post to their social media is driven in part by personality. In this study, we analyze how Twitter profile images vary with the personality of the users posting them. In our main analysis, we use profile images from over 66,000 users whose personality we estimate based on their tweets. To facilitate interpretability, we focus our analysis on aesthetic and facial features and control for demographic variation in image features and personality. Our results show significant differences in profile picture choice between personality traits, and that these can be harnessed to predict personality traits with robust accuracy. For example, agreeable and conscientious users display more positive emotions in their profile pictures, while users high in openness prefer more aesthetic photos.