

  • A Deep Learning Approach for Population Estimation from Satellite Imagery
    Caleb Robinson (Georgia Institute of Technology)

    In previous literature, the questions of "where" people live and "how many" people live there have been studied independently. Population estimates are made for administrative areas (e.g. US counties) using the cohort-component method or other techniques; however, the locations of people within each administrative area are largely unknown. Similarly, dasymetric modelling techniques for disaggregating existing population counts within administrative areas have been studied; however, there is no ground truth data with which to validate these methods. In addition, there are often large gaps between censuses in many countries. In the US, a national census is taken every 10 years, and it has only recently been supplemented with annual, non-comprehensive surveys, i.e. the American Community Survey. In some countries several decades can pass without a new census. In general, taking a census is extremely expensive and requires a great deal of organization and time. Knowing where people live and how many people live in a place are fundamental aspects of community planning at every scale. Considering this, we aim to create high-resolution gridded population estimates using only satellite imagery, which jointly answers the questions of where people live and how many people live there. Specifically, we train deep convolutional neural networks to estimate population in the US at a ~1 km² resolution using a concatenation of Landsat 7 and Nighttime Light satellite imagery. To obtain training targets, we disaggregate population counts at the census tract level. We validate our model on held-out data by aggregating individual grid cell estimates to the county level and then comparing them to the ground truth values.
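
    The target construction and county-level validation described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: tract counts are split among grid cells by simple areal weighting (the abstract does not say which disaggregation scheme is used), overlap areas and cell-to-county assignments are given as dictionaries, and the names `disaggregate_by_area`, `aggregate_to_counties`, and `relative_errors` are hypothetical.

```python
from collections import defaultdict


def disaggregate_by_area(tract_pop, cell_overlaps):
    """Assumed areal-weighting scheme for building per-cell training targets.

    tract_pop:     {tract_id: population}
    cell_overlaps: {cell_id: {tract_id: overlap_area_km2}}
    Each tract's population is split among cells in proportion to overlap area.
    """
    tract_area = defaultdict(float)
    for overlaps in cell_overlaps.values():
        for tract, area in overlaps.items():
            tract_area[tract] += area
    return {
        cell: sum(tract_pop[t] * a / tract_area[t] for t, a in overlaps.items())
        for cell, overlaps in cell_overlaps.items()
    }


def aggregate_to_counties(cell_estimates, cell_county):
    """Sum per-cell population estimates into county totals."""
    totals = defaultdict(float)
    for cell, estimate in cell_estimates.items():
        totals[cell_county[cell]] += estimate
    return dict(totals)


def relative_errors(predicted, truth):
    """(predicted - true) / true for every county with a ground truth value."""
    return {c: (predicted.get(c, 0.0) - truth[c]) / truth[c] for c in truth}
```

    Relative error per county is only one possible comparison; any summary of predicted versus true county totals (for example mean absolute error) plugs into the same aggregation step.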

  • A Hybrid Spatio-temporal Model for Wildlife Poaching Prediction
    Shahrzad Gholami (University of Southern California)

    Worldwide, conservation agencies employ rangers to protect conservation areas from poachers. However, agencies lack the manpower for rangers to patrol these vast areas frequently and effectively. In this study, we present a hybrid spatio-temporal model that predicts poaching threat levels in Uganda's Queen Elizabeth Protected Area (QEPA). Our hybrid model consists of two components: (i) an ensemble model that can work with the limited data common to this domain and (ii) a spatio-temporal model that boosts the ensemble's predictions when sufficient data are available. When evaluated on real-world historical data from QEPA, our hybrid model achieves significantly better performance than previous approaches based on either temporally-aware dynamic Bayesian networks or an ensemble of spatially-aware models.

  • A Probabilistic Approach for Learning with Label Proportions Applied to the US Presidential Election
    Tao Sun (UMass Amherst)

    Ecological inference (EI) is a classical problem from political science: modeling the voting behavior of individuals given only aggregate election results. Flaxman et al. recently formulated EI as a machine learning problem using distribution regression and applied it to analyze US presidential elections. However, distribution regression unnecessarily aggregates individual-level covariates available from census microdata and ignores the known structure of the aggregation mechanism. We instead formulate the problem as learning with label proportions (LLP) and develop a new, probabilistic LLP method to solve it. Our model is a straightforward one in which individual votes are latent variables. We use cardinality potentials to efficiently perform exact inference over the latent variables during learning, and we introduce a novel message-passing algorithm that extends cardinality potentials to multivariate probability models for use in multiclass LLP problems. We show experimentally that LLP outperforms distribution regression for predicting individual-level attributes, and that our method is as good as or better than existing state-of-the-art LLP methods.
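
    Exact inference with a cardinality potential rests on computing the distribution of a sum of independent binary latent variables. A minimal dynamic-programming sketch for the two-class case, assuming each individual i votes for a given candidate independently with model probability p_i, is shown below. The function names are hypothetical, the marginal computation is deliberately naive (it reruns the recursion once per individual), and the authors' multiclass message-passing algorithm is not reproduced.

```python
def count_distribution(probs):
    """dist[k] = P(exactly k of the independent Bernoulli(p_i) variables equal 1).

    O(n^2) dynamic program that adds one individual at a time.
    """
    dist = [1.0]  # with no individuals, the count is 0 with probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)  # this individual's latent vote is 0
            new[k + 1] += mass * p      # this individual's latent vote is 1
        dist = new
    return dist


def posterior_marginals(probs, observed_total):
    """P(y_i = 1 | sum_j y_j = observed_total) for every individual i.

    These posterior marginals are the quantities an EM-style learner would
    need when only the aggregate vote count of a group is observed.
    """
    evidence = count_distribution(probs)[observed_total]
    marginals = []
    for i, p in enumerate(probs):
        others = count_distribution(probs[:i] + probs[i + 1:])
        joint = p * others[observed_total - 1] if observed_total >= 1 else 0.0
        marginals.append(joint / evidence)
    return marginals
```

    Because the marginals are conditioned on the group total, they always sum to that total, which is a useful sanity check when wiring such a routine into a learning loop.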

  • Accurate and Efficient Numerical Calculation of the Voigt Profile via Optimized Quadrature and Asymptotics
    Sebastian Ament (Cornell University)

    The Voigt profile is defined as the convolution of a Gaussian and a Lorentzian function. It occurs frequently in spectroscopy and astronomy, but in applications it is often approximated by sums of Gaussian and Lorentzian functions, called pseudo-Voigt profiles. These approximations tend to be accurate to only 3-4 digits, which is insufficient for high-precision data. The goal of this work is to provide an accurate and efficient means of calculating the Voigt profile and its gradient directly. In addition, a method for calculating the full width at half maximum (FWHM) of the Voigt profile is given. The novel methods are based on a low-rank approximation of the Voigt characteristic function, generalized Gaussian quadrature, and specially derived asymptotic expansions. Numerical experiments demonstrate the speed and machine-precision accuracy of the schemes.
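
    As a reference point for the accuracy discussion, the Voigt profile can also be evaluated from its characteristic function exp(-σ²t²/2 - γ|t|), where σ is the Gaussian standard deviation and γ the Lorentzian half-width at half maximum: V(x) = (1/π) ∫₀^∞ exp(-σ²t²/2 - γt) cos(xt) dt. The sketch below applies composite Simpson's rule to that integral. It is a plain baseline chosen for illustration, not the low-rank, generalized-Gaussian-quadrature scheme of the work; the function name and the truncation rule are assumptions.

```python
import math


def voigt_cf_quadrature(x, sigma, gamma, n=2000):
    """Voigt profile via Simpson's rule on the inverse Fourier integral
    V(x) = (1/pi) * int_0^inf exp(-sigma^2 t^2 / 2 - gamma t) cos(x t) dt.

    Requires sigma >= 0, gamma >= 0, not both zero; n must be even.
    """
    # Truncate where the integrand envelope drops to exp(-40), i.e. solve
    # sigma^2 T^2 / 2 + gamma T = 40 for T.
    if sigma > 0:
        t_max = (-gamma + math.sqrt(gamma ** 2 + 80.0 * sigma ** 2)) / sigma ** 2
    else:
        t_max = 40.0 / gamma
    h = t_max / n
    total = 0.0
    for j in range(n + 1):
        t = j * h
        f = math.exp(-0.5 * sigma ** 2 * t ** 2 - gamma * t) * math.cos(x * t)
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)  # Simpson weights
        total += weight * f
    return total * h / 3.0 / math.pi
```

    At x = 0 the integral has the closed form exp(γ²/(2σ²)) erfc(γ/(σ√2)) / (σ√(2π)), which gives a convenient check, and γ = 0 or σ = 0 recovers the pure Gaussian or Lorentzian. The baseline needs thousands of function evaluations per point, which is the kind of cost an optimized quadrature aims to reduce.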

  • Analysis of Plant-Pollinator Interaction Networks Using Recommendation Techniques
    Eugene Seo (Oregon State University)

    Plant-pollinator interaction networks reveal the ecological relationships between plants and pollinators, indicating which pollinators visit which plants. Given the crucial role of pollination in productivity and in extinction problems, analyzing these connections and predicting possible pairwise interactions between pollinators and plants is important for raising awareness of environmental issues. The underlying mechanisms that explain the interactions have been actively researched in ecology and biology, but little research has been done from a computer science perspective. Previous studies in biological science have predicted metrics of pollination networks, but their predictions of pairwise interactions were inaccurate. To correctly quantify pairwise interactions and foresee future ones, we need a computational model that learns the behavior of pollinators interacting with plants. In my research, I explore recommendation and link prediction techniques, which are widely used in business and e-commerce, assuming that the behavior of pollinators can be modeled within the framework of user-based recommendation systems. Throughout this work, I will try to show that a recommendation framework adapted to pollination networks can capture the underlying preferences of pollinators for plants and make accurate link predictions for missing or future interactions. I am currently developing a recommendation model using matrix factorization techniques and evaluating it on a dataset collected in the central Cascades, Oregon, through the Eco-Informatics Summer Institute (EISI) program. I expect our experimental results to indicate whether our recommendation models can discover the latent features of plant-pollinator interactions, or whether there are difficulties in applying user-focused approaches to plant-pollinator interactions.
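
    A minimal matrix factorization sketch in the recommendation framing above: pollinators play the role of users, plants the role of items, and a binary matrix R records observed visits. It fits R ≈ P Qᵀ on observed entries only, using full-batch gradient descent with L2 regularization. This is one standard variant chosen for illustration, not necessarily the model under development; the function names, rank, learning rate, and regularization strength are all assumptions.

```python
import numpy as np


def factorize(R, mask, rank=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit R ~= P @ Q.T on entries where mask == 1.

    Rows of R are pollinators (the 'users'), columns are plants (the 'items').
    Minimizes 0.5 * sum(mask * (R - P Q^T)^2) + 0.5 * reg * (|P|^2 + |Q|^2).
    """
    rng = np.random.default_rng(seed)
    n_poll, n_plant = R.shape
    P = 0.1 * rng.standard_normal((n_poll, rank))
    Q = 0.1 * rng.standard_normal((n_plant, rank))
    for _ in range(epochs):
        E = mask * (R - P @ Q.T)       # residuals on observed entries only
        P_grad = -E @ Q + reg * P      # both gradients use the current P, Q
        Q_grad = -E.T @ P + reg * Q
        P -= lr * P_grad
        Q -= lr * Q_grad
    return P, Q


def masked_mse(R, mask, P, Q):
    """Mean squared error over observed entries."""
    E = mask * (R - P @ Q.T)
    return float((E ** 2).sum() / mask.sum())
```

    After fitting, `P[i] @ Q[j]` scores an unobserved pollinator-plant pair; ranking these scores per pollinator is how such a model would propose missing or future links.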

  • Dynamic influence maximization under network uncertainties
    Bryan Wilder (University of Southern California)

    This work focuses on new challenges in influence maximization inspired by non-profits' use of social networks to effect behavioral change in their target populations. Influence maximization is a multiagent problem where the challenge is to select the most influential agents from a population connected by a social network. Specifically, this work is motivated by the problem of spreading messages about HIV prevention among homeless youth using their social network. We show how to compute solutions which are provably close to optimal when the parameters of the influence process are unknown. We then extend our algorithm to a dynamic setting where information about the network is revealed at each stage. A real-world pilot study using this algorithm was carried out in conjunction with a homeless shelter in the Los Angeles area. Results from the study show the benefits of an algorithmic approach over the heuristic currently used by shelters.
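
    For orientation, the classic baseline this line of work builds on is greedy seed selection under the independent cascade model, with expected spread estimated by Monte Carlo simulation. The sketch below shows that standard baseline only. It assumes a single known propagation probability `p`, which is exactly the kind of parameter the work treats as unknown, and it omits the dynamic setting; `simulate_ic` and `greedy_seeds` are hypothetical names.

```python
import random


def simulate_ic(graph, seeds, p, rng):
    """One independent-cascade run. Each newly activated node gets a single
    chance to activate each inactive neighbor with probability p.
    Returns the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)


def greedy_seeds(graph, k, p, runs=200, seed=0):
    """Greedily add the node with the largest Monte Carlo estimate of expected spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best, best_val = None, -1.0
        for v in graph:
            if v in chosen:
                continue
            val = sum(simulate_ic(graph, chosen + [v], p, rng) for _ in range(runs)) / runs
            if val > best_val:
                best, best_val = v, val
        chosen.append(best)
    return chosen
```

    The graph is an adjacency dictionary, so an undirected friendship network simply lists each edge in both directions.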

  • Effective Feature Learning based on Deep Networks for Urban Heat Island Prediction
    Sungyong Seo (University of Southern California)

    Climate change and urban air pollution are two of our society's great sustainability challenges. Understanding the most effective methods for increasing sustainability in urban areas is critical, as over half of the world's population lives in cities. An important problem in urban sustainability is the urban heat island (UHI) effect, a phenomenon whereby urban areas have higher surface and atmospheric temperatures than surrounding suburban and rural regions. These higher temperatures increase building cooling energy use during summer, can negatively impact outdoor human thermal comfort, and can exacerbate urban air pollution and heat waves, which are the leading cause of mortality and morbidity among all types of natural disaster. In this work, we propose a framework that automatically infers high-level feature representations from large-scale observational UHI data using deep networks. With these latent features, we show that the complex non-linear transformation of observed variables can be captured and used to predict changes in temperature.

  • Exact Inference for Integer Latent-Variable Models
    Kevin Winner (UMass Amherst)

    Graphical models with count-valued latent variables arise in a number of areas. However, standard inference algorithms do not apply to these models due to the infinite support of the latent variables. Winner & Sheldon (2016) recently developed a new exact inference technique that represents and manipulates countably infinite factors using probability generating functions (PGFs). Winner & Sheldon (2017) extended the technique of PGF inference to a broad class of integer latent-variable models using techniques from the autodiff literature. In this talk/poster I will cover both the fundamentals of and latest advances in performing PGF inference and demonstrate how to apply the technique to solve important problems in ecology and epidemiology.
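
    A minimal illustration of the PGF idea, under assumptions chosen for the sketch: a latent count N ~ Poisson(λ) is observed through binomial thinning, Y | N ~ Binomial(N, ρ). The PGF of Y is F(1 - ρ + ρs) with F(u) = exp(λ(u - 1)), and P(Y = y) is the y-th Taylor coefficient of that PGF at s = 0. The code extracts those coefficients with truncated power-series arithmetic, a small hand-rolled stand-in for the autodiff machinery mentioned above; the function names are hypothetical and this is not the Winner & Sheldon implementation.

```python
import math


def series_exp(f, order):
    """Taylor coefficients of exp(f(s)) up to s**order.

    f holds the coefficients f[0..order]. Uses g' = f' g, which gives
    n * g[n] = sum_{k=1..n} k * f[k] * g[n-k].
    """
    g = [0.0] * (order + 1)
    g[0] = math.exp(f[0])
    for n in range(1, order + 1):
        g[n] = sum(k * f[k] * g[n - k] for k in range(1, n + 1)) / n
    return g


def thinned_poisson_pmf(lam, rho, max_count):
    """P(Y = y) for y = 0..max_count, read off the PGF of Y.

    Substituting u = 1 - rho + rho*s into F(u) = exp(lam*(u - 1)) gives the
    exponent lam*(u - 1) = -lam*rho + lam*rho*s as a power series in s.
    """
    inner = [0.0] * (max_count + 1)
    inner[0] = -lam * rho
    if max_count >= 1:
        inner[1] = lam * rho
    return series_exp(inner, max_count)
```

    For this toy model the answer is known analytically (Y ~ Poisson(λρ)), which makes it easy to confirm that reading probabilities off PGF coefficients is exact; the value of the technique lies in models where no such closed form is available.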

  • Exploration of Objective Functions for Optimal Placement of Weather Stations
    Amelia Snyder (Oregon State University)

    Many regions of Earth lack ground-based sensing of weather variables. For example, most countries in Sub-Saharan Africa do not have reliable weather station networks. This absence of sensor data has many consequences, ranging from public safety (poor prediction and detection of severe weather events) to agriculture (lack of crop insurance) to science (reduced quality of worldwide weather forecasts, climate change measurement, etc.). The Trans-African Hydro-Meteorological Observatory (TAHMO) project seeks to address these problems by deploying and operating a large network of weather stations throughout Sub-Saharan Africa. To design the TAHMO network, we must determine where to locate each weather station. We can formulate this as the following optimization problem: determine a set of N sites that jointly optimize the value of an objective function. The purpose of this poster is to propose and assess several objective functions. In addition to standard objectives (e.g., minimizing the summed squared error of interpolated values over the entire region), we consider objectives that minimize the maximum error over the region and objectives that optimize the detection of extreme events. An additional issue is that each station measures more than 10 variables. How should we balance the accuracy of our interpolated maps across variables? Weather sensors inevitably drift out of calibration or fail altogether. How can we incorporate robustness to failed sensors into our network design? Another important requirement is that the network should make it possible to detect failed sensors by comparing their readings with those of other stations. How can this requirement be met? We invite everyone to join the discussion at our poster by proposing additional objectives, identifying additional issues to consider, and expanding our bibliography of relevant papers.
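
    The contrast between a summed-error objective and a maximum-error objective can be made concrete with a small greedy site-selection sketch. It assumes, purely for illustration, that the interpolation error at a point is proxied by the squared distance to the nearest station; real objectives would use an interpolation model such as kriging variance. The names `sum_sq_objective`, `max_objective`, and `greedy_place` are hypothetical.

```python
def nearest_sq_dist(point, sites):
    """Squared distance from a region point to its nearest station (error proxy)."""
    return min((point[0] - s[0]) ** 2 + (point[1] - s[1]) ** 2 for s in sites)


def sum_sq_objective(region, sites):
    """Summed error proxy over all region points (lower is better)."""
    return sum(nearest_sq_dist(p, sites) for p in region)


def max_objective(region, sites):
    """Worst-case error proxy over the region (lower is better)."""
    return max(nearest_sq_dist(p, sites) for p in region)


def greedy_place(candidates, region, n, objective):
    """Choose n sites one at a time, each time adding the candidate that most
    reduces the objective given the sites already chosen."""
    chosen = []
    for _ in range(n):
        remaining = [c for c in candidates if c not in chosen]
        best = min(remaining, key=lambda c: objective(region, chosen + [c]))
        chosen.append(best)
    return chosen
```

    Swapping `objective` is all it takes to compare designs; a robustness objective could, for instance, score each design by its worst case after removing any one station.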

  • Firebird: Predicting Fire Risk and Prioritizing Fire Inspections in Atlanta
    Wenwen Zhang (Georgia Tech / Virginia Tech)

    The Atlanta Fire Rescue Department (AFRD), like many municipal fire departments, actively works to reduce fire risk by inspecting commercial properties for potential hazards and fire code violations. However, AFRD's fire inspection practices relied on tradition and intuition, with no data-driven process for prioritizing fire inspections or identifying new properties requiring inspection. In collaboration with AFRD, we developed the Firebird framework to help municipal fire departments identify and prioritize commercial property fire inspections, using machine learning, geocoding, and information visualization. Firebird computes fire risk scores for over 5,000 buildings in the city, with true positive rates of up to 71% in predicting fires. It has identified 6,096 potential new commercial properties to inspect, based on AFRD's criteria for inspection. Furthermore, through an interactive map, Firebird integrates and visualizes fire incidents, property information, and risk scores to help AFRD make informed decisions about fire inspections. Firebird has already begun to make a positive impact at both local and national levels. It is improving AFRD's inspection processes and Atlanta residents' safety, and was highlighted by the National Fire Protection Association (NFPA) as a best practice for using data to inform fire inspections.

  • Grid-based land-use composition and configuration optimization for watershed stormwater management
    Ge Zhang (Georgia Tech)

    This paper demonstrates a new method for optimizing land-use patterns to reduce the negative impacts of urbanization on watershed stormwater systems. The Yong-Ding watershed in western Beijing, China, serves as a case study. A regression model that estimates the watershed's hydrological response to land-use pattern changes is integrated with a land-use allocation model to determine the optimal land-use pattern for minimizing peak flow or total volume at the watershed outlet. The system also uses the CLUE-S model to generate empirical land-use patterns under different development intensities and then determines the land-use pattern change constraints for each optimization process. The impacts of optimization are assessed by comparing the land-use pattern characteristics and watershed hydrology of empirical and optimal scenarios under the same development intensity. The results of the hydrological evaluation suggest that, compared to land-use location control, land-use composition and configuration control may be a more powerful method for minimizing the negative hydrological impact of urbanization.

  • illiad: InteLLigent Invariant and Anomaly Detection in Cyber Physical Systems
    Nikhil Muralidhar (Virginia Tech)

    Cyber physical systems (CPSs) are now ubiquitous in urban environments. Such systems serve as the backbone of numerous critical infrastructure applications, from smart grids to IoT installations. Scalable and seamless operation of such CPSs requires sophisticated tools for monitoring the time series progression of the system, dynamically tracking relationships, and alerting operators to anomalies. We present an online monitoring system (illiad) that models the state of the CPS as a function of the relationships among its constituent components, using a combination of model-based and data-driven strategies. In addition to accurate inference for state estimation and anomaly tracking, illiad also exploits the underlying network structure of the CPS (wired or wireless) for state estimation. We demonstrate the application of illiad in two diverse settings: a wireless sensor mote application and an IEEE 33-bus microgrid.
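
    The idea of an invariant between components can be illustrated with the simplest possible case: a linear relationship between two sensor streams, learned on normal data, with an alert whenever a new reading's residual exceeds k standard deviations. This is a generic stand-in for illustration, not illiad's model; the linear form, the threshold rule, and the names `fit_invariant` and `flag_anomalies` are assumptions.

```python
import statistics


def fit_invariant(xs, ys):
    """Least-squares fit of the invariant y ~= a*x + b on normal operating data.
    Returns (a, b, residual standard deviation)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    resid = [y - (a * x + b) for x, y in zip(xs, ys)]
    return a, b, statistics.pstdev(resid)


def flag_anomalies(xs, ys, a, b, sd, k=3.0):
    """Indices of readings whose residual from the invariant exceeds k * sd."""
    return [i for i, (x, y) in enumerate(zip(xs, ys)) if abs(y - (a * x + b)) > k * sd]
```

    In an online setting the fit would be refreshed over a sliding window, and a broken invariant points at the pair of components involved, which is what makes relationship-based monitoring useful for localizing faults.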

  • Learning in Discrete Optimization
    Elias Khalil (Georgia Tech)

    Mixed Integer Programs (MIPs) are solved exactly by tree-based branch-and-bound search. However, various components of the algorithm involve decisions that are currently made heuristically. We propose instead to use machine learning (ML) to make better-informed, input-specific decisions during MIP branch-and-bound, with the aim of improving the overall performance of MIP solvers. To illustrate the potential of ML in MIP, we have so far tackled branching variable selection and primal heuristic selection, both crucial components of the branch-and-bound algorithm. Our experimental results show that, for both tasks, ML approaches can significantly improve solver performance on heterogeneous benchmark instances from MIPLIB and on certain homogeneous families of instances. For certain classical combinatorial optimization problems, we also propose a deep graph embedding framework for learning powerful greedy heuristics, and show that the learned heuristics are competitive with specialized heuristics or approximation algorithms for Vertex Cover, Max Cut, and the TSP.
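
    A learned greedy heuristic for Vertex Cover keeps the familiar greedy skeleton and replaces only the node-scoring rule. The sketch below shows that skeleton with a pluggable `score` function; the default is the classic max-degree rule, standing in for a learned graph-embedding score, which is not reproduced here. `greedy_vertex_cover` and `degree_score` are hypothetical names.

```python
def degree_score(node, uncovered):
    """Classic heuristic: number of still-uncovered edges touching the node."""
    return sum(1 for e in uncovered if node in e)


def greedy_vertex_cover(edges, score=degree_score):
    """Repeatedly add the highest-scoring node until every edge is covered.

    `score(node, uncovered_edges)` is the slot where a learned model's output
    would go. Ties break toward the smallest node label.
    """
    uncovered = {tuple(e) for e in edges}
    cover = []
    while uncovered:
        nodes = sorted({v for e in uncovered for v in e})
        best = max(nodes, key=lambda v: score(v, uncovered))
        cover.append(best)
        uncovered = {e for e in uncovered if best not in e}
    return cover
```

    Because the loop only stops once every edge has an endpoint in the cover, any scoring function yields a valid cover; the score affects only its size, which is what a learned policy tries to minimize.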

  • Machine Learning for Wildlife Conservation with UAVs
    Elizabeth Bondi (University of Southern California)

    With the recent increase in poaching, particularly of elephants and rhinoceroses in Africa, new efforts are being made to protect animals. One such effort uses unmanned aerial vehicles (UAVs) to fly over parks and search for poachers, so that park rangers can be sent to intercept them. Our goal is to detect poachers and animals automatically rather than have people monitor the video all night. We approach this with deep learning, specifically Faster R-CNN. Using a pre-trained, fine-tuned network, we have achieved an mAP of 0.046. The next step is to send this prototype to our collaborators for testing in Africa; further research will incorporate motion.
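
    The mAP figure depends on how detections are matched to ground-truth boxes. The sketch below shows the two building blocks: intersection over union (IoU) and PASCAL VOC-style matching, in which detections are processed in descending score order and each ground-truth box can be claimed at most once. The 0.5 IoU threshold follows the common VOC convention and is an assumption here, as are the function names; precision and recall at each rank, and from them average precision, follow from these true-positive decisions.

```python
def iou(box_a, box_b):
    """Intersection over union of axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_true_positives(detections, ground_truth, thresh=0.5):
    """detections: list of (score, box). A detection is a true positive if its
    best-overlapping ground-truth box has IoU >= thresh and is not yet matched;
    duplicates of an already-matched box count as false positives."""
    matched = set()
    tp = 0
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if not ground_truth:
            break
        overlaps = [iou(box, gt) for gt in ground_truth]
        j = max(range(len(overlaps)), key=overlaps.__getitem__)
        if overlaps[j] >= thresh and j not in matched:
            matched.add(j)
            tp += 1
    return tp
```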

  • Maximizing the Spread of Cascade with Correlated Stochastic Events: Model and Algorithms
    Yexiang Xue (Cornell University)

    Stochastic network design maximizes the expected spread of cascades in networks by adding nodes or edges. Previous approaches rely on independent coin flips to decide the connectivity of each edge, which requires assuming that the stochasticity of one edge is independent of the others. In this paper, we consider a more realistic setting in which the stochasticity of multiple edges can be correlated, due to natural disasters or regional events that affect all edges in a given area. We propose using a Markov random field (MRF) to model the correlation and define a new stochastic network design framework. We provide a novel algorithm based on Sample Average Approximation (SAA) coupled with a Gibbs or XOR sampler. We also propose a novel way to evaluate policies to within a constant approximation factor based on hashing-based solution counting. Experimental results on real road network data show that the policies produced by SAA with the XOR sampler have higher quality and lower variance than those produced by SAA with the Gibbs sampler.

  • Network Analysis of Global Embodied Fossil Fuel Energy Flows
    Anthony Harding (Georgia Institute of Technology)

    The world energy system is a global, interdependent system joining the environmental system with the economic system. With increased globalization, the location of demand for goods and services and the location of the energy extraction and exploitation needed to produce them are increasingly distinct, which makes climate and energy policy design increasingly complex. In this paper, leveraging data from the World Input-Output Database, we combine input-output analysis with network science techniques to describe and examine the inter-country flows of embodied fossil fuel energy across 35 sectors and 41 countries for three fossil fuel sources and their aggregate. We compute several network- and node-level properties on the relevant edges of each network. From our analysis, we identify several countries that are consistently central to the network. We posit that these countries have the potential to be leaders in implementing policy to mitigate energy consumption and energy leakage through production processes.

  • Optimal recovery planning for endangered species under regime uncertainty
    Ryan Finseth (Cornell University)

    The Endangered Species Act requires agencies to rehabilitate populations until population-based delisting criteria are met and management and protections are no longer necessary. Often, lingering threats in the environment require management to continue after delisting criteria have been achieved. The presence of threats often results in demonstrably different system dynamics, and this variation in system dynamics can be modeled as a regime shift. We propose to develop an optimization model, based on the extended POMDP framework developed by Fackler and Pacifici (2014), that seeks a management plan achieving predetermined delisting criteria for an endangered species under regime uncertainty at minimum economic cost. Solving the model generates state-dependent decision rules that comprise a recovery plan for the species. We apply our model to the management of the California condor, a critically endangered species threatened by pervasive lead poisoning throughout its range in the western United States and Mexico.
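
    Under regime uncertainty, the manager cannot observe which regime (for example, threat present or absent) governs the dynamics, so a POMDP-style plan acts on a belief over regimes that is updated by Bayes' rule after each observed population transition. The sketch below shows only that belief update, assuming the regime does not switch between observations and assuming, for illustration only, Poisson population dynamics with a regime-specific growth rate. It does not reproduce the Fackler and Pacifici framework; all names and rates are hypothetical.

```python
import math


def update_regime_belief(belief, prev_pop, new_pop, transition_lik):
    """Bayes update of P(regime) after observing prev_pop -> new_pop.

    belief: {regime: prior probability}
    transition_lik(regime, prev_pop, new_pop): likelihood of the observed
    transition under that regime's dynamics.
    """
    unnorm = {r: b * transition_lik(r, prev_pop, new_pop) for r, b in belief.items()}
    z = sum(unnorm.values())
    return {r: v / z for r, v in unnorm.items()}


def poisson_transition_lik(growth_rates):
    """Hypothetical dynamics: new_pop ~ Poisson(growth_rates[regime] * prev_pop)."""
    def lik(regime, prev_pop, new_pop):
        mu = growth_rates[regime] * prev_pop
        return math.exp(new_pop * math.log(mu) - mu - math.lgamma(new_pop + 1))
    return lik
```

    A recovery policy would map each (population, belief) pair to an action; the belief is what lets management continue sensibly after delisting criteria are met but the threat regime is still uncertain.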

  • Optimal Wildlife Reserve Design using Spatial Capture-Recapture Models
    Amrita Gupta (Georgia Institute of Technology)

    Wildlife reserves are a key strategy for counteracting the loss of biodiversity due to human activity. For reserves to be effective in the long run, they need to protect both the current populations of target species and the ecological processes on which those species rely. Spatial capture-recapture (SCR) models have recently emerged as a powerful tool for modeling ecological processes with spatial dependencies, such as foraging and migration. In this work, we use species models derived from spatial capture-recapture to drive reserve design optimization. In particular, we study the effects of estimation uncertainty in the species models on the optimization of different wildlife reserve design objectives.
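
    One way an SCR-derived species model can feed a reserve objective: SCR models commonly describe space use around an individual's activity center with a half-normal kernel exp(-d²/(2σ²)), so the share of an individual's use falling inside a candidate reserve can be computed cell by cell and summed over the estimated density of activity centers. The sketch below assumes that kernel and this particular objective for illustration; it is not necessarily the objective studied in the work, and the names are hypothetical.

```python
import math


def use_weights(center, cells, sigma):
    """Half-normal space-use weight of each cell for an individual whose
    activity center is `center`."""
    return [
        math.exp(-((cx - center[0]) ** 2 + (cy - center[1]) ** 2) / (2 * sigma ** 2))
        for cx, cy in cells
    ]


def protected_use(centers_density, cells, reserve, sigma):
    """Expected amount of space use (in individual-equivalents) inside the reserve.

    centers_density: {(x, y): expected number of activity centers there}
    reserve: set of indices into `cells` that are protected
    """
    total = 0.0
    for center, density in centers_density.items():
        w = use_weights(center, cells, sigma)
        total += density * sum(w[i] for i in reserve) / sum(w)
    return total
```

    Estimation uncertainty enters naturally here: evaluating the same reserve under posterior draws of the density surface and of σ gives a distribution of objective values rather than a single number.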

  • Optimizing CitiBike
    Aaron Ferber (Georgia Institute of Technology)

    The sustainable operation of bike-sharing systems involves optimization problems as diverse as online repair scheduling, incentive design, and inventory management. For all of these, machine learning and simulation tools can be used to predict ridership and to examine the effectiveness of the optimization. This poster reviews methods developed by Cornell's CitiBike research group.

  • Relaxation Methods for Constrained Matrix Factorization Problems: Solving the Phase Mapping Problem in Materials Discovery
    Junwen Bai (Cornell University)

    Matrix factorization is a robust and widely adopted technique in data science, in which a given matrix is decomposed as the product of low-rank matrices. We study a challenging constrained matrix factorization problem in materials discovery, the so-called phase mapping problem. We introduce a novel "lazy" Interleaved Agile Factor Decomposition (IAFD) approach that relaxes and postpones non-convex constraint sets (the lazy constraints), iteratively enforcing them when violations are detected. IAFD interleaves multiplicative gradient-based updates with efficient modular algorithms that detect and repair constraint violations, while still ensuring fast run times. Experimental results show that IAFD is several orders of magnitude faster than previous approaches and generally finds considerably better solutions. IAFD solves a key problem in materials discovery while also paving the way towards tackling constrained matrix factorization problems in general, with broader implications for data science.
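
    The multiplicative updates that IAFD interleaves with constraint repair are in the family of the classic Lee-Seung rules for nonnegative matrix factorization. The sketch below shows only that unconstrained base step for the Frobenius loss; IAFD's lazy constraint detection and repair are not shown, and the function name, random initialization, and iteration count are assumptions.

```python
import numpy as np


def nmf_multiplicative(V, rank, iters=1000, eps=1e-10, seed=0):
    """Lee-Seung multiplicative updates for V ~= W @ H with W, H >= 0.

    Each update multiplies by a ratio of nonnegative terms, so positivity is
    preserved and the Frobenius error does not increase. eps guards against
    division by zero.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

    In a lazy-constraint scheme, a step like this would run on the relaxed problem, with a separate check after some iterations that detects violated constraints and repairs W or H before factorization continues.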

  • Social Network-Based Substance Abuse Interventions for Homeless Youth
    Aida Rahmattalabi (University of Southern California)

    Influence maximization has been widely studied in the past decade, with applications ranging from viral marketing to raising awareness about HIV. In most of these studies, however, the actions consist of selecting seed nodes or key edges, and the networks are usually assumed to be static. Motivated by social network-based interventions for reducing substance use, in this work we focus on influence-based partitioning of social networks: given the social network of a group of youth who use drugs or are at risk, we aim to find the optimal grouping. This matters because during these interventions the youths' social network changes: they are encouraged to form supportive, useful ties and to break harmful ones, which is how interventionists try to reduce drug consumption. However, if groups are not chosen carefully, they can produce the opposite result, a phenomenon known as "deviancy training". We are therefore faced with a dynamic network and two different influence processes within it, and we aim to find the grouping that minimizes negative influence as much as possible. This is a challenging problem in social work research, and we aim to use AI to help understand its different aspects.

  • Violence Minimization in Homeless Youth
    Robin Petering & Ajitesh Srivastava (University of Southern California)

    Violence among homeless young adults (HYA) is a complex phenomenon that has proximal physical consequences, including severe injury and death, and is also related to indirect negative psychological and behavioral health outcomes. Reducing violence is imperative for improving HYAs' ability to safely and successfully exit homelessness and lead fulfilling lives. However, effective interventions targeting violence in HYA and other at-risk populations are difficult to develop and implement because of the complex constellation of intrinsic and extrinsic factors that contribute to violence. The proposed pilot aims to reduce violence in HYA by developing and implementing a Mindfulness and Yoga Peer Change Agent (MYPCA) intervention that harnesses Artificial Intelligence (AI) to optimize intervention effectiveness. The pilot leverages existing social network analysis methods, peer change agent (PCA) recruitment methods, and intervention dissemination structures that have previously demonstrated effectiveness in this population and setting, and uses them to deliver novel content that reduces violence and promotes adaptive emotion regulation behavior. The pilot will contribute three innovations to HYA intervention approaches and implementation science: 1) the development of a mindfulness and yoga intervention that uses a train-the-trainer approach, 2) the measurement of intervention impact with physiological and cognitive indicators, and 3) the assessment of how the diffusion of violent behavior is interrupted within a social network. Additionally, the proposed study will further refine an AI algorithm that enhances the selection of PCAs using a non-progressive network diffusion model under assumptions of network data uncertainty.
    The results of this pilot will allow the research team to subsequently apply for an R21 or R01 grant to test the effectiveness of the intervention in a large-scale randomized clinical trial with the National Center for Complementary and Integrative Health or the National Institute of Mental Health.
