Georgia Tech @ KDD 2019

Aug. 4-8
Anchorage, Alaska
Data Mining Experts Showcase Research at KDD 2019 

Data mining allows us to extract meaningful insights from raw data, and those insights influence decisions about almost every aspect of our lives, from healthcare and education to social media. The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2019) is the premier conference that brings data mining leaders from across industries together to explore and discuss the latest findings in this quickly evolving field of research.
 
This year, Georgia Tech will present eight papers from 18 different authors at KDD. These papers cover topics ranging from identifying concepts within social science publications and the unsupervised construction of knowledge graphs to the defense of machine learning models for image classification, and more.
 
Many of those attending and presenting from Georgia Tech hail from the School of Computational Science and Engineering (CSE) and the Georgia Tech Research Institute (GTRI). CSE Assistant Professor Chao Zhang leads Georgia Tech’s presence with three research papers and will serve as a member of the Research Track Program Committee.


Read on to find out more about Georgia Tech's involvement in this year's KDD conference.
 

"Data science is transforming our society, and KDD is the flagship conference in data science. So many of our talented students and faculty are showcasing cutting-edge research results, giving invited talks, or organizing workshops at KDD."

Chao Zhang, Assistant Professor, School of Computational Science and Engineering

Georgia Tech Papers

 

CubeNet: Multi-Facet Hierarchical Heterogeneous Network Construction, Analysis, and Mining
Carl Yang, Dai Teng, Siyang Liu, Sayantani Basu, Jieyu Zhang, Jiaming Shen, Chao Zhang, Jingbo Shang, Lance Kaplan, Timothy Harratty, Jiawei Han

Extracting Knowledge for Adversarial Detection and Defense in Deep Learning
Scott Freitas, Shang-Tse Chen, Polo Chau

The Efficacy of SHIELD under Different Threat Models
Cory Cornelius, Nilaksh Das, Shang-Tse Chen, Li Chen, Michael Kounavis, Polo Chau

Unsupervised Construction of Knowledge Graphs from Text and Code
Kun Cao, James Fairbanks

Curating for Next-Generation Social Science
Erica Briscoe, Alexandra Trani, Scott Appling

MLsploit: A Framework for Interactive Experimentation with Adversarial Machine Learning Research
Nilaksh Das, Siwei Li, Chanil Jeon, Jinho Jung, Shang-Tse Chen, Carter Yagemann, Evan Downing, Haekyu Park, Evan Yang, Li Chen, Michael Kounavis, Ravi Sahita, David Durham, Scott Buck, Polo Chau, Taesoo Kim, Wenke Lee

State-Sharing Sparse Hidden Markov Models for Personalized Sequences
Hongzhi Shi, Chao Zhang, Quanming Yao, Yong Li, Funing Sun, Depeng Jin

TopicMine: User-Guided Topic Mining by Category-Oriented Embedding
Yu Meng, Jiaxin Huang, Zihan Wang, Chenyu Fan, Guangyuan Wang, Chao Zhang, Jingbo Shang, Lance Kaplan, Jiawei Han

WORKSHOPS

Georgia Tech’s involvement extends beyond published research: several faculty and students are also serving on workshop committees or appearing as guest speakers.

Research Highlights

MLsploit: A Framework for Interactive Experimentation with Adversarial Machine Learning Research

Nilaksh Das, Siwei Li, Chanil Jeon, Jinho Jung, Shang-Tse Chen, Carter Yagemann, Evan Downing, Haekyu Park, Evan Yang, Li Chen, Michael Kounavis, Ravi Sahita, David Durham, Scott Buck, Polo Chau, Taesoo Kim, Wenke Lee


MLsploit is the first user-friendly, cloud-based system that enables researchers and practitioners to rapidly evaluate and compare state-of-the-art adversarial attacks and defenses for machine learning models. The tool was jointly developed by researchers at Georgia Tech and Intel, and is open-source.

Stop by the Project Showcase on Wednesday, August 7, 2019, from 8:30 AM to 3:30 PM in Idlughet Hall 3, Street Level, in the Dena’ina Center.

State-Sharing Sparse Hidden Markov Models for Personalized Sequences

Hongzhi Shi, Chao Zhang, Quanming Yao, Yong Li, Funing Sun, Depeng Jin

Sequential behavior modeling is crucial in many applications, but it is often difficult to perform at a personalized level because each individual contributes too little data. The paper proposes a new technique to address this data scarcity bottleneck in personalized sequential modeling. The technique, which is based on the classic hidden Markov model, leverages all the sequences to collectively infer state emissions from scarce data, while maintaining small personalized transition matrices to capture personalized sequential patterns. The technique can be used to improve personalized sequential prediction in applications including online recommendation, mobility modeling, and personalized medicine.
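
To make the parameter-sharing idea concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' algorithm: it does no learning and no sparsity handling. It only shows one emission table shared by all users alongside a small transition matrix per user, scored with the standard forward algorithm. All names and numbers (`SHARED_EMISSION`, `USER_TRANSITIONS`, the two example users) are hypothetical.

```python
# Illustrative sketch only: shared emissions, per-user transitions.
# Every name and number below is hypothetical, not taken from the paper.

def forward_likelihood(obs, initial, transition, emission):
    """Return P(obs) under a discrete HMM using the forward algorithm."""
    n_states = len(initial)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [initial[s] * emission[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            emission[s][symbol]
            * sum(alpha[r] * transition[r][s] for r in range(n_states))
            for s in range(n_states)
        ]
    return sum(alpha)

# One emission table shared by every user (rows: states, columns: symbols).
SHARED_EMISSION = [
    [0.9, 0.1],  # state 0 mostly emits symbol 0
    [0.2, 0.8],  # state 1 mostly emits symbol 1
]
INITIAL = [0.5, 0.5]

# Small per-user transition matrices capture individual sequential habits.
USER_TRANSITIONS = {
    "sticky_user": [[0.9, 0.1], [0.1, 0.9]],     # tends to stay in a state
    "switching_user": [[0.1, 0.9], [0.9, 0.1]],  # tends to alternate states
}

def score(user, obs):
    """Likelihood of obs for one user: personal transitions, shared emissions."""
    return forward_likelihood(obs, INITIAL, USER_TRANSITIONS[user], SHARED_EMISSION)

if __name__ == "__main__":
    alternating = [0, 1, 0, 1]
    for user in USER_TRANSITIONS:
        print(user, score(user, alternating))
```

In this toy setup, an alternating sequence scores higher for the user whose transition matrix favors switching states, even though both users rely on the same emission table.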

This paper will be presented Wednesday, Aug. 7, from 10 AM to 12 PM at Research Track Session RT12 on the Ground Level of the Egan Center.

Hands-on Tutorial: Cloud-based Data Science at the Speed of Thought Using RAPIDS - the Open GPU Data Science Ecosystem
Brad Rees, Bartley Richardson, Tom Drabas, Keith Kraus, Corey Nolet, Juan-Arturo Herrera, Haekyu Park


NVIDIA's RAPIDS suite offers open source software libraries built to execute end-to-end data science and analytics pipelines on GPUs. This tutorial presents a collection of data science problems that introduce components and features of RAPIDS, with a focus on accelerating a large data science workflow in Python on multiple GPUs.

This tutorial also includes a segment of research contributed by CSE Ph.D. student and NVIDIA intern Haekyu Park. Park's work presents a personalized PageRank algorithm on temporal dynamic graphs and explores the algorithm's potential use for visual analytics.
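
For readers unfamiliar with personalized PageRank, the following Python sketch shows the basic idea on a small static graph using power iteration. It is an illustration under stated assumptions, not Park's algorithm: it ignores the temporal, dynamic aspect entirely and runs on the CPU. The graph, the function name `personalized_pagerank`, and the parameter defaults are hypothetical.

```python
# Illustrative sketch only: personalized PageRank on a static toy graph.
# The graph and all names are hypothetical; the temporal setting is not modeled.

def personalized_pagerank(graph, seed, damping=0.85, iterations=100):
    """Rank nodes by how often a random walk that restarts at `seed` visits them.

    `graph` maps each node to a list of its out-neighbors; every neighbor
    must also appear as a key.
    """
    rank = {node: (1.0 if node == seed else 0.0) for node in graph}
    for _ in range(iterations):
        updated = {node: 0.0 for node in graph}
        for node, neighbors in graph.items():
            if neighbors:
                # Split this node's damped rank evenly among its out-neighbors.
                share = damping * rank[node] / len(neighbors)
                for neighbor in neighbors:
                    updated[neighbor] += share
            else:
                # A node with no out-links sends its damped rank back to the seed.
                updated[seed] += damping * rank[node]
        # With probability (1 - damping) the walk restarts at the seed node.
        updated[seed] += 1.0 - damping
        rank = updated
    return rank

GRAPH = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

if __name__ == "__main__":
    for node, value in sorted(personalized_pagerank(GRAPH, seed="a").items()):
        print(node, round(value, 4))
```

Unlike global PageRank, the restart always returns to the chosen seed, so the scores describe importance relative to that node. In the toy graph, node `d` is unreachable from `a` and therefore receives no rank when `a` is the seed.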

The tutorial will be held Tuesday, August 6, 2019, from 1:30 PM to 4:30 PM in Kahtnu, Level 2, in the Dena’ina Center.

JOIN THE CONVERSATION

Use #KDD2019 and #CSE