Graph Data Science

Fri September 10, 01:30 PM–02:00 PM
Session type: Live
Start time: 13:30
End time: 14:00

Graphs are amazing: there's the emerging graph ML work in deep learning, the rise of knowledge graphs in industry, semantic technologies, powerful graph algorithms, ginormous graph databases, statistical relational learning with probabilistic graphs, interactive graph visualizations, and so on. While these different "camps" among graph experts have typically shared little common ground, and their techniques have often required specialized platforms, that has changed.

A team of open source developers has been working on kglab to integrate many different libraries related to graph work into an abstraction that plays well with popular data science tools in Python: pandas, NumPy, scikit-learn, Parquet, spaCy, PyTorch, RAPIDS, etc. Moreover, the graph abstraction layer is engineered to follow data engineering practices and integrate with popular distributed systems for scale-out.
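
As a rough sketch of what that abstraction looks like in practice (assuming the kglab and rdflib APIs as published; the "ex" namespace and the recipe triples here are hypothetical placeholders, not material from the session):

```python
# a minimal sketch, assuming the published kglab + rdflib APIs;
# the "ex" namespace and the triples are hypothetical placeholders
import kglab

# construct a knowledge graph with a custom namespace
kg = kglab.KnowledgeGraph(
    name="recipe demo",
    namespaces={"ex": "http://example.org/"},
)

ex = kg.get_ns("ex")

# add a few triples programmatically
kg.add(ex.pancakes, ex.hasIngredient, ex.flour)
kg.add(ex.pancakes, ex.hasIngredient, ex.milk)

# run a SPARQL query, getting the results back as a pandas DataFrame
sparql = """
PREFIX ex: <http://example.org/>
SELECT ?recipe ?ingredient
WHERE { ?recipe ex:hasIngredient ?ingredient }
"""
df = kg.query_as_df(sparql)
print(df)

# serialize via Parquet for interop with the rest of the data stack
kg.save_parquet("recipes.parquet")
```

Keeping query results as DataFrames is what lets the rest of a pandas-centric workflow stay unchanged.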

This session will introduce Graph Data Science, showing how you can bring these techniques into your data analytics work. We'll demo with an open dataset from Kaggle that contains a large number of recipes, as a progressive example of graph data science: data preparation, queries, visualizations, inference, etc., as components of a self-supervised pipeline in Python. The intent is to provide integrations for many complementary techniques, so that one can "mix & match" to build Hybrid AI solutions. There isn't much code, and all code samples are wrapped as Jupyter notebooks that will run on a laptop; most of it should feel quite familiar to anyone who works with pandas.
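
For a flavor of one stage in such a pipeline, here's a hedged sketch of projecting a subgraph out of the knowledge graph and handing it to NetworkX for graph algorithms; the file name and SPARQL pattern are hypothetical, while the kglab calls follow its published API:

```python
# a hedged sketch of one pipeline stage: project a subgraph out of
# the knowledge graph, then hand it to NetworkX for graph algorithms;
# the file name and SPARQL pattern are hypothetical placeholders
import kglab
import networkx as nx

kg = kglab.KnowledgeGraph(namespaces={"ex": "http://example.org/"})
kg.load_rdf("recipes.ttl")  # hypothetical RDF export of the Kaggle recipes

# project the recipe-ingredient relation as an edge list
sparql = """
PREFIX ex: <http://example.org/>
SELECT ?recipe ?ingredient
WHERE { ?recipe ex:hasIngredient ?ingredient }
"""
subgraph = kglab.SubgraphMatrix(kg, sparql)
nx_graph = subgraph.build_nx_graph(nx.DiGraph())

# any NetworkX algorithm applies from here, e.g. ranking nodes
ranks = nx.pagerank(nx_graph)
```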

Paco Nathan (he/him)

Known as a "player/coach", with core expertise in data science, natural language, and cloud computing; ~40 years of tech industry experience, ranging from Bell Labs to early-stage start-ups. Advisor for Amplify Partners, IBM Data Science Community, Recognai, KUNGFU.AI, and Primer. Lead committer on PyTextRank and kglab. Formerly: Director of Community Evangelism at Databricks, for Apache Spark. https://derwen.ai/paco