From 0f970e73fe55277643a5ffea6437eed6df0eef90 Mon Sep 17 00:00:00 2001 From: Daniel Date: Mon, 25 Mar 2024 14:20:49 +0800 Subject: [PATCH 1/5] cleaned up notebooks added to gfql demo folder, readme update --- README.md | 4 +- demos/gfql/gfql_cpv_gpu_enchmark.ipynb | 2976 ++++++++++++++++++++++++ demos/gfql/simple_gfql_notebook.ipynb | 264 +++ 3 files changed, 3242 insertions(+), 2 deletions(-) create mode 100644 demos/gfql/gfql_cpv_gpu_enchmark.ipynb create mode 100644 demos/gfql/simple_gfql_notebook.ipynb diff --git a/README.md b/README.md index b4dcae3167..3a8fbb2166 100644 --- a/README.md +++ b/README.md @@ -147,7 +147,7 @@ It is easy to turn arbitrary data into insightful graphs. PyGraphistry comes wit g2.plot() ``` -* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), [benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb)) +* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), [chain and hop demo](demos/gfql/simple_gfql_notebook.ipynb), [benchmark](demos/gfql/gfql_cpv_gpu_enchmark.ipynb)) Run Cypher-style graph queries natively on dataframes without going to a database or Java with GFQL: @@ -1248,7 +1248,7 @@ assert 'pagerank' in g2._nodes.columns PyGraphistry supports GFQL, its PyData-native variant of the popular Cypher graph query language, meaning you can do graph pattern matching directly from Pandas dataframes without installing a database or Java -See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb) and the CPU/GPU [benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb) +See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), [starting with chain 
and hop](demos/gfql/simple_gfql_notebook.ipynb) and the CPU/GPU [benchmark](demos/gfql/gfql_cpv_gpu_enchmark.ipynb) Traverse within a graph, or expand one graph against another diff --git a/demos/gfql/gfql_cpv_gpu_enchmark.ipynb b/demos/gfql/gfql_cpv_gpu_enchmark.ipynb new file mode 100644 index 0000000000..9860cc105e --- /dev/null +++ b/demos/gfql/gfql_cpv_gpu_enchmark.ipynb @@ -0,0 +1,2976 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GZxoiU8sQDk_" + }, + "source": [ + "# GFQL CPU, GPU Benchmark\n", + "\n", + "This notebook examines GFQL property graph query performance on 1-8 hop queries using CPU + GPU modes on various real-world 100K - 100M edge graphs. The data comes from a variety of popular social networks. The single-threaded CPU mode benefits from GFQL's novel dataframe engine, and the GPU mode further adds single-GPU acceleration. Both the `chain()` and `hop()` methods are examined.\n", + "\n", + "The benchmark does not examine bigger-than-memory and distributed scenarios. The provided results here are from running on a free Google Colab T4 runtime, with a 2.2GHz Intel CPU (12 GB CPU RAM) and T4 Nvidia GPU (16 GB GPU RAM).\n", + "\n", + "## Data\n", + "From [SNAP](https://snap.stanford.edu/data/)\n", + "\n", + "| Network | Nodes | Edges |\n", + "|-------------|-----------|--------------|\n", + "| [**Facebook**](#fb)| 4,039 | 88,234 |\n", + "| [**Twitter**](#tw) | 81,306 | 2,420,766 |\n", + "| [**GPlus**](#gpl) | 107,614 | 30,494,866 |\n", + "| [**Orkut**](#ork) | 3,072,441 | 117,185,082 |\n", + "\n", + "## Results\n", + "\n", + "Definitions:\n", + "\n", + "* GTEPS: Giga (billion) edges traversed per second\n", + "\n", + "* T edges / \\$: Estimated trillion edges traversed for 1\\$ USD based on observed GTEPS and a 3yr AWS reservation (as of 12/2023)\n", + "\n", + "Tasks:\n", + "\n", + "1. 
`chain()` - includes complex pre/post processing\n", + "\n", + " **Task**: `g.chain([n({'id': some_id}), e_forward(hops=some_n)])`\n", + "\n", + "\n", + "| **Dataset** | Max GPU Speedup | CPU GTEPS | GPU GTEPS | T CPU edges / \\$ (t3.l) | T GPU edges / \\$ (g4dn.xl) |\n", + "|-------------|--------------|-------------|-------------|----------------------------|--------------------------------|\n", + "| [**Facebook**](#fb)| 1.1X | 0.66 | 0.61 | 65.7 | 10.4 |\n", + "| [**Twitter**](#tw) | 17.4X | 0.17 | 2.81 | 16.7 | 48.1 |\n", + "| [**GPlus**](#gpl) | 43.8X | 0.09 | 2.87 | 8.5 | 49.2 |\n", + "| [**Orkut**](#ork) | N/A | N/A | 12.15 | N/A | 208.3 |\n", + "| **AVG** | 20.7X | 0.30 | 4.61 | 30.3 | 79.0\n", + "| **MAX** | 43.8X | 0.66 | 12.15 | 65.7 | 208.3\n", + "\n", + "\n", + "2. `hop()` - core property search primitive similar to BFS\n", + "\n", + " **Task**: `g.hop(nodes=[some_id], direction='forward', hops=some_n)`\n", + "\n", + "\n", + "| **Dataset** | Max GPU Speedup | CPU GTEPS | GPU GTEPS | T CPU edges / \\$ (t3.l) | T GPU edges / \\$ (g4dn.xl) |\n", + "|-------------|-------------|-----------|-----------|--------------------|--------------------------------|\n", + "| [**Facebook**](#fb)| 3X | 0.47 | 1.47 | 47.0 | 25.2 |\n", + "| [**Twitter**](#tw) | 42X | 0.50 | 10.51 | 50.2 | 180.2 |\n", + "| [**GPlus**](#gpl) | 21X | 0.26 | 4.11 | 26.2 | 70.4 |\n", + "| [**Orkut**](#ork) | N/A | N/A | 41.50 | N/A | 711.4 |\n", + "| **AVG** | 22X | 0.41 | 14.4 | 41.1 | 246.8\n", + "| **MAX** | 42X | 0.50 | 41.50 | 50.2 | 711.4\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SAj8lhREEOwS" + }, + "source": [ + "## Optional: GPU setup - Google Colab" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4hrEEAAm7DTO" + }, + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "W2MF6ZsjDv3B", + "outputId": "d09118ee-55d5-49cf-b950-d1e232aa4eb2" + }, 
+ "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Mon Feb 19 04:14:57 2024 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + "|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. |\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 39C P8 9W / 70W | 0MiB / 15360MiB | 0% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "| No running processes found |\n", + "+---------------------------------------------------------------------------------------+\n" + ] + } + ], + "source": [ + "# Report GPU used when GPU benchmarking\n", + "! 
nvidia-smi" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Aikh0x4ID_wK" + }, + "outputs": [], + "source": [ + "# if in google colab\n", + "# !git clone https://github.com/rapidsai/rapidsai-csp-utils.git\n", + "# !python rapidsai-csp-utils/colab/pip-install.py\n", + "!pip install --extra-index-url=https://pypi.nvidia.com cuml-cu12 cudf-cu12 #==23.12.00 #cugraph-cu11 pylibraft_cu11 raft_dask_cu11 dask_cudf_cu11 pylibcugraph_cu11 pylibraft_cu11\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "id": "Lwekdei1dH3N", + "outputId": "a506b4fb-0dba-4e90-884e-df16cb19eebd" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'24.02.01'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 2 + } + ], + "source": [ + "import cudf\n", + "cudf.__version__" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QQpsrtwBT7sa" + }, + "source": [ + "# 1. Install & configure" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "cYjRbgkU9Sx8", + "outputId": "7d592f98-36af-4657-b8eb-80eeabd98e2f" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.3/3.3 MB\u001b[0m \u001b[31m14.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m244.4/244.4 kB\u001b[0m \u001b[31m2.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m332.3/332.3 kB\u001b[0m \u001b[31m8.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h" + ] + } + ], + "source": [ + "#! 
pip install graphistry[igraph]\n", + "\n", + "!pip install -q igraph\n", + "!pip install -q graphistry\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ff6Tt9DhkePl" + }, + "source": [ + "## Imports" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "id": "S5_y0CbLkjft", + "outputId": "909ff8a7-650e-4e65-aaf9-40f650d50145" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'0.33.0'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 4 + } + ], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "\n", + "import graphistry, time\n", + "\n", + "from graphistry import (\n", + "\n", + " # graph operators\n", + " n, e_undirected, e_forward, e_reverse,\n", + "\n", + " # attribute predicates\n", + " is_in, ge, startswith, contains, match as match_re\n", + ")\n", + "graphistry.__version__" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "I7Fg75jsG4co" + }, + "outputs": [], + "source": [ + "import cudf" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "uLZKph2-a5M4" + }, + "outputs": [], + "source": [ + "# work around google colab shell encoding bugs\n", + "\n", + "import locale\n", + "locale.getpreferredencoding = lambda: \"UTF-8\"" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eU9SyauNUHtR" + }, + "source": [ + "# 2. 
Perf benchmarks" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NA0Ym11fkB8j" + }, + "source": [ + "\n", + "### Facebook: 88K edges" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 224 + }, + "id": "vXuQogHekClJ", + "outputId": "de95808f-d1c9-4864-e8d9-b197f3d38413" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(88234, 2)\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " s d\n", + "0 0 1\n", + "1 0 2\n", + "2 0 3\n", + "3 0 4\n", + "4 0 5" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sd
001
102
203
304
405
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "df", + "summary": "{\n \"name\": \"df\",\n \"rows\": 88234,\n \"fields\": [\n {\n \"column\": \"s\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 910,\n \"min\": 0,\n \"max\": 4031,\n \"samples\": [\n 1624,\n 101,\n 377\n ],\n \"num_unique_values\": 3663,\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"d\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 893,\n \"min\": 1,\n \"max\": 4038,\n \"samples\": [\n 2193,\n 150,\n 879\n ],\n \"num_unique_values\": 4037,\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 7 + } + ], + "source": [ + "df = pd.read_csv('https://raw.githubusercontent.com/graphistry/pygraphistry/master/demos/data/facebook_combined.txt', sep=' ', names=['s', 'd'])\n", + "print(df.shape)\n", + "df.head(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 224 + }, + "id": "jEma7hvvkzkN", + "outputId": "98501752-d3df-4f5d-95b8-ff085a17894a" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(4039, 1) (88234, 2)\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " id\n", + "0 0\n", + "1 1\n", + "2 2\n", + "3 3\n", + "4 4" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
id
00
11
22
33
44
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"fg\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 0,\n \"max\": 4,\n \"samples\": [\n 1,\n 4,\n 2\n ],\n \"num_unique_values\": 5,\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 8 + } + ], + "source": [ + "fg = graphistry.edges(df, 's', 'd').materialize_nodes()\n", + "print(fg._nodes.shape, fg._edges.shape)\n", + "fg._nodes.head(5)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "with 2 and 5 hop `chain` comparison we see a slight/negligable speedup enabled by setting g. to `cudf`" + ], + "metadata": { + "id": "2gVDho9cn2Et" + } + }, + { + "cell_type": "code", + "source": [ + "for n_hop in [2,5]:\n", + " start0 = time.time()\n", + " for i in range(100):\n", + " fg2 = fg.chain([n({'id': 0}), e_forward(hops=n_hop)]) # using n notation\n", + " mid0 = time.time()\n", + " for i in range(100):\n", + " fg2 = fg.chain([e_forward(source_node_match={'id': 0}, hops=n_hop)]) # using source_node_match in e_forward\n", + " end0 = time.time()\n", + " T0 = mid0-start0\n", + " T1 = end0-mid0\n", + " fg_gdf = fg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + " start1 = time.time()\n", + " for i in range(100):\n", + " fg2 = fg_gdf.chain([n({'id': 0}), e_forward(hops=n_hop)])\n", + " mid1 = time.time()\n", + " for i in range(100):\n", + " fg2 = fg_gdf.chain([e_forward(source_node_match={'id': 0}, hops=n_hop)])\n", + " end1 = time.time()\n", + " # print(fg._nodes.shape, fg._edges.shape)\n", + " # print(fg2._nodes.shape, fg2._edges.shape)\n", + " del fg_gdf\n", + " del fg2\n", + " T2 = mid1-start1\n", + " T3 = end1-mid1\n", + " print('\\nhops:',n_hop,'\\nCPU n_notation time:',np.round(T0,4),'\\nGPU n_notation time:',np.round(T2,4),'\\nspeedup:', np.round(T0/T2,4),\n", + 
" '\\nCPU source_node_match time:',np.round(T1,4),'\\nGPU source_node_match time:',np.round(T3,4),'\\nspeedup:', np.round(T1/T3,4), )" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ZKzoqGcdxekr", + "outputId": "344b9823-6998-4f71-d3b5-7aff9fec7912" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "hops: 2 \n", + "CPU n_notation time: 13.6357 \n", + "GPU n_notation time: 12.2177 \n", + "n_notation speedup: 1.1161 \n", + "CPU source_node_match time: 21.2028 \n", + "GPU source_node_match time: 14.3844 \n", + "source_node_match speedup: 1.474\n", + "hops: 5 \n", + "CPU n_notation time: 36.8941 \n", + "GPU n_notation time: 21.3562 \n", + "n_notation speedup: 1.7276 \n", + "CPU source_node_match time: 17.8739 \n", + "GPU source_node_match time: 14.8514 \n", + "source_node_match speedup: 1.2035\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "and with a simple 2 and 5 hop `hop` comparison we see a ~2x speedup enabled by converting `fg`'s dataframes 
to `cudf`" + ], + "metadata": { + "id": "5-7M9sPEAf5Z" + } + }, + { + "cell_type": "code", + "source": [ + "for n_hop in [2,5]:\n", + " start_nodes = pd.DataFrame({fg._node: [0]})\n", + " start0 = time.time()\n", + " for i in range(100):\n", + " fg2 = fg.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=n_hop)\n", + " end0 = time.time()\n", + " T0 = end0-start0\n", + " start_nodes = cudf.DataFrame({fg._node: [0]})\n", + " fg_gdf = fg.nodes(cudf.from_pandas(fg._nodes)).edges(cudf.from_pandas(fg._edges))\n", + " start1 = time.time()\n", + " for i in range(100):\n", + " fg2 = fg_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=n_hop)\n", + " end1 = time.time()\n", + " # print(fg._nodes.shape, fg._edges.shape)\n", + " # print(fg2._nodes.shape, fg2._edges.shape)\n", + " del fg_gdf\n", + " del fg2\n", + " T1 = end1-start1\n", + " print('\\nCPU',n_hop,'hop time:',np.round(T0,4),'\\nGPU',n_hop,'hop time:',np.round(T1,4),'\\nspeedup:', np.round(T0/T1,4))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Tki_0-_j3XKG", + "outputId": "7edbb4e9-1f49-4b7a-fdb4-9df0867512cd" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "CPU 2 hop time: 5.7415 \n", + "GPU 2 hop time: 2.7301 2 \n", + "hop speedup: 2.103\n", + "\n", + "CPU 5 hop time: 14.3391 \n", + "GPU 5 hop time: 6.9998 5 \n", + "hop speedup: 2.0485\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KrJKjXy2KLos" + }, + "source": [ + "\n", + "## Twitter\n", + "\n", + "- edges: 2420766\n", + "- nodes: 81306" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "fO2qasGqpubr", + "outputId": "957c5ea8-0da9-4101-ecf8-7db8d5d13f49" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "--2024-02-20 09:48:59-- 
https://snap.stanford.edu/data/twitter_combined.txt.gz\n", + "Resolving snap.stanford.edu (snap.stanford.edu)... 171.64.75.80\n", + "Connecting to snap.stanford.edu (snap.stanford.edu)|171.64.75.80|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 10621918 (10M) [application/x-gzip]\n", + "Saving to: ‘twitter_combined.txt.gz’\n", + "\n", + "twitter_combined.tx 100%[===================>] 10.13M 9.10MB/s in 1.1s \n", + "\n", + "2024-02-20 09:49:00 (9.10 MB/s) - ‘twitter_combined.txt.gz’ saved [10621918/10621918]\n", + "\n" + ] + } + ], + "source": [ + "! wget 'https://snap.stanford.edu/data/twitter_combined.txt.gz'\n", + "#! curl -L 'https://snap.stanford.edu/data/twitter_combined.txt.gz' -o twitter_combined.txt.gz" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "fn7zeA3SGlEo" + }, + "outputs": [], + "source": [ + "! gunzip twitter_combined.txt.gz" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "68TAZkhLGz9g", + "outputId": "b861bc74-a142-4880-dc43-8068f7ef6b04" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "214328887 34428380\n", + "17116707 28465635\n", + "380580781 18996905\n", + "221036078 153460275\n", + "107830991 17868918\n" + ] + } + ], + "source": [ + "! 
head -n 5 twitter_combined.txt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "QU2wNeGXG2GC", + "outputId": "dff04552-efc7-49a6-9f25-460f31a288be" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(2420766, 2)" + ] + }, + "metadata": {}, + "execution_count": 31 + } + ], + "source": [ + "te_df = pd.read_csv('twitter_combined.txt', sep=' ', names=['s', 'd'])\n", + "te_df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "EK5gQH2iG5UU" + }, + "outputs": [], + "source": [ + "import graphistry" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ZtIW-eFGG_R4", + "outputId": "53bd4e96-f3c4-4d45-abfd-e8a454a8ac7e" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(81306, 1)" + ] + }, + "metadata": {}, + "execution_count": 33 + } + ], + "source": [ + "g = graphistry.edges(te_df, 's', 'd').materialize_nodes()\n", + "g._nodes.shape" + ] + }, + { + "cell_type": "markdown", + "source": [ + "on the twitter data, `chain` operations over several different hop depths show **10-20x** speed increases" + ], + "metadata": { + "id": "yR9Qr8tGww3b" + } + }, + { + "cell_type": "code", + "source": [ + "for n_hop in [1,2,8]:\n", + " start0 = time.time()\n", + " for i in range(10):\n", + " g2 = g.chain([n({'id': 17116707}), e_forward(hops=n_hop)])\n", + " end0 = time.time()\n", + " T0 = end0-start0\n", + " g_gdf = g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + " start1 = time.time()\n", + " for i in range(10):\n", + " out = g_gdf.chain([n({'id': 17116707}), e_forward(hops=n_hop)])._nodes\n", + " end1 = time.time()\n", + " # print(fg._nodes.shape, 
fg._edges.shape)\n", + " # print(fg2._nodes.shape, fg2._edges.shape)\n", + " del g_gdf\n", + " del out\n", + " T1 = end1-start1\n", + " print('\\nCPU',n_hop,'hop chain time:',np.round(T0,4),'\\nGPU',n_hop,'hop chain time:',np.round(T1,4),'\\nspeedup:', np.round(T0/T1,4))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "rCsvQJa-6U0x", + "outputId": "70887aac-1bc8-499a-dc9e-40a1803e8fe3" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "CPU 1 hop chain time: 20.1676 \n", + "GPU 1 hop chain time: 1.0259 \n", + " 1 hop chain speedup: 19.6579\n", + "\n", + "CPU 2 hop chain time: 21.7168 \n", + "GPU 2 hop chain time: 2.2507 \n", + " 2 hop chain speedup: 9.6488\n", + "\n", + "CPU 8 hop chain time: 157.5035 \n", + "GPU 8 hop chain time: 7.8694 \n", + " 8 hop chain speedup: 20.0147\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "and similarly for these `hop` operations -- **10-40x** speed increases" + ], + "metadata": { + "id": "gHHhyYlzArjw" + } + }, + { + "cell_type": "code", + "source": [ + "for n_hop in [1,2,8]:\n", + " start_nodes = pd.DataFrame({g._node: [17116707]})\n", + " start0 = time.time()\n", + " for i in range(10):\n", + " g2 = g.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=n_hop)\n", + " end0 = time.time()\n", + " T0 = end0-start0\n", + " start_nodes = cudf.DataFrame({g._node: [17116707]})\n", + " g_gdf = g.nodes(cudf.from_pandas(g._nodes)).edges(cudf.from_pandas(g._edges))\n", + " start1 = time.time()\n", + " for i in range(10):\n", + " g2 = g_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=n_hop)\n", + " end1 = time.time()\n", + " # print(fg._nodes.shape, fg._edges.shape)\n", + " # print(fg2._nodes.shape, fg2._edges.shape)\n", + " del start_nodes\n", + " del g_gdf\n", + " del g2\n", + " T1 = end1-start1\n", + " print('\\nCPU',n_hop,'hop chain 
time:',np.round(T0,4),'\\nGPU',n_hop,'hop chain time:',np.round(T1,4),'\\nspeedup:', np.round(T0/T1,4))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "cnILbPnG7tf4", + "outputId": "bd4aa370-3b54-4c18-c92d-7aba957f658a" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "CPU 1 hop chain time: 12.3446 \n", + "GPU 1 hop chain time: 1.204 \n", + "speedup: 10.2526\n", + "\n", + "CPU 2 hop chain time: 13.2377 \n", + "GPU 2 hop chain time: 1.1608 \n", + "speedup: 11.4037\n", + "\n", + "CPU 8 hop chain time: 52.2491 \n", + "GPU 8 hop chain time: 1.2148 \n", + "speedup: 43.012\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9dZzAAVONCD2" + }, + "source": [ + "\n", + "## GPlus\n", + "\n", + "- edges: 30494866\n", + "- nodes: 107614" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "-nhWGNekKpcZ", + "outputId": "0ce5e0c8-e5a4-4e8e-b595-3c544192bf24" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "--2024-02-20 09:59:24-- https://snap.stanford.edu/data/gplus_combined.txt.gz\n", + "Resolving snap.stanford.edu (snap.stanford.edu)... 171.64.75.80\n", + "Connecting to snap.stanford.edu (snap.stanford.edu)|171.64.75.80|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 398930514 (380M) [application/x-gzip]\n", + "Saving to: ‘gplus_combined.txt.gz’\n", + "\n", + "gplus_combined.txt. 100%[===================>] 380.45M 39.7MB/s in 9.9s \n", + "\n", + "2024-02-20 09:59:34 (38.5 MB/s) - ‘gplus_combined.txt.gz’ saved [398930514/398930514]\n", + "\n" + ] + } + ], + "source": [ + "! wget https://snap.stanford.edu/data/gplus_combined.txt.gz" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "g5wgA_c2KqwJ" + }, + "outputs": [], + "source": [ + "! 
gunzip gplus_combined.txt.gz" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "52hgDbr0Kti6", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 260 + }, + "outputId": "1b81b614-f1b9-4031-db25-42ed9500c9c7" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(30494866, 2)\n", + "CPU times: user 16.8 s, sys: 1.41 s, total: 18.2 s\n", + "Wall time: 18.4 s\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " s d\n", + "0 116374117927631468606 101765416973555767821\n", + "1 112188647432305746617 107727150903234299458\n", + "2 116719211656774388392 100432456209427807893\n", + "3 117421021456205115327 101096322838605097368\n", + "4 116407635616074189669 113556266482860931616" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sd
0116374117927631468606101765416973555767821
1112188647432305746617107727150903234299458
2116719211656774388392100432456209427807893
3117421021456205115327101096322838605097368
4116407635616074189669113556266482860931616
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"get_ipython()\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"s\",\n \"properties\": {\n \"dtype\": \"string\",\n \"samples\": [\n \"112188647432305746617\",\n \"116407635616074189669\",\n \"116719211656774388392\"\n ],\n \"num_unique_values\": 5,\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"d\",\n \"properties\": {\n \"dtype\": \"string\",\n \"samples\": [\n \"107727150903234299458\",\n \"113556266482860931616\",\n \"100432456209427807893\"\n ],\n \"num_unique_values\": 5,\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 41 + } + ], + "source": [ + "%%time\n", + "ge_df = pd.read_csv('gplus_combined.txt', sep=' ', names=['s', 'd'])\n", + "print(ge_df.shape)\n", + "ge_df.head(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "w5YkN-nLK6UV", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 260 + }, + "outputId": "9c818e80-34b1-431a-d965-3305b45c1bb2" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(30494866, 2) (107614, 1)\n", + "CPU times: user 4.41 s, sys: 1.29 s, total: 5.7 s\n", + "Wall time: 5.69 s\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " id\n", + "0 116374117927631468606\n", + "1 112188647432305746617\n", + "2 116719211656774388392\n", + "3 117421021456205115327\n", + "4 116407635616074189669" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
id
0116374117927631468606
1112188647432305746617
2116719211656774388392
3117421021456205115327
4116407635616074189669
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"get_ipython()\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"string\",\n \"samples\": [\n \"112188647432305746617\",\n \"116407635616074189669\",\n \"116719211656774388392\"\n ],\n \"num_unique_values\": 5,\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 42 + } + ], + "source": [ + "%%time\n", + "gg = graphistry.edges(ge_df, 's', 'd').materialize_nodes()\n", + "gg = graphistry.edges(ge_df, 's', 'd').nodes(gg._nodes, 'id')\n", + "print(gg._edges.shape, gg._nodes.shape)\n", + "gg._nodes.head(5)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "NKtz54uELX-8", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 116 + }, + "outputId": "aaccd673-85c1-40a4-ccdc-f13388e7a01a" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "CPU times: user 471 ms, sys: 307 ms, total: 779 ms\n", + "Wall time: 776 ms\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " id\n", + "0 116374117927631468606" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
id
0116374117927631468606
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"get_ipython()\",\n \"rows\": 1,\n \"fields\": [\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"string\",\n \"samples\": [\n \"116374117927631468606\"\n ],\n \"num_unique_values\": 1,\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 43 + } + ], + "source": [ + "%%time\n", + "gg.chain([ n({'id': '116374117927631468606'})])._nodes" + ] + }, + { + "cell_type": "markdown", + "source": [ + "on the GPlus data, simpler `chain` operations over several different hops -- **100-200x** speed increases" + ], + "metadata": { + "id": "e4ZchWvrBKdY" + } + }, + { + "cell_type": "code", + "source": [ + "for n_hop in [1,2,3,4,5]:\n", + " start_nodes = pd.DataFrame({fg._node: [0]})\n", + " start0 = time.time()\n", + " out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=n_hop)])._nodes\n", + " end0 = time.time()\n", + " T0 = end0-start0\n", + " gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + " start1 = time.time()\n", + " out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=n_hop)])\n", + " end1 = time.time()\n", + " # print(fg._nodes.shape, fg._edges.shape)\n", + " # print(fg2._nodes.shape, fg2._edges.shape)\n", + " del gg_gdf\n", + " del out\n", + " T1 = end1-start1\n", + " print('\\nCPU',n_hop,'hop chain time:',np.round(T0,4),'\\nGPU',n_hop,'hop chain time:',np.round(T1,4),'\\nspeedup:', np.round(T0/T1,4))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 507 + }, + "id": "fTnU8MLr8tV5", + "outputId": "40a2f455-9d79-4a3c-abeb-aac93713a424" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "CPU 1 hop chain time: 70.7013 \n", + "GPU 1 hop chain time: 0.2911 \n", + "speedup: 242.9049\n", + "\n", + 
"CPU 2 hop chain time: 84.2395 \n", + "GPU 2 hop chain time: 0.6138 \n", + "speedup: 137.252\n" + ] + }, + { + "output_type": "error", + "ename": "KeyboardInterrupt", + "evalue": "", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)", + "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mstart_nodes\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mDataFrame\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m{\u001b[0m\u001b[0mfg\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_node\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mstart0\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtime\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtime\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mout\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgg\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mchain\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m \u001b[0mn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m{\u001b[0m\u001b[0;34m'id'\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'116374117927631468606'\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0me_forward\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mhops\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mn_hop\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_nodes\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0mend0\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mtime\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtime\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mT0\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mend0\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0mstart0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/graphistry/compute/ComputeMixin.py\u001b[0m in \u001b[0;36mchain\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 391\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 392\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mchain\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 393\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mchain_base\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 394\u001b[0m \u001b[0mchain\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__doc__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mchain_base\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__doc__\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/graphistry/compute/chain.py\u001b[0m in \u001b[0;36mchain\u001b[0;34m(self, ops, engine)\u001b[0m\n\u001b[1;32m 285\u001b[0m )\n\u001b[1;32m 286\u001b[0m g_step = (\n\u001b[0;32m--> 287\u001b[0;31m op(\n\u001b[0m\u001b[1;32m 288\u001b[0m \u001b[0mg\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mg\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;31m# transition via any original 
edge\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 289\u001b[0m \u001b[0mprev_node_wavefront\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mprev_step_nodes\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/graphistry/compute/ast.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, g, prev_node_wavefront, target_wave_front, engine)\u001b[0m\n\u001b[1;32m 326\u001b[0m \u001b[0mlogger\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdebug\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'----------------------------------------'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 327\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 328\u001b[0;31m out_g = g.hop(\n\u001b[0m\u001b[1;32m 329\u001b[0m \u001b[0mnodes\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mprev_node_wavefront\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 330\u001b[0m \u001b[0mhops\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhops\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/graphistry/compute/ComputeMixin.py\u001b[0m in \u001b[0;36mhop\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 379\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 380\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mhop\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 381\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mhop_base\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 382\u001b[0m \u001b[0mhop\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__doc__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mhop_base\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__doc__\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 383\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/graphistry/compute/hop.py\u001b[0m in \u001b[0;36mhop\u001b[0;34m(self, nodes, hops, to_fixed_point, direction, edge_match, source_node_match, destination_node_match, source_node_query, destination_node_query, edge_query, return_as_wave_front, target_wave_front, engine)\u001b[0m\n\u001b[1;32m 188\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mdirection\u001b[0m \u001b[0;32min\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m'forward'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'undirected'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 189\u001b[0m hop_edges_forward = (\n\u001b[0;32m--> 190\u001b[0;31m wave_front_iter.merge(\n\u001b[0m\u001b[1;32m 191\u001b[0m \u001b[0medges_indexed\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mg2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_source\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mg2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_destination\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mEDGE_ID\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0massign\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m**\u001b[0m\u001b[0;34m{\u001b[0m\u001b[0mg2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_node\u001b[0m\u001b[0;34m:\u001b[0m 
\u001b[0medges_indexed\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mg2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_source\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 192\u001b[0m \u001b[0mhow\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'inner'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py\u001b[0m in \u001b[0;36mmerge\u001b[0;34m(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)\u001b[0m\n\u001b[1;32m 10091\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mpandas\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcore\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreshape\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmerge\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mmerge\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10092\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m> 10093\u001b[0;31m return merge(\n\u001b[0m\u001b[1;32m 10094\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10095\u001b[0m \u001b[0mright\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/merge.py\u001b[0m in \u001b[0;36mmerge\u001b[0;34m(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)\u001b[0m\n\u001b[1;32m 122\u001b[0m \u001b[0mvalidate\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mvalidate\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 123\u001b[0m )\n\u001b[0;32m--> 124\u001b[0;31m \u001b[0;32mreturn\u001b[0m 
\u001b[0mop\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_result\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcopy\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcopy\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 125\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 126\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/merge.py\u001b[0m in \u001b[0;36mget_result\u001b[0;34m(self, copy)\u001b[0m\n\u001b[1;32m 771\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mleft\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mright\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_indicator_pre_merge\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mleft\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mright\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 772\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 773\u001b[0;31m \u001b[0mjoin_index\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mleft_indexer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mright_indexer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_join_info\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 774\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 775\u001b[0m result = self._reindex_and_concat(\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/merge.py\u001b[0m in \u001b[0;36m_get_join_info\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 1024\u001b[0m )\n\u001b[1;32m 1025\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1026\u001b[0;31m 
\u001b[0;34m(\u001b[0m\u001b[0mleft_indexer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mright_indexer\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_join_indexers\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1027\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1028\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mright_index\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/merge.py\u001b[0m in \u001b[0;36m_get_join_indexers\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 998\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_get_join_indexers\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m->\u001b[0m \u001b[0mtuple\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mnpt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mNDArray\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mintp\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnpt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mNDArray\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mintp\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 999\u001b[0m \u001b[0;34m\"\"\"return the join indexers\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1000\u001b[0;31m return get_join_indexers(\n\u001b[0m\u001b[1;32m 1001\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mleft_join_keys\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mright_join_keys\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0msort\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msort\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhow\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhow\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1002\u001b[0m )\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/merge.py\u001b[0m in \u001b[0;36mget_join_indexers\u001b[0;34m(left_keys, right_keys, sort, how, **kwargs)\u001b[0m\n\u001b[1;32m 1583\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mn\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mleft_keys\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1584\u001b[0m )\n\u001b[0;32m-> 1585\u001b[0;31m \u001b[0mzipped\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mzip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0mmapped\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1586\u001b[0m \u001b[0mllab\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrlab\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mshape\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mx\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mzipped\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1587\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/merge.py\u001b[0m in \u001b[0;36m\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 1580\u001b[0m \u001b[0;31m# get left & right join labels and num. 
of levels at each location\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1581\u001b[0m mapped = (\n\u001b[0;32m-> 1582\u001b[0;31m \u001b[0m_factorize_keys\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mleft_keys\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mright_keys\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msort\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msort\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhow\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mhow\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1583\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mn\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mleft_keys\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1584\u001b[0m )\n", + "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/merge.py\u001b[0m in \u001b[0;36m_factorize_keys\u001b[0;34m(lk, rk, sort, how)\u001b[0m\n\u001b[1;32m 2331\u001b[0m \u001b[0;31m# \"Union[ndarray[Any, dtype[signedinteger[_64Bit]]],\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2332\u001b[0m \u001b[0;31m# ndarray[Any, dtype[object_]]]\"; expected \"ndarray[Any, dtype[object_]]\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2333\u001b[0;31m \u001b[0mrlab\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mrizer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfactorize\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrk\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# type: ignore[arg-type]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2334\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mllab\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m \u001b[0;34m==\u001b[0m 
\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mintp\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mllab\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2335\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mrlab\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mintp\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrlab\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", + "\u001b[0;31mKeyboardInterrupt\u001b[0m: " + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "and similarly for these hop operations -- **100x** speed increases" + ], + "metadata": { + "id": "80bs6Y5pBWb2" + } + }, + { + "cell_type": "code", + "source": [ + "for n_hop in [1,2,3,4,5]:\n", + " start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", + " start0 = time.time()\n", + " for i in range(1):\n", + " g2 = gg.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=n_hop)\n", + " end0 = time.time()\n", + " T0 = end0-start0\n", + " start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", + " gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", + " start1 = time.time()\n", + " for i in range(1):\n", + " g2 = gg_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=n_hop)\n", + " end1 = time.time()\n", + " # print(fg._nodes.shape, fg._edges.shape)\n", + " # print(fg2._nodes.shape, fg2._edges.shape)\n", + " del start_nodes\n", + " del gg_gdf\n", + " del g2\n", + " T1 = end1-start1\n", + " print('\\nCPU',n_hop,'hop chain time:',np.round(T0,4),'\\nGPU',n_hop,'hop chain 
time:',np.round(T1,4),'\\nspeedup:', np.round(T0/T1,4))" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "N2-gDFod9vc3", + "outputId": "c6967f1f-fa01-41a6-a776-02e2892f300f" + }, + "execution_count": null, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\n", + "CPU 1 hop chain time: 38.0714 \n", + "GPU 1 hop chain time: 0.2615 \n", + "speedup: 145.5678\n", + "\n", + "CPU 2 hop chain time: 52.949 \n", + "GPU 2 hop chain time: 0.4553 \n", + "speedup: 116.2876\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "R03M_swxarKC" + }, + "source": [ + "\n", + "## Orkut\n", + "- 117M edges\n", + "- 3M nodes" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "QoabYR2maxPo", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "f58e2837-9417-490b-882c-b1f478ed53f8" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "--2024-02-19 06:02:00-- https://snap.stanford.edu/data/bigdata/communities/com-orkut.ungraph.txt.gz\n", + "Resolving snap.stanford.edu (snap.stanford.edu)... 171.64.75.80\n", + "Connecting to snap.stanford.edu (snap.stanford.edu)|171.64.75.80|:443... connected.\n", + "HTTP request sent, awaiting response... 200 OK\n", + "Length: 447251958 (427M) [application/x-gzip]\n", + "Saving to: ‘com-orkut.ungraph.txt.gz’\n", + "\n", + "com-orkut.ungraph.t 100%[===================>] 426.53M 31.8MB/s in 11s \n", + "\n", + "2024-02-19 06:02:11 (37.4 MB/s) - ‘com-orkut.ungraph.txt.gz’ saved [447251958/447251958]\n", + "\n" + ] + } + ], + "source": [ + "! wget https://snap.stanford.edu/data/bigdata/communities/com-orkut.ungraph.txt.gz" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "BvvfFPKWbAVJ" + }, + "outputs": [], + "source": [ + "! 
gunzip com-orkut.ungraph.txt.gz" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "YsWwRoPqbPIb", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "5c12501a-e724-44d5-f651-e1860a8638af" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "# Undirected graph: ../../data/output/orkut.txt\n", + "# Orkut\n", + "# Nodes: 3072441 Edges: 117185083\n", + "# FromNodeId\tToNodeId\n", + "1\t2\n", + "1\t3\n", + "1\t4\n" + ] + } + ], + "source": [ + "! head -n 7 com-orkut.ungraph.txt" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "cbMC8r2ldjbW", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "87ee2818-72d9-46e7-d0d8-5d16958e10c8" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "('24.02.01', '0.33.0')" + ] + }, + "metadata": {}, + "execution_count": 8 + } + ], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import time\n", + "\n", + "import graphistry\n", + "\n", + "from graphistry import (\n", + "\n", + " # graph operators\n", + " n, e_undirected, e_forward, e_reverse,\n", + "\n", + " # attribute predicates\n", + " is_in, ge, startswith, contains, match as match_re\n", + ")\n", + "\n", + "import cudf\n", + "\n", + "# work around google colab shell encoding bugs\n", + "import locale\n", + "locale.getpreferredencoding = lambda: \"UTF-8\"\n", + "\n", + "cudf.__version__, graphistry.__version__" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "TopFxAvnh_Cv", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "52288638-691e-47e4-87fc-4f761a3e9302" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Mon Feb 19 06:02:29 2024 \n", + "+---------------------------------------------------------------------------------------+\n", + "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", + 
"|-----------------------------------------+----------------------+----------------------+\n", + "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", + "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", + "| | | MIG M. |\n", + "|=========================================+======================+======================|\n", + "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", + "| N/A 64C P0 29W / 70W | 111MiB / 15360MiB | 0% Default |\n", + "| | | N/A |\n", + "+-----------------------------------------+----------------------+----------------------+\n", + " \n", + "+---------------------------------------------------------------------------------------+\n", + "| Processes: |\n", + "| GPU GI CI PID Type Process name GPU Memory |\n", + "| ID ID Usage |\n", + "|=======================================================================================|\n", + "+---------------------------------------------------------------------------------------+\n" + ] + } + ], + "source": [ + "! 
nvidia-smi" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Oczs87ITbJgw", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "b894db9b-13d1-426a-b162-dea780abce3b" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(117185082, 2)\n", + " s d\n", + "0 1 3\n", + "1 1 4\n", + "2 1 5\n", + "3 1 6\n", + "4 1 7\n", + "s int64\n", + "d int64\n", + "dtype: object\n", + "CPU times: user 2.34 s, sys: 1.29 s, total: 3.63 s\n", + "Wall time: 3.77 s\n" + ] + } + ], + "source": [ + "%%time\n", + "co_df = cudf.read_csv('com-orkut.ungraph.txt', sep='\\t', names=['s', 'd'], skiprows=5).to_pandas()\n", + "print(co_df.shape)\n", + "print(co_df.head(5))\n", + "print(co_df.dtypes)\n", + "#del co_df" + ] + }, + { + "cell_type": "markdown", + "source": [ + "from load into gpu and back to cpu again" + ], + "metadata": { + "id": "2QLDI3vdAtkf" + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gGSDjTtveFAT", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 260 + }, + "outputId": "d8501df9-7070-4bfa-c5e2-49a581df5f56" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "(3072441, 1) (117185082, 2)\n", + "CPU times: user 2.06 s, sys: 7.93 s, total: 10 s\n", + "Wall time: 11.2 s\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " id\n", + "0 1\n", + "1 2\n", + "2 3\n", + "3 4\n", + "4 5" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
id
01
12
23
34
45
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"get_ipython()\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 1,\n \"max\": 5,\n \"samples\": [\n 2,\n 5,\n 3\n ],\n \"num_unique_values\": 5,\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 11 + } + ], + "source": [ + "%%time\n", + "co_g = graphistry.edges(cudf.DataFrame(co_df), 's', 'd').materialize_nodes(engine='cudf')\n", + "co_g = co_g.nodes(lambda g: g._nodes.to_pandas()).edges(lambda g: g._edges.to_pandas())\n", + "print(co_g._nodes.shape, co_g._edges.shape)\n", + "co_g._nodes.head(5)" + ] + }, + { + "cell_type": "markdown", + "source": [ + "on the Orkut data, simpler chain operations over several different hops -- **10-50x** speed increases" + ], + "metadata": { + "id": "G4f19-djBd7J" + } + }, + { + "cell_type": "code", + "source": [ + "for n_hop in [1,2,3,4,5,6]:\n", + " start_nodes = pd.DataFrame({fg._node: [0]})\n", + " start0 = time.time()\n", + " for i in range(10):\n", + " out = co_g.chain([ n({'id': 1}), e_forward(hops=n_hop)])._nodes\n", + " end0 = time.time()\n", + " T0 = end0-start0\n", + " co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + " start1 = time.time()\n", + " for i in range(10):\n", + " out = co_gdf.chain([ n({'id': 1}), e_forward(hops=n_hop)]) end1 = time.time()\n", + " # print(fg._nodes.shape, fg._edges.shape)\n", + " # print(fg2._nodes.shape, fg2._edges.shape)\n", + " del co_gdf\n", + " del out\n", + " T1 = end1-start1\n", + " print('\\nCPU',n_hop,'hop chain time:',np.round(T0,4),'\\nGPU',n_hop,'hop chain time:',np.round(T1,4),'\\nspeedup:', np.round(T0/T1,4))" + ], + "metadata": { + "id": "yWabsh_k-tgy" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "and similarly 
for these hop operations -- **10-40x** speed increases" + ], + "metadata": { + "id": "v-gVx5trBeSl" + } + }, + { + "cell_type": "code", + "source": [ + "for n_hop in [1,2,3,4,5]:\n", + " start_nodes = pd.DataFrame({'id': [1]})\n", + " start0 = time.time()\n", + " for i in range(1):\n", + " g2 = co_g.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=n_hop)\n", + " end0 = time.time()\n", + " T0 = end0-start0\n", + " start_nodes = cudf.DataFrame({'id': [1]})\n", + " co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + " start1 = time.time()\n", + " for i in range(1):\n", + " g2 = co_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=n_hop)\n", + " end1 = time.time()\n", + " del start_nodes\n", + " del co_gdf\n", + " del g2\n", + " T1 = end1-start1\n", + " print('\\nCPU',n_hop,'hop time:',np.round(T0,4),'\\nGPU',n_hop,'hop time:',np.round(T1,4),'\\nspeedup:', np.round(T0/T1,4))" + ], + "metadata": { + "id": "kHZatWCB_qwd" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "eiXFImxF-rzw" + }, + "outputs": [], + "source": [ + "!lscpu\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "wJohLi58-sN5", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "07a499d6-3109-486e-d387-002abd133d22" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " total used free shared buff/cache available\n", + "Mem: 12Gi 5.8Gi 1.6Gi 1.0Gi 5.2Gi 5.5Gi\n", + "Swap: 0B 0B 0B\n" + ] + } + ], + "source": [ + "!free -h\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Ygc2nrkznlCu" + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { 
+ "gpuType": "T4", + "provenance": [], + "toc_visible": true, + "include_colab_link": true + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file diff --git a/demos/gfql/simple_gfql_notebook.ipynb b/demos/gfql/simple_gfql_notebook.ipynb new file mode 100644 index 0000000000..5f27f7b619 --- /dev/null +++ b/demos/gfql/simple_gfql_notebook.ipynb @@ -0,0 +1,264 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [], + "gpuType": "T4", + "include_colab_link": true + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + }, + "accelerator": "GPU" + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "source": [ + "# simple GFQL demo on Twitter data\n", + "\n", + "* Twitter\tNetwork with 81,306 Nodes\tand 2,420,766 Edges\n", + "\n", + "* The single-threaded CPU mode benefits from GFQL's novel dataframe engine, and the GPU mode further adds single-GPU acceleration. Both the chain() and hop() methods are examined.\n", + "\n", + "* The benchmark does not examine bigger-than-memory and distributed scenarios. The provided results here are from running on a free Google Colab T4 runtime, with a 2.2GHz Intel CPU (12 GB CPU RAM) and T4 Nvidia GPU (16 GB GPU RAM)." 
+ ], + "metadata": { + "id": "Sm80AgJOJ3-c" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Install, Import, Load" + ], + "metadata": { + "id": "g7s-qBKqE9eC" + } + }, + { + "cell_type": "code", + "source": [ + "# !pip install --extra-index-url=https://pypi.nvidia.com cuml-cu12 cudf-cu12\n", + "import cudf\n", + "cudf.__version__\n", + "\n", + "!pip install -q igraph\n", + "!pip install -q graphistry\n", + "\n", + "import pandas as pd\n", + "import graphistry, time, cProfile\n", + "\n", + "from graphistry import (\n", + "\n", + " # graph operators\n", + " n, e_undirected, e_forward, e_reverse,\n", + "\n", + " # attribute predicates\n", + " is_in, ge, startswith, contains, match as match_re\n", + ")\n", + "graphistry.__version__" + ], + "metadata": { + "id": "JTdSJgquBnGd", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 35 + }, + "outputId": "4d1da186-28ab-4236-a569-4dd8760c4715" + }, + "execution_count": 5, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "'0.33.0'" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "string" + } + }, + "metadata": {}, + "execution_count": 5 + } + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "yLJjLYAWbRXH", + "outputId": "e0177c6d-7d74-449a-c72b-28241516aaf0" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(81306, 1)" + ] + }, + "metadata": {}, + "execution_count": 2 + } + ], + "source": [ + "te_df = pd.read_csv('https://snap.stanford.edu/data/twitter_combined.txt.gz', sep=' ', names=['s', 'd'])\n", + "g = graphistry.edges(te_df, 's', 'd').materialize_nodes()" + ] + }, + { + "cell_type": "markdown", + "source": [ + "## .chain() CPU v GPU" + ], + "metadata": { + "id": "c3vuo0yVFDCs" + } + }, + { + "cell_type": "code", + "source": [ + "start = time.time()\n", + "\n", + "for i in range(10):\n", + " g2 = 
g.chain([n({'id': 17116707}), e_forward(hops=1)])\n", + "g2._nodes.shape, g2._edges.shape\n", + "\n", + "end1 = time.time()\n", + "T1 = end1 - start" + ], + "metadata": { + "id": "wEzyOOymCcsj" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "start = time.time()\n", + "\n", + "g_gdf = g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + "for i in range(10):\n", + " out = g_gdf.chain([n({'id': 17116707}), e_forward(hops=1)])._nodes\n", + "del g_gdf\n", + "del out\n", + "\n", + "end2 = time.time()\n", + "T2= end2 - start\n", + "print('CPU time:',T1, '\\nGPU time:', T2, '\\nspeedup:', T1/T2)" + ], + "metadata": { + "id": "yKoNh5UgClIr", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "5c9545e1-5a6c-45db-b0ab-a199939e8ebd" + }, + "execution_count": 17, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "CPU time: 17.837570190429688 \n", + "GPU time: 2.0647764205932617 \n", + "speedup: 8.638983868919091\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "## .hop() CPU v GPU\n", + "\n", + "* simpler tasks can witness greater speedup\n", + "\n" + ], + "metadata": { + "id": "KrXZ7ajHFJ3z" + } + }, + { + "cell_type": "code", + "source": [ + "start = time.time()\n", + "start_nodes = pd.DataFrame({g._node: [17116707]})\n", + "for i in range(10):\n", + " g2 = g.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=8)\n", + "\n", + "end1 = time.time()\n", + "T1 = end1 - start" + ], + "metadata": { + "id": "CJt_8YTPCtZM" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "start = time.time()\n", + "start_nodes = cudf.DataFrame({g._node: [17116707]})\n", + "g_gdf = g.nodes(cudf.from_pandas(g._nodes)).edges(cudf.from_pandas(g._edges))\n", + "for i in range(10):\n", + " g2 = g_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " engine = 'cudf', 
# one can also set `engine = cudf`\n", + " hops=8)\n", + "del start_nodes\n", + "del g_gdf\n", + "del g2\n", + "\n", + "end2 = time.time()\n", + "T2= end2 - start\n", + "print('CPU time:',T1, '\\nGPU time:', T2, '\\nspeedup:', T1/T2)" + ], + "metadata": { + "id": "fOC7ODIeFTI6", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "ba349462-caee-4f42-8f45-fa7c883c54bc" + }, + "execution_count": 26, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "CPU time: 40.91506862640381 \n", + "GPU time: 2.8351004123687744 \n", + "speedup: 14.431611821543413\n" + ] + } + ] + } + ] +} \ No newline at end of file From ae4524237210121040ca1d583113f1a6f123a070 Mon Sep 17 00:00:00 2001 From: Daniel Date: Tue, 9 Jul 2024 16:11:46 +0200 Subject: [PATCH 2/5] revise initial notebook, print df, loop hops --- demos/gfql/benchmark_hops_cpu_gpu.ipynb | 5899 ++++++++++------------- demos/gfql/gfql_cpv_gpu_enchmark.ipynb | 2976 ------------ demos/gfql/simple_gfql_notebook.ipynb | 264 - 3 files changed, 2535 insertions(+), 6604 deletions(-) delete mode 100644 demos/gfql/gfql_cpv_gpu_enchmark.ipynb delete mode 100644 demos/gfql/simple_gfql_notebook.ipynb diff --git a/demos/gfql/benchmark_hops_cpu_gpu.ipynb b/demos/gfql/benchmark_hops_cpu_gpu.ipynb index bf17b630e7..869344af4c 100644 --- a/demos/gfql/benchmark_hops_cpu_gpu.ipynb +++ b/demos/gfql/benchmark_hops_cpu_gpu.ipynb @@ -1,27 +1,14 @@ { - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [], - "gpuType": "T4" - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - }, - "accelerator": "GPU" - }, "cells": [ { "cell_type": "markdown", + "metadata": { + "id": "GZxoiU8sQDk_" + }, "source": [ "# GFQL CPU, GPU Benchmark\n", "\n", - "This notebook examines GFQL progerty graph query performance on 1-8 hop queries using CPU + GPU modes on various real-world 100K - 100M edge graphs. 
The data comes from a variety of popular social networks. The single-threaded CPU mode benefits from GFQL's novel dataframe engine, and the GPU mode further adds single-GPU acceleration. Both the `chain()` and `hop()` methods are examined.\n", + "This notebook examines GFQL property graph query performance on 1-8 hop queries using CPU + GPU modes on various real-world 100K - 100M edge graphs. The data comes from a variety of popular social networks. The single-threaded CPU mode benefits from GFQL's novel dataframe engine, and the GPU mode further adds single-GPU acceleration. Both the `chain()` and `hop()` methods are examined.\n", "\n", "The benchmark does not examine bigger-than-memory and distributed scenarios. The provided results here are from running on a free Google Colab T4 runtime, with a 2.2GHz Intel CPU (12 GB CPU RAM) and T4 Nvidia GPU (16 GB GPU RAM).\n", "\n", @@ -30,10 +17,10 @@ "\n", "| Network | Nodes | Edges |\n", "|-------------|-----------|--------------|\n", - "| **Facebook**| 4,039 | 88,234 |\n", - "| **Twitter** | 81,306 | 2,420,766 |\n", - "| **GPlus** | 107,614 | 30,494,866 |\n", - "| **Orkut** | 3,072,441 | 117,185,082 |\n", + "| [**Facebook**](#fb)| 4,039 | 88,234 |\n", + "| [**Twitter**](#tw) | 81,306 | 2,420,766 |\n", + "| [**GPlus**](#gpl) | 107,614 | 30,494,866 |\n", + "| [**Orkut**](#ork) | 3,072,441 | 117,185,082 |\n", "\n", "## Results\n", "\n", @@ -52,10 +39,10 @@ "\n", "| **Dataset** | Max GPU Speedup | CPU GTEPS | GPU GTEPS | T CPU edges / \\$ (t3.l) | T GPU edges / \\$ (g4dn.xl) |\n", "|-------------|--------------|-------------|-------------|----------------------------|--------------------------------|\n", - "| **Facebook**| 1.1X | 0.66 | 0.61 | 65.7 | 10.4 |\n", - "| **Twitter** | 17.4X | 0.17 | 2.81 | 16.7 | 48.1 |\n", - "| **GPlus** | 43.8X | 0.09 | 2.87 | 8.5 | 49.2 |\n", - "| **Orkut** | N/A | N/A | 12.15 | N/A | 208.3 |\n", + "| [**Facebook**](#fb)| 1.1X | 0.66 | 0.61 | 65.7 | 10.4 |\n", + "| [**Twitter**](#tw) | 17.4X 
| 0.17 | 2.81 | 16.7 | 48.1 |\n", + "| [**GPlus**](#gpl) | 43.8X | 0.09 | 2.87 | 8.5 | 49.2 |\n", + "| [**Orkut**](#ork) | N/A | N/A | 12.15 | N/A | 208.3 |\n", "| **AVG** | 20.7X | 0.30 | 4.61 | 30.3 | 79.0\n", "| **MAX** | 43.8X | 0.66 | 12.15 | 65.7 | 208.3\n", "\n", @@ -67,53 +54,46 @@ "\n", "| **Dataset** | Max GPU Speedup | CPU GTEPS | GPU GTEPS | T CPU edges / \\$ (t3.l) | T GPU edges / \\$ (g4dn.xl) |\n", "|-------------|-------------|-----------|-----------|--------------------|--------------------------------|\n", - "| **Facebook**| 3X | 0.47 | 1.47 | 47.0 | 25.2 |\n", - "| **Twitter** | 42X | 0.50 | 10.51 | 50.2 | 180.2 |\n", - "| **GPlus** | 21X | 0.26 | 4.11 | 26.2 | 70.4 |\n", - "| **Orkut** | N/A | N/A | 41.50 | N/A | 711.4 |\n", + "| [**Facebook**](#fb)| 3X | 0.47 | 1.47 | 47.0 | 25.2 |\n", + "| [**Twitter**](#tw) | 42X | 0.50 | 10.51 | 50.2 | 180.2 |\n", + "| [**GPlus**](#gpl) | 21X | 0.26 | 4.11 | 26.2 | 70.4 |\n", + "| [**Orkut**](#ork) | N/A | N/A | 41.50 | N/A | 711.4 |\n", "| **AVG** | 22X | 0.41 | 14.4 | 41.1 | 246.8\n", "| **MAX** | 42X | 0.50 | 41.50 | 50.2 | 711.4\n" - ], - "metadata": { - "id": "GZxoiU8sQDk_" - } + ] }, { "cell_type": "markdown", - "source": [ - "## Optional: GPU setup - Google Colab" - ], "metadata": { "id": "SAj8lhREEOwS" - } + }, + "source": [ + "## Optional: GPU setup - Google Colab" + ] }, { "cell_type": "markdown", - "source": [], "metadata": { "id": "4hrEEAAm7DTO" - } + }, + "source": [] }, { "cell_type": "code", - "source": [ - "# Report GPU used when GPU benchmarking\n", - "! 
nvidia-smi" - ], + "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "W2MF6ZsjDv3B", - "outputId": "46088cbc-2db9-4529-f724-dc57ed85dfb7" + "outputId": "ad2ab798-617d-49db-e379-5670debe4951" }, - "execution_count": 1, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "Tue Dec 26 00:50:30 2023 \n", + "Tue Jul 9 13:29:05 2024 \n", "+---------------------------------------------------------------------------------------+\n", "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", "|-----------------------------------------+----------------------+----------------------+\n", @@ -122,7 +102,7 @@ "| | | MIG M. |\n", "|=========================================+======================+======================|\n", "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 54C P8 10W / 70W | 0MiB / 15360MiB | 0% Default |\n", + "| N/A 41C P8 10W / 70W | 0MiB / 15360MiB | 0% Default |\n", "| | | N/A |\n", "+-----------------------------------------+----------------------+----------------------+\n", " \n", @@ -135,198 +115,176 @@ "+---------------------------------------------------------------------------------------+\n" ] } - ] - }, - { - "cell_type": "code", - "source": [ - "# if in google colab\n", - "!git clone https://github.com/rapidsai/rapidsai-csp-utils.git\n", - "!python rapidsai-csp-utils/colab/pip-install.py" ], - "metadata": { - "id": "Aikh0x4ID_wK" - }, - "execution_count": 8, - "outputs": [] + "source": [ + "# Report GPU used when GPU benchmarking\n", + "! 
nvidia-smi" + ] }, { "cell_type": "code", - "source": [ - "import cudf\n", - "cudf.__version__" - ], + "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "id": "Lwekdei1dH3N", - "outputId": "71f5b01d-7917-4283-8338-969167d6e1e8" + "outputId": "51562461-432e-4b8d-f697-0a6b559ac8b0" }, - "execution_count": 3, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "'23.12.01'" + "'24.04.01'" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" } }, "metadata": {}, - "execution_count": 3 + "execution_count": 2 } + ], + "source": [ + "import cudf\n", + "cudf.__version__" ] }, { "cell_type": "markdown", - "source": [ - "# 1. Install & configure" - ], "metadata": { "id": "QQpsrtwBT7sa" - } + }, + "source": [ + "# 1. Install & configure" + ] }, { "cell_type": "code", - "source": [ - "#! pip install graphistry[igraph]\n", - "\n", - "!pip install -q igraph\n", - "#!pip install -q git+https://github.com/graphistry/pygraphistry.git@dev/cugfql\n", - "!pip install -q graphistry\n" - ], + "execution_count": 3, "metadata": { - "id": "cYjRbgkU9Sx8", "colab": { "base_uri": "https://localhost:8080/" }, - "outputId": "2cf25531-9b8b-4715-ccc7-e79094d84ebd" + "id": "cYjRbgkU9Sx8", + "outputId": "c8e454a2-e537-467e-afc6-830c51ad869c" }, - "execution_count": 2, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - " Preparing metadata (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n" + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.1/3.1 MB\u001b[0m \u001b[31m13.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m250.5/250.5 kB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m332.3/332.3 kB\u001b[0m \u001b[31m9.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", + "\u001b[?25h" ] } + ], + "source": [ + "#! pip install graphistry[igraph]\n", + "\n", + "!pip install -q igraph\n", + "!pip install -q graphistry\n" ] }, { "cell_type": "markdown", - "source": [ - "## Imports" - ], "metadata": { "id": "Ff6Tt9DhkePl" - } + }, + "source": [ + "## Imports" + ] }, { "cell_type": "code", - "source": [ - "import pandas as pd\n", - "\n", - "import graphistry\n", - "\n", - "from graphistry import (\n", - "\n", - " # graph operators\n", - " n, e_undirected, e_forward, e_reverse,\n", - "\n", - " # attribute predicates\n", - " is_in, ge, startswith, contains, match as match_re\n", - ")\n", - "graphistry.__version__" - ], + "execution_count": 4, "metadata": { - "id": "S5_y0CbLkjft", "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, - "outputId": "a68a9c4b-c9c5-4b8b-ea4f-7bf1e4ddf315" + "id": "S5_y0CbLkjft", + "outputId": "c8afe192-51c8-45d2-a79e-c1902200e6a3" }, - "execution_count": 3, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "'0.32.0+12.g72e778c'" + "'0.33.9'" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "string" } }, "metadata": {}, - "execution_count": 3 + "execution_count": 4 } + ], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "import graphistry, time\n", + "\n", + "from graphistry import (\n", + "\n", + " # graph operators\n", + " n, e_undirected, e_forward, e_reverse,\n", + "\n", + " # attribute predicates\n", + " 
is_in, ge, startswith, contains, match as match_re\n", + ")\n", + "graphistry.__version__" ] }, { "cell_type": "code", - "source": [ - "import cudf" - ], + "execution_count": 5, "metadata": { - "id": "I7Fg75jsG4co" + "id": "uLZKph2-a5M4" }, - "execution_count": 6, - "outputs": [] - }, - { - "cell_type": "code", + "outputs": [], "source": [ "#work around google colab shell encoding bugs\n", "\n", "import locale\n", "locale.getpreferredencoding = lambda: \"UTF-8\"" - ], - "metadata": { - "id": "uLZKph2-a5M4" - }, - "execution_count": 7, - "outputs": [] + ] }, { "cell_type": "markdown", - "source": [ - "# 2. Perf benchmarks" - ], "metadata": { "id": "eU9SyauNUHtR" - } + }, + "source": [ + "# 2. Perf benchmarks" + ] }, { "cell_type": "markdown", - "source": [ - "### Facebook: 88K edges" - ], "metadata": { "id": "NA0Ym11fkB8j" - } + }, + "source": [ + "\n", + "### Facebook: 88K edges" + ] }, { "cell_type": "code", - "source": [ - "df = pd.read_csv('https://raw.githubusercontent.com/graphistry/pygraphistry/master/demos/data/facebook_combined.txt', sep=' ', names=['s', 'd'])\n", - "print(df.shape)\n", - "df.head(5)" - ], + "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 224 }, "id": "vXuQogHekClJ", - "outputId": "64db92c0-2704-438b-d0e4-25865acbb5e9" + "outputId": "e984cfbd-ad39-4902-918d-598f342e6f06" }, - "execution_count": 10, "outputs": [ { "output_type": "stream", @@ -348,7 +306,7 @@ ], "text/html": [ "\n", - "
\n", + "
\n", "
\n", "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
sd
0116374117927631468606101765416973555767821
1112188647432305746617107727150903234299458
2116719211656774388392100432456209427807893
3117421021456205115327101096322838605097368
4116407635616074189669113556266482860931616
\n", - "
\n" - ] - }, - "metadata": {}, - "execution_count": 6 - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "gg = graphistry.edges(ge_df, 's', 'd').materialize_nodes()\n", - "gg = graphistry.edges(ge_df, 's', 'd').nodes(gg._nodes, 'id')\n", - "print(gg._edges.shape, gg._nodes.shape)\n", - "gg._nodes.head(5)" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 258 - }, - "id": "w5YkN-nLK6UV", - "outputId": "dc98380d-54c2-4b36-c56e-5e8401c4ffa4" - }, - "execution_count": 7, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(30494866, 2) (107614, 1)\n", - "CPU times: user 4.49 s, sys: 1.25 s, total: 5.74 s\n", - "Wall time: 5.97 s\n" - ] - }, - { - "output_type": "execute_result", - "data": { - "text/plain": [ - " id\n", - "0 116374117927631468606\n", - "1 112188647432305746617\n", - "2 116719211656774388392\n", - "3 117421021456205115327\n", - "4 116407635616074189669" + " 0 1\n", + "hops 2 5\n", + "CPU n_notation time (s) 11.8076 25.4098\n", + "GPU n_notation time (s) 10.3238 14.4829\n", + "n_notation speedup 1.1437 1.7545\n", + "CPU source_node_match time (s) 12.0969 10.2662\n", + "GPU source_node_match time (s) 11.2681 11.199\n", + "source_node_match speedup 1.0736 0.9167" ], "text/html": [ "\n", - "
\n", + "
\n", "
\n", "\n", - "\n", - " \n", - "
\n" - ] - }, - "metadata": {}, - "execution_count": 7 - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "gg.chain([ n({'id': '116374117927631468606'})])._nodes" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 115 - }, - "id": "NKtz54uELX-8", - "outputId": "5d8f3eef-893d-47cc-e7a9-c5cbfec8270c" - }, - "execution_count": 49, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "CPU times: user 534 ms, sys: 598 ms, total: 1.13 s\n", - "Wall time: 1.65 s\n" - ] - }, - { - "output_type": "execute_result", - "data": { - "text/plain": [ - " id\n", - "0 116374117927631468606" - ], - "text/html": [ - "\n", - "
\n", - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
id
0116374117927631468606
\n", - "
\n", - "
\n", - "\n", - "
\n", - " \n", + " animation:\n", + " spin 1s steps(1) infinite;\n", + " }\n", "\n", - " \n", - "\n", - " \n", - "
\n", + " quickchartButtonEl.classList.remove('colab-df-spinner');\n", + " quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n", + " }\n", + " (() => {\n", + " let quickchartButtonEl =\n", + " document.querySelector('#df-0a712080-5f34-4df9-a79a-4e3af31230b0 button');\n", + " quickchartButtonEl.style.display =\n", + " google.colab.kernel.accessAllowed ? 'block' : 'none';\n", + " })();\n", + " \n", + "
\n", "\n", "
\n", "
\n" - ] - }, - "metadata": {}, - "execution_count": 49 - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=1)])._nodes\n", - "out.shape" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "iNWdi00VLmZG", - "outputId": "ecfb56a6-c564-4bf6-f43f-2c95a103f4be" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "CPU times: user 27.5 s, sys: 11.1 s, total: 38.5 s\n", - "Wall time: 39.5 s\n" - ] - }, - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "(1473, 1)" - ] - }, - "metadata": {}, - "execution_count": 75 - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=1)])\n", - "print(out._nodes.shape, out._edges.shape)\n", - "del gg_gdf\n", - "del out" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "Q6p3h6uCOABh", - "outputId": "817fc80f-ef5d-4070-eb48-a12344be709c" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(1473, 1) (13375, 2)\n", - "CPU times: user 4.57 s, sys: 2.11 s, total: 6.68 s\n", - "Wall time: 7.63 s\n" - ] - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=2)])._nodes\n", - "out.shape" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "6UdCcMdqLw-P", - "outputId": "70742c79-b22b-4db2-c548-cb1e25d572eb" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "CPU times: user 45.8 s, sys: 17 s, total: 1min 2s\n", - "Wall time: 1min 5s\n" - ] - }, - { - "output_type": "execute_result", - 
"data": { - "text/plain": [ - "(44073, 1)" - ] + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"results_df\",\n \"rows\": 7,\n \"fields\": [\n {\n \"column\": 0,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 1.0736,\n \"max\": 12.0969,\n \"num_unique_values\": 7,\n \"samples\": [\n 2,\n 11.8076,\n 11.2681\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 1,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.9167,\n \"max\": 25.4098,\n \"num_unique_values\": 7,\n \"samples\": [\n 5,\n 25.4098,\n 11.199\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } }, "metadata": {}, - "execution_count": 77 - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=2)])\n", - "print(out._nodes.shape, out._edges.shape)\n", - "del gg_gdf\n", - "del out" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "QElqatDyNYCS", - "outputId": "0e15bd3e-d2d9-4965-df7d-c8856d036680" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(44073, 1) (2069325, 2)\n", - "CPU times: user 4.97 s, sys: 2.36 s, total: 7.34 s\n", - "Wall time: 10.6 s\n" - ] + "execution_count": 10 } ] }, { - "cell_type": "code", + "cell_type": "markdown", "source": [ - "%%time\n", - "out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=3)])._nodes\n", - "out.shape" + "and with simple 2 and 5 hop `hop` comparison we see a 2x speedup enabled by setting g. 
to `cudf`" ], "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "3HJOItZ4MQMG", - "outputId": "f5be7bb4-7f09-4f80-c549-e703e99f5067" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "CPU times: user 3min 45s, sys: 1min 5s, total: 4min 50s\n", - "Wall time: 4min 52s\n" - ] - }, - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "(102414, 1)" - ] - }, - "metadata": {}, - "execution_count": 79 - } - ] + "id": "5-7M9sPEAf5Z" + } }, { "cell_type": "code", "source": [ - "%%time\n", - "gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=3)])\n", - "print(out._nodes.shape, out._edges.shape)\n", - "del gg_gdf\n", - "del out" + "results_df = pd.DataFrame(columns=['hops', 'CPU hop time (s)', 'GPU hop time (s)', 'n_notation speedup'])\n", + "\n", + "\n", + "\n", + "for n_hop in [2,5]:\n", + " start_nodes = pd.DataFrame({fg._node: [0]})\n", + " start0 = time.time()\n", + " for i in range(100):\n", + " fg2 = fg.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=n_hop)\n", + " end0 = time.time()\n", + " T0 = end0-start0\n", + " start_nodes = cudf.DataFrame({fg._node: [0]})\n", + " fg_gdf = fg.nodes(cudf.from_pandas(fg._nodes)).edges(cudf.from_pandas(fg._edges))\n", + " start1 = time.time()\n", + " for i in range(100):\n", + " fg2 = fg_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=n_hop)\n", + " end1 = time.time()\n", + "\n", + " del fg_gdf\n", + " del fg2\n", + " T1 = end1-start1\n", + "\n", + " new_row = pd.DataFrame({\n", + " 'hops': [n_hop],\n", + " 'CPU hop time (s)': [np.round(T0, 4)],\n", + " 'GPU hop time (s)': [np.round(T1, 4)],\n", + " 'n_notation speedup': [np.round(T0 / T1, 4)]\n", + " })\n", + "\n", + " results_df = pd.concat([results_df, new_row], ignore_index=True)\n", + "\n", + 
"# print(results_df)" ], "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "G32t_xthOUle", - "outputId": "7721741f-9c86-41aa-eb0b-2c8f0db2ed54" + "id": "Tki_0-_j3XKG" }, "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(102414, 1) (24851333, 2)\n", - "CPU times: user 6.95 s, sys: 2.63 s, total: 9.57 s\n", - "Wall time: 9.84 s\n" - ] - } - ] + "outputs": [] }, { "cell_type": "code", "source": [ - "%%time\n", - "out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=4)])\n", - "print(out._nodes.shape, out._edges.shape)" + "results_df.T" ], "metadata": { "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 175 }, - "id": "bXy2yyJsMsEG", - "outputId": "911f2680-067c-44f2-9ba2-7f27d3c9bc6b" + "id": "J_shIUugtU4D", + "outputId": "0877c04b-e1fc-4cdc-c928-54058ae184c8" }, - "execution_count": 8, + "execution_count": 13, "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "(105479, 1) (30450354, 2)\n", - "CPU times: user 4min 36s, sys: 1min 25s, total: 6min 2s\n", - "Wall time: 6min 4s\n" - ] + "output_type": "execute_result", + "data": { + "text/plain": [ + " 0 1\n", + "hops 2 5\n", + "CPU hop time (s) 5.8614 10.1756\n", + "GPU hop time (s) 2.3729 5.4458\n", + "n_notation speedup 2.4701 1.8685" + ], + "text/html": [ + "\n", + "
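The timing loops above feed the GTEPS figures reported in the results tables; GTEPS is simply edges traversed divided by elapsed seconds, in billions. A small helper showing the arithmetic, with hypothetical edge counts and timings rather than measured ones:

```python
def gteps(edges_traversed: int, seconds: float) -> float:
    """Giga (billion) edges traversed per second."""
    return edges_traversed / seconds / 1e9

# hypothetical run: 100 iterations over the ~2.4M Twitter edges,
# with illustrative (not measured) CPU and GPU timings
total_edges = 2_420_766 * 100
cpu_s, gpu_s = 10.0, 2.5

print(round(gteps(total_edges, cpu_s), 4))  # CPU GTEPS
print(round(gteps(total_edges, gpu_s), 4))  # GPU GTEPS
print(round(cpu_s / gpu_s, 2))              # speedup
```

The dollar-normalized figures in the results tables apply the same ratio against instance pricing (e.g., a t3.l or g4dn.xl reservation).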
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
01
hops25
CPU hop time (s)5.861410.1756
GPU hop time (s)2.37295.4458
n_notation speedup2.47011.8685
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"results_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": 0,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 2,\n \"max\": 5.8614,\n \"num_unique_values\": 4,\n \"samples\": [\n 5.8614,\n 2.4701,\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 1,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 1.8685,\n \"max\": 10.1756,\n \"num_unique_values\": 4,\n \"samples\": [\n 10.1756,\n 1.8685,\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 13 } ] }, { - "cell_type": "code", - "source": [ - "%%time\n", - "gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=4)])\n", - "print(out._nodes.shape, out._edges.shape)\n", - "del gg_gdf\n", - "del out" - ], + "cell_type": "markdown", "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "Vt8hhjWDP_W_", - "outputId": "824ae644-e1cf-4239-bda9-84aecde52ad8" + "id": "KrJKjXy2KLos" }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(105479, 1) (30450354, 2)\n", - "CPU times: user 7.44 s, sys: 2.45 s, total: 9.88 s\n", - "Wall time: 9.9 s\n" - ] - } - ] - }, - { - "cell_type": "code", "source": [ - "%%time\n", - "out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=5)])\n", - "print(out._nodes.shape, out._edges.shape)" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "_z4KpNZaOH8t", - "outputId": "2417f78b-e1b7-452d-8e26-7df259620c88" - }, - "execution_count": 9, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(105604, 1) (30468335, 2)\n", - "CPU times: user 5min 36s, sys: 1min 39s, total: 7min 16s\n", 
- "Wall time: 7min 15s\n" - ] - } + "\n", + "## Twitter\n", + "\n", + "- edges: 2420766\n", + "- nodes: 81306" ] }, { "cell_type": "code", - "source": [ - "%%time\n", - "gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=5)])\n", - "print(out._nodes.shape, out._edges.shape)\n", - "del gg_gdf\n", - "del out" - ], + "execution_count": 15, "metadata": { - "id": "spUBH9EHSz2O", "colab": { "base_uri": "https://localhost:8080/" }, - "outputId": "22340ce3-e8d4-4a72-b485-9839c667b965" + "id": "fO2qasGqpubr", + "outputId": "63c76f29-28ef-4e6d-ff83-13365d680632" }, - "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "(105604, 1) (30468335, 2)\n", - "CPU times: user 8.82 s, sys: 2.71 s, total: 11.5 s\n", - "Wall time: 11.9 s\n" + "--2024-07-09 13:36:53-- https://snap.stanford.edu/data/twitter_combined.txt.gz\n", + "Resolving snap.stanford.edu (snap.stanford.edu)... 171.64.75.80\n", + "Connecting to snap.stanford.edu (snap.stanford.edu)|171.64.75.80|:443... connected.\n", + "HTTP request sent, awaiting response... 
200 OK\n", + "Length: 10621918 (10M) [application/x-gzip]\n", + "Saving to: ‘twitter_combined.txt.gz’\n", + "\n", + "twitter_combined.tx 100%[===================>] 10.13M 19.6MB/s in 0.5s \n", + "\n", + "2024-07-09 13:36:54 (19.6 MB/s) - ‘twitter_combined.txt.gz’ saved [10621918/10621918]\n", + "\n" ] } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", - "for i in range(1):\n", - " g2 = gg.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=1)\n", - "print(g2._nodes.shape, g2._edges.shape)" ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "vCsdmc62A7OM", - "outputId": "adc05d29-c628-49ed-cd6d-8921c6dcd206" - }, - "execution_count": 50, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(1473, 1) (13375, 2)\n", - "CPU times: user 19.9 s, sys: 9.36 s, total: 29.2 s\n", - "Wall time: 41.8 s\n" - ] - } - ] - }, - { - "cell_type": "code", "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", - "gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", - "for i in range(1):\n", - " g2 = gg_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=1)\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del gg_gdf\n", - "del g2" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "J3kV8NBYBQdW", - "outputId": "76073248-43e1-4c3c-c004-67324cc1d312" - }, - "execution_count": 52, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(1473, 1) (13375, 2)\n", - "CPU times: user 3.71 s, sys: 2.09 s, total: 5.8 s\n", - "Wall time: 6.05 s\n" - ] - } + "! wget 'https://snap.stanford.edu/data/twitter_combined.txt.gz'\n", + "#! 
curl -L 'https://snap.stanford.edu/data/twitter_combined.txt.gz' -o twitter_combined.txt.gz" ] }, { "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", - "for i in range(1):\n", - " g2 = gg.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=2)\n", - "print(g2._nodes.shape, g2._edges.shape)" - ], + "execution_count": 16, "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "ONv1RQeWBeeK", - "outputId": "58d57fa4-be72-45bc-abfa-5de9d1102f55" + "id": "fn7zeA3SGlEo" }, - "execution_count": 53, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(44073, 1) (2069325, 2)\n", - "CPU times: user 27.8 s, sys: 13.2 s, total: 41 s\n", - "Wall time: 43.9 s\n" - ] - } - ] - }, - { - "cell_type": "code", + "outputs": [], "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", - "gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", - "for i in range(1):\n", - " g2 = gg_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=2)\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del gg_gdf\n", - "del g2" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "ke5SZZ01BgqR", - "outputId": "4173fd28-a11b-4300-d28b-6fdb87e8e9f3" - }, - "execution_count": 54, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(44073, 1) (2069325, 2)\n", - "CPU times: user 4.26 s, sys: 2.37 s, total: 6.63 s\n", - "Wall time: 7.91 s\n" - ] - } + "! 
gunzip twitter_combined.txt.gz" ] }, { "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", - "for i in range(1):\n", - " g2 = gg.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=3)\n", - "print(g2._nodes.shape, g2._edges.shape)" - ], + "execution_count": 17, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, - "id": "U795pIBUBiZV", - "outputId": "d499433c-cc0c-4bbf-c69f-36b5d55402d9" + "id": "68TAZkhLGz9g", + "outputId": "156f3da8-50c9-4e30-d1e5-9dba63e7f93d" }, - "execution_count": 55, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "(102414, 1) (24851333, 2)\n", - "CPU times: user 1min 3s, sys: 22.7 s, total: 1min 26s\n", - "Wall time: 1min 35s\n" + "214328887 34428380\n", + "17116707 28465635\n", + "380580781 18996905\n", + "221036078 153460275\n", + "107830991 17868918\n" ] } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", - "gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", - "for i in range(1):\n", - " g2 = gg_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=3)\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del gg_gdf\n", - "del g2" ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "kIZYwSe1Bj2e", - "outputId": "b7e1ed9f-47d1-412e-9593-ecc436ac1486" - }, - "execution_count": 56, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(102414, 1) (24851333, 2)\n", - "CPU times: user 3.96 s, sys: 2.11 s, total: 6.07 s\n", - "Wall time: 6.05 s\n" - ] - } - ] - }, - { - "cell_type": "code", "source": [ - "%%time\n", - "start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", - "for i in range(1):\n", - " g2 = gg.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " 
hops=4)\n", - "print(g2._nodes.shape, g2._edges.shape)" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "YTI5sD6YBpYL", - "outputId": "b37bf2df-07dc-404c-8a83-a83f28e38bf6" - }, - "execution_count": 57, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(105479, 1) (30450354, 2)\n", - "CPU times: user 1min 34s, sys: 30.6 s, total: 2min 5s\n", - "Wall time: 2min 5s\n" - ] - } + "! head -n 5 twitter_combined.txt" ] }, { "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", - "gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", - "for i in range(1):\n", - " g2 = gg_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=4)\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del gg_gdf\n", - "del g2" - ], + "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, - "id": "d5WBazICBrSz", - "outputId": "ef95e893-3a0f-4d47-ede4-bd8a6faebf98" + "id": "QU2wNeGXG2GC", + "outputId": "86439d2a-e85a-4e84-ee53-dfa5e0e89ce2" }, - "execution_count": 58, "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "(105479, 1) (30450354, 2)\n", - "CPU times: user 5.25 s, sys: 2.41 s, total: 7.67 s\n", - "Wall time: 7.69 s\n" - ] + "output_type": "execute_result", + "data": { + "text/plain": [ + "(2420766, 2)" + ] + }, + "metadata": {}, + "execution_count": 18 } + ], + "source": [ + "te_df = pd.read_csv('twitter_combined.txt', sep=' ', names=['s', 'd'])\n", + "te_df.shape" ] }, { "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", - "for i in range(1):\n", - " g2 = gg.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=5)\n", - "print(g2._nodes.shape, g2._edges.shape)" - ], + "execution_count": 19, "metadata": { - "colab": { - 
"base_uri": "https://localhost:8080/" - }, - "id": "ozQlRPaFBtPD", - "outputId": "4f1655c4-38fd-47f9-942d-836585e0d866" + "id": "EK5gQH2iG5UU" }, - "execution_count": 59, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(105604, 1) (30468335, 2)\n", - "CPU times: user 2min 16s, sys: 39.1 s, total: 2min 55s\n", - "Wall time: 2min 58s\n" - ] - } + "outputs": [], + "source": [ + "import graphistry" ] }, { "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", - "gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", - "for i in range(1):\n", - " g2 = gg_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=5)\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del gg_gdf\n", - "del g2" - ], + "execution_count": 20, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, - "id": "-ACkMG20B6HM", - "outputId": "f26c03a9-9f25-4f93-c7d3-0e8676694040" + "id": "ZtIW-eFGG_R4", + "outputId": "4082a078-3af9-4c4e-dc98-ddeee50a2489" }, - "execution_count": 60, "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "(105604, 1) (30468335, 2)\n", - "CPU times: user 5.79 s, sys: 2.51 s, total: 8.3 s\n", - "Wall time: 8.29 s\n" - ] + "output_type": "execute_result", + "data": { + "text/plain": [ + "(81306, 1)" + ] + }, + "metadata": {}, + "execution_count": 20 } + ], + "source": [ + "g = graphistry.edges(te_df, 's', 'd').materialize_nodes()\n", + "g._nodes.shape" ] }, { "cell_type": "markdown", "source": [ - "### Orkut\n", - "- 117M edges\n", - "- 3M nodes" + "on the twitter data, simpler `chain` operations over several different hops -- **10-20x** *italicized text* speed increases" ], "metadata": { - "id": "R03M_swxarKC" + "id": "yR9Qr8tGww3b" } }, { "cell_type": "code", "source": [ - "! 
wget https://snap.stanford.edu/data/bigdata/communities/com-orkut.ungraph.txt.gz" + "results_df = pd.DataFrame(columns=['hops', 'CPU hop chain time (s)', 'GPU hop chain time (s)', 'n_notation speedup'])\n", + "\n", + "\n", + "for n_hop in [1,2,8]:\n", + " start_nodes = pd.DataFrame({fg._node: [0]})\n", + " start0 = time.time()\n", + " for i in range(10):\n", + " g2 = g.chain([n({'id': 17116707}), e_forward(hops=n_hop)])\n", + " end0 = time.time()\n", + " T0 = end0-start0\n", + " g_gdf = g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + " start1 = time.time()\n", + " for i in range(10):\n", + " out = g_gdf.chain([n({'id': 17116707}), e_forward(hops=n_hop)])._nodes\n", + " end1 = time.time()\n", + "\n", + " del g_gdf\n", + " del out\n", + " T1 = end1-start1\n", + "\n", + " new_row = pd.DataFrame({\n", + " 'hops': [n_hop],\n", + " 'CPU hop chain time (s)': [np.round(T0, 4)],\n", + " 'GPU hop chain time (s)': [np.round(T1, 4)],\n", + " 'n_notation speedup': [np.round(T0 / T1, 4)]\n", + " })\n", + "\n", + "\n", + " results_df = pd.concat([results_df, new_row], ignore_index=True)\n", + "\n", + "results_df.T" ], "metadata": { "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "QoabYR2maxPo", - "outputId": "2bb6275d-46bb-42da-ec05-d0e5a58b1f77" - }, - "execution_count": 8, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "--2023-12-26 00:55:52-- https://snap.stanford.edu/data/bigdata/communities/com-orkut.ungraph.txt.gz\n", - "Resolving snap.stanford.edu (snap.stanford.edu)... 171.64.75.80\n", - "Connecting to snap.stanford.edu (snap.stanford.edu)|171.64.75.80|:443... connected.\n", - "HTTP request sent, awaiting response... 
200 OK\n",
-        "Length: 447251958 (427M) [application/x-gzip]\n",
-        "Saving to: ‘com-orkut.ungraph.txt.gz’\n",
-        "\n",
-        "com-orkut.ungraph.t 100%[===================>] 426.53M  45.1MB/s    in 9.7s    \n",
-        "\n",
-        "2023-12-26 00:56:02 (44.0 MB/s) - ‘com-orkut.ungraph.txt.gz’ saved [447251958/447251958]\n",
-        "\n"
-          ]
-        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "on the twitter data, simpler `chain` operations over several different hops -- **10-20x** speed increases"
      ],
      "metadata": {
        "id": "yR9Qr8tGww3b"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "results_df = pd.DataFrame(columns=['hops', 'CPU hop chain time (s)', 'GPU hop chain time (s)', 'n_notation speedup'])\n",
        "\n",
        "\n",
        "for n_hop in [1,2,8]:\n",
        "    start0 = time.time()\n",
        "    for i in range(10):\n",
        "        g2 = g.chain([n({'id': 17116707}), e_forward(hops=n_hop)])\n",
        "    end0 = time.time()\n",
        "    T0 = end0-start0\n",
        "    g_gdf = g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n",
        "    start1 = time.time()\n",
        "    for i in range(10):\n",
        "        out = g_gdf.chain([n({'id': 17116707}), e_forward(hops=n_hop)])._nodes\n",
        "    end1 = time.time()\n",
        "\n",
        "    del g_gdf\n",
        "    del out\n",
        "    T1 = end1-start1\n",
        "\n",
        "    new_row = pd.DataFrame({\n",
        "        'hops': [n_hop],\n",
        "        'CPU hop chain time (s)': [np.round(T0, 4)],\n",
        "        'GPU hop chain time (s)': [np.round(T1, 4)],\n",
        "        'n_notation speedup': [np.round(T0 / T1, 4)]\n",
        "    })\n",
        "\n",
        "\n",
        "    results_df = pd.concat([results_df, new_row], ignore_index=True)\n",
        "\n",
        "results_df.T"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 175
        },
        "id": "rCsvQJa-6U0x",
        "outputId": "a8d52e0a-cd32-436d-a889-c997c6289055"
      },
      "execution_count": 21,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "                              0        1        2\n",
              "hops                          1        2        8\n",
              "CPU hop chain time (s)  19.3802    17.21  84.5977\n",
              "GPU hop chain time (s)   0.7395   1.5332   4.4011\n",
              "n_notation speedup      26.2058  11.2246  19.2218"
            ],
            "text/html": [
              "\n",
              "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
012
hops128
CPU hop chain time (s)19.380217.2184.5977
GPU hop chain time (s)0.73951.53324.4011
n_notation speedup26.205811.224619.2218
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"results_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": 0,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.7395,\n \"max\": 26.2058,\n \"num_unique_values\": 4,\n \"samples\": [\n 19.3802,\n 26.2058,\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 1,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 1.5332,\n \"max\": 17.21,\n \"num_unique_values\": 4,\n \"samples\": [\n 17.21,\n 11.2246,\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 2,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 4.4011,\n \"max\": 84.5977,\n \"num_unique_values\": 4,\n \"samples\": [\n 84.5977,\n 19.2218,\n 8\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 21 } ] }, { - "cell_type": "code", + "cell_type": "markdown", "source": [ - "! gunzip com-orkut.ungraph.txt.gz" + "and similarly for these `hop` operations -- **10-30x** speed increases" ], "metadata": { - "id": "BvvfFPKWbAVJ" - }, - "execution_count": 9, - "outputs": [] + "id": "gHHhyYlzArjw" + } }, { - "cell_type": "code", - "source": [ - "! 
head -n 7 com-orkut.ungraph.txt"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "YsWwRoPqbPIb",
        "outputId": "2eb4f862-b4e1-42bf-ff5d-eec10b27cedc"
      },
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "# Undirected graph: ../../data/output/orkut.txt\n",
            "# Orkut\n",
            "# Nodes: 3072441 Edges: 117185083\n",
            "# FromNodeId\tToNodeId\n",
            "1\t2\n",
            "1\t3\n",
            "1\t4\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9dZzAAVONCD2"
      },
      "source": [
        "\n",
        "## GPlus\n",
        "\n",
        "- edges: 30494866\n",
        "- nodes: 107614"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "results_df = pd.DataFrame(columns=['hops', 'CPU hop chain time (s)', 'GPU hop chain time (s)', 'n_notation speedup'])\n",
        "\n",
        "\n",
        "for n_hop in [1,2,8]:\n",
        "    start_nodes = pd.DataFrame({g._node: [17116707]})\n",
        "    start0 = time.time()\n",
        "    for i in range(10):\n",
        "        g2 = g.hop(\n",
        "            nodes=start_nodes,\n",
        "            direction='forward',\n",
        "            hops=n_hop)\n",
        "    end0 = time.time()\n",
        "    T0 = end0-start0\n",
        "    start_nodes = cudf.DataFrame({g._node: [17116707]})\n",
        "    g_gdf = g.nodes(cudf.from_pandas(g._nodes)).edges(cudf.from_pandas(g._edges))\n",
        "    start1 = time.time()\n",
        "    for i in range(10):\n",
        "        g2 = g_gdf.hop(\n",
        "            nodes=start_nodes,\n",
        "            direction='forward',\n",
        "            hops=n_hop)\n",
        "    end1 = time.time()\n",
        "\n",
        "    del start_nodes\n",
        "    del g_gdf\n",
        "    del g2\n",
        "    T1 = end1-start1\n",
        "\n",
        "    new_row = pd.DataFrame({\n",
        "        'hops': [n_hop],\n",
        "        'CPU hop chain time (s)': [np.round(T0, 4)],\n",
        "        'GPU hop chain time (s)': [np.round(T1, 4)],\n",
        "        'n_notation speedup': [np.round(T0 / T1, 
4)]\n", + " })\n", "\n", - "#work around google colab shell encoding bugs\n", - "import locale\n", - "locale.getpreferredencoding = lambda: \"UTF-8\"\n", + " results_df = pd.concat([results_df, new_row], ignore_index=True)\n", "\n", - "cudf.__version__, graphistry.__version__" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "cbMC8r2ldjbW", - "outputId": "82688d53-7d56-4563-d65e-7c5cd32ac14e" - }, - "execution_count": 11, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "('23.12.01', '0.32.0+12.g72e778c')" - ] - }, - "metadata": {}, - "execution_count": 11 - } - ] - }, - { - "cell_type": "code", - "source": [ - "! nvidia-smi" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "TopFxAvnh_Cv", - "outputId": "cc9d9dc9-e594-4190-fe84-3f1b6dce8a1a" - }, - "execution_count": 12, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Tue Dec 26 00:56:27 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. 
|\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 47C P0 27W / 70W | 103MiB / 15360MiB | 0% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n" - ] - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "co_df = cudf.read_csv('com-orkut.ungraph.txt', sep='\\t', names=['s', 'd'], skiprows=5).to_pandas()\n", - "print(co_df.shape)\n", - "print(co_df.head(5))\n", - "print(co_df.dtypes)\n", - "#del co_df" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "Oczs87ITbJgw", - "outputId": "ac203ddd-e684-4eb9-a586-f6a49fd1625d" - }, - "execution_count": 13, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(117185082, 2)\n", - " s d\n", - "0 1 3\n", - "1 1 4\n", - "2 1 5\n", - "3 1 6\n", - "4 1 7\n", - "s int64\n", - "d int64\n", - "dtype: object\n", - "CPU times: user 2.56 s, sys: 4.2 s, total: 6.76 s\n", - "Wall time: 6.76 s\n" - ] - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "co_g = graphistry.edges(cudf.DataFrame(co_df), 's', 'd').materialize_nodes(engine='cudf')\n", - "co_g = co_g.nodes(lambda g: g._nodes.to_pandas()).edges(lambda g: g._edges.to_pandas())\n", - "print(co_g._nodes.shape, co_g._edges.shape)\n", - "co_g._nodes.head(5)" + "(results_df.T)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", - "height": 258 + "height": 175 }, - "id": "gGSDjTtveFAT", - 
"outputId": "e7b38f4f-dc07-4f35-9bab-9c80a80bbf0b" + "id": "cnILbPnG7tf4", + "outputId": "3d2e0ca7-8b07-45ab-e020-222197639dc6" }, - "execution_count": 14, + "execution_count": 22, "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(3072441, 1) (117185082, 2)\n", - "CPU times: user 1.96 s, sys: 2.95 s, total: 4.91 s\n", - "Wall time: 4.92 s\n" - ] - }, { "output_type": "execute_result", "data": { "text/plain": [ - " id\n", - "0 1\n", - "1 2\n", - "2 3\n", - "3 4\n", - "4 5" + " 0 1 2\n", + "hops 1 2 8\n", + "CPU hop chain time (s) 18.8525 12.5991 43.39\n", + "GPU hop chain time (s) 1.0538 1.0413 1.4334\n", + "n_notation speedup 17.8901 12.0998 30.2698" ], "text/html": [ "\n", - "
\n", + "
\n", "
\n", "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
sd
0116374117927631468606101765416973555767821
1112188647432305746617107727150903234299458
2116719211656774388392100432456209427807893
3117421021456205115327101096322838605097368
4116407635616074189669113556266482860931616
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"get_ipython()\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"s\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"112188647432305746617\",\n \"116407635616074189669\",\n \"116719211656774388392\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"d\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"107727150903234299458\",\n \"113556266482860931616\",\n \"100432456209427807893\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 25 } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "! nvidia-smi\n", - "for i in range(10):\n", - " out = co_gdf.chain([ n({'id': 1}), e_forward(hops=4)])\n", - "! nvidia-smi\n", - "print(out._nodes.shape, out._edges.shape)\n", - "del co_gdf\n", - "del out" ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "buutj-ZjhrEe", - "outputId": "ae11addd-6bea-44e9-81c0-b431e1db8089" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Mon Dec 25 06:26:04 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. 
|\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 61C P0 29W / 70W | 1927MiB / 15360MiB | 36% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "Mon Dec 25 06:26:13 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. 
|\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 65C P0 71W / 70W | 2931MiB / 15360MiB | 90% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "(718640, 1) (2210961, 2)\n", - "CPU times: user 9.01 s, sys: 1.03 s, total: 10 s\n", - "Wall time: 9.84 s\n" - ] - } - ] - }, - { - "cell_type": "code", "source": [ "%%time\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "! nvidia-smi\n", - "for i in range(10):\n", - " out = co_gdf.chain([ n({'id': 1}), e_forward(hops=5)])\n", - "! nvidia-smi\n", - "print(out._nodes.shape, out._edges.shape)\n", - "del co_gdf\n", - "del out" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "bK4C9Ly0hso-", - "outputId": "8a9a32ab-03e2-42b4-8b71-2bcf797b31b1" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Mon Dec 25 06:27:18 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. 
|\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 60C P0 29W / 70W | 1927MiB / 15360MiB | 28% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "Mon Dec 25 06:27:57 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. 
|\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 72C P0 43W / 70W | 4351MiB / 15360MiB | 100% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "(3041556, 1) (47622917, 2)\n", - "CPU times: user 34.9 s, sys: 4.76 s, total: 39.6 s\n", - "Wall time: 39.2 s\n" - ] - } + "ge_df = pd.read_csv('gplus_combined.txt', sep=' ', names=['s', 'd'])\n", + "print(ge_df.shape)\n", + "ge_df.head(5)" ] }, { "cell_type": "code", - "source": [ - "%%time\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "out = co_gdf.chain([ n({'id': 1}), e_forward(hops=6)])._nodes\n", - "print(out.shape)\n", - "del co_gdf\n", - "del out" - ], - "metadata": { - "id": "qrga-la0hwhh" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "!lscpu\n" - ], + "execution_count": 26, "metadata": { + "id": "w5YkN-nLK6UV", "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 260 }, - "id": "eiXFImxF-rzw", - "outputId": "b807cc3d-ed1a-4bef-c6e0-bfc2df7356ff" + "outputId": "89c4b0a5-a355-4558-8b3c-187b0efe471a" }, - "execution_count": null, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "Architecture: x86_64\n", - " CPU op-mode(s): 32-bit, 64-bit\n", - " Address sizes: 46 bits physical, 48 bits virtual\n", - " Byte Order: Little Endian\n", - "CPU(s): 2\n", 
- " On-line CPU(s) list: 0,1\n", - "Vendor ID: GenuineIntel\n", - " Model name: Intel(R) Xeon(R) CPU @ 2.20GHz\n", - " CPU family: 6\n", - " Model: 79\n", - " Thread(s) per core: 2\n", - " Core(s) per socket: 1\n", - " Socket(s): 1\n", - " Stepping: 0\n", - " BogoMIPS: 4399.99\n", - " Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clf\n", - " lush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_\n", - " good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fm\n", - " a cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hyp\n", - " ervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp fsgsb\n", - " ase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsa\n", - " veopt arat md_clear arch_capabilities\n", - "Virtualization features: \n", - " Hypervisor vendor: KVM\n", - " Virtualization type: full\n", - "Caches (sum of all): \n", - " L1d: 32 KiB (1 instance)\n", - " L1i: 32 KiB (1 instance)\n", - " L2: 256 KiB (1 instance)\n", - " L3: 55 MiB (1 instance)\n", - "NUMA: \n", - " NUMA node(s): 1\n", - " NUMA node0 CPU(s): 0,1\n", - "Vulnerabilities: \n", - " Gather data sampling: Not affected\n", - " Itlb multihit: Not affected\n", - " L1tf: Mitigation; PTE Inversion\n", - " Mds: Vulnerable; SMT Host state unknown\n", - " Meltdown: Vulnerable\n", - " Mmio stale data: Vulnerable\n", - " Retbleed: Vulnerable\n", - " Spec rstack overflow: Not affected\n", - " Spec store bypass: Vulnerable\n", - " Spectre v1: Vulnerable: __user pointer sanitization and usercopy barriers only; no swap\n", - " gs barriers\n", - " Spectre v2: Vulnerable, IBPB: disabled, STIBP: disabled, PBRSB-eIBRS: Not affected\n", - " Srbds: Not affected\n", - " Tsx async abort: Vulnerable\n" + "(30494866, 2) (107614, 1)\n", + "CPU times: user 5.14 s, sys: 1.08 s, total: 6.22 s\n", + "Wall time: 6.27 s\n" ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " 
id\n", + "0 116374117927631468606\n", + "1 112188647432305746617\n", + "2 116719211656774388392\n", + "3 117421021456205115327\n", + "4 116407635616074189669" + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
id
0116374117927631468606
1112188647432305746617
2116719211656774388392
3117421021456205115327
4116407635616074189669
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"get_ipython()\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"112188647432305746617\",\n \"116407635616074189669\",\n \"116719211656774388392\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 26 } - ] - }, - { - "cell_type": "code", - "source": [ - "!free -h\n" ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "wJohLi58-sN5", - "outputId": "c3e144f6-c19a-4c68-e867-f5e7fa2e9df4" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - " total used free shared buff/cache available\n", - "Mem: 12Gi 717Mi 8.0Gi 1.0Mi 3.9Gi 11Gi\n", - "Swap: 0B 0B 0B\n" - ] - } - ] - }, - { - "cell_type": "code", "source": [ "%%time\n", - "start_nodes = pd.DataFrame({'id': [1]})\n", - "! nvidia-smi\n", - "for i in range(1):\n", - " g2 = co_g.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=1)\n", - "! nvidia-smi\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "#del start_nodes\n", - "#del co_gdf\n", - "#del g2" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "zak4Inhco5il", - "outputId": "30bcf2bc-853e-4e5e-8c57-ba0cd9429554" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Tue Dec 26 01:01:43 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. 
ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 64C P0 30W / 70W | 2821MiB / 15360MiB | 0% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n" - ] - } + "gg = graphistry.edges(ge_df, 's', 'd').materialize_nodes()\n", + "gg = graphistry.edges(ge_df, 's', 'd').nodes(gg._nodes, 'id')\n", + "print(gg._edges.shape, gg._nodes.shape)\n", + "gg._nodes.head(5)" ] }, { "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({'id': [1]})\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "! nvidia-smi\n", - "for i in range(10):\n", - " g2 = co_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=1)\n", - "! 
nvidia-smi\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del co_gdf\n", - "del g2" - ], + "execution_count": 27, "metadata": { - "id": "-SmFlCBS_Bgx", + "id": "NKtz54uELX-8", "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 116 }, - "outputId": "d2326cf7-3ea6-4f99-9548-f2e98ece59a4" + "outputId": "f4b28841-62bc-42cd-e771-127400a2689e" }, - "execution_count": 16, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "Tue Dec 26 00:56:45 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 49C P0 28W / 70W | 1923MiB / 15360MiB | 37% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "Tue Dec 26 00:56:47 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - 
"|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 52C P0 70W / 70W | 2819MiB / 15360MiB | 79% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "(12, 1) (11, 2)\n", - "CPU times: user 1.6 s, sys: 37.3 ms, total: 1.64 s\n", - "Wall time: 1.84 s\n" + "CPU times: user 676 ms, sys: 400 ms, total: 1.08 s\n", + "Wall time: 1.11 s\n" ] - } - ] - }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({'id': [1]})\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "! nvidia-smi\n", - "for i in range(10):\n", - " g2 = co_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=2)\n", - "! 
nvidia-smi\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del co_gdf\n", - "del g2" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" }, - "id": "fjjt3YnYnabv", - "outputId": "05762f50-bfe1-4d23-9153-31431418c8e5" - }, - "execution_count": 17, - "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "Tue Dec 26 00:56:47 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 51C P0 35W / 70W | 1923MiB / 15360MiB | 59% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "Tue Dec 26 00:56:49 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. 
ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 53C P0 59W / 70W | 2821MiB / 15360MiB | 86% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "(391, 1) (461, 2)\n", - "CPU times: user 2.32 s, sys: 58.5 ms, total: 2.38 s\n", - "Wall time: 2.51 s\n" - ] + "output_type": "execute_result", + "data": { + "text/plain": [ + " id\n", + "0 116374117927631468606" + ], + "text/html": [ + "\n", + "
(colab interactive HTML table output removed; values preserved in the text/plain output above)\n",
          "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"get_ipython()\",\n \"rows\": 1,\n \"fields\": [\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"116374117927631468606\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 27 } + ], + "source": [ + "%%time\n", + "gg.chain([ n({'id': '116374117927631468606'})])._nodes" ] }, + { + "cell_type": "markdown", + "source": [ + "on the GPlus data, simpler `chain` operations over several different hops -- **100-200x** speed increases" + ], + "metadata": { + "id": "e4ZchWvrBKdY" + } + }, { "cell_type": "code", "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({'id': [1]})\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "! nvidia-smi\n", - "for i in range(10):\n", - " g2 = co_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=3)\n", - "! 
nvidia-smi\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del co_gdf\n", - "del g2" + "results_df = pd.DataFrame(columns=['hops', 'CPU hop chain time (s)', 'GPU hop chain time (s)', 'n_notation speedup'])\n", + "\n", + "\n", + "for n_hop in [1,2,3,4,5]:\n", + " start_nodes = pd.DataFrame({fg._node: [0]})\n", + " start0 = time.time()\n", + " out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=n_hop)])._nodes\n", + " end0 = time.time()\n", + " T0 = end0-start0\n", + " gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", + " start1 = time.time()\n", + " out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=n_hop)])\n", + " end1 = time.time()\n", + "\n", + " del gg_gdf\n", + " del out\n", + " T1 = end1-start1\n", + " # print('\\nCPU',n_hop,'hop chain time:',np.round(T0,4),'\\nGPU',n_hop,'hop chain time:',np.round(T1,4),'\\nspeedup:', np.round(T0/T1,4))\n", + "\n", + " new_row = pd.DataFrame({\n", + " 'hops': [n_hop],\n", + " 'CPU hop chain time (s)': [np.round(T0, 4)],\n", + " 'GPU hop chain time (s)': [np.round(T1, 4)],\n", + " 'n_notation speedup': [np.round(T0 / T1, 4)]\n", + " })\n", + "\n", + " results_df = pd.concat([results_df, new_row], ignore_index=True)\n", + "\n", + "(results_df.T)" ], "metadata": { "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 175 }, - "id": "oIouuORgnbcY", - "outputId": "f07abe4c-5137-4ee3-935a-afbb2c5eaa1e" + "id": "fTnU8MLr8tV5", + "outputId": "203eb5bf-9d95-4557-f35e-7ef2274424c5" }, - "execution_count": 18, + "execution_count": 28, "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "Tue Dec 26 00:56:50 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - 
"|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 52C P0 36W / 70W | 1925MiB / 15360MiB | 55% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "Tue Dec 26 00:56:53 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. 
|\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 54C P0 75W / 70W | 2825MiB / 15360MiB | 74% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "(21767, 1) (28480, 2)\n", - "CPU times: user 3.04 s, sys: 63.6 ms, total: 3.1 s\n", - "Wall time: 3.25 s\n" - ] + "output_type": "execute_result", + "data": { + "text/plain": [ + " 0 1 2 3 4\n", + "hops 1 2 3 4 5\n", + "CPU hop chain time (s) 33.7597 50.877 228.473 291.1332 327.8891\n", + "GPU hop chain time (s) 0.3082 0.6515 2.9645 4.1146 4.7598\n", + "n_notation speedup 109.5356 78.0912 77.0694 70.7561 68.8877" + ], + "text/html": [ + "\n", + "
(colab interactive HTML table output removed; values preserved in the text/plain output above)\n",
          "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"(results_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": 0,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.3082,\n \"max\": 109.5356,\n \"num_unique_values\": 4,\n \"samples\": [\n 33.7597,\n 109.5356,\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 1,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.6515,\n \"max\": 78.0912,\n \"num_unique_values\": 4,\n \"samples\": [\n 50.877,\n 78.0912,\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 2,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 2.9645,\n \"max\": 228.473,\n \"num_unique_values\": 4,\n \"samples\": [\n 228.473,\n 77.0694,\n 3\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 3,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 4,\n \"max\": 291.1332,\n \"num_unique_values\": 4,\n \"samples\": [\n 291.1332,\n 70.7561,\n 4\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 4,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 4.7598,\n \"max\": 327.8891,\n \"num_unique_values\": 4,\n \"samples\": [\n 327.8891,\n 68.8877,\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 28 } ] }, { - "cell_type": "code", + "cell_type": "markdown", "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({'id': [1]})\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "! nvidia-smi\n", - "for i in range(10):\n", - " g2 = co_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=4)\n", - "! 
nvidia-smi\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del co_gdf\n", - "del g2" + "and similarly for these hop operations -- **100x** speed increases" ], "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "oNLZGjwInc85", - "outputId": "534097cf-4022-48cc-9419-a00c135f69e1" - }, - "execution_count": 19, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Tue Dec 26 00:56:53 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 54C P0 36W / 70W | 1927MiB / 15360MiB | 54% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "Tue Dec 26 00:56:58 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id 
Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 56C P0 38W / 70W | 2907MiB / 15360MiB | 89% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "(718640, 1) (2210961, 2)\n", - "CPU times: user 4.58 s, sys: 309 ms, total: 4.89 s\n", - "Wall time: 5.02 s\n" - ] - } - ] + "id": "80bs6Y5pBWb2" + } }, { "cell_type": "code", "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({'id': [1]})\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "! nvidia-smi\n", - "for i in range(10):\n", - " g2 = co_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=5)\n", - "! 
nvidia-smi\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del co_gdf\n", - "del g2" + "results_df = pd.DataFrame(columns=['hops', 'CPU hop chain time (s)', 'GPU hop chain time (s)', 'n_notation speedup'])\n", + "\n", + "\n", + "for n_hop in [1,2,3,4,5]:\n", + " start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", + " start0 = time.time()\n", + " for i in range(1):\n", + " g2 = gg.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=n_hop)\n", + " end0 = time.time()\n", + " T0 = end0-start0\n", + " start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", + " gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", + " start1 = time.time()\n", + " for i in range(1):\n", + " g2 = gg_gdf.hop(\n", + " nodes=start_nodes,\n", + " direction='forward',\n", + " hops=n_hop)\n", + " end1 = time.time()\n", + "\n", + " del start_nodes\n", + " del gg_gdf\n", + " del g2\n", + " T1 = end1-start1\n", + "\n", + " new_row = pd.DataFrame({\n", + " 'hops': [n_hop],\n", + " 'CPU hop chain time (s)': [np.round(T0, 4)],\n", + " 'GPU hop chain time (s)': [np.round(T1, 4)],\n", + " 'n_notation speedup': [np.round(T0 / T1, 4)]\n", + " })\n", + "\n", + " results_df = pd.concat([results_df, new_row], ignore_index=True)\n", + "\n", + "(results_df.T)" ], "metadata": { "colab": { - "base_uri": "https://localhost:8080/" + "base_uri": "https://localhost:8080/", + "height": 175 }, - "id": "ePqaeujMneX8", - "outputId": "ffd88fff-016e-4ac0-ecb9-fa06baca60f8" + "id": "N2-gDFod9vc3", + "outputId": "907da762-fae2-4caa-cdd2-e78e13b2f635" }, - "execution_count": 20, + "execution_count": 29, "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "Tue Dec 26 00:56:58 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - 
"|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 55C P0 37W / 70W | 1925MiB / 15360MiB | 59% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "Tue Dec 26 00:57:10 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. 
|\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 60C P0 48W / 70W | 4325MiB / 15360MiB | 99% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "(3041556, 1) (47622917, 2)\n", - "CPU times: user 10.8 s, sys: 1.29 s, total: 12.1 s\n", - "Wall time: 12 s\n" - ] + "output_type": "execute_result", + "data": { + "text/plain": [ + " 0 1 2 3 4\n", + "hops 1 2 3 4 5\n", + "CPU hop chain time (s) 19.6594 33.2538 64.8384 98.9693 147.4526\n", + "GPU hop chain time (s) 0.116 0.2583 0.8252 1.3544 1.9375\n", + "n_notation speedup 169.4189 128.7532 78.5772 73.071 76.103" + ], + "text/html": [ + "\n", + "
(colab interactive HTML table output removed; values preserved in the text/plain output above)\n",
          "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"(results_df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": 0,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.116,\n \"max\": 169.4189,\n \"num_unique_values\": 4,\n \"samples\": [\n 19.6594,\n 169.4189,\n 1\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 1,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.2583,\n \"max\": 128.7532,\n \"num_unique_values\": 4,\n \"samples\": [\n 33.2538,\n 128.7532,\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 2,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 0.8252,\n \"max\": 78.5772,\n \"num_unique_values\": 4,\n \"samples\": [\n 64.8384,\n 78.5772,\n 3\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 3,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 1.3544,\n \"max\": 98.9693,\n \"num_unique_values\": 4,\n \"samples\": [\n 98.9693,\n 73.071,\n 4\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 4,\n \"properties\": {\n \"dtype\": \"date\",\n \"min\": 1.9375,\n \"max\": 147.4526,\n \"num_unique_values\": 4,\n \"samples\": [\n 147.4526,\n 76.103,\n 5\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 29 } ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] }, - { - "cell_type": "code", - "source": [ - "%%time\n", - "start_nodes = cudf.DataFrame({'id': [1]})\n", - "co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "! nvidia-smi\n", - "for i in range(10):\n", - " g2 = co_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=6)\n", - "! 
nvidia-smi\n", - "print(g2._nodes.shape, g2._edges.shape)\n", - "del start_nodes\n", - "del co_gdf\n", - "del g2" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "PTBkoIVHnfzK", - "outputId": "5615ecd7-47ea-46ab-fd36-13bce4b3c787" - }, - "execution_count": 21, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Tue Dec 26 00:57:10 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 59C P0 38W / 70W | 1925MiB / 15360MiB | 44% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "Tue Dec 26 00:57:38 2023 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. 
ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 68C P0 55W / 70W | 6445MiB / 15360MiB | 95% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n", - "(3071927, 1) (117032738, 2)\n", - "CPU times: user 23.5 s, sys: 2.68 s, total: 26.2 s\n", - "Wall time: 28.2 s\n" - ] - } - ] + "kernelspec": { + "display_name": "Python 3", + "name": "python3" }, - { - "cell_type": "code", - "source": [], - "metadata": { - "id": "Ygc2nrkznlCu" - }, - "execution_count": null, - "outputs": [] + "language_info": { + "name": "python" } - ] + }, + "nbformat": 4, + "nbformat_minor": 0 } \ No newline at end of file diff --git a/demos/gfql/gfql_cpv_gpu_enchmark.ipynb b/demos/gfql/gfql_cpv_gpu_enchmark.ipynb deleted file mode 100644 index 9860cc105e..0000000000 --- a/demos/gfql/gfql_cpv_gpu_enchmark.ipynb +++ /dev/null @@ -1,2976 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "view-in-github", - "colab_type": "text" - }, - "source": [ - "\"Open" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "GZxoiU8sQDk_" - }, - "source": [ - "# GFQL CPU, GPU Benchmark\n", - "\n", - "This notebook examines GFQL property graph query performance on 1-8 hop queries using CPU + GPU modes on various real-world 100K - 100M edge graphs. 
The data comes from a variety of popular social networks. The single-threaded CPU mode benefits from GFQL's novel dataframe engine, and the GPU mode further adds single-GPU acceleration. Both the `chain()` and `hop()` methods are examined.\n", - "\n", - "The benchmark does not examine bigger-than-memory and distributed scenarios. The provided results here are from running on a free Google Colab T4 runtime, with a 2.2GHz Intel CPU (12 GB CPU RAM) and T4 Nvidia GPU (16 GB GPU RAM).\n", - "\n", - "## Data\n", - "From [SNAP](https://snap.stanford.edu/data/)\n", - "\n", - "| Network | Nodes | Edges |\n", - "|-------------|-----------|--------------|\n", - "| [**Facebook**](#fb)| 4,039 | 88,234 |\n", - "| [**Twitter**](#tw) | 81,306 | 2,420,766 |\n", - "| [**GPlus**](#gpl) | 107,614 | 30,494,866 |\n", - "| [**Orkut**](#ork) | 3,072,441 | 117,185,082 |\n", - "\n", - "## Results\n", - "\n", - "Definitions:\n", - "\n", - "* GTEPS: Giga (billion) edges traversed per second\n", - "\n", - "* T edges / \\$: Estimated trillion edges traversed for 1\\$ USD based on observed GTEPS and a 3yr AWS reservation (as of 12/2023)\n", - "\n", - "Tasks:\n", - "\n", - "1. `chain()` - includes complex pre/post processing\n", - "\n", - " **Task**: `g.chain([n({'id': some_id}), e_forward(hops=some_n)])`\n", - "\n", - "\n", - "| **Dataset** | Max GPU Speedup | CPU GTEPS | GPU GTEPS | T CPU edges / \\$ (t3.l) | T GPU edges / \\$ (g4dn.xl) |\n", - "|-------------|--------------|-------------|-------------|----------------------------|--------------------------------|\n", - "| [**Facebook**](#fb)| 1.1X | 0.66 | 0.61 | 65.7 | 10.4 |\n", - "| [**Twitter**](#tw) | 17.4X | 0.17 | 2.81 | 16.7 | 48.1 |\n", - "| [**GPlus**](#gpl) | 43.8X | 0.09 | 2.87 | 8.5 | 49.2 |\n", - "| [**Orkut**](#ork) | N/A | N/A | 12.15 | N/A | 208.3 |\n", - "| **AVG** | 20.7X | 0.30 | 4.61 | 30.3 | 79.0\n", - "| **MAX** | 43.8X | 0.66 | 12.15 | 65.7 | 208.3\n", - "\n", - "\n", - "2. 
`hop()` - core property search primitive similar to BFS\n", - "\n", - " **Task**: `g.hop(nodes=[some_id], direction='forward', hops=some_n)`\n", - "\n", - "\n", - "| **Dataset** | Max GPU Speedup | CPU GTEPS | GPU GTEPS | T CPU edges / \\$ (t3.l) | T GPU edges / \\$ (g4dn.xl) |\n", - "|-------------|-------------|-----------|-----------|--------------------|--------------------------------|\n", - "| [**Facebook**](#fb)| 3X | 0.47 | 1.47 | 47.0 | 25.2 |\n", - "| [**Twitter**](#tw) | 42X | 0.50 | 10.51 | 50.2 | 180.2 |\n", - "| [**GPlus**](#gpl) | 21X | 0.26 | 4.11 | 26.2 | 70.4 |\n", - "| [**Orkut**](#ork) | N/A | N/A | 41.50 | N/A | 711.4 |\n", - "| **AVG** | 22X | 0.41 | 14.4 | 41.1 | 246.8\n", - "| **MAX** | 42X | 0.50 | 41.50 | 50.2 | 711.4\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "SAj8lhREEOwS" - }, - "source": [ - "## Optional: GPU setup - Google Colab" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "4hrEEAAm7DTO" - }, - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "W2MF6ZsjDv3B", - "outputId": "d09118ee-55d5-49cf-b950-d1e232aa4eb2" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Mon Feb 19 04:14:57 2024 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", - "|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. 
|\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 39C P8 9W / 70W | 0MiB / 15360MiB | 0% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "| No running processes found |\n", - "+---------------------------------------------------------------------------------------+\n" - ] - } - ], - "source": [ - "# Report GPU used when GPU benchmarking\n", - "! nvidia-smi" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Aikh0x4ID_wK" - }, - "outputs": [], - "source": [ - "# if in google colab\n", - "# !git clone https://github.com/rapidsai/rapidsai-csp-utils.git\n", - "# !python rapidsai-csp-utils/colab/pip-install.py\n", - "!pip install --extra-index-url=https://pypi.nvidia.com cuml-cu12 cudf-cu12 #==23.12.00 #cugraph-cu11 pylibraft_cu11 raft_dask_cu11 dask_cudf_cu11 pylibcugraph_cu11 pylibraft_cu11\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 35 - }, - "id": "Lwekdei1dH3N", - "outputId": "a506b4fb-0dba-4e90-884e-df16cb19eebd" - }, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "'24.02.01'" - ], - "application/vnd.google.colaboratory.intrinsic+json": { - "type": "string" - } - }, - "metadata": {}, - "execution_count": 2 - } - ], - "source": [ - "import cudf\n", - "cudf.__version__" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "QQpsrtwBT7sa" - }, - "source": [ - "# 1. 
Install & configure" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "cYjRbgkU9Sx8", - "outputId": "7d592f98-36af-4657-b8eb-80eeabd98e2f" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.3/3.3 MB\u001b[0m \u001b[31m14.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m244.4/244.4 kB\u001b[0m \u001b[31m2.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m332.3/332.3 kB\u001b[0m \u001b[31m8.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", - "\u001b[?25h" - ] - } - ], - "source": [ - "#! pip install graphistry[igraph]\n", - "\n", - "!pip install -q igraph\n", - "!pip install -q graphistry\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Ff6Tt9DhkePl" - }, - "source": [ - "## Imports" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 35 - }, - "id": "S5_y0CbLkjft", - "outputId": "909ff8a7-650e-4e65-aaf9-40f650d50145" - }, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "'0.33.0'" - ], - "application/vnd.google.colaboratory.intrinsic+json": { - "type": "string" - } - }, - "metadata": {}, - "execution_count": 4 - } - ], - "source": [ - "import pandas as pd\n", - "\n", - "import graphistry, time\n", - "\n", - "from graphistry import (\n", - "\n", - " # graph operators\n", - " n, e_undirected, e_forward, e_reverse,\n", - "\n", - " # attribute predicates\n", - " is_in, ge, startswith, contains, match as match_re\n", - ")\n", - "graphistry.__version__" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": 
"I7Fg75jsG4co" - }, - "outputs": [], - "source": [ - "import cudf" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "uLZKph2-a5M4" - }, - "outputs": [], - "source": [ - "#work around google colab shell encoding bugs\n", - "\n", - "import locale\n", - "locale.getpreferredencoding = lambda: \"UTF-8\"" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "eU9SyauNUHtR" - }, - "source": [ - "# 2. Perf benchmarks" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "NA0Ym11fkB8j" - }, - "source": [ - "\n", - "### Facebook: 88K edges" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 224 - }, - "id": "vXuQogHekClJ", - "outputId": "de95808f-d1c9-4864-e8d9-b197f3d38413" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(88234, 2)\n" - ] - }, - { - "output_type": "execute_result", - "data": { - "text/plain": [ - " s d\n", - "0 0 1\n", - "1 0 2\n", - "2 0 3\n", - "3 0 4\n", - "4 0 5" - ], - "text/html": [ - "\n", - "
\n", - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
sd
001
102
203
304
405
\n", - "
\n", - "
\n", - "\n", - "
\n", - " \n", - "\n", - " \n", - "\n", - " \n", - "
\n", - "\n", - "\n", - "
\n", - " \n", - "\n", - "\n", - "\n", - " \n", - "
\n", - "
\n", - "
\n" - ], - "application/vnd.google.colaboratory.intrinsic+json": { - "type": "dataframe", - "variable_name": "df", - "summary": "{\n \"name\": \"df\",\n \"rows\": 88234,\n \"fields\": [\n {\n \"column\": \"s\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 910,\n \"min\": 0,\n \"max\": 4031,\n \"samples\": [\n 1624,\n 101,\n 377\n ],\n \"num_unique_values\": 3663,\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"d\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 893,\n \"min\": 1,\n \"max\": 4038,\n \"samples\": [\n 2193,\n 150,\n 879\n ],\n \"num_unique_values\": 4037,\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" - } - }, - "metadata": {}, - "execution_count": 7 - } - ], - "source": [ - "df = pd.read_csv('https://raw.githubusercontent.com/graphistry/pygraphistry/master/demos/data/facebook_combined.txt', sep=' ', names=['s', 'd'])\n", - "print(df.shape)\n", - "df.head(5)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 224 - }, - "id": "jEma7hvvkzkN", - "outputId": "98501752-d3df-4f5d-95b8-ff085a17894a" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(4039, 1) (88234, 2)\n" - ] - }, - { - "output_type": "execute_result", - "data": { - "text/plain": [ - " id\n", - "0 0\n", - "1 1\n", - "2 2\n", - "3 3\n", - "4 4" - ], - "text/html": [ - "\n", - "
\n", - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
id
00
11
22
33
44
\n", - "
\n", - "
\n", - "\n", - "
\n", - " \n", - "\n", - " \n", - "\n", - " \n", - "
\n", - "\n", - "\n", - "
\n", - " \n", - "\n", - "\n", - "\n", - " \n", - "
\n", - "
\n", - "
\n" - ], - "application/vnd.google.colaboratory.intrinsic+json": { - "type": "dataframe", - "summary": "{\n \"name\": \"fg\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 0,\n \"max\": 4,\n \"samples\": [\n 1,\n 4,\n 2\n ],\n \"num_unique_values\": 5,\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" - } - }, - "metadata": {}, - "execution_count": 8 - } - ], - "source": [ - "fg = graphistry.edges(df, 's', 'd').materialize_nodes()\n", - "print(fg._nodes.shape, fg._edges.shape)\n", - "fg._nodes.head(5)" - ] - }, - { - "cell_type": "markdown", - "source": [ - "with 2 and 5 hop `chain` comparison we see a slight/negligable speedup enabled by setting g. to `cudf`" - ], - "metadata": { - "id": "2gVDho9cn2Et" - } - }, - { - "cell_type": "code", - "source": [ - "for n_hop in [2,5]:\n", - " start0 = time.time()\n", - " for i in range(100):\n", - " fg2 = fg.chain([n({'id': 0}), e_forward(hops=n_hop)]) # using n notation\n", - " mid0 = time.time()\n", - " for i in range(100):\n", - " fg2 = fg.chain([e_forward(source_node_match={'id': 0}, hops=n_hop)]) # using source_node_match in e_forward\n", - " end0 = time.time()\n", - " T0 = mid0-start0\n", - " T1 = end0-mid0\n", - " fg_gdf = fg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - " start1 = time.time()\n", - " for i in range(100):\n", - " fg2 = fg_gdf.chain([n({'id': 0}), e_forward(hops=n_hop)])\n", - " mid1 = time.time()\n", - " for i in range(100):\n", - " fg2 = fg_gdf.chain([e_forward(source_node_match={'id': 0}, hops=n_hop)])\n", - " end1 = time.time()\n", - " # print(fg._nodes.shape, fg._edges.shape)\n", - " # print(fg2._nodes.shape, fg2._edges.shape)\n", - " del fg_gdf\n", - " del fg2\n", - " T2 = mid1-start1\n", - " T3 = end1-mid1\n", - " print('\\nhops:',n_hop,'\\nCPU n_notation time:',np.round(T0,4),'\\nGPU n_notation time:',np.round(T2,4),'\\nspeedup:', np.round(T0/T2,4),\n", - 
" '\\nCPU source_node_match time:',np.round(T1,4),'\\nGPU source_node_match time:',np.round(T3,4),'\\nspeedup:', np.round(T1/T3,4), )" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "ZKzoqGcdxekr", - "outputId": "344b9823-6998-4f71-d3b5-7aff9fec7912" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "hops: 2 \n", - "CPU n_notation time: 13.6357 \n", - "GPU n_notation time: 12.2177 \n", - "n_notation speedup: 1.1161 \n", - "CPU source_node_match time: 21.2028 \n", - "GPU source_node_match time: 14.3844 \n", - "source_node_match speedup: 1.474\n", - "hops: 5 \n", - "CPU n_notation time: 36.8941 \n", - "GPU n_notation time: 21.3562 \n", - "n_notation speedup: 1.7276 \n", - "CPU source_node_match time: 17.8739 \n", - "GPU source_node_match time: 14.8514 \n", - "source_node_match speedup: 1.2035\n" - ] - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "and with simple 2 and 5 hop `hop` comparison we see a 2x speedup enabled by setting g. 
to `cudf`" - ], - "metadata": { - "id": "5-7M9sPEAf5Z" - } - }, - { - "cell_type": "code", - "source": [ - "for n_hop in [2,5]:\n", - " start_nodes = pd.DataFrame({fg._node: [0]})\n", - " start0 = time.time()\n", - " for i in range(100):\n", - " fg2 = fg.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=n_hop)\n", - " end0 = time.time()\n", - " T0 = end0-start0\n", - " start_nodes = cudf.DataFrame({fg._node: [0]})\n", - " fg_gdf = fg.nodes(cudf.from_pandas(fg._nodes)).edges(cudf.from_pandas(fg._edges))\n", - " start1 = time.time()\n", - " for i in range(100):\n", - " fg2 = fg_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=n_hop)\n", - " end1 = time.time()\n", - " # print(fg._nodes.shape, fg._edges.shape)\n", - " # print(fg2._nodes.shape, fg2._edges.shape)\n", - " del fg_gdf\n", - " del fg2\n", - " T1 = end1-start1\n", - " print('\\nCPU',n_hop,'hop time:',np.round(T0,4),'\\nGPU',n_hop,'hop time:',np.round(T1,4),'\\nspeedup:', np.round(T0/T1,4))" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "Tki_0-_j3XKG", - "outputId": "7edbb4e9-1f49-4b7a-fdb4-9df0867512cd" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "\n", - "CPU 2 hop time: 5.7415 \n", - "GPU 2 hop time: 2.7301 2 \n", - "hop speedup: 2.103\n", - "\n", - "CPU 5 hop time: 14.3391 \n", - "GPU 5 hop time: 6.9998 5 \n", - "hop speedup: 2.0485\n" - ] - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "KrJKjXy2KLos" - }, - "source": [ - "\n", - "## Twitter\n", - "\n", - "- edges: 2420766\n", - "- nodes: 81306" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "fO2qasGqpubr", - "outputId": "957c5ea8-0da9-4101-ecf8-7db8d5d13f49" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "--2024-02-20 09:48:59-- 
https://snap.stanford.edu/data/twitter_combined.txt.gz\n", - "Resolving snap.stanford.edu (snap.stanford.edu)... 171.64.75.80\n", - "Connecting to snap.stanford.edu (snap.stanford.edu)|171.64.75.80|:443... connected.\n", - "HTTP request sent, awaiting response... 200 OK\n", - "Length: 10621918 (10M) [application/x-gzip]\n", - "Saving to: ‘twitter_combined.txt.gz’\n", - "\n", - "twitter_combined.tx 100%[===================>] 10.13M 9.10MB/s in 1.1s \n", - "\n", - "2024-02-20 09:49:00 (9.10 MB/s) - ‘twitter_combined.txt.gz’ saved [10621918/10621918]\n", - "\n" - ] - } - ], - "source": [ - "! wget 'https://snap.stanford.edu/data/twitter_combined.txt.gz'\n", - "#! curl -L 'https://snap.stanford.edu/data/twitter_combined.txt.gz' -o twitter_combined.txt.gz" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "fn7zeA3SGlEo" - }, - "outputs": [], - "source": [ - "! gunzip twitter_combined.txt.gz" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "68TAZkhLGz9g", - "outputId": "b861bc74-a142-4880-dc43-8068f7ef6b04" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "214328887 34428380\n", - "17116707 28465635\n", - "380580781 18996905\n", - "221036078 153460275\n", - "107830991 17868918\n" - ] - } - ], - "source": [ - "! 
head -n 5 twitter_combined.txt" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "QU2wNeGXG2GC", - "outputId": "dff04552-efc7-49a6-9f25-460f31a288be" - }, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "(2420766, 2)" - ] - }, - "metadata": {}, - "execution_count": 31 - } - ], - "source": [ - "te_df = pd.read_csv('twitter_combined.txt', sep=' ', names=['s', 'd'])\n", - "te_df.shape" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "EK5gQH2iG5UU" - }, - "outputs": [], - "source": [ - "import graphistry" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "ZtIW-eFGG_R4", - "outputId": "53bd4e96-f3c4-4d45-abfd-e8a454a8ac7e" - }, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "(81306, 1)" - ] - }, - "metadata": {}, - "execution_count": 33 - } - ], - "source": [ - "g = graphistry.edges(te_df, 's', 'd').materialize_nodes()\n", - "g._nodes.shape" - ] - }, - { - "cell_type": "markdown", - "source": [ - "On the Twitter data, `chain` operations at several hop depths show **10-20x** speed increases" - ], - "metadata": { - "id": "yR9Qr8tGww3b" - } - }, - { - "cell_type": "code", - "source": [ - "for n_hop in [1,2,8]:\n", - " start0 = time.time()\n", - " for i in range(10):\n", - " g2 = g.chain([n({'id': 17116707}), e_forward(hops=n_hop)])\n", - " end0 = time.time()\n", - " T0 = end0-start0\n", - " g_gdf = g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - " start1 = time.time()\n", - " for i in range(10):\n", - " out = g_gdf.chain([n({'id': 17116707}), e_forward(hops=n_hop)])._nodes\n", - " end1 = time.time()\n", - " # print(fg._nodes.shape, 
fg._edges.shape)\n", - " # print(fg2._nodes.shape, fg2._edges.shape)\n", - " del g_gdf\n", - " del out\n", - " T1 = end1-start1\n", - " print('\\nCPU',n_hop,'hop chain time:',np.round(T0,4),'\\nGPU',n_hop,'hop chain time:',np.round(T1,4),'\\nspeedup:', np.round(T0/T1,4))" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "rCsvQJa-6U0x", - "outputId": "70887aac-1bc8-499a-dc9e-40a1803e8fe3" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "\n", - "CPU 1 hop chain time: 20.1676 \n", - "GPU 1 hop chain time: 1.0259 \n", - " 1 hop chain speedup: 19.6579\n", - "\n", - "CPU 2 hop chain time: 21.7168 \n", - "GPU 2 hop chain time: 2.2507 \n", - " 2 hop chain speedup: 9.6488\n", - "\n", - "CPU 8 hop chain time: 157.5035 \n", - "GPU 8 hop chain time: 7.8694 \n", - " 8 hop chain speedup: 20.0147\n" - ] - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "and similarly for these `hop` operations -- **10-40x** speed increases" - ], - "metadata": { - "id": "gHHhyYlzArjw" - } - }, - { - "cell_type": "code", - "source": [ - "for n_hop in [1,2,8]:\n", - " start_nodes = pd.DataFrame({g._node: [17116707]})\n", - " start0 = time.time()\n", - " for i in range(10):\n", - " g2 = g.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=n_hop)\n", - " end0 = time.time()\n", - " T0 = end0-start0\n", - " start_nodes = cudf.DataFrame({g._node: [17116707]})\n", - " g_gdf = g.nodes(cudf.from_pandas(g._nodes)).edges(cudf.from_pandas(g._edges))\n", - " start1 = time.time()\n", - " for i in range(10):\n", - " g2 = g_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=n_hop)\n", - " end1 = time.time()\n", - " # print(fg._nodes.shape, fg._edges.shape)\n", - " # print(fg2._nodes.shape, fg2._edges.shape)\n", - " del start_nodes\n", - " del g_gdf\n", - " del g2\n", - " T1 = end1-start1\n", - " print('\\nCPU',n_hop,'hop chain 
time:',np.round(T0,4),'\\nGPU',n_hop,'hop chain time:',np.round(T1,4),'\\nspeedup:', np.round(T0/T1,4))" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "cnILbPnG7tf4", - "outputId": "bd4aa370-3b54-4c18-c92d-7aba957f658a" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "\n", - "CPU 1 hop chain time: 12.3446 \n", - "GPU 1 hop chain time: 1.204 \n", - "speedup: 10.2526\n", - "\n", - "CPU 2 hop chain time: 13.2377 \n", - "GPU 2 hop chain time: 1.1608 \n", - "speedup: 11.4037\n", - "\n", - "CPU 8 hop chain time: 52.2491 \n", - "GPU 8 hop chain time: 1.2148 \n", - "speedup: 43.012\n" - ] - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "9dZzAAVONCD2" - }, - "source": [ - "\n", - "## GPlus\n", - "\n", - "- edges: 30494866\n", - "- nodes: 107614" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "-nhWGNekKpcZ", - "outputId": "0ce5e0c8-e5a4-4e8e-b595-3c544192bf24" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "--2024-02-20 09:59:24-- https://snap.stanford.edu/data/gplus_combined.txt.gz\n", - "Resolving snap.stanford.edu (snap.stanford.edu)... 171.64.75.80\n", - "Connecting to snap.stanford.edu (snap.stanford.edu)|171.64.75.80|:443... connected.\n", - "HTTP request sent, awaiting response... 200 OK\n", - "Length: 398930514 (380M) [application/x-gzip]\n", - "Saving to: ‘gplus_combined.txt.gz’\n", - "\n", - "gplus_combined.txt. 100%[===================>] 380.45M 39.7MB/s in 9.9s \n", - "\n", - "2024-02-20 09:59:34 (38.5 MB/s) - ‘gplus_combined.txt.gz’ saved [398930514/398930514]\n", - "\n" - ] - } - ], - "source": [ - "! wget https://snap.stanford.edu/data/gplus_combined.txt.gz" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "g5wgA_c2KqwJ" - }, - "outputs": [], - "source": [ - "! 
gunzip gplus_combined.txt.gz" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "52hgDbr0Kti6", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 260 - }, - "outputId": "1b81b614-f1b9-4031-db25-42ed9500c9c7" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(30494866, 2)\n", - "CPU times: user 16.8 s, sys: 1.41 s, total: 18.2 s\n", - "Wall time: 18.4 s\n" - ] - }, - { - "output_type": "execute_result", - "data": { - "text/plain": [ - " s d\n", - "0 116374117927631468606 101765416973555767821\n", - "1 112188647432305746617 107727150903234299458\n", - "2 116719211656774388392 100432456209427807893\n", - "3 117421021456205115327 101096322838605097368\n", - "4 116407635616074189669 113556266482860931616" - ], - "text/html": [ - "\n", - "
\n", - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
sd
0116374117927631468606101765416973555767821
1112188647432305746617107727150903234299458
2116719211656774388392100432456209427807893
3117421021456205115327101096322838605097368
4116407635616074189669113556266482860931616
\n", - "
\n", - "
\n", - "\n", - "
\n", - " \n", - "\n", - " \n", - "\n", - " \n", - "
\n", - "\n", - "\n", - "
\n", - " \n", - "\n", - "\n", - "\n", - " \n", - "
\n", - "
\n", - "
\n" - ], - "application/vnd.google.colaboratory.intrinsic+json": { - "type": "dataframe", - "summary": "{\n \"name\": \"get_ipython()\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"s\",\n \"properties\": {\n \"dtype\": \"string\",\n \"samples\": [\n \"112188647432305746617\",\n \"116407635616074189669\",\n \"116719211656774388392\"\n ],\n \"num_unique_values\": 5,\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"d\",\n \"properties\": {\n \"dtype\": \"string\",\n \"samples\": [\n \"107727150903234299458\",\n \"113556266482860931616\",\n \"100432456209427807893\"\n ],\n \"num_unique_values\": 5,\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" - } - }, - "metadata": {}, - "execution_count": 41 - } - ], - "source": [ - "%%time\n", - "ge_df = pd.read_csv('gplus_combined.txt', sep=' ', names=['s', 'd'])\n", - "print(ge_df.shape)\n", - "ge_df.head(5)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "w5YkN-nLK6UV", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 260 - }, - "outputId": "9c818e80-34b1-431a-d965-3305b45c1bb2" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(30494866, 2) (107614, 1)\n", - "CPU times: user 4.41 s, sys: 1.29 s, total: 5.7 s\n", - "Wall time: 5.69 s\n" - ] - }, - { - "output_type": "execute_result", - "data": { - "text/plain": [ - " id\n", - "0 116374117927631468606\n", - "1 112188647432305746617\n", - "2 116719211656774388392\n", - "3 117421021456205115327\n", - "4 116407635616074189669" - ], - "text/html": [ - "\n", - "
\n", - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
id
0116374117927631468606
1112188647432305746617
2116719211656774388392
3117421021456205115327
4116407635616074189669
\n", - "
\n", - "
\n", - "\n", - "
\n", - " \n", - "\n", - " \n", - "\n", - " \n", - "
\n", - "\n", - "\n", - "
\n", - " \n", - "\n", - "\n", - "\n", - " \n", - "
\n", - "
\n", - "
\n" - ], - "application/vnd.google.colaboratory.intrinsic+json": { - "type": "dataframe", - "summary": "{\n \"name\": \"get_ipython()\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"string\",\n \"samples\": [\n \"112188647432305746617\",\n \"116407635616074189669\",\n \"116719211656774388392\"\n ],\n \"num_unique_values\": 5,\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" - } - }, - "metadata": {}, - "execution_count": 42 - } - ], - "source": [ - "%%time\n", - "gg = graphistry.edges(ge_df, 's', 'd').materialize_nodes()\n", - "gg = graphistry.edges(ge_df, 's', 'd').nodes(gg._nodes, 'id')\n", - "print(gg._edges.shape, gg._nodes.shape)\n", - "gg._nodes.head(5)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "NKtz54uELX-8", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 116 - }, - "outputId": "aaccd673-85c1-40a4-ccdc-f13388e7a01a" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "CPU times: user 471 ms, sys: 307 ms, total: 779 ms\n", - "Wall time: 776 ms\n" - ] - }, - { - "output_type": "execute_result", - "data": { - "text/plain": [ - " id\n", - "0 116374117927631468606" - ], - "text/html": [ - "\n", - "
\n", - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
id
0116374117927631468606
\n", - "
\n", - "
\n", - "\n", - "
\n", - " \n", - "\n", - " \n", - "\n", - " \n", - "
\n", - "\n", - "
\n", - "
\n" - ], - "application/vnd.google.colaboratory.intrinsic+json": { - "type": "dataframe", - "summary": "{\n \"name\": \"get_ipython()\",\n \"rows\": 1,\n \"fields\": [\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"string\",\n \"samples\": [\n \"116374117927631468606\"\n ],\n \"num_unique_values\": 1,\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" - } - }, - "metadata": {}, - "execution_count": 43 - } - ], - "source": [ - "%%time\n", - "gg.chain([ n({'id': '116374117927631468606'})])._nodes" - ] - }, - { - "cell_type": "markdown", - "source": [ - "on the GPlus data, simpler `chain` operations over several different hops -- **100-200x** speed increases" - ], - "metadata": { - "id": "e4ZchWvrBKdY" - } - }, - { - "cell_type": "code", - "source": [ - "for n_hop in [1,2,3,4,5]:\n", - " start_nodes = pd.DataFrame({fg._node: [0]})\n", - " start0 = time.time()\n", - " out = gg.chain([ n({'id': '116374117927631468606'}), e_forward(hops=n_hop)])._nodes\n", - " end0 = time.time()\n", - " T0 = end0-start0\n", - " gg_gdf = gg.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - " start1 = time.time()\n", - " out = gg_gdf.chain([ n({'id': '116374117927631468606'}), e_forward(hops=n_hop)])\n", - " end1 = time.time()\n", - " # print(fg._nodes.shape, fg._edges.shape)\n", - " # print(fg2._nodes.shape, fg2._edges.shape)\n", - " del gg_gdf\n", - " del out\n", - " T1 = end1-start1\n", - " print('\\nCPU',n_hop,'hop chain time:',np.round(T0,4),'\\nGPU',n_hop,'hop chain time:',np.round(T1,4),'\\nspeedup:', np.round(T0/T1,4))" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 507 - }, - "id": "fTnU8MLr8tV5", - "outputId": "40a2f455-9d79-4a3c-abeb-aac93713a424" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "\n", - "CPU 1 hop chain time: 70.7013 \n", - "GPU 1 hop chain time: 0.2911 \n", - "speedup: 242.9049\n", - "\n", - 
"CPU 2 hop chain time: 84.2395 \n", - "GPU 2 hop chain time: 0.6138 \n", - "speedup: 137.252\n" - ] - }, - { - "output_type": "error", - "ename": "KeyboardInterrupt", - "evalue": "", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mstart_nodes\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mDataFrame\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m{\u001b[0m\u001b[0mfg\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_node\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mstart0\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mtime\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtime\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mout\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgg\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mchain\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m \u001b[0mn\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m{\u001b[0m\u001b[0;34m'id'\u001b[0m\u001b[0;34m:\u001b[0m \u001b[0;34m'116374117927631468606'\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0me_forward\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mhops\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mn_hop\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_nodes\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0mend0\u001b[0m \u001b[0;34m=\u001b[0m 
\u001b[0mtime\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtime\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mT0\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mend0\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0mstart0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/graphistry/compute/ComputeMixin.py\u001b[0m in \u001b[0;36mchain\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 391\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 392\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mchain\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 393\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mchain_base\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 394\u001b[0m \u001b[0mchain\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__doc__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mchain_base\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__doc__\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/graphistry/compute/chain.py\u001b[0m in \u001b[0;36mchain\u001b[0;34m(self, ops, engine)\u001b[0m\n\u001b[1;32m 285\u001b[0m )\n\u001b[1;32m 286\u001b[0m g_step = (\n\u001b[0;32m--> 287\u001b[0;31m op(\n\u001b[0m\u001b[1;32m 288\u001b[0m \u001b[0mg\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mg\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;31m# transition via any original 
edge\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 289\u001b[0m \u001b[0mprev_node_wavefront\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mprev_step_nodes\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/graphistry/compute/ast.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, g, prev_node_wavefront, target_wave_front, engine)\u001b[0m\n\u001b[1;32m 326\u001b[0m \u001b[0mlogger\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdebug\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'----------------------------------------'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 327\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 328\u001b[0;31m out_g = g.hop(\n\u001b[0m\u001b[1;32m 329\u001b[0m \u001b[0mnodes\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mprev_node_wavefront\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 330\u001b[0m \u001b[0mhops\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhops\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/graphistry/compute/ComputeMixin.py\u001b[0m in \u001b[0;36mhop\u001b[0;34m(self, *args, **kwargs)\u001b[0m\n\u001b[1;32m 379\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 380\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mhop\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 381\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mhop_base\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 382\u001b[0m \u001b[0mhop\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__doc__\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mhop_base\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__doc__\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 383\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/graphistry/compute/hop.py\u001b[0m in \u001b[0;36mhop\u001b[0;34m(self, nodes, hops, to_fixed_point, direction, edge_match, source_node_match, destination_node_match, source_node_query, destination_node_query, edge_query, return_as_wave_front, target_wave_front, engine)\u001b[0m\n\u001b[1;32m 188\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mdirection\u001b[0m \u001b[0;32min\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0;34m'forward'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'undirected'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 189\u001b[0m hop_edges_forward = (\n\u001b[0;32m--> 190\u001b[0;31m wave_front_iter.merge(\n\u001b[0m\u001b[1;32m 191\u001b[0m \u001b[0medges_indexed\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mg2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_source\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mg2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_destination\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mEDGE_ID\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0massign\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m**\u001b[0m\u001b[0;34m{\u001b[0m\u001b[0mg2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_node\u001b[0m\u001b[0;34m:\u001b[0m 
\u001b[0medges_indexed\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mg2\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_source\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 192\u001b[0m \u001b[0mhow\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'inner'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py\u001b[0m in \u001b[0;36mmerge\u001b[0;34m(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)\u001b[0m\n\u001b[1;32m 10091\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mpandas\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcore\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreshape\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmerge\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mmerge\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10092\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m> 10093\u001b[0;31m return merge(\n\u001b[0m\u001b[1;32m 10094\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10095\u001b[0m \u001b[0mright\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/merge.py\u001b[0m in \u001b[0;36mmerge\u001b[0;34m(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)\u001b[0m\n\u001b[1;32m 122\u001b[0m \u001b[0mvalidate\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mvalidate\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 123\u001b[0m )\n\u001b[0;32m--> 124\u001b[0;31m \u001b[0;32mreturn\u001b[0m 
\u001b[0mop\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_result\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcopy\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcopy\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 125\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 126\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/merge.py\u001b[0m in \u001b[0;36mget_result\u001b[0;34m(self, copy)\u001b[0m\n\u001b[1;32m 771\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mleft\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mright\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_indicator_pre_merge\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mleft\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mright\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 772\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 773\u001b[0;31m \u001b[0mjoin_index\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mleft_indexer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mright_indexer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_join_info\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 774\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 775\u001b[0m result = self._reindex_and_concat(\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/merge.py\u001b[0m in \u001b[0;36m_get_join_info\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 1024\u001b[0m )\n\u001b[1;32m 1025\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1026\u001b[0;31m 
\u001b[0;34m(\u001b[0m\u001b[0mleft_indexer\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mright_indexer\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_join_indexers\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1027\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1028\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mright_index\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/merge.py\u001b[0m in \u001b[0;36m_get_join_indexers\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 998\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m_get_join_indexers\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m->\u001b[0m \u001b[0mtuple\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mnpt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mNDArray\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mintp\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnpt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mNDArray\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mintp\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 999\u001b[0m \u001b[0;34m\"\"\"return the join indexers\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1000\u001b[0;31m return get_join_indexers(\n\u001b[0m\u001b[1;32m 1001\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mleft_join_keys\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mright_join_keys\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0msort\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msort\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhow\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhow\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1002\u001b[0m )\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/merge.py\u001b[0m in \u001b[0;36mget_join_indexers\u001b[0;34m(left_keys, right_keys, sort, how, **kwargs)\u001b[0m\n\u001b[1;32m 1583\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mn\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mleft_keys\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1584\u001b[0m )\n\u001b[0;32m-> 1585\u001b[0;31m \u001b[0mzipped\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mzip\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0mmapped\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1586\u001b[0m \u001b[0mllab\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrlab\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mshape\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mx\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mzipped\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1587\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/merge.py\u001b[0m in \u001b[0;36m\u001b[0;34m(.0)\u001b[0m\n\u001b[1;32m 1580\u001b[0m \u001b[0;31m# get left & right join labels and num. 
of levels at each location\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1581\u001b[0m mapped = (\n\u001b[0;32m-> 1582\u001b[0;31m \u001b[0m_factorize_keys\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mleft_keys\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mright_keys\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msort\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msort\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhow\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mhow\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1583\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0mn\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mrange\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mleft_keys\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1584\u001b[0m )\n", - "\u001b[0;32m/usr/local/lib/python3.10/dist-packages/pandas/core/reshape/merge.py\u001b[0m in \u001b[0;36m_factorize_keys\u001b[0;34m(lk, rk, sort, how)\u001b[0m\n\u001b[1;32m 2331\u001b[0m \u001b[0;31m# \"Union[ndarray[Any, dtype[signedinteger[_64Bit]]],\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2332\u001b[0m \u001b[0;31m# ndarray[Any, dtype[object_]]]\"; expected \"ndarray[Any, dtype[object_]]\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2333\u001b[0;31m \u001b[0mrlab\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mrizer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfactorize\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mrk\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;31m# type: ignore[arg-type]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2334\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mllab\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m \u001b[0;34m==\u001b[0m 
\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mintp\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mllab\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2335\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0mrlab\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mintp\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrlab\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;31mKeyboardInterrupt\u001b[0m: " - ] - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "and similarly for these hop operations -- **100x** speed increases" - ], - "metadata": { - "id": "80bs6Y5pBWb2" - } - }, - { - "cell_type": "code", - "source": [ - "for n_hop in [1,2,3,4,5]:\n", - " start_nodes = pd.DataFrame({gg._node: ['116374117927631468606']})\n", - " start0 = time.time()\n", - " for i in range(1):\n", - " g2 = gg.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=n_hop)\n", - " end0 = time.time()\n", - " T0 = end0-start0\n", - " start_nodes = cudf.DataFrame({gg._node: ['116374117927631468606']})\n", - " gg_gdf = gg.nodes(cudf.from_pandas(gg._nodes)).edges(cudf.from_pandas(gg._edges))\n", - " start1 = time.time()\n", - " for i in range(1):\n", - " g2 = gg_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=n_hop)\n", - " end1 = time.time()\n", - " # print(fg._nodes.shape, fg._edges.shape)\n", - " # print(fg2._nodes.shape, fg2._edges.shape)\n", - " del start_nodes\n", - " del gg_gdf\n", - " del g2\n", - " T1 = end1-start1\n", - " print('\\nCPU',n_hop,'hop chain time:',np.round(T0,4),'\\nGPU',n_hop,'hop chain 
time:',np.round(T1,4),'\\nspeedup:', np.round(T0/T1,4))" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "N2-gDFod9vc3", - "outputId": "c6967f1f-fa01-41a6-a776-02e2892f300f" - }, - "execution_count": null, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "\n", - "CPU 1 hop chain time: 38.0714 \n", - "GPU 1 hop chain time: 0.2615 \n", - "speedup: 145.5678\n", - "\n", - "CPU 2 hop chain time: 52.949 \n", - "GPU 2 hop chain time: 0.4553 \n", - "speedup: 116.2876\n" - ] - } - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "R03M_swxarKC" - }, - "source": [ - "\n", - "## Orkut\n", - "- 117M edges\n", - "- 3M nodes" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "QoabYR2maxPo", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "f58e2837-9417-490b-882c-b1f478ed53f8" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "--2024-02-19 06:02:00-- https://snap.stanford.edu/data/bigdata/communities/com-orkut.ungraph.txt.gz\n", - "Resolving snap.stanford.edu (snap.stanford.edu)... 171.64.75.80\n", - "Connecting to snap.stanford.edu (snap.stanford.edu)|171.64.75.80|:443... connected.\n", - "HTTP request sent, awaiting response... 200 OK\n", - "Length: 447251958 (427M) [application/x-gzip]\n", - "Saving to: ‘com-orkut.ungraph.txt.gz’\n", - "\n", - "com-orkut.ungraph.t 100%[===================>] 426.53M 31.8MB/s in 11s \n", - "\n", - "2024-02-19 06:02:11 (37.4 MB/s) - ‘com-orkut.ungraph.txt.gz’ saved [447251958/447251958]\n", - "\n" - ] - } - ], - "source": [ - "! wget https://snap.stanford.edu/data/bigdata/communities/com-orkut.ungraph.txt.gz" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "BvvfFPKWbAVJ" - }, - "outputs": [], - "source": [ - "! 
gunzip com-orkut.ungraph.txt.gz" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "YsWwRoPqbPIb", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "5c12501a-e724-44d5-f651-e1860a8638af" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "# Undirected graph: ../../data/output/orkut.txt\n", - "# Orkut\n", - "# Nodes: 3072441 Edges: 117185083\n", - "# FromNodeId\tToNodeId\n", - "1\t2\n", - "1\t3\n", - "1\t4\n" - ] - } - ], - "source": [ - "! head -n 7 com-orkut.ungraph.txt" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "cbMC8r2ldjbW", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "87ee2818-72d9-46e7-d0d8-5d16958e10c8" - }, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "('24.02.01', '0.33.0')" - ] - }, - "metadata": {}, - "execution_count": 8 - } - ], - "source": [ - "import pandas as pd\n", - "import numpy as np\n", - "import time\n", - "\n", - "import graphistry\n", - "\n", - "from graphistry import (\n", - "\n", - " # graph operators\n", - " n, e_undirected, e_forward, e_reverse,\n", - "\n", - " # attribute predicates\n", - " is_in, ge, startswith, contains, match as match_re\n", - ")\n", - "\n", - "import cudf\n", - "\n", - "# work around google colab shell encoding bugs\n", - "import locale\n", - "locale.getpreferredencoding = lambda: \"UTF-8\"\n", - "\n", - "cudf.__version__, graphistry.__version__" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "TopFxAvnh_Cv", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "52288638-691e-47e4-87fc-4f761a3e9302" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Mon Feb 19 06:02:29 2024 \n", - "+---------------------------------------------------------------------------------------+\n", - "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n", -
"|-----------------------------------------+----------------------+----------------------+\n", - "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n", - "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n", - "| | | MIG M. |\n", - "|=========================================+======================+======================|\n", - "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n", - "| N/A 64C P0 29W / 70W | 111MiB / 15360MiB | 0% Default |\n", - "| | | N/A |\n", - "+-----------------------------------------+----------------------+----------------------+\n", - " \n", - "+---------------------------------------------------------------------------------------+\n", - "| Processes: |\n", - "| GPU GI CI PID Type Process name GPU Memory |\n", - "| ID ID Usage |\n", - "|=======================================================================================|\n", - "+---------------------------------------------------------------------------------------+\n" - ] - } - ], - "source": [ - "! 
nvidia-smi" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Oczs87ITbJgw", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "b894db9b-13d1-426a-b162-dea780abce3b" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(117185082, 2)\n", - " s d\n", - "0 1 3\n", - "1 1 4\n", - "2 1 5\n", - "3 1 6\n", - "4 1 7\n", - "s int64\n", - "d int64\n", - "dtype: object\n", - "CPU times: user 2.34 s, sys: 1.29 s, total: 3.63 s\n", - "Wall time: 3.77 s\n" - ] - } - ], - "source": [ - "%%time\n", - "co_df = cudf.read_csv('com-orkut.ungraph.txt', sep='\\t', names=['s', 'd'], skiprows=5).to_pandas()\n", - "print(co_df.shape)\n", - "print(co_df.head(5))\n", - "print(co_df.dtypes)\n", - "#del co_df" - ] - }, - { - "cell_type": "markdown", - "source": [ - "Load into the GPU, then back to the CPU again" - ], - "metadata": { - "id": "2QLDI3vdAtkf" - } - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "gGSDjTtveFAT", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 260 - }, - "outputId": "d8501df9-7070-4bfa-c5e2-49a581df5f56" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "(3072441, 1) (117185082, 2)\n", - "CPU times: user 2.06 s, sys: 7.93 s, total: 10 s\n", - "Wall time: 11.2 s\n" - ] - }, - { - "output_type": "execute_result", - "data": { - "text/plain": [ - " id\n", - "0 1\n", - "1 2\n", - "2 3\n", - "3 4\n", - "4 5" - ], - "text/html": [ - "\n", - "
\n" - ], - "application/vnd.google.colaboratory.intrinsic+json": { - "type": "dataframe", - "summary": "{\n \"name\": \"get_ipython()\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"id\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1,\n \"min\": 1,\n \"max\": 5,\n \"samples\": [\n 2,\n 5,\n 3\n ],\n \"num_unique_values\": 5,\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" - } - }, - "metadata": {}, - "execution_count": 11 - } - ], - "source": [ - "%%time\n", - "co_g = graphistry.edges(cudf.DataFrame(co_df), 's', 'd').materialize_nodes(engine='cudf')\n", - "co_g = co_g.nodes(lambda g: g._nodes.to_pandas()).edges(lambda g: g._edges.to_pandas())\n", - "print(co_g._nodes.shape, co_g._edges.shape)\n", - "co_g._nodes.head(5)" - ] - }, - { - "cell_type": "markdown", - "source": [ - "on the Orkut data, simpler chain operations over several different hops -- **10-50x** speed increases" - ], - "metadata": { - "id": "G4f19-djBd7J" - } - }, - { - "cell_type": "code", - "source": [ - "for n_hop in [1,2,3,4,5,6]:\n", - " start_nodes = pd.DataFrame({fg._node: [0]})\n", - " start0 = time.time()\n", - " for i in range(10):\n", - " out = co_g.chain([ n({'id': 1}), e_forward(hops=n_hop)])._nodes\n", - " end0 = time.time()\n", - " T0 = end0-start0\n", - " co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - " start1 = time.time()\n", - " for i in range(10):\n", - " out = co_gdf.chain([ n({'id': 1}), e_forward(hops=n_hop)]) end1 = time.time()\n", - " # print(fg._nodes.shape, fg._edges.shape)\n", - " # print(fg2._nodes.shape, fg2._edges.shape)\n", - " del co_gdf\n", - " del out\n", - " T1 = end1-start1\n", - " print('\\nCPU',n_hop,'hop chain time:',np.round(T0,4),'\\nGPU',n_hop,'hop chain time:',np.round(T1,4),'\\nspeedup:', np.round(T0/T1,4))" - ], - "metadata": { - "id": "yWabsh_k-tgy" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "and similarly 
for these hop operations -- **10-40x** speed increases" - ], - "metadata": { - "id": "v-gVx5trBeSl" - } - }, - { - "cell_type": "code", - "source": [ - "for n_hop in [1,2,3,4,5]:\n", - "  start_nodes = pd.DataFrame({'id': [1]})\n", - "  start0 = time.time()\n", - "  for i in range(1):\n", - "    g2 = co_g.hop(\n", - "        nodes=start_nodes,\n", - "        direction='forward',\n", - "        hops=n_hop)\n", - "  end0 = time.time()\n", - "  T0 = end0-start0\n", - "  start_nodes = cudf.DataFrame({'id': [1]})\n", - "  co_gdf = co_g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "  start1 = time.time()\n", - "  for i in range(1):\n", - "    g2 = co_gdf.hop(\n", - "        nodes=start_nodes,\n", - "        direction='forward',\n", - "        hops=n_hop)\n", - "  end1 = time.time()\n", - "  del start_nodes\n", - "  del co_gdf\n", - "  del g2\n", - "  T1 = end1-start1\n", - "  print('\\nCPU',n_hop,'hop time:',np.round(T0,4),'\\nGPU',n_hop,'hop time:',np.round(T1,4),'\\nspeedup:', np.round(T0/T1,4))" - ], - "metadata": { - "id": "kHZatWCB_qwd" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "eiXFImxF-rzw" - }, - "outputs": [], - "source": [ - "!lscpu\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "wJohLi58-sN5", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "07a499d6-3109-486e-d387-002abd133d22" - }, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - " total used free shared buff/cache available\n", - "Mem: 12Gi 5.8Gi 1.6Gi 1.0Gi 5.2Gi 5.5Gi\n", - "Swap: 0B 0B 0B\n" - ] - } - ], - "source": [ - "!free -h\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "Ygc2nrkznlCu" - }, - "outputs": [], - "source": [] - } - ], - "metadata": { - "accelerator": "GPU", - "colab": {
- "gpuType": "T4", - "provenance": [], - "toc_visible": true, - "include_colab_link": true - }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - }, - "language_info": { - "name": "python" - } - }, - "nbformat": 4, - "nbformat_minor": 0 -} \ No newline at end of file diff --git a/demos/gfql/simple_gfql_notebook.ipynb b/demos/gfql/simple_gfql_notebook.ipynb deleted file mode 100644 index 5f27f7b619..0000000000 --- a/demos/gfql/simple_gfql_notebook.ipynb +++ /dev/null @@ -1,264 +0,0 @@ -{ - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [], - "gpuType": "T4", - "include_colab_link": true - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - }, - "accelerator": "GPU" - }, - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "view-in-github", - "colab_type": "text" - }, - "source": [ - "\"Open" - ] - }, - { - "cell_type": "markdown", - "source": [ - "# simple GFQL demo on Twitter data\n", - "\n", - "* Twitter\tNetwork with 81,306 Nodes\tand 2,420,766 Edges\n", - "\n", - "* The single-threaded CPU mode benefits from GFQL's novel dataframe engine, and the GPU mode further adds single-GPU acceleration. Both the chain() and hop() methods are examined.\n", - "\n", - "* The benchmark does not examine bigger-than-memory and distributed scenarios. The provided results here are from running on a free Google Colab T4 runtime, with a 2.2GHz Intel CPU (12 GB CPU RAM) and T4 Nvidia GPU (16 GB GPU RAM)." 
- ], - "metadata": { - "id": "Sm80AgJOJ3-c" - } - }, - { - "cell_type": "markdown", - "source": [ - "## Install, Import, Load" - ], - "metadata": { - "id": "g7s-qBKqE9eC" - } - }, - { - "cell_type": "code", - "source": [ - "# !pip install --extra-index-url=https://pypi.nvidia.com cuml-cu12 cudf-cu12\n", - "import cudf\n", - "cudf.__version__\n", - "\n", - "!pip install -q igraph\n", - "!pip install -q graphistry\n", - "\n", - "import pandas as pd\n", - "import graphistry, time, cProfile\n", - "\n", - "from graphistry import (\n", - "\n", - " # graph operators\n", - " n, e_undirected, e_forward, e_reverse,\n", - "\n", - " # attribute predicates\n", - " is_in, ge, startswith, contains, match as match_re\n", - ")\n", - "graphistry.__version__" - ], - "metadata": { - "id": "JTdSJgquBnGd", - "colab": { - "base_uri": "https://localhost:8080/", - "height": 35 - }, - "outputId": "4d1da186-28ab-4236-a569-4dd8760c4715" - }, - "execution_count": 5, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "'0.33.0'" - ], - "application/vnd.google.colaboratory.intrinsic+json": { - "type": "string" - } - }, - "metadata": {}, - "execution_count": 5 - } - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "yLJjLYAWbRXH", - "outputId": "e0177c6d-7d74-449a-c72b-28241516aaf0" - }, - "outputs": [ - { - "output_type": "execute_result", - "data": { - "text/plain": [ - "(81306, 1)" - ] - }, - "metadata": {}, - "execution_count": 2 - } - ], - "source": [ - "te_df = pd.read_csv('https://snap.stanford.edu/data/twitter_combined.txt.gz', sep=' ', names=['s', 'd'])\n", - "g = graphistry.edges(te_df, 's', 'd').materialize_nodes()" - ] - }, - { - "cell_type": "markdown", - "source": [ - "## .chain() CPU v GPU" - ], - "metadata": { - "id": "c3vuo0yVFDCs" - } - }, - { - "cell_type": "code", - "source": [ - "start = time.time()\n", - "\n", - "for i in range(10):\n", - " g2 = 
g.chain([n({'id': 17116707}), e_forward(hops=1)])\n", - "g2._nodes.shape, g2._edges.shape\n", - "\n", - "end1 = time.time()\n", - "T1 = end1 - start" - ], - "metadata": { - "id": "wEzyOOymCcsj" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "start = time.time()\n", - "\n", - "g_gdf = g.nodes(lambda g: cudf.DataFrame(g._nodes)).edges(lambda g: cudf.DataFrame(g._edges))\n", - "for i in range(10):\n", - " out = g_gdf.chain([n({'id': 17116707}), e_forward(hops=1)])._nodes\n", - "del g_gdf\n", - "del out\n", - "\n", - "end2 = time.time()\n", - "T2= end2 - start\n", - "print('CPU time:',T1, '\\nGPU time:', T2, '\\nspeedup:', T1/T2)" - ], - "metadata": { - "id": "yKoNh5UgClIr", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "5c9545e1-5a6c-45db-b0ab-a199939e8ebd" - }, - "execution_count": 17, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "CPU time: 17.837570190429688 \n", - "GPU time: 2.0647764205932617 \n", - "speedup: 8.638983868919091\n" - ] - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "## .hop() CPU v GPU\n", - "\n", - "* simpler tasks can witness greater speedup\n", - "\n" - ], - "metadata": { - "id": "KrXZ7ajHFJ3z" - } - }, - { - "cell_type": "code", - "source": [ - "start = time.time()\n", - "start_nodes = pd.DataFrame({g._node: [17116707]})\n", - "for i in range(10):\n", - " g2 = g.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " hops=8)\n", - "\n", - "end1 = time.time()\n", - "T1 = end1 - start" - ], - "metadata": { - "id": "CJt_8YTPCtZM" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "start = time.time()\n", - "start_nodes = cudf.DataFrame({g._node: [17116707]})\n", - "g_gdf = g.nodes(cudf.from_pandas(g._nodes)).edges(cudf.from_pandas(g._edges))\n", - "for i in range(10):\n", - " g2 = g_gdf.hop(\n", - " nodes=start_nodes,\n", - " direction='forward',\n", - " engine = 'cudf', 
# one can also set `engine = cudf`\n", - " hops=8)\n", - "del start_nodes\n", - "del g_gdf\n", - "del g2\n", - "\n", - "end2 = time.time()\n", - "T2= end2 - start\n", - "print('CPU time:',T1, '\\nGPU time:', T2, '\\nspeedup:', T1/T2)" - ], - "metadata": { - "id": "fOC7ODIeFTI6", - "colab": { - "base_uri": "https://localhost:8080/" - }, - "outputId": "ba349462-caee-4f42-8f45-fa7c883c54bc" - }, - "execution_count": 26, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "CPU time: 40.91506862640381 \n", - "GPU time: 2.8351004123687744 \n", - "speedup: 14.431611821543413\n" - ] - } - ] - } - ] -} \ No newline at end of file From e371bdf693fb78bc635dd2d34957e9a858543fe5 Mon Sep 17 00:00:00 2001 From: Daniel Date: Fri, 12 Jul 2024 08:34:24 +0200 Subject: [PATCH 3/5] remove stale links --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 3a8fbb2166..5db55aa816 100644 --- a/README.md +++ b/README.md @@ -147,7 +147,7 @@ It is easy to turn arbitrary data into insightful graphs. 
PyGraphistry comes wit g2.plot() ``` -* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), [chain and hop demo](demos/gfql/simple_gfql_notebook.ipynb), [benchmark](demos/gfql/gfql_cpv_gpu_enchmark.ipynb) +* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb),[benchmark](demos/gfql/gfql_hops_cpu_gpu.ipynb) Run Cypher-style graph queries natively on dataframes without going to a database or Java with GFQL: @@ -1248,7 +1248,7 @@ assert 'pagerank' in g2._nodes.columns PyGraphistry supports GFQL, its PyData-native variant of the popular Cypher graph query language, meaning you can do graph pattern matching directly from Pandas dataframes without installing a database or Java -See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), [starting with chain and hop](demos/gfql/simple_gfql_notebook.ipynb) and the CPU/GPU [benchmark](demos/gfql/gfql_cpv_gpu_enchmark.ipynb) +See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), and the CPU/GPU [benchmark](demos/gfql/gfql_hops_cpu_gpu.ipynb) Traverse within a graph, or expand one graph against another From 3407edc9f1df2617a81bf97a81180c2334c6a84c Mon Sep 17 00:00:00 2001 From: Daniel Date: Mon, 15 Jul 2024 16:05:21 +0200 Subject: [PATCH 4/5] resolve readme url x2 --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 5db55aa816..cf131684cb 100644 --- a/README.md +++ b/README.md @@ -147,7 +147,7 @@ It is easy to turn arbitrary data into insightful graphs. 
PyGraphistry comes wit g2.plot() ``` -* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb),[benchmark](demos/gfql/gfql_hops_cpu_gpu.ipynb) +* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb),[benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb) Run Cypher-style graph queries natively on dataframes without going to a database or Java with GFQL: @@ -1248,7 +1248,7 @@ assert 'pagerank' in g2._nodes.columns PyGraphistry supports GFQL, its PyData-native variant of the popular Cypher graph query language, meaning you can do graph pattern matching directly from Pandas dataframes without installing a database or Java -See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), and the CPU/GPU [benchmark](demos/gfql/gfql_hops_cpu_gpu.ipynb) +See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), and the CPU/GPU [benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb) Traverse within a graph, or expand one graph against another From 4f53be7254c80d6fb79332f9df1000d022032427 Mon Sep 17 00:00:00 2001 From: Daniel Date: Wed, 7 Aug 2024 12:04:09 +0200 Subject: [PATCH 5/5] backout md url changes --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index a50e260f76..afc7b6c832 100644 --- a/README.md +++ b/README.md @@ -147,7 +147,7 @@ It is easy to turn arbitrary data into insightful graphs. 
PyGraphistry comes wit g2.plot() ``` -* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](https://github.com/graphistry/pygraphistry/blob/master/demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), [benchmark](https://github.com/graphistry/pygraphistry/blob/master/demos/gfql/benchmark_hops_cpu_gpu.ipynb)) +* GFQL: Cypher-style graph pattern mining queries on dataframes with optional GPU acceleration ([ipynb demo](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb), [benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb)) Run Cypher-style graph queries natively on dataframes without going to a database or Java with GFQL: @@ -1250,7 +1250,7 @@ assert 'pagerank' in g2._nodes.columns PyGraphistry supports GFQL, its PyData-native variant of the popular Cypher graph query language, meaning you can do graph pattern matching directly from Pandas dataframes without installing a database or Java -See also [graph pattern matching tutorial](https://github.com/graphistry/pygraphistry/tree/master/demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb) and the CPU/GPU [benchmark](https://github.com/graphistry/pygraphistry/tree/master/demos/gfql/benchmark_hops_cpu_gpu.ipynb) +See also [graph pattern matching tutorial](demos/more_examples/graphistry_features/hop_and_chain_graph_pattern_mining.ipynb) and the CPU/GPU [benchmark](demos/gfql/benchmark_hops_cpu_gpu.ipynb) Traverse within a graph, or expand one graph against another
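
The traversal these notebooks benchmark reduces to repeated dataframe merges: the pandas traceback in the benchmark notebook shows `hop()` inner-merging a node wavefront against the indexed edge table each step. A minimal pandas-only sketch of that wavefront-merge idea (illustrative names, not PyGraphistry's API):

```python
import pandas as pd

def hop(edges: pd.DataFrame, start_ids, hops: int = 1,
        src: str = "s", dst: str = "d") -> pd.DataFrame:
    """Forward multi-hop: repeatedly inner-merge the node wavefront with edges."""
    wavefront = pd.DataFrame({"id": start_ids}).drop_duplicates()
    seen = wavefront
    for _ in range(hops):
        # one hop = join current frontier nodes to edges on the source column
        step = wavefront.merge(edges, left_on="id", right_on=src, how="inner")
        wavefront = step[[dst]].rename(columns={dst: "id"}).drop_duplicates()
        # drop already-visited nodes so a fixed point terminates early
        new = wavefront.merge(seen, on="id", how="left", indicator=True)
        wavefront = new[new["_merge"] == "left_only"][["id"]]
        if wavefront.empty:
            break
        seen = pd.concat([seen, wavefront], ignore_index=True)
    return seen

edges = pd.DataFrame({"s": [1, 2, 3], "d": [2, 3, 4]})
print(sorted(hop(edges, [1], hops=2)["id"]))  # [1, 2, 3]
```

The real `hop()` layers direction handling, node/edge predicates, and an `engine` switch on top, which is the sense in which the same merge loop runs on pandas (CPU) or cuDF (GPU) dataframes.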