HyosungSink/Lab2-Realtime-Analytics

Lab2 Realtime Analytics

Huawei Cloud Lab2 realtime analytics reference project.

This repository collects the reusable code and configuration from the lab, which covers these stages:

  • Generate simulated ecommerce event data with Python.
  • Ingest files with Flume and write them to Kafka.
  • Analyze Kafka streams with DLI Flink SQL.
  • Store aggregate results in RDS MySQL.
  • Visualize results in DLV.

No credentials are stored in this repository. Use environment variables or a local credential file outside the repository when running commands.
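As an illustration of the environment-variable approach, here is a minimal sketch of how a script could collect its MySQL settings. The variable names MYSQL_HOST, MYSQL_PORT, MYSQL_USER and MYSQL_PASSWORD are the ones used in the Quick Start commands. The function name load_mysql_config is hypothetical, and this is not necessarily how scripts/verify_mysql.py is written.

```python
import os
from typing import Mapping, Optional


def load_mysql_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect MySQL connection settings from environment variables.

    Hypothetical helper for illustration only; nothing is read from
    files inside the repository.
    """
    env = os.environ if env is None else env

    # Fail early and name every missing variable, instead of failing
    # later with a less helpful connection error.
    required = ("MYSQL_HOST", "MYSQL_USER", "MYSQL_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))

    return {
        "host": env["MYSQL_HOST"],
        # Same fallback as ${MYSQL_PORT:-3306}: unset or empty means 3306.
        "port": int(env.get("MYSQL_PORT") or "3306"),
        "user": env["MYSQL_USER"],
        "password": env["MYSQL_PASSWORD"],
    }
```

Passing a plain dict instead of os.environ makes the helper easy to check without touching the real environment.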

Architecture

Python generator -> Flume spooldir -> Kafka topic fludesc
                                      |
                                      v
                                  DLI Flink SQL
                                      |
                                      v
                                  RDS MySQL
                                      |
                                      v
                                  DLV dashboard

Lab Customization

The lab requires adding store 313024 / 陈致远店.

This repo includes that change in:

  • scripts/autodatagen.py: can generate records for store 313024.
  • sql/mysql_schema.sql: inserts 陈致远店 into desc_store_info.

Repository Layout

conf/
  flume.properties.template       Flume spooldir -> Kafka template
docs/
  environment.md                  Environment and service checklist
  runbook.md                      Step-by-step lab runbook
scripts/
  autodatagen.py                  Test data generator
  create_kafka_topic.sh           Kafka topic helper
  manage_cron.sh                  Start/stop periodic data generation
  verify_mysql.py                 Query result tables via env vars
sql/
  flink_job.sql                   DLI Flink SQL template
  mysql_schema.sql                RDS MySQL schema and dimension data
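
The Quick Start asks for every ${...} placeholder in sql/flink_job.sql to be replaced before the job is created. One possible way to do that locally is Python's string.Template, which uses the same ${name} syntax. This is a sketch under assumptions: the placeholder name KAFKA_BOOTSTRAP_SERVERS and the text fragment below are made up for the example and are not taken from the actual template, and a template containing other literal $ characters would need them escaped as $$ first.

```python
from string import Template


def render_sql(template_text: str, values: dict) -> str:
    """Fill ${name} placeholders in a SQL template.

    Template.substitute raises KeyError when a placeholder has no
    value, so a half-filled job definition is never produced silently.
    """
    return Template(template_text).substitute(values)


if __name__ == "__main__":
    # Made-up fragment; the real sql/flink_job.sql will look different.
    fragment = "bootstrap servers: ${KAFKA_BOOTSTRAP_SERVERS}; topic: fludesc"
    print(render_sql(fragment, {"KAFKA_BOOTSTRAP_SERVERS": "192.168.0.117:9092"}))
```

Keeping the values in a dict built from environment variables fits the rule that no credentials are stored in the repository.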

Quick Start

  1. Prepare RDS tables:

     mysql -h "$MYSQL_HOST" -P "${MYSQL_PORT:-3306}" -u "$MYSQL_USER" -p < sql/mysql_schema.sql

  2. Upload scripts/autodatagen.py to the MRS master node, for example:

     scp scripts/autodatagen.py root@<MRS_MASTER_EIP>:/opt/client/autodatagen.py
     ssh root@<MRS_MASTER_EIP> 'chmod 755 /opt/client/autodatagen.py'

  3. Create Kafka topic:

     KAFKA_HOME=/opt/Bigdata/client/Kafka/kafka \
     BOOTSTRAP_SERVERS=192.168.0.117:9092,192.168.0.153:9092,192.168.0.86:9092 \
     scripts/create_kafka_topic.sh

  4. Configure Flume from conf/flume.properties.template, then start Flume with the full FusionInsight Flume runtime.

  5. Create a DLI Flink OpenSource SQL job from sql/flink_job.sql, replacing all ${...} placeholders.

  6. Generate a fixed 235-row validation batch:

     python3 /opt/client/autodatagen.py /tmp/flume_spooldir/lab2_100_1.txt 100
     python3 /opt/client/autodatagen.py /tmp/flume_spooldir/lab2_100_2.txt 100
     python3 /opt/client/autodatagen.py /tmp/flume_spooldir/lab2_35_1.txt 35

  7. Verify RDS results:

     MYSQL_HOST=<rds-private-ip> MYSQL_USER=root MYSQL_PASSWORD=<password> \
     python3 scripts/verify_mysql.py

For the full procedure, see docs/runbook.md.
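
The 235-row validation batch in the Quick Start is written as three files of 100, 100 and 35 rows. The sketch below shows how such a split and the matching commands could be derived for another total. It assumes that the second argument of autodatagen.py is the number of rows to write, which is inferred from the commands above, and that 100 rows per file is a chosen cap rather than a hard limit. The helper names split_batches and batch_commands are hypothetical.

```python
from collections import Counter


def split_batches(total_rows: int, max_rows_per_file: int = 100) -> list:
    """Split a row total into per-file sizes, e.g. 235 into 100, 100 and 35."""
    if total_rows < 0 or max_rows_per_file <= 0:
        raise ValueError("total_rows must be >= 0 and max_rows_per_file > 0")
    full_files, remainder = divmod(total_rows, max_rows_per_file)
    sizes = [max_rows_per_file] * full_files
    if remainder:
        sizes.append(remainder)
    return sizes


def batch_commands(total_rows: int,
                   spool_dir: str = "/tmp/flume_spooldir",
                   script: str = "/opt/client/autodatagen.py",
                   max_rows_per_file: int = 100) -> list:
    """Build generator commands named lab2_<rows>_<index>.txt like the Quick Start."""
    seen = Counter()  # how many files of each size have been named so far
    commands = []
    for size in split_batches(total_rows, max_rows_per_file):
        seen[size] += 1
        path = f"{spool_dir}/lab2_{size}_{seen[size]}.txt"
        commands.append(f"python3 {script} {path} {size}")
    return commands


if __name__ == "__main__":
    for command in batch_commands(235):
        print(command)
```

The per-file sizes always sum to the requested total, so the expected row count can be compared with what arrives in RDS once the files have been ingested.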
