Big Data Architecture for Distributed Systems 🌐

Advanced Big-Data Applications and Indexing Techniques 📊

Professor: Dr. Marwan Sabbouh

Introduction 📜

In the era of Big Data 📊, traditional relational database techniques are struggling to cope with the unprecedented volume 📈, variety 🌈, and velocity 💨 of new data points. This has created bottlenecks at every stage of the data processing chain, necessitating a shift towards more robust Big Data architectures, particularly for distributed software systems.

Challenges in Data Handling 🚧

Data Ingestion and Validation 🔄

Rapid Validation 🔍: At the outer endpoints of distributed systems, there's a critical need to quickly validate incoming data to maintain quality.

Data Storage 💾

Write Latency ⏱️: For applications with a significant user base, write latency must not exceed tens of milliseconds to ensure real-time responsiveness.

Data Indexing 📑

Indexer Throughput 🚀: The system must maintain a high indexer throughput rate to keep up with the increasing velocity of incoming data.
Advanced Indexing Features 🛠️: Support for logical operators, wildcards, geolocation, join, and aggregate queries is essential.

Near Real-Time Predictive Analytics ⏳

Minimized Workflow Duration 🕒: The duration for ingesting, storing, indexing, and analyzing data must be kept to a minimum to leverage new data points quickly for real-time analytics.

Schemaless System Design 📐

Extensibility and Flexibility 👐: The system must support the extensibility of its data models and the addition of new models without requiring new programming, emphasizing a schema-less design.

REST APIs: The system should expose REST APIs that can accept any structured data in JSON format, so that new data models can be ingested and indexed without dedicated endpoints.

Key Concepts 🔑

JSON and REST API 🌐

JSON as Graph: Handling JSON data represented as a graph 📈.
REST API Features 🛠️:
- Specify URIs, status codes, headers, data model, version.
- CRUD operations support, including merge/patch and cascaded delete.
- ETag, If-Match, and If-None-Match headers.
- Advanced semantics for operations like conditional read and write, update if not changed.
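The conditional read and "update if not changed" semantics listed above can be sketched roughly as follows. This is a minimal illustration, not the project's controller code: `computeEtag`, `conditionalRead`, and `conditionalWrite` are hypothetical helper names, and deriving the ETag from a SHA-256 hash of the serialized resource is an assumption.

```javascript
const crypto = require("crypto");

// Hypothetical helper: derive an ETag from the serialized resource.
// Note that JSON.stringify is sensitive to key order, so the same data
// serialized with a different key order yields a different ETag.
function computeEtag(resource) {
  const hash = crypto.createHash("sha256").update(JSON.stringify(resource)).digest("hex");
  return `"${hash}"`;
}

// Conditional read: answer 304 Not Modified when the client's
// If-None-Match value equals the current ETag, otherwise 200 with the body.
function conditionalRead(resource, ifNoneMatch) {
  const etag = computeEtag(resource);
  if (ifNoneMatch === etag) {
    return { status: 304, headers: { ETag: etag } };
  }
  return { status: 200, headers: { ETag: etag }, body: resource };
}

// Conditional write ("update if not changed"): answer 412 Precondition Failed
// when If-Match does not equal the current ETag. A missing header is treated
// as stale in this sketch.
function conditionalWrite(current, ifMatch, updated) {
  if (ifMatch !== computeEtag(current)) {
    return { status: 412 };
  }
  return { status: 200, headers: { ETag: computeEtag(updated) }, body: updated };
}
```

A client would keep the `ETag` from its last GET, send it as `If-None-Match` on later reads to avoid re-downloading unchanged data, and send it as `If-Match` on PUT or PATCH so that a concurrent modification is rejected instead of silently overwritten.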
Validation: REST API with support for validation using JSON Schema 📝.
Controller: Validates incoming payloads (a JSON document with multiple nested objects) against the JSON Schema.
Strongly Typed Data Protocol: Ajv is used for JSON Schema validation 📊.
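To illustrate what the controller-side check involves, here is a dependency-free sketch. In the actual application, Ajv would do this work (typically `new Ajv().compile(schema)` followed by calling the compiled function on the payload). The hand-written `validate` function below is a hypothetical stand-in that supports only `type`, `required`, `properties`, and `items`, and the schema itself is an assumption trimmed from the sample payload, since the real schema is not shown in this README.

```javascript
// Assumed, trimmed schema for a cost-share object.
const costShareSchema = {
  type: "object",
  required: ["deductible", "copay", "objectId", "objectType"],
  properties: {
    deductible: { type: "number" },
    copay: { type: "number" },
    objectId: { type: "string" },
    objectType: { type: "string" },
  },
};

// Assumed, trimmed schema for the top-level plan document.
const planSchema = {
  type: "object",
  required: ["planCostShares", "linkedPlanServices", "objectId", "objectType"],
  properties: {
    planCostShares: costShareSchema,
    linkedPlanServices: {
      type: "array",
      items: {
        type: "object",
        required: ["linkedService", "planserviceCostShares", "objectId"],
        properties: { planserviceCostShares: costShareSchema },
      },
    },
    objectId: { type: "string" },
    objectType: { type: "string" },
  },
};

// Maps a JavaScript value to a JSON Schema type name.
function typeOf(value) {
  if (Array.isArray(value)) return "array";
  if (value === null) return "null";
  return typeof value;
}

// Returns a list of error strings; an empty list means the value is valid.
function validate(schema, value, path = "$") {
  if (schema.type && typeOf(value) !== schema.type) {
    return [`${path}: expected ${schema.type}, got ${typeOf(value)}`];
  }
  const errors = [];
  if (typeOf(value) === "object") {
    for (const key of schema.required || []) {
      if (!(key in value)) errors.push(`${path}.${key}: missing required property`);
    }
    for (const [key, sub] of Object.entries(schema.properties || {})) {
      if (key in value) errors.push(...validate(sub, value[key], `${path}.${key}`));
    }
  }
  if (schema.items && Array.isArray(value)) {
    value.forEach((item, i) => errors.push(...validate(schema.items, item, `${path}[${i}]`)));
  }
  return errors;
}
```

A controller built this way would reject the request (for example with a 400 status) whenever the returned error list is non-empty, and only then hand the document to the storage layer.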

Storage and Search 🗄️

Data Storage: Storing data in a key/value store, with Redis used for caching solutions 💾.
Elasticsearch: Used for search and retrieval, including parent-child indexing and field-specific searches 🔍.
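One way to read "JSON as graph" together with the key/value store is to give every nested object that carries its own `objectType` and `objectId` a separate entry, and to replace the nesting with references. The sketch below assumes a key format of `objectType:objectId` and a `{ ref: key }` reference shape, both of which are illustrative choices rather than the project's actual layout. A plain `Map` stands in for Redis.

```javascript
// True for nested objects that carry their own identity.
function isNode(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value)
    && typeof value.objectId === "string" && typeof value.objectType === "string";
}

// Assumed key format for the key/value store.
function keyOf(node) {
  return `${node.objectType}:${node.objectId}`;
}

// Walks the document depth-first and stores one flat record per node.
// Nested nodes (and arrays of nodes) are replaced by references.
function flatten(node, store = new Map()) {
  const record = {};
  for (const [field, value] of Object.entries(node)) {
    if (isNode(value)) {
      record[field] = { ref: keyOf(value) };
      flatten(value, store);
    } else if (Array.isArray(value) && value.every(isNode)) {
      record[field] = value.map((child) => ({ ref: keyOf(child) }));
      value.forEach((child) => flatten(child, store));
    } else {
      record[field] = value;
    }
  }
  store.set(keyOf(node), record);
  return store;
}

// Cascaded delete: removes a record and everything it references.
// References to keys that are already gone are ignored.
function cascadeDelete(key, store) {
  const record = store.get(key);
  if (!record) return;
  for (const value of Object.values(record)) {
    const candidates = Array.isArray(value) ? value : [value];
    for (const item of candidates) {
      if (item !== null && typeof item === "object" && typeof item.ref === "string") {
        cascadeDelete(item.ref, store);
      }
    }
  }
  store.delete(key);
}
```

Storing each node separately keeps individual reads and patches small, at the cost of reassembling the document on a full read. Note that deleting a child directly leaves a dangling reference in its parent record, which a real implementation would need to clean up.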

Security and Authorization 🔒

Security Mechanism: Bearer tokens signed with RS256 (RSA signature with SHA-256), with Google acting as the identity provider (IdP) that issues the tokens.
OAuth 2.0 Workflow 💳:
- Client app registration and token request with OAuth/Authorization Server.
- Use of bearer tokens in HTTP requests.
- Relevant OAuth details like client ID, client secret, token type, redirect URI, auth URL, access token URL, and scope.
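The resource-server side of this workflow, checking an RS256-signed bearer token, can be sketched with Node's built-in `crypto` module. This is an illustration under stated assumptions: `signRs256` and `verifyBearer` are hypothetical helpers, the key pair is generated locally for the demo, and the `base64url` encoding requires a reasonably recent Node.js release. In the real flow, Google signs the token and the application verifies it against Google's published public keys (or uses a library that does so), and it would also check claims such as `iss` and `aud`.

```javascript
const crypto = require("crypto");

const b64url = (input) => Buffer.from(input).toString("base64url");

// Demo-only: builds an RS256-signed JWT. In the real flow the
// authorization server (Google) performs this step.
function signRs256(payload, privateKey) {
  const header = { alg: "RS256", typ: "JWT" };
  const signingInput = `${b64url(JSON.stringify(header))}.${b64url(JSON.stringify(payload))}`;
  const signature = crypto.sign("sha256", Buffer.from(signingInput), privateKey);
  return `${signingInput}.${signature.toString("base64url")}`;
}

// Returns the token's claims if the Authorization header carries a
// well-formed, correctly signed, unexpired RS256 token; otherwise null.
function verifyBearer(authorizationHeader, publicKey, nowSeconds = Math.floor(Date.now() / 1000)) {
  const match = /^Bearer (.+)$/.exec(authorizationHeader || "");
  if (!match) return null;
  const parts = match[1].split(".");
  if (parts.length !== 3) return null;
  const [headerB64, payloadB64, signatureB64] = parts;

  let header;
  let payload;
  try {
    header = JSON.parse(Buffer.from(headerB64, "base64url").toString("utf8"));
    payload = JSON.parse(Buffer.from(payloadB64, "base64url").toString("utf8"));
  } catch {
    return null;
  }
  if (!header || typeof header !== "object" || !payload || typeof payload !== "object") return null;

  // Pin the algorithm so that "none" or HMAC-based tokens are rejected.
  if (header.alg !== "RS256") return null;

  const signatureOk = crypto.verify(
    "sha256",
    Buffer.from(`${headerB64}.${payloadB64}`),
    publicKey,
    Buffer.from(signatureB64, "base64url")
  );
  if (!signatureOk) return null;

  // Reject expired tokens when an exp claim is present.
  if (typeof payload.exp === "number" && payload.exp <= nowSeconds) return null;
  return payload;
}
```

A route handler would call `verifyBearer(req.headers.authorization, publicKey)` before touching any data and respond with 401 when it returns `null`.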

Setup and Execution Steps 🚀

Elasticsearch and Kibana 📊
# Start Elasticsearch
cd elasticsearch-7.17.15/
./bin/elasticsearch
# Access Elasticsearch at http://localhost:9200/

# Start Kibana (in a separate terminal, since Elasticsearch runs in the foreground)
cd kibana-7.17.15-darwin-x86_64/
./bin/kibana
# Access Kibana Console at http://localhost:5601/app/dev_tools#/console

Redis 🗃️
# Start the Redis server (redis-cli is only the client used to connect to it)
redis-server
# Default port: 6379

RabbitMQ 📬
# Start RabbitMQ
brew services start rabbitmq
# Access the RabbitMQ management UI at http://localhost:15672/#/

NodeJS Application Run Steps 💻
# Install dependencies
npm install

# Start the application
nodemon index.js

## Sample JSON Payload

```json
{
  "planCostShares": {
    "deductible": 2000,
    "_org": "example.com",
    "copay": 1,
    "objectId": "1234vxc2324sdf-501",
    "objectType": "membercostshare"
  },
  "linkedPlanServices": [
    {
      "linkedService": {
        "_org": "example.com",
        "objectId": "1234520xvc30asdf-502",
        "objectType": "service",
        "name": "Yearly physical"
      },
      "planserviceCostShares": {
        "deductible": 10,
        "_org": "example.com",
        "copay": 20,
        "objectId": "1234512xvc1314asdfs-503",
        "objectType": "membercostshare"
      },
      "_org": "example.com",
      "objectId": "27283xvx9asdff-504",
      "objectType": "planservice"
    },
    {
      "linkedService": {
        "_org": "example.com",
        "objectId": "1234520xvc30sfs-505",
        "objectType": "service",
        "name": "linked service"
      },
      "planserviceCostShares": {
        "deductible": 10,
        "_org": "example.com",
        "copay": 175,
        "objectId": "1234512xvc1314sdfsd-506",
        "objectType": "membercostshare"
      },
      "_org": "example.com",
      "objectId": "27283xvx9sdf-507",
      "objectType": "planservice"
    }
  ],
  "_org": "example.com",
  "objectId": "12xvxc345ssdsds-508",
  "objectType": "plan",
  "planType": "inNetwork",
  "creationDate": "12-12-2017"
}
```

## Testing Queries for Elasticsearch 🧪

### Delete Insurance Plan Index
```json
DELETE /insurance_plan_index
```

### Search in Insurance Plan Index

```json
# Query All Documents
GET /insurance_plan_index/_search
{
  "query": {
    "match_all": {}
  }
}

# Get Parent with Specific Child Conditions
GET /insurance_plan_index/_search
{
  "query": {
    "has_child": {
      "type": "planserviceCostShares",
      "query": {
        "range": {
          "copay": {
            "gte": 1
          }
        }
      }
    }
  }
}

# Get Child Objects with Specific Parent
GET /insurance_plan_index/_search
{
  "query": {
    "has_parent": {
      "parent_type": "linkedPlanServices",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "objectId": "27283xvx9asdff-504"
              }
            }
          ]
        }
      }
    }
  }
}

# Get Objects with a Specific Parent Type
GET /insurance_plan_index/_search
{
  "query": {
    "has_parent": {
      "parent_type": "plan",
      "query": {
        "match_all": {}
      }
    }
  }
}

# Search by Object ID
GET /insurance_plan_index/_search
{
  "query": {
    "match": {
      "objectId": "1234512xvc1314sdfsd-506"
    }
  }
}
```

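The `has_child` and `has_parent` queries only return results if the index defines a join field and each document is indexed with its relation. The mapping is not shown in this README, so the sketch below is an assumption: the join field name `plan_join` is hypothetical, and the relation names are taken from the types used in the queries above. `toJoinDocs` is likewise a hypothetical helper that turns the nested sample payload into flat index operations.

```javascript
// Assumed request body for PUT /insurance_plan_index.
const mapping = {
  mappings: {
    properties: {
      plan_join: {
        type: "join",
        relations: {
          plan: ["planCostShares", "linkedPlanServices"],
          linkedPlanServices: ["linkedService", "planserviceCostShares"],
        },
      },
    },
  },
};

// Keeps only scalar fields, since nested objects become their own documents.
function scalars(obj) {
  return Object.fromEntries(
    Object.entries(obj).filter(([, value]) => value === null || typeof value !== "object")
  );
}

// Flattens one plan into index operations: { id, routing, body }.
// Every document is routed by the root plan's id so that parents, children,
// and grandchildren end up on the same shard, which join queries require.
function toJoinDocs(plan) {
  const routing = plan.objectId;
  const ops = [];
  const add = (obj, name, parent) => {
    ops.push({
      id: obj.objectId,
      routing,
      body: { ...scalars(obj), plan_join: parent ? { name, parent } : { name } },
    });
  };

  add(plan, "plan");
  add(plan.planCostShares, "planCostShares", plan.objectId);
  for (const service of plan.linkedPlanServices) {
    add(service, "linkedPlanServices", plan.objectId);
    add(service.linkedService, "linkedService", service.objectId);
    add(service.planserviceCostShares, "planserviceCostShares", service.objectId);
  }
  return ops;
}
```

Each returned operation corresponds to one index request of the form `PUT /insurance_plan_index/_doc/<id>?routing=<routing>` with `body` as the document. For the sample payload, this yields one `plan` document, one `planCostShares` child, and three documents per linked plan service.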
Repository: Nikhil-Reddy-Karukonda/Distributed_Data_Systems
