DataEngineering - Santhi's Blog

Different Types of Digital Data

Data plays a vital role and is becoming irreplaceable in every organisation, irrespective of how big or small it is. Data is available everywhere (internally and externally). It provides information and insights for the business. Organisations understand, manage and process data to get useful information and insights for making business decisions. There are three types of

Distributed SQL Basics

In this article, we will look at the basics of Distributed SQL. In 2012, Google’s Spanner introduced the concept of Distributed SQL databases. What is Distributed SQL? It is a single relational database deployed on a cluster of networked servers. It automatically replicates and distributes data among the server nodes. It is strongly consistent and provides

NewSQL Basics

In this article, we will look at the basics of NewSQL. The term NewSQL was coined in 2011 by Matthew Aslett in a research report. What is NewSQL? NewSQL is a class of database systems based on concepts of RDBMS & NoSQL. It combines the performance of NoSQL (horizontal scaling) and reliability (ACID properties – Atomicity, Consistency, Isolation

Installation of Hadoop locally in Windows 11

In this article, we will look at installing Hadoop on a local machine with a single-node cluster setup. For study and testing purposes, a single-node cluster setup is sufficient. Let’s look at how to do the installation locally on Windows 11. Installation Steps: Step 1: Download URL (Java): https://www.oracle.com/in/java/technologies/downloads/ Download URL (Hadoop): https://hadoop.apache.org/releases.html You have to extract the

Hadoop Basics

In this article, we will look at the basics of Hadoop and its ecosystem. It’s an open-source framework written in Java and developed by Doug Cutting in 2005. It enables processing of large data sets that traditional systems such as an RDBMS (Relational Database Management System) do not handle efficiently. What is Hadoop? Hadoop framework meant

NoSQL Basics

In this article, we will look at the basics of NoSQL and its advantages and disadvantages. NoSQL stands for Not Only SQL. It is widely used in big data and real-time web applications. It supports handling a variety of data: structured, semi-structured and unstructured. What is NoSQL? – It is non-relational. It doesn’t adhere to relational