Data Science

Step into the Data Science Glossary, a hub of definitions for the core concepts of the field. Whether you’re exploring statistical models, data pipelines, or machine learning algorithms, our explanations are written to help both learners and data professionals.

B
  • Big Data

    Big Data refers to extremely large datasets that are too complex to be handled and processed by traditional data management...

  • Big Data Modeling

    Big Data Modeling is the process of structuring large and complex datasets into models that are easier to understand, analyze,...

  • BigQuery

    BigQuery is a cloud-based data analysis tool from Google that allows users to quickly process and analyze large datasets using...

D
  • Data Lake

    A data lake is a centralized repository that stores large amounts of raw data in its native format, including structured,...

  • Data Visualization

    Data visualization is the practice of displaying data in a visual format, such as charts, graphs, or maps, to make...

  • Decision Science

    Decision science is an interdisciplinary field that uses data, statistics, and behavioral insights to make informed decisions and solve complex...

  • Decision Tree Machine Learning

    A decision tree is a tree-structured model in machine learning that splits data into branches based on conditions, helping to...

  • Deep Learning Data

    Deep learning data refers to the large and diverse datasets used to train deep neural networks, a type of machine...

  • Directed Acyclic Graph (DAG)

    A directed acyclic graph (DAG) is a data structure consisting of nodes connected by directed edges, where the connections flow...

  • Docker

    Docker is a platform that allows developers to build, package, and run applications in lightweight containers that are consistent across...

E
  • Ensemble Learning

    Ensemble learning is a machine learning technique that combines the predictions of multiple models to improve accuracy, robustness, and performance.

F
  • FastAPI

    FastAPI is a modern, high-performance web framework for building APIs in Python, designed for speed and developer efficiency.

  • Feature Engineering

    Feature engineering is the process of creating, selecting, or transforming data attributes to improve the performance of machine learning models.

  • Feature Selection

    Feature selection is the process of identifying and using only the most relevant attributes in a dataset to improve the...

G
  • Google Compute

    Google Compute refers to Google Cloud's suite of compute services that provide scalable and flexible virtual machines, containers, and serverless...

H
  • Hugging Face

    Hugging Face is an open-source platform and community that provides tools, models, and libraries for natural language processing (NLP) and...

  • Hypothesis Testing

    Hypothesis testing is a statistical method used to determine whether a hypothesis about a dataset is supported by evidence or...

I
  • Infrastructure as Code (IaC)

    Infrastructure as Code (IaC) is the practice of managing and provisioning IT infrastructure using machine-readable configuration files rather than manual...

J
  • Jupyter Notebooks

    Jupyter Notebooks are interactive, open-source tools that allow users to write, run, and document Python code alongside visualizations and text...

K
  • K-Means Clustering

    K-means clustering is an unsupervised machine learning algorithm that groups data points into a specified number of clusters based on...

  • Keras

    Keras is a high-level, user-friendly library for building and training deep learning models; it runs on top of TensorFlow and, since Keras 3, also on JAX and PyTorch.

  • KNN Algorithm

    KNN, or K-Nearest Neighbors, is a machine learning algorithm that classifies data points based on the "nearest" data points it...

M
  • Machine Learning (Engineering)

    Machine learning engineering involves building, deploying, and maintaining machine learning systems that solve real-world problems using data-driven algorithms.

  • Matplotlib

    Matplotlib is a popular Python library for creating static, interactive, and animated visualizations such as line graphs, bar charts, scatter...

  • MLflow

    MLflow is an open-source platform that manages the machine learning lifecycle, including experiment tracking, model deployment, and reproducibility.

  • MLOps (Machine Learning Operations)

    MLOps is the practice of combining machine learning development with software engineering and operations to streamline the deployment, monitoring, and...

  • Multivariate Regression

    Multivariate regression is a statistical method used to predict the outcome of a target variable based on multiple input variables....

N
  • Naive Bayes Classifier

    The Naive Bayes classifier is a simple probabilistic algorithm for classification that assumes features are independent, making it fast and...

  • Neural Network

    A neural network is a machine learning model inspired by the structure of the human brain, consisting of layers of...

  • NumPy

    NumPy is a Python library for numerical computing, providing tools to handle large, multi-dimensional arrays and perform mathematical operations efficiently.

P
  • Pandas Python

    Pandas is a Python library used for data manipulation and analysis, providing data structures like DataFrames to organize, clean, and...

  • PCA (Principal Component Analysis)

    Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms high-dimensional data into fewer dimensions while preserving as much...

  • Plotly

    Plotly is a Python library for creating interactive, web-based visualizations, such as 3D plots, dashboards, and maps.

S
  • SARIMAX Model (Seasonal AutoRegressive Integrated Moving Average with eXogenous variables)

    SARIMAX is a statistical model used for time series forecasting, incorporating both seasonality and the influence of external (exogenous) variables.

  • Scikit-learn

    Scikit-learn is a Python library offering tools for machine learning, including algorithms for classification, regression, clustering, and dimensionality reduction.

  • Seaborn

    Seaborn is a Python library for creating statistical data visualizations, built on top of Matplotlib, with a focus on attractive...

  • Secure Shell (SSH)

    Secure Shell (SSH) is a protocol for securely accessing and managing remote computers over a network using encryption.

  • Shapley Values

    Shapley values are a game theory concept used in machine learning to fairly distribute credit among features based on their...

  • Statistical Inference

    Statistical inference is the process of using data from a sample to make generalizations about a larger population, often involving...

  • Statsmodels

    Statsmodels is a Python library for performing statistical modeling, hypothesis testing, and data exploration.

  • Streamlit

    Streamlit is an open-source Python library for building interactive web applications for data visualization, machine learning models, and dashboards quickly.

  • Structured Data

    Structured data is highly organized information stored in a fixed format, such as rows and columns in a database or...

  • SVM (Support Vector Machine)

    Support Vector Machine (SVM) is a machine learning algorithm used for classification and regression tasks by finding a hyperplane that...

U
  • Unstructured Data

    Unstructured data is information that doesn’t follow a predefined format or structure, such as text, images, videos, and emails.

V
  • Virtual Machine (VM)

    A virtual machine (VM) is a software-based simulation of a physical computer, allowing multiple operating systems to run on a...

  • VS Code (Visual Studio Code)

    VS Code is a lightweight, open-source code editor developed by Microsoft, offering support for multiple programming languages and extensive customization...

X
  • XGBoost

    XGBoost (Extreme Gradient Boosting) is a machine learning library that implements a fast, scalable version of gradient boosting, primarily used...

