Elasticsearch Glossary

Introduction

Trying to wrap your head around data indexing and search can feel like navigating a maze without a map. But fear not! Our comprehensive guide is here to shed light on the essential Elasticsearch terminology you need to know.

We'll take you on a deep dive into the terms powering data indexing, search mechanisms, and performance optimization. By the time you're done, you'll have a roadmap of key terms and insights to guide you through this intricate landscape.

Elasticsearch Terms

@metadata: In Logstash, a special field for storing content that you don't want to include in output events. For example, the @metadata field is useful for creating transient fields that are used in conditional statements.

A

Action: The rule-specific response that occurs when an alerting rule fires. A single rule can have several actions. For more information, refer to Connectors and actions. In Elastic Security, actions send notifications through other systems, such as email, Slack, PagerDuty, and Webhook, when a detection alert is generated.

Administration Console: A component of Elastic Cloud Enterprise that provides the API server for the Cloud UI. It also synchronizes cluster and allocator data from ZooKeeper to Elasticsearch.

Advanced Settings: Lets you control the appearance and behavior of Kibana by setting the date format, default index, and other attributes. Advanced Settings is part of Kibana Stack Management.

Agent policy: A set of inputs and configurations that specify the data to be gathered by the Elastic Agent. An agent policy can be applied to either a single agent or shared among a group of agents, making it more convenient to manage multiple agents at scale.

Alias: An alternate name for a group of data streams or indices. Most Elasticsearch APIs accept an alias in place of a data stream or index name.

Allocator Affinity: Regulates the allocation of Elastic Stack deployments across the available set of allocators within your Elastic Cloud Enterprise setup.

Allocator Tag: In Elastic Cloud Enterprise, a tag that characterizes the hardware resources available for Elastic Stack deployments. Instance configurations use allocator tags to determine which instances of the Elastic Stack should be placed on which hardware.

Allocator: Oversees hosts housing Elasticsearch and Kibana nodes, governing the lifecycle of these nodes by generating new containers and overseeing the nodes within these containers upon request. This process is instrumental in expanding the capacity of your Elastic Cloud Enterprise deployment.

Analysis: The process of converting unstructured text into a format that is optimized for search.

Annotation: An annotation is a method of enhancing a data presentation with informative domain knowledge.

Anomaly Detection Job: Anomaly detection jobs encompass the configuration details and metadata required to execute an analytics assignment.

API key: A unique identifier used for authentication in Elasticsearch. When transport layer security (TLS) is enabled, all requests must be authenticated using either an API key or a username and password.

APM agent: An open-source library, written in the same programming language as your service, that integrates with your code and gathers performance data and runtime errors by monitoring the code as it executes.

APM Server: An open-source tool that accepts data from APM agents and forwards it to Elasticsearch.

App: A top-level Kibana component that is accessed through the side navigation. Apps include core Kibana components such as Discover and Dashboard, solutions such as Observability and Security, and special-purpose tools such as Maps and Stack Management.

Auto-Follow Pattern: An index pattern that automatically sets up new indices as follower indices for cross-cluster replication.

Availability Zone: Encompasses resources within an Elastic Cloud Enterprise setup that are segregated from other availability zones to provide protection against failures. This could be a rack, a server zone, or another logical constraint that establishes a boundary to mitigate failures. In a highly resilient cluster, the nodes are distributed across two or three availability zones to ensure the cluster's resilience in the event of an entire availability zone failure.

B

Basemap: The background detail necessary to orient the location of a map.

Beats Runner: Employed for transmitting Filebeat and Metricbeat data to the logging cluster.

Bucket Aggregation: An aggregation that generates buckets containing sets of documents, with each bucket linked to a criterion (depending on the aggregation type) that establishes whether a document in the current context is suitable for inclusion in the bucket.

Bucket: In Kibana, a set of documents that have certain characteristics in common. For example, matching documents might be bucketed by color, distance, or date range. Machine learning features also use the concept of a bucket to divide time series data into batches for processing. The bucket span is part of the configuration of an anomaly detection job and defines the time interval used to summarize and model the data. It is typically between 5 minutes and 1 hour, depending on the data's characteristics. When setting the bucket span, consider the granularity at which you want to analyze, the frequency of the input data, the typical duration of anomalies, and the frequency at which alerting is required.
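
As a rough illustration of how a bucket span divides time series data into batches, the Python sketch below groups timestamped values into fixed-width buckets and summarizes each one with a mean. The function name bucketize, the sample readings, and the 15-minute span are all hypothetical; this shows only the bucketing idea, not how anomaly detection jobs are implemented.

```python
from collections import defaultdict


def bucketize(events, bucket_span_seconds):
    """Group (timestamp, value) pairs into fixed-width time buckets.

    Each bucket is keyed by its start time in epoch seconds.
    """
    buckets = defaultdict(list)
    for timestamp, value in events:
        # Floor the timestamp to the start of the bucket it falls in.
        bucket_start = (timestamp // bucket_span_seconds) * bucket_span_seconds
        buckets[bucket_start].append(value)
    return dict(buckets)


# Hypothetical sample: four readings and a 15-minute (900 second) bucket span.
events = [(0, 10.0), (600, 12.0), (900, 50.0), (1700, 11.0)]
summary = {
    start: sum(values) / len(values)
    for start, values in bucketize(events, 900).items()
}
```

A shorter span gives finer granularity but fewer data points per bucket, which is the trade-off the considerations above describe.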

C

Canvas Expression Language: A pipeline-based language for data manipulation and visualization, incorporating several functions and features like table transformations, type casting, and sub-expressions, with support for TinyMath functions to perform intricate mathematical operations.

Canvas: Allows for the creation of presentations and infographics that can directly incorporate real-time data from Elasticsearch.

Certainty: Specifies how many documents must contain a pair of terms before that pair is considered a useful connection in a graph.

Client Forwarder: Used for secure internal communications between various components of Elastic Cloud Enterprise and ZooKeeper.

Cloud UI: Provides web-based access for managing your Elastic Cloud Enterprise installation, supported by the administration console.

Cluster: A group of one or more connected Elasticsearch nodes.
In the Maps application, clusters are also a layer type and display option that shows a cluster symbol across a grid on the map, one symbol per grid cell. The cluster location is the weighted centroid of all documents that fall within the grid cell.
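
The weighted centroid mentioned above can be sketched in a few lines of Python. This is a simplified illustration that treats latitude and longitude as plain numbers and ignores wrap-around at the antimeridian; the function name weighted_centroid and the sample points are hypothetical, and the Maps application may compute the value differently.

```python
def weighted_centroid(points):
    """Return the weighted centroid (lat, lon) of (lat, lon, weight) tuples."""
    total_weight = sum(weight for _, _, weight in points)
    if total_weight == 0:
        raise ValueError("total weight must be non-zero")
    # Weighted arithmetic mean of each coordinate.
    lat = sum(p_lat * weight for p_lat, _, weight in points) / total_weight
    lon = sum(p_lon * weight for _, p_lon, weight in points) / total_weight
    return lat, lon


# Hypothetical grid cell: the second location carries three times the weight.
cell_points = [(10.0, 20.0, 1), (20.0, 40.0, 3)]
centroid = weighted_centroid(cell_points)
```

With these sample points, the centroid is pulled toward the more heavily weighted location.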

Codec Plugin: A Logstash plugin that changes the data representation of an event. Codecs are essentially stream filters that can operate as part of an input or output, and they let you separate the transport of messages from the serialization process. Popular codecs include json, msgpack, and plain (text).

Cold Phase: The third potential stage in the index lifecycle is the cold phase. During this phase, data is no longer updated frequently and is rarely queried. While the data still needs to be searchable, slower query performance is acceptable.

Cold Tier: The data tier comprising nodes that store infrequently accessed time series data that is not typically updated is known as the cold data tier.

Component Template: A fundamental unit for constructing index templates, a component template can define mappings, index settings, and aliases.

Condition: Specifies the circumstances that must be met to trigger an alerting rule.

Conditional: A control flow mechanism that performs specific actions based on whether a statement (also known as a condition) is true or false. Logstash supports if, else if, and else statements, enabling the use of conditional statements to apply filters and direct events to a designated output based on user-defined conditions.

Connector: A setup that allows for connectivity with an external system, serving as the target for an action.

Console: In Kibana, a utility for engaging with the Elasticsearch REST API, enabling users to send requests, view responses, access API documentation, and review request history.

Constructor: Manages the containers that run Elasticsearch and Kibana nodes, directing allocators so that resources are used efficiently. The constructor processes plan change requests from the Cloud UI and determines how to transform the cluster. In highly available installations, it places cluster nodes in different availability zones so that the cluster can survive the failure of an entire availability zone.

Container: Comprises an Elastic Cloud Enterprise software instance and its dependencies, facilitating the provision of consistent environments, allocating a dedicated host resource share to nodes, and streamlining operational efforts within Elastic Cloud Enterprise.

Content Tier: The data tier housing nodes responsible for managing the indexing and query workload for content like a product catalog.

Coordinator: Comprises a logical cluster of specific Elastic Cloud Enterprise services, functioning as a distributed coordination system and resource scheduler.

Cross-Cluster Replication (CCR): Reproduces data streams and indices from distant clusters into a local cluster.

Cross-Cluster Search (CCS): Queries data streams and indices located on remote clusters from a local cluster.

Custom Rule: A set of conditions and actions that change the behavior of anomaly detection jobs. Filters can be used to further limit the scope of the rules. In Kibana, custom rules are referred to as job rules.

D

Dashboard: A curated assortment of visualizations, saved searches, and maps, offering a comprehensive understanding of your data from various angles.

Data Frame Analytics Job: Data frame analytics jobs encompass the configuration details and metadata required to execute machine learning analytics tasks on a source index and save the results in a designated index.

Data Source: A file, database, or service that provides the underlying data for a map, Canvas element, or visualization.

Data Stream: A named resource used to manage time series data. A data stream stores its data across multiple backing indices.

Data Tier: A group of nodes with identical data roles, usually sharing comparable hardware specifications. Data tiers encompass the content tier, hot tier, warm tier, cold tier, and frozen tier.

Data View: An entity that allows you to choose the data you wish to utilize in Kibana and specify field properties. A data view can reference one or more data streams, indices, or aliases, such as selecting log data from the previous day or all indices containing your data.

Datafeed: Anomaly detection jobs can analyze either a one-off batch of data or data that arrives continuously in real time. Datafeeds retrieve data from Elasticsearch for analysis.

Dataset: A grouping of data that shares a consistent structure, with the dataset name often indicating its source.

Delete Phase: The final stage in the index lifecycle where an index is no longer required and can be safely deleted is known as the delete phase.

Deployment Template: A reusable configuration of Elastic products and solutions, used to establish an Elastic Cloud deployment.

Deployment: One or more Elastic Stack products, configured to work together and run on Elastic Cloud.

Detection Alert: An Elastic Security alert produced internally and never received from an external system. When a rule's conditions are met, Elastic Security generates a detection alert and stores it in an Elasticsearch alerts index.

Detection Rule: In Elastic Security, a background task that runs periodically and produces alerts when suspicious activity is detected.

Detector: Part of the configuration of an anomaly detection job. Detectors define the type of analysis to perform and the fields to analyze. Having multiple detectors in one job is more efficient than running separate jobs on the same data.

Director: Oversees the ZooKeeper datastore, a role that is frequently shared with the coordinator but can be separated in production environments.

Discover: Allows you to search and filter your data, focusing on the information that captures your interest.

Distributed Tracing: The comprehensive data collection of performance metrics across your microservices architecture.

Document: A JSON object that holds data stored within Elasticsearch.

Drilldown: A navigational path that preserves context (time range and filters) from the source to the destination, enabling a new perspective on the data. For example, a dashboard displaying the overall status of multiple data centers might have a drilldown to a single data center's dashboard.

E

Elastic Cloud Enterprise (ECE): The official enterprise solution for self-hosting and managing the Elastic Stack at scale, deployable on public cloud platforms like AWS, GCP, or Microsoft Azure, on your private cloud, or on bare metal infrastructure.

Elastic Cloud on Kubernetes (ECK): Built on the Kubernetes Operator pattern, ECK extends the basic Kubernetes orchestration capabilities to support the deployment and management of Elastic products and solutions on Kubernetes.

Edge: A connection between nodes in a graph that shows they are related. The line weight indicates the strength of the relationship.

Elastic Agent: A comprehensive approach to add monitoring for logs, metrics, and other data types to a host, also capable of safeguarding hosts from security threats, retrieving data from operating systems, forwarding data from remote services or hardware, and more.

Elastic Common Schema (ECS): A document schema for Elasticsearch, intended for use cases such as logging and metrics. ECS defines a common set of fields, their data types, and guidance on their correct usage, which makes event data from diverse sources more uniform.

Elastic Maps Service (EMS): A service offering foundational map tiles, shape files, and other vital components for geospatial data visualization.

Elastic Package Registry (EPR): A service provided by Elastic that centrally stores definitions of Elastic packages.

Elastic Security indices: Indices that hold host and network source events (such as packetbeat-*, log-*, and winlogbeat-*). When you create a new rule in Elastic Security, the default index pattern corresponds to the values defined in the securitySolution:defaultIndex advanced setting.

Elastic Stack: The Elastic Stack, also known as the ELK Stack, is the integration of multiple Elastic products, providing a scalable and adaptable solution for managing your data.

Elasticsearch Service: The official hosted Elastic Stack offering from the creators of Elasticsearch, available as a software-as-a-service (SaaS) offering on several cloud platforms, including AWS, GCP, and Microsoft Azure.

Element: A Canvas workpad object that displays an image, text, or visualization.

Endpoint Exception: An exception that applies to both rules and Endpoint agents on hosts. Endpoint exceptions have two requirements: Endpoint agents must be installed on the hosts, and the Elastic Endpoint Security rule must be activated.

Event Query Language (EQL): A query language for event-based time series data, such as logs, metrics, and traces. EQL supports matching sequences of events.

Event: A single unit of information, containing a timestamp plus additional data. An event arrives via an input and is then parsed, timestamped, and passed through the Logstash pipeline.

Exception: In Elastic Security, exceptions are incorporated into rules to prevent particular source event field values from triggering alerts.

External Alert: Alerts that Elastic Security obtains from external systems, such as Suricata.

F

Feature Controls: Allows administrators to tailor the features accessible in each space.

Feature Importance: In supervised machine learning techniques like regression and classification, feature importance signifies the extent to which a particular feature influences the prediction.

Feature Influence: Within outlier detection, feature influence scores reveal the features of a data point that contribute to its outlier characteristics.

Feature State: The indices and data streams utilized for storing configurations, history, and additional data related to an Elastic feature like Elasticsearch security or Kibana. A feature state commonly encompasses one or more system indices or data streams, along with regular indices and data streams utilized by the feature. Snapshots can be employed to archive and recover feature states.

Field Reference: Referring to an event field in Logstash involves mentioning it within an output block or filter block in the configuration file. Field references are commonly enclosed in square brackets ([]), such as [fieldname]. When referencing a top-level field, the brackets can be omitted, and the field name used directly. For nested fields, the complete path to the field is specified like [top-level field][nested field].
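
To make the bracket syntax concrete, the following Python sketch resolves a field reference against an event modeled as a nested dictionary. It is only an illustration of the syntax described above, not Logstash's own parser; the function name resolve_field_reference and the sample event are hypothetical.

```python
import re


def resolve_field_reference(event, reference):
    """Look up a Logstash-style field reference in a nested dict.

    Accepts "[top][nested]" paths as well as a bare top-level field name.
    Returns None when any part of the path is missing.
    """
    if reference.startswith("["):
        # Extract each bracketed path segment in order.
        path = re.findall(r"\[([^\[\]]+)\]", reference)
    else:
        # Brackets may be omitted for a top-level field.
        path = [reference]
    value = event
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Hypothetical event with one top-level field and one nested field.
event = {"status": 200, "client": {"ip": "10.0.0.1"}}
top_level = resolve_field_reference(event, "status")
nested = resolve_field_reference(event, "[client][ip]")
```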

Field: A key-value pair in a document. In Logstash, the term refers to an event property. For example, each event in an Apache access log has properties such as a status code (200, 404), request path ("/", "index.html"), HTTP verb (GET, POST), and client IP address. Logstash uses the term "fields" to refer to these properties.

Filter Plugin: Logstash offers a plugin for intermediate event processing, known as filters. Generally, filters process event data following ingestion via inputs, involving tasks like mutating, enriching, or modifying data according to configuration rules. Conditional application of filters often depends on event characteristics. Popular filter plugins include grok, mutate, drop, clone, and geoip. Filter stages are optional in Logstash.

Filter: A query that does not assign a score to matching documents.

Fleet Server: Fleet Server is a component that centrally manages Elastic Agents, functioning as a control plane for updating agent policies, collecting status updates, and orchestrating actions across agents.

Fleet: Fleet offers a centralized method for managing Elastic Agents on a large scale. It consists of two components: The Fleet app within Kibana offers a web-based interface for adding and remotely overseeing agents, while the Fleet Server operates as the backend service responsible for agent management.

Flush: Transfers data from the transaction log to the disk for long-term storage.

Follower Index: The destination index for cross-cluster replication, where a follower index resides within a local cluster and mirrors a leader index.

Force Merge: Initiates a manual merge to decrease the number of segments in an index's shards.

Frozen Phase: The fourth potential stage in the index lifecycle is the frozen phase. During this phase, an index is no longer actively updated and is infrequently queried. While the data remains searchable, it is acceptable for queries to be notably slow.

Frozen Tier: A data tier housing nodes that store time series data accessed infrequently and typically not updated regularly.

G

Gem: A self-contained package of code that is hosted on RubyGems.org. Logstash plugins are packaged as Ruby Gems, and the Logstash plugin manager can be used to manage them.

Geo-Point: An Elasticsearch field data type that stores point locations as latitude-longitude pairs. A pair can be supplied as a string, geohash, array, well-known text, or object.

Geo-Shape: An Elasticsearch field data type that supports complex geographic primitives such as polygons, lines, and rectangles. A geo-shape field can be populated from GeoJSON or well-known text.

GeoJSON: A standard for encoding geospatial information, GeoJSON is utilized as a file format often employed in the Maps application for uploading geospatial data files.

Graph: A data model and graphical representation illustrating relationships among a group of entities. Each entity is depicted as a node, while the connections between nodes are represented by edges.

Grok Debugger: A tool for building and debugging grok patterns. Grok is well suited to parsing syslog, Apache, and other web server logs.

H

Hardware Profile: In Elastic Cloud, a pre-configured deployment template designed to cater to a specific use case for the Elastic Stack, such as a compute-optimized deployment that emphasizes high vCPU for search-intensive scenarios.

Heat Map: A layer category in the Maps application, heat maps aggregate locations to display higher (or lower) densities. Heat maps present a visualization utilizing color-coded cells or regions to examine patterns across multiple dimensions. Refer to the Heat map layer for further details.

Hidden Data Stream or Index: A data stream or index that is excluded from most index patterns by default.

Host Runner (Runner): Within Elastic Cloud Enterprise, a local control agent operates on all hosts to deploy local containers according to role definitions. It guarantees the presence and functionality of containers assigned to the host, creating or recreating them as needed.

Hot Phase: The initial potential stage in the index lifecycle is the hot phase, characterized by active updates and queries on the index.

Hot Thread: A Java thread exhibiting elevated CPU usage and running for an extended duration compared to usual.

Hot Tier: A data tier comprising nodes responsible for managing the indexing workload for time series data, including logs or metrics. This tier maintains your most current and frequently accessed data.

I

ID: An identifier assigned to a document, where each document ID must be distinct within an index.

Index: A collection of JSON documents. As a verb, to index means to add one or more JSON documents to Elasticsearch; this process is called indexing.

Indexer: A Logstash instance assigned to interact with an Elasticsearch cluster for the purpose of indexing event data.

Index Lifecycle Policy: Defines the transitions of an index through different phases in the index lifecycle and outlines the corresponding actions to be executed at each phase.

Index Lifecycle: An index can progress through five distinct phases: hot, warm, cold, frozen, and delete.

Index Pattern: In Elasticsearch, a string incorporating a wildcard (*) pattern that can correspond to multiple data streams, indices, or aliases.
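
The wildcard matching described above can be approximated in Python as follows. This is a simplified sketch that handles only the * wildcard in a single pattern; it does not cover comma-separated patterns or exclusions, and the function name matches_index_pattern and the index names are hypothetical.

```python
import re


def matches_index_pattern(pattern, name):
    """Return True if name matches a pattern in which * matches any characters."""
    # Escape regex metacharacters, then turn each escaped "*" into ".*".
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, name) is not None


# Hypothetical index names.
names = ["logs-2024.01.01", "logs-2024.01.02", "metrics-2024.01.01"]
matched = [name for name in names if matches_index_pattern("logs-*", name)]
```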

Index Template: Automatically sets the mappings, index configurations, and aliases for newly created indices that match its index pattern. Index templates can also be employed to establish data streams.

Inference: A machine learning capability that empowers the use of supervised learning techniques, such as classification, regression, or natural language processing, in a continuous manner by employing trained models against incoming data.

Inference Aggregation: A pipeline aggregation that utilizes a trained model within an aggregation to make inferences on the results field of the parent bucket aggregation. This functionality allows for the application of supervised machine learning during search operations.

Indicator Index: Indices in Elastic Security that store suspicious field values. Indicator match rules utilize these indices to compare their field values with the source event values present in Elastic Security indices.

Inference Processor: A processor outlined in an ingest pipeline that applies a trained model to infer against the data being processed in the pipeline.

Influencer: Influencers are entities that potentially contributed to an anomaly within a specific bucket in an anomaly detection task.

Ingestion: The activity of gathering and transmitting data from diverse data sources to Elasticsearch.

Input Plugin: A Logstash extension that extracts event data from a particular source is referred to as an input plugin, which represents the initial phase in the Logstash event processing pipeline. Popular input plugins encompass file, syslog, redis, and beats.

Instance: A component of the Elastic Stack operating within an Elastic Cloud deployment, like an Elasticsearch node or a Kibana instance. Opting for additional availability zones triggers the automatic creation of additional instances.

Instance Configuration: In Elastic Cloud, facilitates the operation of Elastic Stack instances on compatible hardware resources by employing allocator tags for filtering. Utilized as fundamental components for deployment templates.

Instance Type: In Elastic Cloud, classifications for instances that denote Elastic features or cluster node types, including master, ml, or data.

Instrumentation: Extending application code to track where your application spends its time. Code is considered instrumented when it collects and reports this performance data to APM (Application Performance Monitoring).

Integration: A straightforward method for external systems to link with the Elastic Stack, offering out-of-the-box assets for seamless setup, whether it's for data collection or safeguarding systems against security threats, with numerous options requiring just a single click.

Integration Policy: A configuration of an integration tailored for a specific application, for instance, collecting logs from a particular file.

J

Job: Within machine learning, jobs encompass the essential configuration details and metadata required to execute an analytical task. These jobs are categorized into two types: anomaly detection jobs and data frame analytics jobs.

K

Kibana privilege: Empowers administrators to grant users read-only, read-write, or no access to specific features within Kibana spaces.

Kibana Query Language (KQL): The primary language used for querying within Kibana. KQL offers scripted fields support as well.

Kibana: An interface that enables visualization of Elasticsearch data and facilitates navigation through the Elastic Stack.

L

Labs: A feature currently under development or testing in Canvas or Dashboard, which users can experiment with and offer feedback on. Upon activation, Labs becomes visible in the toolbar.

Leader Index: The original index for cross-cluster replication, where a leader index is established on a remote cluster and replicated to follower indices.

Lens: Enables you to build visualizations by dragging and dropping data fields. Lens makes smart visualization suggestions for your data and lets you switch between visualization types.

Local Cluster: A cluster that retrieves data from a remote cluster, either in cross-cluster search or cross-cluster replication.

Lucene Query Syntax: The query syntax used in Kibana's older query language. The Lucene query syntax can be accessed from the options menu in the query bar and within Advanced Settings.

M

Machine Learning Node: A machine learning node is characterized by having xpack.ml.enabled set to true and including ml in node.roles. To utilize machine learning features, it's mandatory to have at least one machine learning node in your cluster.

Map: A visualization of geographic data using symbols and labels.

Mapping: Specifies the storage of a document, its fields, and metadata in Elasticsearch, analogous to a schema definition.

Master Node: Manages write requests for the cluster and disseminates changes to other nodes in a sequential manner. Every cluster features a sole master node selected automatically by the cluster, which is substituted if the current master node encounters a failure.

Merge: The action of merging a shard's smaller Lucene segments into a larger one, a process that Elasticsearch handles automatically.

Message Broker: Also known as a message buffer or message queue, a message broker is external software, such as Redis, Kafka, or RabbitMQ, that stores messages from the Logstash shipper instance as an intermediate store until the Logstash indexer instance processes them.

Metric Aggregation: An aggregation that calculates and monitors metrics for a group of documents.

Module: Out-of-the-box configurations for common data sources that simplify the collection, parsing, and visualization of logs and metrics.

Monitor: A network endpoint that is monitored to track the performance and availability of applications and services.

Multi-Field: A field that is mapped in multiple ways.

N

Namespace: A customizable data grouping based on user preferences, such as an environment (dev, prod, or qa), a team, or a strategic business unit.

Natural Language Processing (NLP): A machine learning feature that allows for operations such as language identification, recognition of named entities (NER), text classification, or text embedding.

no-op: Within Elastic Cloud, implementing a rolling update on your deployment without making any configuration changes. This type of update can be beneficial for addressing specific health warnings.

Node: An individual Elasticsearch server. A cluster can be formed by one or more nodes.

O

Observability: Integrating your logs, metrics, uptime data, and application traces to deliver detailed insights and context regarding the behavior of services operating within your environments.

Output Plugin: A plugin for Logstash that exports event data to a particular endpoint is called an output plugin. Output plugins represent the last stage in the event pipeline and are responsible for delivering processed events to their final destinations. Some widely-used output plugins are elasticsearch, file, graphite, and statsd.

P

Painless Lab: A dynamic code editor that enables real-time testing and debugging of Painless scripts.

Panel: A dashboard element that includes either a query component or a visualization, like a chart, table, or list.

Pipeline: The term "pipeline" refers to the path of events through the Logstash workflow, which usually consists of several stages, including input, filter, and output. Input stages extract data from a source and convert it into events, while filter stages, which are optional, can modify event data. Output stages are responsible for delivering the data to a destination. Inputs and outputs support codecs, allowing for data encoding or decoding during entry or exit from the pipeline without requiring a separate filter.
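
The flow of events through the input, filter, and output stages can be modeled with a small Python sketch. This is a conceptual analogy rather than Logstash code: the stage functions, the log line format, and the choice to drop DEBUG events are all hypothetical, and JSON encoding in the output stage stands in for a codec.

```python
import json


def input_stage(lines):
    """Input: read raw lines from a source and turn each into an event."""
    for line in lines:
        yield {"message": line}


def filter_stage(events):
    """Filter (optional): enrich events and drop the ones we don't want."""
    for event in events:
        level, _, text = event["message"].partition(" ")
        if level == "DEBUG":
            continue  # Drop debug events entirely.
        event["level"] = level
        event["text"] = text
        yield event


def output_stage(events):
    """Output: encode each event as JSON for delivery to a destination."""
    return [json.dumps(event, sort_keys=True) for event in events]


# Hypothetical raw log lines.
raw_lines = ["INFO service started", "DEBUG cache warm", "ERROR disk full"]
delivered = output_stage(filter_stage(input_stage(raw_lines)))
```

Because each stage only consumes events from the previous one, the filter stage can be removed without changing the input or output stages, which mirrors how filters are optional in a pipeline.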

Plan: The configuration and topology of an Elasticsearch or Kibana cluster, including aspects like capacity, availability, and Elasticsearch version, are defined by a plan. When switching to a new plan, the constructor determines the necessary steps to transform the current cluster into the desired state.

Plugin Manager: The plugin manager, accessible through the bin/logstash-plugin script, manages the lifecycle of plugins in a Logstash deployment. It provides a command-line interface (CLI) to install, remove, and upgrade plugins.

Plugin: A plugin for Logstash is a standalone software package that performs one of the stages in the event processing pipeline. The available plugin types are input, output, codec, and filter plugins, all implemented as Ruby gems and hosted on RubyGems.org. The event processing pipeline stages are configured by specifying the appropriate plugins.

Primary Shard: A Lucene instance that stores some or all data for an index is known as a primary shard. When a document is indexed in Elasticsearch, it is added to the primary shard before being replicated to replica shards.

Proxy: A TLS-enabled, highly available proxy layer that routes incoming user requests to the appropriate cluster nodes based on the cluster IDs specified in the request URLs. This layer ensures that requests are directed to the correct nodes within the container.

Q

Query Profiler: A tool that allows you to examine and analyze search queries for diagnosing and troubleshooting queries that are not performing well.

Query: A query represents a request for information about specific data, which can be considered as a question formulated in a way that Elasticsearch can understand and process.

R

Real user monitoring (RUM): Monitoring the performance, tracking metrics, and catching errors in web applications.

Recovery: The process of synchronizing a replica shard from a primary shard. Once recovery is complete, the replica shard becomes available for search queries.

Reindex: Transfers documents from a specified source to a designated destination, where both the source and destination can be a data stream, index, or alias.
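
As a rough illustration, the sketch below builds the JSON body that a reindex request (POST _reindex) carries. The helper build_reindex_body and the index names are hypothetical, and the code only assembles and prints the body; it does not contact a cluster. The optional op_type argument is included because a data stream destination is expected to use "create".

```python
import json

def build_reindex_body(source, dest, op_type=None):
    """Assemble a reindex request body from a source and a destination name."""
    body = {"source": {"index": source}, "dest": {"index": dest}}
    if op_type is not None:
        # For example, "create" when the destination is a data stream.
        body["dest"]["op_type"] = op_type
    return body

# Hypothetical index names, for illustration only.
body = build_reindex_body("my-old-index", "my-new-index")
print("POST _reindex")
print(json.dumps(body, indent=2))
```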

Remote Cluster: A separate cluster, often located in a different data center or region, that contains indices the local cluster can replicate or search. The connection to a remote cluster is unidirectional.

Replica Shard: A replica shard is an identical copy of the primary shard that can enhance search performance and resiliency by distributing data across multiple nodes, thereby increasing redundancy.

Roles Token: A roles token facilitates a host's integration into an existing Elastic Cloud Enterprise installation, granting permission for the host to assume specific roles, such as the allocator role. This token enhances security by ensuring that only authorized hosts can join the installation, thereby preventing unauthorized access.

Rollover: A rollover operation triggers the creation of a new write index once the current one meets a predefined size, document count, or age limit. This operation can be applied to a data stream or an alias that includes a write index.
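
The "size, document count, or age" rule can be sketched as a simple check in Python. This is only a model of the decision, assuming that reaching any one configured limit is enough to trigger a rollover; the class WriteIndexStats and the function should_roll_over are hypothetical names, and Elasticsearch evaluates the real conditions itself.

```python
from dataclasses import dataclass

@dataclass
class WriteIndexStats:
    size_bytes: int
    doc_count: int
    age_seconds: int

def should_roll_over(stats, max_size_bytes=None, max_docs=None, max_age_seconds=None):
    """Return True when at least one configured limit has been reached."""
    limits_reached = [
        max_size_bytes is not None and stats.size_bytes >= max_size_bytes,
        max_docs is not None and stats.doc_count >= max_docs,
        max_age_seconds is not None and stats.age_seconds >= max_age_seconds,
    ]
    return any(limits_reached)

stats = WriteIndexStats(size_bytes=10_000, doc_count=500, age_seconds=3600)
print(should_roll_over(stats, max_docs=1000))                        # document limit not reached
print(should_roll_over(stats, max_docs=1000, max_age_seconds=3600))  # age limit reached
```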

Rollup Index: A rollup index is a particular kind of index designed for storing historical data with reduced granularity. This index is populated by a rollup job that summarizes and indexes documents from the original index into the rollup index.

Rollup Job: A rollup job is a continuous background task that regularly summarizes documents in an index and indexes the resulting summaries into a separate rollup index. The configuration of the job determines which data is summarized and the frequency of these summarization operations.

Rollup: Condenses detailed data into a more compact form to ensure cost-effective access to historical data.

Routing: The process of sending data to and retrieving data from a specific primary shard. Elasticsearch uses a hashed routing value to select the shard, and you can include a routing value in indexing and search requests to take advantage of caching.
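
The idea of a hashed routing value can be shown with a simplified Python sketch. It assumes the shard is chosen as the hash of the routing value modulo the number of primary shards, and it uses zlib.crc32 purely as a stand-in: Elasticsearch uses its own hash function and additional routing settings, so the shard numbers here will not match a real cluster. The function name pick_shard is hypothetical.

```python
import zlib

def pick_shard(routing_value, num_primary_shards):
    """Map a routing value to a shard number in [0, num_primary_shards)."""
    # crc32 is a stand-in hash for illustration, not what Elasticsearch uses.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % num_primary_shards

# The same routing value always lands on the same shard.
first = pick_shard("user-42", 5)
second = pick_shard("user-42", 5)
print(first == second)
```

Because the mapping is deterministic, requests that share a routing value keep hitting the same shard, which is what lets them benefit from that shard's caches.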

Rule: A collection of conditions, schedules, and actions that facilitate the sending of notifications.

Rules: An all-encompassing perspective of your alerting rules, providing the ability to access and manage rules for all Kibana applications from a single location.

Runner: A local control agent that operates on every host, responsible for deploying local containers based on role definitions. This agent ensures that assigned containers are present and functioning, and creates or recreates them if required.

Runtime Field: A field that is evaluated at query time. You access runtime fields through the search API like any other field, and Elasticsearch treats them no differently.
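
As an illustration, the Python sketch below assembles a search request body that defines a runtime field in its runtime_mappings section and then queries it like an ordinary field. The field names (price, quantity, total_price) and the script are assumptions made up for this example, and the code only prints the body rather than sending it to a cluster.

```python
import json

# Hypothetical runtime field computed from two assumed numeric fields.
search_body = {
    "runtime_mappings": {
        "total_price": {
            "type": "double",
            "script": {
                "source": "emit(doc['price'].value * doc['quantity'].value)"
            },
        }
    },
    # The runtime field is used in the query and returned like any other field.
    "query": {"range": {"total_price": {"gte": 100}}},
    "fields": ["total_price"],
}

print(json.dumps(search_body, indent=2))
```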

S

Saved Object: A representation of a dashboard, visualization, map, data view, or Canvas workpad that can be stored and reloaded for later use.

Saved Search: The search query, filters, and time filter, collectively saved for future retrieval and reuse.

Scripted Field: A field that dynamically calculates data based on information stored in Elasticsearch indices, displayed in Discover and utilized in visualizations.

Search Session: A collection of one or more queries that are run asynchronously, with the session results stored temporarily for later retrieval. Search sessions are specific to individual users.

Search Template: A saved search that can be executed with varying parameters.

Searchable Snapshot Index: An index where data is stored within a snapshot. Searchable snapshot indices do not require replica shards for resilience as their data is securely stored externally to the cluster.

Searchable Snapshot: A snapshot of an index configured as a searchable snapshot index, allowing you to search it just like a standard index.

Segment: A data file within a shard's Lucene instance, which Elasticsearch manages automatically.

Services Forwarder: Handles internal data routing within an Elastic Cloud Enterprise deployment.

Shard: A Lucene instance that holds some or all data for an index. Elasticsearch creates and manages shards automatically. There are two types of shards: primary and replica.

Shareable: A Canvas workpad that can be integrated into any webpage, allowing you to showcase Canvas visualizations on internal wiki pages or public websites.

Shipper: A Logstash instance that forwards events to another Logstash instance or a different application.

Shrink: Decreases the count of primary shards in an index.

Snapshot Lifecycle Policy: Determines the frequency of automatic cluster backups and the duration for which the resulting snapshots are preserved.

Snapshot Repository: The destination where snapshots are saved, which can be a shared filesystem or a remote repository like Azure or Google Cloud Storage.

Snapshot: A backup captured while the cluster is operational. Snapshots can encompass the entire cluster or specific data streams and indices.

Solution: In Elastic Cloud, a deployment built from a specialized template that comes pre-configured with sensible defaults and settings for a common use case.

Source Field: The original JSON object supplied during indexing.

Space: A place for organizing dashboards, visualizations, and other saved objects by category. For example, you might have a different space for each team, use case, or individual.

Span: Details regarding the execution of a particular code path. Spans track the duration from the beginning to the end of an activity and can be interconnected in a parent/child relationship with other spans.

Split: Increases the number of primary shards in an index.

Stack Rule: The versatile rule types that Kibana offers by default.

Standalone: This mode enables manual configuration and local management of Elastic Agents directly on the systems where they are installed.

Stunnel: Safely encapsulates all traffic within an Elastic Cloud Enterprise deployment.

System Index: An index that holds configurations and internal data utilized by the Elastic Stack. System indices are denoted by names starting with a dot (.), like .security. It is advised not to directly modify or access system indices.

T

Tag: A keyword or label that you assign to Kibana saved objects, such as dashboards and visualizations, so you can classify them in a way that is meaningful to you. Tags make your content easier to manage.

Term Join: A common key that merges vector features with the outcomes of an Elasticsearch terms aggregation. Term joins enhance vector features with attributes for data-driven styling and detailed tooltip content in maps.

Text: Unstructured content, like a product description or log message, is typically analyzed to enhance search capabilities.

Time Filter: A Kibana feature that restricts search results to a specific time interval.

Time Series Data Stream: A type of data stream, abbreviated TSDS, that is optimized for indexing metrics time-series data. A TSDS reduces storage size and allows a sequence of metrics data points to be treated efficiently as a whole.

Time Series Data: A sequence of data points, including logs, metrics, and events, that are organized in chronological order. Time-series data can be indexed in a data stream, which enables access as a single named resource with data distributed across multiple backing indices. A time-series data stream is optimized for indexing metrics data.

Timelion: A tool for constructing a time-series visualization that examines data sequentially based on time.

Token: A segment of unstructured text that has been optimized for search. Typically, tokens correspond to individual words. Tokens are also referred to as terms.

Tokenization: The procedure of segmenting unstructured text into smaller, searchable units known as tokens.
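
A minimal Python sketch of the idea follows. It is a deliberately naive tokenizer, not the analyzer Elasticsearch uses: it splits on anything that is not a word character and lowercases the result, which is a normalization step added here for illustration. The function name tokenize is hypothetical.

```python
import re

def tokenize(text):
    """Split text into word-like tokens and lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]

tokens = tokenize("The QUICK brown-fox jumped!")
print(tokens)
```

Each item in the resulting list corresponds to one searchable unit, which matches the definition of a token given above.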

Trace: Specifies the duration an application dedicates to a request. Traces consist of a group of transactions and spans that share a common origin.

Tracks: A layer category in the Maps application. This layer transforms a sequence of point locations into a line, usually symbolizing a path or route.

Trained Model: A machine learning model trained and tested on a labeled dataset, usable in an ingest pipeline or pipeline aggregation for classification, regression analysis, or natural language processing on new data.

Transaction: A special kind of span that has additional attributes associated with it. Transactions describe an event captured by an Elastic APM agent that is instrumenting a service.

TSVB: A visualizer for time-series data that enables the combination of an unlimited number of aggregations to display intricate data.

U

Upgrade Assistant: A utility designed to assist with preparing for an upgrade to the subsequent major version of Elasticsearch. The tool identifies deprecated settings in your cluster and indices, guiding you through addressing issues, including reindexing.

Uptime: A metric of system reliability used to monitor the status of network endpoints via HTTP/S, TCP, and ICMP.

V

vCPU: vCPU represents a virtual central processing unit. In Elastic Cloud, vCPUs are virtual compute units allocated to your nodes. The quantity is contingent on the instance size and hardware profile. The instance may qualify for vCPU boosting based on its size.

Vector Data: Map elements comprising points, lines, and polygons.

Vega: A declarative language employed to construct interactive visualizations.

Visualization: A visual depiction of query outcomes in Kibana, such as a histogram, line graph, pie chart, or heat map.

W

Warm Phase: The second potential phase in the index lifecycle. During the warm phase, an index is typically optimized for search and no longer undergoes updates.

Warm Tier: A data tier consisting of nodes that store time-series data that is accessed less frequently and seldom requires updates.

Watcher: The original suite of alerting features in the Elastic Stack.

Web Map Service (WMS): A layer category in the Maps application. Include a WMS source to add authoritative geographic context to your map.

Worker: Logstash's filter thread model, where each worker receives an event, applies all filters sequentially, and then forwards the event to the output queue. This design facilitates scalability across CPUs, as many filters are CPU-intensive.
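
The worker model can be sketched in Python with a small thread pool. This only illustrates the structure described above (each worker takes one event and runs every filter in order before handing it on); it is not how Logstash is implemented, the filter functions are hypothetical, and Python threads will not give the CPU scaling that Logstash workers do.

```python
from concurrent.futures import ThreadPoolExecutor

def uppercase_message(event):
    return {**event, "message": event["message"].upper()}

def add_length(event):
    return {**event, "length": len(event["message"])}

# Filters are applied in this order to every event.
FILTERS = [uppercase_message, add_length]

def process(event):
    """One worker's job: run all filters sequentially on a single event."""
    for apply_filter in FILTERS:
        event = apply_filter(event)
    return event

def run_workers(events, worker_count=2):
    """Distribute events across workers and collect the filtered results."""
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        return list(pool.map(process, events))

output_queue = run_workers([{"message": "started"}, {"message": "disk full"}])
print(output_queue)
```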

Workpad: A workspace where you create visual presentations of your real-time data using Canvas.

Z

ZooKeeper: A coordination service for distributed systems that Elastic Cloud Enterprise uses to store the state of the installation. It handles host discovery, resource allocation, leader election after failures, and high-priority notifications.

Guide to Install Elasticsearch

Ready to dive into using Elasticsearch? Ensure you have it installed correctly by following these detailed installation guides tailored to different distros.

With these installation guides, you'll be up and running with Elasticsearch in no time!