Discovering Your Data

Use the optional bucket discovery feature to refresh and list the data sources in your buckets.

If your storage bucket contains different types of content, or if you want to scan the content stored in the bucket, you can use the optional Discover Bucket feature to display information about its contents.

In many cases, bucket discovery is not needed, and you can proceed directly to the object group setup for the files that you plan to index and search. For the list of supported data and log formats, see Acceptable Data Formats.

📘 Bucket discovery is not required before creating object groups. Discovery is a lens into your cloud object storage bucket.

Discover Bucket

To discover a bucket:

  1. Filter for and/or select the desired bucket.
  2. Click Discover Bucket.

The Discovering message appears while the process is running.

📘 The duration of the discovery process depends on the size and volume of your bucket data.

Viewing Aggregated Data

After the discovery process completes, the system displays a bucket content report with a high-level aggregate view of the data. The report includes the following information:

General Bucket Information:

  • Total Number of Files
  • Total File Size
  • Bucket Creation Date

File Type Distribution:

  • File Type
  • Total Size
  • Average Age of the File
  • Number of Files
  • Approximate Number of Duplicates
  • File Type Distribution Pie Chart

Review the summary information to learn about file contents and types, storage sizes, file ages, and duplicate file counts. The page also displays other information such as events (conditions), indexes, and partitions. Use this information to confirm that the bucket and its contents are as expected.
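To make the report fields concrete, the following is a minimal Python sketch of how a similar aggregate view could be computed from a plain listing of bucket objects. It is not the product's implementation. The names `ObjectInfo` and `summarize_bucket` are hypothetical, the file type is assumed to be the file extension, and duplicates are approximated by matching size and ETag, which is only one possible heuristic.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ObjectInfo:
    """Hypothetical record for one object in a bucket listing."""
    key: str
    size: int                # bytes
    last_modified: datetime  # timezone-aware
    etag: str


def summarize_bucket(objects, now):
    """Aggregate a listing into totals and a per-file-type distribution."""
    by_type = defaultdict(
        lambda: {"files": 0, "size": 0, "age_days": 0.0, "duplicates": 0}
    )
    seen = set()
    for obj in objects:
        # Assumption: the file type is the extension of the last path segment.
        name = obj.key.rsplit("/", 1)[-1]
        file_type = name.rsplit(".", 1)[-1].lower() if "." in name else "(none)"
        row = by_type[file_type]
        row["files"] += 1
        row["size"] += obj.size
        row["age_days"] += (now - obj.last_modified).total_seconds() / 86400
        # Assumption: same size and ETag is treated as a likely duplicate.
        fingerprint = (obj.size, obj.etag)
        if fingerprint in seen:
            row["duplicates"] += 1
        else:
            seen.add(fingerprint)

    distribution = {
        file_type: {
            "files": row["files"],
            "size": row["size"],
            "avg_age_days": row["age_days"] / row["files"],
            "duplicates": row["duplicates"],
        }
        for file_type, row in by_type.items()
    }
    return {
        "total_files": sum(r["files"] for r in distribution.values()),
        "total_size": sum(r["size"] for r in distribution.values()),
        "by_type": distribution,
    }
```

Calling `summarize_bucket(listing, datetime.now(timezone.utc))` on a list of `ObjectInfo` records returns the totals plus one row per file type, which mirrors the General Bucket Information and File Type Distribution groups above.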

Hover over the file distribution chart to see file type information.

Acceptable Data Formats

| Name | Table Name | Description |
|---|---|---|
| Common Access Log | access_log | The default web access log format for servers like Apache |
| VMware vSphere Auto Deploy log format | autodeploy_log | The log format for the VMware Auto Deploy service |
| Generic Block | block_log | A generic format for logs, like cron, that have a date at the start of a block. |
| Candlepin log format | candlepin_log | Log format used by Candlepin registration system |
| Yum choose_repo Log | choose_repo_log | The log format for the yum choose_repo tool |
| CUPS log format | cups_log | Log format used by the Common Unix Printing System |
| Dpkg Log | dpkg_log | The debian dpkg log |
| Amazon ELB log | elb_log | Log format for Amazon Elastic Load Balancers |
| engine log | engine_log | The log format for the engine.log files from RHEV/oVirt |
| Common Error Log | error_log | The default web error log format for servers like Apache |
| Fsck_hfs Log | fsck_hfs_log | Log for the fsck_hfs tool on Mac OS X |
| Glog | glog_log | The google glog format |
| Java log format | java_log | Log format used by log4j and output by most java programs |
| Katello log format | katello_log | Log format used by katello and foreman as used in Satellite 6 |
| OpenAM Log | openam_log | The OpenAM identity provider |
| OpenAM Debug Log | openamdb_log | Debug logs for the OpenAM identity provider |
| OpenStack log format | openstack_log | The log format for the OpenStack log files |
| CUPS Page Log | page_log | The CUPS server log of printed pages |
| Papertrail Service | papertrail_log | Log format for the papertrail log management service |
| SnapLogic Server Log | snaplogic_log | The SnapLogic server log format |
| SSSD log format | sssd_log | Log format used by the System Security Services Daemon |
| Strace | strace_log | The strace output format |
| sudo | sudo_log | The sudo privilege management tool |
| Syslog | syslog_log | The system logger format found on most posix systems |
| TCF Log | tcf_log | Target Communication Framework log |
| TCSH History | tcsh_history | The tcsh history file format |
| Uwsgi Log | uwsgi_log | The uwsgi log format |
| Vdsm Logs | vdsm_log | Vdsm log format |
| VMKernel Logs | vmk_log | The VMKernel’s log format |
| VMware Logs | vmw_log | One of the log formats used in VMware’s ESXi and vCenter software |
| RHN server XMLRPC log format | xmlrpc_log | Generated by Satellite’s XMLRPC component |
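
As an illustration of what one of these formats looks like in practice, the sketch below parses a single line in the Common Access Log layout (the Apache Common Log Format) with a regular expression. It is an independent example, not the parser the product uses. The function name `parse_access_log_line` and the returned field names are hypothetical, and the sample line follows the shape shown in the Apache HTTP Server documentation.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "request" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_log_line(line):
    """Return a dict of fields for one access log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Timestamps look like 10/Oct/2000:13:55:36 -0700
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A size of "-" means no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
record = parse_access_log_line(sample)
```

Lines that do not fit the pattern return `None`, which makes it easy to count how many files in a sample actually follow the expected format before you set up an object group for them.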