Discovering Your Data

Use the optional bucket discovery feature to refresh and list the data sources in your buckets.

If your storage bucket contains different types of content, or if you want to scan the content stored in the bucket, you can use the optional Discover Bucket feature to display information about its contents.

In many cases, bucket discovery is not needed. You can proceed to the object group setup for the files that you plan to index and search. For a comprehensive list of data/log formats, see Acceptable Data Formats.


Note: Bucket discovery is a lens into your cloud object storage bucket. You do not have to run it before you create object groups.

Discover Bucket

To discover a bucket:

  1. Filter the bucket list, then select the bucket that you want to discover.
  2. Click Discover Bucket.

The Discovering message appears while the process is running.



The duration of the discovery process depends on the size and volume of your bucket data.

Viewing Aggregated Data

After the discovery process is complete, the system displays a bucket content report that gives a high-level, aggregate view of the data. The report includes the following information:

General Bucket Information:

  • Total Number of Files
  • Total File Size
  • Bucket Creation Date

File Type Distribution:

  • File Type
  • Total Size
  • Average Age of the File
  • Number of Files
  • Approximate Number of Duplicates
  • File Type Distribution Pie Chart

Review the summary information to learn about file contents and types, storage sizes, file ages, and duplicate file counts. The page also displays other information, such as events (conditions), indexes, and partitions. Use this information to confirm that the bucket and its contents are what you expect.
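To make the aggregate figures in the report more concrete, the following Python sketch computes similar per-file-type statistics from a bucket listing. It is an illustration only, not the product's implementation. The ObjectInfo record, the summarize function, grouping by file extension, and counting duplicates by matching checksums are all assumptions made for this example.

```python
import posixpath
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ObjectInfo:
    """Hypothetical record for one object in a bucket listing."""
    key: str
    size_bytes: int
    last_modified: datetime
    checksum: str  # assumed to be available, for example a content hash


def summarize(objects, now=None):
    """Group objects by file extension and compute aggregate statistics."""
    now = now or datetime.now(timezone.utc)

    # Assumption: the file type is taken from the key's extension.
    groups = defaultdict(list)
    for obj in objects:
        ext = posixpath.splitext(obj.key)[1].lower() or "(none)"
        groups[ext].append(obj)

    report = {}
    for ext, items in groups.items():
        total_age_seconds = sum(
            (now - o.last_modified).total_seconds() for o in items
        )
        report[ext] = {
            "files": len(items),
            "total_size": sum(o.size_bytes for o in items),
            "avg_age_days": round(total_age_seconds / len(items) / 86400, 1),
            # Assumption: objects with the same checksum are duplicates.
            "duplicates": len(items) - len({o.checksum for o in items}),
        }
    return report
```

Calling summarize on a list of ObjectInfo records returns one dictionary entry per file extension, which corresponds roughly to one row of the file type distribution in the report.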

Hover over the file type distribution pie chart to view file type information.


Acceptable Data Formats

Name | Table Name | Description
Common Access Log | access_log | The default web access log format for servers like Apache
VMware vSphere Auto Deploy log format | autodeploy_log | The log format for the VMware Auto Deploy service
Generic Block | block_log | A generic format for logs, like cron, that have a date at the start of a block.
Candlepin log format | candlepin_log | Log format used by Candlepin registration system
Yum choose_repo Log | choose_repo_log | The log format for the yum choose_repo tool
CUPS log format | cups_log | Log format used by the Common Unix Printing System
Dpkg Log | dpkg_log | The debian dpkg log
Amazon ELB log | elb_log | Log format for Amazon Elastic Load Balancers
engine log | engine_log | The log format for the engine.log files from RHEV/oVirt
Common Error Log | error_log | The default web error log format for servers like Apache
Fsck_hfs Log | fsck_hfs_log | Log for the fsck_hfs tool on Mac OS X
Glog | glog_log | The google glog format
Java log format | java_log | Log format used by log4j and output by most java programs
Katello log format | katello_log | Log format used by katello and foreman as used in Satellite 6
OpenAM Log | openam_log | The OpenAM identity provider
OpenAM Debug Log | openamdb_log | Debug logs for the OpenAM identity provider
OpenStack log format | openstack_log | The log format for the OpenStack log files
CUPS Page Log | page_log | The CUPS server log of printed pages
Papertrail Service | papertrail_log | Log format for the papertrail log management service
SnapLogic Server Log | snaplogic_log | The SnapLogic server log format
SSSD log format | sssd_log | Log format used by the System Security Services Daemon
Strace | strace_log | The strace output format
sudo | sudo_log | The sudo privilege management tool
Syslog | syslog_log | The system logger format found on most posix systems
TCF Log | tcf_log | Target Communication Framework log
TCSH History | tcsh_history | The tcsh history file format
Uwsgi Log | uwsgi_log | The uwsgi log format
Vdsm Logs | vdsm_log | Vdsm log format
VMKernel Logs | vmk_log | The VMKernel’s log format
VMware Logs | vmw_log | One of the log formats used in VMware’s ESXi and vCenter software
RHN server XMLRPC log format | xmlrpc_log | Generated by Satellite’s XMLRPC component
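
As an example of what one of these formats looks like in practice, the following Python sketch checks whether a line matches the Common Log Format that Apache writes by default (host, identity, user, timestamp, request, status, and size). It is a simplified illustration and not the parser that the product uses. The CLF_PATTERN and parse_access_line names and the field names are assumptions made for this example.

```python
import re

# Simplified pattern for the Common Log Format:
# host ident user [timestamp] "request" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line):
    """Return the fields of a Common Log Format line, or None if no match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
fields = parse_access_line(sample)
```

A file whose lines parse this way is a likely candidate for the access_log format. Lines that return None belong to some other format or are malformed.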