Discovering Your Data

Quickly catalog and report on multiple data sources

Use the Data Discovery feature to catalog and report on the data sources in your buckets. See Acceptable Data Formats for a comprehensive list of supported data and log formats.

📘

Bucket discovery is not required before creating Object Groups. The Discover Bucket functionality is a lens into your S3 bucket.

Discover Bucket

  1. Select the desired bucket
  2. Click Discover Bucket

📘

The duration of the discovery process depends on the number and size of the objects in the bucket.

Viewing Aggregated Data

After the discovery process is complete, a bucket content report is displayed that gives a high-level aggregate view of the data. The report includes the following sections:

General Bucket Information:

  •   Total Number of Files
    
  •   Total File Size
    
  •   Bucket Creation Date
    

File type distribution:

  •   File Type 
    
  •   Total Size 
    
  •   Average Age of the File
    
  •   Number of Files
    
  •   Number of Duplicates
    
  •   File Type Distribution Pie Chart 
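
To make the file type columns above concrete, here is a minimal Python sketch of how such an aggregate could be computed from a list of object records. It is an illustration only, not the product's implementation. The record keys (`key`, `size`, `last_modified`, `etag`), the function name `summarize_by_file_type`, the use of the file extension as the file type, and the rule that objects sharing an ETag count as duplicates are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import PurePosixPath


def summarize_by_file_type(objects, now):
    """Group object records by file extension and compute report-style totals.

    Each record is a dict with hypothetical keys: key, size, last_modified, etag.
    """
    groups = defaultdict(list)
    for obj in objects:
        # Assumption: the file type is the lowercased extension of the object key.
        ext = PurePosixPath(obj["key"]).suffix.lower().lstrip(".") or "unknown"
        groups[ext].append(obj)

    report = {}
    for ext, items in groups.items():
        ages = [(now - o["last_modified"]).total_seconds() / 86400 for o in items]
        # Assumption: objects that share an ETag are treated as duplicates;
        # every copy beyond the first counts as one duplicate.
        distinct_etags = {o["etag"] for o in items}
        report[ext] = {
            "number_of_files": len(items),
            "total_size": sum(o["size"] for o in items),
            "average_age_days": sum(ages) / len(ages),
            "number_of_duplicates": len(items) - len(distinct_etags),
        }
    return report


# Made-up sample data for illustration.
now = datetime(2024, 1, 11, tzinfo=timezone.utc)
objects = [
    {"key": "logs/a.log", "size": 100,
     "last_modified": datetime(2024, 1, 1, tzinfo=timezone.utc), "etag": "e1"},
    {"key": "logs/b.log", "size": 100,
     "last_modified": datetime(2024, 1, 9, tzinfo=timezone.utc), "etag": "e1"},
    {"key": "data/c.json", "size": 50,
     "last_modified": datetime(2024, 1, 10, tzinfo=timezone.utc), "etag": "e2"},
]
report = summarize_by_file_type(objects, now)
```

In this sample, the two `.log` objects share an ETag, so the `log` entry reports two files and one duplicate.
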
    

For each file type that contains duplicates, click the count to display specific details.

Hover over the file type distribution chart to see information about each file type.

Security Statistics and Quantities:

  • Public objects (objects that have been made public through AWS)
  • Policies allowed (whether any bucket policy has been configured to grant access)
  • Bucket policies (whether your S3 bucket has any policies)
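
As a rough illustration of the "Policies allowed" item, the sketch below scans an S3 bucket policy document for `Allow` statements whose principal is the wildcard `*`. It is a simplified heuristic written for this page, not the check the discovery process performs and not how AWS evaluates access: it ignores `Condition` blocks, ACLs, and account-level public access settings. The function name and the example policy are hypothetical.

```python
import json


def statements_granting_public_access(policy_json):
    """Return the Allow statements whose principal is the wildcard "*"."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]

    public = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        # The principal is either the string "*" or a mapping such as {"AWS": ...}.
        aws = principal.get("AWS") if isinstance(principal, dict) else None
        aws_values = aws if isinstance(aws, list) else [aws]
        if principal == "*" or "*" in aws_values:
            public.append(stmt)
    return public


# Hypothetical policy: one public read statement, one role-scoped write statement.
example_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
         "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Sid": "TeamWrite", "Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::123456789012:role/example-role"},
         "Action": "s3:PutObject", "Resource": "arn:aws:s3:::example-bucket/*"},
    ],
})
public_statements = statements_granting_public_access(example_policy)
```
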

For each security reporting item, click the count to display specific details.

The discovery process also identifies any trending prefixes across all data.
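
This page does not define how trending prefixes are determined. As one possible reading, the sketch below counts how many object keys share each leading path prefix and returns the most frequent ones. The function name `most_common_prefixes`, its `depth` and `top` parameters, and the sample keys are assumptions for illustration, not the product's algorithm.

```python
from collections import Counter


def most_common_prefixes(keys, depth=1, top=3):
    """Count how many object keys share each leading path prefix."""
    counts = Counter()
    for key in keys:
        parts = key.split("/")[:-1]  # drop the object name, keep the "folders"
        if parts:
            counts["/".join(parts[:depth]) + "/"] += 1
    return counts.most_common(top)


# Made-up object keys for illustration.
keys = [
    "logs/2024/a.log",
    "logs/2024/b.log",
    "logs/2023/c.log",
    "images/x.png",
    "readme.txt",
]
top_prefixes = most_common_prefixes(keys)
```

Keys without a `/` (such as `readme.txt` here) have no prefix and are skipped.
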

Acceptable Log Formats

| Name | Table Name | Description |
| --- | --- | --- |
| Common Access Log | access_log | The default web access log format for servers like Apache |
| VMware vSphere Auto Deploy log format | autodeploy_log | The log format for the VMware Auto Deploy service |
| Generic Block | block_log | A generic format for logs, like cron, that have a date at the start of a block. |
| Candlepin log format | candlepin_log | Log format used by Candlepin registration system |
| Yum choose_repo Log | choose_repo_log | The log format for the yum choose_repo tool |
| CUPS log format | cups_log | Log format used by the Common Unix Printing System |
| Dpkg Log | dpkg_log | The debian dpkg log |
| Amazon ELB log | elb_log | Log format for Amazon Elastic Load Balancers |
| engine log | engine_log | The log format for the engine.log files from RHEV/oVirt |
| Common Error Log | error_log | The default web error log format for servers like Apache |
| Fsck_hfs Log | fsck_hfs_log | Log for the fsck_hfs tool on Mac OS X |
| Glog | glog_log | The google glog format |
| Java log format | java_log | Log format used by log4j and output by most java programs |
| Katello log format | katello_log | Log format used by katello and foreman as used in Satellite 6 |
| OpenAM Log | openam_log | The OpenAM identity provider |
| OpenAM Debug Log | openamdb_log | Debug logs for the OpenAM identity provider |
| OpenStack log format | openstack_log | The log format for the OpenStack log files |
| CUPS Page Log | page_log | The CUPS server log of printed pages |
| Papertrail Service | papertrail_log | Log format for the papertrail log management service |
| SnapLogic Server Log | snaplogic_log | The SnapLogic server log format |
| SSSD log format | sssd_log | Log format used by the System Security Services Daemon |
| Strace | strace_log | The strace output format |
| sudo | sudo_log | The sudo privilege management tool |
| Syslog | syslog_log | The system logger format found on most posix systems |
| TCF Log | tcf_log | Target Communication Framework log |
| TCSH History | tcsh_history | The tcsh history file format |
| Uwsgi Log | uwsgi_log | The uwsgi log format |
| Vdsm Logs | vdsm_log | Vdsm log format |
| VMKernel Logs | vmk_log | The VMKernel’s log format |
| VMware Logs | vmw_log | One of the log formats used in VMware’s ESXi and vCenter software |
| RHN server XMLRPC log format | xmlrpc_log | Generated by Satellite’s XMLRPC component |
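
To show what a line in one of these formats looks like, the sketch below parses a single entry in the Apache Common Log Format, which corresponds to the Common Access Log (access_log) entry above. The regular expression and the function name `parse_access_log_line` are written for this example and are not the parser the product uses. The sample line follows the pattern given in the Apache HTTP Server documentation.

```python
import re

# host ident user [timestamp] "request" status size
COMMON_LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_log_line(line):
    """Return the named fields of a Common Log Format line, or None if no match."""
    match = COMMON_LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
fields = parse_access_log_line(sample)
```

All fields are returned as strings. A line that does not follow the format yields `None`.
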

Updated 7 months ago

