Discovering Your Data

Quickly catalog and report on multiple data sources

Use the Data Discovery feature to quickly catalog and report on the data sources in your buckets. For a complete list of supported data and log formats, see Acceptable Log Formats.

Note: Bucket discovery is not required before creating Object Groups. Discover Bucket gives you a view into the contents of your S3 bucket.

Discover Bucket

  1. Select the desired bucket
  2. Click Discover Bucket

Note: How long discovery takes depends on the number and total size of the objects in your bucket.

Viewing Aggregated Data

When discovery completes, a bucket content report is displayed with a high-level, aggregate view of your data. The report includes the following information:

General Bucket Information:

  •   Total Number of Files
  •   Total File Size
  •   Bucket Creation Date
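
Discovery must enumerate every object before it can report these totals, which is why larger buckets take longer. As a rough illustration, the following sketch computes the same three values with the AWS SDK for Python (boto3). It is an assumption for illustration, not how Discover Bucket is implemented; `bucket_summary` is a hypothetical helper, and `s3_client` is expected to be a `boto3.client("s3")` instance.

```python
def bucket_summary(s3_client, bucket_name):
    """Return the file count, total size in bytes, and creation date of one bucket.

    Hypothetical helper; `s3_client` is assumed to be a boto3 S3 client.
    """
    total_files = 0
    total_bytes = 0

    # ListObjectsV2 returns at most 1,000 keys per request, so walk every
    # page with the paginator instead of making a single call.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        # Pages for an empty bucket have no "Contents" key.
        for obj in page.get("Contents", []):
            total_files += 1
            total_bytes += obj["Size"]

    # The creation date is not part of the object listing; ListBuckets reports it.
    creation_date = None
    for bucket in s3_client.list_buckets()["Buckets"]:
        if bucket["Name"] == bucket_name:
            creation_date = bucket["CreationDate"]
            break

    return {
        "total_files": total_files,
        "total_bytes": total_bytes,
        "creation_date": creation_date,
    }
```

For example, `bucket_summary(boto3.client("s3"), "my-bucket")`, where `my-bucket` is a placeholder name. Because each listing request covers at most 1,000 objects, the number of requests grows with the object count.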

File Type Distribution:

  •   File Type
  •   Total Size
  •   Average Age of the File
  •   Number of Files
  •   Number of Duplicates
  •   File Type Distribution Pie Chart

For each file type that contains duplicates, click the count to display details about the duplicate files.

Hover over the file type distribution pie chart to see information about each file type.
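
The per-type figures can be approximated from the object listing alone. This minimal sketch groups object records by extension and computes each column of the distribution. Two choices here are assumptions, not the product's documented rules: the file type is taken from the key's extension, and objects with identical ETags are counted as duplicates. S3 ETags equal the content MD5 only for single-part uploads that do not use SSE-KMS or SSE-C, so this approach can miss duplicates. `file_type_distribution` is a hypothetical name.

```python
import posixpath
from collections import Counter, defaultdict
from datetime import datetime, timezone


def file_type_distribution(objects, now=None):
    """Aggregate object records by file extension (hypothetical helper).

    `objects` holds dicts shaped like ListObjectsV2 "Contents" entries:
    Key, Size, LastModified (a timezone-aware datetime), and ETag.
    """
    if now is None:
        now = datetime.now(timezone.utc)

    # Assumption: the file type is the lowercase key extension.
    groups = defaultdict(list)
    for obj in objects:
        ext = posixpath.splitext(obj["Key"])[1].lower() or "(none)"
        groups[ext].append(obj)

    report = {}
    for ext, objs in groups.items():
        ages_days = [(now - o["LastModified"]).total_seconds() / 86400 for o in objs]
        # Assumption: objects sharing an ETag are duplicates; every copy
        # beyond the first counts as one duplicate.
        etag_counts = Counter(o["ETag"] for o in objs)
        duplicates = sum(n - 1 for n in etag_counts.values())
        report[ext] = {
            "total_size": sum(o["Size"] for o in objs),
            "average_age_days": sum(ages_days) / len(ages_days),
            "file_count": len(objs),
            "duplicates": duplicates,
        }
    return report
```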

Security Statistics and Quantities:

  • Public objects (objects that have been made publicly accessible in AWS)
  • Policies allowed (whether any bucket policy has been configured to grant access)
  • Bucket policies (whether your S3 bucket has any policies attached)

For each security reporting item, click the count to display the affected objects or policies.
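
To make the public-access items concrete, here is a minimal sketch of checks one might run against raw S3 responses: whether an object ACL (the `Grants` list returned by GetObjectAcl) grants access to the AllUsers or AuthenticatedUsers groups, and whether a bucket policy (the `Policy` JSON string returned by GetBucketPolicy) contains an `Allow` statement for any principal. These heuristics are assumptions, not the product's definitions; they ignore `Condition` blocks, Block Public Access settings, and access points. All function names are hypothetical.

```python
import json

# Predefined S3 grantee groups that AWS treats as public.
PUBLIC_GROUP_URIS = {
    "http://acs.amazonaws.com/groups/global/AllUsers",
    "http://acs.amazonaws.com/groups/global/AuthenticatedUsers",
}


def acl_is_public(grants):
    """True if an object ACL's Grants list includes a public grantee group."""
    return any(
        g.get("Grantee", {}).get("Type") == "Group"
        and g.get("Grantee", {}).get("URI") in PUBLIC_GROUP_URIS
        for g in grants
    )


def _is_wildcard_principal(principal):
    """True if a policy Principal matches everyone ("*" or {"AWS": "*"})."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS", [])
        aws = [aws] if isinstance(aws, str) else aws
        return "*" in aws
    return False


def policy_allows_public(policy_json):
    """True if a bucket policy has an Allow statement for any principal.

    Condition blocks are NOT evaluated, so a statement narrowed by a
    condition is still reported.
    """
    statements = json.loads(policy_json).get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may be a single object
        statements = [statements]
    return any(
        s.get("Effect") == "Allow" and _is_wildcard_principal(s.get("Principal"))
        for s in statements
    )
```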

The discovery process also identifies any trending prefixes across all data.
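
As an illustration of prefix analysis, assuming "trending" means the prefixes that occur most often (which may not match the product's definition), this sketch counts S3 key prefixes cut at a chosen number of `/`-separated levels. `top_prefixes` and its `depth` and `limit` parameters are hypothetical.

```python
from collections import Counter


def top_prefixes(keys, depth=1, limit=5):
    """Count key prefixes truncated to `depth` "/"-separated levels and
    return the `limit` most common as (prefix, count) pairs."""
    counts = Counter()
    for key in keys:
        parts = key.split("/")
        if len(parts) <= depth:
            continue  # the object sits above this depth, so it has no such prefix
        counts["/".join(parts[:depth]) + "/"] += 1
    return counts.most_common(limit)
```

Raising `depth` narrows the view, for example from `logs/` to `logs/2024/`.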

Acceptable Log Formats

| Name | Table Name | Description |
|---|---|---|
| Common Access Log | access_log | The default web access log format for servers such as Apache |
| VMware vSphere Auto Deploy Log | autodeploy_log | The log format for the VMware Auto Deploy service |
| Generic Block | block_log | A generic format for logs, such as cron, that have a date at the start of each block |
| Candlepin Log | candlepin_log | The log format used by the Candlepin registration system |
| Yum choose_repo Log | choose_repo_log | The log format for the yum choose_repo tool |
| CUPS Log | cups_log | The log format used by the Common Unix Printing System |
| Dpkg Log | dpkg_log | The Debian dpkg log |
| Amazon ELB Log | elb_log | The log format for Amazon Elastic Load Balancers |
| Engine Log | engine_log | The log format for the engine.log files from RHEV/oVirt |
| Common Error Log | error_log | The default web error log format for servers such as Apache |
| Fsck_hfs Log | fsck_hfs_log | The log for the fsck_hfs tool on Mac OS X |
| Glog | glog_log | The Google glog format |
| Java Log | java_log | The log format used by log4j and output by most Java programs |
| Katello Log | katello_log | The log format used by Katello and Foreman, as used in Satellite 6 |
| OpenAM Log | openam_log | Logs for the OpenAM identity provider |
| OpenAM Debug Log | openamdb_log | Debug logs for the OpenAM identity provider |
| OpenStack Log | openstack_log | The log format for OpenStack log files |
| CUPS Page Log | page_log | The CUPS server log of printed pages |
| Papertrail Service | papertrail_log | The log format for the Papertrail log management service |
| SnapLogic Server Log | snaplogic_log | The SnapLogic server log format |
| SSSD Log | sssd_log | The log format used by the System Security Services Daemon |
| Strace | strace_log | The strace output format |
| Sudo | sudo_log | Logs from the sudo privilege management tool |
| Syslog | syslog_log | The system logger format found on most POSIX systems |
| TCF Log | tcf_log | The Target Communication Framework log |
| TCSH History | tcsh_history | The tcsh history file format |
| uWSGI Log | uwsgi_log | The uWSGI log format |
| VDSM Log | vdsm_log | The VDSM log format |
| VMkernel Log | vmk_log | The VMkernel log format |
| VMware Log | vmw_log | One of the log formats used in VMware ESXi and vCenter |
| RHN Server XMLRPC Log | xmlrpc_log | Generated by the Satellite XMLRPC component |
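
For reference, access_log corresponds to the Common Log Format used by Apache and similar servers: host, identity, user, bracketed timestamp, quoted request line, status code, and response size. This minimal sketch parses one such line with a regular expression. It only illustrates the format; it is not how the product detects or parses logs, and `parse_access_log_line` is a hypothetical name. Because the pattern is not anchored at the end, it also accepts Combined Log Format lines, which append referrer and user-agent fields.

```python
import re

# Common Log Format: host ident user [time] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_access_log_line(line):
    """Return the CLF fields of `line` as a dict, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None
```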


