PDF Analysis System
Stand-Alone Version
Table of Contents
PDF Analysis System – Stand Alone Installation
MiniSeed Data Directory and Files
Source Code Header File Modification
System Compilation and Installation
Appendix II – Installation and Configuration Checklist
Following the successful implementation of the NEIC PDF Analysis System, both at the IRIS DMC in Seattle, WA, and at the NEIC in Golden, CO, the system is now provided to the seismic community at large in a stand-alone form. This allows other members of the seismic community to perform their own analyses against data sets not held by either of these two data centers.
If you do not have time to read this document in full but would like the system installed and working as quickly as possible, you must at least read the sections System Configuration, System Compilation and Installation, and PDF System Execution and follow the steps outlined there. Where problems occur, consult the other sections of this document, which may provide further information to solve your issue.
This document is intended for all members of the seismic community with an interest in producing noise analyses of seismic data following the algorithm laid out by Buland and McNamara of the USGS NEIC in Golden, CO.
This document is limited to the technical aspects of the PDF Analysis System: installation, configuration and execution. As such, it does not treat the functional aspects of the system: methodologies, algorithm, philosophy, interpretation possibilities, usage of results, etc. For a complete discussion of these and more, please consult the various documents listed below.
The following parties have significantly contributed to the development of this system and are hereby acknowledged thus:
| Party | Contribution |
| Ray Buland and Daniel McNamara (of USGS NEIC, Golden, CO) | Provided the original algorithm and proof-of-concept implementation |
| USGS NEIC | Provided the funds for original development of algorithm. |
| NSF | Provided funds for original system development of generic implementation at the IRIS Data Management Center (Seattle, WA). |
| IRIS | Sponsored the development of the generic implementation at the DMC. |
| Bruce Weertman (of IRIS) | Responsible for integration of the PDF system within the IRIS DMC’s Quack framework. |
Both this document and the PDF analysis system itself were written by Richard Boaz (of Boaz Consultancy: http://www.boazconsultancy.com). Comments and bug reports are welcome and may be sent to riboaz@xs4all.nl.
The following table provides various references which may be of interest to the reader:
| Description | Name | Location |
| Original Abstract (Adobe pdf) | Ambient Noise Levels in the Continental United States | PDF Stand-Alone distribution docs directory |
| Power Point Presentation | Noise Based Detection Method for the ANSS | PDF Stand-Alone distribution docs directory |
| Discussion Paper (Adobe pdf) | Determining True Global Ambient Noise | PDF Stand-Alone distribution docs directory |
| PDF Analysis Interpretation (html document) | Ambient Noise Probability Density Functions | PDF Stand-Alone distribution docs directory |
| PDF Analyses at the USGS NEIC | USGS/ANSS Noise Monitor | http://geohazards.cr.usgs.gov/staffweb/mcnamara/ |
| PDF Analyses at the IRIS DMC | DMC QUACK Information Query | |
| PDF Analyses at the IRIS DMC (US Array) | DMC QUACK Information Query | |
A new system for analyzing data quality is now available to the seismology community allowing users to evaluate the long-term seismic noise levels for any broadband seismic data channel. The new noise processing software uses a probability density function (PDF) to display the distribution of seismic power spectral density (PSD) and has been implemented against the entire continuous data-stream held by IRIS at the DMC.
This noise processing system is unique in that there is no need to screen the data for earthquakes, system glitches or general data artifacts, as is commonly done in seismic noise analysis. Instead, with this new analysis, system transients map into a low-level background probability while ambient noise conditions reveal themselves as high-probability occurrences. In fact, examination of artifacts related to station operation and episodic cultural noise allows us to estimate both the overall station quality and a baseline level of earth noise at each site.
PDF noise plots are useful for characterizing the current and past performance of existing broadband sensors, for detecting operational problems within the recording system, and for evaluating the overall quality of data for a particular station. The advantages of this new approach include:
Please see the document Ambient Noise Probability Density Functions for a more detailed discussion of the PDF plots themselves and their interpretation possibilities.
The PDF Analysis system consists of three separate processing components:
Execution is provided in the form of one shell script per channel to be analyzed; each script calls these components in turn (see section PDF System Execution below for details).
No specific hardware requirements exist, per se. The program will execute on any platform supporting a C compiler in addition to the other software requirements listed below.
Depending on which compile time output option is chosen (please see section System Compilation and Installation: Compilation: Compile Options for a complete discussion of these options and their effects), disk storage requirements are approximately the following (per channel analyzed):
| Output Option | Maximum Disk Storage Requirement |
| No Daily or Hourly .bin output | 5 MB |
| Only Daily .bin output | 15 MB |
| Both Daily and Hourly .bin output | 50 MB |
The following table defines the software dependencies currently in force:
| Software | Version | Description | Available At |
| C Compiler | User preference | Compiler (program developed under gcc) | |
| Scripting | Bash shell | Scripting tools | Local machine executing analysis |
| GMT | Latest available | Plotting tool | |
| ImageMagick | Latest available | Image manipulation tool | |
The following table defines the directories and files which make up the source tree of the PDF Analysis System:
| Directory/File | Description |
| PDF | Root system directory |
| PDF/PROD | Production directory containing production relevant files |
| PDF/PROD/bin | Directory containing shells and executables |
| PDF/PROD/script | Directory containing executable scripts |
| PDF/PROD/support | Directory containing all necessary production support files |
| PDF/PROD/helper | Directory containing helper scripts (system mgmt, etc.) |
| PDF/src | Directory holding all source code: PDF analysis program; GMT plotting script; execution scripts |
| PDF/src/vx.x.x | Directory holding all source code for version x.x.x of the system |
| PDF/src/vx.x.x/analysis | Source code directory of analysis program (C code) |
| PDF/src/vx.x.x/analysis/mseed | Miniseed data file reader source code - as library to main() |
| PDF/src/vx.x.x/analysis/resp | Instrument response interpreter source code - as library to main() |
| PDF/src/vx.x.x/GMT | Directory holding GMT source code - scripts and support files |
| PDF/src/vx.x.x/script | Directory holding execution scripts |
The following table defines the directories and files which are required as input. See sections System Configuration and PDF System Execution for a full description of their specification and use:
| Directory/File | Description |
| Data Directory | Directory holding the miniseed data files to be analyzed. N.B. All data files requiring analysis as part of a single execution must reside in this single directory. |
| Miniseed Data Files | Files to analyze, miniseed format only. |
| Analysis Directory | Directory holding all output files. See section Output for a complete description. |
| Response File Directory | Directory holding response files for channels being analyzed. See section System Configuration for a complete discussion on set-up. |
| RESP.NTW.STN.LOC.CHN | The file holding the response information for the instrument and channel. This must be formatted as for input to the evresp() function (format as produced by the rdseed program). Where: NTW is the network name |
The following table defines the directories and files which are created in the course of the PDF Analysis execution. All are located as subdirectories to the analysis directory defined in the section Input above and are automatically created in the course of execution. Please consult Appendix I – File Formats for a detailed description of their contents.
| Directory/File | Description |
| NTW.STN.LOC.CHN.png | Graphical representation of analysis. |
| Yyyyy | Directory holding daily PSD .bin files, by year Where: |
| Yyyyy/HOUR | Directory holding the hourly PSD .bin files |
| LOG | Directory holding the various log files created during the course of execution. |
| wrk | Directory holding various work files |
| Yyyyy/Djjj.bin | Files holding individual day's PSD analysis results (currently unused, for future use). Where: |
| Yyyyy/HOUR/hour.idx | Index file to Hjjj.bin file. |
| Yyyyy/HOUR/Hjjj.bin | Files holding individual hour's PSD analysis results (currently unused, for future use) |
| LOG/NTW.STN.LOC.CHN.log | Log file of analysis program. |
| LOG/plotGMT.log | Log file of GMT plotting program. |
| LOG/convert.log | Log file of ImageMagick convert program, nothing output for normal execution. |
| LOG/NTW.STN.LOC.CHN.yyyy.jjj.err | Analysis program error file, by year and julian day |
| LOG/PDFanalysis.skp | File listing those days when problems occurred, information only |
| wrk/PDFanalysis.bin | Cumulative dB-based .bin file, results to graph are contained here |
| wrk/PDFanalysis.inf | Information file holding various analysis settings |
| wrk/PDFanalysisSR.bin | Cumulative period-based .bin file. Where: |
| wrk/PDFanalysisSR.inf | Information file as before, sample-rate specific |
| wrk/PDFanalysis.sts | File holding various statistics for analysis results, input to GMT |
| wrk/PDFanalysis.ps | GMT postscript file output, deleted upon conversion to .png file. |
| wrk/pdf.grd | GMT temp file, deleted upon completion of GMT step. |
Configuration of the PDF system amounts to the setup of various directories and script variables. This section lays out these requirements for the PDF system setup to result in a successful installation and subsequent execution. Failure to define these precisely as described herein will result in a non-functioning system.
Appendix II provides a checklist for each parameter and variable which must be defined as part of system setup. Please print, define the values accordingly and supply them in their proper location.
A directory is required to contain the miniseed data files to be analyzed. This directory and the miniseed data files themselves must adhere to the following:
DATAROOT/NTW/STN
Where
DATAROOT is the root directory of the miniseed data files (script variable of such specified below)
NTW is the name of the network
STN is the name of the station
STN.NTW.LOC.CHN.yyyy.jjj
Where
STN is the name of the station
NTW is the name of the network
LOC is the location identifier
CHN is the channel identifier
yyyy is the year of the data file
jjj is the julian day of the data file
(N.B. Where no location identifier exists, field should be null. This would render, for example, a filename for station ATKA and network AK as: ATKA.AK..BHE.2004.261)
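The naming rules above can be sketched as a small shell function. This is only an illustration: the function name mseed_path and the /data/mseed root are hypothetical, and the sketch assumes the julian day is passed exactly as it should appear in the filename (three characters, e.g. 007).

```shell
# Hypothetical helper, not part of the distribution.
# Print the path at which the PDF system would expect one channel-day file.
# Usage: mseed_path DATAROOT NTW STN LOC CHN YYYY JJJ
# Pass an empty string ("") as LOC when no location identifier exists.
mseed_path() {
    local dataroot=$1 ntw=$2 stn=$3 loc=$4 chn=$5 year=$6 jday=$7
    # Directory is DATAROOT/NTW/STN; the filename puts STN before NTW.
    printf '%s/%s/%s/%s.%s.%s.%s.%s.%s\n' \
        "$dataroot" "$ntw" "$stn" \
        "$stn" "$ntw" "$loc" "$chn" "$year" "$jday"
}
```

For the example given above, mseed_path /data/mseed AK ATKA "" BHE 2004 261 prints /data/mseed/AK/ATKA/ATKA.AK..BHE.2004.261. Note the two consecutive dots produced by the empty location field.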
If your directory structure and miniseed data files do not naturally conform to these requirements, this directory structure and filenaming convention can be easily accommodated through the following:
This can also be automated via a script, making the requirement as trivial as possible to meet. Please see the script linkMseed.US (located in pdf/PROD/helper/linkmseed) for an example and modify as necessary.
N.B. Because the filenaming convention uniquely identifies the channel of data, this directory may contain the miniseed data files for all channels of a station. It is not necessary to create a separate directory for each channel; a single directory per station, containing the miniseed data files for all of its channels, is sufficient.
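As an illustration of how existing files might be mapped onto this layout, the sketch below links one file into DATAROOT/NTW/STN under the required name. It is not the delivered linkMseed.US script, whose contents are not reproduced here. The function name link_mseed is hypothetical, and the use of symbolic links is an assumption suggested by the helper script's name.

```shell
# Hypothetical helper, not the delivered linkMseed.US script.
# Link one existing miniseed file into the DATAROOT/NTW/STN layout.
# Usage: link_mseed SRC DATAROOT NTW STN LOC CHN YYYY JJJ
# SRC should be an absolute path so the link resolves from any directory.
# LOC may be "" when no location identifier exists.
link_mseed() {
    local src=$1 dataroot=$2 ntw=$3 stn=$4 loc=$5 chn=$6 year=$7 jday=$8
    local dir="$dataroot/$ntw/$stn"
    local name="$stn.$ntw.$loc.$chn.$year.$jday"
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    mkdir -p "$dir" || return 1
    # -s creates a symbolic link, -f replaces a stale link of the same name
    ln -sf "$src" "$dir/$name"
}
```

A site-specific wrapper would loop over the local archive and call link_mseed once per file, deriving the network, station, channel, year and day from whatever naming scheme the archive already uses.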
A directory must exist containing all response files used to deconvolve the signal back to absolute ground motion in the course of analysis. This directory and the files themselves must adhere to the following:
RESP.NTW.STN.LOC.CHN
Where
RESP is exactly as specified: RESP
NTW is the name of the network
STN is the name of the station
LOC is the location identifier
CHN is the channel identifier
(N.B. This naming convention follows from the response file output generated by the rdseed program. And as before, where no location identifier exists, field should be null.)
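Because a missing response file is a fatal error, it can be useful to check for the file before running an analysis. The following is a minimal sketch under the naming convention above; the function names resp_file and check_resp are hypothetical and not part of the distribution.

```shell
# Hypothetical helpers, not part of the distribution.
# Print the expected response file path for one channel.
# Usage: resp_file RESPDIR NTW STN LOC CHN   (LOC may be "")
resp_file() {
    local respdir=$1 ntw=$2 stn=$3 loc=$4 chn=$5
    printf '%s/RESP.%s.%s.%s.%s\n' "$respdir" "$ntw" "$stn" "$loc" "$chn"
}

# Return non-zero (and report on stderr) if that file is not readable.
check_resp() {
    local f
    f=$(resp_file "$@")
    [ -r "$f" ] || { echo "missing response file: $f" >&2; return 1; }
}
```

For station ATKA of network AK with no location identifier, resp_file /data/resp AK ATKA "" BHE prints /data/resp/RESP.AK.ATKA..BHE.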
A directory must be created to collect all .png files produced during the course of execution. The .png files themselves are written to the analysis directory of the channel being analyzed, which makes collective viewing awkward since they are spread across disparate directories. This collection directory removes that inconvenience.
Create the directory and define its location in the .vars-user file. The last action in the course of an analysis is to create a softlink in this directory pointing to the .png file found in the analysis directory.
Additionally, if it is desired to publish the results, it is this directory that can be made available to the web in whatever manner/means appropriate.
The following script variables are installation-specific and must be pre-defined by the user and provided in the shell script file .vars-user (located in directory PDF) before the system can be installed. Failure to do so will result in a non-functioning system.
| Script Variable Name | Description |
| PDFROOT | Root directory of PDF analysis system (directory holding the .vars-user file) |
| WEBDIR | Directory of collected .png files |
| RESPDIR | Directory holding response files |
| DATAROOT | Root directory of miniseed data files (parent to NTW/STN) |
| STATSROOT | Root directory of PDF analysis results/statistics |
| GMTROOT | GMT installation root directory |
| IMROOT | ImageMagick installation root directory |
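As an illustration, the variables in the table might be set as follows. Every path shown is a placeholder to be replaced with your own. Whether the delivered .vars-user template uses export or plain assignments is not stated in this document, so keep whatever form the delivered file uses. The missing_pdf_dirs function is a hypothetical convenience for checking the values and is not part of the distribution.

```shell
# Placeholder values only: replace every path with your own.
PDFROOT=/opt/PDF            # directory holding the .vars-user file
WEBDIR=/var/www/pdf         # collected .png files
RESPDIR=/data/resp          # response files
DATAROOT=/data/mseed        # parent of NTW/STN
STATSROOT=/data/pdfstats    # analysis results/statistics
GMTROOT=/usr/local/gmt      # GMT installation root
IMROOT=/usr/local           # ImageMagick installation root

# Hypothetical check: print NAME=value for each variable whose
# directory does not exist; print nothing when all are present.
missing_pdf_dirs() {
    local var val
    for var in PDFROOT WEBDIR RESPDIR DATAROOT STATSROOT GMTROOT IMROOT; do
        eval "val=\$$var"   # value of the variable whose name is in $var
        [ -d "$val" ] || echo "$var=$val"
    done
}
```

Running missing_pdf_dirs after editing the values gives a quick indication of which directories still need to be created before installation.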
The sole configuration requirement within the source code of the PDF analysis program is the following #define parameter. (N.B. Its value must be the same as that of the RESPDIR script variable defined above.)
| #define parameter | Description | Location |
| #define RESPDIR | Directory holding response files, inside double quotes. | PDF/src/vx.x.x/analysis/PDFuser.h |
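Since the same directory is recorded in two places, a quick consistency check can catch a mismatch before compiling. The sketch below assumes the header contains a line of the form #define RESPDIR followed by a double-quoted path, as the table describes. The function name header_respdir is hypothetical.

```shell
# Hypothetical helper, not part of the distribution.
# Print the quoted value of the first "#define RESPDIR" line in a header.
# Usage: header_respdir PATH/TO/PDFuser.h
header_respdir() {
    sed -n 's/^[[:space:]]*#[[:space:]]*define[[:space:]][[:space:]]*RESPDIR[[:space:]][[:space:]]*"\([^"]*\)".*/\1/p' "$1" |
        head -n 1
}
```

With the script variables loaded, a comparison such as [ "$(header_respdir "$PDFROOT/src/vx.x.x/analysis/PDFuser.h")" = "$RESPDIR" ] succeeds only when the two settings agree exactly, including any trailing slash (replace vx.x.x with the version directory in use).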
Compilation of the PDF Analysis program employs straightforward C/Unix standards, i.e., a C compiler and make. In addition to the analysis portion of the program, there are two subdirectories of libraries that also require compilation. A script is therefore provided that traverses each of these subdirectories, making the dependent libraries in turn. This script is located in the PDF Analysis source directory and conforms to the following invocation specification:
makesh [ clean | all ]
where
clean will execute make clean, removing all dependent libraries and object files.
all will execute make all in each directory of the PDF analysis program, creating all dependent libraries and object files necessary to ultimately link the PDF analysis executable.
Alternatively, the program may be built when installing the system as a whole, alleviating the need to compile and link by hand. Please see the section Installation below for details.
Two compile-time options exist for the PDF Analysis program. They determine whether daily and/or hourly PSD information is output. (Please see section Requirements: Hardware for overall disk storage requirements.)
With daily PSD information output, cumulative .bin files are generated for each day analyzed (amounting to ~30 KB/day/channel analyzed).
With hourly PSD information output, .bin files are generated for each hour analyzed (amounting to ~100 KB/day/channel analyzed).
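The per-day figures above translate into a rough estimate of the incremental .bin output for a given number of days. The following sketch only multiplies out the approximate rates quoted in this section. The function name psd_output_kb is hypothetical, and the result covers the daily and hourly .bin files for one channel only, not the other output files.

```shell
# Hypothetical helper: rough size, in KB, of the incremental .bin output
# for one channel, using the approximate rates quoted above
# (30 KB/day for daily files, 100 KB/day for hourly files).
# Usage: psd_output_kb DAYS [DAILY] [HOURLY]   (DAILY/HOURLY: yes or no)
psd_output_kb() {
    local days=$1 daily=${2:-yes} hourly=${3:-yes} kb=0
    if [ "$daily" = yes ]; then
        kb=$(( kb + 30 * days ))
    fi
    if [ "$hourly" = yes ]; then
        kb=$(( kb + 100 * days ))
    fi
    echo "$kb"
}
```

For one year of data, psd_output_kb 365 gives 47450 (about 46 MB), and psd_output_kb 365 yes no gives 10950 (about 11 MB) when only daily output is enabled.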
The system is delivered, by default, to output both daily and hourly .bin files. Output of this data is anticipated to be used in future versions of the software, such that the existence of these files will allow PDFs to be produced for specific user-defined time periods. For example, a PDF graph representing only the months of January through March, or a PDF graph representing all months but only between the hours of 6AM and 6PM.
If these more specific sorts of analyses are likely to be of interest, no action is required: both daily and hourly .bin files will be generated.
If this is not desired, or disk space is an issue, both daily and hourly .bin file generation may be suppressed. The following table defines these compile time options:
| Compiler Option | Effect |
| -DNO_DAILY_PSD | No daily PSD .bin files output |
| -DNO_HOURLY_PSD | No hourly PSD .bin files output |
Either or both (they are mutually independent) of these options can be specified in the CFLAGS section of the Makefile for the main PDF Analysis program. The resulting executable will subsequently NOT output incremental PSD information.
N.B. Since these are compiler options, these settings have a system-wide influence, i.e., these options cannot be implemented on a per channel basis. (One way around this, however, would be to install more than a single system.)
Installation is provided via the shell script installPDF located in the directory PDF and provides for the following functionality:
And conforms to the following usages:
command: installPDF -h
output: Usage: install [-h] [make] version#
description: prints the usage for the command.
command: installPDF v1.1
output: Copying v1.1 executables and support files to PROD dir…
description: installs all relevant executables, scripts and support files for version# to the PROD directory structure. (N.B. command line argument version# must be as specified in the PDF source directory structure.)
command: installPDF make v1.1
output: Compiling v1.1 PDF Analysis program…
Copying v1.1 executables and support files to PROD dir…
description: as for command installPDF v1.1, but cleans and compiles the PDF analysis program before copying and installing to the PROD directory structure (recommended for first install since no object files exist as part of delivery).
System execution comes in the form of two scripts, executePDF and PDFscript (both located in PDF/PROD/bin).
PDFscript is a shell script template from which the individual channel-specific execution scripts are created; each such script is responsible for the PDF analysis of one specific channel.
The executePDF script executes, in turn, all channel-specific scripts located in PDF/PROD/script.
Execution of the PDF Analysis System is provided in the form of a shell executable script (PDFscript) responsible for carrying out the three steps of execution described in section PDF Analysis System Overview: Description. This execution script is created by replacing various strings in the generic file PDFscript with execution-specific values, thus creating a unique script for each channel to be analyzed.
This channel-specific executable shell can be easily created using the following shell script command:
makePDFscript NTW STN LOC CHN
where
NTW is the network name
STN is the station name
LOC is the location identifier (use -- for no location)
CHN is the channel identifier
This will create the channel-specific script to be executed, named NTW.STN.LOC.CHN.sh and saved to the directory PDF/PROD/script. The analysis directory will also be created if it does not exist (assuming the STATSROOT directory exists).
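When many channels are to be analyzed, the makePDFscript commands can be generated from a simple list. The sketch below is an illustration only: the function name make_script_cmds and the list format (one "NTW STN LOC CHN" entry per line, with LOC omitted when there is none) are assumptions. It prints the commands rather than running them, so they can be reviewed first.

```shell
# Hypothetical helper, not part of the distribution.
# Read "NTW STN [LOC] CHN" lines on stdin and print one makePDFscript
# command per line. A three-field line is treated as having no location
# identifier and is given "--" in its place.
make_script_cmds() {
    local ntw stn loc chn
    while read -r ntw stn loc chn; do
        if [ -z "$ntw" ]; then
            continue                 # skip blank lines
        fi
        if [ -z "$chn" ]; then
            chn=$loc                 # third field was the channel
            loc=--
        fi
        echo "makePDFscript $ntw $stn $loc $chn"
    done
}
```

Given a list file containing the lines "AK ATKA BHE" and "IU ANMO 00 BHZ", the function prints makePDFscript AK ATKA -- BHE and makePDFscript IU ANMO 00 BHZ. Once checked, the output can be piped to sh from a directory where makePDFscript can be found.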
Once this script has been created for a specific channel, it can be executed repeatedly (daily, weekly, monthly, as desired) to update the analysis results.
Executing the script executePDF causes all scripts located in PDF/PROD/script to be executed in turn. Specifically, it executes all files having ".sh" as their filename suffix. Thus, individual analyses can be turned on and off simply by renaming the suffix of the executable script in question.
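The renaming described above can be wrapped in two small functions. This is a sketch only: the function names and the .sh.off suffix are hypothetical choices. Any suffix other than .sh would have the same effect, since only the .sh suffix is documented as being picked up.

```shell
# Hypothetical helpers, not part of the distribution.
# Usage: pdf_disable SCRIPTDIR NTW.STN.LOC.CHN
#        pdf_enable  SCRIPTDIR NTW.STN.LOC.CHN

# Turn a channel off: the file no longer ends in ".sh".
pdf_disable() {
    mv "$1/$2.sh" "$1/$2.sh.off"
}

# Turn it back on by restoring the ".sh" suffix.
pdf_enable() {
    mv "$1/$2.sh.off" "$1/$2.sh"
}
```

For example, pdf_disable "$PDFROOT/PROD/script" IU.ANMO.00.BHZ would exclude that channel from subsequent executePDF runs until pdf_enable is called with the same arguments.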
A simple logfile of executePDF, detailing the channels analyzed, is generated and written to the file PDF/PROD/LOG/PDF.log.
Further, this process may be automated using the UNIX crontab command. At specified times, merely execute the executePDF script and all analyses will be performed and updated.
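As an illustration of the crontab approach, the function below prints a crontab entry that runs executePDF once a day. The function name pdf_cron_line, the default hour of 02:00 and the example installation path are all hypothetical.

```shell
# Hypothetical helper: print a crontab line that runs executePDF daily.
# Usage: pdf_cron_line PDFROOT [HOUR]   (HOUR defaults to 2, i.e. 02:00)
pdf_cron_line() {
    local pdfroot=$1 hour=${2:-2}
    # crontab fields: minute hour day-of-month month day-of-week command
    printf '0 %s * * * %s/PROD/bin/executePDF\n' "$hour" "$pdfroot"
}
```

For an installation under /opt/PDF, pdf_cron_line /opt/PDF 3 prints 0 3 * * * /opt/PDF/PROD/bin/executePDF. The line can be added by hand with crontab -e, or appended to the existing table with ( crontab -l 2>/dev/null; pdf_cron_line /opt/PDF 3 ) | crontab -.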
Please note the following features of the PDF analysis system:
Error handling is very much dependent on the type of error encountered. The following table lists the major errors that may be encountered, how each is handled, and suggested follow-up action.
| Error | Type | How Handled | User Follow-up |
| Response File not found | Fatal | Analysis execution suspended; error message written to Log file | Provide proper response file, verify naming convention is adhered to. |
| Response Information not found | Fatal | Analysis execution suspended; error message written to Log file | Provide file containing response information for appropriate date/time range, verify file format is adhered to. |
| Error reading miniseed data file | Non-fatal | Analysis execution skips this day of data; error message written to day-specific error file located in LOG directory of analysis output. | Determine if miniseed data file can be repaired. |
| Internal Processing Error | Fatal | Analysis execution suspended | Contact riboaz@xs4all.nl with all relevant information |
| No files ever found to analyze | Non-fatal | Program successfully executes but nothing analyzed | Verify miniseed directory structure and filenaming conventions are adhered to. |
| Unable to create or access directories | Fatal | Analysis execution suspended | Confirm existence and access permissions of directory in question. |
The limitations are currently defined to be:
This section defines the current and historical releases of the PDF Analysis System.
Release Date: 29-October-2004
Modifications
This appendix defines the formats of the various files produced by the PDF analysis system.
Definition: Cumulative dB-based .bin file holding daily PSD analyses for julian day jjj.
Directory: Yyyyy
Internal Format:
· Data Format: ASCII
· Individual lines each defining:
FREQ POWER #HITS
Where:
FREQ is the frequency (in Hz)
POWER is the power bin (in dB)
#HITS is the number of times a PSD value fell into this bin
Definition: Index file to Hjjj.bin file, index defined by julian day and HH:MM start time of PSD.
Directory: Yyyyy/HOUR
Internal Format:
· Data Format: ASCII
· Individual lines each containing:
JDAY HH:MM REF
Where:
JDAY is the julian day
HH:MM is the hour and minute start time of the PSD
REF is the reference identifier, for accessing/extracting from the Hjjj.bin file
Definition: Cumulative dB-based .bin file holding hourly PSD analyses for julian day jjj.
Directory: Yyyyy/HOUR
Internal Format:
· Data Format: ASCII
· Individual lines each containing:
REF FREQ POWER
Where:
REF is the reference identifier from the hour.idx file
FREQ is the frequency (in Hz)
POWER is the power bin (in dB)
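The relationship between hour.idx and Hjjj.bin can be sketched as a two-step lookup: find the REF for a given day and start time in the index, then select the lines carrying that REF from the hourly file. The function name hour_psd is hypothetical. The sketch assumes whitespace-separated fields in the column orders given above, and that JDAY is passed exactly as it appears in the Hjjj.bin filename.

```shell
# Hypothetical helper, not part of the distribution.
# Print the "FREQ POWER" lines of one hourly PSD.
# Usage: hour_psd HOURDIR JDAY HH:MM
# HOURDIR is the Yyyyy/HOUR directory holding hour.idx and Hjjj.bin.
hour_psd() {
    local dir=$1 jday=$2 start=$3 ref
    # Step 1: look up the reference identifier in the index file.
    ref=$(awk -v d="$jday" -v t="$start" \
        '$1 == d && $2 == t { print $3; exit }' "$dir/hour.idx")
    [ -n "$ref" ] || return 1        # no PSD starts at that time
    # Step 2: extract the matching lines from the hourly .bin file.
    awk -v r="$ref" '$1 == r { print $2, $3 }' "$dir/H$jday.bin"
}
```

The function returns a non-zero status when the index has no entry for the requested day and start time.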
Definition: Cumulative dB-based .bin file holding overall PSD probabilities for julian day jjj.
Directory: wrk
Internal Format:
· Data Format: ASCII
· Individual lines each defining:
FREQ POWER PROB
Where:
FREQ is the frequency (in seconds)
POWER is the power bin (in dB)
PROB is the normalized (to probability) number of hits
Definition: Information file containing various settings/values pertaining to analysis.
Directory: wrk
Internal Format:
· Data Format: ASCII
· Individual lines each defining:
VALUE :SETTING
· With the following SETTINGs currently being provided for, appearing in the following order:
| SETTING | Definition |
| Analysis Start Date | Start day of the analysis (format: YYYY:JJJ) |
| Analysis Stop Date | Stop day of the analysis (format: YYYY:JJJ) |
| Total Number of Days | Total number of days analyzed |
| Total Number of PSD's | Total number of PSDs making up the analysis |
| Total Number of Problem Days | Total number of days encountering a problem (information only) |
| SAMPLE RATE | Sample rate of the channel |
| NETWORK | Network name |
| STATION | Station name |
| LOCATION | Location identifier |
| CHANNEL | Channel identifier |
| NYQUIST | Nyquist value for this analysis |
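A single setting can be read back from the information file with a short awk filter. The sketch below assumes each line has the form VALUE :SETTING as described above, and it splits on the last colon of the line because date values themselves contain a colon (YYYY:JJJ). The exact spacing in the real file is not specified here, so surrounding blanks are trimmed. The function name inf_value is hypothetical.

```shell
# Hypothetical helper, not part of the distribution.
# Print the VALUE recorded for one SETTING in an information file.
# Usage: inf_value FILE "SETTING NAME"
inf_value() {
    awk -v want="$2" '{
        i = match($0, /:[^:]*$/)          # position of the last colon
        if (i == 0) next                  # no colon: not a setting line
        name  = substr($0, i + 1)
        value = substr($0, 1, i - 1)
        gsub(/^[ \t]+|[ \t]+$/, "", name)     # trim surrounding blanks
        gsub(/^[ \t]+|[ \t]+$/, "", value)
        if (name == want) { print value; exit }
    }' "$1"
}
```

For example, inf_value wrk/PDFanalysis.inf "Analysis Start Date" prints the start day of the analysis in YYYY:JJJ form.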
Definition: File holding various statistics for this analysis.
Directory: wrk
Internal Format:
· Data Format: ASCII
· Individual lines each defining:
FREQ MIN AVE 50% 90% MAX MODE
Where:
FREQ is the frequency in question (in Hz)
MIN is the minimum PSD value
AVE is the average PSD value
50% is the 50th percentile PSD value
90% is the 90th percentile PSD value
MAX is the maximum PSD value
MODE is the mode PSD value (most common)
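For use outside GMT, one statistic can be pulled from the statistics file as a two-column series. The sketch assumes whitespace-separated fields in the column order given above. The function name sts_column is hypothetical.

```shell
# Hypothetical helper, not part of the distribution.
# Print "FREQ VALUE" pairs for one statistic of the statistics file.
# Usage: sts_column FILE MIN|AVE|50%|90%|MAX|MODE
sts_column() {
    local col
    case $2 in
        MIN)  col=2 ;;
        AVE)  col=3 ;;
        50%)  col=4 ;;
        90%)  col=5 ;;
        MAX)  col=6 ;;
        MODE) col=7 ;;
        *) echo "unknown statistic: $2" >&2; return 1 ;;
    esac
    # Skip any line that does not carry all seven fields.
    awk -v c="$col" 'NF >= 7 { print $1, $c }' "$1"
}
```

For example, sts_column wrk/PDFanalysis.sts MODE lists the most common PSD value at each frequency.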
Definition: Cumulative dB-based .bin file holding overall PSD for julian day jjj.
Directory: wrk
Internal Format:
· Data Format: ASCII
· Individual lines each defining:
FREQ POWER PROB
Where:
FREQ is the frequency (in Hz)
POWER is the power bin (in dB)
PROB is the number of hits for this bin
This appendix contains a table listing all of the required configuration parameters. Please print this page, providing the appropriate values to be used. Confirm that all values have been properly defined in the specified file(s).
| Config Parameter | Description | Defined In | Value |
| PDFROOT | Root directory of PDF analysis system | PDF/.vars-user file | |
| WEBDIR | Directory of .png files accessible from www | PDF/.vars-user file | |
| RESPDIR | Directory containing response files | PDF/.vars-user file | |
| DATAROOT | Root directory of miniseed data files | PDF/.vars-user file | |
| STATSROOT | Root directory of PDF analysis results | PDF/.vars-user file | |
| GMTROOT | GMT installation root directory | PDF/.vars-user file | |
| IMROOT | ImageMagick installation root directory | PDF/.vars-user file | |
| RESPDIR | Directory containing response files | PDF/src/vx.x.x/analysis/PDFuser.h | |