Groundwater data processing

R
hydrology
monitoring wells
surface water flow
pumping volumes
LADWP data
OVGA
time series
Code automates extracting water level time series data from the ZRXP ASCII data exchange format, transforming and merging updates with the active database, and exporting a CSV file formatted for the OVGA data management system.
Author

Zach Nelson

Published

January 26, 2024

Modified

January 26, 2024

Overview

Depth to water measurements from LADWP monitoring wells are provided to ICWD after the end of each water year. Data are exported from WISKI (Water Information System by KISTERS) in ZRXP format, an ASCII data exchange file format with a repeating header-data structure. These .DAT text files can be opened in a text editor or Excel for processing, but here they are programmatically parsed, processed, appended to the active internal database, and then formatted and exported for the annual OVGA data management system updates.
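
For orientation, a ZRXP block repeats a metadata header followed by timestamp-value rows. The header line below is a real example taken from the parsing notes later in this document; the two data rows are illustrative values, not actual reads:

example ZRXP block
#TSPATH/0b/T361/GW/GW.DepthGE||CUNITft||RINVAL-777||SNAMET.H. 361||SANRT361|*|
20221011121900 12.34
20221107112200 12.41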

Summary information about the processing (number of records dropped, etc.) and indicator well hydrographs for visual QA/QC are provided. The most recent OVGA depth to water import template instructions are included as an appendix. Before running the same code for the next year, update the year in the file names for 1) the water year on the input files and master database, 2) the output file name for the master database, and 3) the OVGA update file; a minimal sketch of centralizing that follows. The index.qmd of this repository contains the full code to process the data and produce this document.
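
One way to make the annual rollover less error-prone is to define the water year once and build the file names from it. This is a hypothetical sketch, not part of the current pipeline; the variable names are made up for illustration:

water year config sketch
# hypothetical: set once per year, then reuse when building the here() paths below
wy_start <- 2022
wy_end   <- 2023
wy_label <- paste0(wy_start, "-", substr(wy_end, 3, 4))  # "2022-23"

# e.g. file_pathRP <- here('data', 'hydro',
#        paste0(wy_label, ' Water Year Transfer Packet for ICWD'),
#        paste0('DepthRP_', wy_label, '.dat'))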

Data

Processed here:

  • DepthRP_2022-23.dat - Depth to water reads measured from the reference point (RP) elevation;

  • DepthGE_2022-23.dat - Depth to water below ground elevation;

  • OwensValley_DepthWSE_2022-23.dat - Water surface elevation.

ZRXP format .DAT files
input data
# new data updates -------
# packages used throughout: tidyverse (dplyr, tidyr, readr), here, lubridate, DT
library(tidyverse)
library(here)
library(lubridate)
library(DT)

#.dat ascii files
file_pathWSE <- here('data','hydro','2022-23 Water Year Transfer Packet for ICWD','OwensValley_DepthWSE_2022-23.dat')
file_pathRP  <- here('data','hydro','2022-23 Water Year Transfer Packet for ICWD','DepthRP_2022-23.dat')
file_pathGE  <- here('data','hydro','2022-23 Water Year Transfer Packet for ICWD','DepthGE_2022-23.dat')


# parsed on 1-23-24 so they can be read in from the RDS file
depthRP <- readRDS(file = here('data','hydro','depthRP.RDS'))
depthGE <- readRDS(file = here('data','hydro','depthGE.RDS'))
depthWSE <- readRDS(file = here('data','hydro','depthWSE.RDS'))

# ground surface elevation
gse_staids <- read_csv(here('data','hydro','gse_staids.csv'))

# staids in ovga database - used to cross reference what will be accepted into an import
# need to filter out staids that aren't in the ovga list, otherwise import will fail
ovga_points <- read_csv(here("data","hydro","Owens_Monitoring_Points.csv"))

# reference point elevation - some staids don't have RP elevations. Identify these here for the current water year transfer
rpelev <- read_csv(here('data','hydro','rp_elev.csv'))

# last year's OVGA upload (ovga_uploads_mw_with_rpe_102021_092022_zn), used to compare staids between years
last_ovga_mw <- read_csv(here('output','ovga_uploads_mw_with_rpe_102021_092022_zn.csv'))
# historical water levels------
hist <- read_csv(here('data','hydro','Monitoring_Wells_Master_2022.csv'))
hist$date <- ymd(hist$date)

The file repeats header information and data rows for each monitoring well, so we extract the well id (staid) from the #TSPATH line with regular-expression matching and join it to the observations (lines that don’t start with #).

parse depthGE.dat
dat_content <- readLines(file_pathGE)

# Initialize variables to store data
data_list <- list()
staid <- NA  # so the first data-line check below can't reference an undefined staid

# Function to check if a line contains timestamp and value
is_data_line <- function(line) {
  grepl("\\d+ \\d+\\.\\d+", line)
}

# Loop through each line in the file
for (line in dat_content) {
  # Check if the line starts with '#TSPATH'
  if (startsWith(line, "#TSPATH")) {
    # Extract 'staid' from the '#TSPATH' line
    staid <- sub('.*/([TVFRSW]\\w+).*', '\\1', line)
  }
  
  # Check if the line does not start with '#' and contains timestamp and value
  if (!startsWith(line, "#") && is_data_line(line) && !is.na(staid)) {
    # If the line does not start with '#' and contains timestamp and value,
    # add data to the list directly
    data_list <- c(data_list, list(data.frame(staid, dateread = line)))
  }
}

# Combine the data frames in the list into a single data frame
data_df <- bind_rows(data_list)

# Remove rows with NA values and keep only the staid and dateread columns
depthGE <- na.omit(data_df) %>% select(staid, dateread)

# Print the tidy data frame
# head(depthGE)
# 68,180 records


# separate the 'dateread' column into date and reading for each of the depthGE, depthRP, and depthWSE datasets

# split the data column and coerce the reading to numeric
depthGE <- depthGE %>% separate(dateread, c("date", "dtw.bgs"), sep = " ")
depthGE$dtw.bgs <- as.numeric(depthGE$dtw.bgs)
# head(depthGE)
saveRDS(depthGE, file = here('data','hydro','depthGE.RDS'))

# notes: The sub function performs a substitution in a string. Breakdown of the pattern used in sub('.*/([TVFRSW]\\w+).*', '\\1', line):
# 
# .*: This matches any characters (except for a newline) zero or more times.
# /: This matches the literal character '/'.
# ([TVFRSW]\\w+): This is a capturing group that matches a letter 'T', 'V', 'F', 'R', 'S', or 'W' followed by one or more word characters (letters, digits, or underscores). The parentheses create a group that can be referred to later.
# .*: Similar to the first one, this matches any characters (except for a newline) zero or more times.
# The replacement part '\\1' refers to the content captured by the first capturing group, which is the part inside the parentheses. In this case, it's the letter followed by one or more word characters. So, it effectively extracts that part from the original string.
# 
# For example, if line is "#TSPATH/0b/T361/GW/GW.DepthGE||CUNITft||RINVAL-777||SNAMET.H. 361||SANRT361|*|", the sub function will replace it with "T361" because that's what is captured by the pattern within the parentheses.
# 
# The goal here is to extract the 'staid' value from the '#TSPATH' line. The sub function is a way to match and replace parts of a string based on a specified pattern.
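
The worked example in the notes can be run directly to confirm the capture:

check staid extraction
line <- "#TSPATH/0b/T361/GW/GW.DepthGE||CUNITft||RINVAL-777||SNAMET.H. 361||SANRT361|*|"
sub('.*/([TVFRSW]\\w+).*', '\\1', line)
# [1] "T361"
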
testwellupdate
# rename the columns and clean up the date with a formal date specification
# `try` is the merged RP/GE/WSE dataset assembled in a chunk not shown here
# (note that `try` shadows base::try, so a more distinctive name would be safer)
testwell.up <- try %>% select(-date, -datetime) %>% rename(date = date.y.m.d) %>% mutate(source = "DWP")
# head(testwell.up)
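
The merge that builds `try` is not shown in this document. Below is a hypothetical sketch of what it could look like, assuming the three parsed series share the raw ZRXP timestamp in their date column and each carry one value column (dtw.rp, wse, dtw.bgs), matching the column listing shown later in this document; it is an illustration, not the original chunk:

hypothetical merge sketch
# hypothetical reconstruction -- not the original code
try <- depthRP %>%
  full_join(depthWSE, by = c("staid", "date")) %>%
  full_join(depthGE,  by = c("staid", "date")) %>%
  mutate(datetime   = ymd_hms(date),            # raw ZRXP stamp, e.g. 20221011121900
         date.y.m.d = as_date(datetime),
         year   = year(datetime),
         month  = month(datetime),
         day    = day(datetime),
         hour   = hour(datetime),
         minute = minute(datetime))
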
  • 123,311 rows in the annual update
  • 785 staids in the annual update
append updates to master database
## Append updates to master
testwells.combined <- bind_rows(hist, testwell.up) 

# head(testwells.combined)
# 1,145,360 records going back to 1971

# testwells.combined %>% n_distinct(staid)
# testwells.combined %>% distinct(staid) %>% nrow()
  • 1,171,755 rows in the active database
  • 1,361 staids in the active database

DB update

export db file

save master database updates
# single year
testwell.up %>% write_csv(here('output','testwellwy2023.csv'))

# whole dataset
testwells.combined %>% write_csv(here('output','Monitoring_Wells_Master_2023.csv'))


Reference point elevation data

convert RP dates
## RP elevations can change over time, so each annual update must re-process the full RP file.

# date conversion
rpelev_date <- rpelev %>% 
  mutate(date = lubridate::mdy(date_c))

head(rpelev_date) %>% datatable()
  count_unique_staids
1                 785

# A tibble: 19,865 × 4
# Groups:   staid [785]
   staid date       daily_count daily_count_category
   <chr> <date>           <int> <fct>               
 1 T756  2023-04-12          96 (50,100]            
 2 T756  2023-04-13          96 (50,100]            
 3 T756  2023-04-14          96 (50,100]            
 4 T756  2023-04-15          96 (50,100]            
 5 T756  2023-04-16          96 (50,100]            
 6 T756  2023-04-17          96 (50,100]            
 7 T756  2023-04-18          96 (50,100]            
 8 T756  2023-04-19          96 (50,100]            
 9 T756  2023-04-20          96 (50,100]            
10 T756  2023-04-21          96 (50,100]            
# ℹ 19,855 more rows
# A tibble: 10 × 2
   daily_count_category count_unique_staids
   <fct>                              <int>
 1 [0,1]                                770
 2 (1,2]                                 13
 3 (2,3]                                 15
 4 (3,4]                                 34
 5 (4,5]                                 34
 6 (5,10]                                 2
 7 (10,15]                                3
 8 (15,20]                                3
 9 (20,50]                                9
10 (50,100]                               2

115 out of 785 staids have multiple measurements per day, some hourly and a few at 15-minute intervals.

100 of those 115 staids with multiple measurements per day also have days with single measurements.

43 out of 785 staids have more than 1,000 records, 9 of them over 5,000 each. The remaining staids have fewer than 32 measurements per year. A sketch of how the daily-count summaries above can be computed follows.
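
The chunk that produced the daily-count tables above is not shown. This is a minimal sketch that reproduces those summaries, assuming dtwrp.rpelev (the depth-to-RP reads joined to RP elevations, assembled in an elided chunk) is available:

daily count sketch
# hypothetical reconstruction -- bins match the categories printed above
daily_counts <- dtwrp.rpelev %>%
  count(staid, date, name = "daily_count") %>%
  mutate(daily_count_category = cut(daily_count,
                                    breaks = c(0, 1, 2, 3, 4, 5, 10, 15, 20, 50, 100),
                                    include.lowest = TRUE))

# number of distinct staids falling in each frequency bin
daily_counts %>%
  group_by(daily_count_category) %>%
  summarise(count_unique_staids = n_distinct(staid))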

avg daily dtw - aggregating
# daily average dtw
dtwrpavg.rpelev <- dtwrp.rpelev %>% 
  group_by(staid, date, rp_elev) %>% 
  summarise(dtw.rp =  round(mean(dtw.rp),2))


#19,865 1-24-24
# dtwrpavg.rpelevBeta <- dtwrp.rpelev %>% group_by(staid, date, rp_elev) %>% summarise(dtw.rp =  round(mean(dtw.rp),2),
#           dailyCount = n()
#           ) %>% arrange(desc(dailyCount))
dtwrpavg.rpelev
# A tibble: 19,865 × 4
# Groups:   staid, date [19,865]
   staid date       rp_elev dtw.rp
   <chr> <date>       <dbl>  <dbl>
 1 F033  2022-10-06   3834.   4.57
 2 F033  2022-11-08   3834.   4.57
 3 F033  2022-12-08   3834.   4.57
 4 F033  2023-01-06   3834.   4.57
 5 F033  2023-03-19   3834.   4.57
 6 F033  2023-04-07   3834.   4.57
 7 F033  2023-05-07   3834.   4.51
 8 F033  2023-06-12   3834.   4.51
 9 F033  2023-07-16   3834.   4.54
10 F033  2023-08-08   3834.   4.53
# ℹ 19,855 more rows
avg daily dtw - aggregating
# datatable(dtwrpavg.rpelev,
  # caption = 'daily averaging reduces data')
assign TR vs ES method based on measurement frequency
# more than 4 reads per month (48 per year) implies a pressure transducer (TR); fewer implies an electric sounder (ES)
# Number of records from staids with rp elevations
# classify method TR v ES by frequency of measurements.
# check

record.number <- dtwrp.rpelev %>% 
  filter(!is.na(rp_elev)) %>% 
  group_by(staid) %>% 
  summarise(count.with.rpe = n()) %>% 
  mutate(MMethod = case_when(count.with.rpe >= 48 ~ "TR",
                            count.with.rpe < 48~ "ES"))
# four reads per month over twelve months approximates to 48. quick and dirty separation; a maintained list of pressure transducer staids would be better.

head(record.number %>% arrange(-count.with.rpe))
# A tibble: 6 × 3
  staid count.with.rpe MMethod
  <chr>          <int> <chr>  
1 T752            8734 TR     
2 T753            8734 TR     
3 T754            8734 TR     
4 T755            8734 TR     
5 T756            7893 TR     
6 T759            7614 TR     
number of records per staid
# records per staid
LA.staids <- dtwrpavg.rpelev %>% group_by(staid) %>% summarise(records = n())
# head(LA.staids)
# LA.staids %>% arrange(desc(records))
#785
staids in data that are also on ovga list
LA.staids.in.ovga <- LA.staids %>% semi_join(ovga_points, by = c("staid"="mon_pt_name"))

# LA.staids.in.ovga %>% nrow()
#714
monitoring points missing rp elevations after join
# monitoring points missing rp elevations after join
# LA.staids.in.ovga
# find how many staids are missing rp elevation
na.rp.elev <- dtwrpavg.rpelev %>% 
  semi_join(LA.staids.in.ovga, by = 'staid') %>% 
  filter(is.na(rp_elev)) %>% 
  group_by(staid) %>% 
  summarise(count.na = n())
anti_join take NA rp elev off
# anti_join removes records in x that match y
# Select only data with rp elevations (not na)
rpselect <- dtwrpavg.rpelev %>% anti_join(na.rp.elev, by = "staid")
# rpselect
# 19,793 1.24.2024

keep those with rp elev and that are on the ovga list

semi_join retain records in ovga staid list
# 6,048 points including monitoring wells, pumping wells, surface water gauging stations.
# semi_join returns records in x with a match in y
rpselect2 <- rpselect %>% semi_join(ovga_points, by = c("staid"="mon_pt_name"))
# head(rpselect2)
rpselect2
# A tibble: 14,319 × 4
# Groups:   staid, date [14,319]
   staid date       rp_elev dtw.rp
   <chr> <date>       <dbl>  <dbl>
 1 F033  2022-10-06   3834.   4.57
 2 F033  2022-11-08   3834.   4.57
 3 F033  2022-12-08   3834.   4.57
 4 F033  2023-01-06   3834.   4.57
 5 F033  2023-03-19   3834.   4.57
 6 F033  2023-04-07   3834.   4.57
 7 F033  2023-05-07   3834.   4.51
 8 F033  2023-06-12   3834.   4.51
 9 F033  2023-07-16   3834.   4.54
10 F033  2023-08-08   3834.   4.53
# ℹ 14,309 more rows
semi_join retain records in ovga staid list
# datatable(rpselect2)

# -   `r record.number %>% nrow()` number of staids with rp elevations
# 
# -   `r LA.staids %>% nrow()` staids with rp elevation in data
# 
# -   `r LA.staids.in.ovga %>% nrow()` staids in data also in the OVGA database
# 
# -   `r LA.staids %>% nrow()-LA.staids.in.ovga %>% nrow()` omitted from import
# 
# -   `r na.rp.elev %>% nrow()` staids missing rp elevations after join

OVGA template

create ovga template
# methodinfer %>% distinct()
methodinfer <- record.number %>% select(-count.with.rpe)
# %>% distinct(staid)

upload <- rpselect2 %>% 
  left_join(methodinfer, by = "staid") %>% 
  # select(-latest_rp_date) %>% 
  select(WellName = staid, 
         DateMeasured = date, 
         DepthToWater = dtw.rp, 
         ReferencePointElevation = rp_elev
         ,
         MMethod
         ) %>% 
  mutate(ReportingDate = "",
         # ExclusionCondition = "" ,
         QAQCLevel = "High",
         MeasMethod = MMethod, # "ES" or "TR", from the frequency-based join above
         NoMeasFlag = "",
         QuestMeasFlag = "",
         DataSource = "LADWP",
         CollectedBy = "LADWP",
         UseInReporting = "yes",
         Notes = "") %>% 
  select(-MMethod) %>%
  # drop unusable reads: -777 is the ZRXP missing-value sentinel (RINVAL in the
  # #TSPATH header) and a ReferencePointElevation of 0 is treated as a placeholder
  filter(DepthToWater < 500,
         !is.na(DepthToWater),
         DepthToWater != -777,
         ReferencePointElevation != 0) %>% 
  relocate(ReportingDate, .after = DateMeasured)

upload 
# A tibble: 12,939 × 13
# Groups:   WellName, DateMeasured [12,939]
   WellName DateMeasured ReportingDate DepthToWater ReferencePointElevation
   <chr>    <date>       <chr>                <dbl>                   <dbl>
 1 F033     2022-10-06   ""                    4.57                   3834.
 2 F033     2022-11-08   ""                    4.57                   3834.
 3 F033     2022-12-08   ""                    4.57                   3834.
 4 F033     2023-01-06   ""                    4.57                   3834.
 5 F033     2023-03-19   ""                    4.57                   3834.
 6 F033     2023-04-07   ""                    4.57                   3834.
 7 F033     2023-05-07   ""                    4.51                   3834.
 8 F033     2023-06-12   ""                    4.51                   3834.
 9 F033     2023-07-16   ""                    4.54                   3834.
10 F033     2023-08-08   ""                    4.53                   3834.
# ℹ 12,929 more rows
# ℹ 8 more variables: QAQCLevel <chr>, MeasMethod <chr>, NoMeasFlag <chr>,
#   QuestMeasFlag <chr>, DataSource <chr>, CollectedBy <chr>,
#   UseInReporting <chr>, Notes <chr>

Completed 1-24-24

csv

  • 18,713 well-days in update with rp elevations

  • 690 staids in update with rp elevations

  • 14,319 well-days in update with rp elevations and in ovga list

  • 619 staids in update with rp elevations and in ovga list

  • 95 monitoring points missing rp elevation and omitted from import

  • 12,939 well-days in the ovga upload

  • 611 staids in the ovga upload

export ovga import file

35 staids were included last year in the 2022 water year upload but not this year (2023 water year). These staids have water surface elevation but not the depth to water from reference point required by OVGA. DepthRP can be calculated as RP elevation minus WSE on the next iteration; a sketch follows the column listing below. At roughly 17k records, these are pressure transducer data.

Rows: 17,479
Columns: 11
$ staid   <chr> "F124", "F124", "F124", "F124", "F124", "F124", "F124", "F124"…
$ dtw.rp  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ wse     <dbl> 4022.38, 4021.88, 4021.76, 4021.88, 4024.12, 4024.30, 4024.74,…
$ dtw.bgs <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ year    <dbl> 2022, 2022, 2022, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 20…
$ month   <dbl> 10, 11, 12, 1, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 10, 10, 10…
$ day     <int> 11, 7, 12, 18, 21, 12, 10, 12, 18, 10, 20, 1, 1, 1, 1, 1, 1, 1…
$ hour    <int> 12, 11, 15, 12, 12, 10, 10, 12, 9, 10, 9, 0, 1, 2, 3, 4, 5, 6,…
$ minute  <int> 19, 22, 9, 31, 30, 10, 51, 58, 41, 48, 37, 0, 0, 0, 0, 0, 0, 0…
$ date    <date> 2022-10-11, 2022-11-07, 2022-12-12, 2023-01-18, 2023-03-21, 2…
$ source  <chr> "DWP", "DWP", "DWP", "DWP", "DWP", "DWP", "DWP", "DWP", "DWP",…
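
A minimal sketch of that back-calculation, assuming the 17,479-row set shown above is named wse_only (hypothetical) and that rpelev_date carries one current rp_elev per staid (otherwise take the latest RP survey per staid first):

dtw.rp backfill sketch
# hypothetical -- derive depth to RP as RP elevation minus water surface elevation
wse_only_filled <- wse_only %>%
  left_join(rpelev_date %>% select(staid, rp_elev), by = "staid") %>%
  mutate(dtw.rp = if_else(is.na(dtw.rp) & !is.na(rp_elev) & !is.na(wse),
                          round(rp_elev - wse, 2),
                          dtw.rp))
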
save OVGA import
upload %>% write_csv(here("output","ovga_uploads_mw_with_rpe_102022_092023_zn.csv"))
datatable
datatable(upload,
  caption = '2022-2023 water year upload formatted for OVGA data management system.')

QA/QC Hydrographs

The start of the water year (October 1) is demarcated on each hydrograph for 2022 and 2023.

Laws

Hydrographs of indicator wells in the Laws wellfield.

Bishop

Hydrographs of indicator wells in the Bishop wellfield.

Big Pine

Hydrographs of indicator wells in the Big Pine wellfield. T565 and V017GC are in south Big Pine near W218/219.

Taboose Aberdeen

Hydrographs of indicator wells in the Taboose-Aberdeen wellfield.

Thibaut Sawmill

Hydrographs of indicator wells in Thibaut-Sawmill wellfield.

Independence Oak

Hydrographs of indicator wells in Independence-Oak wellfield.

Symmes Shepherd

Hydrographs of indicator wells in Symmes-Shepherd wellfield.

Bairs George

Hydrographs of indicator wells in Bairs George wellfield.

Appendix: GLA Data Depth to Water Import Procedures

confirmed updated 1/23/24 zn

The import is done through the GLA Data Web application. The uploaded Excel Workbook must contain one spreadsheet with 13 columns with the following names, in this order:

Field Name               Data Type  Required
WellName                 Text       Yes
DateMeasured             Date       Yes
ReportingDate            Date       No
DepthToWater             Numeric    Conditional
ReferencePointElevation  Numeric    Conditional
QAQCLevel                Text       Yes
MeasMethod               Text       Yes
NoMeasFlag               Text       Conditional
QuestMeasFlag            Text       No
DataSource               Text       Yes
CollectedBy              Text       No
UseInReporting           Text       No
Notes                    Text       Conditional

WellName The WellName column is required and must contain the name of a monitoring point within the basin selected when the file was uploaded.

DateMeasured The DateMeasured column is required. The field must be a date and cannot be in the future nor more than 100 years in the past.

Note (2-1-1): an import error says that ReportingDate is not part of the column list.

ReportingDate The ReportingDate column must be blank or a date. If the field is a date, it must be within 14 days of DateMeasured. If left blank, the column is populated with the value in the DateMeasured column. This field allows users to assign a measurement to an adjacent month for reporting purposes. For example, a measurement collected on May 31st may be intended to be used as an April measurement.

DepthToWater This column must be blank or numeric. DepthToWater is the number of feet from the reference point. If blank, NoMeasFlag is required and ReferencePointElevation must also be blank. Positive values indicate the water level is below the top of the casing, while negative values indicate the water level is above the top of the casing (flowing artesian conditions).

ReferencePointElevation This column must be blank or numeric. ReferencePointElevation is the elevation in feet from where the depth to water measurement took place. If blank, NoMeasFlag is required and DepthToWater must also be blank.

QAQCLevel This field is required and must be one of the following values:

  • High - Data are of high quality

  • Medium - Data are inconsistent with previous values or sampling conditions were not ideal. Values will be displayed with a different color on plots.

  • Low - Data are not considered suitable for display or analysis due to inconsistencies with previous values or poor sampling conditions. Preserves sample in database for record-keeping purposes but not displayed on figures, tables, or used in analyses.

  • Undecided - QA/QC level has not been determined.

MeasMethod This field is required and must be one of the following values:

Code Description
ES Electric sounder measurement
ST Steel tape measurement
AS Acoustic or sonic sounder
PG Airline measurement, pressure gage, or manometer
TR Electronic pressure transducer
OTH Other
UNK Unknown

NoMeasFlag This field must be blank if DepthToWater and ReferencePointElevation contain values. Otherwise, this field is required and must be one of the following values:

Code Description
0 Measurement discontinued
1 Pumping
2 Pump house locked
3 Tape hung up
4 Can’t get tape in casing
5 Unable to locate well
6 Well has been destroyed
7 Special/other
8 Casing leaking or wet
9 Temporarily inaccessible
D Dry well
F Flowing artesian

QuestMeasFlag This field must be blank or be one of the following values:

Code Description
0 Caved or deepened
1 Pumping
2 Nearby pump operating
3 Casing leaking or wet
4 Pumped recently
5 Air or pressure gauge measurement
6 Other
7 Recharge or surface water effects near well
8 Oil or foreign substance in casing
9 Acoustical sounder
E Recently flowing
F Flowing
G Nearby flowing
H Nearby recently flowing

DataSource This field is required and used to identify where the water level data came from (e.g., entity, database, file, etc.). Limit is 100 characters. default = “LADWP”

CollectedBy This field is optional and used to identify the person that physically collected the data. Limit is 50 characters. default = “LADWP”

UseInReporting This field is optional and used to filter measurements used in reports. If included, the value must be “yes”, “no”, “true”, “false”, “1” or “0”. If blank, a value of “yes” is assumed. default = “yes”

Notes This field must be populated if NoMeasFlag is 7 (special/other) or QuestMeasFlag is 6 (other), otherwise this field is optional. Limit is 255 characters. default = “blank”
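
As a pre-flight check before uploading, the conditional rules above can be asserted in R. This is an illustrative sketch run against the upload table built earlier, not part of the GLA application:

pre-upload validation sketch
# illustrative checks of the template rules (blank flags are "" in `upload`)
stopifnot(
  # WellName and DateMeasured are required, and dates can't be in the future
  all(!is.na(upload$WellName)),
  all(upload$DateMeasured <= Sys.Date()),
  # DepthToWater and ReferencePointElevation must be blank together;
  # when both are blank, NoMeasFlag is required
  all(is.na(upload$DepthToWater) == is.na(upload$ReferencePointElevation)),
  all(!is.na(upload$DepthToWater) | upload$NoMeasFlag != ""),
  # QAQCLevel and MeasMethod must come from the allowed code lists
  all(upload$QAQCLevel %in% c("High", "Medium", "Low", "Undecided")),
  all(upload$MeasMethod %in% c("ES", "ST", "AS", "PG", "TR", "OTH", "UNK"))
)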

Citation

BibTeX citation:
@report{nelson2024,
  author = {Nelson, Zach},
  publisher = {Inyo County Water Department},
  title = {Groundwater Data Processing},
  date = {2024-01-26},
  url = {https://inyo-gov.github.io/hydro-data/},
  langid = {en}
}
For attribution, please cite this work as:
Nelson, Zach. 2024. “Groundwater Data Processing.” Data Report. Inyo County Water Department. https://inyo-gov.github.io/hydro-data/.