
The text products are listed under the header ACUS74 and can be found on the National Weather Service FTP server.

The following reports were obtained from the Brownsville (BRO), Corpus Christi (CRP), Austin/San Antonio (EWX), Houston/Galveston (HGX), Lake Charles (LCH), and New Orleans (LIX) offices.

Each office serves only the latest ACUS74 product it has issued, so the content of the text products may have changed by the time you read this article. Because of this, the rds data files have been saved to this website’s GitHub repository.

All times reported are in UTC. Pressure observations are in millibars. Wind and Gust observations are in knots.

Libraries

The following libraries were used in this article:

library(dplyr)
library(glue)
library(lubridate)
library(purrr)
library(stringr)
library(tibble)

Data

To load the data, I put the links into a named vector. I originally considered keeping the data separate by NWS office but determined this was not necessary.

The code below looks for each data file mentioned earlier and, if it exists, loads it into the txt list. Otherwise, the text products must first be collected for parsing (a sketch of that step follows the code).

rpts <- c(
  "bro" = "acus74.kbro.psh.bro.txt",
  "crp" = "acus74.kcrp.psh.crp.txt",
  "ewx" = "acus74.kewx.psh.ewx.txt",
  "hgx" = "acus74.khgx.psh.hgx.txt",
  "lch" = "acus74.klch.psh.lch.txt",
  "lix" = "acus74.klix.psh.lix.txt"
)

# Read data
txt <- map(rpts, ~readLines(here::here(glue("./data/{.x}"))))
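
The collection step itself is not shown above. Below is a minimal sketch of the caching logic this paragraph describes; the base_url is a placeholder, not the actual NWS server path:

# Hypothetical caching sketch; base_url is a placeholder, not the real path
base_url <- "https://example.nws.gov/data/raw"

txt <- map(rpts, function(f) {
  local_path <- here::here(glue("./data/{f}"))
  if (!file.exists(local_path)) {
    # Collect the text product only if it has not been cached locally
    download.file(url = glue("{base_url}/{f}"), destfile = local_path)
  }
  readLines(local_path)
})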

Each text product contains several sections, though not all of them contain data. For this article, only observations that contained a latitude and longitude position were collected.

Each section contains a data header:

A. LOWEST SEA LEVEL PRESSURE/MAXIMUM SUSTAINED WINDS AND PEAK GUSTS
---------------------------------------------------------------------
METAR OBSERVATIONS...
NOTE: ANEMOMETER HEIGHT IS 10 METERS AND WIND AVERAGING IS 2 MINUTES
---------------------------------------------------------------------
LOCATION  ID    MIN    DATE/     MAX      DATE/     PEAK    DATE/
LAT  LON        PRES   TIME      SUST     TIME      GUST    TIME
DEG DECIMAL     (MB)   (UTC)     (KT)     (UTC)     (KT)    (UTC)
---------------------------------------------------------------------
B. MARINE OBSERVATIONS...
NOTE: ANEMOMETER HEIGHT IN METERS AND WIND AVERAGING PERIOD IN
MINUTES INDICATED UNDER MAXIMUM SUSTAINED WIND IF KNOWN
---------------------------------------------------------------------
LOCATION  ID    MIN    DATE/     MAX      DATE/     PEAK    DATE/
LAT  LON        PRES   TIME      SUST     TIME      GUST    TIME
DEG DECIMAL     (MB)   (UTC)     (KT)     (UTC)     (KT)    (UTC)
---------------------------------------------------------------------

Additionally, a subsection of A exists:

NON-METAR OBSERVATIONS... 
NOTE: ANEMOMETER HEIGHT IN METERS AND WIND AVERAGING PERIOD IN
MINUTES INDICATED UNDER MAXIMUM SUSTAINED WIND IF KNOWN
---------------------------------------------------------------------
LOCATION  ID    MIN    DATE/     MAX      DATE/     PEAK    DATE/
LAT  LON        PRES   TIME      SUST     TIME      GUST    TIME
DEG DECIMAL     (MB)   (UTC)     (KT)     (UTC)     (KT)    (UTC)
---------------------------------------------------------------------

This subsection was parsed as part of Section A but is indistinguishable from it in the parsed dataset.

Observation examples are included in the relevant sections below.

Numerous observations contain remarks (identified in the dataset by ends_with("Rmks")). Every section contains a Remarks footer; however, it may not be populated, so not every observation with a .Rmks variable has additional remarks listed in the text product. The additional remarks were not collected.

The .Rmks legend is identified in the footer of each text product. All .Rmks variables in the datasets are either NA or “I”.

Sections A and B may also contain additional anemometer height and wind-averaging period variables on a third line. This data was not collected, though it easily could have been; I did not feel it was relevant to this article.

Sea Level Pressure and Marine Observations

A typical observation in Section A or B will look like the following:

RCPT2-ROCKPORT                                                      
28.02  -97.05   941.8 26/0336 I 017/059  26/0154 I 016/094 26/0148 I

There are 15 variables in the observation above (in order as they appear, with the example text):

  • ID (RCPT2)

  • Station (ROCKPORT)

  • Lat (28.02)

  • Lon (-97.05)

  • Pres (941.8) [barometric pressure, mb]

  • PresDT (26/0336) [date/time of Pres observation, UTC]

  • PresRmks (I) [incomplete pressure observation]

  • WindDir, Wind (017/059) [wind direction and wind speed, kt]

  • WindDT (26/0154) [date/time of preceding wind observation, UTC]

  • WindRmks (I) [incomplete wind observation]

  • GustDir, Gust (016/094) [gust direction and peak gust, kt]

  • GustDT (26/0148) [date/time of preceding gust observation, UTC]

  • GustRmks (I) [incomplete gust observation]

Every observation has an empty line before and after it, which can be used as a delimiter to split observations, as illustrated below.
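
For illustration, a blank-line split of a character vector of product lines (here called lines, a hypothetical name) could look like the following; the article instead matches the latitude pattern directly, as described next:

# Illustrative only: group lines into observations via blank-line delimiters
obs_groups <- split(lines, cumsum(str_detect(lines, "^\\s*$")))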

To extract the relevant sections, I loop through the text products (txt) and identify where Section A and Section C begin. With these numeric indices, I can extract Sections A and B, assigning the subset to slp_raw.

# Subset each product from the start of Section A to the start of Section C
slp_raw <- map(txt, ~.x[grep("^A\\.", .x):grep("^C\\.", .x)]) %>%
  flatten_chr()

From there, I identify all vector elements that begin with a latitude and longitude field. I counted these values (slp_n) so that I know exactly how long my final results will be (and to check progress as I move along, making sure I haven’t inadvertently removed anything).

Once I know where the observation indices are (slp_obs_n), I can find the station identification lines by calculating slp_obs_n - 1; this gives me slp_stations_n.

# Observation lines begin with a latitude/longitude pair
slp_obs_ptn <- "^\\d\\d\\.\\d\\d\\s*-*\\d\\d\\d*\\.\\d\\d.+"
slp_n <- sum(str_count(slp_raw, slp_obs_ptn))  # expected number of observations
slp_obs_n <- str_which(slp_raw, slp_obs_ptn)   # indices of observation lines
slp_stations_n <- slp_obs_n - 1                # station lines precede observations

Before merging the two vectors, I noticed the station values were not all the same length; padding them to a uniform width would make the regex more reliable. To create slp_stations, I first trim all values and find the length of the longest one. I then round that maximum length to the nearest ten and pad all values on the right.

The last bit of manipulation involved replacing the first “-”, if present, with a “\t” character. This made it easier to split the ID and Station variables, since Station can itself contain “-” characters.

slp_stations <-
  slp_raw[slp_stations_n] %>%
  str_trim() %>%
  str_pad(width = round(max(nchar(.)), digits = -1), side = "right") %>%
  # Replace first "-" with "\t" to help split ID and Station
  str_replace("\\s*-\\s*", "\t")

Note that the code above incorrectly split the ID “TXCV-4”; this is corrected later during cleaning.

Finally, I subset the observation lines into slp_obs and combine them with slp_stations to make the vector slp.

# Load observations and trim
slp_obs <- slp_raw[slp_obs_n]

# Combine stations and obs
slp <- str_c(slp_stations, slp_obs)

Here is a look at the head of slp:

head(slp, n = 5L)
[1] "KBRO\tBROWNSVILLE INTL ARPT                        25.91 -97.42   1002.9 25/1153   300/022  25/1853   290/032 25/1753  "
[2] "KHRL\tHARLINGEN VALLEY INTL ARPT                   26.22 -97.66   1003.6 25/1252   310/023  25/1841   310/033 25/1852  "
[3] "KPIL\tPORT ISABEL ARPT                             26.15 -97.23   1001.9 25/1253   300/024  25/1553   300/034 25/1453  "
[4] "KBKS\tBROOKS COUNTY ARPT                           26.20 -98.12   1002.2 26/0215   290/019  26/0215   290/027 26/0215  "
[5] "KTXS33                                            26.13 -97.17   1003.0 25/2240   999/999  99/9999   999/999 99/9999  " 

ID [ID], Station [Station] (opt)

Matching ID and Station is easier after replacing the first “-” with a “\t”. Station is optional. To find the end of the string, I simply looked ahead for the latitude pattern that follows.

            # ID-Station
            "(\\w+)\t*(?<=\t{0,1})(.+)(?=\\s+\\d{2}\\.\\d{2})",

Latitude [Lat]

Latitude was also very easy to extract. The pattern looks for four digits with a decimal point in the middle.

            # Lat
            "\\s+(\\d{2}\\.\\d{2})",

Longitude [Lon]

Extracting Longitude was a bit more of a challenge. Most observations have a negative longitude value (the area of interest is in the western hemisphere). This was mostly reflected accurately in the text products, but some values did not contain the leading “-”, such as station “KEFD”:

slp[grep("^KEFD", slp)][2]
[1] "KEFD                                              29.62 95.17    1003.4 29/1650   160/027  26/1750   360/036 29/2150  "

To accommodate the possibilities, I had to be loose with the number of digits expected, in addition to making the negative sign optional.

            # Lon
            "\\s+-*(\\d{2,6}\\.\\d{2})",

Minimum Pressure [Pres] (opt)

Expected pressure values would have a format like \\d{3,4}\\.\\d{1}. Some observations had values of “9999.0”, which are clearly invalid (expected values ranged roughly between 940 and 1010 mb). These would later be cleaned up.

            # Pres
            "\\s+(\\d{1,4}\\.\\d{1})*",

Date/time of pressure observation [PresDT] (opt)

PresDT was initially split to extract the date value first (PresDTd), followed by the “%H%M” time value (PresDThm). The general format, “\d{2}/\d{4}”, would not work, primarily because many observations contained the text “MM” or “N/A”.

Additionally, some observations held the value “99/9999”. This would be accepted by the default format but would fail when converting to a valid date/time variable. These would also be cleaned later.

With this, I ended up being very generous with the regex.

            # PresDTd, PresDThm
            "\\s+(\\w{1,3}|N)*/*(\\w{1,4})*",

Pressure remarks [PresRmks] (opt)

As noted previously, some Pres values may be incomplete for unknown reasons. These values are indicated with the letter “I”, as in the “RCPT2” example earlier.

            # PresRmks
            "\\s+(I)*",

Maximum sustained wind direction [WindDir], Maximum sustained winds [Wind] (opt)

Variables WindDir and Wind were split with a “/” character. However, again, not all values were numeric as expected, as in this “LOPL1” observation:

slp[grep("^LOPL1", slp)]
[1] "LOPL1\tLA OFFSHORE OIL PORT                        28.88  -90.03   998.1 07/2216   080/029 07/2011    090/037 07/1811  "

Invalid values such as “999” would later be marked as NA.

            # WindDir, Wind
            "\\s+(\\w{3})/(\\d{3})",

The remaining variables generally followed the same rules as the similar variables above (e.g., GustDir like WindDir, GustDT like WindDT, etc.). The final str_match call captured all expected variables in order.

# Begin extraction. Move to a dataframe and rename the variables. The names
# vector was not shown in the original; it is inferred here from the capture
# groups, with the full-match column named "txt" and dropped at the end.
slp_df_names <- c(
  "txt", "ID", "Station", "Lat", "Lon", "Pres", "PresDTd", "PresDThm",
  "PresRmks", "WindDir", "Wind", "WindDTd", "WindDThm", "WindRmks",
  "GustDir", "Gust", "GustDTd", "GustDThm", "GustRmks"
)

slp_df <-
  str_match(
    slp,
    sprintf("^%s%s%s%s%s%s%s%s%s%s%s%s$",
            # ID-Station
            "(\\w+)\t*(?<=\t{0,1})(.+)(?=\\s+\\d{2}\\.\\d{2})",
            # Lat
            "\\s+(\\d{2}\\.\\d{2})",
            # Lon
            "\\s+-*(\\d{2,6}\\.\\d{2})",
            # Pres
            "\\s+(\\d{1,4}\\.\\d{1})*",
            # PresDTd, PresDThm
            "\\s+(\\w{1,3}|N)*/*(\\w{1,4})*",
            # PresRmks
            "\\s+(I)*",
            # WindDir, Wind
            "\\s+(\\w{3})/(\\d{3})",
            # WindDTd, WindDThm
            "\\s+(\\d{2,3})*/*(\\d{3,4})*",
            # WindRmks
            "\\s+(I)*",
            # GustDir, Gust
            "\\s+(\\w{3})*/*(\\d{3})*",
            # GustDTd, GustDThm
            "\\s+(\\d{2,3})*/*(\\d{3,4})*",
            # GustRmks
            "\\s+(I)*"
    )
  ) %>%
  as_tibble(.name_repair = "minimal") %>%
  set_names(nm = slp_df_names) %>%
  arrange(ID) %>%
  select(-txt)
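
The later cleaning step is not shown in the original. Here is a minimal sketch of what it could cover, based on the invalid values noted above; the year and month used for the date/times are assumptions, and the exact rules used in the article may differ:

# Hypothetical cleanup based on the sentinel values described in the text
slp_df <- slp_df %>%
  # Restore the ID that was incorrectly split on its "-" earlier
  # (hypothetical fix; the Station field may also need repair)
  mutate(ID = if_else(ID == "TXCV", "TXCV-4", ID)) %>%
  # Invalid sentinel values become NA before numeric conversion
  mutate(Pres = na_if(Pres, "9999.0"),
         WindDir = na_if(WindDir, "999"),
         Wind = na_if(Wind, "999"),
         GustDir = na_if(GustDir, "999"),
         Gust = na_if(Gust, "999")) %>%
  mutate_at(.vars = c("Lat", "Lon", "Pres", "WindDir", "Wind",
                      "GustDir", "Gust"),
            .funs = as.numeric) %>%
  # Build date/times; "99/9999", "MM", and "N/A" quietly become NA.
  # The year and month are assumptions -- substitute the storm's actual dates.
  mutate(PresDT = ymd_hm(glue("2017-08-{PresDTd} {PresDThm}"), quiet = TRUE),
         WindDT = ymd_hm(glue("2017-08-{WindDTd} {WindDThm}"), quiet = TRUE),
         GustDT = ymd_hm(glue("2017-08-{GustDTd} {GustDThm}"), quiet = TRUE))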

Rainfall

Section C of the text products lists rainfall observations recorded across the Texas and Louisiana area. These observations were similar in format to the pressure observations:

COLETO CREEK                 GOLIAD              CKDT2         9.42 I
28.73  -97.17

The following fields were extracted:

  • Location (“COLETO CREEK”)

  • County (“GOLIAD”)

  • Station (“CKDT2”) opt

  • Rain (“9.42”)

  • RainRmks (“I”)

  • Lat (“28.73”)

  • Lon (“-97.17”)

Extracting these observations followed the same premise as for slp: identify the latitude lines, then subset those indices and the preceding indices, and combine the two into a vector, as sketched below.
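
A minimal sketch of that step follows; the section bounds (assumed to run from the “C.” header to a “D.” header) and the joining separator are assumptions, since the original code was not shown:

# Subset Section C (rainfall) from each product; the "D." bound is assumed
rain_raw <- map(txt, ~.x[grep("^C\\.", .x):grep("^D\\.", .x)]) %>%
  flatten_chr()

# Latitude/longitude lines mark the second line of each observation
rain_obs_n <- str_which(rain_raw, "^\\d{1,2}\\.\\d{2}\\s+-*\\d{2,3}\\.\\d{2}")

# Combine each location line with its following latitude/longitude line
rain <- str_c(rain_raw[rain_obs_n - 1], " ", rain_raw[rain_obs_n])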

Unlike slp, there were no surprises in cleaning this data. The regex used:

# The names vector was not shown in the original; it is inferred from the
# capture groups, with the full-match column named "txt" and dropped below
rain_df_names <- c("txt", "Location", "County", "Station", "Rain",
                   "RainRmks", "Lat", "Lon")

rain_df <- str_match(rain,
                     pattern = sprintf("^%s%s\\s+%s\\s+%s\\s*%s\\s+%s\\s+%s$",
                                       "(.{29})",
                                       "(.{19})",
                                       "(.{0,12})",
                                       "(\\d{1,2}\\.\\d{2})",
                                       "(I)*",
                                       "(\\d{1,2}\\.\\d{2})",
                                       "-*(\\d{2,3}\\.\\d{2})")) %>%
  as_tibble(.name_repair = "minimal") %>%
  set_names(nm = rain_df_names) %>%
  mutate_at(.vars = c("Rain", "Lat", "Lon"), .funs = as.numeric) %>%
  mutate(Lon = if_else(Lon > 0, Lon * -1, Lon)) %>%
  select(-txt)

Tornadoes

Section F lists reported tornadoes for each region of responsibility. Some NWS offices reported no tornado observations. Others, such as Houston, reported at least a dozen.

An example observation is as follows:

4 NNE SEADRIFT               CALHOUN          25/2114          EF0   
28.43  -96.67

FACEBOOK PHOTOS AND VIDEO SHOWED A BRIEF TORNADO TOUCHED DOWN ON
GATES ROAD NEAR SEADRIFT. A SHED AND CARPORT WERE DESTROYED AND A
FEW TREES WERE BLOWN DOWN. RATED EF0. 

Each observation, again, was preceded and followed by an empty line. The following fields were extracted:

  • Location (“4 NNE SEADRIFT”)

  • County (“CALHOUN”)

  • Date (“25/2114”)

  • Scale (“EF0”)

  • Lat (“28.43”)

  • Lon (“-96.67”)

  • Details (“FACEBOOK PHOTOS…”)

Extracting the first two lines used the same technique as before. However, extracting the Details required a little creativity, since many spread across several lines.

To do this, I took the values of tor_y_n, which marked the indices of the latitude/longitude positions. Then I created the vector tor_z_n to identify all “^\s+” elements in the original vector, tor_raw. With this, I was able to identify the first “^\s+” index following each latitude/longitude line, t_a, and then take the very next such index, t_b.

Apologies for the non-descriptive names; I was lacking ingenuity.

With t_a and t_b, I used map2 to work through tor_raw and extract each subset, as sketched below. From that point, some cleaning brought the lines together as needed. If you have any better approaches (even slower, but more creative, ones), I would love to hear them! I may have tried to get too creative with this task.
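
A minimal sketch of that approach; the section bounds, latitude pattern, and final collapse are assumptions, since the original code was not shown:

# Subset Section F (tornadoes); the "G." bound is assumed, as with slp_raw
tor_raw <- map(txt, ~.x[grep("^F\\.", .x):grep("^G\\.", .x)]) %>%
  flatten_chr()

# Indices of latitude/longitude lines (tor_y_n) and of blank lines (tor_z_n)
tor_y_n <- str_which(tor_raw, "^\\d{1,2}\\.\\d{2}\\s+-*\\d{2,3}\\.\\d{2}")
tor_z_n <- str_which(tor_raw, "^\\s*$")

# t_a: first blank line after each lat/lon line; t_b: the next blank line
t_a <- map_int(tor_y_n, ~min(tor_z_n[tor_z_n > .x]))
t_b <- map_int(t_a, ~min(tor_z_n[tor_z_n > .x]))

# Extract the Details text between the two blank lines and collapse it
details <- map2_chr(
  t_a, t_b,
  ~str_c(str_trim(tor_raw[(.x + 1):(.y - 1)]), collapse = " ")
)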

The regex was very similar to the rain regex, as most items were evenly delimited.
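
That regex was not shown either; here is a hypothetical reconstruction, assuming a tor vector built like rain (each first line joined with its latitude/longitude line) and column widths guessed from the example above:

# Join each first line with its lat/lon line, as with rain (separator assumed)
tor <- str_c(tor_raw[tor_y_n - 1], " ", tor_raw[tor_y_n])

# Hypothetical reconstruction -- field widths are inferred, not confirmed
tor_df <- str_match(tor,
                    pattern = sprintf("^%s%s\\s*%s\\s+%s\\s+%s\\s+-*%s$",
                                      "(.{29})",             # Location
                                      "(.{17})",             # County
                                      "(\\d{2}/\\d{4})",     # Date
                                      "(EF\\d)",             # Scale
                                      "(\\d{1,2}\\.\\d{2})", # Lat
                                      "(\\d{2,3}\\.\\d{2})")) %>%  # Lon
  as_tibble(.name_repair = "minimal") %>%
  set_names(nm = c("txt", "Location", "County", "Date", "Scale",
                   "Lat", "Lon")) %>%
  mutate(Details = details) %>%  # assumes one Details entry per observation
  select(-txt)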

devtools::session_info()
─ Session info ──────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.2 (2018-12-20)
 os       Ubuntu 16.04.5 LTS          
 system   x86_64, linux-gnu           
 ui       X11                         
 language (EN)                        
 collate  C.UTF-8                     
 ctype    C.UTF-8                     
 tz       Etc/UTC                     
 date     2019-03-01                  

─ Packages ──────────────────────────────────────────────────────────────
 package     * version date       lib source        
 assertthat    0.2.0   2017-04-11 [1] RSPM (R 3.5.2)
 backports     1.1.3   2018-12-14 [1] RSPM (R 3.5.2)
 callr         3.1.1   2018-12-21 [1] RSPM (R 3.5.2)
 cli           1.0.1   2018-09-25 [1] RSPM (R 3.5.2)
 crayon        1.3.4   2017-09-16 [1] RSPM (R 3.5.2)
 desc          1.2.0   2018-05-01 [1] RSPM (R 3.5.2)
 devtools      2.0.1   2018-10-26 [1] RSPM (R 3.5.2)
 digest        0.6.18  2018-10-10 [1] RSPM (R 3.5.2)
 dplyr       * 0.8.0.1 2019-02-15 [1] RSPM (R 3.5.2)
 evaluate      0.13    2019-02-12 [1] RSPM (R 3.5.2)
 fs            1.2.6   2018-08-23 [1] RSPM (R 3.5.2)
 git2r         0.24.0  2019-01-07 [1] RSPM (R 3.5.2)
 glue        * 1.3.0   2018-07-17 [1] RSPM (R 3.5.2)
 here          0.1     2017-05-28 [1] RSPM (R 3.5.2)
 htmltools     0.3.6   2017-04-28 [1] RSPM (R 3.5.2)
 knitr         1.21    2018-12-10 [1] RSPM (R 3.5.2)
 lubridate   * 1.7.4   2018-04-11 [1] RSPM (R 3.5.2)
 magrittr      1.5     2014-11-22 [1] RSPM (R 3.5.2)
 memoise       1.1.0   2017-04-21 [1] RSPM (R 3.5.2)
 pillar        1.3.1   2018-12-15 [1] RSPM (R 3.5.2)
 pkgbuild      1.0.2   2018-10-16 [1] RSPM (R 3.5.2)
 pkgconfig     2.0.2   2018-08-16 [1] RSPM (R 3.5.2)
 pkgload       1.0.2   2018-10-29 [1] RSPM (R 3.5.2)
 prettyunits   1.0.2   2015-07-13 [1] RSPM (R 3.5.2)
 processx      3.2.1   2018-12-05 [1] RSPM (R 3.5.2)
 ps            1.3.0   2018-12-21 [1] RSPM (R 3.5.2)
 purrr       * 0.3.0   2019-01-27 [1] RSPM (R 3.5.2)
 R6            2.4.0   2019-02-14 [1] RSPM (R 3.5.2)
 Rcpp          1.0.0   2018-11-07 [1] RSPM (R 3.5.2)
 remotes       2.0.2   2018-10-30 [1] RSPM (R 3.5.2)
 rlang         0.3.1   2019-01-08 [1] RSPM (R 3.5.2)
 rmarkdown     1.11    2018-12-08 [1] RSPM (R 3.5.2)
 rprojroot     1.3-2   2018-01-03 [1] RSPM (R 3.5.2)
 sessioninfo   1.1.1   2018-11-05 [1] RSPM (R 3.5.2)
 stringi       1.3.1   2019-02-13 [1] RSPM (R 3.5.2)
 stringr     * 1.4.0   2019-02-10 [1] RSPM (R 3.5.2)
 tibble      * 2.0.1   2019-01-12 [1] RSPM (R 3.5.2)
 tidyselect    0.2.5   2018-10-11 [1] RSPM (R 3.5.2)
 usethis       1.4.0   2018-08-14 [1] RSPM (R 3.5.2)
 whisker       0.3-2   2013-04-28 [1] RSPM (R 3.5.2)
 withr         2.1.2   2018-03-15 [1] RSPM (R 3.5.2)
 workflowr     1.2.0   2019-02-14 [1] RSPM (R 3.5.2)
 xfun          0.5     2019-02-20 [1] RSPM (R 3.5.2)
 yaml          2.2.0   2018-07-25 [1] RSPM (R 3.5.2)

[1] /home/rstudio-user/R/x86_64-pc-linux-gnu-library/3.5
[2] /opt/R/3.5.2/lib/R/library