Cleaning the Data frame

Steve Jones - SSC Editor

SSC Guru

Points: 728159
August 7, 2019 at 12:00 am

#3664362

Comments posted to this topic are about the item Cleaning the Data frame

Follow me on Twitter: http://www.twitter.com/way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com

Stewart "Arturius" Campbell SSC Guru Points: 72321 · Answer 1

August 7, 2019 at 6:36 am

#3669368

Nice reminder, thanks Steve

____________________________________________
Space, the final frontier? not any more...
All limits henceforth are self-imposed.
“libera tute vulgaris ex”

Carlo Romagnano SSC-Insane Points: 22281 · Answer 2

August 7, 2019 at 7:53 am

#3669383

I found this in the syntax:

"header is set to TRUE if and only if the first row contains one fewer field than the number of columns."

So, because the file has the same number of column titles as columns, the right answer is:

x = read.csv2("Flights.csv",header=FALSE,sep=",",na.strings = "!")

jschmidt 17654 Default port Points: 1474 · Answer 3

August 7, 2019 at 1:16 pm

#3669485

The "one fewer field" guidance is weird to me. I've been using read.csv2 with header=TRUE on files that have the same number of header fields and columns, and it has read many files successfully. I wonder if there is some implied row number field in a csv, or if the guidance just isn't clear.
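
For what it's worth, that case can be sketched as follows. This is only a minimal illustration: the inline string and its values are made up and stand in for a real file, and sep is overridden because read.csv2 expects semicolons by default.

```r
# Made-up inline data standing in for a real CSV file:
# the header line has the same number of fields as the data lines.
txt <- "a,b,c
1,2,3
4,5,6"

# read.csv2 defaults to header = TRUE, so the first line becomes the column names.
df <- read.csv2(text = txt, sep = ",")

print(names(df))  # column names taken from the first line
print(nrow(df))   # number of data rows left after the header
```

Here the first line is consumed as the header and the two remaining lines are data, with no error about field counts.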

Carlo Romagnano SSC-Insane Points: 22281 · Answer 4

August 7, 2019 at 1:34 pm

#3669489

I should try it, but I think that if you specify header explicitly (TRUE or FALSE), the first row contains column names either way: TRUE means fewer names than columns, FALSE means the same number of names and columns.

Steve Jones - SSC Editor SSC Guru Points: 728159 · Answer 5

August 7, 2019 at 2:57 pm

#3669518

Not sure that works.

[Screenshot: 2019-08-07 08_56_47-RStudio]


George Vobr SSCrazy Eights Points: 9606 · Answer 6

The syntax description in the reference states that header is a logical value indicating whether the file contains the names of the variables as its first line. If it is missing, the value is determined from the file format: header is set to TRUE if and only if the first row contains one fewer field than the number of columns.

If header = FALSE is explicitly specified, the first line is always treated as data values. See both examples given above by Steve: the resulting data frame has the default column names V1, V2, V3, V4, and its first row holds the original column names of Flights.csv as data.
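
The two rules above can be sketched side by side. This is a hedged illustration, not a definitive test: the inline strings and the names auto_same, auto_fewer and forced are made up, and read.table is used for the auto-detection part because read.csv2 supplies header = TRUE as its own default.

```r
# Made-up inline data: header line with the same field count as the data lines.
same <- "a,b,c
1,2,3
4,5,6"

# Made-up inline data: header line with one fewer field than the data lines.
fewer <- "a,b
r1,1,2
r2,3,4"

# header is not passed, so read.table decides from the field counts.
auto_same  <- read.table(text = same,  sep = ",")  # first line treated as data
auto_fewer <- read.table(text = fewer, sep = ",")  # first line treated as header

# header = FALSE passed explicitly: the first line is always data.
forced <- read.csv2(text = same, sep = ",", header = FALSE)

print(names(auto_same))      # default names V1, V2, V3
print(names(auto_fewer))     # names from the first line
print(rownames(auto_fewer))  # first column used as row names
print(names(forced))         # default names again
```

In the "one fewer field" case the extra first column is taken as row names, which is why the header can be shorter than the data rows.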

Try a simple text import to check how the header parameter behaves, for example:

1. The header parameter is omitted:
read.csv2(sep = ",", text = "
a,b,c,
1,2,3,4,5
")
The header parameter is missing, and the first row has 4 fields (a, b, c and an empty one after the trailing comma) for 5 columns.
The default header = TRUE applies.
The result is a data frame with a header and one row of data (the leading 1 is the row name):
  a b c X
1 2 3 4 5

2. But you cannot specify a header that is more than one field short (here 3 names for 5 columns):
read.csv2(sep = ",", text = "
a,b,c
1,2,3,4,5
")
Result:
Error: more columns than column names

3. header = FALSE is explicitly specified:
read.csv2(header = FALSE, sep = ",", text = "
a,b
1,2,3,4,5
")
The result is a data frame with the default column names.
The first row of data is padded with NA.

  V1 V2 V3 V4 V5
1  a  b NA NA NA
2  1  2  3  4  5

4. header = TRUE is explicitly specified, with more column names than columns:
read.csv2(header = TRUE, sep = ",", text = "
a,b,c,d,e,f,g,
1,2,3,4,5
")
The result is a data frame whose header is completed with X. The first row of data is padded with NA.
  a b c d e  f  g  X
1 1 2 3 4 5 NA NA NA
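
When an import fails as in example 2, one way to see the mismatch before calling read.csv2 is to count the fields on each line with count.fields from base R. This is a minimal sketch under the assumption that the data is available as a string; the names txt, con and n are made up for illustration.

```r
# Same input as example 2: 3 names in the header, 5 values in the data line.
txt <- "a,b,c
1,2,3,4,5"

# count.fields reads from a file or connection, so wrap the string in one.
con <- textConnection(txt)
n <- count.fields(con, sep = ",")
close(con)

print(n)  # one field count per line, header first

# A header that is more than one field short of the data is the situation
# that produced "more columns than column names" in example 2.
print(max(n[-1]) - n[1])
```

With a real file, the file name can be passed to count.fields directly in place of the connection.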