Get response header
I would like to get the response headers from a GET or POST request.
My example is:
library(httr)
library(RCurl)
url<-'http://www.omegahat.org/RCurl/philosophy.html'
doc<-GET(url)
names(doc)
[1] "url" "handle" "status_code" "headers" "cookies" "content" "times" "config"
but there are no response headers here, only request headers.
The result should look something like this:
Connection:Keep-Alive
Date:Mon, 11 Feb 2013 20:21:56 GMT
ETag:"126a001-e33d-4c12cf2702440"
Keep-Alive:timeout=15, max=100
Server:Apache/2.2.14 (Ubuntu)
Vary:Accept-Encoding
Can I do this in R with the httr/RCurl packages, or is R not enough for this kind of problem?
Edit: I would like to get all response headers. I am mainly interested in the Location response header, which is not in this example.
Edit 2: I forgot to mention the system I work on: Windows 7.
My sessionInfo():
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=Polish_Poland.1250 LC_CTYPE=Polish_Poland.1250 LC_MONETARY=Polish_Poland.1250
[4] LC_NUMERIC=C LC_TIME=Polish_Poland.1250
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rjson_0.2.12 RCurl_1.95-3 bitops_1.0-5 httr_0.2 XML_3.95-0.1
loaded via a namespace (and not attached):
[1] digest_0.6.2 stringr_0.6.2 tools_2.15.2
Solution 1:[1]
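You can do it this way with RCurl's basicHeaderGatherer():
library(RCurl)
# The gatherer's update() callback collects each response header line as it arrives
h <- basicHeaderGatherer()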
doc <- getURI("http://www.omegahat.org/RCurl/index.html", headerfunction = h$update)
h$value()
Which will give you a named vector :
Date Server
"Mon, 11 Feb 2013 20:41:58 GMT" "Apache/2.2.14 (Ubuntu)"
Last-Modified ETag
"Wed, 24 Oct 2012 15:49:35 GMT" "\"3262089-10bf-4ccd0088461c0\""
Accept-Ranges Content-Length
"bytes" "4287"
Vary Content-Type
"Accept-Encoding" "text/html"
status statusMessage
"200" "OK"
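For the Location header specifically, the same approach can be wrapped in a small function. This is a minimal sketch, assuming RCurl is installed: getURI() does not follow redirects unless the followlocation option is set, so the gathered headers belong to the first response, which is where a 3xx Location lives. The function name location_rcurl() is hypothetical, and nobody = TRUE (a libcurl option asking for headers only) is an addition not used in the answer above.

```r
# Hypothetical helper: return the Location header(s) of the first response.
# Assumes RCurl is installed; getURI() does not follow redirects by default.
location_rcurl <- function(url) {
  h <- RCurl::basicHeaderGatherer()
  # nobody = TRUE is passed through to libcurl: fetch headers, skip the body
  RCurl::getURI(url, headerfunction = h$update, nobody = TRUE)
  hdrs <- h$value()
  # Match the field name case-insensitively; character(0) if absent
  unname(hdrs[tolower(names(hdrs)) == "location"])
}

# Usage (needs network access):
# location_rcurl("http://google.com/")
```

Servers may send the field name in any case (HTTP/2 responses use lowercase), which is why the lookup lowercases the names rather than indexing with "Location" directly.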
Solution 2:[2]
Sorry to necro, but you can do this with httr; names() on a response object from httr::GET() only lists its components, so the header values themselves don't show up.
You can use either httr::GET() or httr::HEAD(). The difference is that GET() also retrieves the object at the URI, so HEAD() is the more polite call to the server if you only want to check the headers.
For example:
str(httr::HEAD("https://stackoverflow.com/questions/14820286/get-response-header"))
#> List of 10
#> $ url : chr "https://stackoverflow.com/questions/14820286/get-response-header"
#> $ status_code: int 200
#> $ headers :List of 19
#> ..$ connection : chr "keep-alive"
#> ..$ cache-control : chr "private"
#> ..$ content-type : chr "text/html; charset=utf-8"
#> ..$ content-encoding : chr "gzip"
#> ..$ strict-transport-security: chr "max-age=15552000"
#> ..$ x-frame-options : chr "SAMEORIGIN"
#> ..$ x-request-guid : chr "1ce6cb52-2fd2-43b4-ac62-a577e2554f8e"
#> ..$ feature-policy : chr "microphone 'none'; speaker 'none'"
#> ..$ content-security-policy : chr "upgrade-insecure-requests; frame-ancestors 'self' https://stackexchange.com"
#> ..$ accept-ranges : chr "bytes"
#> ..$ date : chr "Fri, 15 Apr 2022 19:25:01 GMT"
#> ..$ via : chr "1.1 varnish"
#> ..$ x-served-by : chr "cache-lga21933-LGA"
#> ..$ x-cache : chr "MISS"
#> ..$ x-cache-hits : chr "0"
#> ..$ x-timer : chr "S1650050702.976391,VS0,VE11"
#> ..$ vary : chr "Accept-Encoding,Fastly-SSL"
#> ..$ x-dns-prefetch-control : chr "off"
#> ..$ set-cookie : chr "prov=86c2b2fb-5e39-9798-60ba-0f65eee27a7c; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly"
#> ..- attr(*, "class")= chr [1:2] "insensitive" "list"
#> $ all_headers:List of 1
#> ..$ :List of 3
#> .. ..$ status : int 200
#> .. ..$ version: chr "HTTP/1.1"
#> .. ..$ headers:List of 19
#> .. .. ..$ connection : chr "keep-alive"
#> .. .. ..$ cache-control : chr "private"
#> .. .. ..$ content-type : chr "text/html; charset=utf-8"
#> .. .. ..$ content-encoding : chr "gzip"
#> .. .. ..$ strict-transport-security: chr "max-age=15552000"
#> .. .. ..$ x-frame-options : chr "SAMEORIGIN"
#> .. .. ..$ x-request-guid : chr "1ce6cb52-2fd2-43b4-ac62-a577e2554f8e"
#> .. .. ..$ feature-policy : chr "microphone 'none'; speaker 'none'"
#> .. .. ..$ content-security-policy : chr "upgrade-insecure-requests; frame-ancestors 'self' https://stackexchange.com"
#> .. .. ..$ accept-ranges : chr "bytes"
#> .. .. ..$ date : chr "Fri, 15 Apr 2022 19:25:01 GMT"
#> .. .. ..$ via : chr "1.1 varnish"
#> .. .. ..$ x-served-by : chr "cache-lga21933-LGA"
#> .. .. ..$ x-cache : chr "MISS"
#> .. .. ..$ x-cache-hits : chr "0"
#> .. .. ..$ x-timer : chr "S1650050702.976391,VS0,VE11"
#> .. .. ..$ vary : chr "Accept-Encoding,Fastly-SSL"
#> .. .. ..$ x-dns-prefetch-control : chr "off"
#> .. .. ..$ set-cookie : chr "prov=86c2b2fb-5e39-9798-60ba-0f65eee27a7c; domain=.stackoverflow.com; expires=Fri, 01-Jan-2055 00:00:00 GMT; path=/; HttpOnly"
#> .. .. ..- attr(*, "class")= chr [1:2] "insensitive" "list"
#> $ cookies :'data.frame': 1 obs. of 7 variables:
#> ..$ domain : chr "#HttpOnly_.stackoverflow.com"
#> ..$ flag : logi TRUE
#> ..$ path : chr "/"
#> ..$ secure : logi FALSE
#> ..$ expiration: POSIXct[1:1], format: "2054-12-31 18:00:00"
#> ..$ name : chr "prov"
#> ..$ value : chr "86c2b2fb-5e39-9798-60ba-0f65eee27a7c"
#> $ content : raw(0)
#> $ date : POSIXct[1:1], format: "2022-04-15 19:25:01"
#> $ times : Named num [1:6] 0 0.0471 0.1203 0.2368 0.2819 ...
#> ..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
#> $ request :List of 7
#> ..$ method : chr "HEAD"
#> ..$ url : chr "https://stackoverflow.com/questions/14820286/get-response-header"
#> ..$ headers : Named chr "application/json, text/xml, application/xml, */*"
#> .. ..- attr(*, "names")= chr "Accept"
#> ..$ fields : NULL
#> ..$ options :List of 3
#> .. ..$ useragent : chr "libcurl/7.64.1 r-curl/4.3 httr/1.4.2"
#> .. ..$ nobody : logi TRUE
#> .. ..$ customrequest: chr "HEAD"
#> ..$ auth_token: NULL
#> ..$ output : list()
#> .. ..- attr(*, "class")= chr [1:2] "write_memory" "write_function"
#> ..- attr(*, "class")= chr "request"
#> $ handle :Class 'curl_handle' <externalptr>
#> - attr(*, "class")= chr "response"
Created on 2022-04-15 by the reprex package (v2.0.0)
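To pull out the Location header the question asks about, the response can be indexed instead of printed. A minimal sketch, assuming httr 1.x, where responses carry the all_headers element shown above (one entry per response received, redirects included); the function names are hypothetical. The first function turns off redirect following through the libcurl option followlocation, so the 3xx response and its Location are what come back. The second keeps httr's default redirect following and collects Location from each hop.

```r
# Hypothetical helpers; assume httr 1.x is installed.

# Location of the first response only: redirects are not followed.
first_location <- function(url) {
  resp <- httr::HEAD(url, httr::config(followlocation = 0L))
  httr::headers(resp)[["location"]]  # NULL if the server sent none
}

# Location of every hop when httr follows redirects (its default).
redirect_locations <- function(url) {
  resp <- httr::HEAD(url)
  locs <- vapply(resp$all_headers, function(hop) {
    loc <- hop$headers[["location"]]  # httr stores header names in lowercase
    if (is.null(loc)) NA_character_ else loc
  }, character(1))
  locs[!is.na(locs)]
}

# Usage (needs network access):
# first_location("http://google.com/")
# redirect_locations("http://google.com/")
```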
Solution 3:[3]
Outside R, curl -I sends a HEAD request and prints only the response headers:
$ curl -I http://www.google.com
HTTP/1.1 200 OK
Date: Mon, 11 Feb 2013 20:36:06 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=ec3eb1b4b4f31100:FF=0:TM=1360614966:LM=1360614966:S=EjQCjjdv07A6PRtw; expires=Wed, 11-Feb-2015 20:36:06 GMT; path=/; domain=.google.com
Set-Cookie: NID=67=neiRZQ9fctd6NqzdKNdRMzfBqk-yAaxxxruYrnsvTcJeG7q8TJm5Ybv1UZ2ZV_ZheYhy-RwgAppHUh1VhIz4KOcFbcl8-0DvtPYXxaiSQmYvXGEKqeh4glhqvhOdxJKB; expires=Tue, 13-Aug-2013 20:36:06 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Transfer-Encoding: chunked
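With -v, curl prints the whole exchange: lines starting with > are request headers and lines starting with < are response headers, including the Location of this 301 redirect: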
$ curl -v http://google.com/
* About to connect() to google.com port 80 (#0)
* Trying 66.102.7.104... connected
* Connected to google.com (66.102.7.104) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.16.4 (i386-apple-darwin9.0) libcurl/7.16.4 OpenSSL/0.9.7l zlib/1.2.3
> Host: google.com
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Location: http://www.google.com/
< Content-Type: text/html; charset=UTF-8
< Date: Thu, 15 Jul 2010 06:06:52 GMT
< Expires: Sat, 14 Aug 2010 06:06:52 GMT
< Cache-Control: public, max-age=2592000
< Server: gws
< Content-Length: 219
< X-XSS-Protection: 1; mode=block
<
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/">here</A>.
</BODY></HTML>
* Connection #0 to host google.com left intact
* Closing connection #0
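To reuse that command-line output in R, one option is to capture it with system2() and split the lines into name/value pairs. A minimal sketch, assuming a curl executable is on the PATH (not a given on Windows 7); parse_header_lines() is a hypothetical helper, not part of RCurl or httr.

```r
# Hypothetical helper: parse raw HTTP response header lines, as printed by
# `curl -I`, into a named character vector with lowercased field names.
parse_header_lines <- function(lines) {
  lines <- sub("\r$", "", lines)          # strip trailing carriage returns
  lines <- lines[nzchar(lines)]           # drop blank separator lines
  is_status <- grepl("^HTTP/", lines)     # e.g. "HTTP/1.1 301 Moved Permanently"
  fields <- lines[!is_status & grepl(":", lines, fixed = TRUE)]
  # Split on the first colon only, since values such as URLs contain colons
  field_names <- tolower(sub(":.*$", "", fields))
  values <- trimws(sub("^[^:]*:", "", fields))
  out <- stats::setNames(values, field_names)
  attr(out, "status") <- lines[is_status]
  out
}

# Usage, assuming curl is on the PATH and network access is available:
# raw <- system2("curl", c("-sI", "http://google.com/"), stdout = TRUE)
# parse_header_lines(raw)[["location"]]
```

Repeated fields such as Set-Cookie keep every value under the same name; indexing with [[ returns only the first one.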
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | juba |
Solution 2 | Joe Wasserman |
Solution 3 | JZ. |