-
Notifications
You must be signed in to change notification settings - Fork 14
/
Copy pathTODO
197 lines (167 loc) · 7.15 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
TODO
====
Open Bugs
---------
* filter called after end
https://rt.cpan.org/Ticket/Display.html?id=34409
* RE: [HTTP-Proxy] Re: Broken Pipe Error with IE and NoFork
https://rt.cpan.org/Ticket/Display.html?id=22060
* test 22http.t hangs on Win32
https://rt.cpan.org/Ticket/Display.html?id=21220
* Cannot select source address to use for outgoing connection
https://rt.cpan.org/Ticket/Display.html?id=19098
* I need a way to specify my own match subroutine in push_filter()
https://rt.cpan.org/Ticket/Display.html?id=18637
* Allow buffers in filter_last
https://rt.cpan.org/Ticket/Display.html?id=13877
* $response->content(\&coderef)
https://rt.cpan.org/Ticket/Display.html?id=11480
* HTTP-Proxy-0.13 installaion stops at t\20dummy......
https://rt.cpan.org/Ticket/Display.html?id=7860
Technical
---------
* in the push_filter method, host and path are regexp.
this should be tested and enhanced
* how do we handle filters for:
- CONNECT (i.e. a transparent TCP connection through the proxy)
- SSL (a different socket type, but that's about all)
* Sets of HTTP methods (RFC2616, WebDAV, Microsoft RPC, etc...)
* enable filter bypass
* Add a stash in the proxy (Trelane)
* do something about utf8:
- auto flag utf-8 data ?
- believe the Content-Type
- autocheck with Encode if Content doesn't tell
* add a new response match: status
(match on a scalar / list)
$p->push_filter( status => 401, response => $filter );
$p->push_filter( status => [ 301, 302 ], response => $filter );
* Look at the test daemon in WWW::Mechanize
* push_filter should:
1) handle negative criteria (nohost => 'foo\\.com' );
2) compute the match sub as an eval string so a to be faster
(less useless tests)
* support for pipeline (sending serveral requests in a row, and receiving
the answers likewise. Answers are determined either by content-length
or using chunked encoding
Tests
-----
* tests for push_filter( query => 'foo' )
* test the short-circuit system ($self->proxy->response( $res ))
* test the regex for host and path in push_filter
Examples
--------
* http://www.alltooflat.com/geeky/elgoog/
Features
--------
Here are all the features I plan to add to HTTP::Proxy, in the form of a
roadmap.
Things are listed by big features, since I seem to be unable to follow
my own roadmaps. :-/
* Windows support
- the HPE::NoFork is the only engine correctly working under Win32
* have a look at Log::Dispatch
* RFC support
- Answer to TRACE and OPTIONS request
- support for chunked encoding
=> it's already in HTTP::Daemon, but the send_response
require to have either the whole content (not practical
for a proxy) or a callback (but we can't use both LWP::UA
and HTTP::Daemon callback systems at the same time
(some code will be cut and pasted)
=> should also use the send_redirect and send_error methods
when possible
- obey the RFC regarding Connection (sections 8. and 14.10)
=> send a Connection: close when the proxy decides to close
the connecion
=> Remove the headers listed in the Connection header (14.10)
From RFC 2068
If the Keep-Alive header is sent, the corresponding connection token
MUST be transmitted. The Keep-Alive header MUST be ignored if
received without the connection token.
From RFC 2616
Note: Mozilla sends stuff like this:
Keep-Alive: 300
Proxy-Connection: keep-alive
* Proxy Control
- Control the proxy via specific URIs
- Commands are called asking for http://proxy/cmd
(the host part of the URI should be configurable)
where cmd is a key to a dispatch table
- Example commands
* status => return some debug information
* config => live config management (uses forms)
* should be able to add more in inheriting classes
=> as an example, we could rewrite RGS's biscuit script
- all this requires CGI/HTML, which I don't want to write or maintain
the program should automatically create the configuration forms
and handle the HTML/CSS stuff
- configuration commands are handled by the parent process, so that
future forked process obey the new configuration
- how do subclassed HTTP::Proxy modules handle this?
- can someone add his own commands by modifying either the class or
the instance dispatch table?
- the parent process preforks children, when a control URL is requested
the child can write data to a special file/pipe and kill HUP the
parent so that it reads the new information
=> control system probably means having a HTTP::Proxy::Control module...
* Special response handlers
- must support: standard proxyied request, OPTIONS, TRACE, userdefined
- these methods have the following prototype: method($req, ...)
- they will be pushed by response_handler( GET => sub {} )
(default: PROXY, OPTIONS, TRACE)
=> this can be useful if one want to create a caching proxy
(though that might be complicated)
* Filters
- Must support the Content-Encoding: gzip headers
- Accept-Encoding: content-coding tokens are identity, gzip, compress, deflate
- add some "standard" filters, like a "recorder" for building web bots
- we could add filter factories as modules (HTTP::Proxy::Filter::*)
this could give some reflection capabilities to the system
(like: naming the filters, giving the parameters used to create them
and so on)
- have a look at Apache::Filter (mod_perl 2)
* Data flow
The system could even be more sophisticated by allowing one to
indicate that a filter does not change the body size (s/foo/bar/g vs.
s/foo/squigglewiggle/g). This way a Content-Length could be sent
with the other headers.
* On the fly forwarding
- I suppose there is a problem also with gzip Transfered files, if
we forward them as is, we can't filter them (need to think more
about that)
* record sessions
- the logs are not enough
- this should probably be a "standard" filter
- we want to be able to completely store a browsing session, probably
as a tree, for example to use them as a base for WWW::Mechanize::Builder
* https and CONNECT support
- proxy a https site, so as to filter and record, just as we do for
http requests.
- specific URLs ?
=> example: http://proxy/https://www.secure.com:443/index.html
=> This kind of stuff requires some content rewriting...
- add a filter/handler for the CONNECT method, which would either
simply proxy the TCP connection, or do a "man in the middle"-like
connection (with Crypt::SSLeay)
* other protocols
- support for ftp
- support for gopher
gopher://gopher.tc.umn.edu/
gopher://marvel.loc.gov/
* special headers
- handle big requests (see HTTP::Daemon::get_request documentation)
Better left to other modules
- Browsing recording in a tree-like structure (thanks to Referer:)
=> maybe that should be left to a WWW::Mechanize::Builder module
- become a POE component (OK, maybe that's another project)
=> POE::Component::?::?
Useful documentation:
http://www.stonehenge.com/merlyn/WebTechniques/col11.html
http://www.foad.org/~abigail/Perl/proxy.pl
http://rgarciasuarez.free.fr/perl/biscuit
http://www.jmarshall.com/tools/cgiproxy/
http://www.research.att.com/~hpk/wsp/
Yet another proxy in Perl:
http://www.schmerg.com/WrapUp.asp?file=HttpSniffer.html
http://www.schmerg.com/HttpSniffer.pl.txt