Update Beginners.md

session-replay-tools · Sep 8, 2024 · 2de0307
1 parent 12eec32
Showing 1 changed file with 1 addition and 27 deletions.
diff --git a/Beginners.md b/Beginners.md
@@ -2,8 +2,6 @@
 
 With the rapid development of internet technology, server-side architectures have become increasingly complex. It is now difficult to rely solely on the personal experience of developers or testers to cover all possible business scenarios. Therefore, real online traffic is crucial for server-side testing. TCPCopy [1] is an open-source traffic replay tool that has been widely adopted by large enterprises. While many use TCPCopy for testing in their projects, they may not fully understand its underlying principles. This article provides a brief introduction to how TCPCopy works, with the hope of assisting readers.
 
-
-
 # Architecture
 
 The architecture of TCPCopy has undergone several upgrades, and this article introduces the latest 1.0 version. As shown in the diagram below, TCPCopy consists of two components: *tcpcopy* and *intercept*. *tcpcopy* runs on the online server, capturing live TCP request packets, modifying the TCP/IP header information, and sending them to the test server, effectively "tricking" the test server. *intercept* runs on an auxiliary server, handling tasks such as relaying response information back to *tcpcopy*.
@@ -24,22 +22,14 @@ The simplified interaction process is as follows:
 
 5. *tcpcopy* receives and processes the returned data.
 
-
-
-
-
 # Technical Principles
 
 TCPCopy operates in two modes: online and offline. The online mode is primarily used for real-time capturing of live request packets, while the offline mode reads request packets from pcap-format files. Despite the difference in working modes, the core principles remain the same. This section provides a detailed explanation of TCPCopy's core principles from several perspectives.
 
-
-
-## **1. **Packet Capturing and Sending
+## 1. Packet Capturing and Sending
 
 The core functions of *tcpcopy* can be summarized as "capturing" and "sending" packets. Let's begin with packet capturing. How do you capture real traffic from the server? Many people may feel confused when first encountering this question. In fact, Linux operating systems already provide the necessary functionality, and a solid understanding of advanced Linux network programming is all that's needed. The initialization of packet capturing and sending in *tcpcopy* is handled in the `tcpcopy/src/communication/tc_socket.c` file. Next, we will introduce the two methods *tcpcopy* uses for packet capturing and packet sending.
 
-
-
 ### Raw Socket
 
 A raw socket can receive packets from the network interface card on the local machine. This is particularly useful for monitoring and analyzing network traffic. The code for initializing raw socket packet capturing in *tcpcopy* is shown below, and this method supports capturing packets at both the data link layer and the IP layer.
@@ -123,16 +113,12 @@ tc_raw_socket_out_init(void)
 
 ```
 
-
-
 Construct the complete packet and send it to the target server.
 
 - `dst_addr` is filled with the target IP address.
 - The IP header is populated with the source and destination IP addresses.
 - The TCP header is filled with the source port, destination port, and other relevant information.
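
The header rewrite described above can be illustrated with a small, self-contained sketch. It is not taken from the *tcpcopy* sources: the function names `inet_checksum` and `rewrite_ipv4_addrs` are hypothetical, and the sketch assumes a plain 20-byte IPv4 header without options. It shows only the one step that any address rewrite forces, namely recomputing the IP header checksum (RFC 1071) after the source and destination addresses change.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 1071 Internet checksum over `len` bytes. The result is the 16-bit
 * value that belongs in the checksum field, read as a big-endian word. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t   i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t) data[i] << 8) | data[i + 1];
    }
    if (i < len) {                      /* odd trailing byte, zero-padded */
        sum += (uint32_t) data[i] << 8;
    }
    while (sum >> 16) {                 /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    }
    return (uint16_t) ~sum;
}

/* Hypothetical helper (assumption, not a tcpcopy function): overwrite the
 * source and destination addresses of a 20-byte IPv4 header and refresh
 * its checksum. Addresses are 4 bytes each, in network byte order. */
void rewrite_ipv4_addrs(uint8_t hdr[20], const uint8_t src[4],
                        const uint8_t dst[4])
{
    uint16_t csum;

    memcpy(hdr + 12, src, 4);           /* bytes 12..15: source address */
    memcpy(hdr + 16, dst, 4);           /* bytes 16..19: destination address */

    hdr[10] = 0;                        /* checksum field must be zero */
    hdr[11] = 0;                        /* while the sum is computed */
    csum = inet_checksum(hdr, 20);
    hdr[10] = (uint8_t) (csum >> 8);
    hdr[11] = (uint8_t) (csum & 0xff);
}
```

Because the TCP checksum covers a pseudo-header that includes both IP addresses, a real implementation would have to recompute the TCP checksum as well; that step is omitted here to keep the sketch short.
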
 
-
-
 ### Pcap
 
 Pcap is an application programming interface (API) provided by the operating system for capturing network traffic, with its name derived from 'packet capture.' On Linux systems, pcap is implemented via libpcap, and most packet capture tools, such as *tcpdump*, use libpcap for capturing traffic.
@@ -220,8 +206,6 @@ tc_pcap_snd_init(char *if_name, int mtu)
 }
 ```
 
-
-
 ### Raw Socket vs. Pcap
 
 Since *tcpcopy* offers two methods, which one is better?
@@ -230,8 +214,6 @@ When capturing packets, we are primarily concerned with the specific packets we
 
 For packet sending, *tcpcopy* uses the raw socket output interface by default, but it can also send packets via pcap_inject (using the `--enable-dlinject` option). The choice of which method to use can be determined based on performance testing in your actual environment.
 
-
-
 ## **2. TCP Protocol Stack**
 
 We know that the TCP protocol is stateful. Although the packet sending mechanism was explained earlier, without establishing an actual TCP connection, the sent packets cannot be truly received by the testing service. In everyday network programming, we typically use the TCP socket interfaces provided by the operating system, which abstract away much of the complexity of TCP states. However, in *tcpcopy*, since we need to modify the source IP and destination IP of the packets to deceive the testing service, the APIs provided by the operating system are no longer sufficient.
@@ -251,8 +233,6 @@ In *tcpcopy*, a session is defined to maintain information for different connect
 - **RST Packet:** If the current session is waiting for the test server's response, the RST packet is not sent. Otherwise, it's sent.
 - **FIN Packet:** If the current session is waiting for the test server's response, it waits; otherwise, the FIN packet is sent.
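
The RST and FIN rules above amount to a small decision function. The sketch below is only an illustration of those two rules as stated in this article; the type and function names (`pkt_kind_t`, `action_t`, `decide_close_packet`) are hypothetical and do not correspond to identifiers in the *tcpcopy* sources.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not tcpcopy identifiers. */
typedef enum { PKT_RST, PKT_FIN } pkt_kind_t;
typedef enum { ACT_SEND, ACT_NOT_SENT, ACT_WAIT } action_t;

/* Decide what to do with a captured RST or FIN packet, given whether the
 * session is still waiting for the test server's response. */
action_t decide_close_packet(pkt_kind_t kind, bool waiting_for_response)
{
    if (!waiting_for_response) {
        return ACT_SEND;                /* nothing pending: forward it */
    }
    /* A response is pending: RST is not sent, FIN waits. */
    return kind == PKT_RST ? ACT_NOT_SENT : ACT_WAIT;
}
```
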
 
-
-
 ## **3. Routing**
 
 After *tcpcopy* sends the request packets, their journey may not be entirely smooth:
@@ -261,8 +241,6 @@ After *tcpcopy* sends the request packets, their journey may not be entirely smo
 - If the test server receives the request packet, the response packet will be sent to the forged IP address. To ensure these response packets don't mistakenly go back to the client with the forged IP, proper routing configuration is necessary. If the routing isn't set up correctly, the response packet won't be captured by *intercept*, leading to incomplete data exchange.
 - After *intercept* captures the response packet, it extracts the information that *tcpcopy* needs and discards the actual data, returning only the response headers and other necessary information to *tcpcopy*. When necessary, it also merges the returned information to reduce the network load on the machine running *tcpcopy*.
 
-
-
 ## **4. Intercept**
 
 For those new to *tcpcopy*, it might be puzzling—why is *intercept* necessary if we already have *tcpcopy*? While *intercept* may seem redundant, it actually plays a crucial role. You can think of *intercept* as the server-side counterpart of *tcpcopy*, with its name itself explaining its function: an "interceptor." But what exactly does *intercept* need to intercept? The answer is the response packet from the test service.
@@ -271,8 +249,6 @@ If *intercept* were not used, the response packets from the test server would be
 
 *intercept* is an independent process that, by default, captures packets using the pcap method. At startup it must be given the `-F` parameter with a filter written in libpcap's filter syntax, for example `-F "tcp and src port 8080"`. This means that *intercept* does not connect directly to the test service; instead, it captures the response packets that match the filter and interacts with *tcpcopy*.
 
-
-
 ## **5. Performance**
 
 *tcpcopy* uses a single-process, single-thread architecture based on an epoll/select event-driven model, with related code located in the `tcpcopy/src/event` directory. By default, epoll is used during compilation, though you can switch to select with the `--select` option. The choice of method can depend on the performance differences observed during testing. Theoretically, epoll performs better when handling a large number of connections.
@@ -297,8 +273,6 @@ static tc_event_actions_t tc_event_actions = {
 };
 ```
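
For readers unfamiliar with epoll, the following is a minimal, Linux-only sketch of a single wait step in an event-driven loop. It is independent of the *tcpcopy* event module: the function name `wait_readable` is hypothetical, and a real event loop would create the epoll instance once and reuse it rather than create one per call.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper (assumption, not a tcpcopy function): wait until
 * `fd` is readable. Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct epoll_event ev, out;
    int                epfd, n;

    epfd = epoll_create1(0);
    if (epfd < 0) {
        return -1;
    }

    memset(&ev, 0, sizeof(ev));
    ev.events  = EPOLLIN;               /* interested in "data to read" */
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }

    n = epoll_wait(epfd, &out, 1, timeout_ms);
    close(epfd);
    return n;
}
```
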
 
-
-
 # Conclusion
 
 TCPCopy is an excellent open-source project. However, due to the author's limitations, this article only covers the core technical principles of TCPCopy, leaving many details untouched [2]. Nevertheless, I hope this introduction provides some inspiration to those interested in TCPCopy and traffic replay technologies!