
Translator Proxy shuts down after disconnecting downstream miner #1426

Closed
GitGab19 opened this issue Jan 31, 2025 · 7 comments · Fixed by #1427
Labels: translator proxy (SV1>SV2 translator proxy issues)
Milestone: 1.3.0

Comments

@GitGab19
Collaborator

While testing some PRs, I noticed that the Translator Proxy shuts down gracefully if a downstream miner pointed at it is killed.

I'm pretty sure this behaviour was introduced in PR #1319, but further investigation is needed.

What we want to achieve is a Translator Proxy that closes the connection with a downstream that shuts down, without shutting itself down too.

Here are some logs showing the issue described above:

[image: Translator Proxy log output showing the shutdown after a downstream disconnect]
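
To illustrate the intended behavior, here is a minimal sketch (plain tokio, not the actual translator_sv2 code; the port and message handling are placeholders) of handling a downstream disconnect per connection: the I/O error is consumed inside that downstream's own task, which drops only that connection instead of signalling a global shutdown.

// Hypothetical sketch of the desired behavior, not translator_sv2 code:
// each downstream gets its own task, and an I/O error on that task only
// tears down that one connection.
use tokio::io::AsyncReadExt;
use tokio::net::TcpListener;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:34255").await?;
    loop {
        let (mut socket, peer) = listener.accept().await?;
        // one task per downstream: if this task ends, only this connection is dropped
        tokio::spawn(async move {
            let mut buf = [0u8; 1024];
            loop {
                match socket.read(&mut buf).await {
                    Ok(0) | Err(_) => {
                        // downstream disconnected or errored: clean up this
                        // connection only, do NOT signal a global shutdown
                        eprintln!("downstream {peer} disconnected, dropping it");
                        break;
                    }
                    // placeholder: handle the SV1 message here
                    Ok(_n) => {}
                }
            }
        });
    }
}
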
GitGab19 added the translator proxy label on Jan 31, 2025
GitGab19 added this to the 1.3.0 milestone on Jan 31, 2025
@plebhash
Collaborator

plebhash commented Jan 31, 2025

I started drafting an integration test to reproduce this bug, but soon realized that we still lack some primitives:

#[tokio::test]
async fn tproxy_survives_downstream_disconnect() {
    let (_tp, tp_addr) = start_template_provider(None).await;
    let (_pool, pool_addr) = start_pool(Some(tp_addr)).await;
    let (_, tproxy_addr) = start_sv2_translator(pool_addr).await;

    // connect first mining device
    // note: this function doesn't return anything
    // ideally we need something that allows us to kill the mining device
    start_mining_device_sv1(tproxy_addr).await;

    // tproxy still up
    assert!(tokio::net::TcpListener::bind(tproxy_addr).await.is_err());

    // connect second mining device
    start_mining_device_sv1(tproxy_addr).await;

    // tproxy still up
    assert!(tokio::net::TcpListener::bind(tproxy_addr).await.is_err());

    // here need something to kill the first mining device

    // tproxy still up (if panic, we still have the bug)
    assert!(tokio::net::TcpListener::bind(tproxy_addr).await.is_err());
}
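
A killable handle for the mining device seems to be the missing primitive. A hypothetical sketch of one, assuming start_mining_device_sv1 takes a std::net::SocketAddr and keeps running until aborted (this helper does not exist in the test utilities yet):

// hypothetical helper (not an existing test utility): run the SV1 mining
// device on its own tokio task and hand back the JoinHandle so the test can
// abort() it later to simulate killing the miner
async fn start_killable_mining_device_sv1(
    tproxy_addr: std::net::SocketAddr,
) -> tokio::task::JoinHandle<()> {
    tokio::spawn(async move {
        // assumes start_mining_device_sv1 runs until the task is aborted
        start_mining_device_sv1(tproxy_addr).await;
    })
}

// usage inside the test:
// let device_a = start_killable_mining_device_sv1(tproxy_addr).await;
// ...
// device_a.abort(); // "kill the first mining device"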

@plebhash
Collaborator

plebhash commented Jan 31, 2025

I can confirm this was introduced by #1319

here's what I did:

  • launch TP + Pool + tProxy + Sv1 CPU miner (all from main branch)
  • kill Sv1 CPU miner (tProxy dies too, which is the bug reported here)
  • git revert 80b67d42955827eb2016460c48406e2277e5d896 (reverting the changes from "fix sigterm issue with translator sv2" #1319)
  • launch tProxy + Sv1 CPU miner again
  • kill Sv1 CPU miner (tProxy stays alive, which is the sane behavior)

cc @Shourya742

@plebhash
Collaborator

an alternative way to reproduce the bug (using netcat, or nc, instead of a CPU miner):

  • launch TP + Pool + tProxy
  • nc localhost 34255
  • ctrl+c on nc

this is just to show that we don't really need a CPU miner to reproduce this

@plebhash
Collaborator

plebhash commented Jan 31, 2025

the comment above was just to show that there's a simpler way to reproduce this bug as an integration test: simply emulate a downstream with a TcpStream::connect on the tProxy listening port

btw, I enabled logs by cherry-picking from #1430 to get a better understanding of what's going on

#[tokio::test]
async fn tproxy_survives_downstream_disconnect() {
    tracing_subscriber::fmt()
        .with_env_filter(EnvFilter::from_default_env())
        .init();
    let (_tp, tp_addr) = start_template_provider(None).await;
    let (_pool, pool_addr) = start_pool(Some(tp_addr)).await;
    let (_, tproxy_addr) = start_sv2_translator(pool_addr).await;

    // emulate first downstream
    let downstream_a = std::net::TcpStream::connect(tproxy_addr).unwrap();

    // emulate second downstream
    let _downstream_b = std::net::TcpStream::connect(tproxy_addr).unwrap();

    // wait a bit to make sure the TCP sockets are processed
    tokio::time::sleep(std::time::Duration::from_secs(2)).await;

    // kill downstream_a
    downstream_a.shutdown(std::net::Shutdown::Both).unwrap();
    drop(downstream_a);

    // wait a bit to make sure the TCP sockets are processed
    tokio::time::sleep(std::time::Duration::from_secs(2)).await;

    // wtf?? tproxy still up??? what happened to the bug?? this should panic!!!
    assert!(tokio::net::TcpListener::bind(tproxy_addr).await.is_err());

    // wtf?? this should panic!!!
    let _downstream_c = std::net::TcpStream::connect(tproxy_addr).unwrap();
}

now the puzzling part... the tProxy port is still taken after the first connection is killed 🤷🏼

we can even see the Shutting down gracefully... message, yet the tProxy port remains taken and we can even connect a new downstream!!!

    Finished test [optimized + debuginfo] target(s) in 0.75s
     Running tests/translator_integration.rs (/Users/plebhash/develop/stratum/roles/target/debug/deps/translator_integration-7a568a53c1b4f7a0)

running 1 test
2025-01-31T23:37:48.591138Z  INFO pool_sv2::template_receiver: Connected to template distribution server at 127.0.0.1:63525
2025-01-31T23:37:48.606100Z  INFO roles_logic_sv2::handlers::common: Received SetupConnectionSuccess: version=2, flags=0
2025-01-31T23:37:48.606298Z  INFO pool_sv2::mining_pool: PUB KEY: Ok([TxOut { value: 0, script_pubkey: Script(OP_0 OP_PUSHBYTES_20 ebe1b7dcc293ccaa0ee743a86f89df8258c208fc) }])
2025-01-31T23:37:48.606467Z  INFO pool_sv2::mining_pool: Starting up pool listener
2025-01-31T23:37:48.606938Z  INFO pool_sv2::mining_pool: Listening for encrypted connection on: 127.0.0.1:63531
2025-01-31T23:37:50.592778Z  INFO roles_logic_sv2::handlers::template_distribution: Received NewTemplate with id: 1, is future: true
2025-01-31T23:37:50.592863Z  INFO roles_logic_sv2::handlers::template_distribution: Received SetNewPrevHash for template: 1
2025-01-31T23:37:50.593519Z  INFO translator_sv2::upstream_sv2::upstream: PROXY SERVER - ACCEPTING FROM UPSTREAM: 127.0.0.1:63531
2025-01-31T23:37:50.594599Z  INFO roles_logic_sv2::handlers::common: Received SetupConnection: version=2, flags=100
2025-01-31T23:37:50.594692Z  INFO roles_logic_sv2::handlers::common: Received SetupConnectionSuccess: version=2, flags=100
2025-01-31T23:37:50.594749Z  INFO translator_sv2: Connected to Upstream!
2025-01-31T23:37:50.594901Z  INFO roles_logic_sv2::handlers::mining: Received OpenExtendedMiningChannel from: ABC with id: 0
2025-01-31T23:37:50.595009Z  INFO roles_logic_sv2::handlers::mining: Received OpenExtendedMiningChannelSuccess with request id: 0 and channel id: 1
2025-01-31T23:37:50.595031Z  INFO translator_sv2::upstream_sv2::upstream: Up: Successfully Opened Extended Mining Channel
2025-01-31T23:37:50.595048Z  INFO roles_logic_sv2::handlers::mining: Received new extended mining job for channel id: 0 with job id: 1 is_future: true
2025-01-31T23:37:50.595054Z  INFO roles_logic_sv2::handlers::mining: Received SetNewPrevHash channel id: 1, job id: 1
2025-01-31T23:37:51.595556Z  INFO translator_sv2::downstream_sv1::downstream: PROXY SERVER - ACCEPTING FROM DOWNSTREAM: 127.0.0.1:63536
2025-01-31T23:37:51.595729Z  INFO translator_sv2::downstream_sv1::downstream: PROXY SERVER - ACCEPTING FROM DOWNSTREAM: 127.0.0.1:63537
2025-01-31T23:37:53.597949Z ERROR translator_sv2::status: Error: Io(Custom { kind: ConnectionAborted, error: "Connection closed by client" })
2025-01-31T23:37:53.598158Z  WARN translator_sv2::downstream_sv1::downstream: Downstream: Shutting down sv1 downstream reader
2025-01-31T23:37:53.598258Z  INFO translator_sv2: HEALTHY message: I/O error: `Custom { kind: ConnectionAborted, error: "Connection closed by client" }
2025-01-31T23:37:53.598286Z  INFO translator_sv2: Shutting down gracefully...
2025-01-31T23:37:53.598369Z  WARN translator_sv2::downstream_sv1::downstream: Downstream: Shutting down sv1 downstream writer: 127.0.0.1:63536
2025-01-31T23:37:55.780389Z ERROR network_helpers_sv2::noise_connection_async_std: Shutting down noise stream reader! Kind(
    UnexpectedEof,
)
test tproxy_survives_downstream_disconnect ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 1 filtered out; finished in 9.76s

@plebhash
Collaborator

@Shourya742 any theories on the comment above?

my instinct is that assert!(tokio::net::TcpListener::bind(tproxy_addr).await.is_err()); is not a very reliable way to assert that tProxy really died
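
For reference, a minimal sketch of an alternative check, with a hypothetical helper name: instead of asserting that bind fails, try to complete a fresh TCP connection to the tProxy within a timeout. Even this is not bulletproof, since the kernel can accept into the listen backlog while the application's accept loop is dead, so it is only a sketch of a direction.

// illustrative helper (not part of the existing test utilities): consider the
// tProxy "up" only if a brand-new TCP connection to it completes within a
// short timeout. caveat: the OS can still complete connections into the
// listen backlog even if the application's accept loop has died
async fn tproxy_accepts_connections(addr: std::net::SocketAddr) -> bool {
    tokio::time::timeout(
        std::time::Duration::from_secs(1),
        tokio::net::TcpStream::connect(addr),
    )
    .await
    .map(|connect_result| connect_result.is_ok())
    .unwrap_or(false)
}

// usage in the test:
// assert!(tproxy_accepts_connections(tproxy_addr).await);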

@Shourya742
Contributor

@Shourya742 any theories on the comment above?

my instinct is that assert!(tokio::net::TcpListener::bind(tproxy_addr).await.is_err()); is not a very reliable way to assert that tProxy really died

Can you cherry-pick the translator commits from #1360?

@plebhash
Collaborator

plebhash commented Feb 1, 2025

@Shourya742 any theories on the comment above?
my instinct is that assert!(tokio::net::TcpListener::bind(tproxy_addr).await.is_err()); is not a very reliable way to assert that tProxy really died

Can you cherry-pick the translator commits from #1360?

ok saw the expected behavior now:

    Finished test [optimized + debuginfo] target(s) in 5.10s
     Running tests/translator_integration.rs (/Users/plebhash/develop/stratum/roles/target/debug/deps/translator_integration-7a568a53c1b4f7a0)

running 1 test
2025-02-01T01:06:32.703516Z  INFO pool_sv2::template_receiver: Connected to template distribution server at 127.0.0.1:50457
2025-02-01T01:06:32.721033Z  INFO roles_logic_sv2::handlers::common: Received SetupConnectionSuccess: version=2, flags=0
2025-02-01T01:06:32.721223Z  INFO pool_sv2::mining_pool: PUB KEY: Ok([TxOut { value: 0, script_pubkey: Script(OP_0 OP_PUSHBYTES_20 ebe1b7dcc293ccaa0ee743a86f89df8258c208fc) }])
2025-02-01T01:06:32.721369Z  INFO pool_sv2::mining_pool: Starting up pool listener
2025-02-01T01:06:32.721939Z  INFO pool_sv2::mining_pool: Listening for encrypted connection on: 127.0.0.1:50464
2025-02-01T01:06:34.704919Z  INFO roles_logic_sv2::handlers::template_distribution: Received NewTemplate with id: 1, is future: true
2025-02-01T01:06:34.705003Z  INFO roles_logic_sv2::handlers::template_distribution: Received SetNewPrevHash for template: 1
2025-02-01T01:06:34.705476Z  INFO translator_sv2::upstream_sv2::upstream: PROXY SERVER - ACCEPTING FROM UPSTREAM: 127.0.0.1:50464
2025-02-01T01:06:34.706425Z  INFO roles_logic_sv2::handlers::common: Received SetupConnection: version=2, flags=100
2025-02-01T01:06:34.706506Z  INFO roles_logic_sv2::handlers::common: Received SetupConnectionSuccess: version=2, flags=100
2025-02-01T01:06:34.706561Z  INFO translator_sv2: Connected to Upstream!
2025-02-01T01:06:34.706711Z  INFO roles_logic_sv2::handlers::mining: Received OpenExtendedMiningChannel from: ABC with id: 0
2025-02-01T01:06:34.706802Z  INFO roles_logic_sv2::handlers::mining: Received OpenExtendedMiningChannelSuccess with request id: 0 and channel id: 1
2025-02-01T01:06:34.706817Z  INFO translator_sv2::upstream_sv2::upstream: Up: Successfully Opened Extended Mining Channel
2025-02-01T01:06:34.706832Z  INFO roles_logic_sv2::handlers::mining: Received new extended mining job for channel id: 0 with job id: 1 is_future: true
2025-02-01T01:06:34.706836Z  INFO roles_logic_sv2::handlers::mining: Received SetNewPrevHash channel id: 1, job id: 1
2025-02-01T01:06:35.708069Z  INFO translator_sv2::downstream_sv1::downstream: PROXY SERVER - ACCEPTING FROM DOWNSTREAM: 127.0.0.1:50469
2025-02-01T01:06:35.708200Z  INFO translator_sv2::downstream_sv1::downstream: PROXY SERVER - ACCEPTING FROM DOWNSTREAM: 127.0.0.1:50470
2025-02-01T01:06:37.709981Z ERROR translator_sv2::status: Error: Io(Custom { kind: ConnectionAborted, error: "Connection closed by client" })
2025-02-01T01:06:37.710168Z  WARN translator_sv2::downstream_sv1::downstream: Downstream: Shutting down sv1 downstream reader
2025-02-01T01:06:37.710256Z  INFO translator_sv2: HEALTHY message: I/O error: `Custom { kind: ConnectionAborted, error: "Connection closed by client" }
2025-02-01T01:06:37.710279Z  INFO translator_sv2: Shutting down gracefully...
2025-02-01T01:06:37.710289Z  WARN translator_sv2: Killed task: "notify_task"
2025-02-01T01:06:37.710299Z  WARN translator_sv2: Killed task: "socket_writer_task"
2025-02-01T01:06:37.710307Z  WARN translator_sv2: Killed task: "socket_reader_task"
2025-02-01T01:06:37.710315Z  WARN translator_sv2: Killed task: "notify_task"
2025-02-01T01:06:37.710323Z  WARN translator_sv2: Killed task: "socket_writer_task"
2025-02-01T01:06:37.710330Z  WARN translator_sv2: Killed task: "socket_reader_task"
2025-02-01T01:06:37.710340Z  WARN translator_sv2: Killed task: "accept_connections"
2025-02-01T01:06:37.710348Z  WARN translator_sv2: Killed task: "handle_downstream_message"
2025-02-01T01:06:37.711790Z  WARN translator_sv2: Killed task: "handle_new_extended_mining_job"
2025-02-01T01:06:37.711805Z  WARN translator_sv2: Killed task: "handle_new_prev_hash"
2025-02-01T01:06:37.711812Z  WARN translator_sv2: Killed task: "handle_submit"
2025-02-01T01:06:37.711818Z  WARN translator_sv2: Killed task: "parse_incoming"
2025-02-01T01:06:37.711824Z  WARN translator_sv2: Killed task: "start_diff_management"
2025-02-01T01:06:37.711830Z  WARN translator_sv2: Killed task: "init task"
2025-02-01T01:06:37.712211Z ERROR network_helpers_sv2::noise_connection_async_std: Shutting down noise stream reader! Kind(
    UnexpectedEof,
)
2025-02-01T01:06:37.712234Z ERROR network_helpers_sv2::noise_connection_tokio: Disconnected from client while reading : early eof - 127.0.0.1:50468
2025-02-01T01:06:37.713542Z ERROR pool_sv2::mining_pool: Downstream 1 disconnected
2025-02-01T01:06:37.713549Z  WARN pool_sv2::mining_pool: Downstream connection dropped
2025-02-01T01:06:37.713584Z ERROR network_helpers_sv2::noise_connection_tokio: Disconnecting from client due to error receiving: receiving from an empty and closed channel - 127.0.0.1:50468
thread 'tproxy_survives_downstream_disconnect' panicked at tests-integration/tests/translator_integration.rs:78:5:
assertion failed: tokio::net::TcpListener::bind(tproxy_addr).await.is_err()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test tproxy_survives_downstream_disconnect ... FAILED

github-project-automation bot moved this from Todo 📝 to Done ✅ in SV2 Roadmap 🛣️ on Feb 1, 2025