[Bug] [ir] a simple for loop in function fails #1109

Closed
archibate opened this issue Jun 2, 2020 · 7 comments · Fixed by #1224
Labels: ir (IR related issues), potential bug (Something that looks like a bug but not yet confirmed), welcome contribution

@archibate
Collaborator

Describe the bug

import taichi as ti

ti.init(print_ir=True)

m = ti.var(ti.f32, 3)
x = ti.var(ti.f32, ())

@ti.func
def func(a):
  for j in range(1):
    a = a

@ti.kernel
def kern1():
  a = x[None]
  for i in m:
    func(a)

@ti.kernel
def kern2():
  a = x[None]
  for i in m:
    for j in range(1):
      a = a


kern1() # Error
kern2() # OK

Log/Screenshots

[Taichi] mode=release
[Taichi] version 0.6.7, supported archs: [cpu, cuda, opengl], commit ca4d9dda, python 3.8.2
[I 06/02/20 18:33:39.403] [compile_to_offloads.cpp:operator()@21] Initial IR:
kernel {
  $0 = alloca @tmp4
  @tmp4 = gbl load #@tmp2[]
  $2 : for @tmp5 where S2place_f32 active {
    $3 = alloca @tmp7
    @tmp7 = @tmp4
    $5 : for @tmp8 in range((cast_value<int32> 0), (cast_value<int32> 1)) {
      @tmp7 = @tmp7
    }
  }
}
[I 06/02/20 18:33:39.404] [compile_to_offloads.cpp:operator()@21] Lowered:
kernel {
  $0 = alloca
  <f32 x1> $1 = global ptr [S4place_f32], index [] activate=true
  $2 = global load $1
  $3 : local store [$0 <- $2]
  $4 : for where S1dense active, step 1 {
    $5 = loop $4 index 0
    $6 = alloca
    $7 = local load [ [$0[0]]]
    $8 : local store [$6 <- $7]
    <i32 x1> $9 = const [0]
    $10 = cast_value<i32> $9
    <i32 x1> $11 = const [1]
    $12 = cast_value<i32> $11
    $13 : for in range($10, $12, step 1) {
      $14 = loop $13 index 0
      $15 = local load [ [$6[0]]]
      $16 : local store [$6 <- $15]
    }
  }
}
[I 06/02/20 18:33:39.404] [compile_to_offloads.cpp:operator()@21] Typechecked:
kernel {
  <f32 x1> $0 = alloca
  <f32*x1> $1 = global ptr [S4place_f32], index [] activate=true
  <f32 x1> $2 = global load $1
  <f32 x1> $3 : local store [$0 <- $2]
  $4 : for where S1dense active, step 1 {
    <i32 x1> $5 = loop $4 index 0
    <f32 x1> $6 = alloca
    <f32 x1> $7 = local load [ [$0[0]]]
    <f32 x1> $8 : local store [$6 <- $7]
    <i32 x1> $9 = const [0]
    <i32 x1> $10 = cast_value<i32> $9
    <i32 x1> $11 = const [1]
    <i32 x1> $12 = cast_value<i32> $11
    $13 : for in range($10, $12, step 1) {
      <i32 x1> $14 = loop $13 index 0
      <f32 x1> $15 = local load [ [$6[0]]]
      <f32 x1> $16 : local store [$6 <- $15]
    }
  }
}
[I 06/02/20 18:33:39.404] [compile_to_offloads.cpp:operator()@21] Loop Vectorized:
kernel {
  <f32 x1> $0 = alloca
  <f32*x1> $1 = global ptr [S4place_f32], index [] activate=true
  <f32 x1> $2 = global load $1
  <f32 x1> $3 : local store [$0 <- $2]
  $4 : for where S1dense active, step 1 {
    <i32 x1> $5 = loop $4 index 0
    <f32 x1> $6 = alloca
    <f32 x1> $7 = local load [ [$0[0]]]
    <f32 x1> $8 : local store [$6 <- $7]
    <i32 x1> $9 = const [0]
    <i32 x1> $10 = cast_value<i32> $9
    <i32 x1> $11 = const [1]
    <i32 x1> $12 = cast_value<i32> $11
    $13 : for in range($10, $12, step 1) {
      <i32 x1> $14 = loop $13 index 0
      <f32 x1> $15 = local load [ [$6[0]]]
      <f32 x1> $16 : local store [$6 <- $15]
    }
  }
}
[I 06/02/20 18:33:39.404] [compile_to_offloads.cpp:operator()@21] Loop Split:
kernel {
  <f32 x1> $0 = alloca
  <f32*x1> $1 = global ptr [S4place_f32], index [] activate=true
  <f32 x1> $2 = global load $1
  <f32 x1> $3 : local store [$0 <- $2]
  $4 : for where S1dense active, step 1 {
    <i32 x1> $5 = loop $4 index 0
    <f32 x1> $6 = alloca
    <f32 x1> $7 = local load [ [$0[0]]]
    <f32 x1> $8 : local store [$6 <- $7]
    <i32 x1> $9 = const [0]
    <i32 x1> $10 = cast_value<i32> $9
    <i32 x1> $11 = const [1]
    <i32 x1> $12 = cast_value<i32> $11
    $13 : for in range($10, $12, step 1) {
      <i32 x1> $14 = loop $13 index 0
      <f32 x1> $15 = local load [ [$6[0]]]
      <f32 x1> $16 : local store [$6 <- $15]
    }
  }
}
[I 06/02/20 18:33:39.404] [compile_to_offloads.cpp:operator()@21] Simplified I:
kernel {
  <f32 x1> $0 = alloca
  <f32*x1> $1 = global ptr [S4place_f32], index [] activate=true
  <f32 x1> $2 = global load $1
  <f32 x1> $3 : local store [$0 <- $2]
  $4 : for where S1dense active, step 1 {
    <i32 x1> $5 = loop $4 index 0
    <f32 x1> $6 = alloca
    <f32 x1> $7 = local load [ [$0[0]]]
    <f32 x1> $8 : local store [$6 <- $7]
    <i32 x1> $9 = const [0]
    <i32 x1> $10 = const [1]
    $11 : for in range($9, $10, step 1) {
      <i32 x1> $12 = loop $11 index 0
      <f32 x1> $13 = local load [ [$6[0]]]
      <f32 x1> $14 : local store [$6 <- $13]
    }
  }
}
[I 06/02/20 18:33:39.404] [compile_to_offloads.cpp:operator()@21] Dense struct-for demoted:
kernel {
  <f32 x1> $0 = alloca
  <f32*x1> $1 = global ptr [S4place_f32], index [] activate=true
  <f32 x1> $2 = global load $1
  <f32 x1> $3 : local store [$0 <- $2]
  <i32 x1> $4 = const [0]
  <i32 x1> $5 = const [4]
  $6 : for in range($4, $5, step 1) {
    <i32 x1> $7 = const [0]
    <i32 x1> $8 = loop $6 index 0
    <i32 x1> $9 = const [-1]
    <i32 x1> $10 = bit_extract($8 + 0, 0~2)
    <i32 x1> $11 = const [1]
    <i32 x1> $12 = mul $10 $11
    <i32 x1> $13 = add $7 $12
    <i32 x1> $14 = const [3]
    <i32 x1> $15 = cmp_lt $13 $14
    <i32 x1> $16 = bit_and $9 $15
    <i32 x1> $17 = alloca
    <i32 x1> $18 : local store [$17 <- $13]
    $19 : if $16 {
      <i32 x1> $20 = local load [ [$17[0]]]
      <f32 x1> $21 = alloca
      <f32 x1> $22 = local load [ [$0[0]]]
      <f32 x1> $23 : local store [$21 <- $22]
      <i32 x1> $24 = const [0]
      <i32 x1> $25 = const [1]
      $26 : for in range($24, $25, step 1) {
        <i32 x1> $27 = loop $26 index 0
        <f32 x1> $28 = local load [ [$21[0]]]
        <f32 x1> $29 : local store [$21 <- $28]
      }
    }
  }
}
[I 06/02/20 18:33:39.404] [compile_to_offloads.cpp:operator()@21] Constant extracted:
kernel {
  <i32 x1> $0 = const [1]
  <i32 x1> $1 = const [0]
  <i32 x1> $2 = const [3]
  <i32 x1> $3 = const [1]
  <i32 x1> $4 = const [-1]
  <i32 x1> $5 = const [0]
  <f32 x1> $6 = alloca
  <f32*x1> $7 = global ptr [S4place_f32], index [] activate=true
  <f32 x1> $8 = global load $7
  <f32 x1> $9 : local store [$6 <- $8]
  <i32 x1> $10 = const [0]
  <i32 x1> $11 = const [4]
  $12 : for in range($10, $11, step 1) {
    <i32 x1> $13 = loop $12 index 0
    <i32 x1> $14 = bit_extract($13 + 0, 0~2)
    <i32 x1> $15 = mul $14 $3
    <i32 x1> $16 = add $5 $15
    <i32 x1> $17 = cmp_lt $16 $2
    <i32 x1> $18 = bit_and $4 $17
    <i32 x1> $19 = alloca
    <i32 x1> $20 : local store [$19 <- $16]
    $21 : if $18 {
      <i32 x1> $22 = local load [ [$19[0]]]
      <f32 x1> $23 = alloca
      <f32 x1> $24 = local load [ [$6[0]]]
      <f32 x1> $25 : local store [$23 <- $24]
      $26 : for in range($1, $0, step 1) {
        <i32 x1> $27 = loop $26 index 0
        <f32 x1> $28 = local load [ [$23[0]]]
        <f32 x1> $29 : local store [$23 <- $28]
      }
    }
  }
}
[I 06/02/20 18:33:39.405] [compile_to_offloads.cpp:operator()@21] Store forwarded:
kernel {
  <i32 x1> $0 = const [1]
  <i32 x1> $1 = const [0]
  <i32 x1> $2 = const [3]
  <i32 x1> $3 = const [1]
  <i32 x1> $4 = const [-1]
  <i32 x1> $5 = const [0]
  <f32*x1> $6 = global ptr [S4place_f32], index [] activate=true
  <f32 x1> $7 = global load $6
  <i32 x1> $8 = const [0]
  <i32 x1> $9 = const [4]
  $10 : for in range($8, $9, step 1) {
    <i32 x1> $11 = loop $10 index 0
    <i32 x1> $12 = bit_extract($11 + 0, 0~2)
    <i32 x1> $13 = mul $12 $3
    <i32 x1> $14 = add $5 $13
    <i32 x1> $15 = cmp_lt $14 $2
    <i32 x1> $16 = bit_and $4 $15
    $17 : if $16 {
      <f32 x1> $18 = alloca
      <f32 x1> $19 : local store [$18 <- $7]
      $20 : for in range($1, $0, step 1) {
        <i32 x1> $21 = loop $20 index 0
        <f32 x1> $22 = local load [ [$18[0]]]
        <f32 x1> $23 : local store [$18 <- $22]
      }
    }
  }
}
[I 06/02/20 18:33:39.405] [compile_to_offloads.cpp:operator()@21] Access lowered:
kernel {
  <i32 x1> $0 = const [1]
  <i32 x1> $1 = const [0]
  <i32 x1> $2 = const [3]
  <i32 x1> $3 = const [1]
  <i32 x1> $4 = const [-1]
  <i32 x1> $5 = const [0]
  <f32*x1> $6 = global ptr [S4place_f32], index [] activate=true
  <gen*x1> $7 = get root
  <i32 x1> $8 = linearized(ind {}, stride {})
  <gen*x1> $9 = [S0root][root]::lookup($7, $8) activate = false
  <gen*x1> $10 = get child [S0root->S3dense] $9
  <i32 x1> $11 = linearized(ind {}, stride {})
  <gen*x1> $12 = [S3dense][dense]::lookup($10, $11) activate = false
  <f32*x1> $13 = get child [S3dense->S4place_f32] $12
  <f32 x1> $14 = shuffle $13[0]
  <f32 x1> $15 = global load $14
  <i32 x1> $16 = const [0]
  <i32 x1> $17 = const [4]
  $18 : for in range($16, $17, step 1) {
    <i32 x1> $19 = loop $18 index 0
    <i32 x1> $20 = bit_extract($19 + 0, 0~2)
    <i32 x1> $21 = mul $20 $3
    <i32 x1> $22 = add $5 $21
    <i32 x1> $23 = cmp_lt $22 $2
    <i32 x1> $24 = bit_and $4 $23
    $25 : if $24 {
      <f32 x1> $26 = alloca
      <f32 x1> $27 : local store [$26 <- $15]
      $28 : for in range($1, $0, step 1) {
        <i32 x1> $29 = loop $28 index 0
        <f32 x1> $30 = local load [ [$26[0]]]
        <f32 x1> $31 : local store [$26 <- $30]
      }
    }
  }
}
[I 06/02/20 18:33:39.405] [compile_to_offloads.cpp:operator()@21] DIE:
kernel {
  <i32 x1> $0 = const [1]
  <i32 x1> $1 = const [0]
  <i32 x1> $2 = const [3]
  <i32 x1> $3 = const [1]
  <i32 x1> $4 = const [-1]
  <i32 x1> $5 = const [0]
  <f32*x1> $6 = global ptr [S4place_f32], index [] activate=true
  <gen*x1> $7 = get root
  <i32 x1> $8 = linearized(ind {}, stride {})
  <gen*x1> $9 = [S0root][root]::lookup($7, $8) activate = false
  <gen*x1> $10 = get child [S0root->S3dense] $9
  <i32 x1> $11 = linearized(ind {}, stride {})
  <gen*x1> $12 = [S3dense][dense]::lookup($10, $11) activate = false
  <f32*x1> $13 = get child [S3dense->S4place_f32] $12
  <f32 x1> $14 = shuffle $13[0]
  <f32 x1> $15 = global load $14
  <i32 x1> $16 = const [0]
  <i32 x1> $17 = const [4]
  $18 : for in range($16, $17, step 1) {
    <i32 x1> $19 = loop $18 index 0
    <i32 x1> $20 = bit_extract($19 + 0, 0~2)
    <i32 x1> $21 = mul $20 $3
    <i32 x1> $22 = add $5 $21
    <i32 x1> $23 = cmp_lt $22 $2
    <i32 x1> $24 = bit_and $4 $23
    $25 : if $24 {
      <f32 x1> $26 = alloca
      <f32 x1> $27 : local store [$26 <- $15]
      $28 : for in range($1, $0, step 1) {
        <f32 x1> $29 = local load [ [$26[0]]]
        <f32 x1> $30 : local store [$26 <- $29]
      }
    }
  }
}
[I 06/02/20 18:33:39.405] [compile_to_offloads.cpp:operator()@21] Simplified II:
kernel {
  <i32 x1> $0 = const [1]
  <i32 x1> $1 = const [0]
  <i32 x1> $2 = const [3]
  <f32*x1> $3 = global ptr [S4place_f32], index [] activate=true
  <gen*x1> $4 = get root
  <gen*x1> $5 = [S0root][root]::lookup($4, $1) activate = false
  <gen*x1> $6 = get child [S0root->S3dense] $5
  <gen*x1> $7 = [S3dense][dense]::lookup($6, $1) activate = false
  <f32*x1> $8 = get child [S3dense->S4place_f32] $7
  <f32 x1> $9 = global load $8
  <i32 x1> $10 = const [4]
  $11 : for in range($1, $10, step 1) {
    <i32 x1> $12 = loop $11 index 0
    <i32 x1> $13 = bit_extract($12 + 0, 0~2)
    <i32 x1> $14 = cmp_lt $13 $2
    $15 : if $14 {
      <f32 x1> $16 = alloca
      <f32 x1> $17 : local store [$16 <- $9]
      $18 : for in range($1, $0, step 1) {
        <f32 x1> $19 = local load [ [$16[0]]]
        <f32 x1> $20 : local store [$16 <- $19]
      }
    }
  }
}
[I 06/02/20 18:33:39.405] [compile_to_offloads.cpp:operator()@21] Access flagged:
kernel {
  <i32 x1> $0 = const [1]
  <i32 x1> $1 = const [0]
  <i32 x1> $2 = const [3]
  <f32*x1> $3 = global ptr [S4place_f32], index [] activate=false
  <gen*x1> $4 = get root
  <gen*x1> $5 = [S0root][root]::lookup($4, $1) activate = false
  <gen*x1> $6 = get child [S0root->S3dense] $5
  <gen*x1> $7 = [S3dense][dense]::lookup($6, $1) activate = false
  <f32*x1> $8 = get child [S3dense->S4place_f32] $7
  <f32 x1> $9 = global load $8
  <i32 x1> $10 = const [4]
  $11 : for in range($1, $10, step 1) {
    <i32 x1> $12 = loop $11 index 0
    <i32 x1> $13 = bit_extract($12 + 0, 0~2)
    <i32 x1> $14 = cmp_lt $13 $2
    $15 : if $14 {
      <f32 x1> $16 = alloca
      <f32 x1> $17 : local store [$16 <- $9]
      $18 : for in range($1, $0, step 1) {
        <f32 x1> $19 = local load [ [$16[0]]]
        <f32 x1> $20 : local store [$16 <- $19]
      }
    }
  }
}
[I 06/02/20 18:33:39.405] [compile_to_offloads.cpp:operator()@21] Constant folded:
kernel {
  <i32 x1> $0 = const [1]
  <i32 x1> $1 = const [0]
  <i32 x1> $2 = const [3]
  <f32*x1> $3 = global ptr [S4place_f32], index [] activate=false
  <gen*x1> $4 = get root
  <gen*x1> $5 = [S0root][root]::lookup($4, $1) activate = false
  <gen*x1> $6 = get child [S0root->S3dense] $5
  <gen*x1> $7 = [S3dense][dense]::lookup($6, $1) activate = false
  <f32*x1> $8 = get child [S3dense->S4place_f32] $7
  <f32 x1> $9 = global load $8
  <i32 x1> $10 = const [4]
  $11 : for in range($1, $10, step 1) {
    <i32 x1> $12 = loop $11 index 0
    <i32 x1> $13 = bit_extract($12 + 0, 0~2)
    <i32 x1> $14 = cmp_lt $13 $2
    $15 : if $14 {
      <f32 x1> $16 = alloca
      <f32 x1> $17 : local store [$16 <- $9]
      $18 : for in range($1, $0, step 1) {
        <f32 x1> $19 = local load [ [$16[0]]]
        <f32 x1> $20 : local store [$16 <- $19]
      }
    }
  }
}
[I 06/02/20 18:33:39.406] [compile_to_offloads.cpp:operator()@21] Offloaded:
kernel {
  $0 = offloaded  {
    <i32 x1> $1 = const [1]
    <i32 x1> $2 = const [0]
    <i32 x1> $3 = const [3]
    <f32*x1> $4 = global ptr [S4place_f32], index [] activate=false
    <gen*x1> $5 = get root
    <gen*x1> $6 = [S0root][root]::lookup($5, $2) activate = false
    <gen*x1> $7 = get child [S0root->S3dense] $6
    <gen*x1> $8 = [S3dense][dense]::lookup($7, $2) activate = false
    <f32*x1> $9 = get child [S3dense->S4place_f32] $8
    <f32 x1> $10 = global load $9
    <i32 x1> $11 = const [4]
  }
  $12 = offloaded range_for(0, 4) block_dim=adaptive {
    <i32 x1> $13 = loop $12 index 0
    <i32 x1> $14 = bit_extract($13 + 0, 0~2)
    <i32 x1> $15 = const [3]
    <i32 x1> $16 = cmp_lt $14 $15
    $17 : if $16 {
      <f32 x1> $18 = alloca
      <f32 x1> $19 : local store [$18 <- $10]
      <i32 x1> $20 = const [0]
      <i32 x1> $21 = const [1]
      $22 : for in range($20, $21, step 1) {
        <f32 x1> $23 = local load [ [$18[0]]]
        <f32 x1> $24 : local store [$18 <- $23]
      }
    }
  }
}
[E 06/02/20 18:33:39.406] [verify.cpp:basic_verify@39] stmt 19 cannot have operand 10.


***********************************
* Taichi Compiler Stack Traceback *                                                          
***********************************                                                          
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::Logger::error(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool)                                                                                   
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::IRVerifier::basic_verify(taichi::lang::Stmt*)                                              
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::IRVerifier::visit(taichi::lang::LocalStoreStmt*)                                           
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::IRVerifier::visit(taichi::lang::Block*)                                                    
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::BasicStmtVisitor::visit(taichi::lang::IfStmt*)                                             
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::IRVerifier::visit(taichi::lang::Block*)                                                    
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::IRVerifier::visit(taichi::lang::Block*)                                                    
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::irpass::analysis::verify(taichi::lang::IRNode*)                                            
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::irpass::compile_to_offloads(taichi::lang::IRNode*, taichi::lang::CompileConfig const&, bool, bool, bool, bool, bool)                                                                    
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::Kernel::lower(bool)                                                                        
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::Program::compile(taichi::lang::Kernel&)                                                    
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::Kernel::compile()                                                                          
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so: taichi::lang::Kernel::operator()()                                                                       
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so(+0x6ec234) [0x7f989ebd6234]                                                                               
/home/bate/.local/lib/python3.8/site-packages/taichi/core/../lib/taichi_core.so(+0x655d50) [0x7f989eb3fd50]                                                                               
/usr/lib/libpython3.8.so.1.0: PyCFunction_Call                                               
/usr/lib/libpython3.8.so.1.0: _PyObject_MakeTpCall                                           
/usr/lib/libpython3.8.so.1.0(+0xfeb1d) [0x7f98aeaaeb1d]                                      
/usr/lib/libpython3.8.so.1.0: PyObject_Call                                                  
/usr/lib/libpython3.8.so.1.0(+0xb121b) [0x7f98aea6121b]                                      
/usr/lib/libpython3.8.so.1.0: _PyObject_MakeTpCall                                           
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault                                       
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName                                       
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall                                         
/usr/lib/libpython3.8.so.1.0: PyObject_Call                                                  
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault                                       
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName                                       
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall                                         
/usr/lib/libpython3.8.so.1.0: _PyObject_FastCallDict                                         
/usr/lib/libpython3.8.so.1.0: _PyObject_Call_Prepend                                         
/usr/lib/libpython3.8.so.1.0(+0x23d0e9) [0x7f98aebed0e9]                                     
/usr/lib/libpython3.8.so.1.0: PyObject_Call                                                  
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault                                       
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName                                       
/usr/lib/libpython3.8.so.1.0: _PyFunction_Vectorcall                                         
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalFrameDefault                                       
/usr/lib/libpython3.8.so.1.0: _PyEval_EvalCodeWithName                                       
/usr/lib/libpython3.8.so.1.0: PyEval_EvalCode                                                
/usr/lib/libpython3.8.so.1.0(+0x2668c8) [0x7f98aec168c8]
/usr/lib/libpython3.8.so.1.0(+0x26aba3) [0x7f98aec1aba3]
/usr/lib/libpython3.8.so.1.0: PyRun_FileExFlags
/usr/lib/libpython3.8.so.1.0: PyRun_SimpleFileExFlags
/usr/lib/libpython3.8.so.1.0: Py_RunMain
/usr/lib/libpython3.8.so.1.0: Py_BytesMain
/usr/lib/libc.so.6: __libc_start_main
python(_start+0x2e) [0x55a034ce805e]
Traceback (most recent call last):
  File "bug.py", line 27, in <module>
    kern1()
  File "/home/bate/.local/lib/python3.8/site-packages/taichi/lang/kernel.py", line 533, in wrapped
    return primal(*args, **kwargs)
  File "/home/bate/.local/lib/python3.8/site-packages/taichi/lang/kernel.py", line 464, in __call__
    return self.compiled_functions[key](*args)
  File "/home/bate/.local/lib/python3.8/site-packages/taichi/lang/kernel.py", line 428, in func__
    t_kernel()
RuntimeError: [verify.cpp:basic_verify@39] stmt 19 cannot have operand 10.

To Reproduce
Run the above code.

If you have local commits (e.g. compile fixes before you reproduce the bug), please make sure you first make a PR to fix the build errors and then report the bug.

@archibate added the "potential bug" label on Jun 2, 2020
@archibate
Collaborator Author

If it's my syntax error, can we improve the error message? I can't do my homework now.

@archibate changed the title from "[Bug] a simple for loop in function fails" to "[Bug] [ir] a simple for loop in function fails" on Jun 2, 2020
@archibate added the "ir" and "welcome contribution" labels on Jun 2, 2020
@archibate
Collaborator Author

@xumingkuan Seems function arguments can't be gtmp?

kernel "kern1" {
  $0 = offloaded  {
    <i32 x1> $1 = const [1]
    <i32 x1> $2 = const [0]
    <i32 x1> $3 = const [3]
    <f32*x1> $4 = global ptr [S4place_f32], index [] activate=false
    <gen*x1> $5 = get root
    <gen*x1> $6 = [S0root][root]::lookup($5, $2) activate = false
    <gen*x1> $7 = get child [S0root->S3dense] $6
    <gen*x1> $8 = [S3dense][dense]::lookup($7, $2) activate = false
    <f32*x1> $9 = get child [S3dense->S4place_f32] $8
    <f32 x1> $10 = global load $9
    <i32 x1> $11 = const [4]
  }
  $12 = offloaded range_for(0, 4) block_dim=adaptive {
    <i32 x1> $13 = loop $12 index 0
    <i32 x1> $14 = bit_extract($13 + 0, 0~2)
    <i32 x1> $15 = const [3]
    <i32 x1> $16 = cmp_lt $14 $15
    $17 : if $16 {
      <f32 x1> $18 = alloca
      <f32 x1> $19 : local store [$18 <- $10]
      <i32 x1> $20 = const [0]
      <i32 x1> $21 = const [1]
      $22 : for in range($20, $21, step 1) {
        <f32 x1> $23 = local load [ [$18[0]]]
        <f32 x1> $24 : local store [$18 <- $23]
      }
    }
  }
}

kernel "kern2" {
  $0 = offloaded  {
    <i32 x1> $1 = const [1]
    <i32 x1> $2 = const [0]
    <i32 x1> $3 = const [3]
    <f32*x1> $4 = global tmp var (offset = 0 B)
    <f32 x1> $5 = const [0.0]
    <f32*x1> $6 : global store [$4 <- $5]
    <f32*x1> $7 = global ptr [S4place_f32], index [] activate=false
    <gen*x1> $8 = get root
    <gen*x1> $9 = [S0root][root]::lookup($8, $2) activate = false
    <gen*x1> $10 = get child [S0root->S3dense] $9
    <gen*x1> $11 = [S3dense][dense]::lookup($10, $2) activate = false
    <f32*x1> $12 = get child [S3dense->S4place_f32] $11
    <f32 x1> $13 = global load $12
    <f32*x1> $14 = global tmp var (offset = 0 B)
    <f32*x1> $15 : global store [$14 <- $13]
    <i32 x1> $16 = const [4]
  }
  $17 = offloaded range_for(0, 4) block_dim=adaptive {
    <i32 x1> $18 = loop $17 index 0
    <i32 x1> $19 = bit_extract($18 + 0, 0~2)
    <i32 x1> $20 = const [3]
    <i32 x1> $21 = cmp_lt $19 $20
    $22 : if $21 {
      <i32 x1> $23 = const [0]
      <i32 x1> $24 = const [1]
      $25 : for in range($23, $24, step 1) {
        <f32*x1> $26 = global tmp var (offset = 0 B)
        <f32 x1> $27 = global load $26
        <f32*x1> $28 = global tmp var (offset = 0 B)
        <f32*x1> $29 : global store [$28 <- $27]
      }
    }
  }
}
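
To make the difference between the two dumps concrete, here is a minimal sketch of the kind of scoping rule the verifier message suggests. It is a toy model in plain Python, not Taichi's `verify.cpp`: the `Stmt` class, the `verify` function, and the abbreviated encoding of `kern1` are all hypothetical. The assumed rule is that a statement may only read values defined earlier in its own block or in an enclosing block.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Stmt:
    """Toy IR statement: an id, the ids it reads, and an optional nested block."""
    id: int
    operands: List[int] = field(default_factory=list)
    body: Optional[List["Stmt"]] = None  # e.g. an offload, `if`, or `for` body


def verify(block: List[Stmt], visible: Optional[Set[int]] = None) -> None:
    """Raise if a statement reads an id that is not visible at that point."""
    # Copy the set so definitions made in this block do not leak to the caller.
    visible = set() if visible is None else set(visible)
    for stmt in block:
        for op in stmt.operands:
            if op not in visible:
                raise RuntimeError(f"stmt {stmt.id} cannot have operand {op}.")
        visible.add(stmt.id)
        if stmt.body is not None:
            # The nested block sees everything visible so far, including
            # the statement that owns it (needed for `loop $12 index 0`).
            verify(stmt.body, visible)


# Abbreviated shape of the "Offloaded" IR of kern1 (hypothetical encoding).
kern1 = [
    Stmt(0, body=[Stmt(9), Stmt(10, [9])]),   # first offload: $10 = global load $9
    Stmt(12, body=[
        Stmt(18),                             # $18 = alloca
        Stmt(19, [18, 10]),                   # local store [$18 <- $10]
    ]),
]

try:
    verify(kern1)
    message = "ok"
except RuntimeError as err:
    message = str(err)

print(message)  # stmt 19 cannot have operand 10.
```

In this model `$10` is defined inside the first offloaded block, so it is not visible from the second one. The `kern2` dump avoids the problem because the value travels through a `global tmp var` instead of a direct operand.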

@archibate
Collaborator Author

New minimal reproducible example:

import taichi as ti

ti.init(print_ir=True, print_preprocessed=True)

m = ti.var(ti.f32, 3)
x = ti.var(ti.f32, ())

@ti.kernel
def kern():
  a = x[None]
  for i in m:
    b = a
    for j in range(1):
      b = b

kern()

@archibate
Collaborator Author

Could we have some unit tests for the Store Forwarded pass? It seems to have caused the error.

@xumingkuan
Contributor

Thanks for proposing this! Looks like a bug in offload. Will investigate later.
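
For illustration only, the pattern visible in the `kern2` dump above (a `global store` to a `global tmp var` in the first offload, then a `global load` from it in the second) can be sketched as a small rewriting pass. This is a toy model under stated assumptions, not Taichi's offload pass: the tuple encoding, the function `promote_cross_offload_values`, and the `gtmp_*`, `*_spill`, and `*_reload*` names are all hypothetical, and statement names are assumed to be unique across offloads.

```python
from typing import Dict, List, Tuple

# (result name, opcode, operand names); a hypothetical flat encoding.
Stmt = Tuple[str, str, Tuple[str, ...]]


def promote_cross_offload_values(offloads: List[List[Stmt]]) -> List[List[Stmt]]:
    """Route values read outside their defining offload through a global temporary."""
    defined_in: Dict[str, int] = {}
    for idx, offload in enumerate(offloads):
        for name, _, _ in offload:
            defined_in[name] = idx

    # Values that some other offload reads directly.
    escaping = {
        op
        for idx, offload in enumerate(offloads)
        for _, _, operands in offload
        for op in operands
        if defined_in.get(op, idx) != idx
    }

    result: List[List[Stmt]] = []
    for idx, offload in enumerate(offloads):
        new_offload: List[Stmt] = []
        reloaded: Dict[str, str] = {}  # original name -> local reload name
        for name, opcode, operands in offload:
            new_operands = []
            for op in operands:
                if op in escaping and defined_in[op] != idx:
                    if op not in reloaded:
                        # Reload the value from its temporary slot before first use.
                        reloaded[op] = f"{op}_reload{idx}"
                        new_offload.append(
                            (reloaded[op], "global_tmp_load", (f"gtmp_{op}",))
                        )
                    new_operands.append(reloaded[op])
                else:
                    new_operands.append(op)
            new_offload.append((name, opcode, tuple(new_operands)))
            if name in escaping:
                # Spill the value right after it is defined.
                new_offload.append(
                    (f"{name}_spill", "global_tmp_store", (f"gtmp_{name}", name))
                )
        result.append(new_offload)
    return result


# Abbreviated shape of kern1 after offloading (hypothetical encoding).
offloads = [
    [("v9", "get_child", ()), ("v10", "global_load", ("v9",))],
    [("v18", "alloca", ()), ("v19", "local_store", ("v18", "v10"))],
]
fixed = promote_cross_offload_values(offloads)
for offload in fixed:
    print(offload)
```

After the rewrite, the second offload no longer names `v10` directly. It reads `v10_reload1`, which is loaded from the slot the first offload stored into. Here `gtmp_v10` stands for a storage slot rather than a statement.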

@zdxpan

zdxpan commented Jun 18, 2020

(base) ➜  ~ python  difftaichi/examples/billiards.py
[Taichi] mode=release
[Taichi] version 0.6.11, supported archs: [cpu, metal], commit 762aca58, python 3.7.3
difftaichi/examples/billiards.py:42: PendingDeprecationWarning: @ti.layout will be deprecated in the future, use ti.root directly to specify data layout anytime before the data structure materializes.
  @ti.layout
[E 06/18/20 17:28:09.242] [verify.cpp:basic_verify@39] stmt 5249 cannot have operand 4663.



                            * Taichi Core - Stack Traceback *
==========================================================================================
|                       Module |  Offset | Function                                      |
|----------------------------------------------------------------------------------------|
*               taichi_core.so |     110 | taichi::Logger::error(std::__1::basic_string< |
                                         | char, std::__1::char_traits<char>, std::__1:: |
                                         | allocator<char> > const&, bool)               |
*               taichi_core.so |    1136 | taichi::lang::IRVerifier::basic_verify(taichi |
                                         | ::lang::Stmt*)                                |
*               taichi_core.so |      25 | taichi::lang::IRVerifier::visit(taichi::lang: |
                                         | :LocalLoadStmt*)                              |
*               taichi_core.so |     139 | taichi::lang::IRVerifier::visit(taichi::lang: |
                                         | :Block*)                                      |
*               taichi_core.so |      42 | taichi::lang::BasicStmtVisitor::visit(taichi: |
                                         | :lang::IfStmt*)                               |
*               taichi_core.so |     139 | taichi::lang::IRVerifier::visit(taichi::lang: |
                                         | :Block*)                                      |
*               taichi_core.so |     139 | taichi::lang::IRVerifier::visit(taichi::lang: |
                                         | :Block*)                                      |
*               taichi_core.so |     139 | taichi::lang::IRVerifier::visit(taichi::lang: |
                                         | :Block*)                                      |
*               taichi_core.so |     111 | taichi::lang::irpass::analysis::verify(taichi |
                                         | ::lang::IRNode*)                              |
*               taichi_core.so |    1007 | taichi::lang::irpass::compile_to_offloads(tai |
                                         | chi::lang::IRNode*, taichi::lang::CompileConf |
                                         | ig const&, bool, bool, bool, bool, bool)      |
*               taichi_core.so |     280 | taichi::lang::Kernel::lower(bool)             |
*               taichi_core.so |     211 | taichi::lang::Program::compile(taichi::lang:: |
                                         | Kernel&)                                      |
*               taichi_core.so |      62 | taichi::lang::Kernel::compile()               |
*               taichi_core.so |     132 | taichi::lang::Kernel::operator()()            |
*               taichi_core.so |     103 | void pybind11::cpp_function::initialize<taich |
                                         | i::export_lang(pybind11::module&)::$_9, void, |
                                         |  taichi::lang::Kernel*, pybind11::name, pybin |
                                         | d11::is_method, pybind11::sibling>(taichi::ex |
                                         | port_lang(pybind11::module&)::$_9&&, void (*) |
                                         | (taichi::lang::Kernel*), pybind11::name const |
                                         | &, pybind11::is_method const&, pybind11::sibl |
                                         | ing const&)::'lambda'(pybind11::detail::funct |
                                         | ion_call&)::__invoke(pybind11::detail::functi |
                                         | on_call&)                                     |
*               taichi_core.so |    4075 | pybind11::cpp_function::dispatcher(_object*,  |
                                         | _object*, _object*)                           |
*                       python |     437 | (null)                                        |
*                       python |     111 | (null)                                        |
*                       python |     130 | (null)                                        |
*                       python |     130 | (null)                                        |
*                       python |     370 | (null)                                        |
*                       python |     179 | (null)                                        |
*                       python |     453 | (null)                                        |
*                       python |   46151 | (null)                                        |
*                       python |     414 | (null)                                        |
*                       python |     231 | (null)                                        |
*                       python |   46712 | (null)                                        |
*                       python |     414 | (null)                                        |
*                       python |     231 | (null)                                        |
*                       python |     189 | (null)                                        |
*                       python |     130 | (null)                                        |
*                       python |   46712 | (null)                                        |
*                       python |     117 | (null)                                        |
*                       python |     183 | (null)                                        |
*                       python |   45942 | (null)                                        |
*                       python |     117 | (null)                                        |
*                       python |     130 | (null)                                        |
*                       python |     245 | (null)                                        |
*                       python |   44976 | (null)                                        |
*                       python |     117 | (null)                                        |
*                       python |     183 | (null)                                        |
*                       python |   46151 | (null)                                        |
*                       python |     414 | (null)                                        |
*                       python |     256 | (null)                                        |
*                       python |     391 | (null)                                        |
*                       python |    9663 | (null)                                        |
*                       python |     125 | (null)                                        |
*                libdyld.dylib |       1 | (null)                                        |
*                          ??? |       2 | (null)                                        |
==========================================================================================


Internal Error occurred, check this page for possible solutions:
https://taichi.readthedocs.io/en/stable/install.html#troubleshooting
Traceback (most recent call last):
  File "difftaichi/examples/billiards.py", line 216, in <module>
    optimize()
  File "difftaichi/examples/billiards.py", line 173, in optimize
    forward(visualize=True, output=output)
  File "/Users/zhoudaoxian/anaconda3/lib/python3.7/site-packages/taichi/lang/tape.py", line 18, in __exit__
    self.grad()
  File "/Users/zhoudaoxian/anaconda3/lib/python3.7/site-packages/taichi/lang/tape.py", line 27, in grad
    func.grad(*args)
  File "/Users/zhoudaoxian/anaconda3/lib/python3.7/site-packages/taichi/lang/kernel.py", line 459, in __call__
    return self.compiled_functions[key](*args)
  File "/Users/zhoudaoxian/anaconda3/lib/python3.7/site-packages/taichi/lang/kernel.py", line 423, in func__
    t_kernel()
RuntimeError: [verify.cpp:basic_verify@39] stmt 5249 cannot have operand 4663.

@Shirely351

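Hello, I ran into a strange problem: when I enable the print_ir option, the generated intermediate IR always contains entries like File"/home/pycharmproject/pythonproject/foo.py",line10,in foo a[i]=i. Could you tell me how to get rid of them?
Source code: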
import taichi as ti

ti.init(print_preprocessed_ir=True)

ti.init(print_accessor_ir=True)

ti.init(print_ir=True)
a = ti.field(ti.i32,shape =(10))
@ti.kernel
def foo():
    for i in range(10):
        a[i]=i;
foo()

Generated output:
[W 07/27/24 23:18:16.513 66498] [offline_cache.cpp:disable_offline_cache_if_needed@16] Disable offline_cache because print_preprocessed_ir or print_ir or print_accessor_ir is enabled
[I 07/27/24 23:18:16.632 66498] [ir_printer.cpp:operator()@993] [foo_c76_0] Initial IR:
[I 07/27/24 23:18:16.632 66498] [ir_printer.cpp:operator()@993] [foo_c76_0] Lowered:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0] Immutable local vars eliminated:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0] Typechecked:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0] Bit Loop Vectorized:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0] Matrix ptr lowered:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] extract_constant:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] unreachable_code_elimination:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] binary_op_simplify:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] constant_fold:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] die:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] alg_simp:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] loop_invariant_code_motion:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] die:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] simplify:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] die:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] cfg_optimization:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] extract_constant:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] unreachable_code_elimination:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] binary_op_simplify:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] constant_fold:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] die:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] alg_simp:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] loop_invariant_code_motion:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] die:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] simplify:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] die:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] cfg_optimization:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0] Simplified I:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0] External ptr boundary processed:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0] Access flagged I:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] extract_constant:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] unreachable_code_elimination:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] binary_op_simplify:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] constant_fold:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] die:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] alg_simp:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] loop_invariant_code_motion:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] die:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] simplify:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] die:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] cfg_optimization:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0] Simplified II:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0] Offloaded:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0] Optimized by CFG:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0] Access flagged II:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] extract_constant:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] unreachable_code_elimination:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] binary_op_simplify:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] constant_fold:
[I 07/27/24 23:18:16.633 66498] [ir_printer.cpp:operator()@993] [foo_c76_0.simplify] die:
kernel {
$0 : for @tmp0 in range((cast_value 0), (cast_value 10)) block_dim=adaptive {
$1 = alloca @tmp1
File "/home/shangyuye/PycharmProjects/pythonProject/foo.py", line 10, in foo:
a[i]=i
^^^^^^
<*i32>@tmp1 = <*i32>@tmp0
File "/home/shangyuye/PycharmProjects/pythonProject/foo.py", line 10, in foo:
a[i]=i
^^^^^^
<*i32>#@tmp0 (snode=S2place)[<*i32>@tmp0] = <*i32>@tmp1
File "/home/shangyuye/PycharmProjects/pythonProject/foo.py", line 10, in foo:
a[i]=i
^^^^^^
}
File "/home/shangyuye/PycharmProjects/pythonProject/foo.py", line 9, in foo:
for i in range(10):
^^^^^^^^^^^^^^^^^^^
}
kernel {
$0 = const 0
$1 = cast_value $0
$2 = const 10
$3 = cast_value $2
$4 : for in range($1, $3) block_dim=adaptive {
<*i32> $5 = loop $4 index 0
File "/home/shangyuye/PycharmProjects/pythonProject/foo.py", line 10, in foo:
a[i]=i
^
<*i32> $6 = alloca
$7 : local store [$6 <- $5]
File "/home/shangyuye/PycharmProjects/pythonProject/foo.py", line 10, in foo:
a[i]=i
^^^^^^
$8 = local load [$6]
$9 = global ptr [S2place], index [$5] activate=true
File "/home/shangyuye/PycharmProjects/pythonProject/foo.py", line 10, in foo:
a[i]=i
^^^^
$10 : global store [$9 <- $8]
File "/home/shangyuye/PycharmProjects/pythonProject/foo.py", line 10, in foo:
a[i]=i
^^^^^^
}
}
kernel {
$0 = const 0
$1 = cast_value $0
$2 = const 10
$3 = cast_value $2
$4 : for in range($1, $3) block_dim=adaptive {
<*i32> $5 = loop $4 index 0
File "/home/shangyuye/PycharmProjects/pythonProject/foo.py", line 10, in foo:
a[i]=i
^
$6 = global ptr [S2place], index [$5] activate=true
File "/home/shangyuye/PycharmProjects/pythonProject/foo.py", line 10, in foo:
a[i]=i
^^^^
$7 : global store [$6 <- $5]
File "/home/shangyuye/PycharmProjects/pythonProject/foo.py", line 10, in foo:
a[i]=i
^^^^^^
}
}
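
The File "...", line N, in foo: entries in the dumps above look like source-location annotations attached to each IR statement rather than errors: each one is followed by the echoed source line and a caret underline. I don't know of a Taichi option that turns them off, so the following is only a sketch of a text post-processing workaround, assuming the annotations always follow that three-line pattern. The helper name strip_source_locations is hypothetical and is not part of Taichi.

```python
import re

# Header of an annotation, e.g.: File "/path/foo.py", line 10, in foo:
_LOCATION_RE = re.compile(r'^\s*File ".*", line \d+, in \w+:\s*$')
# Caret underline that follows the echoed source line.
_CARET_RE = re.compile(r'^\s*\^+\s*$')


def strip_source_locations(ir_text):
    """Return ir_text with the File / source line / caret groups removed.

    Hypothetical helper: it only filters captured text and does not
    change what Taichi prints.
    """
    lines = ir_text.splitlines()
    kept = []
    i = 0
    while i < len(lines):
        if _LOCATION_RE.match(lines[i]):
            # Skip the header and the echoed source line after it.
            i += 2
            # Skip the caret underline too, if one is present.
            if i < len(lines) and _CARET_RE.match(lines[i]):
                i += 1
            continue
        kept.append(lines[i])
        i += 1
    return "\n".join(kept)


sample = """$6 = global ptr [S2place], index [$5] activate=true
File "/home/shangyuye/PycharmProjects/pythonProject/foo.py", line 10, in foo:
a[i]=i
^^^^
$7 : global store [$6 <- $5]
File "/home/shangyuye/PycharmProjects/pythonProject/foo.py", line 10, in foo:
a[i]=i
^^^^^^
}"""

print(strip_source_locations(sample))
```

Applied to the last kernel dump above, this keeps only the IR statements and the closing braces. It assumes the log has been captured to a string first, for example by redirecting the program's output to a file and reading it back.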
