Releases: jina-ai/serve

🎉 Release v0.9.2

05 Jan 20:14

We are excited to release Jina 0.9.2. Jina is the easier way to do neural search in the cloud. Highlights of this release include:

  • Support for delete/update operations
  • Add native AsyncIO support and unlock native support for running Jina in Jupyter notebooks and IPython
  • Add MultimodalDocument as a primitive type to support multimodal search in a Pythonic way
  • Refactor Pea and introduce Runtime to improve code readability and maintainability

Release 0.9.2

⬆️ Major Features and Improvements

Completeness

from jina import AsyncFlow
with AsyncFlow().add(uses='_logforward') as f:
    await f.index_lines(lines=['hello', 'jina'], on_done=print)

Ease of Use

import numpy as np
from jina import Document, MultimodalDocument
chunk_img = Document(modality='dummy_image', embedding=np.random.rand(1, 4))
chunk_text = Document(modality='dummy_text', embedding=np.random.rand(1, 10))
multimodal_doc = MultimodalDocument(chunks=[chunk_img, chunk_text])
  • Introduce Runtime as a member of Pea, defined as "a procedure that blocks the main process once running, therefore must be put into a separated thread/process". The new architecture greatly improves the readability and maintainability of the code. #1426, #1473, #1487, #1539, #1577

⚠️ Breaking Changes

  • Introduce UniqueId, ChunkSet, DocumentSet, MatchSet; Remove add_chunk and add_match; Refactor Document with newly introduced classes. #1343
0.8.0 0.9.2
from jina import Document
with Document() as d:
    c = Document(id='1:0>16')
    d.chunks.append(c)

with Document() as d:
    c = d.chunks.append()
    c.id = '1:0>16'

from jina import Document
with Document() as d:
    c = d.chunks.add_chunks()
    c.id = '1:0>16'
  • Refactor YAML file parsing backend from ruamel.yaml to pyyaml and introduce jina.jaml for parsing YAML files. The dependency on ruamel.yaml is deprecated. #1495, #1516, #1524, #1533, #1547, #1581

  • Add _merge_matches and _merge_chunks for merging messages in different ways. Remove _merge_all. #1406, #1418

  • PyClient renamed to Client for simplicity #1450

📗 Documentation

🐞 Bug Fixes and Other Changes

Flow

  • Fix issue terminating RemotePea #133
  • Refactor Pea closing logic #1379, #1398, #1457
  • Refactor peapods code base #1421
  • Add versioning for Flow YAML config files. Introduce method field for Flow YAML configurations. #1442
  • Add env field for Flow and Pod YAML configuration so that shared environment variables can be set. #1446, #1448
  • Rename Flow output argument to on_done. #1476
  • Fix client top_k malfunctioning bug. #1522
  • Add return_list option for Flow API and introduce Response as a new primitive type. With return_list=True, results are returned as a list of Response objects, which makes them easier to inspect. #1541
  • Fix CORS behavior bug for REST API #1568 @yk

Executors

  • Change default metric of NumpyIndexer to cosine #1393
  • Remove deprecated jina/executors/encoders/helper.py #1563 @tadejsv
  • Introduce batching_multi_input decorator to add batching support for rankers #1467 @deepampatel
  • Allow Indexers to have separate workspaces. #1383
  • Fix bug when shards are empty #1340, #1396
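
To illustrate what a cosine default means for nearest-neighbour lookups, here is a minimal sketch in plain Python. It does not use NumpyIndexer; `cosine_distance` and `top_k` are hypothetical helper names, and treating distance as 1 minus cosine similarity is an assumption for the example.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query, vectors, k):
    """Return the indices of the k vectors closest to the query."""
    order = sorted(range(len(vectors)),
                   key=lambda i: cosine_distance(query, vectors[i]))
    return order[:k]


# Hypothetical index of three 2-d embeddings.
index = [[1.0, 1.0], [1.0, -1.0], [2.0, 2.1]]
nearest = top_k([3.0, 3.0], index, k=2)
```

Cosine distance compares direction only, so the query `[3.0, 3.0]` is closest to `[1.0, 1.0]` even though their magnitudes differ.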

Drivers

  • Add op_name for Matches2DocRankDriver #1409
  • Add batch_size argument for EncodeDriver to enable batching on driver level #1483
  • Make DocIdCache capable of detecting collisions on content level #1510
  • Enable AggregateMatches2DocRankDriver for keeping chunks of matches #1494

Types

  • Add NamedScore as new primitive type. #1430
  • Support + and += operations for Document. #1555
  • Move extract_content() to DocumentSet. Instead of using docs = DocumentSet(random_docs(2)); extract_content(docs), docs.all_contents() makes it easier to get contents from a set of Documents. #1387
  • Refactor random_id and introduce content_hash field in Document. #1440

Tests

  • Improve unit tests for test_hello_world #1305
  • Refactor unit tests for queryset #1336
  • Refactor unit tests for evaluation #1339
  • Refactor unit tests for index remote #1346
  • Fix integration tests for jinad #1367, #1388, #1407
  • Refactor random_docs() in unit tests #1356
  • Add unit tests for convert functions in Document #1389
  • Fix callbacks in unit tests. Callback failures were not always captured by tests #1391
  • Fix integration tests for evaluation #1411
  • Refactor docstrings in unit tests of QueryLangSet #1417
  • Fix bug failing to capture errors of callbacks during unit tests. #1419, #1536
  • Refactor unit tests for types #1435
  • Refactor unit tests for request #1445
  • Add unit tests for corner cases in calculating similarity metrics #1434
  • Add evaluation option for hello-world #1465, #1488, #1508, #1501
  • Add test for loading customized drivers #1474
  • Refactor unit test for drivers #1452
  • Set default value of eval_at in PrecisionEvaluator and RecallEvaluator to None #1552
  • Fix unit tests of test_hub_usage when GITHUB_TOKEN is used. #1560
  • Refactor unit tests for drivers #1559
  • Refactor unit tests in hubio to use BuildTestLevel #1361
  • Fix naming for test_rankingevaluation_driver #1573

HubIO

  • Fix Jina Hub automated updates and add GA for updating Jina Hub images. Check out more details at hub-updater #1298, #1345, #1360, #1456
  • Redefine naming convention of Docker images in Jina Hub. Naming follows {repository}/{type}.{kind}.{name}:{version}-{jina_version} #1341
  • Avoid overwriting Docker image in Jina Hub when tag already exists. #1365
  • Clean up hubio imports. #1381
  • Fix hubio version checking and add --no-overwrite option for jina hub --push #1403
  • Fix hubio test levels #1361
  • Add --timeout-ready option for hubio #1525
  • Fix typo in error message #1531
  • Fix access to token credential file for jina hub push #1492
  • Switch to hubapi for retrieving Docker login information #1429, #1589
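
The naming convention {repository}/{type}.{kind}.{name}:{version}-{jina_version} can be sketched as a pair of format and parse helpers. The function names and the example values (`jinahub`, `exampleencoder`, the version numbers) are made up for illustration, and the parser assumes that the image version itself contains no hyphen.

```python
def hub_image_name(repository, type_, kind, name, version, jina_version):
    """Build an image name following the convention described above."""
    return f'{repository}/{type_}.{kind}.{name}:{version}-{jina_version}'


def parse_hub_image_name(image):
    """Split an image name back into its six components."""
    repo_and_name, tag = image.rsplit(':', 1)
    repository, dotted = repo_and_name.rsplit('/', 1)
    type_, kind, name = dotted.split('.', 2)
    # Assumption: the first hyphen separates version from jina_version.
    version, jina_version = tag.split('-', 1)
    return {
        'repository': repository,
        'type': type_,
        'kind': kind,
        'name': name,
        'version': version,
        'jina_version': jina_version,
    }


image = hub_image_name('jinahub', 'pod', 'encoder', 'exampleencoder', '0.0.1', '0.9.2')
parts = parse_hub_image_name(image)
```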

Others

  • Adapt to new remote log APIs #1300
  • Adapt to Docker SDK 4.4.0 in ContainerPea #1334
  • Move log parser from jinad to core. #1342
  • Use load_config directly as a classmethod #1352, #1354
  • Fix bug during completing file path for errors #1353
  • Fix top-k setting bug #1359
  • Fix newlines for autocompletion in bash. #1425 @lsgrep
  • Fix latency check during CI #1437
  • Add client-side exception handlers #1458, #1462
  • Add GA for automated comments on lint failures. #1486, #1507, #1519
  • Introduce ArgNamespace in jina.helper to manage all namespace-related operations #1489
  • Introduce training. #1518
  • Introduce jina.jaml for parsing YAML files. #1533, #1547, #1581
  • Fix bug in parsing config source files #1583

🙏 Thanks to our Contributors

This release contains contributions from Amritpal Singh, Bithiah Yuan, CatStark, Deepam Patel, Deepankar Mahapatro, Florian Hönicke, Han Xiao, Harry Stark, Hidan, Joan Fontanals, Nan Wang, Pratik Bhavsar, Rutuja Surve, Sergey M, Siyuan Shi, Szymon Skorupinski, Tadej Svetina, Wang Bo, Yannic Kilcher, Yusup, cristian, florian-hoenicke

🙏 Thanks to our Community

And thanks to all of you out there as well! Without you Jina couldn't do what we do. Your support means a lot to us.

🤝 Work with Jina

Want to work with Jina full-time? Check out our openings on our website.

🎉 Release v0.9.1

05 Jan 13:06

Release Note (0.8.22)

Release time: 2021-01-03 23:25:16

🙇 We'd like to thank all contributors for this new release! In particular,
Han Xiao, Jina Dev Bot, 🙇

📗 Documentation

🍹 Other Improvements

  • [2e7786cb] - version: the next version will be 0.8.22 (Jina Dev Bot)

🎉 Jina 0.9.0

05 Jan 12:18
90d6782

Jina 0.9.0

We are excited to release Jina 0.9.0. Jina is the easier way to do neural search in the cloud. Highlights of this release include:

  • Support for delete/update operations
  • Add native AsyncIO support and unlock native support for running Jina in Jupyter notebooks and IPython
  • Add MultimodalDocument as a primitive type to support multimodal search in a Pythonic way
  • Refactor Pea and introduce Runtime to improve code readability and maintainability

Release 0.9.0

⬆️ Major Features and Improvements

Completeness

from jina import AsyncFlow
with AsyncFlow().add(uses='_logforward') as f:
    await f.index_lines(lines=['hello', 'jina'], on_done=print)

Ease of Use

import numpy as np
from jina import Document, MultimodalDocument
chunk_img = Document(modality='dummy_image', embedding=np.random.rand(1, 4))
chunk_text = Document(modality='dummy_text', embedding=np.random.rand(1, 10))
multimodal_doc = MultimodalDocument(chunks=[chunk_img, chunk_text])
  • Introduce Runtime as a member of Pea, defined as "a procedure that blocks the main process once running, therefore must be put into a separated thread/process". The new architecture greatly improves the readability and maintainability of the code. #1426, #1473, #1487, #1539, #1577

⚠️ Breaking Changes

  • Introduce UniqueId, ChunkSet, DocumentSet, MatchSet; Remove add_chunk and add_match; Refactor Document with newly introduced classes. #1343
0.8.0 0.9.0
from jina import Document
with Document() as d:
    c = Document(id='1:0>16')
    d.chunks.append(c)

with Document() as d:
    c = d.chunks.append()
    c.id = '1:0>16'

from jina import Document
with Document() as d:
    c = d.chunks.add_chunks()
    c.id = '1:0>16'
  • Refactor YAML file parsing backend from ruamel.yaml to pyyaml and introduce jina.jaml for parsing YAML files. The dependency on ruamel.yaml is deprecated. #1495, #1516, #1524, #1533, #1547, #1581

  • Add _merge_matches and _merge_chunks for merging messages in different ways. Remove _merge_all. #1406, #1418

  • PyClient renamed to Client for simplicity #1450

📗 Documentation

🐞 Bug Fixes and Other Changes

Flow

  • Fix issue terminating RemotePea #133
  • Refactor Pea closing logic #1379, #1398, #1457
  • Refactor peapods code base #1421
  • Add versioning for Flow YAML config files. Introduce method field for Flow YAML configurations. #1442
  • Add env field for Flow and Pod YAML configuration so that shared environment variables can be set. #1446, #1448
  • Rename Flow output argument to on_done. #1476
  • Fix client top_k malfunctioning bug. #1522
  • Add return_list option for Flow API and introduce Response as a new primitive type. With return_list=True, results are returned as a list of Response objects, which makes them easier to inspect. #1541
  • Fix CORS behavior bug for REST API #1568 @yk

Executors

  • Change default metric of NumpyIndexer to cosine #1393
  • Remove deprecated jina/executors/encoders/helper.py #1563 @tadejsv
  • Introduce batching_multi_input decorator to add batching support for rankers #1467 @deepampatel
  • Allow Indexers to have separate workspaces. #1383
  • Fix bug when shards are empty #1340, #1396

Drivers

  • Add op_name for Matches2DocRankDriver #1409
  • Add batch_size argument for EncodeDriver to enable batching on driver level #1483
  • Make DocIdCache capable of detecting collisions on content level #1510
  • Enable AggregateMatches2DocRankDriver for keeping chunks of matches #1494

Types

  • Add NamedScore as new primitive type. #1430
  • Support + and += operations for Document. #1555
  • Move extract_content() to DocumentSet. Instead of using docs = DocumentSet(random_docs(2)); extract_content(docs), docs.all_contents() makes it easier to get contents from a set of Documents. #1387
  • Refactor random_id and introduce content_hash field in Document. #1440

Tests

  • Improve unit tests for test_hello_world #1305
  • Refactor unit tests for queryset #1336
  • Refactor unit tests for evaluation #1339
  • Refactor unit tests for index remote #1346
  • Fix integration tests for jinad #1367, #1388, #1407
  • Refactor random_docs() in unit tests #1356
  • Add unit tests for convert functions in Document #1389
  • Fix callbacks in unit tests. Callback failures were not always captured by tests #1391
  • Fix integration tests for evaluation #1411
  • Refactor docstrings in unit tests of QueryLangSet #1417
  • Fix bug failing to capture errors of callbacks during unit tests. #1419, #1536
  • Refactor unit tests for types #1435
  • Refactor unit tests for request #1445
  • Add unit tests for corner cases in calculating similarity metrics #1434
  • Add evaluation option for hello-world #1465, #1488, #1508, #1501
  • Add test for loading customized drivers #1474
  • Refactor unit test for drivers #1452
  • Set default value of eval_at in PrecisionEvaluator and RecallEvaluator to None #1552
  • Fix unit tests of test_hub_usage when GITHUB_TOKEN is used. #1560
  • Refactor unit tests for drivers #1559
  • Refactor unit tests in hubio to use BuildTestLevel #1361
  • Fix naming for test_rankingevaluation_driver #1573

HubIO

  • Fix Jina Hub automated updates and add GA for updating Jina Hub images. Check out more details at hub-updater #1298, #1345, #1360, #1456
  • Redefine naming convention of Docker images in Jina Hub. Naming follows {repository}/{type}.{kind}.{name}:{version}-{jina_version} #1341
  • Avoid overwriting Docker image in Jina Hub when tag already exists. #1365
  • Clean up hubio imports. #1381
  • Fix hubio version checking and add --no-overwrite option for jina hub --push #1403
  • Fix hubio test levels #1361
  • Add --timeout-ready option for hubio #1525
  • Fix typo in error message #1531
  • Fix access to token credential file for jina hub push #1492
  • Switch to hubapi for retrieving Docker login information #1429, #1589

Others

  • Adapt to new remote log APIs #1300
  • Adapt to Docker SDK 4.4.0 in ContainerPea #1334
  • Move log parser from jinad to core. #1342
  • Use load_config directly as a classmethod #1352, #1354
  • Fix bug during completing file path for errors #1353
  • Fix top-k setting bug #1359
  • Fix newlines for autocompletion in bash. #1425 @lsgrep
  • Fix latency check during CI #1437
  • Add client-side exception handlers #1458, #1462
  • Add GA for automated comments on lint failures. #1486, #1507, #1519
  • Introduce ArgNamespace in jina.helper to manage all namespace-related operations #1489
  • Introduce training. #1518
  • Introduce jina.jaml for parsing YAML files. #1533, #1547, #1581
  • Fix bug in parsing config source files #1583

🙏 Thanks to our Contributors

This release contains contributions from Amritpal Singh, Bithiah Yuan, CatStark, Deepam Patel, Deepankar Mahapatro, Florian Hönicke, Han Xiao, Harry Stark, Hidan, Joan Fontanals, Nan Wang, Pratik Bhavsar, Rutuja Surve, Sergey M, Siyuan Shi, Szymon Skorupinski, Tadej Svetina, Wang Bo, Yannic Kilcher, Yusup, cristian, florian-hoenicke

🙏 Thanks to our Community

And thanks to all of you out there as well! Without you Jina couldn't do what we do. Your support means a lot to us.

🤝 Work with Jina

Want to work with Jina full-time? Check out our openings on our [website](https://jina.ai/...

🎉 Release v0.8.22

03 Jan 23:26
3178fef

Release Note (0.8.21)

Release time: 2021-01-03 16:37:08

🙇 We'd like to thank all contributors for this new release! In particular,
Han Xiao, 🙇

🏁 Unit Test and CICD

  • [a431537f] - fix github-push-action default branch (Han Xiao)
  • [b2430fbd] - fix tag release order (Han Xiao)

🍹 Other Improvements

  • [b2689933] - hotfix release (Han Xiao)

🎉 Release v0.8.21

03 Jan 16:37

Release Note (0.8.18)

Release time: 2021-01-01 15:44:56

🙇 We'd like to thank all contributors for this new release! In particular,
Han Xiao, Florian Hönicke, Jina Dev Bot, 🙇

🐞 Bug fixes

  • [4f50e802] - russian translation 101 (#1572) (Florian Hönicke)

🚧 Code Refactoring

  • [9b81559d] - redesign Runtime, Pea, Pod, Parser (#1539) (Han Xiao)

📗 Documentation

  • [04d7d9da] - versioning docs (Han Xiao)
  • [33590bd6] - fix parser module path (Han Xiao)

🍹 Other Improvements

  • [8b72e175] - hotfix release (Han Xiao)
  • [804a0a4c] - version: the next version will be 0.8.18 (Jina Dev Bot)

🎉 Release v0.8.20

03 Jan 16:07

Release Note (0.8.18)

Release time: 2021-01-01 15:44:56

🙇 We'd like to thank all contributors for this new release! In particular,
Han Xiao, Florian Hönicke, Jina Dev Bot, 🙇

🐞 Bug fixes

  • [4f50e802] - russian translation 101 (#1572) (Florian Hönicke)

🚧 Code Refactoring

  • [9b81559d] - redesign Runtime, Pea, Pod, Parser (#1539) (Han Xiao)

📗 Documentation

  • [04d7d9da] - versioning docs (Han Xiao)
  • [33590bd6] - fix parser module path (Han Xiao)

🍹 Other Improvements

  • [8b72e175] - hotfix release (Han Xiao)
  • [804a0a4c] - version: the next version will be 0.8.18 (Jina Dev Bot)

🎉 Release v0.8.19

03 Jan 15:44

Release Note (0.8.18)

Release time: 2021-01-01 15:44:56

🙇 We'd like to thank all contributors for this new release! In particular,
Han Xiao, Florian Hönicke, Jina Dev Bot, 🙇

🐞 Bug fixes

  • [4f50e802] - russian translation 101 (#1572) (Florian Hönicke)

🚧 Code Refactoring

  • [9b81559d] - redesign Runtime, Pea, Pod, Parser (#1539) (Han Xiao)

📗 Documentation

  • [04d7d9da] - versioning docs (Han Xiao)
  • [33590bd6] - fix parser module path (Han Xiao)

🍹 Other Improvements

  • [8b72e175] - hotfix release (Han Xiao)
  • [804a0a4c] - version: the next version will be 0.8.18 (Jina Dev Bot)

🎉 Jina 0.8.0

23 Nov 09:47

We are excited to release Jina 0.8.0. Jina is an easier way to do neural search on the cloud. Highlights of this release include:

  • Introduce jinad to improve experience of using remote Flows/Pods/Peas
  • Add support for multimodal search SparseArray
  • Add jina.types module to offer Pythonic interface to access and manipulate protobuf objects.

Release 0.8.0

⬆️ Major Features and Improvements

Ease of Use

  • We introduce two new ways of using Jina Pods remotely:
    • Create a remote Pod via SSH #1275
    • Create a remote Pod via jinad. Jinad is a daemon process working together with jina on remote machines. Jinad makes it even easier to deploy Jina Flows/Pods/Peas on remote machines. Find out more details in the README #1182, #1203, #1254, #1297, #1299, #1307, #1312, #1324
Remote Pod via SSH:
jina pod --host [email protected] --remote-access SSH

Remote Pod via jinad:
jina pod --host 11.22.33.44 --port-expose 8000 --remote-access JINAD

With jinad, you can create and use Pods directly from the Flow as well. Start the Docker container equipped with jinad on the remote machine as follows:

sudo docker run --rm -d --network host jinaai/jinad

Now you can directly create and use the remote pods from your local machine:

from jina.flow import Flow

f = (Flow()
     .add(name='p1', uses='_logforward')
     .add(name='p2', host='10.11.22.33', port_expose='8000', uses='_logforward'))
with f:
    f.search_lines(lines=['jina', 'is', 'cute'], output_fn=print)
  • We've added jina.types module, which offers a Pythonic interface to access and manipulate protobuf objects. The main types include Request, QueryLang, NdArray, Message, and Document. With the help of Jina types, you can construct inputs to Jina much more easily than before. #1283, #1284, #1289, #1323
v0.7.0 v0.8.0
Document
# v0.7.0
from jina.proto import jina_pb2
d = jina_pb2.DocumentProto()
d.text = 'hello world'

# v0.8.0
from jina import Document
d = Document()
d.text = 'abc'

Request
# v0.7.0
from jina.proto import jina_pb2
r = jina_pb2.Request()
d = r.docs.add()

# v0.8.0
from jina.types.request import Request
from jina.types.document import Document
r = Request()
d = Document()
r.add_document(d)

Message
# v0.7.0
from jina.proto import jina_pb2
r = jina_pb2.RequestProto.IndexRequestProto()
m = jina_pb2.MessageProto()
m.envelop = None
m.request = r

# v0.8.0
from jina.types.message import Message
from jina.types.request import Request
r = Request()
m = Message(None, r)

QueryLang
# v0.7.0
from jina.proto import jina_pb2
ql = jina_pb2.QueryLangProto(name='SliceQL')
ql.parameters['start'] = 1
ql.parameters['end'] = 3

# v0.8.0
from jina.drivers.querylang.slice import SliceQL  # assumed module path for SliceQL
from jina.types.querylang import QueryLang
ql = QueryLang(SliceQL(start=1, end=3))

NdArray
# v0.7.0
import numpy as np
from jina.proto import jina_pb2
from jina.drivers.helper import array2pb
a = jina_pb2.NdArrayProto()
a.CopyFrom(array2pb(np.ndarray([2, 17])))

# v0.8.0
import numpy as np
from jina.types.ndarray.generic import NdArray
a = NdArray()
a.value = np.ndarray([2, 17])

Completeness

⚠️ Breaking Changes

  • Refactor drivers for evaluation from function-based to type-based. #1165

    • Removed EncodeEvaluationDriver and CraftEvaluationDriver
    • Added TextEvaluateDriver, NDArrayEvaluateDriver, and FieldEvaluateDriver
    • RankingEvaluationDriver renamed to RankEvaluateDriver
  • Introduce SparseNdArray and provide generic interface for SparseNdArray and DenseNdArray #1190, #1283

v0.7.0 v0.8.0
dense array
# v0.7.0
import numpy as np
from jina.proto import jina_pb2
from jina.drivers.helper import array2pb
a = jina_pb2.NdArrayProto()
a.CopyFrom(array2pb(np.ndarray([2, 17])))

# v0.8.0
import numpy as np
from jina.types.ndarray.generic import NdArray
a = NdArray()
a.value = np.ndarray([2, 17])

sparse array
# v0.7.0: not supported

# v0.8.0
import numpy as np
from scipy.sparse import coo_matrix
from jina.types.ndarray.generic import NdArray
from jina.types.ndarray.sparse.scipy import SparseNdArray
row = np.array([20, 0])
col = np.array([0, 20])
data = np.array([2, 17])
a = NdArray(is_sparse=True, sparse_cls=SparseNdArray)
a.value = coo_matrix((data, (row, col)), shape=(21, 21))
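
For readers unfamiliar with the coordinate (COO) layout used in the sparse example, the following plain-Python sketch expands the same data, row, and col values into a dense 21 x 21 grid. `coo_to_dense` is a hypothetical helper written for this illustration; it is not part of Jina or SciPy.

```python
def coo_to_dense(data, rows, cols, shape):
    """Expand COO triplets into a dense list-of-lists matrix."""
    dense = [[0] * shape[1] for _ in range(shape[0])]
    for value, r, c in zip(data, rows, cols):
        # Repeated coordinates are summed, the usual COO convention.
        dense[r][c] += value
    return dense


dense = coo_to_dense(data=[2, 17], rows=[20, 0], cols=[0, 20], shape=(21, 21))
nonzero = sum(1 for row in dense for value in row if value != 0)
```

Only 2 of the 441 cells are non-zero, which is why storing just the triplets is cheaper than storing the dense array.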

  • Add callback_on and continue_on_error for the client. callback_on_body is removed. #1265
v0.7.0 v0.8.0
# v0.7.0
from jina.flow import Flow
f = (Flow().add(name='p1').add(name='p2'))

with f:
    f.search_lines(lines=['hello', 'jina'], callback_on_body=True)

# v0.8.0
from jina.flow import Flow
f = (Flow().add(name='p1').add(name='p2'))

with f:
    f.search_lines(lines=['hello', 'jina'], callback_on='body')

  • Add ProtoMessage, LazyRequest to replace the original jina_pb2.Message and jina_pb2.Request so that the protobuf message is deserialized in a lazy way #1210, #1283
v0.7.0 v0.8.0
# v0.7.0
from jina.proto import jina_pb2
r = jina_pb2.RequestProto.IndexRequestProto()
m = jina_pb2.MessageProto()
m.envelop = None
m.request = r

# v0.8.0
from jina.types.message import Message
from jina.types.request import Request
r = Request()
m = Message(None, r)
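
The idea behind lazy deserialization can be sketched without protobuf. In the example below JSON stands in for the wire format, and `LazyPayload` is a hypothetical class, not Jina's LazyRequest: the raw bytes are parsed only when the body is first accessed, so a component that merely forwards a message never pays the parsing cost.

```python
import json


class LazyPayload:
    """Hold raw bytes and parse them only on first access."""

    def __init__(self, raw):
        self._raw = raw
        self._parsed = None
        self._is_parsed = False
        self.parse_count = 0  # instrumentation for this example only

    @property
    def is_parsed(self):
        return self._is_parsed

    @property
    def body(self):
        if not self._is_parsed:
            self._parsed = json.loads(self._raw.decode('utf-8'))
            self._is_parsed = True
            self.parse_count += 1
        return self._parsed

    def to_bytes(self):
        """Return the original bytes untouched if nothing was parsed."""
        if not self._is_parsed:
            return self._raw
        return json.dumps(self._parsed).encode('utf-8')


raw = json.dumps({'docs': [{'text': 'hello'}]}).encode('utf-8')
msg = LazyPayload(raw)
forwarded = msg.to_bytes()          # forwarding does not trigger parsing
parsed_before_access = msg.is_parsed
text = msg.body['docs'][0]['text']  # first access parses the payload
_ = msg.body                        # second access reuses the parsed object
```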

🐞 Bug Fixes and Other Changes

Flow

  • Fix argument overridden bug for Pod when passing arguments from Flow #1189
  • Refactor num_part logic #1247
  • Enable client to interpret dict of json-like str into parsed documents #1282
  • In addition to the callback function of the Flow API, add three hooks for post-processing requests: on_done, on_error, and on_always #1303
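
One plausible reading of the three callbacks is sketched below: on_done fires for successful responses, on_error for failed ones, and on_always for both. This is an assumption about the semantics, not a description of Jina's client code; `handle_response` and the dict-shaped response with a `status` key are hypothetical.

```python
def handle_response(response, on_done=None, on_error=None, on_always=None):
    """Dispatch a response to the matching callbacks."""
    if response.get('status') == 'error':
        if on_error is not None:
            on_error(response)
    elif on_done is not None:
        on_done(response)
    # on_always runs regardless of the outcome.
    if on_always is not None:
        on_always(response)


calls = []
hooks = dict(
    on_done=lambda resp: calls.append('done'),
    on_error=lambda resp: calls.append('error'),
    on_always=lambda resp: calls.append('always'),
)
handle_response({'status': 'ok'}, **hooks)
handle_response({'status': 'error'}, **hooks)
```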

Protos

  • Use Docker container to generate protobuf files #1241, #1242

Drivers

  • Refactor over-reduce logic to BaseDriver. Move ReduceDriver function into BaseDriver. Merge PassDriver and RouteDriver into RouteDriver #1228
  • Adapt the Drivers to jina.types #1313

Tests

  • Remove pip cache from Docker images #1168
  • Refactor unit tests for ContainerPea to pytest #1179
  • Switch back to use S3 bucket instead of GitHub for accessing fashionmnist dataset #1183
  • Refactor unit tests for CompoundExecutors to pytest #1192
  • Refactor unit tests for hello-world to pytest #1263
  • Refactor unit tests for indexing to pytest. #1258, #1237
  • Add unit tests for southpark example #1218
  • Fix flaky test #1219
  • Remove legacy code #1291, #1314
  • Adapt unit tests to jina.types #1319, #1320, #1322

Usability

  • Add --repository option for jina hub cli so users can push Pod images to their own repository. #1175
  • Replace id_tag argument with field in RankEvaluateDriver so users can access all fields of matches #1176

Documentation

🎉 release v0.7.0

26 Oct 12:40

Jina v0.7.0

We are excited to release Jina v0.7.0. Jina is an easier way to do a neural search on the cloud. Highlights of this release include:

  • Flow evaluation support
  • Support for preventing duplicate Documents in the index
  • Flow visualization support

Release v0.7.0

⬆️ Major Features and Improvements

Completeness

  • Evaluation is fully supported by Jina. jina.executors.evaluators and jina.drivers.evaluate have been introduced to make this happen. Now you can use different metrics to evaluate the Flow. No matter whether you want to evaluate the whole Flow or just part of it, the evaluation can be done smoothly without stopping the running Flow. #1043, #1086, #1087, #1090, #1092, #1099, #1100, #1102, #1114, #1134
The example consists of Python code followed by two YAML configs, index-doc.yml and eval.yml:
from jina.flow import Flow
from jina.proto import jina_pb2
from jina.drivers.helper import array2pb
import numpy as np

def get_index_docs():
    doc0 = jina_pb2.Document()
    doc0.tags['id'] = '0'
    doc0.embedding.CopyFrom(array2pb(np.array([1, 1])))
    doc1 = jina_pb2.Document()
    doc1.tags['id'] = '1'
    doc1.embedding.CopyFrom(array2pb(np.array([1, -1])))
    return [doc0, doc1]

# indexed two docs
f_index = (Flow().add(uses='index-doc.yml'))
with f_index:
    f_index.index(input_fn=get_index_docs)


def get_eval_docs():
    doc = jina_pb2.Document()
    doc.embedding.CopyFrom(array2pb(np.array([1, 1])))
    groundtruth = jina_pb2.Document()
    match0 = groundtruth.matches.add()
    match0.tags['id'] = '0'
    match1 = groundtruth.matches.add()
    match1.tags['id'] = '2'
    return [(doc, groundtruth), ]

def validate(resp):
    # retrieved docs with id `0` and `1`
    # relevant docs with id `0` and `2`
    # Precision@2 = 0.5
    assert resp.docs[0].evaluations[0].value == 0.5

# evaluate Precision@2
f_eval = (Flow()
          .add(uses='index-doc.yml')
          .add(uses='eval.yml'))
with f_eval:
    f_eval.search(
        input_fn=get_eval_docs, 
        output_fn=validate, 
        callback_on_body=True)

# index-doc.yml
!CompoundIndexer
components:
  - !NumpyIndexer
    metas:
      name: vecidx
  - !BinaryPbIndexer
    metas:
      name: docidx
requests:
  on:
    IndexRequest:
      - !VectorIndexDriver
        with:
          executor: vecidx
          traversal_paths: ['r']
      - !KVIndexDriver
        with:
          executor: docidx
          traversal_paths: ['r']
    SearchRequest:
      - !VectorSearchDriver
        with:
          executor: vecidx
          traversal_paths: ['r']
      - !KVSearchDriver
        with:
          executor: docidx
          traversal_paths: ['m']

# eval.yml
!PrecisionEvaluator
with:
    eval_at: 2
    id_tag: 'id'
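
The Precision@2 value asserted in the `validate` callback above can be checked by hand. The sketch below is a plain-Python version of the metric; `precision_at_k` is a hypothetical helper, and dividing by the number of retrieved items in the top k is one common convention assumed here.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved ids that are relevant."""
    top = retrieved_ids[:k]
    if not top:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in top if doc_id in relevant)
    return hits / len(top)


# Retrieved ids `0` and `1`, relevant ids `0` and `2`, as in the example.
score = precision_at_k(['0', '1'], ['0', '2'], k=2)
```

One of the two retrieved Documents is relevant, so the score is 0.5.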

  • To prevent duplicates in the index, UniquePbIndexer and UniqueVectorIndexer are introduced together with the corresponding drivers in jina.drivers.cache. Please refer to docs.jina.ai for more details. #1064, #1081, #1147
from jina.flow import Flow
from jina.proto import jina_pb2

doc_0 = jina_pb2.Document()
doc_0.text = f'I am doc0'
doc_1 = jina_pb2.Document()
doc_1.text = f'I am doc1'


def assert_num_docs(rsp, num_docs):
    assert len(rsp.IndexRequest.docs) == num_docs

f = Flow().add(
    uses='NumpyIndexer', uses_before='_unique')

with f:
    f.index(
        [doc_0, doc_0, doc_1], 
        output_fn=lambda rsp: assert_num_docs(rsp, num_docs=2))
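
The effect asserted above, three inputs reduced to two indexed Documents, can be illustrated with a small content-hash filter. This is a sketch of the general idea only: `content_key` and `deduplicate` are hypothetical names, and whether Jina keys on the Document id or on its content is not stated here.

```python
import hashlib


def content_key(text):
    """Derive a stable key from the text content."""
    return hashlib.sha256(text.encode('utf-8')).hexdigest()


def deduplicate(texts, seen=None):
    """Return texts in order, skipping any whose key was already seen."""
    seen = set() if seen is None else seen
    unique = []
    for text in texts:
        key = content_key(text)
        if key in seen:
            continue
        seen.add(key)
        unique.append(text)
    return unique


kept = deduplicate(['I am doc0', 'I am doc0', 'I am doc1'])
```

Passing the same `seen` set across calls keeps the filter effective over several index requests.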

Usability

  • Add visualization for Flow. Calling plot() function of Flow gives a better view of how the Flow looks. #1002, #1116

⚠️ Breaking Changes

  • Document.id, Document.parent_id and Relevance.ref_id are now string types instead of int. Please refer to docs.jina.ai for more details. #1005, #1034, #1136. Accordingly, the following changes are made:

    • SortQL.field now uses dunder_get syntax rather than . expansion (e.g. a.b.c -> a__b__c, score.value -> score__value) and now supports dict and list access.
    • first_doc_id, random_doc_id and override_doc_id have been removed from CLI.
  • Refactor logger config into YAML. Add --log-config to jina pea CLI, by default it points to logging.default.yml. --log-sse, --log-profile, --log-with-own-name are deprecated. #1031

Loggers are mapped to resource files as follows (filename: logger in the code):
    • logging.default.yml: default_logger and any logger defined with JinaLogger()
    • logging.docker.yml: logger used in the ContainerPea
    • logging.profile.yml: profile_logger
    • logging.remote.yml: logger used in the RemotePea
  • Refactor the code for traversing recursive Documents. granularity_range, adjacency_range, recur_on, and recursion_order are deprecated and replaced by traversal_paths, which lets you specify exactly where the traversal should happen. #995, #998, #1001, #1003, #1006, #1007, #1027, #1036, #1044

  • Protobuf request_id is now string type. --first-request-id removed from client CLI. --query-uses and --index-uses from hello-world CLI now renamed to --uses-query and --uses-index. #1049
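
The dunder-style field syntax mentioned for SortQL.field (score.value becomes score__value, with dict and list access) can be sketched as a small resolver. `dunder_get` below is a hypothetical stand-alone function written for illustration, not the helper Jina uses, and the sample documents are made up.

```python
def dunder_get(obj, key):
    """Resolve a key such as 'a__b__0' against nested dicts, lists and attributes."""
    current = obj
    for part in key.split('__'):
        if isinstance(current, dict):
            current = current[part]
        elif isinstance(current, (list, tuple)):
            current = current[int(part)]
        else:
            current = getattr(current, part)
    return current


docs = [
    {'id': 'a', 'score': {'value': 0.2}, 'tags': {'authors': ['ann', 'bob']}},
    {'id': 'b', 'score': {'value': 0.8}, 'tags': {'authors': ['cy']}},
]
# Sort by a nested field, highest score first.
ranked = sorted(docs, key=lambda d: dunder_get(d, 'score__value'), reverse=True)
second_author = dunder_get(docs[0], 'tags__authors__1')
```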

🐞 Bug Fixes and Other Changes

Flow

  • Refactor log stream server with fluentd. Fluentd acts as a daemon collecting logs from different parts of Jina and forwarding them to a specific output. Check out more details at docs.jina.ai #1002, #999
  • Add ordinal_idx_arg for batching decorator to support passing ordinal index to indexers #1089
  • Refactor request_id to uuid #1049
  • Refactor logger wrapper #1029
  • Add ssh tunneling for Pod. You can specify ssh information #1018
  • Switch to hash function for generating ids #1005, #1034
  • Support --uses-before and --uses-after when --parallel=1. Previously, both options only took effect when parallel > 1. _pass and _forward use RouteDriver by default. #1112
  • Rename replica_id to pea_id and fix the PeaRoleType #1015
  • Fix the bug in setting top_k #1133, #1138, #1145

Executors

  • Add checking for the existence of model paths #1077
  • Improve exception handling for the failure of loading pre-trained models #1065
  • Fix typing of indexers #1053
  • Fix the no attribute error for BaseOnnxEncoder #1107

Drivers

  • Fix bug in QueryDriver when passing dictionary argument. #1080

CLI

  • Improve the hubio module. jina hub login supports logging in with OAuth authentication. jina hub list lists the available Pods in jina-hub. jina hub push supports building and pushing Pod images via Hubapi deployed on AWS API Gateway #1022, #1041, #1118, #1120, #1135

  • Add update checking for the jina CLI #1117

Tests & CICD

  • Refactor test for Python client #1095
  • Add tests for including examples during ci #1088
  • Fix dependency conflicts in ci by replacing [match-py-ver] with [cicd] #1101
  • Improve PR review process by adding CODEOWNERS #1108
  • Refactor to pytest in testing request #1045
  • Add unit test for helper #1046
  • Fix io test #1052
  • Fix test coverage #1054, #1056
  • Use pytest fixture to remove tmp files #1021
  • Refactor the unit tests to pytest style in test_protobuf #1121
  • Add docker helper test #1115
  • Add test in the ci for testing examples #1142
  • Add test in the ci for testing hello-world in docker with no devel installed #1139

Documentation

  • Add Portuguese translation for README #1097
  • Add Ukrainian translation for README.md #1124
  • Fix Russian README #1057
  • Fix broken links in README #1033, #1037, #105
  • Fix links in CHANGELOG and CONTRIBUTING #1032
  • Improve the docstring for rank drivers #1143

Others

  • Fix duplicate lines in cookiecutter #1063
  • Fix conflicts between copyright adding action and typing #1023
  • Move numpy importing inside function #1019
  • Rename jina_cli to cli #1017
  • Fix typing error in mypy #1009
  • Fix line spaces in code #1105

🙏 Thanks to our Contributors

This release contains contributions from Alex C-G, Alex McKenzie, CatStark, Christopher Lennan, Deepankar Mahapatro, Fernanda Kawasaki, Han Xiao, Joan Fontanals Martinez, Ján Jendrušák, Maximilian Werk, Nan Wang, Oleh Yaroshchuk, Pratik Bhavsar, RenrakuRunrat, Rutuja Surve, Sai Sandeep Mutyala, Sergei Averkiev, Susana Guzman, Wang Bo, jancijen, pswu11

🙏 Thanks to our Community

And thanks to all of you out there as well! Without you, Jina couldn't do what we do. Your support means a lot to us.

🤝 Work with Jina

Want to work with Jina full-time? Check out our openings on our website.

🎉 v0.6.0

04 Oct 18:14

Jina v0.6.0

We are excited to release Jina 0.6.0. Jina is the easier way to do neural search on the cloud. Highlights of this release include:

  • Improve the memory footprint for the Indexer.
  • Add an example for building a cross-modal search system with Jina.
  • Add support for indexing .pdf files.

Release 0.6.0

⬆️ Major Features and Improvements

Scalability

  • Improve the memory footprint for the Indexer. Instead of holding the whole index in memory in query mode, both the NumpyIndexer and the BinaryPbIndexer now use memory mapping, which scales better to large datasets. To further reduce the memory footprint of the vector index, a ZarrIndexer based on Zarr has been added to Jina Hub. #950, #984
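The memory-mapping idea behind this change can be sketched with `numpy.memmap`. This is a minimal illustration, not the NumpyIndexer code: the helper names `build_index` and `query_index`, the file name, the array sizes, and the batch size are all assumptions made for the example.

```python
import os
import tempfile

import numpy as np


def build_index(path, vectors):
    """Index time: dump the embeddings to disk as raw float32 rows."""
    np.asarray(vectors, dtype=np.float32).tofile(path)


def query_index(path, shape, query, batch_size=256):
    """Query time: map the file and scan it batch by batch."""
    index = np.memmap(path, dtype=np.float32, mode='r', shape=shape)
    best_id, best_dist = -1, float('inf')
    for start in range(0, shape[0], batch_size):
        batch = index[start:start + batch_size]  # only this slice is paged in
        dists = np.linalg.norm(batch - query, axis=1)
        i = int(np.argmin(dists))
        if dists[i] < best_dist:
            best_id, best_dist = start + i, float(dists[i])
    del index  # drop the mapping so the file can be removed
    return best_id, best_dist


# Hypothetical data: 1000 random 8-dimensional embeddings.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'vectors.bin')
    vectors = np.random.default_rng(0).random((1000, 8), dtype=np.float32)
    build_index(path, vectors)
    nearest_id, nearest_dist = query_index(path, vectors.shape, vectors[42])
```

Because the file is mapped rather than read, the operating system pages in only the rows touched by each batch, so the resident memory stays roughly proportional to `batch_size` instead of the full index size.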

Universal

⚠️ Breaking Changes

For details of all breaking changes, please refer to #885

  • Improve the traversal of recursive document structures. #944, #933, #923, #893, #889
  • Rename --yaml-path to --uses in Flow CLI #925, #922
  • Rename --uses-reducing to --uses-after and add --uses-before. This change makes it possible to customize the executors' behavior before sending requests to, and after receiving results from, all parallels/shards. #925
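The role of the before/after hooks around a set of shards can be pictured with a plain-Python sketch. It does not use the Jina API: `run_on_shards`, `make_shard`, and the toy substring search are hypothetical names invented for this illustration, with the two keyword arguments named after the CLI options only to show where each hook sits.

```python
def run_on_shards(query, shards, uses_before=None, uses_after=None):
    """Fan a query out to every shard, with optional hooks on both sides."""
    if uses_before is not None:
        query = uses_before(query)  # runs once, before the fan-out
    partial_results = [shard(query) for shard in shards]
    if uses_after is not None:
        return uses_after(partial_results)  # runs once, after the fan-in
    return partial_results


def make_shard(docs):
    """Hypothetical shard: a substring search over its own slice of the data."""
    def search(query):
        return [d for d in docs if query in d]
    return search


shards = [
    make_shard(['red apple', 'green pear']),
    make_shard(['apple pie', 'plum jam']),
]

result = run_on_shards(
    '  APPLE ',
    shards,
    uses_before=lambda q: q.strip().lower(),  # normalize the query
    uses_after=lambda parts: sorted(hit for part in parts for hit in part),  # merge
)
```

Without the hooks, the caller gets one partial result list per shard and has to merge them itself.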

🐞 Bug Fixes and Other Changes

Flow

  • Improve context management of Flow and Pod with ExitStack. #901
  • Improve shut-down logic for log server #935, #958
  • Fix shut-down logic for Peas and Pods #907, #956
  • Refactor de-/serialization logic #988, #991
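The ExitStack-based context management mentioned in the Flow items above can be sketched with the standard library's `contextlib.ExitStack`. The `Pod` and `Flow` classes below are simplified, hypothetical stand-ins that only record start and close events; they are not the Jina classes.

```python
from contextlib import ExitStack


class Pod:
    """Hypothetical stand-in for a Pod: it only records start/close events."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        self.log.append(f'start {self.name}')
        return self

    def __exit__(self, *exc_info):
        self.log.append(f'close {self.name}')
        return False  # never swallow exceptions


class Flow:
    """Hypothetical stand-in for a Flow that owns several Pods."""

    def __init__(self, pod_names):
        self.log = []
        self._pod_names = pod_names
        self._stack = ExitStack()

    def __enter__(self):
        try:
            for name in self._pod_names:
                # Every Pod entered here is closed by the stack later on.
                self._stack.enter_context(Pod(name, self.log))
        except Exception:
            self._stack.close()  # roll back the Pods that did start
            raise
        return self

    def __exit__(self, *exc_info):
        # Closes the Pods in reverse order of entry.
        return self._stack.__exit__(*exc_info)


with Flow(['encoder', 'indexer']) as flow:
    pass
```

The benefit of this pattern is that a failure while starting the third Pod still closes the first two, without nested `try`/`finally` blocks.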

Executors

  • Add a meta variable force_register for executors to force Jina to use the local version of an executor. #883
  • Fix a bug in reducing functions for encoders. #900
  • Fix default behavior of CompoundIndexer #939
  • Fix bug in overwriting metas using Python client. #980

Drivers

  • Add CollectMatches2DocRankDriver for calculating matches with granularity=k-1 from Matches at granularity=k. #851
  • Add Matches2DocRankDriver for calculating new scores of matches from original scores #919
  • Add VectorFillDriver for filling embeddings of Document #909, #913
  • Add support for using tags with QueryLangDrivers #938
  • Add support for traversing recursive Documents via explicit tree path definition. #983, #979, #994, #993
  • Enable BaseSegmenter to change mime_type. #981
  • Add NdArray2PngURI and Blob2PngURI for converting numpy arrays into data URIs. #982
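As an illustration of what converting a numpy array into a PNG data URI involves, here is a minimal sketch that needs only the standard library and numpy. It is not the code of NdArray2PngURI or Blob2PngURI: the function name `ndarray_to_png_uri` is an assumption, and the sketch handles only 2-D arrays of 8-bit grayscale values.

```python
import base64
import struct
import zlib

import numpy as np

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def _png_chunk(tag, data):
    """A PNG chunk is: length, tag, data, then a CRC over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack('>I', len(data)) + tag + data + struct.pack('>I', crc)


def ndarray_to_png_uri(arr):
    """Encode a 2-D array of 0..255 values as a grayscale PNG data URI."""
    arr = np.ascontiguousarray(arr, dtype=np.uint8)
    if arr.ndim != 2:
        raise ValueError('this sketch only supports 2-D grayscale arrays')
    height, width = arr.shape
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # then compression, filter and interlace methods, all 0.
    ihdr = struct.pack('>IIBBBBB', width, height, 8, 0, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b''.join(b'\x00' + row.tobytes() for row in arr)
    png = (
        PNG_SIGNATURE
        + _png_chunk(b'IHDR', ihdr)
        + _png_chunk(b'IDAT', zlib.compress(raw))
        + _png_chunk(b'IEND', b'')
    )
    return 'data:image/png;base64,' + base64.b64encode(png).decode('ascii')


# Hypothetical input: a 2x3 gradient image.
uri = ndarray_to_png_uri(np.array([[0, 128, 255], [255, 128, 0]]))
```

The resulting string can be used directly as the `src` of an HTML `<img>` tag, which is what makes data URIs convenient for showing results in a browser.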

CLI

  • Add --test-uses option for jina hub build CLI for skipping failed-start peas when building Docker file. #902, #965
  • Add is_build_success field for checking results of jina hub build. #903
  • Add --type app option for jina hub new CLI for creating a new Jina app. #917
  • Add --push option for jina hub build CLI for building and pushing local executors to Jina Hub. #937
  • Improve jina hub list CLI. #985
  • Improve speed of CLI autocompletion. #992

Tests

  • Add more unit tests for reducing functions #898
  • Move dependencies for unit tests into extra-requirements.txt #906
  • Add unit tests for sleeping executors #918
  • Add more unit tests for checking Peas #921
  • Add more unit tests for decorators of executors. #930
  • Add more unit tests for overriding Flow arguments. #926, #927
  • Fix name conflicts in tests when running unit tests on GitHub. #961
  • Add more unit tests for support of Documents with chunks of different mime_type. #968

Documentation

Others

🙏 Thanks to our Contributors

This release contains contributions from Alasdair Tran, Alex C-G, David Sanwald, Deepankar Mahapatro, Han Xiao, JamesTang-jinaai, Joan Fontanals Martinez, Maximilian Werk, Nan Wang, Rutuja Surve, Sreerag-ibtl, Susana Guzman, Yue Liu, pswu11, rameshwara

🙏 Thanks to our Community

And thanks to all of you out there as well! Without you, Jina couldn't do what we do. Your support means a lot to us.

🤝 Work with Jina

Want to work with Jina full-time? Check out our openings on our website.