PyPi conflict #1
Comments
Hi Ståle, apologies, but while the python-xlsx repo here on GitHub is currently empty, the work is already in progress, and the name was chosen to be consistent with python-docx and python-pptx, companion projects I'm a committer for that are both already well along :( Sorry about that. I would love to chat with you about what you're doing, though; there might be an opportunity to work together, as I've got a fair amount of infrastructure in place now that applies to all three of the MS Office XML formats. Let me know :)
I would highly recommend combining efforts here. I wrote the original ticket to upload @staale's library to PyPI. If you combine projects, it would be really nice if the same API were available at first and then changed slowly as the library evolved with both your efforts. Do you think we could do that?
@chrisgilmerproj, @staale: I'm happy to talk about collaborating. It does seem that the scope of the two projects is pretty different though, so we should probably discuss that. Can you describe what the motivations behind your project are? It seems pretty focused on date formatting just from a quick look. Like the other python-openxml projects, python-pptx for example, the scope of the python-xlsx project is a full read/write library for .xlsx files, allowing all content types and settings allowed by the file format. One option might be for me to check in the code I have so far and see if the features this module provides can be addressed as a priority. That's one reason why I'm asking to understand it better. Can you help me understand the whys and wherefores?
I am all for a comprehensive package. Basically my usage is simple: grab the data from a specific cell of a specific sheet in an xlsx file. Iterating through the columns and rows is pretty basic and I expect you'll have that too. I'd love to have the code you have so far as a package I can use from PyPI. If the API is similar then I'd use it. The motivation for my ticket on the other repo is simply to get a package up into PyPI. As long as there is working code in a package there, I don't care so much which repo I use.
Since I am not actively using Python, nor accessing xlsx files, I don't do much work on https://github.com/staale/py-xlsx. If you find the code there useful, by all means use it. I haven't really put a license on it, but I consider it public domain. Grab it and use it without any constraints :). I have uploaded it to PyPI now: https://pypi.python.org/pypi/py-xlsx/0.2
Thanks @staale, that's very good of you :)
@chrisgilmerproj One or two considerations I think worth noodling a bit. One difference between the two libraries is the one I'm building will use lxml for XML access, where this one uses the built-in ElementTree library. I believe lxml performs somewhat better at scale, and has some other features that make it architecturally desirable for that project; but it does entail a C-compile step and having certain C libraries installed (libxml2, libxslt), a characteristic that's presented the occasional support challenge. Depending on the use case, user base and platform, one might reasonably prefer to avoid such an entailed dependency. I'm somewhat inclined to believe there's a place for both in the Python community, whatever we end up deciding re: merging the projects or whatever; a full-featured read-write Excel library as well as one that tidily lets you pull out some data and be done. What's your view?
When I consume libraries, I like libraries with as few dependencies as possible: it avoids me having to import extra libraries, and it avoids version conflicts. Performance might or might not be an issue, depending on use case. For just importing data, which is what I used my library for, performance isn't that critical. Neither would performance be for export. But if the library is going to be used to process thousands of Excel files, you might hit an issue. Another option would be to separate the code that accesses the zip file from the code that parses the XML, so you could plug in a different XML handler using lxml if you need to. This adds a bit more work in terms of designing the API, but will make it more flexible in use. As for merging, I think the best approach here would be to just look at the API that my library provides, and if that seems sane and useful, just make your implementation support the same API for reading. Since I don't use Python in my day job any longer, nor do I work with xlsx files, I will not be able to provide much code help here. I do think that my API does provide a nice interface for reading xlsx files though. Perhaps an even better API would be something like this:
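The snippet itself is not preserved in this copy of the thread. Judging from the reply further down, which mentions indexed access directly on the workbook and Excel-style address strings, it may have looked roughly like the sketch below. The `Workbook`, `Sheet`, and `parse_address` names are hypothetical and exist only for illustration; they are not the py-xlsx or python-xlsx API.

```python
import re

# Hypothetical classes for illustration only; not an existing library API.
_CELL_RE = re.compile(r"^([A-Z]+)([1-9][0-9]*)$")


def parse_address(address):
    """Convert an Excel-style address such as 'B2' to zero-based (row, col)."""
    match = _CELL_RE.match(address.upper())
    if match is None:
        raise KeyError(address)
    letters, digits = match.groups()
    col = 0
    for ch in letters:
        # Column letters are a base-26 number with digits A=1 .. Z=26.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


class Sheet:
    def __init__(self, rows):
        self._rows = rows  # list of lists of cell values

    def __getitem__(self, address):
        row, col = parse_address(address)
        return self._rows[row][col]


class Workbook:
    def __init__(self, sheets):
        self._sheets = sheets  # dict mapping sheet name -> Sheet

    def __getitem__(self, name):
        return self._sheets[name]


workbook = Workbook({"Sheet1": Sheet([["name", "phone"], ["Ada", "555-0100"]])})
value = workbook["Sheet1"]["B2"]
```

The point of the design is that the address in the code matches what a user sees in the spreadsheet, at the cost of a small parsing step on every lookup.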
What could also be cool would be to have something to access cells using numpy:
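The original snippet is missing here as well. As a rough sketch of the idea, assuming the cell values of a range have already been read into a list of rows (the `cell_values` literal below is a stand-in, not output from any real library), converting them to a numpy array makes summation straightforward:

```python
import numpy as np

# Stand-in for values read from a sheet range; in a real library these
# would come from the parsed worksheet rather than a literal.
cell_values = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]

block = np.array(cell_values, dtype=float)

column_totals = block.sum(axis=0)  # one total per column
row_totals = block.sum(axis=1)     # one total per row
grand_total = block.sum()          # total over the whole block
```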
This would return objects in numpy format, for easy summation. I am a bit rusty in numpy though, but I think something like this should be possible.
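The earlier suggestion in this comment, separating the code that accesses the zip file from the code that parses the XML, could be sketched as follows. This is only one possible shape: `read_part` and its `parse` parameter are hypothetical names, and the archive built here is a simplified stand-in (real .xlsx parts use XML namespaces, which are omitted).

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_part(archive_bytes, part_name, parse=ET.fromstring):
    """Read one XML part from a zip archive and parse it with `parse`.

    `parse` is any callable that takes the raw bytes and returns a root
    element. The default is the standard library's ElementTree; a caller
    with lxml installed could pass lxml.etree.fromstring instead.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        raw = archive.read(part_name)
    return parse(raw)


# Build a tiny in-memory archive so the sketch runs without a real file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "xl/workbook.xml",
        "<workbook><sheet name='Sheet1'/></workbook>",
    )

root = read_part(buffer.getvalue(), "xl/workbook.xml")
sheet_names = [sheet.get("name") for sheet in root.iter("sheet")]
```

Because the zip handling never touches the parser directly, the default install stays dependency-free while heavier users can swap the parser in one place.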
I completely agree on the dependency question. Best case in my mind would be absolutely pure and self-contained Python :) I'm not sure I can achieve it for the general-purpose library, but I definitely consider it a characteristic of your library that's important to preserve. I would love to achieve the same for the general-purpose python-xlsx, but the architecture I've developed is strongly tied to the custom element classes.

I know I mentioned the performance aspect, but I agree it's unlikely to be a factor, at least in 99% of use cases, and certainly in 100% of the use cases I've encountered so far :) At a minimum it would be premature optimization :). Also, I've read there is a cElementTree implementation, and that might be used by folks who needed an extra boost.

I definitely like the idea of having a consistent API. That way it would be like python-xlsx-lite or something like that :). You could 'upgrade' later to the general-purpose library with only the change of the import statement.

Regarding the API you propose, I'm not sure about indexed access directly on workbook for sheets, as there would be other things to do with a workbook, possibly including other collections. And I suppose that would extend to sheets as well. Maybe something like:

    sheet = workbook.sheets['Sheet1']  # and support workbook.sheets[0] as well

Regarding the 'Excel' string syntax for ranges and so on, why do you think that would be better? I haven't given a lot of thought to the likely use cases, but I'm not getting a clear idea when you would want to spell things out that way. I suppose if you were only interested in a known range or something then intent would be clearer in the code, and easier to map back to the sample sheet you were using. But I just kind of imagined a typical use case would look something more like this:

    for row in sheet.rows[1:]:
        name = row.cells[0].value
        address = row.cells[1].value
        phone = row.cells[2].value
        persons.append(Person(name, address, phone))

What's your sense of the typical range of use cases?
Your proposal for workbook.sheets[] seems a good option as well. But I would think both could be used, so you could implement __getitem__ in workbook as:
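The snippet that followed is not preserved. A minimal sketch of the idea, assuming a hypothetical `SheetCollection` that accepts either a sheet name or a position, with `Workbook.__getitem__` simply delegating to it (strings stand in for real sheet objects):

```python
class SheetCollection:
    """Hypothetical collection that looks sheets up by name or by position."""

    def __init__(self, named_sheets):
        # named_sheets: list of (name, sheet) pairs in workbook order
        self._order = [sheet for _, sheet in named_sheets]
        self._by_name = dict(named_sheets)

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._order[key]
        return self._by_name[key]

    def __len__(self):
        return len(self._order)


class Workbook:
    def __init__(self, named_sheets):
        self.sheets = SheetCollection(named_sheets)

    def __getitem__(self, key):
        # Shortcut: workbook[key] delegates to workbook.sheets[key].
        return self.sheets[key]


# Strings stand in for real sheet objects in this sketch.
workbook = Workbook([("Sheet1", "first sheet"), ("Totals", "second sheet")])
```

With this arrangement `workbook["Sheet1"]`, `workbook.sheets["Sheet1"]`, and `workbook.sheets[0]` all return the same object, so the shortcut does not prevent the workbook from exposing other collections as separate properties.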
Having one __getitem__ wouldn't exclude separate properties. I don't know what is more pythonic; I am just thinking about the simplest case possible. If you had Excel-style syntax, your for loop could be rewritten as:
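The rewritten loop is missing from this copy of the thread. One way it could look, assuming a hypothetical `Sheet` whose `__getitem__` accepts a range string and returns a list of rows, and a `Person` tuple defined here only for illustration:

```python
import re
from collections import namedtuple

Person = namedtuple("Person", ["name", "address", "phone"])

_RANGE_RE = re.compile(r"^([A-Z]+)([1-9][0-9]*):([A-Z]+)([1-9][0-9]*)$")


def _col_index(letters):
    """Convert column letters ('A', 'AB') to a zero-based index."""
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


class Sheet:
    """Hypothetical sheet backed by a list of rows."""

    def __init__(self, rows):
        self._rows = rows

    def __getitem__(self, cell_range):
        match = _RANGE_RE.match(cell_range.upper())
        if match is None:
            raise KeyError(cell_range)
        c1, r1, c2, r2 = match.groups()
        first_col, last_col = _col_index(c1), _col_index(c2)
        first_row, last_row = int(r1) - 1, int(r2) - 1
        # Slicing clips to the available data, so an oversized range such
        # as "A2:C1000" simply stops at the last populated row.
        return [row[first_col:last_col + 1]
                for row in self._rows[first_row:last_row + 1]]


sheet = Sheet([
    ["Name", "Address", "Phone"],
    ["Ada", "1 Main St", "555-0100"],
    ["Bob", "2 High St", "555-0101"],
])

# Start at row 2 to skip the header, mirroring sheet.rows[1:] in the loop.
persons = [Person(name, address, phone)
           for name, address, phone in sheet["A2:C1000"]]
```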
Well, being able to write the above would be cool. sheet["A1:C1000"] gets a two-dimensional array that we explode into fields, so the entire thing can be written as a single comprehension. That makes for a neat and powerful API. Though perhaps both usage patterns need to be supported, for easy transition.
Dear all, Thanks a lot for all your work on the docx project and this one. In case you haven't seen it, there is already another Python project for xlsx files: www.python-excel.org. I'm not an owner or developer of these packages, just a user who thought this could help your work here. Cheers,
Hi, I am the maintainer of another python-xlsx library - https://github.com/staale/python-xlsx
I have an outstanding issue to upload to PyPI (staale/py-xlsx#18); however, I am not able to upload because you have already reserved that name on PyPI. As far as I can tell, your repository is empty. Would you mind transferring python-xlsx to me on PyPI?