PyPi conflict #1

staale · 2013-11-21T08:09:34Z

Hi, I am the maintainer of another python-xlsx library - https://github.com/staale/python-xlsx

I have an outstanding issue to upload to PyPI (staale/py-xlsx#18); however, I am not able to upload because you have already reserved that name on PyPI. As far as I can tell, your repository is empty. Would you mind transferring python-xlsx to me on PyPI?

scanny · 2013-11-21T08:15:36Z

Hi Ståle, apologies, but while the python-xlsx repo here on GitHub is currently empty, the work is already in progress, and the name was chosen to be consistent with python-docx and python-pptx, companion projects I'm a committer for that are both already well along :(

Sorry about that.

Would love to chat with you about what you're doing though, might be an opportunity to work together as I've got a fair amount of infrastructure in place now that applies to all three of the MS Office XML formats, let me know :)

chrisgilmerproj · 2013-12-16T19:24:15Z

I would highly recommend combining efforts here. I wrote the original ticket to upload @staale's library to PyPI. If you combine projects it would be really nice if the same API was available at first and slowly changed as the library evolved with both your efforts. Do you think we could do that?

scanny · 2013-12-16T22:36:39Z

@chrisgilmerproj, @staale: I'm happy to talk about collaborating. It does seem that the scope of the two projects is pretty different though, so we should probably discuss that.

Can you describe what the motivations behind your project are? It seems pretty focused on date formatting just from a quick look.

Like the other python-openxml projects, python-pptx for example, the scope of the python-xlsx project is a full read/write library for .xlsx files, allowing all content types and settings allowed by the file format.

One option might be for me to check in the code I have so far and see if the features this module provides can be addressed as a priority. That's one reason why I'm asking to understand it better.

Can you help me understand the whys and wherefores?

chrisgilmerproj · 2013-12-16T23:48:02Z

I am all for a comprehensive package. Basically my usage is simple: grab the data from a specific cell of a specific sheet in an xlsx file. Iterating through the columns and rows is pretty basic and I expect you'll have that too.

I'd love to have the code you have so far as a package I can use from PyPI. If the API is similar then I'd use it. The motivation for my ticket on the other repo is simply to get a package up into PyPI. As long as there is working code in a package there, I don't care so much which repo I use.

staale · 2013-12-17T11:26:21Z

Since I am not actively using Python, nor accessing xlsx files, I don't do much work on https://github.com/staale/py-xlsx. If you find the code there useful, by all means use it. I haven't really put a license on it, but I consider it public domain. Grab it and use it without any constraints :).

I have uploaded it to pypi now: https://pypi.python.org/pypi/py-xlsx/0.2

scanny · 2013-12-17T22:54:42Z

Thanks @staale, that's very good of you :)

scanny · 2013-12-18T02:40:52Z

@chrisgilmerproj One or two considerations I think worth noodling a bit. One difference between the two libraries is the one I'm building will use lxml for XML access, where this one uses the built-in ElementTree library. I believe lxml performs somewhat better at scale, and has some other features that make it architecturally desirable for that project; but it does entail a C-compile step and having certain C libraries installed (libxml2, libxslt), a characteristic that's presented the occasional support challenge.

Depending on the use case, user base and platform, one might reasonably prefer to avoid such an entailed dependency.

I'm somewhat inclined to believe there's a place for both in the Python community, whatever we end up deciding re: merging the projects or whatever; a full-featured read-write Excel library as well as one that tidily lets you pull out some data and be done.

What's your view?

staale · 2013-12-18T10:11:15Z

When I consume libraries, I like libraries with as few dependencies as possible. It saves me from having to import extra libraries, and it avoids version conflicts.

Performance might or might not be an issue, depending on use case. For just importing data, which is what I used my library for, performance isn't that critical. Neither would performance be for export. But if the library is going to be used to process thousands of excel files, you might hit an issue.

Another option would be to separate the code that accesses the zip file from the code that parses the xml. So you could plug in a different xml handler using lxml if you need to. This adds a bit more work in terms of designing the api, but will make it more flexible in use.
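A minimal sketch of that separation, assuming hypothetical `Package` and `XmlReader` classes (neither name comes from either library). The zip layer only hands back raw bytes, and the XML layer takes the parse function as a parameter, so the standard library parser is the default and another one can be swapped in:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace, used here only to build the demo part.
NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


class Package:
    """Hypothetical zip layer: returns raw part bytes, knows nothing about XML."""

    def __init__(self, file):
        self._zip = zipfile.ZipFile(file)

    def read_part(self, name):
        return self._zip.read(name)


class XmlReader:
    """Hypothetical XML layer: parses part bytes with a pluggable function."""

    def __init__(self, package, parse=ET.fromstring):
        self._package = package
        self._parse = parse

    def sheet_names(self):
        root = self._parse(self._package.read_part("xl/workbook.xml"))
        # Match on the local tag name so the namespace prefix does not matter.
        return [
            el.get("name")
            for el in root.iter()
            if isinstance(el.tag, str) and el.tag.endswith("}sheet")
        ]


def make_demo_file():
    """Build a tiny in-memory zip holding only a workbook part (demo data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "xl/workbook.xml",
            '<workbook xmlns="%s"><sheets>'
            '<sheet name="Data" sheetId="1"/>'
            "</sheets></workbook>" % NS,
        )
    buf.seek(0)
    return buf


names = XmlReader(Package(make_demo_file())).sheet_names()
print(names)
```

If lxml is installed, passing `parse=lxml.etree.fromstring` to `XmlReader` would be one way to switch parsers without touching the zip code.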

As for merging, I think the best approach here would be to just look at the API that my library provides, and if that seems sane and useful, just make your implementation support the same API for reading. Since I don't use Python in my day job any longer, nor do I work with xlsx files, I will not be able to provide much code help here.

I do think that my api does provide a nice interface for reading xlsx files though. Perhaps an even better api would be something like this:

# Access a single cell value
workbook["sheet"]["C1"]

# Access a column or row object that is iterable both as a collection and as a map (since it's a sparse array)
workbook["sheet"]["C"]
workbook["sheet"]["5"]

# Access a collection of values; empty values should probably be represented as None here
workbook["sheet"]["C1:C5"]
workbook["sheet"]["C1:F1"]

# Access as a 2-dimensional array; empty cells would be None
workbook["sheet"]["C1:F5"]

What could also be cool, would be to have something to access cells using numpy:

workbook["sheet"].numpy["C1:F5"]

This would return objects in numpy format, for easy summation. I am a bit rusty in numpy though, but I think something like this should be possible.
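For illustration, here is a rough sketch of how the string-based cell and range lookups proposed above could work on a sparse sheet. The `Sheet` class, `parse_ref` and `col_index` are hypothetical names, the cell store is assumed to be a plain dict keyed by zero-based `(row, col)`, and whole-column (`"C"`) and whole-row (`"5"`) lookups are left out to keep it short:

```python
import re

_REF = re.compile(r"^([A-Z]+)(\d+)$")


def col_index(letters):
    """Convert column letters to a zero-based index: A -> 0, Z -> 25, AA -> 26."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n - 1


def parse_ref(ref):
    """Split a reference such as 'C5' into a zero-based (row, col) pair."""
    m = _REF.match(ref)
    if m is None:
        raise KeyError(ref)
    return int(m.group(2)) - 1, col_index(m.group(1))


class Sheet:
    """Hypothetical sparse sheet: cells maps (row, col) -> value."""

    def __init__(self, cells):
        self._cells = cells

    def __getitem__(self, key):
        if ":" not in key:
            # Single cell; a missing cell reads as None.
            return self._cells.get(parse_ref(key))
        first, last = key.split(":")
        (r0, c0), (r1, c1) = parse_ref(first), parse_ref(last)
        grid = [
            [self._cells.get((r, c)) for c in range(c0, c1 + 1)]
            for r in range(r0, r1 + 1)
        ]
        if c0 == c1:
            return [row[0] for row in grid]  # single column: flat list
        if r0 == r1:
            return grid[0]  # single row: flat list
        return grid  # rectangle: list of rows


sheet = Sheet({(0, 2): "a", (1, 2): "b", (0, 3): "c"})  # C1, C2, D1
print(sheet["C1"], sheet["C1:C3"], sheet["C1:D2"])
```

A `.numpy` accessor could in principle wrap the same grid in an array, but that is not shown here.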

scanny · 2013-12-19T02:06:49Z

I completely agree on the dependency question. Best case in my mind would be absolutely pure and self-contained Python :)

I'm not sure I can achieve it for the general-purpose library, but I definitely consider it a characteristic of your library that's important to preserve. I would love to achieve the same for the general-purpose python-xlsx, but the architecture I've developed is strongly tied to the custom element classes lxml provides. I've started noodling though on how I might re-implement that aspect so I can use the built-in ElementTree. My requirements other than the custom classes are pretty modest and I think I could do just fine with the built-in capability otherwise.

I know I mentioned the performance aspect, but I agree it's unlikely to be a factor, at least in 99% of use cases, and certainly in 100% of the use cases I've encountered so far :) At a minimum it would be premature optimization :). Also, I've read there is a cElementTree implementation, and that might be used by folks who needed an extra boost.
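The usual way to pick up cElementTree when it is available is a guarded import, sketched below. Note that on Python 3.3 and later the plain `xml.etree.ElementTree` module already uses the C accelerator when it can, and `xml.etree.cElementTree` was removed in Python 3.9, so the fallback branch is the one that runs on current interpreters:

```python
try:
    # C-accelerated parser on older Pythons.
    import xml.etree.cElementTree as ET
except ImportError:
    # Pure-stdlib module; already accelerated on Python 3.3+.
    import xml.etree.ElementTree as ET

root = ET.fromstring("<row><c>1</c><c>2</c></row>")
values = [c.text for c in root.findall("c")]
print(values)
```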

I definitely like the idea of having a consistent API. That way it would be like python-xlsx-lite or something like that :). You could 'upgrade' later to the general-purpose library with only the change of the import statement.

Regarding the API you propose, I'm not sure about indexed access directly on workbook for sheets, as there would be other things to do with a workbook, possibly including other collections. And I suppose that would extend to sheets as well. Maybe something like:

sheet = workbook.sheets['Sheet1']  # and support workbook.sheets[0] as well

Regarding the 'Excel' string syntax for ranges and so on, why do you think that would be better? I haven't given a lot of thought to the likely use cases, but I'm not getting a clear idea when you would want to spell things out that way. I suppose if you were only interested in a known range or something then intent would be clearer in the code, and easier to map back to the sample sheet you were using. But I just kind of imagined a typical use case would look something more like this:

for row in sheet.rows[1:]:
    name = row.cells[0].value
    address = row.cells[1].value
    phone = row.cells[2].value
    persons.append(Person(name, address, phone))

What's your sense of the typical range of use cases?
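To make the row-oriented loop above concrete, here is a self-contained sketch. The `Cell`, `Row` and `Sheet` classes and the `Person` tuple are hypothetical stand-ins with just enough behavior to run the loop; they are not the actual python-xlsx objects:

```python
from collections import namedtuple

Person = namedtuple("Person", "name address phone")


class Cell:
    def __init__(self, value):
        self.value = value


class Row:
    def __init__(self, values):
        self.cells = [Cell(v) for v in values]


class Sheet:
    def __init__(self, rows):
        self.rows = [Row(values) for values in rows]


# Made-up sample data; the first row is a header.
sheet = Sheet([
    ["Name", "Address", "Phone"],
    ["Ada", "12 Main St", "555-0100"],
    ["Grace", "9 Elm Rd", "555-0101"],
])

persons = []
for row in sheet.rows[1:]:  # skip the header row
    name = row.cells[0].value
    address = row.cells[1].value
    phone = row.cells[2].value
    persons.append(Person(name, address, phone))

print(persons)
```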

staale · 2013-12-19T08:43:28Z

Your proposal for workbook.sheets[] seems a good option as well. But I would think both could be used, so you could implement __getitem__ in workbook as:

def __getitem__(self, key):
    # key is a sheet name (or index); delegate to the sheets collection
    return self.sheets[key]

Having a single __getitem__ wouldn't exclude separate properties. I don't know what is more pythonic, I am just thinking about the simplest case possible.
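A small sketch of how the two access styles could coexist, assuming a hypothetical `SheetCollection` that accepts either a sheet name or a position, and a `Workbook` whose `__getitem__` simply delegates to it:

```python
class SheetCollection:
    """Hypothetical collection: look up sheets by name or by position."""

    def __init__(self, sheets):
        # List of (name, sheet) pairs, kept in workbook order.
        self._sheets = list(sheets)

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._sheets[key][1]
        for name, sheet in self._sheets:
            if name == key:
                return sheet
        raise KeyError(key)

    def __len__(self):
        return len(self._sheets)


class Workbook:
    def __init__(self, sheets):
        self.sheets = SheetCollection(sheets)

    def __getitem__(self, key):
        # Shortcut: workbook["Sheet1"] is the same as workbook.sheets["Sheet1"].
        return self.sheets[key]


# Strings stand in for real sheet objects in this demo.
workbook = Workbook([("Sheet1", "first"), ("Sheet2", "second")])
print(workbook["Sheet1"], workbook.sheets[1])
```

Other collections on the workbook would still be reachable through their own properties; only the sheet lookup gets the shortcut.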

If you had excel style syntax, your for loop could be rewritten as:

persons = [Person(name, address, phone) for name, address, phone in sheet["A1:C1000"]]

Well, being able to write the above would be cool. sheet["A1:C1000"] returns a two-dimensional array that we unpack into fields, so the entire thing can be written as a single list comprehension. That makes for a neat and powerful API. Though perhaps both usage patterns need to be supported, for easy transition.
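One practical wrinkle with a fixed range such as "A1:C1000" is that rows past the end of the data would come back as all None. The sketch below uses a plain list of lists as a stand-in for whatever the range lookup would return (an assumption, since that API does not exist yet) and filters out the empty rows inside the comprehension:

```python
from collections import namedtuple

Person = namedtuple("Person", "name address phone")

# Stand-in for the result of sheet["A1:C1000"]: one list per row,
# with None in every cell of a row that has no data.
grid = [
    ["Ada", "12 Main St", "555-0100"],
    ["Grace", "9 Elm Rd", "555-0101"],
    [None, None, None],
]

persons = [
    Person(name, address, phone)
    for name, address, phone in grid
    if any(v is not None for v in (name, address, phone))  # skip empty rows
]

print(persons)
```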

steph-ben · 2014-07-10T16:27:21Z

Dear all,

Thanks a lot for all your work regarding the docx project and this one.

In case you haven't seen it, there is already another Python project for Excel files: www.python-excel.org
I'm currently using the xlrd package for reading Excel files, and it works really well. I've never tried the writing package though.

I'm not an owner or developer of these packages, just a user who thought this could help your work here.

Cheers,
Stephane
