Digitalcourage @digitalcourage

**Computo** @computo@mathstodon.xyz · Jul 15

Summer read: a new paper on model-based clustering just appeared in Computo!

Julien Jacques and Brendan Thomas Murphy publish a new method for clustering multivariate count data. The method combines feature selection and clustering, and is based on conditionally independent Poisson mixture models and Poisson generalized linear models.

On simulations, the Adjusted Rand Index (ARI) of the model with selected variables is close to the optimal ARI obtained with the true clustering variables.

The paper and accompanying R code are available at https://computo-journal.org/published-202507-jacques-count-data/

#machineLearning #clustering #Rstats
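For readers unfamiliar with the Adjusted Rand Index mentioned above, here is a minimal, dependency-free sketch of how it can be computed from two label vectors. It is not the paper's R code; the function name `adjusted_rand_index` and the toy labels are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the partitions trivially agree

    # Pairs of items grouped together in both partitions, in A only, in B only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / total_pairs  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Toy labels (made up): the second labeling splits one cluster in two.
truth = [0, 0, 1, 1]
found = [0, 0, 1, 2]
print(adjusted_rand_index(truth, found))
```

An ARI of 1 means the two partitions are identical up to relabeling, while values near 0 indicate agreement no better than chance.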

**Fabrice Tshimanga** @fabrice13@neuromatch.social · Mar 27

Exciting news, our paper is out!

"Behavioral Clusters and Lesion Distributions in Ischemic Stroke, Based on NIHSS Similarity Network" on Springer Journal of Healthcare Informatics Research https://rdcu.be/efgma

With my co-first-author Andrea Zanola and co-authors, we explore the relations between behavioral measures of impairment after stroke, and the underlying brain lesions.
Rather than focusing on covariances at the population level, we first cluster individual behavioral phenotypes, and then explore the typical and significant lesions of each cluster.

Our technique, Repeated Spectral Clustering, is performed on a similarity network (derived from the General Distance Measure, handy for ordinal scales!), and the partitions are statistically robust thanks to the aggregation of results from multiple random initializations.

We end up with 5 clusters, 3 of which show well-known principal components of deficits (Left Motor, Right Motor, Language) and their associated lesions.

Interestingly, this multi-item and multimodal approach allows us to distinguish different etiologies for the same deficits, thanks to their different behavioral associations and the different lesions characterizing each cluster. Even when the single NIHSS measure is a bit "vague"...

We hope that popularizing the General Distance Measure, Repeated Spectral Clustering, and this clustering perspective alongside PCA / CCA studies can inspire multimodal approaches in other neuroscientific and biomedical domains!

Many thanks to our co-authors, Antonio Luigi Bisogno, Silvia Facchini, Lorenzo Pini, Manfredo Atzori and Maurizio Corbetta for data, analytic and medical insights, and their guidance throughout the whole process!

#stroke #neuroscience #clustering

**Greg Cocks** @GregCocks@techhub.social · Feb 26

A Methodology For The Multitemporal Analysis Of Land Cover Changes And Urban Expansion Using Synthetic Aperture Radar (SAR) Imagery - A Case Study Of The Aburrá Valley In Colombia
--
https://doi.org/10.3390/rs17030554 <-- shared paper
--
#GIS #spatial #mapping #SyntheticApertureRadar #SAR #remotesensing #multitemporalanalysis #landcover #landcoverchange #clustering #kurtosis #fuzzylogic #kernelbasedmethod #machinelearning #spatialanalysis #spatiotemporal #geostatistics #model #modeling #AburráValley #Colombia #urban #urbanexpansion #population #growth #topography #monitoring #satellite #sentinel #valley #landuse #distribution #infrastructure #building #roads #naturalresources #environmental #conservation #multitemporal

Replied to JuliaR

**devSJR** @devSJR@fosstodon.org · Feb 4

@jromanowska

#Rstats #PeerReview #softwaredevelopment #OpenSource #programming #clustering #DataScience #Bioinformatics

I've just got the okay from my colleague. He will get in contact with you soon.

I will also try to help you out and will get in contact with you. By the way, I have been registered as a potential reviewer for JOSS for quite some time.

Replied to JuliaR

**devSJR** @devSJR@fosstodon.org · Feb 3

@jromanowska

I am occupied at the moment, but if you do not find somebody within a reasonable time, I will offer my help or try to ask a colleague of mine.

#Rstats #PeerReview #softwaredevelopment

**JuliaR** @jromanowska@fosstodon.org · Feb 3

Hi all #Rstats enthusiasts!
I'm looking for someone who has time now to conduct a review of a piece of software for Journal of Open Source Software (JOSS). Details are here:
https://github.com/openjournals/joss-reviews/issues/7319

The review process is quite simple - you get a checklist and you run some tests. It's all open, on GitHub.

GitHub · [REVIEW]: corrp: An R package for multiple correlation-like analysis and clustering in mixed data · Issue #7319 · openjournals/joss-reviews · By editorialbot

#PeerReview #softwaredevelopment #OpenSource

**Barry Schwartz** @rustybrick@c.im · Dec 6, 2024

How clustering works with localization in Google Search https://www.seroundtable.com/google-search-clustering-localization-38531.html

#google #seo #localizations

**Barry Schwartz** @rustybrick@c.im · Dec 6, 2024

Google on the difference between clustering and canonicalization: "Clustering is basically taking the pages that we think are the same. And then canonicalization is, from those pages, which one is the best one" @johnmu said https://www.seroundtable.com/google-search-clustering-canonicalization-38529.html

#seo #google #canonicalization

Replied in thread

**Kevin Karhan** @kkarhan@infosec.space · Nov 24, 2024

@ai6yr @dthacker9 @fuchsiii I just found them cheap as surplus - there are also others from Dell (WYSE), Fujitsu (Futro) & IGEL.

Basically almost all of them are cheap (like €50 at most, sometimes <€10 in a 10-pack lot) and fanless, so ideal to do some #BareMetal #clustering or just to have chugging along silently in the background...

**Greg Cocks** @GregCocks@techhub.social · Oct 21, 2024 *

Stanford Researchers Map ‘White-Only’ Properties In Santa Clara Co. Using AI [ historic deeds / covenants ]
--
https://www.kron4.com/news/bay-area/stanford-researchers-map-white-only-properties-in-santa-clara-co-using-ai/ <-- shared media article
--
https://dho.stanford.edu/wp-content/uploads/Covenants.pdf <-- shared research
--
https://reglab.github.io/racialcovenants/static/maps/dotmap_lot_level.html <-- link to shared webmap
--
#GIS #spatial #mapping #California #deeds #property #racial #racism #redlining #covenants #race #minorities #propertyrecords #discrimination #history #historical #USHistory #legalreform #records #AI #machinelearning #openlargelanguagemodel #model #modeling #geography #clustering #demographics #spatialanalysis #spatiotemporal

Replied in thread

**Kevin Karhan** @kkarhan@infosec.space · Oct 17, 2024

@perry_mitchell I'd avoid not just #SMR but all #Helium-filled drives as a matter of principle.

Also isn't #UnRaid that weird #KVM-Distro?

I mean, I know #trueNAS SCALE & #ProxMox doing #ZFS + #Ceph for #clustering and #redundancy...

**Daniel Pomarède** @pomarede@mastodon.social · Oct 8, 2024

in the #arXiv

2D watershed void clustering for probing the cosmic large-scale structure

by Yingxiao Song and co-authors
https://arxiv.org/abs/2410.04898

#Cosmology #universe #voids

Continued thread

**Harald Klinke** @HxxxKxxx@det.social · Aug 26, 2024

Two great sources to explore the use of pan and zoom techniques in data visualization:

1. Shneiderman's "information-seeking mantra" emphasizes the importance of overview, zoom, and filter in exploring data clusters.
https://infovis-wiki.net/wiki/Visual_Information-Seeking_Mantra
2. "Zoomland" (de Gruyter, 2023), edited by Armaselu and Fickers, offers insights on zooming in data visualization.
https://www.degruyter.com/document/doi/10.1515/9783111317779/html

infovis-wiki.net · Visual Information-Seeking Mantra - InfoVis:Wiki

#DataViz #KenBurnsEffect #Clustering

**Richard R Lee** @InfoMgmtExec@mastodon.social · May 22, 2024

Rest in Peace old mate. C. Gordon Bell was a pioneer in Computer Scalability. He took the notion of #Clustering from its infancy to productization. All Computer Architects have stood on his shoulders over the past decades. #Scalability #Clusters #DEC.
An #AmericanTreasure in the #Engineering domain. #GordonBell.

https://www.nytimes.com/2024/05/21/technology/c-gordon-bell-dead.html?smid=url-share

The New York Times · May 21, 2024 · C. Gordon Bell, Creator of a Personal Computer Prototype, Dies at 89 · By Glenn Rifkin

**Kevin Karhan** @kkarhan@infosec.space · May 3, 2024

@puppygirlhornypost well, AFAICT from people who used #DragonflyBSD (like @fuchsiii ) it's optimized for #Clustering with #HAMMER & #HAMMER2 filesystem as well as #LWKT which do allow higher throughput that scales I/O and network across multi-socket and -threaded architectures...

https://en.wikipedia.org/wiki/DragonFly_BSD

en.wikipedia.org · DragonFly BSD - Wikipedia

Continued thread

**Fabrice Tshimanga** @fabrice13@neuromatch.social · Nov 9, 2023

5/5

Our dataset also comprises CT and MRI scans, with patients' lesions segmented by an expert.
This allowed us to look at the distribution of lesions cluster-wise, and validate the associations between symptoms and lesions.

Check our pre-print and comment, ask questions, offer suggestions!
Although it is not simple to share data, we will release code soon, as a means to replicate the approach on similar data and more.
The link is already in the paper!
And let us know if you have data you'd like to share and analyse with our developing methods.

We are deciding on the best match for a journal to review and possibly publish this work, of which I am super proud and thankful to co-authors Andrea Zanola, Antonio Bisogno, Silvia Facchini, Lorenzo Pini, Manfredo Atzori, and Maurizio Corbetta!

#scicomm #paperthread #preprints

Continued thread

**Fabrice Tshimanga** @fabrice13@neuromatch.social · Nov 9, 2023

4/n

Converting our General Distance matrix into the General Similarity matrix yields an ambiguous spectrum, whose eigenvalues do not help to determine the number of clusters in the data.
But repeating the clustering and tracing which subjects consistently get clustered together actually yields the right information, encoded in a co-occurrence matrix.
The latter is quite evidently composed of 5 main clusters.
Our second approach, affinity propagation, autonomously found 7 clusters, which are mainly finer-grained partitions of the former 5.

#machinelearning #clustering
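The co-occurrence idea described in this post can be sketched in a few lines. This is only an illustration of the aggregation step, not the authors' released code: the function name `cooccurrence_matrix` and the hard-coded label vectors (standing in for the output of repeated clustering runs with different random initializations) are assumptions.

```python
def cooccurrence_matrix(partitions):
    """Fraction of runs in which each pair of subjects shares a cluster.

    `partitions` is a list of label vectors, one per clustering run.
    Label values may differ between runs; only co-membership matters.
    """
    n_runs = len(partitions)
    n_subjects = len(partitions[0])
    counts = [[0] * n_subjects for _ in range(n_subjects)]
    for labels in partitions:
        for i in range(n_subjects):
            for j in range(n_subjects):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[count / n_runs for count in row] for row in counts]


# Hypothetical labels from four runs on five subjects (made-up data).
runs = [
    [0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],  # same partition as the first run, labels swapped
    [0, 0, 1, 1, 1],  # subject 2 switches sides in this run
    [0, 0, 0, 1, 1],
]
C = cooccurrence_matrix(runs)
for row in C:
    print(row)
```

Pairs that always land together get a value of 1.0, pairs that never do get 0.0, and unstable subjects show intermediate values, so stable blocks in the matrix suggest robust clusters.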

Continued thread

**Fabrice Tshimanga** @fabrice13@neuromatch.social · Nov 9, 2023

3/n

We thus decided to use the General Distance Measure to compute pairwise similarities between our 172 subjects and obtained a matrix which, as math-savvy people know, is also the description of a network (an "adjacency matrix" for a "weighted undirected graph").
The problem was then to find cliques, communities or clusters of similar patients in such a network, and we used spectral clustering.
Spectral clustering is a family of techniques that use spectra of matrices describing networks, i.e. use eigenvalues of matrices to understand the structure of those networks.

#spectralanalysis #spectralclustering #clustering
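As a rough illustration of what "using the spectra of matrices describing networks" means, the standard unnormalized graph Laplacian can be written as below. The thread does not say which Laplacian variant the authors used, so the symbols $W$, $D$, $L$ and the choice of the unnormalized form are assumptions made for this sketch.

```latex
% W: symmetric similarity (adjacency) matrix with entries w_{ij} \ge 0
% D: diagonal degree matrix
\[
  d_i = \sum_{j=1}^{n} w_{ij}, \qquad
  D = \operatorname{diag}(d_1, \dots, d_n), \qquad
  L = D - W .
\]
% L is positive semidefinite, since for any vector x
\[
  x^{\top} L x = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} \, (x_i - x_j)^2 \;\ge\; 0 ,
\]
% so its eigenvalues can be ordered as
\[
  0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_n .
\]
```

The multiplicity of the eigenvalue 0 equals the number of connected components of the graph, which is why a large gap after the $k$-th smallest eigenvalue is often read as a hint that the network has about $k$ clusters. This is a heuristic, and it can fail when the spectrum is ambiguous.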

**Fabrice Tshimanga** @fabrice13@neuromatch.social · Nov 9, 2023

1/n
Our pre-print is finally out!
Here's my first #paperthread
In this work, my co-authors and I clustered ischaemic stroke patients' profiles and recovered common patterns of cognitive and sensorimotor damage.

...Historically many focal lesions to specific cortical areas were associated with specific distinction, but most strokes involve subcortical regions and bring multivariate patterns of deficits.
To characterize those patterns, many studies have turned to correlation analysis, factor analysis, PCA, focusing on the relations among variables==domains of impairments...

https://www.medrxiv.org/content/10.1101/2023.11.08.23297808

medRxiv · Nov 9, 2023 · Behavior Clusters in Ischemic Stroke using NIHSS

BACKGROUND: Stroke is one of the leading causes of death and disability. The resulting behavioral deficits can be measured with clinical scales of motor, sensory, and cognitive impairment. The most common of such scales is the National Institutes of Health Stroke Scale, or NIHSS. Computerized tomography (CT) and magnetic resonance imaging (MRI) scans show predominantly subcortical or subcortical-cortical lesions, with pure cortical lesions occurring less frequently. While many experimental studies have correlated specific deficits (e.g. motor or language impairment) with stroke lesion locations, the mapping between symptoms and lesions is not straightforward in clinical practice. The advancement of machine learning and data science in recent years has shown unprecedented opportunities even in the biomedical domain. Nevertheless, their application to medicine is not simple, and the development of data driven methods to learn general mathematical models of diseases from healthcare data is still an unsolved challenge.

METHODS: In this paper we measure statistical similarities of stroke patients based on their NIHSS scores, and we aggregate symptoms profiles through two different unsupervised machine learning techniques: spectral clustering and affinity propagation.

RESULTS: We identify clusters of patients with largely overlapping, coherent lesions, based on the similarity of behavioral profiles.

CONCLUSIONS: Overall, we show that an unsupervised learning workflow, open source and transferable to other conditions, can identify coherent mathematical representations of stroke lesions based only on NIHSS data.

Competing Interest Statement: The authors have declared no competing interest.

Funding Statement: This work was supported by the Department of excellence 2018-2022 initiative of the Italian Ministry of education (MIUR) awarded to the Department of Neuroscience-University of Padua.

Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. (Yes) The details of the IRB/oversight body that provided approval or exemption for the research described are given below: For data of patients of the Saint Louis cohort: the Internal Review Board of Washington University School of Medicine (WUSM) gave ethical approval for this work. For data of patients of the Padua cohort: the Ethics Committee of the Azienda Ospedale Università Padova (AOUP) gave ethical approval for this work. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. (Yes) I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). (Yes) I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. (Yes)

Data can be made available upon reasonable request to Maurizio Corbetta at maurizio.corbetta{at}unipd.it.

Abbreviations: AP: Affinity Propagation. GDM: General Distance Measure. GSM: General Similarity Measure. NIHSS: National Institutes of Health Stroke Scale. RSC: Repeated Spectral Clustering.

#stroke #neuroscience #machinelearning

**Sven Lieber** @SvenLieber@hcommons.social · Oct 24, 2023

Hey #library folks ,

do you want to cluster your book editions with the well-known Work-set algorithm from #OCLC, but you don't find a suitable reusable tool?

I recently faced this issue while working on the #BELTRANS project at KBR (Royal Library of Belgium). All I found were many research papers describing the clustering and a few implementations that required me to install 2010-style Java software stacks.

So I decided to write an easily reusable small #Python script that follows the ideas of the Work-set algorithm: clustering based on descriptive keys. Nothing more, nothing less.

Check my blog post for more information and have a look at the script.

blog post: https://doi.org/10.59350/4hd4r-1tk44

script: https://doi.org/10.5281/zenodo.10011416

Sven Lieber · Oct 16, 2023 · Clustering Book editions | Sven Lieber
What do the books “The invention of Nature” and “De uitvinder van de natuur” have in common? Well, they are both different versions of the same work “The invention of nature” by Andrea Wulf, whether it is in a different format or a different language. In this blog post I will briefly introduce the advantages of keeping work-level records in library catalogs. Furthermore, I will introduce a fast Python implementation (DOI: 10.5281/zenodo.10011416) which we used in the BELTRANS project to identify the works in a corpus of book translations #FRBRization.

#FRBRization #FRBR #IFLA
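To make "clustering based on descriptive keys" concrete, here is a minimal sketch of the general idea. It is not the linked script and not OCLC's actual key definition: the record fields (`id`, `original_title`, `author`), the key format, and the third record are all hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def descriptive_key(record):
    # Hypothetical key: normalized original title plus normalized author name.
    return normalize(record["original_title"]) + "/" + normalize(record["author"])


def cluster_by_key(records):
    """Group edition ids that share the same descriptive key."""
    clusters = defaultdict(list)
    for record in records:
        clusters[descriptive_key(record)].append(record["id"])
    return dict(clusters)


# Made-up catalogue records; only the first title and author come from the post.
records = [
    {"id": "ed1", "original_title": "The Invention of Nature", "author": "Wulf, Andrea"},
    {"id": "ed2", "original_title": "The invention of nature.", "author": "wulf, andrea"},
    {"id": "ed3", "original_title": "Another Book", "author": "Someone, Else"},
]
clusters = cluster_by_key(records)
for key, edition_ids in clusters.items():
    print(key, edition_ids)
```

Editions whose keys match after normalization end up in the same work-level cluster. A translation with a different title would only join the cluster if its record carries the original title, which is why that field is assumed here.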

#clustering